Deprecate Float-defaulted Factory functions (zeros/ones/eye/identity/tri)#408
Closed
john-rocky wants to merge 1 commit into
Closed
Deprecate Float-defaulted Factory functions (zeros/ones/eye/identity/tri)#408john-rocky wants to merge 1 commit into
john-rocky wants to merge 1 commit into
Conversation
Closes ml-explore#390. Several `MLX.<factory>(...)` and `MLXArray.<factory>(...)` overloads silently default `type:` to `Float.self`, returning float32 arrays even when they participate in bfloat16 / float16 graphs (each implicit float32 value then triggers an AsType cast in MLX's C++ engine, with the bandwidth penalty described in the issue body). This change: 1. **Adds a deprecated no-`type:` overload** for each affected factory (`zeros`, `ones`, `eye`, `identity`, `tri`) on both `MLXArray` and the free function. The deprecated overload still returns float32, so source compatibility is preserved, but callers that elided the `type:` argument now see a `#DeprecatedDeclaration` warning pointing to the explicit `dtype:` / `type:` variants and to issue ml-explore#390. 2. **Removes `= Float.self` from the existing `type:` overloads.** Because the new no-type overload is more specific than the default-eliding variant, Swift's overload resolution prefers the deprecated one for `zeros(shape)` / `ones(shape)` etc. The `= Float.self` default on the type-explicit overload became unreachable, so the cleanup removes it without behavior change. Callers passing `type:` explicitly continue to compile silently. 3. **Removes the unreachable `= Float.self` default from `full(shape, values, type:)`.** `full(shape, values)` already exists and inherits the dtype of `values`, so the defaulted variant was never selected by overload resolution. 4. **Threads `dtype: .float32` through internal call sites in `MLXNN` (Convolution / ConvolutionTransposed / Normalization / Quantized) and `Examples/Tutorial`** so the in-tree build stays warning-free. These call sites previously relied on the silent default; tightening them is a no-op behaviorally but documents the dtype choice at the layer-construction site. What's intentionally *not* in this PR: - `arange` (`Double` overloads with `dtype: DType = .float32`): matches Python's behavior (`mx.arange(3.0) -> float32`) so leaving the default in place keeps parity. - `MLXRandom` factories (`uniform` / `normal` / `multivariateNormal` / `truncatedNormal` / `gumbel` / `laplace`): these need a follow-up PR; the existing call sites would otherwise produce a wave of warnings before the inference-from-low/high direction is decided. - Threading a dtype through `Linear` / `Conv*` / `*Norm` initialisers so that constructed weights / biases inherit a layer-wide dtype. That is the next logical step (the bandwidth issue surfaced in ml-explore/mlx-swift-lm#124), but it is a much larger surface change and is best discussed separately. Build verification: swift build # Build complete!, no ml-explore#390 warnings
Author
|
Apologies — I missed your existing draft #391 when I surveyed the issue tracker, and opened this on the same target. Closing in favor of #391. For what it's worth, the only piece in here that wasn't in #391 was the in-tree call-site cleanup in MLXNN ( Sorry for the noise. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #390.
Implements the path called out in the issue: factory functions whose dtype silently defaults to
Float.selfare now deprecated, while the existingdtype:/type:overloads are kept as the recommended replacements.API change
For each of
zeros,ones,eye,identity,tri(both `MLXArray.` and free `MLX.` forms):```swift
// Already existed — keep as the recommended explicit-dtype paths:
zeros(shape, dtype: .float32)
zeros(shape, type: Float.self)
// Existed but silently returned float32 — now deprecated:⚠️ #DeprecatedDeclaration
zeros(shape) //
```
The deprecated overload still returns float32, so the change is source-compatible. Callers that omitted the type argument get a Swift warning pointing them at the explicit-dtype form and at this issue.
A small mechanical cleanup also runs alongside:
In-tree callers tightened to keep the build warning-free
The deprecation surfaced silent float32 promotion at six existing internal sites in MLXNN and the tutorial example:
All become `...zeros([d], dtype: .float32)` / `...ones([d], dtype: .float32)`. No behavior change — the original implicit dtype was `.float32` — but the dtype is now visible at the layer-construction site, which is also a stepping-stone toward the broader "thread dtype through layer initializers" follow-up that #390 alludes to via ml-explore/mlx-swift-lm#124.
Intentional non-goals (separate PRs welcome)
Verification
```
swift build # Build complete!
# 0 #390-related warnings remain in-tree
```
`swift test` could not be run end-to-end on this machine: the SPM test target hits the pre-existing "Failed to load the default metallib" issue (#349) before any of my changes get exercised. Plain library / example builds succeed without warnings, and the changes are purely additive overloads plus default-argument removals, so there is no runtime path affected by this PR that isn't already covered by the build.
I'm happy to:
Whichever direction you prefer, this PR can be reshaped to match.