Expose Cmlx product + enable jaccl distributed backend by anupsv · Pull Request #2 · Layr-Labs/mlx-swift

anupsv · 2026-05-21T02:00:32Z

Summary

Enables the Swift package layer of MLX to compile the mlx-c/mlx/c/distributed*.cpp C wrappers and the jaccl backend, and exposes the existing Cmlx target as a public library product. This lets downstream Swift consumers (e.g. d-inference provider-swift) link against the C distributed-group API and use jaccl for Mac-to-Mac cluster pipeline inference over Thunderbolt 5.
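For reviewers who want to see what the new product looks like from the consumer side, here is a minimal sketch of a downstream manifest. The package and target names (`provider-swift`, `ProviderCore`) and the branch name come from this PR's discussion. The URL-based dependency and the macOS 14 platform floor are assumptions for illustration; d-inference may wire the fork in differently (for example via a submodule path).

```swift
// swift-tools-version:5.9
import PackageDescription

// Hypothetical downstream manifest; only the `.product(name: "Cmlx", ...)`
// entry is what this PR newly makes possible.
let package = Package(
    name: "provider-swift",
    platforms: [.macOS(.v14)],  // assumption, taken from the test plan's "macOS 14+"
    dependencies: [
        // Assumed URL form of the fork; pin to a tag or full revision in practice.
        .package(
            url: "https://github.com/Layr-Labs/mlx-swift",
            branch: "feat/cmlx-jaccl-distributed"
        ),
    ],
    targets: [
        .target(
            name: "ProviderCore",
            dependencies: [
                .product(name: "MLX", package: "mlx-swift"),
                // Resolves only once the fork lists Cmlx under `products:`.
                .product(name: "Cmlx", package: "mlx-swift"),
            ]
        ),
    ]
)
```

Without the product entry, SwiftPM rejects the second dependency at resolution time, which matches the `product 'Cmlx' required by package 'provider-swift'` failure quoted later in this thread.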

Upstream context (worth reading before merging)

This change deviates from upstream's stance: the original `Package.swift` comment reads `// do not build distributed support (yet)`. The "(yet)" tracks ml-explore/mlx-swift#371 ("Add distributed communication framework for multi-device tensor parallelism", open since 2026-03-15), which un-excludes the same files and adds polished Swift bindings (`DistributedGroup`, `MLXDistributed.allSum` / `.send` / `.recv` / etc., plus sharded NN layers).

We're enabling the backend on our fork ahead of ml-explore#371 merging because the d-inference cluster work already depends on it (commit d1266de3 in d-inference, the encrypted pipeline inference stack). If/when ml-explore#371 lands upstream, we'll rebase our fork onto the upstream API and drop any local Swift bindings that overlap.

Known jaccl issues we accept

| Issue | Impact for our use |
| --- | --- |
| ml-explore/mlx#3149: JACCL point-to-point send/recv with varying shape produces wrong data or hangs | Low risk for our usage. d-inference uses jaccl only for small collective-op synchronization; activation/token transfer goes over plain TCP with AES-256-GCM, not jaccl. |
| ml-explore/mlx#3467: RTR failure on Apple Thunderbolt RDMA, GID-selection regression | Affects connection setup; we are tracking the upstream fix. |
| ml-explore/mlx#3442: `backend="any"` picks ring instead of jaccl | We invoke jaccl explicitly, not via "any". |

What's in this PR

| File | Change |
| --- | --- |
| `Package.swift` | Un-exclude `mlx-c/mlx/c/distributed.cpp` and `distributed_group.cpp` (the C API wrappers that back `mlx_c_distributed_group_*`); un-exclude the jaccl backend sources (`jaccl.cpp`, `mesh.cpp`, `ring.cpp`, `utils.cpp`) and exclude the `no_jaccl.cpp` stub instead; add `.library(name: "Cmlx", targets: ["Cmlx"])` to the `products:` list. The other standalone backends (mpi, ring, nccl) remain excluded; the `ring.cpp` listed above is one of the jaccl sources, not the ring backend. |
| `Source/Cmlx/include-framework/Cmlx.h` | Surface `mlx-c-distributed_group.h` and `mlx-c-distributed.h` in the Cmlx umbrella header so the distributed-group symbols are reachable when downstream code imports `Cmlx`. |

Net: 2 files, +7 / -8 lines. No behavior change for existing MLX, MLXNN, MLXRandom, MLXFast, MLXOptimizers, MLXFFT, MLXLinalg consumers. Build size grows by the jaccl sources (~600 LOC of C++).
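As a rough illustration of the shape of the `Package.swift` change, the following stripped-down manifest shows the two mechanisms involved: the new library product and the swapped `exclude:` entries. It is a sketch, not the actual diff. The directory layout under `Source/Cmlx` and every path in the exclude list are assumptions, and the real manifest carries many more targets, excludes, and build settings.

```swift
// swift-tools-version:5.9
import PackageDescription

// Illustrative only: paths below are assumed, not copied from the fork.
let package = Package(
    name: "mlx-swift",
    products: [
        .library(name: "MLX", targets: ["MLX"]),
        // New in this PR: the Cmlx target already existed, only this entry was missing.
        .library(name: "Cmlx", targets: ["Cmlx"]),
    ],
    targets: [
        .target(
            name: "Cmlx",
            path: "Source/Cmlx",
            exclude: [
                // Before this PR the jaccl sources and the mlx-c distributed
                // wrappers sat in this list; now only the stub is excluded,
                // so the real jaccl implementation is compiled instead.
                "mlx/mlx/distributed/jaccl/no_jaccl.cpp",
                // The standalone backends stay excluded (hypothetical paths).
                "mlx/mlx/distributed/mpi",
                "mlx/mlx/distributed/ring",
                "mlx/mlx/distributed/nccl",
            ]
        ),
        .target(name: "MLX", dependencies: ["Cmlx"]),
    ]
)
```

Excluding the stub rather than simply adding the real sources matters because both would otherwise define the same symbols and fail at link time.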

Test plan

- `swift build` succeeds on macOS 14+ (Apple Silicon)
- Existing MLX consumers compile against this package (verified via d-inference provider-swift)
- Downstream import of `Cmlx` + `mlx_c_distributed_group_*` calls succeed in d-inference's ClusterDiscovery / MLXDistributed modules
- Two-Mac Thunderbolt 5 smoke test runs jaccl collective op without hitting #3467

Picks up Layr-Labs/mlx-swift#2 which exposes the Cmlx library product and enables the jaccl distributed backend that provider-swift's ClusterSession / EncryptedPipelineInference depend on.

Without this bump, CI's fresh `swift build -c debug` fails with: product 'Cmlx' required by package 'provider-swift' target 'ProviderCore' not found in package 'mlx-swift'.

Tracking issue for the upstream deviation: #193.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Downstream consumers (e.g. d-inference's provider-swift) need direct access to the underlying mlx-c symbols for distributed-group setup and custom collective ops. This commit:

1. Adds `Cmlx` to the package's library products so callers can declare `.product(name: "Cmlx", package: "mlx-swift")`. The target already exists; only the public product entry was missing.
2. Enables the jaccl distributed backend by un-excluding its source files (jaccl.cpp, mesh.cpp, ring.cpp, utils.cpp) and excluding the `no_jaccl.cpp` stub instead. jaccl is the Apple-Silicon-friendly collective backend used for cluster pipeline inference over Thunderbolt 5; the other distributed backends (mpi, ring, nccl) remain excluded since they're not supported on macOS.
3. Surfaces `mlx-c-distributed_group.h` and `mlx-c-distributed.h` in the umbrella `Cmlx.h` so the C distributed-group API is reachable from Swift via the Cmlx module.

No behavior change for existing consumers: MLX, MLXNN, MLXRandom etc. continue to work exactly as before. Build size grows by the jaccl sources (~600 LOC of C++).

Force-pushes on Layr-Labs/mlx-swift#2 and Layr-Labs/mlx-swift-lm#24 landed new SHAs (fa6a4e8, c2fbbdc) — bump the submodule pointers to match.

anupsv requested a review from Gajesh2007 May 21, 2026 02:02

anupsv mentioned this pull request May 21, 2026

Track upstream mlx-swift distributed-backend status (jaccl deviation) Layr-Labs/d-inference#193

Open

anupsv mentioned this pull request May 21, 2026

Add Llama callPartial for pipeline-parallel inference Layr-Labs/mlx-swift-lm#24

Open

3 tasks

anupsv force-pushed the feat/cmlx-jaccl-distributed branch from edefe35 to fa6a4e8 Compare May 21, 2026 04:21

This was referenced May 21, 2026

Port sharded linear primitives + DistributedGroup from upstream #371 #3

Open

Add LlamaModelTP: tensor-parallel variant of LlamaModel Layr-Labs/mlx-swift-lm#25

Open

feat(cluster): TP-default dispatch + TensorParallelEngine scaffold Layr-Labs/d-inference#194

Open

anupsv wants to merge 1 commit into main from feat/cmlx-jaccl-distributed
