Expose Cmlx product + enable jaccl distributed backend#2

Open
anupsv wants to merge 1 commit into main from feat/cmlx-jaccl-distributed

anupsv commented May 21, 2026

Summary

Enables the Swift package layer of MLX to compile the mlx-c/mlx/c/distributed*.cpp C wrappers and the jaccl backend, and exposes the existing Cmlx target as a public library product. This lets downstream Swift consumers (e.g. d-inference provider-swift) link against the C distributed-group API and use jaccl for Mac-to-Mac cluster pipeline inference over Thunderbolt 5.

Upstream context (worth reading before merging)

This change deviates from upstream's stance — the original Package.swift comment reads // do not build distributed support (yet). The "(yet)" tracks ml-explore/mlx-swift#371 ("Add distributed communication framework for multi-device tensor parallelism", open since 2026-03-15), which un-excludes the same files and adds polished Swift bindings (DistributedGroup, MLXDistributed.allSum / .send / .recv / etc., plus sharded NN layers).

We're enabling the backend on our fork ahead of ml-explore#371 merging because the d-inference cluster work already depends on it (commit d1266de3 in d-inference, the encrypted pipeline inference stack). If/when ml-explore#371 lands upstream, we'll rebase our fork onto the upstream API and drop any local Swift bindings that overlap.

Known jaccl issues we accept

| Issue | Impact for our use |
| --- | --- |
| ml-explore/mlx#3149: JACCL point-to-point send/recv with varying shape produces wrong data or hangs | Low risk for our shape. d-inference uses jaccl only for small collective-op synchronization; activation/token transfer goes over plain TCP with AES-256-GCM, not jaccl. |
| ml-explore/mlx#3467: RTR failure on Apple Thunderbolt RDMA, GID-selection regression | Track upstream fix; affects connection setup. |
| ml-explore/mlx#3442: backend="any" picks ring instead of jaccl | We invoke jaccl explicitly, not via "any". |

What's in this PR

| File | Change |
| --- | --- |
| Package.swift | Un-exclude mlx-c/mlx/c/distributed.cpp and distributed_group.cpp (the C API wrappers that back mlx_c_distributed_group_*); un-exclude the jaccl backend sources (jaccl.cpp, mesh.cpp, ring.cpp, utils.cpp) and exclude the no_jaccl.cpp stub instead; add .library(name: "Cmlx", targets: ["Cmlx"]) to the products: list. The other backends (mpi, ring, nccl) remain excluded; the ring.cpp listed above is one of the jaccl sources, not the standalone ring backend. |
| Source/Cmlx/include-framework/Cmlx.h | Surface mlx-c-distributed_group.h and mlx-c-distributed.h in the Cmlx umbrella header so the distributed-group symbols are reachable when downstream code imports Cmlx. |

Net: 2 files, +7 / -8 lines. No behavior change for existing MLX, MLXNN, MLXRandom, MLXFast, MLXOptimizers, MLXFFT, MLXLinalg consumers. Build size grows by the jaccl sources (~600 LOC of C++).
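
For reviewers who want to see what the new product entry enables, here is a minimal sketch of a downstream manifest that links against Cmlx. The package and target names (provider-swift, ProviderCore) come from the CI error quoted in the referencing commit. The URL-plus-branch dependency form and the tools version are assumptions for illustration only; d-inference appears to consume this fork through a submodule pointer, in which case a local `.package(path:)` dependency would take its place.

```swift
// swift-tools-version:5.9
// Hypothetical downstream manifest; not the actual d-inference Package.swift.
import PackageDescription

let package = Package(
    name: "provider-swift",
    // macOS 14+ matches the test plan in this PR.
    platforms: [.macOS(.v14)],
    dependencies: [
        // Assumption: consuming the fork by branch. Pin to a revision
        // (or use a local path / submodule) for reproducible builds.
        .package(
            url: "https://github.com/Layr-Labs/mlx-swift",
            branch: "feat/cmlx-jaccl-distributed"
        ),
    ],
    targets: [
        .target(
            name: "ProviderCore",
            dependencies: [
                // Existing high-level Swift product, unchanged by this PR.
                .product(name: "MLX", package: "mlx-swift"),
                // New product from this PR; without it SwiftPM reports
                // "product 'Cmlx' required by package ... not found".
                .product(name: "Cmlx", package: "mlx-swift"),
            ]
        ),
    ]
)
```

With a manifest along these lines, sources in ProviderCore can `import Cmlx` and reach the C distributed-group symbols surfaced through the umbrella header.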

Test plan

  • swift build succeeds on macOS 14+ (Apple Silicon)
  • Existing MLX consumers compile against this package (verified via d-inference provider-swift)
  • Downstream import of Cmlx + mlx_c_distributed_group_* calls succeed in d-inference's ClusterDiscovery / MLXDistributed modules
  • Two-Mac Thunderbolt 5 smoke test runs jaccl collective op without hitting #3467

anupsv requested a review from Gajesh2007 May 21, 2026 02:02
anupsv added a commit to Layr-Labs/d-inference that referenced this pull request May 21, 2026
Picks up Layr-Labs/mlx-swift#2 which exposes the Cmlx library product
and enables the jaccl distributed backend that provider-swift's
ClusterSession / EncryptedPipelineInference depend on.

Without this bump, CI's fresh `swift build -c debug` fails with:
  product 'Cmlx' required by package 'provider-swift' target 'ProviderCore'
  not found in package 'mlx-swift'.

Tracking issue for the upstream deviation: #193.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Downstream consumers (e.g. d-inference's provider-swift) need direct
access to the underlying mlx-c symbols for distributed-group setup and
custom collective ops. This commit:

1. Adds `Cmlx` to the package's library products so callers can declare
   `.product(name: "Cmlx", package: "mlx-swift")`. The target already
   exists; only the public product entry was missing.

2. Enables the jaccl distributed backend by un-excluding its source
   files (jaccl.cpp, mesh.cpp, ring.cpp, utils.cpp) and excluding the
   `no_jaccl.cpp` stub instead. jaccl is the Apple-Silicon-friendly
   collective backend used for cluster pipeline inference over
   Thunderbolt 5; the other distributed backends (mpi, ring, nccl) remain
   excluded, as they were before this change.

3. Surfaces `mlx-c-distributed_group.h` and `mlx-c-distributed.h` in
   the umbrella `Cmlx.h` so the C distributed-group API is reachable
   from Swift via the Cmlx module.

No behavior change for existing consumers — MLX, MLXNN, MLXRandom etc.
continue to work exactly as before. Build size grows by the jaccl
sources (~600 LOC of C++).
anupsv force-pushed the feat/cmlx-jaccl-distributed branch from edefe35 to fa6a4e8 May 21, 2026 04:21
anupsv added a commit to Layr-Labs/d-inference that referenced this pull request May 21, 2026
Force-pushes on Layr-Labs/mlx-swift#2 and Layr-Labs/mlx-swift-lm#24
landed new SHAs (fa6a4e8, c2fbbdc) — bump the submodule pointers to
match.