Add ragged sort kernel fallback mechanism and version guard by NuojCheng · Pull Request #4187 · AI-Hypercomputer/maxtext

NuojCheng · 2026-06-17T17:31:37Z

Description

This PR:

Introduces two flags, ragged_gather_fallback and ragged_gather_reduce_fallback. When a flag is true, the corresponding operation uses a pure JAX, non-ragged implementation instead of the kernel version.
Adds a version guard protecting the ragged gather kernel.
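
For readers unfamiliar with the pattern, a version guard of this kind could be sketched as below. The description does not say which package or minimum version the guard checks, so `MIN_KERNEL_VERSION`, `parse_version`, and `kernel_supported` are hypothetical names and the threshold is a placeholder, not the value used in this PR.

```python
import re

# Hypothetical placeholder: the PR does not state the real minimum version.
MIN_KERNEL_VERSION = (0, 0, 0)


def parse_version(version: str) -> tuple[int, ...]:
  """Extracts up to three leading numeric components, e.g. '0.7.1.dev1' -> (0, 7, 1)."""
  return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def kernel_supported(installed: str, minimum: tuple[int, ...] = MIN_KERNEL_VERSION) -> bool:
  """Returns True when the installed version is at least `minimum`."""
  return parse_version(installed) >= minimum
```

A caller would take the kernel path only when `kernel_supported(...)` returns True and otherwise fall through to the JAX reference path.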

Tests

Based on the xprof traces, both flags take effect as intended.

Fall back ragged gather reduce only

smoke_train model_name=deepseek2-16b ici_expert_parallelism=4 per_device_batch_size=1 max_target_length=4096 use_random_routing=true use_ring_of_experts=true use_ragged_sort=true ragged_gather_reduce_fallback=true debug_sharding=false profiler=xplane ragged_gather_fallback=false enable_tpu_profiling_options=true

xprof

Fall back ragged gather only

smoke_train model_name=deepseek2-16b ici_expert_parallelism=4 per_device_batch_size=1 max_target_length=4096 use_random_routing=true use_ring_of_experts=true use_ragged_sort=true ragged_gather_reduce_fallback=false debug_sharding=false profiler=xplane ragged_gather_fallback=true enable_tpu_profiling_options=true

xprof

Fall back both kernels

smoke_train model_name=deepseek2-16b ici_expert_parallelism=4 per_device_batch_size=1 max_target_length=4096 use_random_routing=true use_ring_of_experts=true use_ragged_sort=true ragged_gather_reduce_fallback=true debug_sharding=false profiler=xplane ragged_gather_fallback=true enable_tpu_profiling_options=true

xprof

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-06-17T17:36:00Z

Codecov Report

❌ Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/kernels/ragged/ragged_gather_reduce.py	66.66%	1 Missing ⚠️


gobbleturk · 2026-06-18T17:15:08Z

-    return out
+  if sc_info is None or enforce_fallback:
+    # Sparse core is not available or fallback is enforced. Use JAX reference.
+    return _fallback_implementation(x, indices, weights, has_weights)


It is a bit surprising to see a non-ragged fallback inside a function called ragged_gather. Should the decision to use ragged_gather versus a fallback happen in a higher-level function (i.e. somewhere higher in the call stack)?

The new ragged_gather_fallback and ragged_gather_reduce_fallback flags control kernel-level fallback logic. We avoid using these flags directly in moe.py because we may need the native JAX implementation for the forward pass and the custom kernel for the backward pass. While we could theoretically use them in ragged_sort.py, a fallback mechanism already exists in the primary kernel wrapper. We leverage that existing structure instead of introducing redundant if-else conditions.
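
As an illustration of the wrapper-level dispatch described in this reply, here is a minimal sketch. It uses NumPy as a stand-in for jax.numpy so it runs anywhere, and `gather_with_fallback`, `_reference_gather`, and the `kernel` argument are hypothetical names, not the signatures in ragged_gather.py.

```python
import numpy as np


def _reference_gather(x, indices, weights=None):
  """Reference path: plain indexing, optionally scaled per row by `weights`."""
  out = x[indices]
  if weights is not None:
    out = out * weights[:, None]
  return out


def gather_with_fallback(x, indices, weights=None, *, kernel=None, enforce_fallback=False):
  """Calls `kernel` when one is available, unless the fallback is enforced.

  `kernel=None` plays the role of "sparse core is not available" in this sketch.
  """
  if kernel is None or enforce_fallback:
    return _reference_gather(x, indices, weights)
  return kernel(x, indices, weights)
```

Keeping the branch inside the wrapper means each caller, forward or backward, passes its own `enforce_fallback` value without the call site in the model code needing to know which implementation runs.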

gobbleturk · 2026-06-18T17:15:41Z

    self._run_ragged_sort_loss_and_grad(use_ring_of_experts=True, ragged_buffer_factor=1.5)

+  @pytest.mark.tpu_only
+  @pytest.mark.skip_on_tpu7x


why skip on tpu7x?

I think there were some issues using ragged sort on bloom. @darisoy do we have a buganizer tracking this?

gobbleturk · 2026-06-18T17:16:21Z

+    weights: jax.Array | None = None,
+    has_weights: bool = False,
+) -> jax.Array:
+  """Fallback to JAX implementation for ragged gather."""


Is this a ragged implementation? Doesn't its cost grow with the full buffer size?

There are only two options:

1. JAX sort, no raggedness
2. Sparse core kernels, ragged

I have updated the comments to make clear that the JAX implementation is non-ragged.
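
To make the ragged versus non-ragged distinction concrete, here is a toy sketch with NumPy standing in for JAX. It assumes that only the first `num_valid` entries of the index buffer are meaningful and that padding rows may be left as zeros. Both function names are hypothetical, and the real sparse core kernel may treat padding differently.

```python
import numpy as np


def gather_non_ragged(x, indices):
  """Touches every slot of the index buffer, valid or padding."""
  return x[indices]


def gather_ragged(x, indices, num_valid):
  """Touches only the first `num_valid` slots; padding rows stay zero."""
  out = np.zeros((indices.shape[0], x.shape[1]), dtype=x.dtype)
  out[:num_valid] = x[indices[:num_valid]]
  return out
```

The two agree on the valid rows. The difference is that the non-ragged version does work proportional to the full buffer length, which is the cost the reviewer is asking about.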

Shuwen-Fang · 2026-06-18T17:19:18Z

                       # without `use_ring_of_experts` (with EP > 1). When `use_ring_of_experts=True` the kernels run
                       # inside `permute`/`unpermute`; otherwise they run inside `local_permute`/local-unpermute.
 use_gather_mosaic_kernel: false # whether to use a custom mosaic kernel for token gather ops
+ragged_gather_fallback: false # when true, unconditionally use the JAX reference implementation instead of the


Could this just be:
ragged_gather: true --> use ragged kernel
ragged_gather: false --> use non-ragged fallback

There are two operations:
1. ragged permute (fwd: ragged gather; bwd: ragged gather reduce)
2. ragged unpermute (fwd: ragged gather reduce; bwd: ragged gather)

Technically we could introduce four flags, each controlling whether one pass uses the kernel version or the JAX version. That is probably not necessary. Instead, we have one flag, use_ragged_sort, indicating whether we want the kernel version for everything, and ragged_gather_fallback and ragged_gather_reduce_fallback to control the two kernels individually.

I think you are suggesting renaming these two flags with the opposite meaning? I don't have a strong opinion on this, but I can update them if you think the new names are much better.
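
The flag semantics described above can be summarised in a small sketch. `resolve_implementations` is a hypothetical helper written only for illustration. It ignores the version guard and sparse core availability, and it assumes that use_ragged_sort=false means every pass uses the JAX path.

```python
def resolve_implementations(use_ragged_sort, ragged_gather_fallback, ragged_gather_reduce_fallback):
  """Maps the three flags to the implementation used by each of the four passes."""

  def impl(fallback):
    # The kernel is used only when ragged sort is on and this kernel is not forced to fall back.
    return "kernel" if use_ragged_sort and not fallback else "jax"

  gather = impl(ragged_gather_fallback)
  gather_reduce = impl(ragged_gather_reduce_fallback)
  return {
      "permute_fwd": gather,           # ragged gather
      "permute_bwd": gather_reduce,    # ragged gather reduce
      "unpermute_fwd": gather_reduce,  # ragged gather reduce
      "unpermute_bwd": gather,         # ragged gather
  }
```

For example, the "Fall back ragged gather only" run in the Tests section corresponds to `resolve_implementations(True, True, False)`, which keeps the kernel for permute bwd and unpermute fwd only.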

NuojCheng force-pushed the chengnuojin-ragged-guard branch 3 times, most recently from 74e1341 to eb7f846 Compare June 17, 2026 21:30

NuojCheng marked this pull request as ready for review June 17, 2026 23:20

NuojCheng force-pushed the chengnuojin-ragged-guard branch from eb7f846 to a0a7096 Compare June 17, 2026 23:31

gobbleturk reviewed Jun 18, 2026


Shuwen-Fang reviewed Jun 18, 2026


add ragged sort kernel fallback mechanism and version guard

e76b469

NuojCheng force-pushed the chengnuojin-ragged-guard branch from a0a7096 to e76b469 Compare June 19, 2026 00:07

NuojCheng wants to merge 1 commit into main from chengnuojin-ragged-guard


3 participants
