Fix routing replay split sizes for attention by vivekkalyan · Pull Request #721 · OpenPipe/ART

vivekkalyan · 2026-06-05T17:29:48Z

Summary

- Preserve the full attention token layout when recording routing replay token UID sets.
- Keep the compacted GDN token UID layout only for GDN replay.
- Add a unit test that covers the differing attention-vs-GDN replay shapes.

Why

The Megatron routing replay path could fail with the error "split_with_sizes expects split_sizes to sum exactly ..." when attention replay used the compacted GDN token UID sets. Attention needs the original flattened token layout, while GDN uses the compact routed-token layout, so split sizes computed for one layout do not sum to the length of the other.
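The mismatch can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the local `split_with_sizes` helper only mimics the sum check behind the error message above, and the `PAD` marker, token UIDs, and split sizes are made up. It is not the ART or Megatron code, and it assumes the two layouts differ by padding positions.

```python
PAD = -1  # hypothetical padding marker, not an ART or Megatron constant


def split_with_sizes(items, sizes):
    """Split items into consecutive chunks; sizes must sum to len(items)."""
    if sum(sizes) != len(items):
        raise ValueError(f"split sizes sum to {sum(sizes)}, expected {len(items)}")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Full flattened layout: two packed sequences followed by padding slots.
full_layout = [10, 11, 12, 20, 21, PAD, PAD, PAD]

# Compact layout: the same token UIDs with padding positions dropped.
compact_layout = [uid for uid in full_layout if uid != PAD]

# Split sizes computed against the full layout (3 + 2 tokens, 3 padding slots).
attention_sizes = [3, 2, 3]

# Splitting the full layout works because the sizes sum to its length.
full_chunks = split_with_sizes(full_layout, attention_sizes)

# Splitting the compact layout with the same sizes fails the sum check.
try:
    split_with_sizes(compact_layout, attention_sizes)
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)
```

In this sketch the failure only appears when the layout contains padding. A fully packed sequence would make both layouts identical and hide the problem.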

Validation

uv run --with torch --with safetensors --with megatron-core==0.17.0 --with transformers==5.2.0 --group dev pytest tests/unit/test_moe_routing_replay.py tests/unit/test_dedicated_config.py
Sky 2x H200 Bonnie Megatron repro against this fix completed 1 training step without the split-size crash
Stacked LoRA PR smoke run against this branch also completed 1 Megatron training step successfully

FurtherAI · 2026-06-06T06:54:36Z

Looks good. This seems to be a misconception from Codex that the tests did not catch, because they fill the packed sequence completely and therefore have no padding.

fix: preserve attention layout for routing replay

85081af

vivekkalyan force-pushed the fix/routing-replay-split-sizes branch from 4834baf to 85081af on June 8, 2026, 18:08

test: narrow routing replay token uid assertions

16cc810

vivekkalyan merged commit f8eaa6d into main Jun 8, 2026
5 checks passed

vivekkalyan merged 2 commits into main from fix/routing-replay-split-sizes
