Fix/nnx sharding by hsuan-lun-chiang · Pull Request #4199 · AI-Hypercomputer/maxtext

hsuan-lun-chiang · 2026-06-18T02:43:10Z

Description

Start with a short description of what the PR does and how this is a change from
the past.

The rest of the description includes relevant details and context, examples:

why is this change being made,
the problem being solved and any relevant context,
why this is a good solution,
some information about the specific implementation,
shortcomings of the solution and possible future improvements.

If the change fixes a bug or a Github issue, please include a link, e.g.,:
FIXES: b/123456
FIXES: #123456

You can also provide a comma-separated list. If you don't want to close a bug but
simply to reference it, use BUGS, e.g.:
BUGS: b/123456

Notice 1: Once all tests pass, the "pull ready" label will automatically be assigned.
This label is used for administrative purposes. Please do not add it manually.

Notice 2: For external contributions, our settings currently require an approval from a MaxText maintainer to trigger CI tests.

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

google-cla · 2026-06-18T02:43:26Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

PR6 through PR10 promoted every feature that was routed to Linen to an NNX-native implementation. This PR flips the three defaults in base.yml so that NNX is the production path, pins Linen-coupled tests so that the flip does not silently swap their backend, and bundles the NNX-only fixes that surface once pure_nnx=True:

DiLoCo merge/checkpoint,
Zero-1 input shardings on flat nnx.State,
MTP sown-Variable handling,
generate_param_only_checkpoint NNX flow,
maxengine Linen-parity removal.

NNX pipeline parallelism is deferred to PR11.5. Until then, train_compile fails fast when pure_nnx=True and pipeline parallelism is configured.
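The fail-fast behaviour described above could look roughly like the sketch below. This is not the code in this PR: only pure_nnx comes from the description, while SketchConfig, num_pipeline_stages, and check_nnx_pipeline_support are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
  """Stand-in for a parsed config. Only `pure_nnx` is named in the PR text."""

  pure_nnx: bool = True
  num_pipeline_stages: int = 1  # hypothetical field: >1 means pipeline parallelism is on


def check_nnx_pipeline_support(config: SketchConfig) -> None:
  """Raise early instead of compiling an unsupported NNX + pipeline setup."""
  if config.pure_nnx and config.num_pipeline_stages > 1:
    raise NotImplementedError(
        "Pipeline parallelism is not yet supported with pure_nnx=True; "
        "set pure_nnx=False or disable pipelining."
    )


# The NNX path without pipelining and the Linen path with pipelining both pass.
check_nnx_pipeline_support(SketchConfig(pure_nnx=True, num_pipeline_stages=1))
check_nnx_pipeline_support(SketchConfig(pure_nnx=False, num_pipeline_stages=4))
```

Raising before compilation starts means the unsupported combination surfaces as a clear error rather than as a sharding or tracing failure later on.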

codecov · 2026-06-18T03:03:29Z

Codecov Report

❌ Patch coverage is 51.02041% with 72 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/trainers/pre_train/train.py	8.10%	30 Missing and 4 partials ⚠️
src/maxtext/common/checkpointing.py	35.71%	8 Missing and 1 partial ⚠️
src/maxtext/trainers/diloco/diloco.py	75.00%	4 Missing and 4 partials ⚠️
...axtext/trainers/post_train/sft/train_sft_native.py	64.28%	3 Missing and 2 partials ⚠️
src/maxtext/checkpoint_conversion/to_maxtext.py	0.00%	4 Missing ⚠️
src/maxtext/utils/train_utils.py	33.33%	3 Missing and 1 partial ⚠️
src/maxtext/utils/sharding.py	78.57%	1 Missing and 2 partials ⚠️
src/maxtext/layers/nnx_wrappers.py	60.00%	2 Missing ⚠️
...rc/maxtext/utils/generate_param_only_checkpoint.py	80.00%	1 Missing and 1 partial ⚠️
src/maxtext/checkpoint_conversion/utils/utils.py	0.00%	1 Missing ⚠️
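
As a cross-check on the report, the per-file rows can be summed and compared with the headline figure. The sketch below assumes that partially covered lines count as not covered, which is what the row totals suggest. The total number of changed lines is not stated in the report, so it is inferred here from the percentage.

```python
# (missing, partial) line counts per file, copied from the table above.
rows = [
    (30, 4),  # trainers/pre_train/train.py
    (8, 1),   # common/checkpointing.py
    (4, 4),   # trainers/diloco/diloco.py
    (3, 2),   # .../train_sft_native.py
    (4, 0),   # checkpoint_conversion/to_maxtext.py
    (3, 1),   # utils/train_utils.py
    (1, 2),   # utils/sharding.py
    (2, 0),   # layers/nnx_wrappers.py
    (1, 1),   # .../generate_param_only_checkpoint.py
    (1, 0),   # checkpoint_conversion/utils/utils.py
]

# Assumption: partials are treated as uncovered in the patch percentage.
uncovered = sum(missing + partial for missing, partial in rows)

reported_pct = 51.02041
# Inferred, not reported: total patch lines implied by the percentage.
total = round(uncovered / (1 - reported_pct / 100))
covered = total - uncovered

print(uncovered, total, covered, round(100 * covered / total, 5))
```

The rows add up to the 72 uncovered lines quoted in the summary, and the inferred total reproduces the reported percentage.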


hsuan-lun-chiang force-pushed the fix/NNX-Sharding branch from 9941994 to 4229e4f · June 18, 2026 02:55

ecnal-cienet and others added 2 commits June 18, 2026 02:59

Fix NNX parameter sharding bug

14a049e

hsuan-lun-chiang force-pushed the fix/NNX-Sharding branch from 4229e4f to 14a049e · June 18, 2026 02:59

hsuan-lun-chiang and others added 3 commits June 18, 2026 07:19

Fix TestNNXAbstractState

68194b9

Fix NNX parameter sharding bug (complete fix)

85ba4bf

Fix TrainCompile Unit test

fc53640

hsuan-lun-chiang wants to merge 5 commits into AI-Hypercomputer:main from CIeNET-International:fix/NNX-Sharding

2 participants
