12 Jun 22:17

SurbhiJainUSC

3f36aef

maxtext-v0.2.3 Latest

Changes

Upgraded JAX to version 0.10.0 for pre-training and 0.10.1 for post-training.
New vLLM-Powered Evaluation Framework: Introduced an eval framework for running lm-eval, evalchemy, and custom benchmarking against MaxText checkpoints. See the evaluation guide for details.
Added support for pre-training new models:
- Qwen3.5: Qwen3.5 35B and 397B are now supported.
- Qwen3-Omni: Support for multimodal SFT (PR #3863).
Direct Preference Optimization (DPO/ORPO) Support: Full support for DPO and ORPO alignment pipelines. See the DPO tutorial for details.
Reinforcement Learning (RL) Recipe: Added a pre-configured RL recipe for Qwen3-30b-a3b.
Iterative Quality Monitoring (RL): Added intermediate evaluation hooks to automatically run quality benchmarks during RL training (every eval_interval steps), optimized with a new eval_batch_size configuration knob.
Developer Extensibility: Added dataset_processor_path CLI knob for custom dataset integration, and refactored shared post-training hooks to simplify custom SFT, DPO, and RL workflow development.
Generalized Learn-to-Init (LTI) for Distillation: Enhanced post-training distillation capabilities with generalized LTI support.
Added support for recording elastic goodput events during training to track efficiency (PR #3901).
Installation Updates: Updated the [tpu-post-train] installation command to require UV_TORCH_BACKEND=cpu (see the Installation Guide).
Zero1 AOT Compilation: Added zero1 support to Ahead-Of-Time (AOT) compilation in train compile, improving compilation capabilities for zero1 config.
MoE Performance Optimization: Integrated ragged gather reduce into Mixture of Experts (MoE) layers to optimize memory and performance by replacing ragged scatter and supporting backward pass.
Added E2E scripts to run checkpoint conversion, pre-training, and post-training (SFT, RL) with the Gemma3-4B model.
Bug Fixes and Usability Enhancements:
- Attention Masking Fix in RL: Fixed an issue in TunixMaxTextAdapter where queries at non-pad positions could attend to pad-position keys during training, which was corrupting log-probabilities and affecting GRPO training reward trajectories (PR #4016).
- JAX/NNX Gradient Mutation Fix: Refactored post-training loops (train_distill, train_sft, train_rl) to use jax.value_and_grad with explicit NNX state split/merge instead of nesting nnx.value_and_grad inside nnx.jit (PR #3652).
- Qwen3-MoE Checkpoint Conversion: Fixed checkpoint conversion issues for Qwen3-MoE models (PR #3868).
- Duplicate Configuration Failures Fix: Allowed identical config overrides and handled configuration exceptions cleanly (PR #3933).
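The attention masking fix listed above concerns which keys a query is allowed to see. As a minimal sketch only (this is not the TunixMaxTextAdapter code; the function name `build_attention_mask` and the list-based mask layout are assumptions made for illustration), a causal mask that also hides pad-position keys can be built like this:

```python
def build_attention_mask(is_pad):
    """Return mask[q][k] == True where query q may attend to key k.

    is_pad[i] is True when position i holds a padding token.
    Hypothetical helper for illustration; not part of MaxText or Tunix.
    """
    n = len(is_pad)
    mask = []
    for q in range(n):
        row = []
        for k in range(n):
            causal = k <= q              # a query never looks at future keys
            key_is_real = not is_pad[k]  # a query never looks at pad keys
            row.append(causal and key_is_real)
        mask.append(row)
    return mask


# Left-padded sequence: position 0 is padding, positions 1..3 are real tokens.
is_pad = [True, False, False, False]
mask = build_attention_mask(is_pad)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

With a causal-only mask, the real-token queries in this example would also see the pad key at position 0, which is the kind of leak the fix describes. Note that the pad-position query row ends up fully masked here; a real implementation has to handle such rows so the softmax does not produce NaNs.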
Documentation Improvements: Updated the Getting Started guide and added a new evaluation framework guide and a DPO tutorial.
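For readers new to DPO, mentioned in the changes above, the per-example objective from the original DPO formulation can be sketched in a few lines. This is a plain-Python illustration under stated assumptions, not the MaxText trainer: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are all hypothetical, and the inputs are assumed to be summed sequence log-probabilities from the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio))).

    Illustrative only; names and the default beta are assumptions.
    """
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Raising the chosen response's log-probability lowers the loss.
print(dpo_loss(-10.0, -15.0, -12.0, -15.0))
```

The loss only depends on how the policy's preference between the two responses differs from the reference model's preference, scaled by `beta`.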

Deprecations

Deleted the legacy DPO implementation in favor of the integrated DPO trainer.
Removed the stack trace collection feature.

08 May 18:03

SurbhiJainUSC

maxtext-v0.2.2

348355c

Changes

Upgraded JAX to version 0.9.2, improving support for both pre-training and post-training.
Introduced simplified APIs for accessing MaxText models.
Included maxtext_with_gepa.ipynb, a new notebook demonstrating AIME prompt optimization using the GEPA framework within MaxText.
Added support for Kimi-K2 models and the MuonClip optimizer. Users can explore this with the kimi-k2-1t config (see user guide for details).
Kimi-K2-Thinking, Kimi-K2.5 (text), and Kimi-K2.6 (text) are now supported. See Run_Kimi.md for details.
DeepSeek-V3.2 is now supported, including DeepSeek Sparse Attention for handling long contexts. Use the deepseek3.2-671b config to try it out (refer to the user guide for more information).
Support has been added for Gemma 4 multi-modal models (26B MoE and 31B dense). These can be used with the gemma4-26b and gemma4-31b configs. See Run_Gemma4.md for further details.
Support has been added for Gemma 4 inference using the MaxText on vLLM plugin.
Enhanced RL capabilities with support for the open-r1/OpenR1-Math-220k dataset and nvidia/OpenMathReasoning.
Added more evaluation modes for RL, such as majority voting and pass@1 estimation.
Weights are now synced to vLLM before pre-RL evaluation.
Made the usage of math-verify in RL more robust.
MaxText's Supervised Fine-Tuning (SFT) now supports non-instruct models.
Added support for tensor parallelism using the Fused MoE kernel for MaxText on vLLM inference.
Added MaxText-to-vLLM converters for the Qwen3 and Gemma4 families of models.
validate_converter.py now runs in multislice environments to test larger models, and includes utilities to compare MaxText and vLLM weights.
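The majority voting and pass@1 evaluation modes mentioned in the changes above can be illustrated with a small sketch. The release notes do not say how MaxText computes them, so the helpers below (`majority_vote`, `pass_at_1`) are hypothetical and follow the common definitions: majority voting picks the most frequent final answer among several samples for one prompt, and pass@1 is estimated as the fraction of sampled answers that are correct.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among the sampled answers.

    Ties go to the answer that appeared first. Hypothetical helper.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def pass_at_1(is_correct):
    """Estimate pass@1 as the fraction of samples that are correct.

    is_correct holds one boolean per sampled answer. Hypothetical helper.
    """
    if not is_correct:
        raise ValueError("need at least one sampled answer")
    return sum(is_correct) / len(is_correct)


# Four sampled answers to one math prompt whose reference answer is "42".
samples = ["42", "41", "42", "7"]
reference = "42"
print(majority_vote(samples))
print(pass_at_1([s == reference for s in samples]))
```

The two metrics can disagree in a useful way: a prompt can have a low pass@1 estimate and still be scored correct under majority voting, as long as the right answer is the single most common one.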

Deprecations

Legacy MaxText.* shims have been removed. Please refer to src/MaxText/README.md for details on the new command locations and how to migrate.
Sequence parallelism has been deprecated; please use context parallelism instead.
The flag expert_shard_attention_option is deprecated; use custom_mesh_and_rule=ep-as-cp for the same functionality.

23 Mar 22:05

SurbhiJainUSC

maxtext-v0.2.1

61fa4f3

Use the new maxtext[runner] installation option to build Docker images without cloning the repository. This can be used for scheduling jobs through XPK. See the MaxText installation instructions for more info.
Config can now be inferred for most MaxText commands. If you choose not to provide a config, MaxText will now select an appropriate one.
Configs in MaxText PyPI will now be picked up without storing them locally.
New features from DeepSeek-AI are now supported: Conditional Memory via Scalable Lookup (Engram) and Manifold-Constrained Hyper-Connections (mHC). Try them out with our deepseek-custom starter config.
MaxText now supports customizing your own mesh and logical rules. Two examples showing how to use your own mesh and rules for sharding are provided in the custom_mesh_and_rule directory.

06 Mar 07:15

bvandermoon

maxtext-v0.2.0

77edafe

Changes

Qwen3-Next is now supported.
New tpu-post-train target in PyPI. Please also use this installation option for running vllm_decode. See the MaxText installation instructions for more info.
New MaxText structure! MaxText has been restructured according to RESTRUCTURE.md. Please feel free to share your thoughts and feedback.
Muon optimizer is now supported.
DeepSeek V3.1 is now supported. Use the existing DeepSeek V3 671B configs and load a V3.1 checkpoint to use the model.
New RL and SFT Notebook tutorials are available.
The ReadTheDocs documentation site has been reorganized.
Multi-host support for GSPO and GRPO is now available via new RL tutorials.
A new guide, What is Post Training in MaxText?, is now available.
Ironwood TPU co-designed AI stack announced. Read the blog post on its co-design with MaxText.
Optimized models tiering documentation has been refreshed.
Added Versioning. Check out our first set of release notes!
Post-Training (SFT, RL) via Tunix is now available.
Vocabulary tiling (PR) is now supported in MaxText! Adjust the num_vocab_tiling config to unlock more efficient memory usage.
The GPT-OSS family of models (20B, 120B) is now supported.

Deprecations

Many MaxText modules have changed locations. Core commands like train, decode, sft, etc. will still work as expected temporarily. Please update your commands to the latest file locations.
The install_maxtext_github_deps installation script has been replaced with install_maxtext_tpu_github_deps.
tools/setup/setup_post_training_requirements.sh for post-training dependency installation is deprecated in favor of pip installation.

30 Dec 21:33

SurbhiJainUSC

maxtext-tutorial-v1.5.0

998b3e3

Merge pull request #2898 from AI-Hypercomputer:tests_docker_image

PiperOrigin-RevId: 850456883

12 Dec 19:49

RissyRan

maxtext-tutorial-v1.4.0

79eecc9

20 Nov 07:19

RissyRan

maxtext-tutorial-v1.3.0

05abc90

Merge pull request #2706 from AI-Hypercomputer:mohit/tokamax_quant_gmm

PiperOrigin-RevId: 834605168

14 Nov 21:00

shralex

maxtext-tutorial-v1.2.0

3c0fe16

maxtext-tutorial-v1.2.0: Merge pull request #2676 from AI-Hypercomputer:pypi_release

PiperOrigin-RevId: 832378885

25 Oct 03:54

Obliviour

maxtext-tutorial-v1.1.0

a8499dd

Recipe Branch for TPU performance results

Merge pull request #2539 from AI-Hypercomputer:qinwen/latest-tokamax

PiperOrigin-RevId: 823749360

24 Oct 01:25

shralex

maxtext-tutorial-v1.0.0

5b01873

Merge pull request #2538 from AI-Hypercomputer:mohit/fix_docker

PiperOrigin-RevId: 822796389

Releases: AI-Hypercomputer/maxtext
