Merged
Changes from 2 commits
README.md (3 additions, 1 deletion)

@@ -12,6 +12,8 @@

## 📣 News

- [04/12/2026] [**MiniMax-M2.5 / M2.7**](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/models/minimax_m2) are now supported! Both models share the same architecture as MiniMax-M2 and work with the existing bridge out of the box — checkpoint conversion and inference verified on real FP8 checkpoints.

- [04/09/2026] [**Bailing MoE V2**](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/models/bailing) is now supported! Checkpoint conversion and inference for the Bailing MoE V2 model are available on **main**. Thank you to [@ccclyu](https://github.com/ccclyu) for the community contribution!

- [04/07/2026] Megatron Bridge’s PEFT support was featured in a [talk at PyTorch Conference Europe 2026](https://pytorchconferenceeu2026.sched.com/event/2Juce/optimizing-reinforcement-learning-at-trillion-parameter-scale-songlin-jiang-aalto-university-mind-lab).

@@ -179,7 +181,7 @@ Megatron Bridge provides out-of-the-box bridges and training recipes for a wide
- [Mamba](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/mamba)
- [Ministral](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/ministral3) — [recipes (3B/8B/14B)](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/ministral3/ministral3.py)
- [Mistral](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/mistral)
- - [MiniMax-M2](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/minimax_m2)
+ - [MiniMax-M2 / M2.5 / M2.7](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/minimax_m2)
- [Moonlight](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/deepseek) — [recipes (16B)](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/moonlight/moonlight_16b.py)
- [OlMoE](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/olmoe) — [recipes (7B)](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/olmoe/olmoe_7b.py)
- [Qwen2 / Qwen2.5](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/qwen) — [recipes](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen/qwen2.py)
docs/models/llm/README.md (1 addition)

@@ -14,6 +14,7 @@ Megatron Bridge supports the following LLM families:
| **Gemma 3** | [gemma3.md](gemma3.md) | Google Gemma 3 models |
| **GLM-4.5** | [glm45.md](glm45.md) | GLM-4.5 model family |
| **GPT-OSS** | [gpt-oss.md](gpt-oss.md) | Open-source GPT-style models |
| **LLaMA 3** | [llama3.md](llama3.md) | Meta LLaMA 3 models |
| **LLaMA Nemotron** | [llama-nemotron.md](llama-nemotron.md) | NVIDIA LLaMA Nemotron models |
| **MiniMax-M2** | — | MiniMax-M2 / M2.5 / M2.7 (456B MoE, FP8) |
| **Mistral** | [mistral.md](mistral.md) | Mistral AI models |
examples/models/minimax_m2/README.md (2 additions)

@@ -2,6 +2,8 @@

This directory contains example scripts for [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2), a large sparse MoE model with 456B total parameters (45.9B active), 256 experts, and FP8 quantization.
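For intuition, the relationship between total and active parameters in a top-k MoE can be sketched with back-of-envelope arithmetic. The ~33B shared-weight figure below is an assumption chosen for illustration (it is not the model's published parameter breakdown); with 256 experts and top-8 routing, only about 1/32 of the expert weights fire per token:

```python
def active_params(total, shared, num_experts=256, top_k=8):
    """Back-of-envelope active-parameter estimate for a top-k MoE (illustrative only)."""
    expert = total - shared  # parameters that live inside the experts
    # Only top_k of num_experts experts run per token, plus all shared weights.
    return shared + expert * top_k / num_experts

# Assuming ~33B of shared (attention/embedding) weights out of 456B total,
# the estimate lands close to the reported ~45.9B active parameters.
estimate = active_params(456e9, 33e9)
```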

> **M2.5 / M2.7 compatibility:** [MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) and [MiniMax-M2.7](https://huggingface.co/MiniMaxAI/MiniMax-M2.7) share the same architecture (`MiniMaxM2ForCausalLM`) and work with the same bridge. Replace the model ID in the scripts below (e.g. `MiniMaxAI/MiniMax-M2.5`).

## Hardware Requirements

MiniMax-M2 requires **at least 2 nodes (16 GPUs)** for inference and conversion. The model cannot fit on a single 8-GPU node because:
src/megatron/bridge/models/minimax_m2/minimax_m2_bridge.py (3 additions)

@@ -98,6 +98,9 @@ class MiniMaxM2Bridge(MegatronModelBridge):
"""
Megatron Bridge for MiniMax-M2 MoE Causal LM.

Also supports MiniMax-M2.5 and MiniMax-M2.7, which share the same
``model_type`` (``minimax_m2``) and ``MiniMaxM2ForCausalLM`` architecture.

MiniMax-M2 is a sparse MoE model (256 experts, top-8 routing with sigmoid
scoring and expert bias correction). Use the native transformers >= 5.0
implementation (no ``trust_remote_code`` required).