Merged
4 changes: 3 additions & 1 deletion README.md
@@ -12,6 +12,8 @@

## 📣 News

- [04/12/2026] [**MiniMax-M2.5 / M2.7**](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/models/minimax_m2) are now supported! Both models share the same architecture as MiniMax-M2 and work with the existing bridge out of the box — checkpoint conversion and inference verified on real FP8 checkpoints.

- [04/10/2026] [**Qwen3-ASR**](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/models/audio_lm/qwen3_asr) is now supported! Checkpoint conversion and inference for [Qwen3's ASR model](https://github.com/QwenLM/Qwen3-ASR) are available on **main**.

- [04/09/2026] [**Bailing MoE V2**](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/models/bailing) is now supported! Checkpoint conversion and inference for the Bailing MoE V2 model are available on **main**. Thank you to [@ccclyu](https://github.com/ccclyu) for the community contribution!
@@ -181,7 +183,7 @@ Megatron Bridge provides out-of-the-box bridges and training recipes for a wide
- [Mamba](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/mamba)
- [Ministral](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/ministral3) — [recipes (3B/8B/14B)](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/ministral3/ministral3.py)
- [Mistral](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/mistral)
- [MiniMax-M2](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/minimax_m2)
- [MiniMax-M2 / M2.5 / M2.7](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/minimax_m2)
- [Moonlight](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/deepseek) — [recipes (16B)](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/moonlight/moonlight_16b.py)
- [OlMoE](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/olmoe) — [recipes (7B)](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/olmoe/olmoe_7b.py)
- [Qwen2 / Qwen2.5](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/qwen) — [recipes](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen/qwen2.py)
1 change: 1 addition & 0 deletions docs/models/llm/README.md
@@ -14,6 +14,7 @@ Megatron Bridge supports the following LLM families:
| **Gemma 3** | [gemma3.md](gemma3.md) | Google Gemma 3 models |
| **GLM-4.5** | [glm45.md](glm45.md) | GLM-4.5 model family |
| **GPT-OSS** | [gpt-oss.md](gpt-oss.md) | Open-source GPT-style models |
| **MiniMax-M2** | — | MiniMax-M2 / M2.5 / M2.7 (456B MoE, FP8) |
| **LLaMA 3** | [llama3.md](llama3.md) | Meta LLaMA 3 models |
| **LLaMA Nemotron** | [llama-nemotron.md](llama-nemotron.md) | NVIDIA LLaMA Nemotron models |
| **Mistral** | [mistral.md](mistral.md) | Mistral AI models |
2 changes: 2 additions & 0 deletions examples/models/minimax_m2/README.md
@@ -2,6 +2,8 @@

This directory contains example scripts for [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2), a large sparse MoE model with 456B total parameters (45.9B active), 256 experts, and FP8 quantization.

> **M2.5 / M2.7 compatibility:** [MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) and [MiniMax-M2.7](https://huggingface.co/MiniMaxAI/MiniMax-M2.7) share the same architecture (`MiniMaxM2ForCausalLM`) and work with the same bridge. Replace the model ID in the scripts below (e.g. `MiniMaxAI/MiniMax-M2.5`).

## Hardware Requirements

MiniMax-M2 requires **at least 2 nodes (16 GPUs)** for inference and conversion. The model cannot fit on a single 8-GPU node because:
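The two-node requirement above can be sanity-checked with back-of-envelope arithmetic. This is an illustrative sketch only: the 456B parameter count and FP8 storage width come from this README, while the 80 GB-per-GPU figure is an assumption.

```python
# Rough memory estimate for MiniMax-M2 inference (illustrative assumptions).
total_params = 456e9                  # total parameters, experts included
weight_gb = total_params * 1 / 1e9    # FP8 stores 1 byte per weight -> ~456 GB

single_node_gb = 8 * 80               # one node: eight 80 GB GPUs = 640 GB
two_node_gb = 16 * 80                 # two nodes: sixteen GPUs = 1280 GB

# Weights alone fill most of a single node, leaving too little headroom for
# activations, KV cache, and communication buffers; two nodes leave ample room.
headroom_one_node = single_node_gb - weight_gb    # ~184 GB
headroom_two_nodes = two_node_gb - weight_gb      # ~824 GB
```

The exact runtime overhead depends on sequence length, batch size, and parallelism layout, but the weight footprint alone makes the single-node margin impractically thin.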
3 changes: 3 additions & 0 deletions src/megatron/bridge/models/minimax_m2/minimax_m2_bridge.py
@@ -98,6 +98,9 @@ class MiniMaxM2Bridge(MegatronModelBridge):
"""
Megatron Bridge for MiniMax-M2 MoE Causal LM.

Also supports MiniMax-M2.5 and MiniMax-M2.7, which share the same
``model_type`` (``minimax_m2``) and ``MiniMaxM2ForCausalLM`` architecture.

MiniMax-M2 is a sparse MoE model (256 experts, top-8 routing with sigmoid
scoring and expert bias correction). Use the native transformers >= 5.0
implementation (no ``trust_remote_code`` required).
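The routing scheme named in the docstring (top-8 selection over sigmoid scores with an expert-bias correction) can be sketched as follows. This is a hypothetical illustration of the technique, not the bridge's actual implementation; the function name and tensor shapes are assumptions.

```python
import torch


def sigmoid_top8_route(router_logits: torch.Tensor,
                       expert_bias: torch.Tensor,
                       k: int = 8):
    """Sketch of sigmoid-scored top-k MoE routing with expert bias correction.

    router_logits: [num_tokens, num_experts] raw router outputs.
    expert_bias:   [num_experts] learned/updated bias for load balancing.
    """
    # Sigmoid scoring: each expert gets an independent affinity in (0, 1),
    # rather than competing in a softmax over all experts.
    scores = torch.sigmoid(router_logits)

    # Bias correction: the bias steers which experts are *selected* (helping
    # balance load across 256 experts) but is excluded from combine weights.
    _, topk_idx = torch.topk(scores + expert_bias, k, dim=-1)
    topk_scores = torch.gather(scores, -1, topk_idx)

    # Renormalize the selected scores so each token's weights sum to 1.
    topk_weights = topk_scores / topk_scores.sum(dim=-1, keepdim=True)
    return topk_idx, topk_weights
```

With 256 experts and `k=8`, each token dispatches to 8 experts, matching the 45.9B-active-of-456B sparsity pattern described in the example README.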