[AMD] Add MiniMax-M3-MXFP4 MI355X vLLM disagg recipe by Duyi-Wang · Pull Request #1912 · SemiAnalysisAI/InferenceX

Duyi-Wang · 2026-06-24T05:39:16Z

Disaggregated (prefill/decode) vLLM recipe for amd/MiniMax-M3-MXFP4 on MI355X over the MoRI-IO KV connector.

Recipe

benchmarks/multi_node/minimaxm3_fp4_mi355x_vllm-disagg.sh: launcher.
models_vllm.yaml: MiniMax-M3-MXFP4 entry. --block-size 128 (MSA), TRITON_ATTN, --language-model-only, AITER MoE, minimax_m3 parsers, --max-num-seqs 512.
amd-master.yaml: minimaxm3-fp4-mi355x-vllm-disagg config, 8k1k, two TP4 layouts (1P1D and 2P1D), conc 1..512.
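As a rough illustration of how the engine flags listed above might combine on a single TP4 worker, here is a minimal sketch. It uses only the flags named in the description plus vLLM's standard `--tensor-parallel-size`. The attention backend, AITER MoE, and parser settings are left out because the description does not give their exact spelling, and the use of `vllm serve` as the entry point is an assumption rather than something the recipe confirms.

```bash
#!/usr/bin/env bash
# Sketch only: collects the engine flags named in the recipe description.
MODEL="amd/MiniMax-M3-MXFP4"
TP_SIZE=4   # both layouts (1P1D and 2P1D) use TP4 workers

VLLM_ARGS=(
  --tensor-parallel-size "$TP_SIZE"
  --block-size 128          # annotated "(MSA)" in the description
  --max-num-seqs 512
  --language-model-only
)

# Print rather than launch, so the sketch runs without vLLM installed.
echo vllm serve "$MODEL" "${VLLM_ARGS[@]}"
```

Keeping the flags in an array avoids quoting problems if a later flag takes a value containing spaces.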

Supporting fixes to the shared vllm-disagg path

server_vllm.sh: count prefill/decode GPUs from the per-worker TP size (PREFILL_TP_SIZE*xP / DECODE_TP_SIZE*yD) instead of GPUS_PER_NODE*xP. With TP < node GPU count (e.g. TP4 on an 8-GPU node) the old expression over-counted, corrupting PREFILL_GPUS/DECODE_GPUS and halving tput_per_gpu.
env.sh / job.slurm: set the MoRI-IO RDMA QP knobs (MORI_IO_QP_MAX_SEND_WR etc.) for the vllm-disagg path. They were only set in the SGLang branch, so vllm-disagg ran at the default send-queue depth and stalled at high concurrency ("SQ full"). Injected via docker -e so they reach the vLLM worker processes.
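The `docker -e` injection can be sketched as below. This is a minimal illustration, not the code in job.slurm: the helper name `build_docker_env_args`, the image name, and the value 4096 are all hypothetical, and only `MORI_IO_QP_MAX_SEND_WR` is listed because it is the only knob the description names.

```bash
#!/usr/bin/env bash
# Sketch: forward selected host environment variables into a container
# by turning each one into a `docker run -e NAME=VALUE` argument.

# Only MORI_IO_QP_MAX_SEND_WR is named in the PR description; extend this
# list with the other QP knobs the job actually sets.
MORI_ENV_VARS=(MORI_IO_QP_MAX_SEND_WR)

# Populates the global array DOCKER_ENV_ARGS, skipping unset or empty variables.
build_docker_env_args() {
  DOCKER_ENV_ARGS=()
  local name value
  for name in "$@"; do
    value="${!name:-}"          # indirect expansion: value of the variable named $name
    if [[ -n "$value" ]]; then
      DOCKER_ENV_ARGS+=(-e "${name}=${value}")
    fi
  done
}

export MORI_IO_QP_MAX_SEND_WR=4096   # placeholder value, not taken from the PR
build_docker_env_args "${MORI_ENV_VARS[@]}"

# Print the command instead of running it; the image name is hypothetical.
echo docker run "${DOCKER_ENV_ARGS[@]}" example/vllm-rocm:latest
```

Passing the variables on the `docker run` command line makes them part of the container's environment, so every process started inside it, including the vLLM workers, inherits them.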

Disaggregated (prefill/decode) vLLM recipe for amd/MiniMax-M3-MXFP4 on MI355X over the MoRI-IO KV connector.

Recipe:
- benchmarks/multi_node/minimaxm3_fp4_mi355x_vllm-disagg.sh: launcher.
- models_vllm.yaml: MiniMax-M3-MXFP4 entry. block-size 128 (MSA), TRITON_ATTN, --language-model-only, AITER MoE, minimax_m3 parsers. No --kv-cache-dtype fp8 (the checkpoint ships no calibrated FP8 KV scales).
- amd-master.yaml: minimaxm3-fp4-mi355x-vllm-disagg config, 8k1k, two layouts (1P1D TP4 and 2P1D TP4), conc 1..512.

Supporting fixes to the shared vllm-disagg path:
- server_vllm.sh: count prefill/decode GPUs from the per-worker TP size (PREFILL_TP_SIZE*xP / DECODE_TP_SIZE*yD) instead of GPUS_PER_NODE*xP. With TP < node GPU count (e.g. TP4 on an 8-GPU node) the old expression over-counted, corrupting PREFILL_GPUS/DECODE_GPUS and halving tput_per_gpu.
- env.sh / job.slurm: set the MoRI-IO RDMA QP knobs (MORI_IO_QP_MAX_SEND_WR etc.) for the vllm-disagg path. They were only set in the SGLang branch, so vllm-disagg ran at the default send-queue depth and stalled at high concurrency ("SQ full").
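The GPU accounting fix in server_vllm.sh can be illustrated with a small sketch. The variable names follow the description; the two helper functions are hypothetical and exist only to put the old and new expressions side by side. The numbers assume the 1P1D TP4 layout on 8-GPU nodes.

```bash
#!/usr/bin/env bash
# Sketch of the GPU accounting; function names are hypothetical.

# Old accounting: assumes every worker occupies a full node.
count_gpus_old() {
  local gpus_per_node=$1 workers=$2
  echo $(( gpus_per_node * workers ))
}

# New accounting: each worker occupies exactly its tensor-parallel size.
count_gpus_by_tp() {
  local tp_size=$1 workers=$2
  echo $(( tp_size * workers ))
}

GPUS_PER_NODE=8
PREFILL_TP_SIZE=4
DECODE_TP_SIZE=4
xP=1   # number of prefill workers (1P1D layout)
yD=1   # number of decode workers

PREFILL_GPUS=$(count_gpus_by_tp "$PREFILL_TP_SIZE" "$xP")
DECODE_GPUS=$(count_gpus_by_tp "$DECODE_TP_SIZE" "$yD")
OLD_PREFILL_GPUS=$(count_gpus_old "$GPUS_PER_NODE" "$xP")

echo "prefill=$PREFILL_GPUS decode=$DECODE_GPUS old_prefill=$OLD_PREFILL_GPUS"
# prints: prefill=4 decode=4 old_prefill=8
```

With TP4 on an 8-GPU node the old expression reports twice as many GPUs as are actually in use, and dividing throughput by that inflated count gives half the true per-GPU figure. For the 2P1D layout the TP-based count gives 8 prefill GPUs and 4 decode GPUs.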

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Duyi-Wang · 2026-06-24T06:22:50Z

Superseded by #1914 (re-opened from a branch in this repo instead of the fork).

Duyi-Wang requested a review from a team June 24, 2026 05:39

Duyi-Wang requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 24, 2026 05:39

github-project-automation Bot added this to InferenceMAX Board Jun 24, 2026

claude Bot reviewed Jun 24, 2026

Merge branch 'main' into feat/minimaxm3-fp4-mi355x-vllm-disagg

685083e

Duyi-Wang closed this Jun 24, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 24, 2026

Duyi-Wang deleted the feat/minimaxm3-fp4-mi355x-vllm-disagg branch June 24, 2026 06:24

Duyi-Wang wants to merge 2 commits into SemiAnalysisAI:main from Duyi-Wang:feat/minimaxm3-fp4-mi355x-vllm-disagg


2 participants
