[AMD] Add MiniMax-M3-FP4 MI355X ATOMESH update 0623 by seungrokj · Pull Request #1930 · SemiAnalysisAI/InferenceX

seungrokj · 2026-06-25T05:44:27Z

Summary

Eliminate all hardcoded MODEL_NAME == "DeepSeek-V4-Pro" / per-model checks from server_atom.sh
All model-specific configuration (env vars, parallel flags, MTP flags, KV cache flags, HF overrides) now driven from models_atom.yaml using the same python3 yaml.safe_load pattern as server_vllm.sh
Add MiniMax-M3-MXFP4 and MiniMax-M3-MXFP8 entries to models_atom.yaml with EAGLE3 MTP flags
Image bump for minimaxm3-fp8-mi355x-atom-disagg: rocm/atom-dev:MiniMax-M3-20260622 → rocm/atom-dev:MiniMax-M3-20260623

Fields added to `models_atom.yaml`

| Field | Purpose |
| --- | --- |
| `env` | Space-separated `KEY=VALUE` pairs exported unconditionally |
| `tp_dp_flags` | Parallel flags for TP+DPA mode |
| `tp_dp_env` | Env vars exported only in TP+DPA mode |
| `ep_dp_flags` | Parallel flags for EP+DPA mode |
| `ep_dp_env` | Env vars exported only in EP+DPA mode |
| `mtp_flags` | Flags prepended to `SPEC_ARGS` before `$DECODE_MTP_SIZE` |
| `kv_cache_flags` | Full `--kv_cache_dtype` flag string |
| `hf_overrides` | JSON string passed to `--hf-overrides` |
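
As a rough illustration of how these fields might combine for one model, here is a minimal Python sketch. It is not the logic in `server_atom.sh` (which is bash). The `resolve` helper, the `tp_dp`/`ep_dp` mode strings, the argument ordering, and every value in `EXAMPLE_ENTRY` except `mtp_flags` are hypothetical placeholders chosen for illustration.

```python
import json
import shlex

# Hypothetical entry shaped like one model in models_atom.yaml.
# Only mtp_flags is taken from this PR's discussion; all other values
# are made-up placeholders.
EXAMPLE_ENTRY = {
    "env": "EXAMPLE_FLAG=1 EXAMPLE_MODE=fast",
    "tp_dp_flags": "--example-tp-flag",
    "tp_dp_env": "EXAMPLE_TP_ONLY=1",
    "ep_dp_flags": "--example-ep-flag",
    "ep_dp_env": "EXAMPLE_EP_ONLY=1",
    "mtp_flags": "--method eagle3 --draft-model Inferact/MiniMax-M3-EAGLE3 "
                 "--num-speculative-tokens",
    "kv_cache_flags": "--kv_cache_dtype fp8",
    "hf_overrides": '{"example_key": true}',
}


def parse_env(pairs):
    """Split a space-separated KEY=VALUE string into a dict."""
    return dict(item.split("=", 1) for item in pairs.split())


def resolve(entry, mode, spec_decoding, decode_mtp_size):
    """Return (env, args) for one entry; mode is 'tp_dp' or 'ep_dp'."""
    # Unconditional env first, then the mode-specific env on top.
    env = parse_env(entry.get("env", ""))
    env.update(parse_env(entry.get(f"{mode}_env", "")))

    args = shlex.split(entry.get(f"{mode}_flags", ""))
    if spec_decoding == "mtp":
        # mtp_flags come first, then the speculative token count.
        args += shlex.split(entry.get("mtp_flags", "")) + [str(decode_mtp_size)]
    args += shlex.split(entry.get("kv_cache_flags", ""))

    hf = entry.get("hf_overrides", "")
    if hf:
        json.loads(hf)  # fail early if the JSON string is malformed
        args += ["--hf-overrides", hf]
    return env, args
```

Calling `resolve(EXAMPLE_ENTRY, "tp_dp", "mtp", 2)` returns the merged environment and a flat argument list, with the JSON string kept as a single argument rather than re-parsed by a shell.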

PR Review Checklist

Verified that as of the moment of typing this, this is the latest version of PR_REVIEW_CHECKLIST.md
Verified that the general code quality meets the InferenceX standard and does not make the code quality any worse.
Verified that this PR has passed PR validation. Please link to GitHub Action workflow that shows this.
Verified that this PR passes evals. Please link to GitHub Action workflow that shows this.
Verified that speculative decoding PRs use chat templates to align the AL distribution with real-world usage
If a company claims that they support vLLM/SGLang as first class LLM inference engines on their hardware, I have verified that the respective vLLM/SGLang submission has been made before additional frameworks (TRT-LLM, ATOM, etc.). The only exceptions are for new hardware, such as MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8, etc., and for new model architectures where there is an actual reason why vLLM/SGLang does not fundamentally support them yet.
Verified that the single-node recipes are similar to the official vLLM recipes and/or the SGLang cookbook:
- If they are not, I have verified that a PR has been opened in vLLM recipe repo or SGLang repo and linked it below in the additional detail section:
If any of the above criteria cannot reasonably be satisfied, I have provided additional reasoning below.

🤖 Generated with Claude Code


claude · 2026-06-25T05:56:28Z

+# =============================================================================
+# Model-Specific Configuration from YAML
+# =============================================================================
+# Load model-specific config from YAML (single parse for all fields)
+eval "$(python3 -c "
+import yaml
+with open('${ATOM_WS_PATH}/models_atom.yaml') as f:
+    m = yaml.safe_load(f).get('${MODEL_NAME}', {})
+print(f'MODEL_ENVS=\"{m.get(\"env\", \"\")}\"')
+print(f'MODEL_TP_DP_FLAGS=\"{m.get(\"tp_dp_flags\", \"\")}\"')
+print(f'MODEL_EP_DP_FLAGS=\"{m.get(\"ep_dp_flags\", \"\")}\"')
+print(f'MODEL_TP_DP_ENV=\"{m.get(\"tp_dp_env\", \"\")}\"')
+print(f'MODEL_EP_DP_ENV=\"{m.get(\"ep_dp_env\", \"\")}\"')
+print(f'MODEL_MTP_FLAGS=\"{m.get(\"mtp_flags\", \"\")}\"')
+print(f'MODEL_KV_ARG=\"{m.get(\"kv_cache_flags\", \"\")}\"')


🔴 hf_overrides JSON quotes stripped by bash eval, which breaks DeepSeek-V4-Pro startup. The Python f-string print(f'_HF_OVERRIDES="{m.get("hf_overrides", "")}"') interpolates the raw YAML value ({"use_index_cache":true,"index_topk_freq":4}) into an outer double-quoted bash assignment, so the inner " characters terminate the outer string. After eval, _HF_OVERRIDES holds {use_index_cache:true,index_topk_freq:4}, with unquoted keys, and --hf-overrides '{use_index_cache:true,index_topk_freq:4}' is rejected as invalid JSON. This blocks the documented test plan item "Verify server_atom.sh launches correctly for DeepSeek-V4-Pro". Fix: emit the value with shlex.quote() (or json.dumps) in the Python so the JSON survives the bash eval intact.

Extended reasoning...

What the bug is

The new YAML-driven loader at server_atom.sh:80-94 runs a Python snippet whose output is captured and eval'd as bash. The hf_overrides field for DeepSeek-V4-Pro in models_atom.yaml is the JSON string {"use_index_cache":true,"index_topk_freq":4}. The emitter line is:

print(f'_HF_OVERRIDES="{m.get("hf_overrides", "")}"')

For DeepSeek-V4-Pro this prints literally:

_HF_OVERRIDES="{"use_index_cache":true,"index_topk_freq":4}"

The inner unescaped " characters terminate the outer double-quoted region. Bash treats the line as a concatenation of quoted and unquoted word segments — "{" + use_index_cache + ":true," + index_topk_freq + ":4}" — and assembles them into a single word with all the quotes stripped.
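
The quote stripping described above can be sketched in Python. This is an approximation under a stated assumption: `shlex.split` applies POSIX-style quote removal and stands in for bash's parsing of the eval'd line; it is not bash itself. The `is_valid_json` helper is introduced here only for illustration.

```python
import json
import shlex

# The raw hf_overrides value as stored in the YAML.
raw = '{"use_index_cache":true,"index_topk_freq":4}'

# What the unquoted emitter prints: the JSON placed inside outer double quotes.
emitted = f'_HF_OVERRIDES="{raw}"'

# POSIX-style parsing joins the quoted and unquoted segments into one word
# and removes every quote character along the way.
(word,) = shlex.split(emitted)
name, _, value = word.partition("=")


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(value)
print(is_valid_json(raw), is_valid_json(value))
```

The original string parses as JSON, while the value left after quote removal does not, which matches the failure mode the review describes.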

Step-by-step proof

Reproduced locally with the exact Python emitter and bash eval:

$ python3 -c 'm={"hf_overrides": "{\"use_index_cache\":true,\"index_topk_freq\":4}"}; print(f"_HF_OVERRIDES=\"{m.get(\"hf_overrides\", \"\")}\"")'
_HF_OVERRIDES="{"use_index_cache":true,"index_topk_freq":4}"
$ eval '_HF_OVERRIDES="{"use_index_cache":true,"index_topk_freq":4}"' && echo "[$_HF_OVERRIDES]"
[{use_index_cache:true,index_topk_freq:4}]
$ python3 -c 'import json; json.loads("{use_index_cache:true,index_topk_freq:4}")'
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

After the eval, the downstream line

HF_OVERRIDES_ARG="--hf-overrides '${_HF_OVERRIDES}'"

produces --hf-overrides '{use_index_cache:true,index_topk_freq:4}' — an invalid JSON literal with unquoted keys. The atom server's argparse / json.loads on --hf-overrides will reject this at startup.

Why existing code doesn't prevent it

The pre-PR code hard-coded the value as a bash string with backslash-escaped inner quotes:
HF_OVERRIDES_ARG="--hf-overrides '{\"use_index_cache\":true,\"index_topk_freq\":4}'"
That escaping is exactly what survives bash parsing, and it is what the new YAML-driven path loses. DeepSeek-V4-Pro is the only model in models_atom.yaml with a non-empty hf_overrides (the other models' YAML fields contain no " characters, so they are unaffected); the other emitted assignments (env, tp_dp_flags, etc.) are safe.

Impact

This regresses the dsv4-fp4-mi355x-atom-disagg recipe in amd-master.yaml (which sets MODEL_NAME=DeepSeek-V4-Pro and routes through server_atom.sh). The server will fail at startup when atom's argparse calls json.loads on the --hf-overrides argument — and this is precisely the path the PR's own test plan flags (Verify server_atom.sh launches correctly for DeepSeek-V4-Pro).

Fix

Quote the value in the Python emitter so the bash eval sees a properly-escaped literal. Either:

import shlex
print(f'_HF_OVERRIDES={shlex.quote(m.get("hf_overrides", ""))}')

(produces _HF_OVERRIDES='{"use_index_cache":true,"index_topk_freq":4}', which bash parses correctly), or write each value to a NUL-delimited side channel that bash reads with read -d '' instead of evaling arbitrary Python output.
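
A minimal round-trip sketch of the `shlex.quote()` option follows. The helper names `emit_assignment` and `parse_assignment` are hypothetical, and `shlex.split` is again used only as an approximation of how the shell would read the emitted line.

```python
import json
import shlex


def emit_assignment(name, value):
    """Return a shell assignment whose value is quoted for safe evaluation."""
    return f"{name}={shlex.quote(value)}"


def parse_assignment(line):
    """Approximate the value a POSIX shell would store for the line."""
    (word,) = shlex.split(line)
    return word.partition("=")[2]


hf = '{"use_index_cache":true,"index_topk_freq":4}'
line = emit_assignment("_HF_OVERRIDES", hf)

print(line)
# The value read back is byte-for-byte the original JSON string.
print(json.loads(parse_assignment(line)))
```

Because `shlex.quote` wraps the value in single quotes and escapes any embedded single quote, the same emitter would also cover the other fields without special cases.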

claude · 2026-06-25T05:56:28Z

 MTP      : method=mtp num_speculative_tokens=${DECODE_MTP_SIZE}
 xP/yD    : ${xP} / ${yD}
-KV cache : dtype=${KV_CACHE_DTYPE:-auto} block_size=${BLOCK_SIZE} mem_frac=${MEM_FRAC_STATIC}
+KV cache : ${KV_CACHE_ARG:-none} block_size=${BLOCK_SIZE} mem_frac=${MEM_FRAC_STATIC}


🟡 Nit: line 193's INFO banner prints the literal string MTP : method=mtp num_speculative_tokens=${DECODE_MTP_SIZE}, but mtp_flags is now YAML-driven and MiniMax-M3-MXFP4/MXFP8 use --method eagle3 --draft-model Inferact/MiniMax-M3-EAGLE3. When those models run with SPEC_DECODING=mtp, the banner will misleadingly claim method=mtp. Pure log/cosmetic — the Spec args : ${SPEC_ARGS[*]} line immediately below prints the actual flags. Suggest dropping the hardcoded method=mtp (the Spec args line already covers it) or replacing with ${MODEL_MTP_FLAGS}.

Extended reasoning...

What

In benchmarks/multi_node/amd_utils/server_atom.sh the === Configuration === heredoc still contains a hardcoded line:

MTP : method=mtp num_speculative_tokens=${DECODE_MTP_SIZE}

This line predates the YAML-driven refactor in this PR. With the new mtp_flags field, MODEL_MTP_FLAGS can be anything; for the two new MiniMax-M3 entries it is --method eagle3 --draft-model Inferact/MiniMax-M3-EAGLE3 --num-speculative-tokens. So the banner can advertise method=mtp even when the server is actually being launched with EAGLE3 flags.

Step-by-step proof

1. Set MODEL_NAME=MiniMax-M3-MXFP8, SPEC_DECODING=mtp, DECODE_MTP_SIZE=2.
2. The YAML block (models_atom.yaml:34-39) supplies mtp_flags: --method eagle3 --draft-model Inferact/MiniMax-M3-EAGLE3 --num-speculative-tokens, so MODEL_MTP_FLAGS is the EAGLE3 string.
3. server_atom.sh:174-176 builds SPEC_ARGS=(${MODEL_MTP_FLAGS} "$DECODE_MTP_SIZE") → the actual server is launched with --method eagle3 --draft-model … --num-speculative-tokens 2.
4. server_atom.sh:193 still prints MTP : method=mtp num_speculative_tokens=2, which is factually wrong about the method.
5. server_atom.sh:199 immediately below prints Spec args : --method eagle3 --draft-model Inferact/MiniMax-M3-EAGLE3 --num-speculative-tokens 2, which is correct.

Why this doesn't break anything

The SPEC_ARGS array (the actual flags passed to python3 -m atom.entrypoints.openai_server) is built correctly from MODEL_MTP_FLAGS; the bug is exclusively in the banner. The Spec args line two lines below shows the truth, so an operator inspecting the log can see the real flags. Also note: the new minimaxm3-fp8-mi355x-atom-disagg recipe sets SPEC_DECODING=none and DECODE_MTP_SIZE=0, so the misleading line is not exercised by anything this PR enables today; it's a latent issue for whenever a MiniMax-M3 atom-disagg recipe is added with MTP on.

How to fix

Replace line 193 with one of:

MTP : ${MODEL_MTP_FLAGS} ${DECODE_MTP_SIZE}

or simply drop the line, since Spec args : ${SPEC_ARGS[*]} directly below already covers the same information accurately. Severity is nit because it is purely cosmetic and the correct values are visible one line lower.
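
The idea behind the suggested fix, rendering the banner from the same data that builds the launch flags, can be sketched in Python. The script under review is bash, so this is only an illustration; `build_spec_args` and `banner_line` are hypothetical helper names.

```python
import shlex


def build_spec_args(model_mtp_flags, decode_mtp_size):
    """Mirror SPEC_ARGS=(${MODEL_MTP_FLAGS} "$DECODE_MTP_SIZE")."""
    return shlex.split(model_mtp_flags) + [str(decode_mtp_size)]


def banner_line(spec_args):
    """Render the MTP banner from the real argument list, not a fixed string."""
    return "MTP : " + " ".join(spec_args)


eagle3_flags = (
    "--method eagle3 --draft-model Inferact/MiniMax-M3-EAGLE3 "
    "--num-speculative-tokens"
)
spec_args = build_spec_args(eagle3_flags, 2)
print(banner_line(spec_args))
```

Since the banner is derived from `spec_args`, it cannot report a method that differs from the one actually passed to the server.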

github-actions · 2026-06-25T10:57:31Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28149790315
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28149790315

seungrokj and others added 2 commits June 25, 2026 14:39

[AMD] refactor server_atom.sh to drive model-specific config from mod…

a07ef93

…els_atom.yaml Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[AMD] add perf-changelog entry for server_atom.sh refactor (models_at…

ecda65b

…om.yaml-driven) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

seungrokj requested a review from a team June 25, 2026 05:44

seungrokj requested review from 1am9trash, billishyahao, chunfangamd and yctseng0211 as code owners June 25, 2026 05:44

github-project-automation Bot added this to InferenceMAX Board Jun 25, 2026

[AMD] fix perf-changelog pr-link for PR #1930

872f3ff

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

seungrokj changed the title ~~[AMD] refactor server_atom.sh: drive model-specific config from models_atom.yaml~~ [AMD] Add MiniMax-M3-FP4 MI355X ATOMESH update 0623 Jun 25, 2026

Merge branch 'main' into amd/m3_atom_pd_fp8_0623

3d714a9

seungrokj added the AMD, all-evals (Expand eval selection to every fixed-sequence config), evals-only (Suppress throughput and run only eval jobs; combine with all-evals to expand selection), and full-sweep-enabled labels Jun 25, 2026

claude Bot reviewed Jun 25, 2026

seungrokj wants to merge 4 commits into main from amd/m3_atom_pd_fp8_0623
