Merged
47 commits
512504d
[AMD] server_atom: improve config print and cleanup
seungrokj Jun 19, 2026
7ffb3e3
update perf-changelog for dsv4-fp4-mi355x-atom-disagg-mtp
seungrokj Jun 19, 2026
50634be
[AMD] fix DECODE_MTP_SIZE and BENCH_REQUEST_RATE propagation in atom-…
seungrokj Jun 19, 2026
2ccacee
[AMD] server_atom: pass SPEC_ARGS to prefill server
seungrokj Jun 19, 2026
dd1e8ac
[AMD] amd-master: fix comment for 1P1D TP8+DPA+TBO+MTP1 config
seungrokj Jun 19, 2026
1f854e4
[AMD] dsv4_atom-disagg: remove DECODE_MTP_SIZE from check_env_vars
seungrokj Jun 19, 2026
1cf914d
[AMD] bench: use --dsv4 flag for DeepSeek-V4-Pro MTP benchmarks
seungrokj Jun 19, 2026
1e7f3da
[AMD] server_atom: export IS_MTP=true when SPEC_DECODING=mtp for benc…
seungrokj Jun 19, 2026
638b837
[AMD] server_atom: fix hf-overrides JSON quoting
seungrokj Jun 19, 2026
3c89eae
fix: inline --hf-overrides to avoid eval word-splitting, remove OPT_ARGS
seungrokj Jun 19, 2026
23808cf
refactor: extract --hf-overrides into HF_OVERRIDES_ARG variable
seungrokj Jun 19, 2026
78806c3
fix: enable --hf-overrides only for DeepSeek-V4-Pro
seungrokj Jun 19, 2026
72734b0
fix: add HF_OVERRIDES_ARG to INFO config print block
seungrokj Jun 19, 2026
688eb03
fix: replace broken-quote array splice with ${ARRAY[*]} in CMD strings
seungrokj Jun 19, 2026
f804274
fix: remove ${CUDAGRAPH_OPT} from decode CMD
seungrokj Jun 19, 2026
931727a
feat: add MiniMax-M3 ATOM disagg CI script and server_atom.sh support
seungrokj Jun 19, 2026
501a8cc
feat: add minimaxm3-fp4-mi355x-atom-disagg recipe and AITER_QUICK_RED…
seungrokj Jun 19, 2026
b430d91
feat: export AITER_QUICK_REDUCE_QUANTIZATION=INT4 for non-DSv4 models
seungrokj Jun 19, 2026
7b80ea7
fix: server_atom.sh and minimaxm3 disagg cleanup
seungrokj Jun 19, 2026
4ea680d
fix: dsv4_fp4_mi355x_atom-disagg cleanup
seungrokj Jun 19, 2026
74aa3e0
fix: set BLOCK_SIZE=128 for MiniMax-M3 in minimaxm3_fp4_mi355x_atom-d…
seungrokj Jun 19, 2026
26ba108
fix: use KV_CACHE_DTYPE=fp8 for MiniMax-M3 disagg (matches atom serve…
seungrokj Jun 19, 2026
b76105f
feat: update minimaxm3-fp4-mi355x-atom-disagg search space and disabl…
seungrokj Jun 19, 2026
de6ddc6
feat: add MiniMax-M3-MXFP4/MXFP8 to models_atom.yaml; set KV_CACHE_DT…
seungrokj Jun 19, 2026
ac99718
fix: set mi355x-disagg runner and add dynamic cudagraph sizes for dec…
seungrokj Jun 19, 2026
aa67d5e
fix: gate ATOM_MOE_GU_ITLV and AITER_BF16_FP8_MOE_BOUND on DeepSeek-V…
seungrokj Jun 19, 2026
2de59c3
fix: preserve empty KV_CACHE_DTYPE to skip --kv-cache-dtype flag
seungrokj Jun 20, 2026
d19ea61
feat: update minimaxm3-fp4-mi355x-atom-disagg search space in amd-mas…
seungrokj Jun 20, 2026
198e2c5
fix: use KV_CACHE_DTYPE=auto for minimaxm3 disagg to skip --kv-cache-…
seungrokj Jun 20, 2026
bf0538d
fix: align minimaxm3 disagg settings with slurm reference script
seungrokj Jun 20, 2026
9ff735c
benchmarks: add MI355X FP4 atom disaggregated multi-node benchmark sc…
seungrokj Jun 21, 2026
d0de9f3
perf-changelog: append PR 1856 entry after rebase
functionstackx Jun 21, 2026
190b055
chore: trim atom mesh whitespace
functionstackx Jun 21, 2026
b60d4ca
Merge branch 'main' into amd/atom_mesh_0619_m3
seungrokj Jun 22, 2026
90d526d
fix: patch custom_all_reduce and pin MAX_MODEL_LEN for minimaxm3 atom…
seungrokj Jun 22, 2026
392a286
Merge branch 'main' into amd/atom_mesh_0619_m3
seungrokj Jun 22, 2026
cea10d2
Merge branch 'main' into amd/atom_mesh_0619_m3
seungrokj Jun 22, 2026
f550a11
fix: patch custom_all_reduce.py via git sparse-checkout at pinned commit
seungrokj Jun 22, 2026
2f73986
fix: bump atom image and clean up server_atom.sh
seungrokj Jun 22, 2026
132f240
Merge branch 'main' into amd/atom_mesh_0619_m3
seungrokj Jun 22, 2026
f9545e5
fix: bump minimaxm3-fp4-mi355x-atom-disagg image to nightly_202606221656
seungrokj Jun 23, 2026
3a7ab53
fix: bump minimaxm3-fp4-mi355x-atom-disagg image to MiniMax-M3-20260622
seungrokj Jun 23, 2026
5f0310c
Merge branch 'main' into amd/atom_mesh_0619_m3
seungrokj Jun 23, 2026
70a1a6f
Merge branch 'main' into amd/atom_mesh_0619_m3
seungrokj Jun 23, 2026
1280704
Merge branch 'main' into amd/atom_mesh_0619_m3
seungrokj Jun 23, 2026
7b27b66
Merge branch 'main' into amd/atom_mesh_0619_m3
seungrokj Jun 24, 2026
64885fc
Merge branch 'main' into amd/atom_mesh_0619_m3
functionstackx Jun 26, 2026
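Several of the commits above deal with omitting `--kv-cache-dtype` when `KV_CACHE_DTYPE` is empty or `auto`. The corresponding `server_atom.sh` hunk is not part of the diff shown here, so the following is only a minimal sketch of that idea. The function name `build_kv_cache_args` is hypothetical, and the flag spelling is taken from the commit messages rather than from the ATOM CLI.

```shell
#!/usr/bin/env bash
# Sketch only: build_kv_cache_args is a hypothetical helper, not code from
# server_atom.sh. Assumption: an empty or "auto" dtype means "do not pass
# the flag at all", as the commit messages suggest.
build_kv_cache_args() {
    local dtype="${1:-}"
    if [[ -z "$dtype" || "$dtype" == "auto" ]]; then
        # Emit nothing so the server falls back to its own default.
        return 0
    fi
    printf '%s %s\n' "--kv-cache-dtype" "$dtype"
}
```

In a real launcher, collecting such optional flags into a bash array instead of a string tends to avoid the quoting and word-splitting problems that other commits in this list address.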
51 changes: 51 additions & 0 deletions .github/configs/amd-master.yaml
@@ -2827,6 +2827,57 @@ minimaxm3-fp8-mi355x-atom-disagg:
additional-settings:
- "DECODE_NODES=1"

minimaxm3-fp4-mi355x-atom-disagg:
  image: rocm/atom-dev:MiniMax-M3-20260622
  model: amd/MiniMax-M3-MXFP4
  model-prefix: minimaxm3
  runner: mi355x-disagg
  precision: fp4
  framework: atom-disagg
  multinode: true
  disagg: true
  scenarios:
    fixed-seq-len:
    - isl: 8192
      osl: 1024
      search-space:
      # 1P1D TP4
      - conc-list: [ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 ]
        prefill:
          num-worker: 1
          tp: 4
          ep: 1
          dp-attn: false
          additional-settings:
          - "PREFILL_NODES=1"
        decode:
          num-worker: 1
          tp: 4
          ep: 1
          dp-attn: false
          additional-settings:
          - "DECODE_NODES=1"
    # 1P1D TP4
    - isl: 1024
      osl: 1024
      search-space:
      # 1P1D TP4
      - conc-list: [ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 ]
        prefill:
          num-worker: 1
          tp: 4
          ep: 1
          dp-attn: false
          additional-settings:
          - "PREFILL_NODES=1"
        decode:
          num-worker: 1
          tp: 4
          ep: 1
          dp-attn: false
          additional-settings:
          - "DECODE_NODES=1"
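The `additional-settings` entries in this recipe are plain `NAME=VALUE` strings, and the benchmark script later requires `PREFILL_NODES` and `DECODE_NODES` as environment variables. How the CI harness actually turns one into the other is not shown in this PR. The sketch below is one possible way to do it, and the helper name `apply_additional_settings` is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: apply_additional_settings is hypothetical and is not part of
# this PR. It exports each NAME=VALUE argument into the environment and
# rejects anything that does not look like a shell variable assignment.
apply_additional_settings() {
    local setting
    for setting in "$@"; do
        if [[ "$setting" =~ ^[A-Za-z_][A-Za-z0-9_]*= ]]; then
            export "$setting"
        else
            echo "skipping malformed setting: $setting" >&2
            return 1
        fi
    done
}
```

For the 1P1D entries above, a call such as `apply_additional_settings "PREFILL_NODES=1" "DECODE_NODES=1"` would leave both variables set to `1` for any script launched afterwards.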

# MiniMax-M3 MXFP8 MI300X day-zero recipe. Reuse the dedicated ROCm image and
# MI355X serving shape, but retain the default BF16 KV cache because this
# checkpoint lacks calibrated ROCm FP8 attention scales. Use the TP8-only H100
1 change: 0 additions & 1 deletion benchmarks/multi_node/amd_utils/server_atom.sh
@@ -200,7 +200,6 @@ INFO
# rank 1 .. (NODE_OFFSET-1) -> remaining prefill nodes
# rank NODE_OFFSET .. -> decode nodes
# =============================================================================

if [ "$NODE_RANK" -eq 0 ]; then
# ──────────────────────────────────────────────────────────────────────────
# Node 0: prefill server (producer) + atomesh router
94 changes: 94 additions & 0 deletions benchmarks/multi_node/minimaxm3_fp4_mi355x_atom-disagg.sh
@@ -0,0 +1,94 @@
#!/usr/bin/env bash

source "$(dirname "$0")/../benchmark_lib.sh"

check_env_vars \
    CONC_LIST \
    ISL \
    OSL \
    IMAGE \
    MODEL_PATH \
    PREFILL_NUM_WORKERS \
    PREFILL_TP \
    PREFILL_EP \
    PREFILL_DP_ATTN \
    DECODE_NUM_WORKERS \
    DECODE_TP \
    DECODE_EP \
    DECODE_DP_ATTN \
    PREFILL_NODES \
    DECODE_NODES \
    RANDOM_RANGE_RATIO \
    FRAMEWORK

if [[ -n "$SLURM_JOB_ID" ]]; then
    echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
fi

set -x

# Use upstreamed multi_node scripts (no external clone needed)
cd "$GITHUB_WORKSPACE/benchmarks/multi_node/amd_utils" || exit 1

# Set up ATOM launch script-specific environment variables
export TIME_LIMIT="08:00:00"
export MODEL_PATH=$MODEL_PATH
export MODEL_NAME=$MODEL_NAME
export CONTAINER_IMAGE=$IMAGE

if [[ "${PREFILL_EP:-1}" -eq 1 ]]; then
    export PREFILL_ENABLE_EP=false
else
    export PREFILL_ENABLE_EP=true
fi

if [[ "$PREFILL_DP_ATTN" == "true" ]]; then
    export PREFILL_ENABLE_DP=true
else
    export PREFILL_ENABLE_DP=false
fi

if [[ "${DECODE_EP:-1}" -eq 1 ]]; then
    export DECODE_ENABLE_EP=false
else
    export DECODE_ENABLE_EP=true
fi

if [[ "$DECODE_DP_ATTN" == "true" ]]; then
    export DECODE_ENABLE_DP=true
else
    export DECODE_ENABLE_DP=false
fi

# No MTP for MiniMax-M3
export SPEC_DECODING="none"
export DECODE_MTP_SIZE=0

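# KV cache, memory, and batching limits. Most can be overridden from the
# environment (BLOCK_SIZE defaults to 128); MAX_MODEL_LEN is pinned.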
export KV_CACHE_DTYPE="${KV_CACHE_DTYPE:-auto}"
export BLOCK_SIZE="${BLOCK_SIZE:-128}"
export MEM_FRAC_STATIC="${MEM_FRAC_STATIC:-0.8}"
export MAX_MODEL_LEN=32768
export MAX_NUM_SEQS="${MAX_NUM_SEQS:-128}"
export MAX_NUM_BATCHED_TOKENS="${MAX_NUM_BATCHED_TOKENS:-32768}"

# Launch jobs based on ISL/OSL
# Replace ' ' in CONC_LIST with 'x' such that the concurrency list is represented
# by a list of numbers delimited by 'x'. This is because of how the underlying launch script
# expects the concurrencies.
JOB_ID=$(bash ./submit.sh $PREFILL_NODES \
    $PREFILL_NUM_WORKERS \
    $DECODE_NODES \
    $DECODE_NUM_WORKERS \
    $ISL $OSL "${CONC_LIST// /x}" inf \
    ${PREFILL_ENABLE_EP} ${PREFILL_ENABLE_DP} \
    ${DECODE_ENABLE_EP} ${DECODE_ENABLE_DP} \
    ${PREFILL_TP} ${DECODE_TP} \
    ${RANDOM_RANGE_RATIO})

if [[ $? -ne 0 ]]; then
    echo "Failed to submit job" >&2
    exit 1
fi

echo "$JOB_ID"
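The script above repeats two small transformations: it turns the numeric EP size and the `dp-attn` string into strict `true`/`false` flags, and it joins `CONC_LIST` with `x` before passing it to `submit.sh`. The sketch below isolates those steps so they can be checked on their own. The helper names `ep_enabled`, `dp_enabled`, and `join_conc` are hypothetical and do not appear in the PR; only the conditions and the `${VAR// /x}` expansion are taken from the script.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical. They mirror the logic of
# the benchmark script but are not part of it.

# Expert parallelism counts as disabled when the EP size is 1 or unset.
ep_enabled() {
    local ep="${1:-1}"
    if [[ "$ep" -eq 1 ]]; then
        echo false
    else
        echo true
    fi
}

# Anything other than the exact string "true" is treated as false.
dp_enabled() {
    if [[ "${1:-}" == "true" ]]; then
        echo true
    else
        echo false
    fi
}

# Replace every space with 'x' using bash pattern substitution.
join_conc() {
    local list="$1"
    echo "${list// /x}"
}
```

For example, `join_conc "1 2 4 8"` yields `1x2x4x8`, which is the shape the script passes to `submit.sh` for the concurrency list. The `//` form replaces every space, whereas a single `/` would replace only the first one.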
8 changes: 8 additions & 0 deletions perf-changelog.yaml
@@ -4089,6 +4089,14 @@
- "8k/1k: 1p4d-dep4-tep4 (conc 128), 1p4d-dep4-tp8 (conc 4-256), 3p1d-dep4-dep16 (conc 1024), 6p1d-dep4-dep16 (conc 3072), 8p1d-dep4-dep16 (conc 6144)"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1862

- config-keys:
    - minimaxm3-fp4-mi355x-atom-disagg
  description:
    - "Add minimaxm3-fp4-mi355x-atom-disagg CI script: multi-node disaggregated PD on MI355X via ATOM for MiniMax-M3-MXFP4"
    - "No MTP, KV_CACHE_DTYPE=auto (MXFP4 native, no fp8 override), MAX_MODEL_LEN=32768, MAX_NUM_BATCHED_TOKENS=32768"
    - "server_atom.sh: conditional --kv_cache_dtype, MAX_MODEL_LEN/MAX_NUM_BATCHED_TOKENS/CUDAGRAPH_OPT support, syntax fixes"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1856

- config-keys:
- dsv4-fp4-mi355x-sglang
description: