[None][fix] Fix DeepSeekV32 test_fp8_blockscale[baseline_mtp1] OOM on Blackwell #12823
sunnyqgg wants to merge 1 commit into NVIDIA:main
Conversation
… to avoid OOM

MTP layers consume extra GPU memory for additional prediction parameters and intermediate tensors during the forward pass. On Blackwell (SM 100/103), test_fp8_blockscale[baseline_mtp1] was OOMing with only a ~140 MiB shortage when using free_gpu_memory_fraction=0.6.

Reduce the KV cache fraction from 0.6 to 0.55 when MTP is enabled (mtp_nextn > 0) on Blackwell GPUs, leaving more headroom for MTP forward-pass allocations.

Fixes: https://nvbugs/5955792

Signed-off-by: qgai <qgai@nvidia.com>
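The conditional fraction selection described above can be sketched as a small helper. This is an illustrative reconstruction, not the actual diff: the function name `choose_kv_cache_fraction` and its parameters are hypothetical, while the thresholds, SM numbers, and the `mtp_nextn > 0` condition come from the PR description.

```python
# Hypothetical sketch of the test-side logic; the helper name and signature
# are illustrative, but the values mirror the PR description.

def choose_kv_cache_fraction(mtp_nextn: int, sm_version: int) -> float:
    """Pick the free_gpu_memory_fraction for the KV cache.

    Blackwell is SM 100/103; MTP is enabled when mtp_nextn > 0.
    In that combination, leave extra headroom for MTP forward-pass
    allocations (the test was short by only ~140 MiB at 0.6).
    """
    is_blackwell = sm_version in (100, 103)
    if is_blackwell and mtp_nextn > 0:
        return 0.55  # reduced fraction when MTP is enabled on Blackwell
    return 0.6       # default fraction elsewhere


# baseline_mtp1 on a B200 (SM 100) gets the reduced fraction:
print(choose_kv_cache_fraction(mtp_nextn=1, sm_version=100))  # 0.55
# non-MTP configs, or non-Blackwell GPUs, keep 0.6:
print(choose_kv_cache_fraction(mtp_nextn=0, sm_version=100))  # 0.6
print(choose_kv_cache_fraction(mtp_nextn=1, sm_version=90))   # 0.6
```

Keeping the default at 0.6 for every other configuration limits the blast radius of the change to exactly the failing case.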
/bot run --extra-stage "DGX_B200-8_GPUs-PyTorch-1"
/bot run --stage-list "DGX_B200-8_GPUs-PyTorch-1"
PR_Github #42249 [ run ] triggered by Bot. Commit:
PR_Github #42252 [ run ] triggered by Bot. Commit:
PR_Github #42249 [ run ] completed with state
PR_Github #42252 [ run ] completed with state
/bot run
PR_Github #42346 [ run ] triggered by Bot. Commit:
PR_Github #42346 [ run ] completed with state
Summary

- Reduce free_gpu_memory_fraction from 0.6 to 0.55 for MTP-enabled configs on Blackwell (SM 100/103) in TestDeepSeekV32::test_fp8_blockscale

Test plan

- DGX_B200-8_GPUs-PyTorch-1 stage passes test_fp8_blockscale[baseline_mtp1]
- Other TestDeepSeekV32::test_fp8_blockscale variants still pass (baseline, baseline_fp8kv, latency, etc.)

Fixes: https://nvbugs/5955792
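For readers mapping "SM 100/103" to hardware: CUDA reports a compute capability pair (major, minor), e.g. via `torch.cuda.get_device_capability()`, and the SM number used above is just `major * 10 + minor`. A minimal sketch of that conversion (the helper name is illustrative):

```python
def sm_version(major: int, minor: int) -> int:
    """Convert a CUDA compute capability pair to an SM number.

    Example: Blackwell B200 reports capability (10, 0), i.e. SM 100;
    SM 103 corresponds to capability (10, 3).
    """
    return major * 10 + minor


print(sm_version(10, 0))  # 100 (Blackwell, e.g. B200)
print(sm_version(10, 3))  # 103 (Blackwell variant)
print(sm_version(9, 0))   # 90  (Hopper, not affected by this change)
```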