
[None][fix] Fix DeepSeekV32 test_fp8_blockscale[baseline_mtp1] OOM on Blackwell #12823

Open
sunnyqgg wants to merge 1 commit into NVIDIA:main from sunnyqgg:fix/nvbug-5955792-deepseekv32-mtp-oom

Conversation

@sunnyqgg
Collaborator

@sunnyqgg sunnyqgg commented Apr 8, 2026

Summary

  • Reduce KV cache free_gpu_memory_fraction from 0.6 to 0.55 for MTP-enabled configs on Blackwell (SM 100/103) in TestDeepSeekV32::test_fp8_blockscale
  • MTP layers consume extra GPU memory for additional prediction parameters and intermediate tensors during forward pass, causing OOM with ~140 MiB shortage
  • Remove corresponding test waive entry
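The conditional described above can be sketched as a small helper; this is an illustrative reconstruction, not the exact test code, and the function name is hypothetical:

```python
# Hypothetical sketch of the conditional in test_fp8_blockscale; the helper
# name is illustrative, only the fraction values come from this PR.
def kv_cache_fraction(mtp_nextn: int) -> float:
    # MTP layers need extra GPU headroom for prediction parameters and
    # intermediate tensors, so shrink the KV cache share when MTP is enabled.
    return 0.55 if mtp_nextn > 0 else 0.6
```

The baseline_mtp1 variant (mtp_nextn > 0) therefore gets 0.55, while non-MTP variants keep 0.6.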

Test plan

  • CI DGX_B200-8_GPUs-PyTorch-1 stage passes test_fp8_blockscale[baseline_mtp1]
  • Other TestDeepSeekV32::test_fp8_blockscale variants still pass (baseline, baseline_fp8kv, latency, etc.)

Fixes: https://nvbugs/5955792

… to avoid OOM

MTP layers consume extra GPU memory for additional prediction parameters
and intermediate tensors during forward pass. On Blackwell (SM 100/103),
the test_fp8_blockscale[baseline_mtp1] test was hitting OOM, falling
short by only ~140 MiB, when using free_gpu_memory_fraction=0.6.

Reduce the KV cache fraction from 0.6 to 0.55 when MTP is enabled
(mtp_nextn > 0) on Blackwell GPUs, leaving more headroom for MTP
forward pass allocations.
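A back-of-the-envelope check (not from the PR) shows why a 0.05 reduction is more than enough headroom; the free-memory figure below is an assumed placeholder, not a measured value:

```python
# Illustrative arithmetic: how much memory the 0.60 -> 0.55 reduction
# releases, assuming a hypothetical free-memory figure on a B200.
FREE_MEM_GIB = 160.0  # assumed free GPU memory; the real value varies by setup
extra_headroom_gib = (0.60 - 0.55) * FREE_MEM_GIB
extra_headroom_mib = extra_headroom_gib * 1024
# 5% of 160 GiB is 8 GiB (8192 MiB), far above the ~140 MiB shortfall reported.
```

Even with much less free memory than assumed here, the freed 5% comfortably covers the observed ~140 MiB shortage.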

Fixes: https://nvbugs/5955792
Signed-off-by: qgai <qgai@nvidia.com>
@sunnyqgg sunnyqgg requested a review from a team as a code owner April 8, 2026 02:54
@sunnyqgg
Collaborator Author

sunnyqgg commented Apr 8, 2026

/bot run --extra-stage "DGX_B200-8_GPUs-PyTorch-1"

@coderabbitai
Contributor

coderabbitai bot commented Apr 8, 2026

📝 Walkthrough

Modified the KV cache configuration in a test to conditionally adjust the free_gpu_memory_fraction parameter based on a multi-token prediction condition. Removed a corresponding test waiver entry from the waive-list.

Changes

  • Test Configuration Update (tests/integration/defs/accuracy/test_llm_api_pytorch.py): modified test_fp8_blockscale to set free_gpu_memory_fraction to 0.55 when mtp_nextn > 0, otherwise 0.6.
  • Waiver Cleanup (tests/integration/test_lists/waives.txt): removed the waiver entry for accuracy/test_llm_api_pytorch.py::TestDeepSeekV32::test_fp8_blockscale[baseline_mtp1].

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)
  • Title check ✅: the title clearly and concisely describes the main change: fixing an OOM issue in the DeepSeekV32 test on Blackwell hardware.
  • Description check ✅: the PR description clearly explains the issue, solution, and test coverage, and includes the NVBugs reference with meaningful context about memory optimization.


@sunnyqgg
Collaborator Author

sunnyqgg commented Apr 8, 2026

/bot run --stage-list "DGX_B200-8_GPUs-PyTorch-1"

@tensorrt-cicd
Collaborator

PR_Github #42249 [ run ] triggered by Bot. Commit: 7a18667 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42252 [ run ] triggered by Bot. Commit: 7a18667 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42249 [ run ] completed with state ABORTED. Commit: 7a18667

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42252 [ run ] completed with state SUCCESS. Commit: 7a18667
/LLM/main/L0_MergeRequest_PR pipeline #33061 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@sunnyqgg
Collaborator Author

sunnyqgg commented Apr 8, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42346 [ run ] triggered by Bot. Commit: 7a18667 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42346 [ run ] completed with state SUCCESS. Commit: 7a18667
/LLM/main/L0_MergeRequest_PR pipeline #33132 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation
