[None][fix] Fix DeepSeekV32 test_fp8_blockscale[baseline_mtp1] OOM on Blackwell #12823
sunnyqgg wants to merge 1 commit into NVIDIA:main
Conversation
… to avoid OOM

MTP layers consume extra GPU memory for additional prediction parameters and intermediate tensors during the forward pass. On Blackwell (SM 100/103), test_fp8_blockscale[baseline_mtp1] was OOMing with only a ~140 MiB shortage when using free_gpu_memory_fraction=0.6.

Reduce the KV cache fraction from 0.6 to 0.55 when MTP is enabled (mtp_nextn > 0) on Blackwell GPUs, leaving more headroom for MTP forward-pass allocations.

Fixes: https://nvbugs/5955792

Signed-off-by: qgai <qgai@nvidia.com>
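The conditional fraction selection described above can be sketched as a small helper. This is an illustrative reconstruction, not the actual diff: the function name `choose_kv_cache_fraction` and its parameters are hypothetical, while the thresholds, SM numbers, and the `mtp_nextn > 0` condition come from the PR description.

```python
# Hypothetical sketch of the test-side logic; the helper name and signature
# are illustrative, but the values mirror the PR description.

def choose_kv_cache_fraction(mtp_nextn: int, sm_version: int) -> float:
    """Pick the free_gpu_memory_fraction for the KV cache.

    Blackwell is SM 100/103; MTP is enabled when mtp_nextn > 0.
    In that combination, leave extra headroom for MTP forward-pass
    allocations (the test was short by only ~140 MiB at 0.6).
    """
    is_blackwell = sm_version in (100, 103)
    if is_blackwell and mtp_nextn > 0:
        return 0.55  # reduced fraction when MTP is enabled on Blackwell
    return 0.6       # default fraction elsewhere


# baseline_mtp1 on a B200 (SM 100) gets the reduced fraction:
print(choose_kv_cache_fraction(mtp_nextn=1, sm_version=100))  # 0.55
# non-MTP configs, or non-Blackwell GPUs, keep 0.6:
print(choose_kv_cache_fraction(mtp_nextn=0, sm_version=100))  # 0.6
print(choose_kv_cache_fraction(mtp_nextn=1, sm_version=90))   # 0.6
```

Keeping the default at 0.6 for every other configuration limits the blast radius of the change to exactly the failing case.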
/bot run --extra-stage "DGX_B200-8_GPUs-PyTorch-1"
/bot run --stage-list "DGX_B200-8_GPUs-PyTorch-1"
PR_Github #42249 [ run ] triggered by Bot. Commit:
PR_Github #42252 [ run ] triggered by Bot. Commit:
PR_Github #42249 [ run ] completed with state
PR_Github #42252 [ run ] completed with state
/bot run
PR_Github #42346 [ run ] triggered by Bot. Commit:
PR_Github #42346 [ run ] completed with state
Summary

- Reduce free_gpu_memory_fraction from 0.6 to 0.55 for MTP-enabled configs on Blackwell (SM 100/103) in TestDeepSeekV32::test_fp8_blockscale

Test plan

- DGX_B200-8_GPUs-PyTorch-1 stage passes test_fp8_blockscale[baseline_mtp1]
- Other TestDeepSeekV32::test_fp8_blockscale variants still pass (baseline, baseline_fp8kv, latency, etc.)

Fixes: https://nvbugs/5955792
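For readers mapping "SM 100/103" to hardware: CUDA reports a compute capability pair (major, minor), e.g. via `torch.cuda.get_device_capability()`, and the SM number used above is just `major * 10 + minor`. A minimal sketch of that conversion (the helper name is illustrative):

```python
def sm_version(major: int, minor: int) -> int:
    """Convert a CUDA compute capability pair to an SM number.

    Example: Blackwell B200 reports capability (10, 0), i.e. SM 100;
    SM 103 corresponds to capability (10, 3).
    """
    return major * 10 + minor


print(sm_version(10, 0))  # 100 (Blackwell, e.g. B200)
print(sm_version(10, 3))  # 103 (Blackwell variant)
print(sm_version(9, 0))   # 90  (Hopper, not affected by this change)
```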