[None][test] add unit test and e2e test for gpt_oss_20b MHA kernel #12796
ruodil wants to merge 1 commit into NVIDIA:main
Conversation
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
/bot run

PR_Github #42091 [ run ] triggered by Bot. Commit:
📝 Walkthrough

This change adds performance configurations and test cases for GPT-OSS 20B models with the PyTorch float4 backend, along with a new unit test for CUDA kernel selection behavior using TensorRT-LLM integration.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/unittest/_torch/modeling/test_modeling_gpt_oss.py (1)
125-128: Consider adding `.eval()` as a best practice for inference models, though GPT-OSS has no training-sensitive branches.

Line 128 creates `GptOssForCausalLM` without calling `.eval()`. While `torch.inference_mode()` correctly disables autograd for the profiling runs (lines 198-212), the GPT-OSS model does not contain Dropout layers or `self.training` conditional branches. Adding `.eval()` is nonetheless good practice for clarity and consistency:

- model = GptOssForCausalLM(model_config).cuda()
+ model = GptOssForCausalLM(model_config).cuda().eval()

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/_torch/modeling/test_modeling_gpt_oss.py` around lines 125 - 128, The test constructs a GptOssForCausalLM instance without setting it to evaluation mode; update the test to call .eval() on the model instance created from GptOssForCausalLM(model_config) (e.g., call model.eval() before or after .cuda()) so the model is explicitly in inference mode; this uses the existing symbols ModelConfig, gpt_oss_config, attn_backend="TRTLLM", and GptOssForCausalLM to locate the change.
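The distinction the reviewer draws between `.eval()` and `torch.inference_mode()` can be illustrated with a minimal plain-PyTorch sketch. The toy module below is hypothetical (it is not the GPT-OSS model); its Dropout layer exists only to show what `.eval()` actually toggles:

```python
import torch
import torch.nn as nn

# A toy module with a Dropout layer -- the kind of layer whose behavior
# depends on module.training. GPT-OSS has no such layers, which is why
# the review frames .eval() as a clarity fix rather than a bug fix.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

assert model.training        # modules start in training mode
model.eval()                 # flips the training flag on the module and all children
assert not model.training
assert not model[1].training  # Dropout now acts as the identity

x = torch.ones(1, 4)
with torch.inference_mode():  # disables autograd, independently of .eval()
    y = model(x)
assert not y.requires_grad
```

Note that `torch.inference_mode()` and `.eval()` are orthogonal: the former controls autograd, the latter controls layer-mode behavior, so applying both is the unambiguous choice for inference tests.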
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unittest/_torch/modeling/test_modeling_gpt_oss.py`:
- Around line 6-23: Add the required NVIDIA SPDX copyright/header at the very
top of the file (before the first import, e.g., before the "import torch" line)
using the repository-mandated template and the latest meaningful modification
year (2026); ensure the header appears above the existing imports such as
"import torch", "from transformers import AutoTokenizer, GptOssConfig" and
references to "GptOssForCausalLM" so the file passes license checks.
- Around line 140-145: The zip() used to compute token_nums (token_nums = [p + s
for p, s in zip(past_seen_tokens, sequence_length)]) should include strict=True
to avoid silent truncation when past_seen_tokens and sequence_length lengths
differ; update that list comprehension to call zip(past_seen_tokens,
sequence_length, strict=True) so mismatched lengths raise an error during tests
(references: variables past_seen_tokens, sequence_length, token_nums).
---
Nitpick comments:
In `@tests/unittest/_torch/modeling/test_modeling_gpt_oss.py`:
- Around line 125-128: The test constructs a GptOssForCausalLM instance without
setting it to evaluation mode; update the test to call .eval() on the model
instance created from GptOssForCausalLM(model_config) (e.g., call model.eval()
before or after .cuda()) so the model is explicitly in inference mode; this uses
the existing symbols ModelConfig, gpt_oss_config, attn_backend="TRTLLM", and
GptOssForCausalLM to locate the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: dc769040-48cc-42e8-b9f6-32fe69cbdce5
📒 Files selected for processing (3)
tests/integration/defs/perf/pytorch_model_config.py
tests/integration/test_lists/qa/llm_perf_core.yml
tests/unittest/_torch/modeling/test_modeling_gpt_oss.py
  import torch
  from torch.profiler import ProfilerActivity
  from transformers import AutoTokenizer, GptOssConfig
  from utils.llm_data import llm_models_root
- from utils.util import skip_no_hopper
+ from utils.util import skip_no_hopper, skip_pre_hopper

  import tensorrt_llm
  from tensorrt_llm import LLM, SamplingParams
  from tensorrt_llm._torch.attention_backend.utils import get_attention_backend
  from tensorrt_llm._torch.metadata import KVCacheParams
  from tensorrt_llm._torch.model_config import ModelConfig
  from tensorrt_llm._torch.models.modeling_gpt_oss import GptOssForCausalLM
  from tensorrt_llm._torch.pyexecutor.resource_manager import KVCacheManager
  from tensorrt_llm.bindings.executor import \
      KvCacheConfig as BindingsKvCacheConfig
  from tensorrt_llm.llmapi import CudaGraphConfig, KvCacheConfig, MoeConfig
  from tensorrt_llm.mapping import Mapping
Add the required NVIDIA SPDX header before the import block.
This file was meaningfully modified in 2026 but still starts directly with imports, so the repository-mandated copyright/license header is missing.
As per coding guidelines, All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header with the year of latest meaningful modification.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unittest/_torch/modeling/test_modeling_gpt_oss.py` around lines 6 - 23,
Add the required NVIDIA SPDX copyright/header at the very top of the file
(before the first import, e.g., before the "import torch" line) using the
repository-mandated template and the latest meaningful modification year (2026);
ensure the header appears above the existing imports such as "import torch",
"from transformers import AutoTokenizer, GptOssConfig" and references to
"GptOssForCausalLM" so the file passes license checks.
batch_size = 8
context_sequence_length = []  # no context (prefill) sequences
sequence_length = [1] * batch_size  # all decode, 1 new token each
past_seen_tokens = [256] * batch_size  # enough history for multi-block
request_ids = list(range(batch_size))
token_nums = [p + s for p, s in zip(past_seen_tokens, sequence_length)]
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
import ast
from pathlib import Path
path = Path("tests/unittest/_torch/modeling/test_modeling_gpt_oss.py")
tree = ast.parse(path.read_text(encoding="utf-8"))
for node in ast.walk(tree):
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "zip":
        has_strict = any(keyword.arg == "strict" for keyword in node.keywords)
        print(f"Line {node.lineno}: zip(...), strict={has_strict}")
PY

Repository: NVIDIA/TensorRT-LLM
Length of output: 95
Add strict=True to the zip() call on Line 145.
The zip() function on line 145 currently lacks the strict parameter, which allows silent truncation if past_seen_tokens and sequence_length lengths diverge. Since TensorRT-LLM requires Python 3.10+, use strict=True to detect such mismatches early. Ruff already flags this as B905.
Suggested fix
- token_nums = [p + s for p, s in zip(past_seen_tokens, sequence_length)]
+ token_nums = [
+     p + s
+     for p, s in zip(past_seen_tokens, sequence_length, strict=True)
+ ]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

batch_size = 8
context_sequence_length = []  # no context (prefill) sequences
sequence_length = [1] * batch_size  # all decode, 1 new token each
past_seen_tokens = [256] * batch_size  # enough history for multi-block
request_ids = list(range(batch_size))
token_nums = [
    p + s
    for p, s in zip(past_seen_tokens, sequence_length, strict=True)
]
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 145-145: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unittest/_torch/modeling/test_modeling_gpt_oss.py` around lines 140 -
145, The zip() used to compute token_nums (token_nums = [p + s for p, s in
zip(past_seen_tokens, sequence_length)]) should include strict=True to avoid
silent truncation when past_seen_tokens and sequence_length lengths differ;
update that list comprehension to call zip(past_seen_tokens, sequence_length,
strict=True) so mismatched lengths raise an error during tests (references:
variables past_seen_tokens, sequence_length, token_nums).
PR_Github #42091 [ run ] completed with state

/bot run

PR_Github #42240 [ run ] triggered by Bot. Commit:

PR_Github #42240 [ run ] completed with state
Summary by CodeRabbit
Tests
Chores
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.