Optimize Qwen3 decode scope2 SV path #114

Draft
high-cloud wants to merge 1 commit into hw-native-sys:main from high-cloud:codex/qwen3-decode-fused-sv

@high-cloud
Contributor

Summary

  • Fuse the scope2 softmax and SV matmul loops in Qwen3 decode.
  • Remove the intermediate all_exp_padded tensor and its extra write/read path.
  • Keep A2/A3-friendly Q_HEAD_PAD alignment by computing padded softmax rows directly.
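To illustrate the fusion described above, here is a minimal pure-Python sketch, not the code from this PR. It compares a two-pass version that materialises every exponentiated score (a stand-in for the removed all_exp_padded tensor) with a single loop that performs online softmax and the SV accumulation together. The function names, the block size, and the sample data are all assumptions made for the example.

```python
import math


def unfused_attention(scores, values):
    """Two-pass reference: store all exp values, then run the SV matmul."""
    row_max = max(scores)
    # Intermediate buffer, analogous to an all_exp tensor that is written
    # by the softmax loop and read back by the SV loop.
    all_exp = [math.exp(s - row_max) for s in scores]
    denom = sum(all_exp)
    dim = len(values[0])
    out = [0.0] * dim
    for p, v in zip(all_exp, values):
        for d in range(dim):
            out[d] += p * v[d]
    return [o / denom for o in out]


def fused_attention(scores, values, block=2):
    """Single pass: online softmax and SV accumulation share one loop."""
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        v_blk = values[start:start + block]
        new_max = max(running_max, max(s_blk))
        # Rescale earlier partial results to the new running maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(s_blk, v_blk):
            p = math.exp(s - new_max)
            running_sum += p
            for d in range(dim):
                acc[d] += p * v[d]
        running_max = new_max
    return [a / running_sum for a in acc]


# Hypothetical sample data: 5 key positions, value dimension 2.
scores = [0.5, 2.0, -1.0, 3.0, 0.0]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
reference = unfused_attention(scores, values)
fused = fused_attention(scores, values)
```

Both functions should agree up to floating-point rounding; the fused version never holds more than one block of exponentiated scores at a time, which is the property the summary attributes to dropping the intermediate tensor.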

Validation

  • python -m py_compile examples/models/qwen3/qwen3_32b_decode.py
  • python modules/pypto-lib/examples/models/qwen3/qwen3_32b_decode.py -p a2a3 -d 10 --max-seq --runtime-profiling
    • out: PASS (131072/131072 elements matched)
    • Total Test Time: 2992.18 us

@coderabbitai

coderabbitai bot commented Apr 15, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 26df5072-affa-48a7-9324-a6de1b5736a0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@high-cloud changed the title from "[codex] Optimize Qwen3 decode scope2 SV path" to "Optimize Qwen3 decode scope2 SV path" on Apr 15, 2026

@gemini-code-assist bot left a comment

Code Review

This pull request refactors the Qwen3 decode implementation by fusing the softmax and SV matmul stages. It removes the intermediate all_exp_padded tensor and updates tensor shapes and slicing logic to use Q_HEAD_PAD for improved alignment. The online softmax accumulation is also adjusted to handle the updated padding strides. I have no feedback to provide as the review comments were purely explanatory or validated the existing changes.
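As a rough illustration of computing padded softmax rows directly and slicing out the valid rows afterwards, the sketch below pads a small score tile to a fixed row count. It is not taken from the PR: the values chosen for Q_HEAD_BATCH and Q_HEAD_PAD, the zero fill, and the helper names pad_rows and softmax_rows are all assumptions for the example.

```python
import math

Q_HEAD_BATCH = 5  # hypothetical number of valid query-head rows
Q_HEAD_PAD = 16   # hypothetical aligned row count; the real value is not given here


def pad_rows(tile, pad_to, fill=0.0):
    """Append filler rows so the tile has exactly pad_to rows."""
    width = len(tile[0])
    return tile + [[fill] * width for _ in range(pad_to - len(tile))]


def softmax_rows(tile):
    """Numerically stable row-wise softmax."""
    out = []
    for row in tile:
        row_max = max(row)
        exps = [math.exp(x - row_max) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


# Hypothetical scores: one row per valid head, 4 key positions each.
scores = [[float(h + k) for k in range(4)] for h in range(Q_HEAD_BATCH)]

# Work on the padded tile directly, then slice out the valid rows.
probs_padded = softmax_rows(pad_rows(scores, Q_HEAD_PAD))
probs_valid = probs_padded[:Q_HEAD_BATCH]
```

Because softmax is applied row by row, the filler rows do not affect the valid rows; they only keep the tile at the aligned shape until the final slice.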
