Optimize Qwen3 decode scope2 SV path by high-cloud · Pull Request #114 · hw-native-sys/pypto-lib

high-cloud · 2026-04-15T09:36:57Z

Summary

- Fuse the scope2 softmax and SV matmul loops in Qwen3 decode.
- Remove the intermediate all_exp_padded tensor and its extra write/read path.
- Keep A2/A3-friendly Q_HEAD_PAD alignment by computing padded softmax rows directly.
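
For readers unfamiliar with the pattern, the fusion described above can be sketched in NumPy. This is not the pypto-lib kernel from this PR; the function names, shapes, and block size are assumptions chosen for illustration. The first function materializes every exp score before the SV matmul (the role all_exp_padded appears to have played), while the second consumes each block's exp tile immediately and keeps only a running max, running sum, and output accumulator.

```python
import numpy as np

def attention_two_pass(q, k, v, block):
    """Unfused reference: store all exp scores, then run the SV matmul over them."""
    scores = q @ k.T                                    # [heads, seq]
    row_max = scores.max(axis=1, keepdims=True)
    all_exp = np.exp(scores - row_max)                  # intermediate tensor: written, then re-read
    out = np.zeros((q.shape[0], v.shape[1]))
    for start in range(0, k.shape[0], block):
        out += all_exp[:, start:start + block] @ v[start:start + block]
    return out / all_exp.sum(axis=1, keepdims=True)

def attention_fused(q, k, v, block):
    """Fused sketch: each block's exp tile feeds the SV matmul in the same loop."""
    heads = q.shape[0]
    running_max = np.full((heads, 1), -np.inf)
    running_sum = np.zeros((heads, 1))
    out = np.zeros((heads, v.shape[1]))
    for start in range(0, k.shape[0], block):
        s = q @ k[start:start + block].T                # [heads, block] scores for this KV block
        new_max = np.maximum(running_max, s.max(axis=1, keepdims=True))
        scale = np.exp(running_max - new_max)           # rescales earlier partial results
        p = np.exp(s - new_max)                         # exp tile, never stored beyond this iteration
        running_sum = running_sum * scale + p.sum(axis=1, keepdims=True)
        out = out * scale + p @ v[start:start + block]
        running_max = new_max
    return out / running_sum
```

Both functions compute the same attention output up to floating-point rounding; the fused version trades the stored exp tensor for a per-block rescale of the accumulator.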

Validation

python -m py_compile examples/models/qwen3/qwen3_32b_decode.py
python modules/pypto-lib/examples/models/qwen3/qwen3_32b_decode.py -p a2a3 -d 10 --max-seq --runtime-profiling
- out: PASS (131072/131072 elements matched)
- Total Test Time: 2992.18 us

coderabbitai · 2026-04-15T09:37:04Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


gemini-code-assist

Code Review

This pull request refactors the Qwen3 decode implementation by fusing the softmax and SV matmul stages. It removes the intermediate all_exp_padded tensor and updates tensor shapes and slicing logic to use Q_HEAD_PAD for improved alignment. The online softmax accumulation is also adjusted to handle the updated padding strides. I have no feedback to provide as the review comments were purely explanatory or validated the existing changes.
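
As a rough illustration of the padding idea mentioned in the review, the NumPy sketch below builds softmax rows directly in a padded tile and slices the real rows back out after the SV matmul. Only the name Q_HEAD_PAD comes from the PR; its value here, the Q_HEAD_BATCH constant, and both helper functions are hypothetical and do not reflect the actual implementation.

```python
import numpy as np

Q_HEAD_BATCH = 8    # hypothetical: number of real query-head rows per tile
Q_HEAD_PAD = 16     # hypothetical value: padded row count used for tile alignment

def padded_softmax_rows(scores):
    """Compute softmax for [Q_HEAD_BATCH, seq] scores directly in a [Q_HEAD_PAD, seq] tile."""
    padded = np.zeros((Q_HEAD_PAD, scores.shape[1]), dtype=scores.dtype)
    padded[:Q_HEAD_BATCH] = scores                      # pad rows stay zero
    exp = np.exp(padded - padded.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)         # pad rows become uniform and are discarded later

def sv_from_padded(probs_padded, v):
    """Run the SV matmul on the padded tile, then keep only the real rows."""
    out_padded = probs_padded @ v                       # [Q_HEAD_PAD, head_dim]
    return out_padded[:Q_HEAD_BATCH]
```

Because softmax is computed row by row, the pad rows never affect the real rows, so slicing at the end recovers the unpadded result.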

Optimize Qwen3 decode scope2 SV path

27cbda3

high-cloud changed the title ~~[codex] Optimize Qwen3 decode scope2 SV path~~ Optimize Qwen3 decode scope2 SV path Apr 15, 2026

gemini-code-assist bot reviewed Apr 15, 2026

high-cloud wants to merge 1 commit into hw-native-sys:main from high-cloud:codex/qwen3-decode-fused-sv
