Optimize Qwen3 decode scope2 SV path #114
high-cloud wants to merge 1 commit into hw-native-sys:main from
Code Review
This pull request refactors the Qwen3 decode implementation by fusing the softmax and SV matmul stages. It removes the intermediate all_exp_padded tensor and updates tensor shapes and slicing logic to use Q_HEAD_PAD for improved alignment. The online softmax accumulation is also adjusted to handle the updated padding strides. I have no feedback to provide as the review comments were purely explanatory or validated the existing changes.
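For readers unfamiliar with the technique named above, the following is a minimal pure-Python sketch of online softmax fused with the SV accumulation, checked against a two-pass reference that materializes the full exponent buffer (the role `all_exp_padded` appears to have played). It is not the PR's kernel: the function names, the `block` parameter, and the single-query-row layout are illustrative assumptions, and `Q_HEAD_PAD` padding and strides are not modeled.

```python
import math


def two_pass_softmax_sv(scores, values):
    """Reference: materialize all exponentials first, then multiply by V."""
    row_max = max(scores)
    exps = [math.exp(s - row_max) for s in scores]  # intermediate buffer
    denom = sum(exps)
    head_dim = len(values[0])
    out = [0.0] * head_dim
    for p, v in zip(exps, values):
        for d in range(head_dim):
            out[d] += (p / denom) * v[d]
    return out


def fused_online_softmax_sv(scores, values, block=2):
    """Fused variant: one pass over KV blocks, no full exponent buffer.

    `block` is a hypothetical tile size chosen for illustration only.
    """
    head_dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * head_dim  # unnormalized sum of exp(score) * V

    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        v_blk = values[start:start + block]

        new_max = max(running_max, max(s_blk))
        # Rescale earlier partial results to the new maximum.
        # On the first block this is exp(-inf) == 0.0, and acc is all zeros.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]

        for s, v in zip(s_blk, v_blk):
            p = math.exp(s - new_max)
            running_sum += p
            for d in range(head_dim):
                acc[d] += p * v[d]

        running_max = new_max

    # Normalize once at the end.
    return [a / running_sum for a in acc]


scores = [0.5, -1.0, 2.0, 0.1, 3.0]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5], [0.5, -0.5]]
fused = fused_online_softmax_sv(scores, values, block=2)
reference = two_pass_softmax_sv(scores, values)
```

Both functions should agree up to floating-point rounding; the fused version keeps only a running maximum, a running sum, and one accumulator per output element, which is what makes the intermediate tensor unnecessary.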
Summary
Validation