Update: migrate scope3 sort/gather to tensor-level API and reduce MAX_SEQ by zhangqi-chen · Pull Request #140 · hw-native-sys/pypto-lib

zhangqi-chen · 2026-04-21T03:22:35Z

Summary

- Replace pl.tile.sort32/mrgsort/gather + explicit pl.load/pl.store with pl.tensor.sort32/mrgsort/gather + pl.slice/pl.assemble, adapting to the new tensor-level ops merged in pypto (#1097).
- Reduce MAX_SEQ from 8192 to 4096, and introduce SORT_LEN=8192 to keep the sort buffer at full width: the scores tensor is [BATCH, SORT_LEN], and Stage 0 fills the entire row with -inf, so the [MAX_SEQ, SORT_LEN) tail is always -inf and the sort kernel needs no extra fillpad.
- Change the idx_init signature to pl.UINT32 (required by tensor.sort32); the TensorSpec keeps torch.int32 (same bit layout, and it matches the simpler runtime).
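The -inf padding argument can be checked with a small NumPy sketch. This is not the PR's golden reference or pypto code; BATCH, INDEX_TOPK, the ctx_len values, and the helper names are illustrative assumptions, while MAX_SEQ and SORT_LEN use the PR's values. It builds a Stage-0-style buffer that starts at -inf, writes valid scores only into [:ctx_len], and confirms that a descending top-k over the full padded row selects the same indices as a top-k over the valid prefix alone.

```python
import numpy as np

MAX_SEQ = 4096      # from the PR
SORT_LEN = 8192     # from the PR; must be > MAX_SEQ
BATCH = 2           # illustrative
INDEX_TOPK = 64     # illustrative; not the model's configured value


def padded_scores(valid_rows, sort_len):
    """Stage-0-style buffer: every slot starts at -inf, valid scores fill the prefix."""
    scores = np.full((len(valid_rows), sort_len), -np.inf, dtype=np.float32)
    for b, row in enumerate(valid_rows):
        scores[b, : row.size] = row
    return scores


def topk_indices(row, k):
    """Indices of the k largest values, largest first (stable on ties)."""
    return np.argsort(-row, kind="stable")[:k]


rng = np.random.default_rng(0)
# Each ctx_len must satisfy INDEX_TOPK <= ctx_len <= MAX_SEQ; otherwise the
# top-k would have to reach into the -inf tail.
ctx_lens = [100, MAX_SEQ]
assert len(ctx_lens) == BATCH
valid = [rng.standard_normal(n).astype(np.float32) for n in ctx_lens]
scores = padded_scores(valid, SORT_LEN)

for b, row in enumerate(valid):
    full = topk_indices(scores[b], INDEX_TOPK)
    prefix = topk_indices(row, INDEX_TOPK)
    assert np.array_equal(full, prefix)   # padding does not change the result
    assert full.max() < ctx_lens[b]       # no index from the -inf tail is chosen
```

Because -inf sorts after every finite score, the tail only matters when ctx_len < INDEX_TOPK, which is why the comment states that bound.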

Related Issues

N/A


gemini-code-assist · 2026-04-21T03:22:39Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

coderabbitai · 2026-04-21T03:22:49Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6d33f4bd-42c1-4975-ad52-21908bc4f09e

📥 Commits

Reviewing files that changed from the base of the PR and between 35486ff and da642d0.

📒 Files selected for processing (1)

examples/models/deepseek_v3_2/deepseek_v3_2_decode_front_scope3.py

📝 Walkthrough


The deepseek_v3_2 decode front scope3 kernel is restructured to use two buffer size constants: MAX_SEQ (reduced from 8192 to 4096) for k-cache indexing and a new SORT_LEN=8192 for sorting operations. Tensor shapes for scoring, sorting, and index initialization are updated accordingly, with Stages 0, 3, and 4 modified to handle the expanded sort buffer.
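Because SORT_LEN equals the old MAX_SEQ, the sort-side buffers keep exactly their previous shapes; only uses tied to MAX_SEQ, such as k-cache indexing, shrink. The plain-Python check below compares the buffer shapes listed in the Changes table before and after this PR. BATCH is an illustrative value and `sort_buffer_shapes` is a hypothetical helper, not part of the kernel.

```python
OLD_MAX_SEQ = 8192  # MAX_SEQ before this PR
MAX_SEQ = 4096      # MAX_SEQ after this PR (k-cache indexing)
SORT_LEN = 8192     # new sort-buffer width
BATCH = 4           # illustrative


def sort_buffer_shapes(batch, width):
    """Shapes of the sort-side tensors as listed in the Changes table (hypothetical helper)."""
    return {
        "scores": (batch, width),
        "sorted_gm": (batch, 2 * width),
        "idx_init": (1, width),
    }


before = sort_buffer_shapes(BATCH, OLD_MAX_SEQ)  # sized by the old MAX_SEQ
after = sort_buffer_shapes(BATCH, SORT_LEN)      # sized by SORT_LEN

assert SORT_LEN > MAX_SEQ   # room for the -inf tail [MAX_SEQ, SORT_LEN)
assert before == after      # sort buffers keep their previous widths
print(after["sorted_gm"])   # (4, 16384)
```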

Changes

Cohort / File(s)	Summary
Buffer sizing and constants `examples/models/deepseek_v3_2/deepseek_v3_2_decode_front_scope3.py`	`MAX_SEQ` reduced from 8192 to 4096; new `SORT_LEN=8192` constant added with a comment stating the invariant `SORT_LEN > MAX_SEQ`. Updated `build_tensor_specs()` to reflect the `idx_init` shape change from `[1, MAX_SEQ]` to `[1, SORT_LEN]`.
Kernel stages 0, 3, 4 `examples/models/deepseek_v3_2/deepseek_v3_2_decode_front_scope3.py`	Stage 0 pre-fills entire `scores[b, 0:SORT_LEN]` with `-inf`. Stage 3 sorting operates on `SORT_LEN`-dimensioned tensors via tensor-level operations. Stage 4 top-k extraction slices `sorted_gm` with width `2 * INDEX_TOPK` and uses `assemble` for output writing.
Tensor transient storage `examples/models/deepseek_v3_2/deepseek_v3_2_decode_front_scope3.py`	`scores` resized from `[BATCH, MAX_SEQ]` to `[BATCH, SORT_LEN]`; `sorted_gm` resized from `[BATCH, 2 * MAX_SEQ]` to `[BATCH, 2 * SORT_LEN]`.
Golden reference and test initialization `examples/models/deepseek_v3_2/deepseek_v3_2_decode_front_scope3.py`	Golden `scores` tensor updated to `(BATCH, SORT_LEN)`. `init_idx_init()` and its `TensorSpec` changed from arange over `MAX_SEQ` to `SORT_LEN`. Kernel/PyTorch top-k now operates over extended `SORT_LEN` rows with valid data only for `[:ctx_len]`.
Public API signature `examples/models/deepseek_v3_2/deepseek_v3_2_decode_front_scope3.py`	Function parameter `idx_init: pl.Tensor[[1, MAX_SEQ], pl.UINT32]` updated to `idx_init: pl.Tensor[[1, SORT_LEN], pl.UINT32]`.
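The `sorted_gm` width of `2 * SORT_LEN` implies two 32-bit slots per sorted element. A common layout for hardware sort output, assumed here rather than taken from the PR, interleaves each score with its index. Under that assumption, Stage 4's slice of width `2 * INDEX_TOPK` covers exactly the top `INDEX_TOPK` pairs, and the index slots can be read back as integers because uint32 and int32 share a bit layout for values below 2**31. The NumPy sketch below (not pypto code; helper names are hypothetical and the sizes are shrunk for readability) builds such a buffer from an `idx_init`-style arange and recovers the top-k indices.

```python
import numpy as np

SORT_LEN = 16    # shrunk for readability; the PR uses 8192
INDEX_TOPK = 4   # illustrative


def sort_with_indices(scores, idx_init):
    """Float32 buffer of interleaved (score, index-bits) pairs, descending by score."""
    order = np.argsort(-scores, kind="stable")
    pairs = np.empty(2 * scores.size, dtype=np.float32)
    pairs[0::2] = scores[order]
    # Store the uint32 index bits in the float32 slots without numeric conversion.
    pairs[1::2] = idx_init[order].view(np.float32)
    return pairs


def topk_from_pairs(pairs, k):
    """Stage-4-style extraction: slice 2*k slots and keep the index half."""
    head = pairs[: 2 * k]
    return head[1::2].view(np.uint32)


rng = np.random.default_rng(1)
scores = rng.standard_normal(SORT_LEN).astype(np.float32)
idx_init = np.arange(SORT_LEN, dtype=np.uint32)  # arange over SORT_LEN, as in init_idx_init()

sorted_gm = sort_with_indices(scores, idx_init)
top = topk_from_pairs(sorted_gm, INDEX_TOPK)

assert np.array_equal(top, np.argsort(-scores, kind="stable")[:INDEX_TOPK])
# A torch.int32-style host buffer holds the same bits for these indices.
assert np.array_equal(top.view(np.int32), top.astype(np.int32))
```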

Possibly related PRs

Update: replace decode-front with ds32exp and add scope3 #137: Modifies the same decode_front_scope3 kernel and tensor shape specifications (idx_init, scores, sorted_gm buffers) and top-k/sort buffer handling.

Poem

🐰 Buffers resized with a hop and a bound,
SORT_LEN now holds the sorting ground,
While MAX_SEQ shrinks down with grace,
Infinity fills the padding space,
Decode stages dance their choreographed way!

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately captures the two main changes: migrating to tensor-level API and reducing MAX_SEQ.
Description check	✅ Passed	The description clearly explains the migration to tensor-level operations and the MAX_SEQ reduction with SORT_LEN introduction.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


zhangqi-chen merged commit 9db4508 into hw-native-sys:main Apr 21, 2026
6 checks passed

zhangqi-chen deleted the feat/ds32-decode-front-scope3 branch April 21, 2026 03:31

zhangqi-chen merged 1 commit into hw-native-sys:main from zhangqi-chen:feat/ds32-decode-front-scope3