[follow up 1299]: fix gpu build && optimize witness build and avoid GPU OOM #1316
Merged
kunxian-xia merged 3 commits into master on Apr 21, 2026
4853368 to cfdab05
fc65623 to a72e4ac
kunxian-xia approved these changes on Apr 21, 2026
Problem
PR #1299 changed GKR output/eval wiring, but the GPU proving flow still treated build_main_witness as if only the old read/write/lookup outputs existed. That caused two issues:
Design Rationale
Keep the proof shape and verifier unchanged, and fix this entirely in prover-side staging:
This reduces VRAM pressure during tower proving without changing proof semantics.
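To make the staging idea concrete, here is a minimal sketch of mask-based routing, written as a standalone illustration rather than the code in this PR. Every name in it (StageMask, OutputGroup, stage_outputs, and the Extra group) is hypothetical and not taken from the Ceno codebase; the real GkrOutputStageMask may be shaped quite differently. The point it shows is only that a stage allocates buffers for the output groups its mask selects, so groups it does not need never occupy memory at the same time.

```rust
// Hypothetical sketch: these names are illustrative and are NOT the Ceno API.

/// First-layer output groups. `Extra` stands in for any group beyond the
/// original read/write/lookup set; the real set of groups is an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OutputGroup {
    Read = 0,
    Write = 1,
    Lookup = 2,
    Extra = 3,
}

/// Bitmask selecting which output groups a proving stage should materialize.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct StageMask(u8);

impl StageMask {
    const EMPTY: StageMask = StageMask(0);

    /// Returns a copy of the mask with `group` selected.
    fn with(self, group: OutputGroup) -> StageMask {
        StageMask(self.0 | (1u8 << group as u8))
    }

    /// Reports whether `group` is selected.
    fn contains(self, group: OutputGroup) -> bool {
        self.0 & (1u8 << group as u8) != 0
    }
}

/// Allocates a buffer of `len` elements for each selected group only.
/// Unselected groups are skipped, so their buffers are never allocated.
fn stage_outputs(mask: StageMask, len: usize) -> Vec<(OutputGroup, Vec<u64>)> {
    const ALL: [OutputGroup; 4] = [
        OutputGroup::Read,
        OutputGroup::Write,
        OutputGroup::Lookup,
        OutputGroup::Extra,
    ];
    ALL.iter()
        .copied()
        .filter(|group| mask.contains(*group))
        .map(|group| (group, vec![0u64; len]))
        .collect()
}

/// Bytes held by the staged buffers, as a rough stand-in for device memory.
fn staged_bytes(staged: &[(OutputGroup, Vec<u64>)]) -> usize {
    staged
        .iter()
        .map(|(_, buf)| buf.len() * std::mem::size_of::<u64>())
        .sum()
}
```

Under these assumptions, a stage built with StageMask::EMPTY.with(OutputGroup::Read).with(OutputGroup::Lookup) would hold two buffers instead of four, and staged_bytes would report half the footprint of staging every group at once.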
Change Highlights
ceno_zkvm
GkrOutputStageMask routing for first-layer output groups
prove_tower_relation
gkr_witness API
gkr_iop
Benchmark / Performance Impact
Primary intent is memory reduction, not throughput optimization.
Operation
Benchmark command(s):
Ceno reth
https://github.com/scroll-tech/ceno-reth-benchmark/actions/runs/24667766891
Testing
Risks and Rollout
Follow-ups (optional)
Copilot Reviewer Directive (keep this section)
When Copilot reviews this PR, apply .github/copilot-instructions.md strictly.