[follow up 1299]: fix gpu build && optimize witness build and avoid GPU OOM by hero78119 · Pull Request #1316 · scroll-tech/ceno

hero78119 · 2026-04-20T08:32:55Z

Problem

PR #1299 changed GKR output/eval wiring, but the GPU proving flow still treated build_main_witness as if only the old read/write/lookup outputs existed. That caused two issues:

- stale scheduler / memcheck estimates after #1299
- unnecessary GPU witness materialization before tower proving, which can push large Keccak payloads into OOM on 4090-class cards

Design Rationale

Keep the proof shape and verifier unchanged, and fix this entirely in prover-side staging:

- route first-layer GKR output groups with prover-only stage metadata
- materialize only tower-needed outputs before tower proving
- keep ECC / rotation self-contained in their existing submodules
- update GPU memory estimation to match the post-#1299 output topology

This reduces VRAM pressure during tower proving without changing proof semantics.
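A minimal sketch of how prover-only stage routing could work. It assumes `GkrOutputStageMask` behaves like a small bitmask with one bit per consuming stage (tower, ECC, rotation). The bit layout, the `OutputGroup` struct, and the `select_for_stage` helper are hypothetical illustrations, not the ceno API. The point is that the mask is routing metadata held only by the prover and is never serialized, so the proof shape is unaffected.

```rust
/// Hypothetical stand-in for the prover-only `GkrOutputStageMask`:
/// one bit per stage that consumes a first-layer GKR output group.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct GkrOutputStageMask(u8);

impl GkrOutputStageMask {
    const TOWER: Self = Self(1 << 0);
    const ECC: Self = Self(1 << 1);
    const ROTATION: Self = Self(1 << 2);

    /// Combine masks for a group consumed by more than one stage.
    const fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    /// True if every bit of `other` is set in `self`.
    const fn contains(self, other: Self) -> bool {
        (self.0 & other.0) == other.0
    }
}

/// Hypothetical first-layer output group tagged with routing metadata.
struct OutputGroup {
    name: &'static str,
    stages: GkrOutputStageMask,
}

/// Returns the indices of the groups that the given stage needs.
fn select_for_stage(groups: &[OutputGroup], stage: GkrOutputStageMask) -> Vec<usize> {
    groups
        .iter()
        .enumerate()
        .filter(|(_, g)| g.stages.contains(stage))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    type M = GkrOutputStageMask;
    let groups = [
        OutputGroup { name: "read", stages: M::TOWER },
        OutputGroup { name: "write", stages: M::TOWER },
        OutputGroup { name: "lookup", stages: M::TOWER },
        OutputGroup { name: "ecc", stages: M::ECC },
        OutputGroup { name: "rotation", stages: M::ROTATION },
        // Hypothetical group read by two stages, to show why a mask is used.
        OutputGroup { name: "shared", stages: M::TOWER.union(M::ROTATION) },
    ];
    let tower: Vec<&str> = select_for_stage(&groups, M::TOWER)
        .iter()
        .map(|&i| groups[i].name)
        .collect();
    println!("tower stage needs: {tower:?}");
}
```

A bitmask rather than a single enum tag lets one group be routed to several consumers without duplicating it.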

Change Highlights

ceno_zkvm
- add prover-only GkrOutputStageMask routing for first-layer output groups
- build only tower-facing witness outputs before prove_tower_relation
- keep ECC / rotation on their existing dedicated witness/eval paths
- update GPU memory estimation for the post-#1299 GKR outputs and tower-stage residency
- update local precompile/test callsites to the new gkr_witness API
gkr_iop
- extend witness generation with filtered materialization APIs for CPU/GPU backends
- keep filtering internal to prover execution; no verifier/proof-format changes
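The filtered materialization idea can be sketched as a generic helper over a backend trait. Everything here is hypothetical: `WitnessBackend`, `materialize_filtered`, and the toy `CountingCpuBackend` are illustrations, not the gkr_iop API, and a real GPU backend would return device buffers instead of `Vec<u64>`. Skipped outputs stay `None`, so indices still line up with the full output list and later stages can materialize the rest on demand.

```rust
/// Hypothetical backend abstraction: models only "materialize output `i`".
trait WitnessBackend {
    type Buffer;
    fn materialize(&mut self, output_idx: usize) -> Self::Buffer;
}

/// Materializes only the outputs accepted by `keep`; skipped slots stay
/// `None` so positions match the unfiltered output list.
fn materialize_filtered<B: WitnessBackend>(
    backend: &mut B,
    num_outputs: usize,
    keep: impl Fn(usize) -> bool,
) -> Vec<Option<B::Buffer>> {
    (0..num_outputs)
        .map(|i| if keep(i) { Some(backend.materialize(i)) } else { None })
        .collect()
}

/// Toy CPU backend that counts how many buffers it allocated.
struct CountingCpuBackend {
    len: usize,
    allocations: usize,
}

impl WitnessBackend for CountingCpuBackend {
    type Buffer = Vec<u64>;
    fn materialize(&mut self, output_idx: usize) -> Vec<u64> {
        self.allocations += 1;
        // Fill with the index so each buffer is distinguishable.
        vec![output_idx as u64; self.len]
    }
}

fn main() {
    let mut cpu = CountingCpuBackend { len: 4, allocations: 0 };
    // Hypothetical: only the first two outputs feed the tower prover.
    let tower_needed = [true, true, false, false];
    let outs = materialize_filtered(&mut cpu, tower_needed.len(), |i| tower_needed[i]);
    println!("allocated {} of {} outputs", cpu.allocations, outs.len());
}
```

Keeping the filter as a plain predicate means the same helper serves both backends, and the choice of what to keep stays inside prover execution.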

Benchmark / Performance Impact

Primary intent is memory reduction, not throughput optimization.
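To make the memory-reduction intent concrete, here is a back-of-the-envelope residency estimate. It assumes each output group is a set of dense multilinear polynomials with `2^num_vars` field elements each, and an 8-byte base-field element. `GroupShape`, `group_bytes`, and `resident_bytes` are hypothetical helpers, and the numbers in `main` are illustrative, not measured from ceno. Counting every group approximates materializing everything up front; counting only tower-needed groups approximates the filtered staging.

```rust
/// Hypothetical shape of one first-layer GKR output group.
struct GroupShape {
    num_polys: usize,
    num_vars: u32,
    /// Whether the tower prover reads this group.
    tower_needed: bool,
}

/// Bytes for one fully materialized group:
/// `num_polys * 2^num_vars` elements of `elem_bytes` each.
fn group_bytes(g: &GroupShape, elem_bytes: usize) -> usize {
    g.num_polys * (1usize << g.num_vars) * elem_bytes
}

/// Estimated residency before tower proving. `filtered = false` counts
/// every group; `filtered = true` counts only tower-needed groups.
fn resident_bytes(groups: &[GroupShape], elem_bytes: usize, filtered: bool) -> usize {
    groups
        .iter()
        .filter(|g| !filtered || g.tower_needed)
        .map(|g| group_bytes(g, elem_bytes))
        .sum()
}

fn main() {
    // Illustrative shapes only.
    let groups = [
        GroupShape { num_polys: 4, num_vars: 20, tower_needed: true },
        GroupShape { num_polys: 8, num_vars: 20, tower_needed: false },
    ];
    let elem_bytes = 8; // assumed 64-bit field element
    let all = resident_bytes(&groups, elem_bytes, false);
    let tower_only = resident_bytes(&groups, elem_bytes, true);
    println!("all = {all} B, tower-only = {tower_only} B");
}
```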

Operation

Benchmark command(s):

CENO_GPU_ENABLE_WITGEN=1 cargo run --config net.git-fetch-with-cli=true --features gpu --release --package ceno_zkvm --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall

Ceno reth
https://github.com/scroll-tech/ceno-reth-benchmark/actions/runs/24667766891

Testing

cargo make clippy
cargo check -p ceno_zkvm --features gpu
CENO_GPU_ENABLE_WITGEN=1 cargo run --config net.git-fetch-with-cli=true --features gpu --release --package ceno_zkvm --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall

Risks and Rollout

Risk is limited to prover-side witness staging and GPU memory estimation.
Verifier behavior and proof format are intentionally unchanged.
If this regresses proving, roll back by reverting the new output-stage routing and the filtered witness materialization together.

Follow-ups (optional)

Add a targeted large-payload regression check for post-#1299 GPU WITGEN memory peaks.
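One possible shape for such a check, as a hedged sketch: a peak tracker that the prover would notify on each device allocation and free, plus an assertion that the observed peak stays under a budget. `PeakTracker` and `check_peak` are hypothetical; wiring them to real GPU allocations is not shown and would depend on the backend.

```rust
/// Hypothetical peak tracker: records current and peak bytes in use.
#[derive(Default)]
struct PeakTracker {
    current: u64,
    peak: u64,
}

impl PeakTracker {
    fn alloc(&mut self, bytes: u64) {
        self.current += bytes;
        self.peak = self.peak.max(self.current);
    }

    fn free(&mut self, bytes: u64) {
        // saturating_sub keeps the toy counter from underflowing
        // if the bookkeeping is ever unbalanced.
        self.current = self.current.saturating_sub(bytes);
    }
}

/// Returns an error describing the overrun if the peak exceeds the budget.
fn check_peak(t: &PeakTracker, budget_bytes: u64) -> Result<(), String> {
    if t.peak <= budget_bytes {
        Ok(())
    } else {
        Err(format!("peak {} B exceeds budget {} B", t.peak, budget_bytes))
    }
}

fn main() {
    const GIB: u64 = 1 << 30;
    let mut t = PeakTracker::default();
    t.alloc(3 * GIB); // e.g. tower-stage witness (illustrative size)
    t.free(3 * GIB);
    t.alloc(2 * GIB); // e.g. a later stage (illustrative size)
    let budget = 24 * GIB; // e.g. a 24 GiB consumer card
    println!("{:?}", check_peak(&t, budget));
}
```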

Copilot Reviewer Directive (keep this section)

When Copilot reviews this PR, apply .github/copilot-instructions.md strictly.

hero78119 changed the title ~~follow up 1299: fix gpu build~~ [follow up 1299]: fix gpu build Apr 20, 2026

follow up 1299: fix gpu build

cfdab05

hero78119 force-pushed the bug/fix_gpu_build branch from 4853368 to cfdab05 Compare April 20, 2026 08:35

add gkr output class

a72e4ac

hero78119 changed the title ~~[follow up 1299]: fix gpu build~~ [follow up 1299]: fix gpu build && optimize witness build and avoid GPU OOM Apr 20, 2026

hero78119 added the regression-e2e-reth trigger regression test of https://github.com/scroll-tech/ceno-reth-benchmark label Apr 20, 2026

hero78119 force-pushed the bug/fix_gpu_build branch from fc65623 to a72e4ac Compare April 20, 2026 13:52

Merge branch 'master' into bug/fix_gpu_build

c8760fd

kunxian-xia approved these changes Apr 21, 2026


kunxian-xia added this pull request to the merge queue Apr 21, 2026

Merged via the queue into master with commit 9cf83a0 Apr 21, 2026
4 checks passed

kunxian-xia deleted the bug/fix_gpu_build branch April 21, 2026 01:11

kunxian-xia merged 3 commits into master from bug/fix_gpu_build