[follow up 1299]: fix gpu build && optimize witness build and avoid GPU OOM #1316

Merged
kunxian-xia merged 3 commits into master from bug/fix_gpu_build
Apr 21, 2026

@hero78119 hero78119 commented Apr 20, 2026

Problem

PR #1299 changed GKR output/eval wiring, but the GPU proving flow still treated build_main_witness as if only the old read/write/lookup outputs existed. That caused two issues:

Design Rationale

Keep the proof shape and verifier unchanged, and fix this entirely in prover-side staging:

This reduces VRAM pressure during tower proving without changing proof semantics.
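
To make the VRAM argument concrete, here is a back-of-the-envelope sketch of why staging lowers peak residency. Every name and number in it is hypothetical: the polynomial counts, the variable count, and the 16-byte element size are assumptions for illustration, not measurements from this PR, and the staged model assumes buffers are freed between stages.

```rust
// Illustrative arithmetic only: all sizes and counts are assumptions,
// not values taken from ceno_zkvm or from this PR.

/// Bytes needed for `num_polys` multilinear polynomials that each hold
/// 2^num_vars evaluations of `elem_bytes` bytes.
fn poly_bytes(num_polys: usize, num_vars: u32, elem_bytes: usize) -> usize {
    num_polys * (1usize << num_vars) * elem_bytes
}

/// Peak residency when every output group is resident at the same time.
fn peak_unstaged(group_bytes: &[usize]) -> usize {
    group_bytes.iter().sum()
}

/// Peak residency when only one stage is resident at a time
/// (assumes each stage's buffers are released before the next stage starts).
fn peak_staged(stage_bytes: &[usize]) -> usize {
    stage_bytes.iter().copied().max().unwrap_or(0)
}
```

With 3 tower-facing polynomials and 2 other polynomials at 20 variables and 16 bytes per element, the groups take 48 MiB and 32 MiB, so the unstaged peak is 80 MiB while the staged peak is 48 MiB.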

Change Highlights

  • ceno_zkvm
    • add prover-only GkrOutputStageMask routing for first-layer output groups
    • build only tower-facing witness outputs before prove_tower_relation
    • keep ECC / rotation on their existing dedicated witness/eval paths
    • update GPU memory estimation for post-#1299 GKR outputs and tower-stage residency
    • update local precompile/test callsites to the new gkr_witness API
  • gkr_iop
    • extend witness generation with filtered materialization APIs for CPU/GPU backends
    • keep filtering internal to prover execution; no verifier/proof-format changes
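
For reviewers unfamiliar with the idea, the mask-based routing can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `OutputGroup`, `StageMask`, and `materialize_filtered` are hypothetical names, `u64` stands in for field elements, and the assumption that the tower stage consumes exactly the read/write/lookup groups is inferred from the description above.

```rust
// Hypothetical sketch: names and layout are illustrative, not the
// ceno_zkvm / gkr_iop API.

/// Output groups a first GKR layer might produce.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OutputGroup {
    Read,
    Write,
    Lookup,
    Rotation,
    Ecc,
}

impl OutputGroup {
    /// One bit per group, derived from the enum discriminant.
    const fn bit(self) -> u8 {
        1 << (self as u8)
    }
}

/// Bitmask selecting which output groups a prover stage materializes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct StageMask(u8);

impl StageMask {
    /// Groups assumed to feed the tower relation (read/write/lookup).
    const TOWER: StageMask = StageMask(
        OutputGroup::Read.bit() | OutputGroup::Write.bit() | OutputGroup::Lookup.bit(),
    );
    /// Every group, i.e. no filtering.
    const ALL: StageMask = StageMask(0b1_1111);

    fn contains(self, group: OutputGroup) -> bool {
        self.0 & group.bit() != 0
    }
}

/// One materialized output; `evals` stands in for a witness buffer.
struct Output {
    group: OutputGroup,
    evals: Vec<u64>,
}

/// Materialize only the groups selected by `mask`. `build` is invoked
/// lazily, so groups outside the mask never allocate a buffer.
fn materialize_filtered<F>(groups: &[OutputGroup], mask: StageMask, mut build: F) -> Vec<Output>
where
    F: FnMut(OutputGroup) -> Vec<u64>,
{
    groups
        .iter()
        .copied()
        .filter(|g| mask.contains(*g))
        .map(|g| Output { group: g, evals: build(g) })
        .collect()
}
```

Calling `materialize_filtered(&groups, StageMask::TOWER, build)` before the tower stage would skip the rotation and ECC groups entirely, which is the property that matters for memory: the filter runs before allocation rather than dropping buffers afterwards. Because the mask lives only on the prover side, the set of outputs the verifier sees is unaffected.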

Benchmark / Performance Impact

Primary intent is memory reduction, not throughput optimization.

Benchmark command(s):

CENO_GPU_ENABLE_WITGEN=1 cargo run --config net.git-fetch-with-cli=true --features gpu --release --package ceno_zkvm --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall

Ceno reth
https://github.com/scroll-tech/ceno-reth-benchmark/actions/runs/24667766891

Testing

cargo make clippy
cargo check -p ceno_zkvm --features gpu
CENO_GPU_ENABLE_WITGEN=1 cargo run --config net.git-fetch-with-cli=true --features gpu --release --package ceno_zkvm --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall

Risks and Rollout

  • Risk is limited to prover-side witness staging and GPU memory estimation.
  • Verifier behavior and proof format are intentionally unchanged.
  • If this regresses proving, rollback can revert the new output-stage routing and filtered witness materialization together.

Follow-ups (optional)

Copilot Reviewer Directive (keep this section)

When Copilot reviews this PR, apply .github/copilot-instructions.md strictly.

@hero78119 hero78119 changed the title follow up 1299: fix gpu build [follow up 1299]: fix gpu build Apr 20, 2026
@hero78119 hero78119 changed the title [follow up 1299]: fix gpu build [follow up 1299]: fix gpu build && optimize witness build and avoid GPU OOM Apr 20, 2026
@hero78119 hero78119 added the regression-e2e-reth trigger regression test of https://github.com/scroll-tech/ceno-reth-benchmark label Apr 20, 2026
@kunxian-xia kunxian-xia added this pull request to the merge queue Apr 21, 2026
Merged via the queue into master with commit 9cf83a0 Apr 21, 2026
4 checks passed
@kunxian-xia kunxian-xia deleted the bug/fix_gpu_build branch April 21, 2026 01:11