feat(l1): implement prestateTracer for debug_traceTransaction and debug_traceBlockByNumber #6501
avilagaston9 wants to merge 5 commits into feat/l1/ws-subscriptions from …
…ockByNumber.
This adds the prestateTracer (with optional diff mode) to the existing debug tracing infrastructure. The tracer captures pre-execution account state (balance, nonce, code, storage) and optionally computes a post-execution diff.
Key changes:
- PrestateTrace / PrePostState types in ethrex-common with JSON serialization
- PrestateTracerObserver in ethrex-vm that records accessed accounts and storage
- trace_transaction_prestate / trace_block_prestate in ethrex-blockchain
- PrestateTracer variant wired into the RPC debug_trace* handlers
…n, and serde round-trip.
Zero-pad storage values to 32 bytes (format!("0x{:064x}")) to match geth's
prestateTracer output format. Only compute post_map when diff_mode is true to
avoid unnecessary work in the common non-diff path. Add #[serde(default)] to
PrestateAccountState.storage so deserialization handles absent storage field,
fixing a round-trip inconsistency with skip_serializing_if.
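As a rough illustration of the zero-padding described in this commit message, the sketch below formats a storage word with the same `format!("0x{:064x}")` pattern. The function name `format_storage_word` is hypothetical, and `u128` is only a stand-in for the 256-bit word type the real code would use.

```rust
/// Formats a storage word as a 0x-prefixed string with exactly 64 hex digits
/// (32 bytes), left-padded with zeros.
/// Assumption: `u128` stands in for the 256-bit storage word of the real code.
fn format_storage_word(value: u128) -> String {
    format!("0x{:064x}", value)
}
```

Every output has the same length (66 characters including the `0x` prefix), so a value such as `0x2a` is rendered with 62 leading zero digits.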
… add PrestateResult enum.
When a transaction accesses a new storage slot on an account already cached from a previous transaction in the same block, build_account_state_map was using only pre_snapshot storage (which lacked slots first loaded during the current tx). Now merges original values from initial_accounts_state for newly-loaded slots so the pre-state output includes every accessed slot.
Replace the (PrestateTrace, Option<PrePostState>) return type with a PrestateResult enum (Prestate | Diff) across the tracing stack for a self-documenting API that avoids an unused empty map allocation.
Add regression tests for both non-diff and diff modes that verify newly-accessed storage slots appear with their original pre-tx values.
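The enum-based return type can be sketched roughly as follows. The names `PrestateTrace`, `PrePostState`, and `PrestateResult` come from the commit message, but the field layout, the simplified `AccountState`, and the `build_result` helper are assumptions made for illustration only.

```rust
use std::collections::HashMap;

// Stand-in for the real 20-byte address type.
type Address = [u8; 20];

// Simplified account state; the real type also carries code and storage.
#[derive(Debug, Clone, Default, PartialEq)]
struct AccountState {
    balance: u128,
    nonce: u64,
}

type PrestateTrace = HashMap<Address, AccountState>;

// Assumed layout: one map per side of the diff.
#[derive(Debug, Default, PartialEq)]
struct PrePostState {
    pre: PrestateTrace,
    post: PrestateTrace,
}

// One variant per tracer mode, instead of a tuple with an Option.
#[derive(Debug, PartialEq)]
enum PrestateResult {
    Prestate(PrestateTrace),
    Diff(PrePostState),
}

/// Hypothetical helper: the post-state closure runs only in diff mode,
/// so the non-diff path never builds a second map.
fn build_result(
    diff_mode: bool,
    pre: PrestateTrace,
    compute_post: impl FnOnce() -> PrestateTrace,
) -> PrestateResult {
    if diff_mode {
        PrestateResult::Diff(PrePostState { pre, post: compute_post() })
    } else {
        PrestateResult::Prestate(pre)
    }
}
```

Callers then match on the variant rather than checking whether an `Option` is populated, which keeps the two modes from being mixed up at the type level.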
…e tests to test crate.
Replace String fields in PrestateAccountState with U256 (balance), Bytes (code), HashMap<H256, H256> (storage), and Address map keys, matching how the rest of the codebase represents these values. Serde handles hex formatting at the serialization boundary instead of manual format!() calls in business logic.
Split build_account_state_map (which used a use_pre: bool flag) into find_touched_accounts, build_pre_state_map, and build_post_state_map so each function has a single responsibility. De-duplicate the pre_map computation that was repeated in both branches of trace_tx_prestate.
Extract the duplicated TestDatabase into a shared test/tests/levm/test_db.rs module, update l2_hook_tests to import it, and move the prestate tracer regression tests to test/tests/levm/prestate_tracer_tests.rs. Remove serde round-trip tests from crates/common/tracing.rs. Drop hex and once_cell dependencies from ethrex-vm.
…once=0 in serialization, and gate dead code behind l2 feature.
For already-cached accounts, build_pre_state_map and build_post_state_map now filter storage to only slots actually touched by the current transaction (newly loaded or value changed), instead of including all cached slots from previous transactions.
Added skip_serializing_if for nonce=0 to match Geth's output format. Gated read_env_file_by_config behind cfg(feature = "l2") to fix a dead-code warning.
Added tests for the storage filtering fix and for CREATE accounts appearing correctly in diff mode.
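The pre-state storage rule described in these commit messages (report a slot only if it was newly loaded or its value changed, and take newly-loaded slots from the initial state) might look like the sketch below. The function `pre_state_storage` and the `u64` stand-in types are hypothetical; the three maps play the roles of the pre-tx snapshot, `initial_accounts_state`, and the post-execution cache for a single account.

```rust
use std::collections::HashMap;

// Stand-ins for the real 32-byte slot key and value types.
type Slot = u64;
type Word = u64;

/// Hypothetical sketch of the per-account pre-state storage selection.
fn pre_state_storage(
    pre_cached: &HashMap<Slot, Word>,
    initial: &HashMap<Slot, Word>,
    post: &HashMap<Slot, Word>,
) -> HashMap<Slot, Word> {
    let mut out = HashMap::new();
    for (slot, post_value) in post {
        match pre_cached.get(slot) {
            // Cached before this tx and changed by it: report the old value.
            Some(pre_value) if pre_value != post_value => {
                out.insert(*slot, *pre_value);
            }
            // Cached before this tx and unchanged: left over from an earlier tx.
            Some(_) => {}
            // First loaded during this tx: the original value lives in `initial`.
            None => {
                if let Some(original) = initial.get(slot) {
                    out.insert(*slot, *original);
                }
            }
        }
    }
    out
}
```

With a snapshot of `{1: 10, 2: 20}`, an initial state that also knows slot 3 as `30`, and a post-state of `{1: 10, 2: 21, 3: 31}`, only slots 2 and 3 are reported, each with its pre-tx value.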
🤖 Codex Code Review
I couldn’t run the targeted tests in this environment: the pinned toolchain wants to write under …
Automated review by OpenAI Codex · gpt-5.4 · custom prompt
🤖 Claude Code Review
Now I have a complete picture of the PR. Let me write up the review. PR #6501 — …
Greptile Summary
This PR implements the … There are three Geth-compatibility gaps worth tracking for future work: read-only accessed accounts are excluded from the non-diff prestate, self-destructed accounts are invisible to …
Confidence Score: 5/5
Safe to merge: no P0/P1 issues; all findings are Geth-compatibility gaps that don't affect the primary diff-mode use case.
All three findings are P2: divergences from Geth's exact prestateTracer semantics (read-only account exclusion, self-destruct invisibility, SLOAD-only false-positive in post-state). The stated primary consumer (Credible Layer in diff mode) is covered by the existing tests. The orchestration and type infrastructure are correct and mirror established patterns in the codebase.
crates/vm/backends/levm/tracing.rs: `find_touched_accounts` and `build_post_state_map` have the Geth-compatibility gaps noted above.
| Filename | Overview |
|---|---|
| crates/vm/backends/levm/tracing.rs | Core prestate tracing logic; correctly handles the newly-loaded-slot merge case (the main regression target), but diverges from Geth: read-only accounts are excluded, self-destructed accounts may be missed, and diff-mode post-state can contain false-positive storage entries for SLOAD-only accounts. |
| crates/common/tracing.rs | Adds PrestateAccountState, PrestateTrace, PrestateResult, and PrePostState types with appropriate serde attributes; nonce/code/storage omit-when-empty logic is correct; balance always serializes (minor Geth format divergence). |
| crates/blockchain/tracing.rs | Adds trace_transaction_prestate and trace_block_prestate; sequencing (rerun_block then trace_tx_prestate) is correct; Arc/Mutex pattern mirrors existing trace_block_calls correctly. |
| crates/networking/rpc/tracing.rs | Wires PrestateTracer variant into debug_traceTransaction and debug_traceBlockByNumber; serde(rename_all = "camelCase") on TracerType correctly maps PrestateTracer to the "prestateTracer" JSON string. |
| crates/vm/tracing.rs | Thin wrapper that delegates trace_tx_prestate to LEVM; straightforward and mirrors the existing trace_tx_calls pattern. |
| test/tests/levm/prestate_tracer_tests.rs | Four targeted tests covering the key regression (newly-accessed storage in subsequent txs), exclusion of prior-tx slots, diff mode, and CREATE-child tracking; good coverage of the main cases. |
| test/tests/levm/test_db.rs | Shared TestDatabase extracted cleanly from l2_hook_tests; no issues. |
| test/tests/levm/l2_hook_tests.rs | Refactored to use shared TestDatabase; unused imports removed, logic unchanged. |
| test/tests/l2/utils.rs | read_env_file_by_config gated behind #[cfg(feature = "l2")] and File/BufRead imports moved inside; reasonable change to prevent unused-import warnings on non-l2 builds. |
| test/tests/levm/mod.rs | Adds test_db and prestate_tracer_tests modules; no issues. |
Sequence Diagram
sequenceDiagram
participant Client
participant RPC as RPC Handler
participant BC as Blockchain
participant EVM as Evm
participant LEVM as LEVM
Client->>RPC: debug_traceTransaction { tracer: prestateTracer, diffMode }
RPC->>BC: trace_transaction_prestate(tx_hash, reexec, timeout, diff_mode)
BC->>BC: rebuild_parent_state(parent_hash, reexec)
BC->>EVM: rerun_block(block, Some(tx_index))
note over EVM: executes txs 0..tx_index-1 to set up state
BC->>EVM: trace_tx_prestate(block, tx_index, diff_mode)
EVM->>LEVM: trace_tx_prestate(db, header, tx, diff_mode)
note over LEVM: snapshot pre_snapshot = db.current_accounts_state
LEVM->>LEVM: VM::new + execute() updates db in place
LEVM->>LEVM: find_touched_accounts(pre_snapshot, post_cache, db)
LEVM->>LEVM: build_pre_state_map(...)
alt diff_mode
LEVM->>LEVM: build_post_state_map(...)
LEVM-->>EVM: PrestateResult::Diff(PrePostState)
else non-diff
LEVM-->>EVM: PrestateResult::Prestate(PrestateTrace)
end
EVM-->>BC: PrestateResult
BC-->>RPC: PrestateResult
RPC-->>Client: serialized trace or diff
Reviews (1): Last reviewed commit: "Fix prestateTracer to exclude storage sl..."
crates/vm/backends/levm/tracing.rs, lines 139-163

    for (addr, post_account) in post_cache {
        let pre_account = match pre_snapshot.get(addr) {
            Some(pre) => {
                if pre.info == post_account.info && pre.storage == post_account.storage {
                    continue;
                }
                pre
            }
            None => {
                // Account was first loaded during this tx.
                // Pre-state comes from initial_accounts_state (the pristine DB-loaded value).
                let Some(initial) = db.initial_accounts_state.get(addr) else {
                    continue;
                };
                if initial.info == post_account.info && initial.storage == post_account.storage {
                    continue;
                }
                initial
            }
        };

        touched.push((*addr, pre_account, post_account));
    }

    touched
**Non-diff mode omits read-only accessed accounts**
`find_touched_accounts` skips any account where `pre == post` (lines 142–144), so accounts that were only read (not modified) during execution are silently dropped. Geth's `prestateTracer` in non-diff mode captures the pre-state of every account touched by execution, including call targets that returned read-only results (e.g. price oracles, ERC-20 `balanceOf`). Callers comparing non-diff output with Geth's will see incomplete results.
The same `continue` also excludes read-only accounts in diff mode. That matches Geth's behaviour for `post`, but it means their pre-state is absent too, which could matter for tools that need context about accessed-but-unmodified accounts.
For the stated primary use case (diff mode for Credible Layer) this doesn't block anything, but the divergence from Geth's behaviour is worth noting.
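One way to capture read-only accesses is to record them during execution instead of inferring them from a before/after comparison. The sketch below is a hypothetical recorder, not the PR's implementation: `AccessRecorder`, its methods, and the `u64` stand-in types are all assumptions, and it leaves open how such a recorder would be hooked into LEVM.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Stand-ins for the real 20-byte address and 32-byte slot key types.
type Address = u64;
type Slot = u64;

/// Hypothetical recorder: remembers every account and slot that execution
/// touched, whether or not the value changed afterwards.
#[derive(Debug, Default)]
struct AccessRecorder {
    accessed: BTreeMap<Address, BTreeSet<Slot>>,
}

impl AccessRecorder {
    /// Marks an account as accessed (e.g. a balance read or a call target).
    fn record_account(&mut self, addr: Address) {
        self.accessed.entry(addr).or_default();
    }

    /// Marks a storage slot as accessed; this also marks its account.
    fn record_slot(&mut self, addr: Address, slot: Slot) {
        self.accessed.entry(addr).or_default().insert(slot);
    }

    /// All accessed accounts, in ascending order.
    fn accounts(&self) -> Vec<Address> {
        self.accessed.keys().copied().collect()
    }

    /// Slots accessed on one account, in ascending order.
    fn slots(&self, addr: Address) -> Vec<Slot> {
        self.accessed
            .get(&addr)
            .map(|slots| slots.iter().copied().collect())
            .unwrap_or_default()
    }
}
```

A non-diff prestate could then be built from the recorded set, while diff mode would still filter down to accounts whose state actually changed.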
**Self-destructed accounts will be missing from prestate output**
`find_touched_accounts` only iterates over `post_cache` (`current_accounts_state` after execution). If LEVM removes or zeroes out a self-destructed account in `current_accounts_state`, that account won't be in the iteration set, and its pre-state (which existed before the tx) will silently be absent from both diff and non-diff output. Geth's `prestateTracer` includes self-destructed accounts in `pre` with their original state.
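If self-destructed accounts can indeed disappear from the post-execution cache (the comment above leaves this as an open question), one possible remedy is to iterate over the union of addresses from both caches. The helper `candidate_addresses` and the `u64` address stand-in are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

// Stand-in for the real 20-byte address type.
type Address = u64;

/// Hypothetical helper: every address present before or after execution.
/// An address found only in `pre_snapshot` would then be treated as an
/// account that the transaction removed.
fn candidate_addresses<A>(
    pre_snapshot: &HashMap<Address, A>,
    post_cache: &HashMap<Address, A>,
) -> BTreeSet<Address> {
    pre_snapshot
        .keys()
        .chain(post_cache.keys())
        .copied()
        .collect()
}
```

The existing unchanged-account check would still apply to addresses present on both sides, so the extra iteration only adds accounts that vanished during the transaction.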
crates/vm/backends/levm/tracing.rs, lines 237-261

    /// Build the post-tx state map for all accounts touched by a transaction.
    fn build_post_state_map(
        pre_snapshot: &CacheDB,
        post_cache: &CacheDB,
        db: &GeneralizedDatabase,
    ) -> PrestateTrace {
        let mut result = PrestateTrace::new();

        for (addr, _, post_account) in find_touched_accounts(pre_snapshot, post_cache, db) {
            let mut state = build_account_output(post_account, db);

            // For already-cached accounts, filter to only slots touched by this tx.
            if let Some(pre_cached) = pre_snapshot.get(&addr) {
                state.storage.retain(|k, _| {
                    if !pre_cached.storage.contains_key(k) {
                        return true;
                    }
                    pre_cached.storage.get(k) != post_account.storage.get(k)
                });
            }

            result.insert(addr, state);
        }

        result
**Diff-mode post-state may include unchanged storage for newly-accessed accounts**
For an account first accessed in this tx (absent from `pre_snapshot`), `build_post_state_map` skips the `pre_cached` retain block (line 249). That means every storage slot loaded during execution ends up in `post`, even slots that were only SLOADed (not SSTOREd). The resulting diff would show `pre: {}` (empty, from `initial_accounts_state` with empty storage) and `post: {slotK: original_value}`, suggesting the storage changed when it was merely read.
In Geth's diff mode, post-state only contains slots whose values actually changed. Consumers of the diff (e.g. the Credible Layer sidecar) would need to handle these false positives.
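A possible way to drop SLOAD-only slots is to compare each post-state slot against a baseline of original values, regardless of whether the account was already cached. The sketch below assumes such a baseline is available for every loaded slot (for example from `initial_accounts_state`, as the commit messages suggest for newly-loaded slots); the review comment does not confirm that this holds for newly-accessed accounts. `changed_slots` and the `u64` stand-in types are hypothetical.

```rust
use std::collections::HashMap;

// Stand-ins for the real 32-byte slot key and value types.
type Slot = u64;
type Word = u64;

/// Hypothetical filter: keeps only slots whose post-tx value differs from
/// the baseline (pre-tx) value. A slot that was read but never written has
/// equal values on both sides and is dropped.
fn changed_slots(
    baseline: &HashMap<Slot, Word>,
    post: &HashMap<Slot, Word>,
) -> HashMap<Slot, Word> {
    post.iter()
        .filter(|(slot, value)| baseline.get(*slot) != Some(*value))
        .map(|(slot, value)| (*slot, *value))
        .collect()
}
```

A slot missing from the baseline is kept, so the filter errs on the side of reporting a change when the original value is unknown.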
crates/common/tracing.rs, lines 74-86

    pub struct PrestateAccountState {
        pub balance: U256,
        #[serde(default, skip_serializing_if = "is_zero_nonce")]
        pub nonce: u64,
        #[serde(
            default,
            skip_serializing_if = "Bytes::is_empty",
            with = "crate::serde_utils::bytes"
        )]
        pub code: Bytes,
        #[serde(default, skip_serializing_if = "HashMap::is_empty")]
        pub storage: HashMap<H256, H256>,
    }
**`balance` always serialized even for zero-balance accounts**
Geth omits `balance` when it is zero (the field simply isn't emitted). The struct currently has no `skip_serializing_if` guard on `balance`, so zero-balance accounts will always include `"balance":"0x0"` in the output. Low impact, but diverges slightly from Geth's wire format. Consider:
    #[serde(skip_serializing_if = "U256::is_zero")]
    pub balance: U256,
Motivation
The `prestateTracer` is a standard Ethereum debug tracer that captures pre-execution account state (balance, nonce, code, storage) and optionally computes a post-execution diff. It's needed by external tools like the Credible Layer sidecar, which uses `debug_traceBlockByNumber` with `prestateTracer` in diff mode to build its local state database.
Description
Implement the `prestateTracer` (with optional `diffMode`) across the tracing stack:
- … `ethrex-common` with serde formatting to match Geth's output format.
- `trace_tx_prestate` in `ethrex-vm` snapshots the `GeneralizedDatabase` cache before executing a transaction, then compares against the post-execution cache to identify touched accounts. For already-cached accounts whose new storage slots were loaded during the tx, it merges values from `initial_accounts_state` to capture original slot values.
- `trace_transaction_prestate` / `trace_block_prestate` in `ethrex-blockchain` rebuild parent state and orchestrate per-tx tracing with timeouts.
- `PrestateTracer` variant wired into the RPC `debug_traceTransaction` and `debug_traceBlockByNumber` handlers with `PrestateTracerConfig { diffMode }`.
Also extracts the duplicated `TestDatabase` into a shared `test/tests/levm/test_db.rs` module reused by both `l2_hook_tests` and the new `prestate_tracer_tests`.
How to Test