fix(l1): adding block by block exec as a fallback#6464
Conversation
🤖 Kimi Code ReviewThe PR implements a fallback mechanism for batch block execution failures, which is a pragmatic workaround for state cache corruption bugs. Overall the code is correct and well-structured. Issues and Suggestions:
Positive aspects:
Verdict: Approve with minor suggestions. The workaround is appropriately targeted and the refactoring improves maintainability. Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
🤖 Claude Code ReviewNow I have everything needed for a thorough review. Review: fix(l1): adding block by block exec as a fallbackOverall: The fix is pragmatic and correctly addresses a real production issue. The logic is sound. A few concerns worth addressing, one of them is on every hot-path call. CorrectnessThe control flow is correct:
Performance concern (noteworthy)
match blockchain
.add_blocks_in_batch(blocks.clone(), cancel_token) // clone on every call
.awaitThis clones the entire batch on every successful execution — which is the common case. For large batches (each A less expensive alternative: have // Hypothetical: add_blocks_in_batch returns blocks on Err
Err((err, batch_failure)) => {
match err {
ChainError::InvalidBlock(_) => {
warn!(...);
run_blocks_pipeline(blockchain, batch_failure.blocks).await
}
...
}
}If changing that signature is out of scope, a comment at the Implicit assumption worth documentingThe fallback works correctly only if // Safe to retry: batch mode failure due to state cache corruption
// does not persist partial state to storage.
run_blocks_pipeline(blockchain, blocks).awaitWithout this, a future maintainer fixing an unrelated Silent retry on genuinely invalid blocksWhen a block is truly invalid (not a cache false-positive), the flow is:
This doubles execution cost for genuinely invalid blocks. That's acceptable, but the warning message says "retrying batch with per-block pipeline execution" without any hint that the block might actually be invalid. If the pipeline also fails, the caller's error handling (line 221–235) will correctly mark descendants as having an invalid ancestor, so no correctness issue — just slightly misleading logging in the double-failure case. Minor:
|
| Item | Severity | Action |
|---|---|---|
Unconditional blocks.clone() on hot path |
Medium | Optimize or document |
| Implicit assumption about clean storage on retry | Low | Add inline comment |
| Silent double-execution on genuinely invalid blocks | Low | Acceptable; minor log improvement optional |
cancel_token not propagated to pipeline |
Low | Pre-existing; track separately |
The fix is correct and the production evidence (Hoodi sync passing block 443,055) validates the approach. The clone cost is the most actionable item before merging.
Automated review by Claude (Anthropic) · sonnet · custom prompt
Lines of code reportTotal lines added: Detailed view |
🤖 Codex Code Review
No other correctness issues stood out in the diff itself. I could not run Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
🤖 Kimi Code ReviewOverall Assessment: The PR implements a reasonable workaround for batch execution state corruption by falling back to single-block pipeline execution on specific post-execution errors. The logic is sound but has minor efficiency and observability considerations. Issues and Suggestions: 1. Missing Import for 2. Efficiency: Re-execution from Batch Start (Lines 288-306) 3. DoS Vector: Double Validation Cost 4. Documentation for /// Returns true for errors that arise from EVM execution...
/// Note: These errors may also indicate genuinely invalid blocks;
/// retrying them trades CPU efficiency for correctness guarantees.
5. Cancel Token Propagation 6. Clone Cost Assumption (Line 285-286) 7. Error Pattern Exhaustiveness Nitpick:
Security Note: Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
🤖 Codex Code ReviewFindings:
I did not run Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
Greptile SummaryThis PR adds a pipeline-execution fallback in full sync: when
Confidence Score: 4/5Safe to merge; one minor gap in the fallback coverage but no regression risk The change is well-reasoned and confirmed to work in production (hoodi sync past block 443,055). The all-or-nothing semantics of crates/networking/p2p/sync/full.rs — specifically the
|
| Filename | Overview |
|---|---|
| crates/networking/p2p/sync/full.rs | Adds a pipeline-execution fallback when batch execution fails with a post-execution error; refactors sequential block execution into run_blocks_pipeline; is_post_execution_error misses InvalidTransaction which can also arise from EVM state corruption |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[add_blocks called] --> B{sync_head_found?}
B -- yes --> C[run_blocks_pipeline]
B -- no --> D[add_blocks_in_batch\nbatch execution]
D --> E{Result?}
E -- Ok --> F[✅ Done]
E -- "Err(InvalidBlock + is_post_execution_error)" --> G["⚠️ warn + log failed block"]
G --> C
E -- "Err(other)" --> H[❌ Propagate error]
C --> I{Pipeline result?}
I -- Ok --> F
I -- Err --> H
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/networking/p2p/sync/full.rs
Line: 327-337
Comment:
**`InvalidTransaction` missing from post-execution error filter**
`InvalidTransaction` can originate from EVM execution (via `EvmError::Transaction → ChainError::InvalidBlock(InvalidBlockError::InvalidTransaction(...))`). If batch-mode state-cache corruption causes an account's nonce or balance to appear wrong mid-batch, the block's transaction will fail with `InvalidTransaction`, but `is_post_execution_error` returns `false` for it — so the pipeline fallback never fires. The same root cause (cross-block shared state) that produces `StateRootMismatch` can also produce `InvalidTransaction`, yet only the former triggers the retry.
```suggestion
fn is_post_execution_error(err: &InvalidBlockError) -> bool {
matches!(
err,
InvalidBlockError::GasUsedMismatch(_, _)
| InvalidBlockError::StateRootMismatch
| InvalidBlockError::ReceiptsRootMismatch
| InvalidBlockError::RequestsHashMismatch
| InvalidBlockError::BlockAccessListHashMismatch
| InvalidBlockError::BlobGasUsedMismatch
| InvalidBlockError::InvalidTransaction(_)
)
}
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Merge branch 'main' into fullsync-fallba..." | Re-trigger Greptile
| fn is_post_execution_error(err: &InvalidBlockError) -> bool { | ||
| matches!( | ||
| err, | ||
| InvalidBlockError::GasUsedMismatch(_, _) | ||
| | InvalidBlockError::StateRootMismatch | ||
| | InvalidBlockError::ReceiptsRootMismatch | ||
| | InvalidBlockError::RequestsHashMismatch | ||
| | InvalidBlockError::BlockAccessListHashMismatch | ||
| | InvalidBlockError::BlobGasUsedMismatch | ||
| ) | ||
| } |
There was a problem hiding this comment.
InvalidTransaction missing from post-execution error filter
InvalidTransaction can originate from EVM execution (via EvmError::Transaction → ChainError::InvalidBlock(InvalidBlockError::InvalidTransaction(...))). If batch-mode state-cache corruption causes an account's nonce or balance to appear wrong mid-batch, the block's transaction will fail with InvalidTransaction, but is_post_execution_error returns false for it — so the pipeline fallback never fires. The same root cause (cross-block shared state) that produces StateRootMismatch can also produce InvalidTransaction, yet only the former triggers the retry.
| fn is_post_execution_error(err: &InvalidBlockError) -> bool { | |
| matches!( | |
| err, | |
| InvalidBlockError::GasUsedMismatch(_, _) | |
| | InvalidBlockError::StateRootMismatch | |
| | InvalidBlockError::ReceiptsRootMismatch | |
| | InvalidBlockError::RequestsHashMismatch | |
| | InvalidBlockError::BlockAccessListHashMismatch | |
| | InvalidBlockError::BlobGasUsedMismatch | |
| ) | |
| } | |
| fn is_post_execution_error(err: &InvalidBlockError) -> bool { | |
| matches!( | |
| err, | |
| InvalidBlockError::GasUsedMismatch(_, _) | |
| | InvalidBlockError::StateRootMismatch | |
| | InvalidBlockError::ReceiptsRootMismatch | |
| | InvalidBlockError::RequestsHashMismatch | |
| | InvalidBlockError::BlockAccessListHashMismatch | |
| | InvalidBlockError::BlobGasUsedMismatch | |
| | InvalidBlockError::InvalidTransaction(_) | |
| ) | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/networking/p2p/sync/full.rs
Line: 327-337
Comment:
**`InvalidTransaction` missing from post-execution error filter**
`InvalidTransaction` can originate from EVM execution (via `EvmError::Transaction → ChainError::InvalidBlock(InvalidBlockError::InvalidTransaction(...))`). If batch-mode state-cache corruption causes an account's nonce or balance to appear wrong mid-batch, the block's transaction will fail with `InvalidTransaction`, but `is_post_execution_error` returns `false` for it — so the pipeline fallback never fires. The same root cause (cross-block shared state) that produces `StateRootMismatch` can also produce `InvalidTransaction`, yet only the former triggers the retry.
```suggestion
fn is_post_execution_error(err: &InvalidBlockError) -> bool {
matches!(
err,
InvalidBlockError::GasUsedMismatch(_, _)
| InvalidBlockError::StateRootMismatch
| InvalidBlockError::ReceiptsRootMismatch
| InvalidBlockError::RequestsHashMismatch
| InvalidBlockError::BlockAccessListHashMismatch
| InvalidBlockError::BlobGasUsedMismatch
| InvalidBlockError::InvalidTransaction(_)
)
}
```
How can I resolve this? If you propose a fix, please make it concise.
🤖 Claude Code ReviewNow I have all the context I need for a thorough review. PR Review:
|
| blocks: Vec<Block>, | ||
| ) -> Result<(), (ChainError, Option<BatchBlockProcessingFailure>)> { | ||
| tokio::task::spawn_blocking(move || { | ||
| let mut last_valid_hash = H256::default(); |
There was a problem hiding this comment.
If the first block fails, the 'last good' value is zero which is unintuitive.
| async fn run_blocks_pipeline( | ||
| blockchain: Arc<Blockchain>, | ||
| blocks: Vec<Block>, | ||
| ) -> Result<(), (ChainError, Option<BatchBlockProcessingFailure>)> { |
There was a problem hiding this comment.
Should also use the cancel_token
| matches!( | ||
| err, | ||
| InvalidBlockError::GasUsedMismatch(_, _) | ||
| | InvalidBlockError::StateRootMismatch | ||
| | InvalidBlockError::ReceiptsRootMismatch | ||
| | InvalidBlockError::RequestsHashMismatch | ||
| | InvalidBlockError::BlockAccessListHashMismatch | ||
| | InvalidBlockError::BlobGasUsedMismatch | ||
| ) |
There was a problem hiding this comment.
If the worry is invalid intermediate state, InvalidTransaction (which includes 'insufficient funds') could also be caused by bad state.
We detected that we were consistently failing fullsync at hoodi block 443,055 due to batching reasons. This PR adds pipelined block by block execution as a fallback when a single batch fails. After this stability fix, hoodi full sync went past that block without issues. This should improve Full sync stability in general.