Conversation
🤖 Kimi Code Review

This PR adds comprehensive Prometheus metrics for full sync operations. The implementation is generally well-structured and follows Rust best practices. Here are my observations:

**Critical Issues**

None found. This is a metrics-only change that doesn't modify consensus-critical logic.

**Potential Issues**

1. **Integer Overflow Risk** (low severity, technical debt)

   File: The

   ```rust
   // Line 229
   self.target_block.set(block.cast_signed());
   ```

   Suggestion: Use a saturating conversion or explicit bounds checking:

   ```rust
   self.target_block.set(block.try_into().unwrap_or(i64::MAX));
   ```

2. **Missing Block Hash Debug Info**

   File: The new timing log replaces the previous log that included first/last block hashes. While cleaner, this removes valuable debugging context when execution fails.

   Suggestion: Keep block hashes in the log, or log them separately at debug level:

   ```rust
   debug!(
       "Batch range: {:#?} to {:#?}",
       blocks.first().map(|b| b.hash()),
       blocks.last().map(|b| b.hash())
   );
   ```

3. **Magic Numbers for Sync Stages**

   File: Stage values (0=idle, 1=headers, etc.) are documented but not enforced via types.

   Suggestion: Define an enum for type safety (conversion to i64 is trivial):

   ```rust
   #[repr(i64)]
   enum SyncStage { Idle = 0, DownloadingHeaders = 1, ... }
   ```

**Code Quality Suggestions**

4. **Macro Hygiene**

   File: The

5. **Registry Allocation**

   File:

   Suggestion: Consider registering metrics once at startup and storing the registry in the struct, or document that this is intentionally ephemeral to avoid global registry state.

6. **Timestamp Precision**

   File: Using

**Positive Observations**

**Summary**

The PR is ready to merge after addressing the overflow risk (Item 1). The metrics provide valuable observability into sync performance bottlenecks (execution vs. merkleization vs. storage), which will help operators diagnose node performance issues.

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt
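The reviewer's Items 1 and 3 could be sketched together as follows. This is only an illustration: the variant names and stage meanings are assumptions based on the documented stage values, and `block_gauge_value` is a hypothetical helper, not ethrex's actual API.

```rust
// Item 3: enforce the documented stage values (0=idle, 1=headers, ...) via a
// typed enum instead of magic numbers. Variant names are assumed.
#[allow(dead_code)]
#[repr(i64)]
#[derive(Clone, Copy)]
enum SyncStage {
    Idle = 0,
    DownloadingHeaders = 1,
    DownloadingBodies = 2,
    ExecutingBlocks = 3,
}

impl SyncStage {
    // Conversion to i64 for the Prometheus gauge is a trivial cast.
    fn as_gauge(self) -> i64 {
        self as i64
    }
}

// Item 1: saturate instead of `cast_signed()`, so block numbers above
// i64::MAX cannot wrap negative in the gauge.
fn block_gauge_value(block: u64) -> i64 {
    i64::try_from(block).unwrap_or(i64::MAX)
}

fn main() {
    println!("stage gauge  = {}", SyncStage::ExecutingBlocks.as_gauge());
    println!("target gauge = {}", block_gauge_value(u64::MAX));
}
```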
Lines of code report

Total lines added:
🤖 Claude Code Review

Now I have enough information for a thorough review. Here it is:

Review:
🤖 Codex Code Review

Findings

No consensus, EVM, trie, or validation logic changed in this PR, so I did not find blockchain-correctness or security issues beyond these metrics-path regressions. I could not run

Automated review by OpenAI Codex · gpt-5.4 · custom prompt
…6486)

## Summary

- The Hive consume-engine Amsterdam tests for EIP-7778 and EIP-8037 were failing because ethrex's per-tx gas limit checks were incompatible with Amsterdam's new gas accounting rules.
- **EIP-7778** uses pre-refund gas for block accounting, so cumulative pre-refund gas can exceed the block gas limit even when a block builder correctly included all transactions.
- **EIP-8037** introduces 2D gas accounting (`block_gas = max(regular, state)`), meaning cumulative total gas (regular + state) can legally exceed the block gas limit.
- The fix skips the per-tx cumulative gas check for Amsterdam and adds a **post-execution** block-level overflow check using `max(sum_regular, sum_state)` in all three execution paths (sequential, pipeline, parallel).

## Local test results

- **200/201** EIP-7778 + EIP-8037 Hive consume-engine tests pass
- **105/105** EIP-7778 + EIP-8037 EF blockchain tests pass (4 + 101)
- The single remaining Hive failure (`test_block_regular_gas_limit[exceed=True]`) expects `TransactionException.GAS_ALLOWANCE_EXCEEDED` but we return `BlockException.GAS_USED_OVERFLOW` — the block is correctly rejected, just with a different error classification.

## Test plan

- [x] All EIP-7778 EF blockchain tests pass locally
- [x] All EIP-8037 EF blockchain tests pass locally
- [x] 200/201 Hive consume-engine Amsterdam tests pass locally
- [ ] Full CI Amsterdam Hive suite passes

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
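The post-execution block-level check described in the summary could look roughly like the sketch below. The function name, signature, and error type are illustrative assumptions, not ethrex's actual API; only the `max(sum_regular, sum_state)` rule and the `GAS_USED_OVERFLOW` classification come from the text above.

```rust
/// Hypothetical block-level gas check under EIP-8037's 2D accounting.
/// Runs after execution, once per-tx cumulative checks have been skipped
/// for Amsterdam.
fn validate_block_gas(
    sum_regular: u64,
    sum_state: u64,
    block_gas_limit: u64,
) -> Result<(), String> {
    // EIP-8037: the effective block gas is the max of the two dimensions,
    // so either dimension alone exceeding the limit rejects the block.
    let block_gas = sum_regular.max(sum_state);
    if block_gas > block_gas_limit {
        return Err(format!(
            "GAS_USED_OVERFLOW: block gas {block_gas} exceeds limit {block_gas_limit}"
        ));
    }
    Ok(())
}

fn main() {
    // State-gas dimension exceeds the limit even though regular gas does not,
    // so the block must be rejected.
    let res = validate_block_gas(25_000_000, 31_000_000, 30_000_000);
    println!("{res:?}");
}
```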
🤖 Claude Code Review

Here is my review of PR #6466:

PR #6466 —
🤖 Kimi Code Review

This PR adds comprehensive metrics for full sync operations. The implementation is generally sound, but I have several specific observations:

**Issues Found**

1. Unused
🤖 Codex Code Review

No consensus, EVM, gas-accounting, or state-transition correctness issues stood out in the diff itself; the problems I found are observability/diagnostics. I wasn't able to run

Automated review by OpenAI Codex · gpt-5.4 · custom prompt
Greptile Summary

This PR adds Prometheus metrics and a Grafana dashboard for the full sync process, tracking stage transitions, header/body download rates, block execution progress, and per-phase batch timing (EVM execution, merkle, storage).
Confidence Score: 4/5

Safe to merge after fixing the `body_failures` counter — the rest are P2 improvements. One P1 defect: the `crates/networking/p2p/sync/full.rs` (`body_failures` never incremented), `crates/blockchain/blockchain.rs` (unconditional timing + missing transitions metric)
| Filename | Overview |
|---|---|
| crates/blockchain/metrics/fullsync.rs | New Prometheus metrics struct for full sync; comprehensive coverage of stage, progress, rates, and batch timings, but missing a batch_transitions_ms gauge that is measured in the calling code. |
| crates/blockchain/blockchain.rs | Adds per-phase timing instrumentation and metrics emission; timing variables and [FULLSYNC TIMING] log are unconditional (not gated by metrics feature), and transitions_ms is logged but never emitted as a Prometheus metric. |
| crates/networking/p2p/sync/full.rs | Wires up sync-cycle metrics (stage transitions, header/body rates, batch timing); body_failures counter is exposed but never incremented on body download errors. |
| crates/blockchain/metrics/api.rs | Registers METRICS_FULLSYNC in the Prometheus scrape endpoint; follows the same pattern as existing metrics modules. |
| crates/blockchain/metrics/mod.rs | Adds fullsync submodule gated by api or metrics feature, consistent with existing modules. |
| metrics/provisioning/grafana/dashboards/common_dashboards/fullsync_dashboard.json | New Grafana dashboard with stage, progress, header/body/execution panels; has a hardcoded production instance in the template default value. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant SC as sync_cycle_full
    participant AB as add_blocks_in_batch
    participant BC as execute_blocks_batch
    participant ME as METRICS_FULLSYNC
    SC->>ME: reset_cycle(), set_stage(1), inc_cycles_started()
    loop Header download batches
        SC->>ME: set_target_block, set_lowest_header, inc_headers_downloaded
    end
    SC->>ME: set_blocks_total(), set_stage(2), set_execution_stage_start_now()
    loop Execution batches
        SC->>ME: set_batch_body_download_ms, set_bodies_per_second, set_stage(3)
        SC->>AB: add_blocks_in_batch()
        AB->>BC: execute_blocks_batch()
        BC->>ME: set_batch_execution_ms (exec_ms)
        Note over BC,ME: transitions_ms measured but NOT set in metrics
        BC->>ME: set_batch_merkle_ms, set_batch_store_ms
        AB->>ME: set_blocks_executed, set_blocks_per_second, set_batch_total_ms, set_batch_size
    end
    SC->>ME: set_stage(0), inc_cycles_completed()
```
Comments Outside Diff (1)

- metrics/provisioning/grafana/dashboards/common_dashboards/fullsync_dashboard.json, lines 980-987 (link)

  **Hardcoded production instance in dashboard template variable**

  The `current` field of the `instance` template variable is hardcoded to `"ethrex-mainnet-8:3701"`. Anyone importing this dashboard on a different environment will see that specific production target pre-selected until they manually change it.
Suggested fix for the dashboard template variable:

```suggestion
"current": {},
```

Reviews (1): Last reviewed commit: "Merge branch 'main' into fullsync-metric..."
```rust
while !headers.is_empty() {
    let header_batch = &headers[..min(MAX_BLOCK_BODIES_TO_REQUEST, headers.len())];
    let bodies = peers
        .request_block_bodies(header_batch)
        .await?
        .ok_or(SyncError::BodiesNotFound)?;
```
**`body_failures` metric never incremented**

`inc_body_failures()` is defined, exposed via Prometheus, and shown in the Grafana dashboard, but it is never called anywhere in the PR. When `request_block_bodies` returns `None` or an error, the `?` propagates immediately and the counter stays at 0. Operators will see a perpetually-zero `fullsync_body_failures` gauge even during body download problems.
```suggestion
let bodies = peers
    .request_block_bodies(header_batch)
    .await
    .inspect_err(|_| fullsync_metrics!(METRICS_FULLSYNC.inc_body_failures()))
    ?
    .ok_or_else(|| {
        fullsync_metrics!(METRICS_FULLSYNC.inc_body_failures());
        SyncError::BodiesNotFound
    })?;
```
```diff
@@ -2232,9 +2235,13 @@ impl Blockchain {
             tokio::task::yield_now().await;
         }
 
+        let exec_ms = exec_start.elapsed().as_millis();
+
+        let transitions_start = std::time::Instant::now();
         let account_updates = vm
             .get_state_transitions()
             .map_err(|err| (ChainError::EvmError(err), None))?;
+        let transitions_ms = transitions_start.elapsed().as_millis();
 
         let last_block = blocks
             .last()
@@ -2244,11 +2251,13 @@ impl Blockchain {
         let last_block_gas_limit = last_block.header.gas_limit;
 
         // Apply the account updates over all blocks and compute the new state root
+        let merkle_start = std::time::Instant::now();
         let account_updates_list = self
             .storage
             .apply_account_updates_batch(first_block_header.parent_hash, &account_updates)
             .map_err(|e| (e.into(), None))?
             .ok_or((ChainError::ParentStateNotFound, None))?;
+        let merkle_ms = merkle_start.elapsed().as_millis();
 
         let new_state_root = account_updates_list.state_trie_hash;
         let state_updates = account_updates_list.state_updates;
@@ -2258,6 +2267,7 @@ impl Blockchain {
         // Check state root matches the one in block header
         validate_state_root(&last_block.header, new_state_root).map_err(|e| (e, None))?;
 
+        let store_start = std::time::Instant::now();
         let update_batch = UpdateBatch {
             account_updates: state_updates,
             storage_updates: accounts_updates,
@@ -2270,6 +2280,17 @@ impl Blockchain {
         self.storage
             .store_block_updates(update_batch)
             .map_err(|e| (e.into(), None))?;
+        let store_ms = store_start.elapsed().as_millis();
+
+        info!(
+            "[FULLSYNC TIMING] Batch {}: exec={}ms, transitions={}ms, merkle={}ms, store={}ms, total={}ms",
+            blocks_len,
+            exec_ms,
+            transitions_ms,
+            merkle_ms,
+            store_ms,
+            interval.elapsed().as_millis()
+        );
```
**Timing instrumentation always compiled regardless of `metrics` feature**

All four timing `Instant`s (`exec_start`, `transitions_start`, `merkle_start`, `store_start`) and the `[FULLSYNC TIMING]` `info!` log are unconditional — they run even when the `metrics` feature is disabled. The existing `metrics!` macro is specifically designed to gate this overhead. Similarly, `body_download_start` and its `[FULLSYNC TIMING]` info log in `full.rs` fire unconditionally. On a busy node this adds one `info!` line per batch and per body-download round, which can be verbose in non-metrics deployments.

Consider wrapping both the timing variables and the log under `metrics!(...)` / `fullsync_metrics!(...)`, or demoting the log to `debug!`.
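For illustration, here is a minimal sketch of the cfg-gating pattern this comment refers to; the real `metrics!` / `fullsync_metrics!` macros in ethrex may differ in detail. When the `metrics` feature is not enabled, the wrapped statements are removed at compile time, so neither the `Instant`s nor the log exist in the binary.

```rust
// Hypothetical sketch of a feature-gated metrics macro: the wrapped code
// only compiles when the crate is built with `--features metrics`.
macro_rules! fullsync_metrics {
    ($($code:tt)*) => {
        #[cfg(feature = "metrics")]
        {
            $($code)*
        }
    };
}

fn main() {
    let mut timed_batches = 0u32;
    fullsync_metrics!({
        // This block is compiled out in builds without the feature.
        let _start = std::time::Instant::now();
        timed_batches += 1;
    });
    // Built without the `metrics` feature, the counter is never touched.
    println!("timed_batches = {timed_batches}");
}
```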
```rust
    tokio::task::yield_now().await;
}

let exec_ms = exec_start.elapsed().as_millis();
```
**`transitions_ms` measured and logged but not exposed as a Prometheus metric**

`transitions_ms` (time spent in `vm.get_state_transitions()`) is included in the `[FULLSYNC TIMING]` log and clearly matters for profiling, but there is no corresponding `batch_transitions_ms` gauge in `MetricsFullSync`. The Grafana "Batch Time Breakdown" stacked-bar chart sums `body_download + execution + merkle + store`, so it will consistently under-report total batch time by the transitions duration. This gap will be invisible to operators using the dashboard.
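A minimal sketch of what the missing gauge could look like, using a std-only stand-in for the Prometheus gauge type so the example is self-contained. Everything beyond the names quoted in the comment (`MetricsFullSync`, `batch_transitions_ms`) is an assumption, not ethrex's actual code.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

// Stand-in for a Prometheus integer gauge; in ethrex this would be the same
// gauge type used by the other batch timing metrics.
#[derive(Default)]
struct IntGauge(AtomicI64);

impl IntGauge {
    fn set(&self, v: i64) {
        self.0.store(v, Ordering::Relaxed);
    }
    fn get(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

// Hypothetical shape of the struct with the missing gauge added alongside
// the existing per-phase timings.
#[allow(dead_code)]
#[derive(Default)]
struct MetricsFullSync {
    batch_execution_ms: IntGauge,
    batch_transitions_ms: IntGauge, // the gauge this comment asks for
    batch_merkle_ms: IntGauge,
    batch_store_ms: IntGauge,
}

impl MetricsFullSync {
    fn set_batch_transitions_ms(&self, ms: i64) {
        self.batch_transitions_ms.set(ms);
    }
}

fn main() {
    let m = MetricsFullSync::default();
    m.set_batch_transitions_ms(17);
    println!("batch_transitions_ms = {}", m.batch_transitions_ms.get());
}
```

With this in place, `execute_blocks_batch` could emit the already-measured `transitions_ms` next to the other phase gauges, and the dashboard's stacked breakdown would sum to the true batch time.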
Add metrics for full sync so that they can be seen in a Grafana dashboard.