147 changes: 147 additions & 0 deletions DB_OPTIMIZATION_PRIORITIES.md
@@ -0,0 +1,147 @@
# Database Optimization Priorities

This document categorizes all DB-related pending items from the roadmap based on whether they require database resyncing and their potential performance impact.

## Items Requiring Schema Modification (Require Resyncing)

These items modify existing database tables/schema and require a full resync:

| Section | Item | Priority | Description |
|---------|------|----------|-------------|
| IO | **Canonical tx index** | 1 | Add a canonical-tx index table or DUPSORT layout for O(1) lookups (currently O(k) prefix scans) |
| IO | **Split hot vs cold data** | 2 | Geth "freezer/ancients" pattern - store recent state in fast KV store, push old bodies/receipts to append-only ancient store |
| New Features | **Archive node** | — | Allow archive node mode - changes storage requirements/schema |
| New Features | **Pre merge blocks** | — | Be able to process pre-merge blocks - requires schema changes to support different block formats |

---

## Items NOT Requiring Schema Modification (No Resync)

These items are optimizations and configurations that don't require resyncing, sorted by potential performance impact:

### High Impact (Most Likely to Improve Performance)

1. **Add Block Cache (RocksDB)** (#5935, P0)
   - Currently relying only on the OS page cache; an explicit block cache is fundamental for RocksDB read performance
   - Also benchmark the row cache
- **Why high impact**: Block cache is one of the most important RocksDB features for read performance
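
   As a sketch of what this change looks like (assuming the `rust-rocksdb` crate; the 4 GiB budget and `options_with_block_cache` name are illustrative placeholders, not the project's chosen values):

   ```rust
   use rocksdb::{BlockBasedOptions, Cache, Options};

   // Sketch: attach an explicit LRU block cache instead of relying
   // solely on the OS page cache. The budget is a value to benchmark.
   fn options_with_block_cache() -> Options {
       let cache = Cache::new_lru_cache(4 * 1024 * 1024 * 1024);
       let mut block_opts = BlockBasedOptions::default();
       block_opts.set_block_cache(&cache);
       // Keep index and filter blocks in the same cache so they are
       // accounted for and pinned alongside data blocks.
       block_opts.set_cache_index_and_filter_blocks(true);

       let mut opts = Options::default();
       opts.create_if_missing(true);
       opts.set_block_based_table_factory(&block_opts);
       opts
   }
   ```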

2. **Use Two-Level Index (RocksDB)** (#5936, P0)
- Use Two-Level Index with Partitioned Filters
- **Why high impact**: Significantly reduces memory overhead and improves cache efficiency for large datasets
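
   A minimal sketch of this configuration (assuming the `rust-rocksdb` crate; the 10 bits-per-key value is an illustrative starting point):

   ```rust
   use rocksdb::{BlockBasedIndexType, BlockBasedOptions, Options};

   // Sketch: two-level index search with partitioned bloom filters, so
   // index/filter data is loaded in small partitions rather than as one
   // monolithic block per SST file.
   fn options_with_partitioned_index() -> Options {
       let mut block_opts = BlockBasedOptions::default();
       block_opts.set_index_type(BlockBasedIndexType::TwoLevelIndexSearch);
       block_opts.set_bloom_filter(10.0, false); // full filters are required for partitioning
       block_opts.set_partition_filters(true);

       let mut opts = Options::default();
       opts.set_block_based_table_factory(&block_opts);
       opts
   }
   ```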

3. **Use multiget on trie traversal** (#4949, P1)
- Using multiget on trie traversal might reduce read time
- **Why high impact**: Batching reads during trie traversal can dramatically reduce I/O latency

4. **Bulk reads for block bodies** (P1)
- Implement `multi_get` for `get_block_bodies` and `get_block_bodies_by_hash` which currently loop over per-key reads
- Location: `crates/storage/store.rs:388-454`
- **Why high impact**: Substantial improvement for batch operations, reduces round-trips
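
   The shape of the change can be sketched as below (a hypothetical helper, not the actual `store.rs` API; assumes the `rust-rocksdb` crate):

   ```rust
   use rocksdb::DB;

   // Sketch: fetch many block bodies with one multi_get call instead of
   // looping over per-key point reads.
   fn get_bodies(db: &DB, keys: &[Vec<u8>]) -> Vec<Option<Vec<u8>>> {
       db.multi_get(keys)
           .into_iter()
           .map(|res| res.unwrap_or(None)) // treat read errors as misses in this sketch
           .collect()
   }
   ```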

5. **Enable unordered writes for State (RocksDB)** (#5937, P0)
   - For the `ACCOUNT_TRIE_NODES` and `STORAGE_TRIE_NODES` column families, set `cf_opts.set_unordered_write(true)`
   - Faster writes when strict write ordering is not needed
- **Why high impact**: Can significantly speed up write-heavy operations during sync

6. **Toggle compaction during sync** (P2)
- Disable RocksDB compaction during snap sync for higher write throughput, then compact after
- Nethermind pattern: Wire `disable_compaction/enable_compaction` into sync stages
- **Why high impact**: Proven pattern from other clients, can dramatically improve sync performance
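
   A rough sketch of wiring this into a sync stage (assuming the `rust-rocksdb` crate; the `snap_sync_bulk_load` helper is illustrative, not the actual sync-stage API):

   ```rust
   use rocksdb::DB;

   // Sketch: suspend auto-compaction for the sync write burst, then
   // re-enable it and run one full compaction once sync finishes.
   fn snap_sync_bulk_load(db: &DB) -> Result<(), rocksdb::Error> {
       db.set_options(&[("disable_auto_compactions", "true")])?;
       // ... bulk-write snap sync data here ...
       db.set_options(&[("disable_auto_compactions", "false")])?;
       db.compact_range::<&[u8], &[u8]>(None, None); // compact everything afterwards
       Ok(())
   }
   ```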

7. **Memory-Mapped Reads (RocksDB)** (#5943, P0)
- Can be an improvement on high-RAM systems
   - **Why high impact**: Serves reads directly from the kernel page cache, avoiding an extra copy into user-space buffers; most effective when the working set fits in RAM

### Medium-High Impact

8. **Page caching + readahead** (#5940, P0)
- Use for trie iteration, sync operations
- **Why medium-high**: Reduces random I/O by prefetching related data

9. **Reduce trie cache Mutex contention** (P1)
- `trie_cache` is behind `Arc<Mutex<Arc<TrieLayerCache>>>`
- Use `ArcSwap` or `RwLock` for lock-free reads
- Location: `crates/storage/store.rs:159,1360`
- **Why medium-high**: High-frequency access point, lock contention can be significant bottleneck
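
    The read-mostly pattern can be sketched with `std`'s `RwLock` (a simplified sketch with a placeholder `Cache` type, not the actual `TrieLayerCache`; `ArcSwap` would make the read path lock-free entirely):

    ```rust
    use std::sync::{Arc, RwLock};

    // Placeholder for the real cache type.
    pub struct Cache {
        pub generation: u64,
    }

    // Readers take a shared lock only long enough to clone the inner Arc,
    // so many readers proceed concurrently and never block each other.
    pub fn read_snapshot(slot: &RwLock<Arc<Cache>>) -> Arc<Cache> {
        slot.read().expect("poisoned").clone() // cheap Arc clone
    }

    // Writers swap the pointer; existing snapshots stay valid.
    pub fn swap_in(slot: &RwLock<Arc<Cache>>, new: Arc<Cache>) {
        *slot.write().expect("poisoned") = new;
    }
    ```

    A snapshot taken before a swap keeps observing the old cache, while new readers immediately see the replacement.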

10. **Reduce LatestBlockHeaderCache contention** (P1)
- `LatestBlockHeaderCache` uses Mutex for every read
- Use `ArcSwap` for atomic pointer swaps
- Location: `crates/storage/store.rs:2880-2894`
- **Why medium-high**: Accessed on every read operation

11. **Increase Bloom Filter (RocksDB)** (#5938, P0)
- Change and benchmark higher bits per key for state tables
- **Why medium-high**: Reduces unnecessary disk reads by improving filter accuracy

12. **Use Bytes/Arc in trie layer cache** (P2)
- Trie layer cache clones `Vec<u8>` values on every read
- Use `Bytes` or `Arc<[u8]>` to reduce allocations
- Location: `crates/storage/layering.rs:57,63`
- **Why medium-high**: Reduces allocations in hot path
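
    The difference is easy to see in a toy sketch (hypothetical `get_cached` helper, not the actual `layering.rs` code): cloning an `Arc<[u8]>` bumps a refcount, while cloning a `Vec<u8>` copies every byte.

    ```rust
    use std::sync::Arc;

    // Sketch: a cache hit returns an Arc clone — O(1), no byte copy —
    // instead of cloning the underlying Vec<u8> on every read.
    pub fn get_cached(cache: &[(u64, Arc<[u8]>)], key: u64) -> Option<Arc<[u8]>> {
        cache
            .iter()
            .find(|(k, _)| *k == key)
            .map(|(_, v)| Arc::clone(v))
    }
    ```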

13. **Optimize for Point Lookups (RocksDB)** (#5941, P0)
- Adds hash index inside FlatKeyValue for faster point lookups
- **Why medium-high**: Faster for common lookup patterns

### Medium Impact

14. **Consider LZ4 for State Tables (RocksDB)** (#5939, P0)
- Trades CPU for smaller DB and potentially better cache utilization
- **Why medium**: Depends on CPU vs I/O bottleneck and workload characteristics

15. **Increase layers commit threshold** (#5944, P0)
- For read-heavy workloads with plenty of RAM
- **Why medium**: Reduces write amplification but only beneficial in specific scenarios

16. **Configurable cache budgets** (P2)
- Expose cache split for DB/trie/snapshot as runtime config
- Currently hardcoded in ethrex
- **Why medium**: Allows tuning for specific hardware but requires user knowledge

17. **Benchmark bloom filter** (#5946, P1)
- Review trie layer's bloom filter, remove it or test other libraries/configurations
- **Why medium**: May remove overhead if not beneficial, but needs measurement

18. **Modify block size (RocksDB)** (#5942, P0)
- Benchmark different block size configurations
- **Why medium**: Workload dependent, requires benchmarking to determine optimal value

### Lower Impact (But Still Useful)

19. **Remove locks** (#5945, P1)
    - Check for any remaining unnecessary locks, e.g. one known instance in the VM
- **Why lower**: Limited scope, only affects specific components

20. **geth db migration tooling** (P0, In Progress)
    - Since we don't support pre-merge blocks, we need a tool to migrate another client's DB to ours at a specific block
- **Why lower**: More of a feature than performance improvement, enables compatibility

21. **Migrations** (P4)
- Add DB Migration mechanism for ethrex upgrades
- **Why lower**: Infrastructure for future changes, not direct performance improvement

---

## Summary

- **4 items** require DB schema changes and resyncing
- **21 items** are DB-related optimizations/configurations that don't require resyncing
- Most high-priority (P0) DB work focuses on RocksDB tuning and configuration

### Top Recommended Actions (No Resync Required)

The top 7 items provide the most significant performance improvements:

1. Add Block Cache (RocksDB) - fundamental for read performance
2. Use Two-Level Index (RocksDB) - reduces memory overhead
3. Use multiget on trie traversal - reduces I/O latency
4. Bulk reads for block bodies - improves batch operations
5. Enable unordered writes for State - speeds up writes during sync
6. Toggle compaction during sync - proven pattern for sync performance
7. Memory-Mapped Reads - significant improvement on high-RAM systems

### Key Observation

The explicit block cache (#5935) is likely the **single biggest win**: it is a fundamental RocksDB optimization, and the system currently relies only on the OS page cache.
6 changes: 6 additions & 0 deletions crates/blockchain/blockchain.rs
@@ -2229,6 +2229,12 @@ impl Blockchain {
transactions_count += block.body.transactions.len();
all_receipts.push((block.hash(), receipts));

// Normalize VM cache for next block to prevent metadata pollution (issue #6467)
// This resets transient flags (exists, status) while preserving state changes
if i + 1 < blocks_len {
vm.normalize_cache_for_next_block();
}

// Conversion is safe because EXECUTE_BATCH_SIZE=1024
log_batch_progress(blocks_len as u32, i as u32);
tokio::task::yield_now().await;
29 changes: 23 additions & 6 deletions crates/blockchain/payload.rs
@@ -833,10 +833,28 @@ pub fn apply_plain_transaction(
// EIP-8037 (Amsterdam+): track regular and state gas separately
let tx_state_gas = report.state_gas_used;
let tx_regular_gas = report.gas_used.saturating_sub(tx_state_gas);
context.block_regular_gas_used = context

// Compute new totals before committing them
let new_regular = context
.block_regular_gas_used
.saturating_add(tx_regular_gas);
context.block_state_gas_used = context.block_state_gas_used.saturating_add(tx_state_gas);
let new_state = context.block_state_gas_used.saturating_add(tx_state_gas);

// EIP-8037 (Amsterdam+): post-execution block gas overflow check
// Reject the transaction if adding it would cause max(regular, state) to exceed the gas limit
if context.is_amsterdam && new_regular.max(new_state) > context.payload.header.gas_limit {
return Err(EvmError::Custom(format!(
"block gas limit exceeded (state gas overflow): \
max({new_regular}, {new_state}) = {} > gas_limit {}",
new_regular.max(new_state),
context.payload.header.gas_limit
))
.into());
}

// Commit the new totals
context.block_regular_gas_used = new_regular;
context.block_state_gas_used = new_state;

if context.is_amsterdam {
debug!(
Expand All @@ -852,15 +870,14 @@ pub fn apply_plain_transaction(
}

// Update remaining_gas for block gas limit checks.
// EIP-8037 (Amsterdam+): per-tx check only validates regular gas against block limit.
// State gas is NOT checked per-tx; block-end validation enforces
// max(block_regular, block_state) <= gas_limit.
// EIP-8037 (Amsterdam+): remaining_gas reflects both regular and state gas dimensions.
// For pre-tx heuristic checks, this ensures we reject txs when either dimension is full.
if context.is_amsterdam {
context.remaining_gas = context
.payload
.header
.gas_limit
.saturating_sub(context.block_regular_gas_used);
.saturating_sub(new_regular.max(new_state));
} else {
context.remaining_gas = context.remaining_gas.saturating_sub(report.gas_used);
}
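Stripped of the payload-builder context, the two-dimensional accounting in `apply_plain_transaction` above amounts to the following (a simplified sketch with plain `u64`s, not the actual ethrex types):

```rust
// Sketch of the EIP-8037 two-dimensional gas check: a tx is accepted only
// if neither the regular nor the state gas dimension would push
// max(regular, state) past the block gas limit.
pub fn try_add_tx(
    block_regular: u64,
    block_state: u64,
    tx_regular: u64,
    tx_state: u64,
    gas_limit: u64,
) -> Option<(u64, u64)> {
    // Compute new totals before committing them.
    let new_regular = block_regular.saturating_add(tx_regular);
    let new_state = block_state.saturating_add(tx_state);
    if new_regular.max(new_state) > gas_limit {
        None // reject: block gas limit exceeded in one dimension
    } else {
        Some((new_regular, new_state)) // commit the new totals
    }
}
```

Note that the cumulative sum `new_regular + new_state` may legally exceed the limit, as long as each dimension stays within it.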
90 changes: 45 additions & 45 deletions crates/vm/backends/levm/mod.rs
@@ -31,7 +31,6 @@ use ethrex_levm::account::{AccountStatus, LevmAccount};
use ethrex_levm::call_frame::Stack;
use ethrex_levm::constants::{
POST_OSAKA_GAS_LIMIT_CAP, STACK_LIMIT, SYS_CALL_GAS_LIMIT, TX_BASE_COST,
TX_MAX_GAS_LIMIT_AMSTERDAM,
};
use ethrex_levm::db::Database;
use ethrex_levm::db::gen_db::{CacheDB, GeneralizedDatabase};
@@ -126,18 +125,13 @@ impl LEVM {
})?;

for (tx_idx, (tx, tx_sender)) in transactions_with_sender.into_iter().enumerate() {
// Pre-tx gas limit guard per EIP-8037/EIP-7825:
// Amsterdam: check min(TX_MAX_GAS_LIMIT, tx.gas) against regular gas only.
// State gas is NOT checked per-tx; block-end validation enforces
// max(block_regular, block_state) <= gas_limit.
// Pre-Amsterdam: check tx.gas against cumulative_gas_used (post-refund sum).
if is_amsterdam {
check_gas_limit(
block_regular_gas_used,
tx.gas_limit().min(TX_MAX_GAS_LIMIT_AMSTERDAM),
block.header.gas_limit,
)?;
} else {
// Pre-tx gas limit guard:
// Pre-Amsterdam: reject tx if cumulative post-refund gas + tx.gas > block limit.
// Amsterdam+: skip — EIP-8037's 2D gas model means cumulative gas (regular +
// state) can legally exceed the block gas limit as long as
// max(sum_regular, sum_state) stays within it. Block-level overflow is
// detected post-execution.
if !is_amsterdam {
check_gas_limit(cumulative_gas_used, tx.gas_limit(), block.header.gas_limit)?;
}

@@ -189,6 +183,17 @@ impl LEVM {
receipts.push(receipt);
}

// EIP-7778 (Amsterdam+): block-level gas overflow check.
// Per-tx checks are skipped for Amsterdam because block gas is computed
// from pre-refund values; overflow can only be detected after execution.
if is_amsterdam && block_gas_used > block.header.gas_limit {
return Err(EvmError::Transaction(format!(
"Gas allowance exceeded: Block gas used overflow: \
block_gas_used {block_gas_used} > block_gas_limit {}",
block.header.gas_limit
)));
}

// Set BAL index for post-execution phase (requests + withdrawals, uint16)
// Order must match geth: requests (system calls) BEFORE withdrawals.
if is_amsterdam {
@@ -424,18 +429,13 @@ impl LEVM {
let mut tx_since_last_flush = 2;

for (tx_idx, (tx, tx_sender)) in transactions_with_sender.into_iter().enumerate() {
// Pre-tx gas limit guard per EIP-8037/EIP-7825:
// Amsterdam: check min(TX_MAX_GAS_LIMIT, tx.gas) against regular gas only.
// State gas is NOT checked per-tx; block-end validation enforces
// max(block_regular, block_state) <= gas_limit.
// Pre-Amsterdam: check tx.gas against cumulative_gas_used (post-refund sum).
if is_amsterdam {
check_gas_limit(
block_regular_gas_used,
tx.gas_limit().min(TX_MAX_GAS_LIMIT_AMSTERDAM),
block.header.gas_limit,
)?;
} else {
// Pre-tx gas limit guard:
// Pre-Amsterdam: reject tx if cumulative post-refund gas + tx.gas > block limit.
// Amsterdam+: skip — EIP-8037's 2D gas model means cumulative gas (regular +
// state) can legally exceed the block gas limit as long as
// max(sum_regular, sum_state) stays within it. Block-level overflow is
// detected post-execution.
if !is_amsterdam {
check_gas_limit(cumulative_gas_used, tx.gas_limit(), block.header.gas_limit)?;
}

@@ -497,6 +497,17 @@ impl LEVM {
receipts.push(receipt);
}

// EIP-7778 (Amsterdam+): block-level gas overflow check.
// Per-tx checks are skipped for Amsterdam because block gas is computed
// from pre-refund values; overflow can only be detected after execution.
if is_amsterdam && block_gas_used > block.header.gas_limit {
return Err(EvmError::Transaction(format!(
"Gas allowance exceeded: Block gas used overflow: \
block_gas_used {block_gas_used} > block_gas_limit {}",
block.header.gas_limit
)));
}

#[cfg(feature = "perf_opcode_timings")]
{
let mut timings = OPCODE_TIMINGS.lock().expect("poison");
@@ -972,32 +983,21 @@ impl LEVM {
// balance in the BAL won't match execution that ran all txs).
let mut block_regular_gas_used = 0_u64;
let mut block_state_gas_used = 0_u64;
for (tx_idx, _, report, _, _, _) in &exec_results {
// Per-tx check: only regular gas is checked per-tx (EIP-8037/EIP-7825).
// State gas is validated at block end via max(regular, state) <= gas_limit.
let tx_gas_limit = txs_with_sender[*tx_idx].0.gas_limit();
check_gas_limit(
block_regular_gas_used,
tx_gas_limit.min(TX_MAX_GAS_LIMIT_AMSTERDAM),
header.gas_limit,
)?;
for (_, _, report, _, _, _) in &exec_results {
let tx_state_gas = report.state_gas_used;
let tx_regular_gas = report.gas_used.saturating_sub(tx_state_gas);
block_regular_gas_used = block_regular_gas_used.saturating_add(tx_regular_gas);
block_state_gas_used = block_state_gas_used.saturating_add(tx_state_gas);
// Post-tx check: needed because all txs are already executed — if the last tx
// pushes actual gas over the limit, there's no next iteration to catch it
// like the sequential path does.
let running_block_gas_after = block_regular_gas_used.max(block_state_gas_used);
if running_block_gas_after > header.gas_limit {
return Err(EvmError::Transaction(format!(
"Gas allowance exceeded: \
used {running_block_gas_after} > block limit {}",
header.gas_limit
)));
}
}
let block_gas_used = block_regular_gas_used.max(block_state_gas_used);
// EIP-7778: block-level overflow check using pre-refund gas.
if block_gas_used > header.gas_limit {
return Err(EvmError::Transaction(format!(
"Gas allowance exceeded: Block gas used overflow: \
block_gas_used {block_gas_used} > block_gas_limit {}",
header.gas_limit
)));
}

// 4. Per-tx BAL validation — now safe to run after gas limit is confirmed OK.
// Also mark off storage_reads that appear in per-tx execution state.
7 changes: 7 additions & 0 deletions crates/vm/backends/mod.rs
@@ -195,6 +195,13 @@ impl Evm {
LEVM::get_state_transitions(&mut self.db)
}

/// Normalizes VM cache metadata between blocks in batch execution.
/// Fixes issue #6467 where transient account metadata (exists, status) leaks
/// between blocks, causing incorrect EIP-7702 gas refund calculations.
pub fn normalize_cache_for_next_block(&mut self) {
self.db.normalize_cache_for_next_block();
}

/// Wraps [LEVM::process_withdrawals].
/// Applies the withdrawals to the state or the block_cache if using [LEVM].
pub fn process_withdrawals(&mut self, withdrawals: &[Withdrawal]) -> Result<(), EvmError> {
29 changes: 29 additions & 0 deletions crates/vm/levm/src/db/gen_db.rs
@@ -409,6 +409,35 @@ impl GeneralizedDatabase {
Ok(account_updates)
}

/// Normalizes cached accounts between blocks in batch execution.
///
/// After executing block N, accounts in current_accounts_state represent the
/// post-block state. These accounts will be the base for block N+1, but their
/// transient metadata (status, exists) must be reset to prevent pollution.
///
/// This fixes issue #6467 where mark_modified() sets exists=true, which then
/// incorrectly persists into the next block, causing wrong EIP-7702 refunds.
///
/// Called between blocks in add_blocks_in_batch.
pub fn normalize_cache_for_next_block(&mut self) {
// Normalize metadata in current_accounts_state (the working cache)
for (address, account) in self.current_accounts_state.iter_mut() {
// Reset status - these accounts are now the "unmodified" base for next block
account.status = AccountStatus::Unmodified;

// Recalculate exists based on actual account content, not mark_modified flag.
// An account exists if it's non-empty (has balance, nonce, code, or storage).
// This matches the semantics of loading from DB (see From<AccountState>).
account.exists = !account.info.is_empty() || account.has_storage;

// Ensure account exists in initial_accounts_state for storage access
// If it was created in a previous block of this batch, add it now
if !self.initial_accounts_state.contains_key(address) {
self.initial_accounts_state.insert(*address, account.clone());
}
}
}

pub fn get_state_transitions_tx(&mut self) -> Result<Vec<AccountUpdate>, VMError> {
let mut account_updates: Vec<AccountUpdate> = vec![];
for (address, new_state_account) in self.current_accounts_state.drain() {