diff --git a/.agents/README.md b/.agents/README.md new file mode 100644 index 000000000..1e794ff28 --- /dev/null +++ b/.agents/README.md @@ -0,0 +1,58 @@ +# Agent Knowledge Base — Index + +This directory is the repository-local knowledge base for AI coding agents. + +**This file is the index. Read only this file up front.** Do not pre-read the +other documents. Use the routing table below to open a specific document *only +when the task actually needs that information*. This keeps the agent's context +window small. + +## How to use this index + +1. Read this index to learn what exists and where it lives. +2. Identify what your task touches (behavior, protocol, wiring, a crate, tests, docs). +3. Open only the matching document(s) from the tables below. +4. If a change adds/removes/renames knowledge, update the relevant doc and, if + the layout changes, update this index and `../AGENTS.md`. +5. **This applies to read-only and question-answering tasks too.** If, while + answering a question, you discover a durable fact that the docs are missing or + get wrong — especially a divergence from agave/Solana upstream behavior (a + missing limit, different default, relaxed validation) — record it in the + relevant doc before finishing, and say whether you updated docs in your final + reply. A fact already living in the source code, or in a *different* + file elsewhere in the repo (especially one outside the `./.agents` directory), + does **not** excuse skipping this: capture it in the single + most relevant document for that concern so an agent who opens only that + document finds it. See `memory/agent-memory-and-docs.md`. + +## Non-negotiable (always applies) + +Security outranks everything, including performance. Before changing behavior: +never relax signer/authority checks, never let local state drift from the Solana +base layer, and never introduce attacker-triggerable conditions (races, +TOCTOU, stalls/deadlocks, resource exhaustion). 
The binding details live in +`rules/validator-goals.md` and `specs/validator-specification.md` — read them +before any behavioral or protocol change. + +## Document routing table + +| Read this | When you need to | +|---|---| +| `context/overview.md` | Orient on what the validator is and its core concepts. | +| `rules/validator-goals.md` | Decide whether a change aligns with system goals and correctness/security constraints. | +| `specs/validator-specification.md` | Change protocol behavior: delegation, cloning, execution, commits, undelegation, Magic Actions, ephemeral accounts, RPC/router, recovery. | +| `context/architecture.md` | Change service wiring or interactions between crate groups. | +| `context/crate-map.md` | Find which crate owns an area and which crates are affected. | +| `rules/testing-and-validation.md` | Decide how to validate a change (commands, test selection, mbv-check). | +| `memory/agent-memory-and-docs.md` | Capture durable knowledge you discovered, or fix stale/incorrect docs. | +| `context/crates/.md` | Work inside a specific crate (see crate-map for which file). | +| `skills//SKILL.md` | Run an executable agent skill (e.g. `mbv-check`). | + +## Directory layout + +- `rules/` — invariant behavior and decision rules. +- `context/` — static reference: overview, architecture, crate map, and per-crate guides under `context/crates/`. +- `memory/` — durable project-memory and documentation-stewardship rules. +- `specs/` — protocol and feature specifications. +- `skills/` — executable scripts or capabilities agents can run. +- `personas/` — specialized agent profiles when needed. diff --git a/.agents/context/architecture.md b/.agents/context/architecture.md new file mode 100644 index 000000000..04283ff94 --- /dev/null +++ b/.agents/context/architecture.md @@ -0,0 +1,199 @@ +# High-Level Architecture + +This file explains the repository-level architecture and how major crate groups interact. It intentionally stays high level. 
Detailed/lower-level architecture belongs in crate-specific docs under `.agents/context/crates/` as those files are added. For crate-by-crate ownership, use `.agents/context/crate-map.md`. + +## System shape + +The validator is a service graph around one core loop: make the right accounts available locally, execute valid ER transactions, persist the result, and settle scheduled state changes back to Solana. This graph is performance-sensitive: architectural changes must preserve low-latency, high-throughput behavior on critical paths unless there is no viable alternative, and any unavoidable tradeoff must be called out explicitly. + +It is also security-critical and the security contract outranks performance. Architectural changes must not weaken signer/authority enforcement, must keep account synchronization with the Solana base layer at least as correct and stable as today, and must not introduce attacker-triggerable conditions (races, timing/ordering attacks, stalls/deadlocks, resource exhaustion). The validator settles real funds; a security regression can lose money for the operator or customers. See `.agents/rules/validator-goals.md` and `.agents/specs/validator-specification.md` for the binding security invariants. + +```text +Client / Operator + | + v +RPC / TUI / Admin ingress + | + v +Validator orchestration + | + +--> account synchronization <----> Solana RPC/WS + delegation metadata + | + +--> transaction execution -----> local AccountsDb + Ledger -----> events + | + +--> commit/undelegation -------> base-layer transactions + | + +--> task scheduling -----------> submitted transactions + | + +--> replication/metrics -------> replicas + observability +``` + +## Main layers + +### 1. Process and service orchestration + +Owned primarily by `magicblock-validator`, `magicblock-api`, and `magicblock-config`. 
+ +Responsibilities: + +- parse/load configuration, +- construct the validator service graph, +- open persistent stores, +- initialize account sync, RPC, scheduler, committor, task scheduler, replication, metrics, and admin support, +- recover persisted work, +- coordinate startup mode versus primary/replica mode, +- stop services and flush state in the correct order. + +Architecture rule: process entrypoints should stay thin; cross-service wiring belongs in the orchestration layer, not in leaf crates. + +### 2. Client/API ingress + +Owned primarily by `magicblock-aperture` plus admin/TUI support crates. + +Responsibilities: + +- expose Solana-compatible JSON-RPC and websocket/pubsub behavior, +- accept transactions and simulations, +- serve account/ledger/status reads, +- trigger just-in-time account availability work for local misses, +- forward validator events to clients/subscribers. + +Architecture rule: the RPC layer should route work to account sync and execution services; it should not duplicate execution, delegation, or commit protocol logic. Keep per-request work lean and avoid blocking critical request paths. + +### 3. Account synchronization + +Owned primarily by `magicblock-chainlink`, `magicblock-account-cloner`, and `magicblock-accounts`. + +Responsibilities: + +- determine whether required accounts are delegated, undelegated/read-only, fee-payers, programs, or missing/stale, +- fetch base-layer account data and delegation metadata, +- subscribe to remote changes where needed, +- materialize local account/program state, +- provide account availability to RPC and transaction execution, +- hand scheduled commit work toward settlement. + +Architecture rule: this layer prepares local state for execution. It should not decide post-execution account access rules; those belong to the execution/SVM path. Avoid fetch amplification, duplicate clone work, subscription churn, and unnecessary serialization in account availability paths. + +### 4. 
Transaction execution + +Owned primarily by `magicblock-processor`, `magicblock-core`, the local storage crates, and the forked SVM dependency. + +Responsibilities: + +- receive processable transactions, +- acquire account locks, +- schedule work onto executors, +- run SVM execution, +- enforce MagicBlock access validation, +- commit local account changes, +- write ledger/status records, +- emit account, transaction, slot, and replication events. + +Architecture rule: execution must preserve the writable-account invariant and avoid mixing scheduler/account-lock concerns with RPC or commit-delivery concerns. It must also preserve scheduler/executor parallelism and avoid avoidable latency, contention, allocation, or I/O regressions in the hot path. + +### 5. Local persistence + +Owned primarily by `magicblock-accounts-db` and `magicblock-ledger`. + +Responsibilities: + +- store local account state, +- index accounts for execution/RPC, +- support snapshots/maintenance, +- store transaction, status, block, address-signature, and blockhash history, +- support recovery and user-visible RPC history. + +Architecture rule: maintenance operations that can race execution must be coordinated with scheduler pausing. + +### 6. Base-layer settlement + +Owned primarily by `magicblock-program`, `magicblock-magic-program-api`, `magicblock-committor-service`, `magicblock-committor-program`, `magicblock-table-mania`, and `magicblock-rpc-client`. + +Responsibilities: + +- let programs schedule commits, commit-and-undelegate operations, intent bundles, and Magic Actions, +- persist and recover pending settlement work, +- build valid base-layer transactions, +- handle address lookup tables and large changesets, +- send/confirm base-layer transactions, +- keep local lifecycle state consistent with scheduled undelegation. + +Architecture rule: Magic Program instructions schedule intent; validator services realize that intent on the base layer. + +### 7. 
Background services + +Owned by task scheduler, replicator, metrics, admin, and shared service crates. + +Responsibilities: + +- execute scheduled program tasks, +- replicate primary output to replicas, +- expose metrics/admin/operator hooks, +- provide reusable service infrastructure. + +Architecture rule: background services should integrate through shared channels/service APIs rather than reaching through unrelated crate internals. + +## Important interaction patterns + +### Transaction submission path + +```text +RPC/router ingress + -> account synchronization ensures required accounts exist locally + -> processor scheduler locks accounts + -> executor runs SVM + -> AccountsDb and Ledger persist results + -> events notify RPC subscriptions, metrics, replication, and other consumers +``` + +### First use of delegated state + +```text +base-layer delegation exists + -> ER read/transaction needs account + -> account sync fetches account + delegation metadata + -> cloner installs local representation + -> processor can execute valid transactions against it +``` + +### Commit / undelegation path + +```text +program invokes Magic Program in ER + -> MagicContext records scheduled intent + -> validator-side processing picks up intent + -> committor builds/sends base-layer transaction(s) + -> commit keeps delegation active OR undelegation returns ownership after settlement +``` + +### Startup path + +```text +load config + -> open ledger/accounts storage + -> initialize services + -> recover persisted work + -> replay/repair local state where configured + -> enter primary or replica execution mode +``` + +### Shutdown path + +```text +cancel services + -> protect/finish in-flight work where required + -> join threads/runtimes + -> flush persistent stores +``` + +## Boundaries agents should preserve + +- **RPC ingress is not the protocol source of truth.** It should call into account sync, execution, and storage layers. 
+- **Account synchronization is not transaction execution.** It prepares accounts; execution validates and commits changes. +- **Magic Program scheduling is not base-layer settlement.** It records intent; committor services deliver it. +- **Local persistence is shared infrastructure.** Coordinate maintenance with execution. +- **Replication observes/replays validator output.** Do not make primary and replica modes accidentally diverge. +- **Crate-specific details belong in crate docs.** Keep this file focused on cross-crate architecture. +- **Performance is part of the architecture contract.** Do not move heavy work into RPC, account sync, scheduler/executor, persistence, or settlement hot paths without an explicit justification and mitigation plan. +- **Security is the top architecture constraint and outranks performance.** Do not let any layer weaken signer/authority enforcement, drift local state out of sync with the base layer, or open attacker-triggerable race/timing/stall/exhaustion conditions. RPC ingress in particular handles untrusted input and must not become a path that bypasses execution/SVM validation or account-sync correctness. diff --git a/.agents/context/crate-map.md b/.agents/context/crate-map.md new file mode 100644 index 000000000..1904ca0cd --- /dev/null +++ b/.agents/context/crate-map.md @@ -0,0 +1,86 @@ +# Crate Map + +This map helps agents find the right crate before making changes. The dependency lists focus on workspace crates and are intentionally concise; external Solana/SVM dependencies are omitted. + +The validator is performance-sensitive. When changing any crate on RPC, account synchronization, scheduling/execution, persistence, replication, or settlement paths, preserve low-latency and high-throughput behavior. Avoid unnecessary blocking, allocation, lock contention, I/O, serialization, logging, and duplicate work; explicitly call out any unavoidable performance tradeoff. 
+ +## Core validator crates + +| Crate | Purpose | Depends on | Used by | Notes | +|---|---|---|---|---| +| `magicblock-validator` | Main validator binary and process entrypoint. | `magicblock-api`, `magicblock-config`, `magicblock-core`, `magicblock-tui-client`, `magicblock-version` | End users/operators | Parses config, builds runtime, starts headless/TUI validator; see `.agents/context/crates/magicblock-validator.md` before changing this crate. | +| `magicblock-api` | Top-level service orchestration and `MagicValidator` implementation. | accounts, aperture, chainlink, committor, config, core, ledger, processor, replicator, task scheduler, admin/services | `magicblock-validator` | Owns startup/shutdown wiring for most services; see `.agents/context/crates/magicblock-api.md` before changing this crate. | +| `magicblock-config` | Validator configuration model and layered config loading. | none | Most service crates | CLI/env/TOML/default config source; see `.agents/context/crates/magicblock-config.md` before changing configurable behavior. | +| `magicblock-core` | Shared channels, traits, account locks/helpers, intent/core types. | `magicblock-magic-program-api` | Most runtime crates | Central wiring layer; changes can affect scheduler, RPC, ledger, services, replication. See `.agents/context/crates/magicblock-core.md` before changing this crate. | +| `magicblock-version` | Build/version metadata. | none | `magicblock-validator`, `magicblock-aperture` | Keep version reporting stable for RPC/operator tooling; see `.agents/context/crates/magicblock-version.md` before changing this crate. | + +## RPC, API, and operator-facing crates + +| Crate | Purpose | Depends on | Used by | Notes | +|---|---|---|---|---| +| `magicblock-aperture` | Solana-compatible JSON-RPC and websocket/pubsub server. 
| account cloner, accounts-db, chainlink, config, core, ledger, metrics, version | `magicblock-api` | Handles RPC methods, subscriptions, transaction submission, local read misses/cloning. See `.agents/context/crates/magicblock-aperture.md` before changing this crate. | +| `magicblock-rpc-client` | RPC client utilities for sending/confirming base-layer transactions. | `magicblock-metrics` | committor, table-mania, account-cloner, API/admin | Critical for base-layer commit delivery; see `.agents/context/crates/magicblock-rpc-client.md` before changing this crate. | +| `magicblock-validator-admin` | Admin/client helpers for validator management operations. | `magicblock-program`, `magicblock-rpc-client` | `magicblock-api` | Keep compatible with operator/admin workflows; see `.agents/context/crates/magicblock-validator-admin.md` before changing this crate. | +| `magicblock-tui-client` | TUI client/binary support. | none | `magicblock-validator` | UI-facing; should not own core validator logic. | + +## Execution and storage crates + +| Crate | Purpose | Depends on | Used by | Notes | +|---|---|---|---|---| +| `magicblock-processor` | Transaction scheduler, executor pool, SVM execution, commit-to-local-state path. | `magicblock-accounts-db`, `magicblock-core`, `magicblock-ledger`, `magicblock-magic-program-api`, `magicblock-metrics`, `magicblock-program` | `magicblock-api`, tests | Core execution path; preserve account locking and writable-account access invariants. | +| `magicblock-accounts-db` | Custom local account database. | `magicblock-config`, `magicblock-magic-program-api` | account cloner, accounts, aperture, API, chainlink, processor, replicator, tests/tools | Append-only mmap storage plus indexes/snapshots; maintenance requires scheduler pause. See `.agents/context/crates/magicblock-accounts-db.md` before changing this crate. | +| `magicblock-ledger` | Local ledger/history and latest block state. 
| `magicblock-core`, `magicblock-metrics`, `solana-storage-proto`, `test-kit` | aperture, API, processor, replicator, task scheduler, tools/tests | Stores tx/status/block history and latest blockhash/slot; see `.agents/context/crates/magicblock-ledger.md` before changing this crate. | +| `solana-storage-proto` | Generated/protobuf storage support. | none | `magicblock-ledger` | Low-level ledger serialization support; see `.agents/context/crates/storage-proto.md` before changing this crate. | + +## Delegation, cloning, and account lifecycle crates + +| Crate | Purpose | Depends on | Used by | Notes | +|---|---|---|---|---| +| `magicblock-chainlink` | Base-chain account/delegation coordination. | accounts-db, AML, config, core, magic-program API, metrics | account-cloner, accounts, aperture, API, magic program | Checks/clones remote accounts, tracks delegation state, coordinates base-layer reads. See `.agents/context/crates/magicblock-chainlink.md` before changing this crate. | +| `magicblock-account-cloner` | Fetches and injects base-layer accounts/programs into local validator state. | accounts-db, chainlink, committor-service, config, core, ledger, magic-program API, magic program, rpc-client | accounts, aperture, API | Distinguishes fee payer, delegated, and undelegated accounts; handles large/program clone paths. See `.agents/context/crates/magicblock-account-cloner.md` before changing this crate. | +| `magicblock-accounts` | Account manager and scheduled commit processing glue. | account-cloner, accounts-db, chainlink, committor-service, core, metrics, magic program | `magicblock-api` | Current active role is scheduled commit processing and pending intent recovery; see `.agents/context/crates/magicblock-accounts.md` before changing this crate. | +| `magicblock-aml` | External/cached risk-scoring integration. 
| `magicblock-config` (dev: `magicblock-core`) | `magicblock-chainlink` | Optional Range risk checks for post-delegation action signers; see `.agents/context/crates/magicblock-aml.md` before changing this crate. | + +## Commit and base-layer settlement crates + +| Crate | Purpose | Depends on | Used by | Notes | +|---|---|---|---|---| +| `magicblock-committor-service` | Executes scheduled base-layer intents: commit, undelegate, finalize, action. | committor program, core, metrics, magic program, rpc-client, table-mania | account-cloner, accounts, API | Durable commit pipeline; handles scheduling, transaction prep, buffers, ALTs, confirmations. See `.agents/context/crates/magicblock-committor-service.md` before changing this crate. | +| `magicblock-committor-program` | On-chain committor program. | none | `magicblock-committor-service` | Base-layer program side for changeset buffers/commit application; see `.agents/context/crates/magicblock-committor-program.md` before changing this crate. | +| `magicblock-table-mania` | Address lookup table management. | metrics, rpc-client | `magicblock-committor-service` | Creates/extends/deactivates/closes ALTs needed by commit transactions. See `.agents/context/crates/magicblock-table-mania.md` before changing this crate. | + +## Magic Program and shared protocol crates + +| Crate | Purpose | Depends on | Used by | Notes | +|---|---|---|---|---| +| `magicblock-program` | Magic Program implementation (`programs/magicblock`). | chainlink, core, magic-program API, test-kit | account-cloner, accounts, API, committor, processor, task scheduler, admin | Implements scheduling, cloning, ephemeral accounts, validator-only operations. | +| `magicblock-magic-program-api` | Shared Magic Program instruction, PDA, args, and compatibility types. 
| none | core, accounts-db, chainlink, processor, magic program, services, cloner, API, test programs | Use this instead of duplicating Magic Program wire types; see `.agents/context/crates/magicblock-magic-program-api.md` before changing this crate. | +| `guinea` | Test-only program for validator behavior. | `magicblock-magic-program-api` | processor tests, test-kit | Used to exercise ephemeral/delegated behavior in tests. | + +## Scheduling, replication, services, and observability + +| Crate | Purpose | Depends on | Used by | Notes | +|---|---|---|---|---| +| `magicblock-task-scheduler` | Program-scheduled task/crank service. | config, core, ledger, magic program | `magicblock-api` | SQLite-backed delay queue, retries/backoff, scheduled transaction submission. See `.agents/context/crates/magicblock-task-scheduler.md` before changing this crate. | +| `magicblock-replicator` | Primary/replica event replication over NATS JetStream. | accounts-db, config, core, ledger | `magicblock-api` | Preserves HA/replica replay behavior; primary and replica modes differ intentionally. | +| `magicblock-services` | Shared service utilities/adapters. | core, magic-program API | `magicblock-api` | Common service abstractions; keep generic. See `.agents/context/crates/magicblock-services.md` before changing this crate. | +| `magicblock-metrics` | Metrics helpers and instrumentation. | none | RPC, ledger, processor, chainlink, committor, table-mania, API | Prefer adding observability here rather than ad-hoc metrics code. See `.agents/context/crates/magicblock-metrics.md` before changing this crate. | + +## Tools and test support + +| Crate | Purpose | Depends on | Used by | Notes | +|---|---|---|---|---| +| `test-kit` | Shared integration/unit test helpers. | guinea, accounts-db, core, ledger, processor | aperture, committor, ledger, processor, magic program tests | Put reusable test harness logic here; see `.agents/context/crates/test-kit.md` before changing this crate. 
| +| `genx` | Developer/tooling binary. | `magicblock-accounts-db` | manual/tooling use | Keep outside runtime-critical paths. | +| `ledger-stats` | Ledger/accounts statistics tool. | accounts-db, core, ledger | manual/tooling use | Useful for inspecting local persisted state. | +| `keypair-base58` | Keypair conversion/helper binary. | none | manual/tooling use | Small standalone operator/dev helper. | + +## How to use this map + +- For transaction correctness, start with `magicblock-processor`, then inspect `magicblock-accounts-db`, `magicblock-ledger`, and `magicblock-program` interactions. +- For delegation or account cloning bugs, start with `magicblock-chainlink`, `magicblock-account-cloner`, and `magicblock-accounts`. +- For commit or undelegation bugs, start with `magicblock-program`, `magicblock-accounts`, and `magicblock-committor-service`. +- For RPC behavior, start with `magicblock-aperture`; check `magicblock-chainlink` if reads trigger cloning. +- For validator lifecycle/startup/shutdown, start with `magicblock-api` and `magicblock-validator`. +- When adding, removing, renaming, or repurposing a crate, update this file and `AGENTS.md` in the same change. +- When changing crate responsibilities, note whether performance-sensitive work moved onto or off of a hot path and document any expected regression or mitigation. diff --git a/.agents/context/crates/magicblock-account-cloner.md b/.agents/context/crates/magicblock-account-cloner.md new file mode 100644 index 000000000..00cc3f4f0 --- /dev/null +++ b/.agents/context/crates/magicblock-account-cloner.md @@ -0,0 +1,390 @@ +# `magicblock-account-cloner` + +## Purpose + +`magicblock-account-cloner` is the production implementation of the `magicblock-chainlink::cloner::Cloner` boundary. 
Chainlink decides which base-layer accounts or programs need to exist locally; this crate turns those clone requests into validator-signed local transactions that invoke the Magic Program and materialize the state inside the ER validator. + +At a high level it: + +- builds and submits Magic Program clone/evict instructions through the internal `TransactionSchedulerHandle`, +- encodes small accounts inline with `CloneAccount`, +- chunks large accounts with `CloneAccountInit` and `CloneAccountContinue`, +- executes post-delegation actions after delegated-account cloning, +- clones program ELF data through a temporary validator-owned buffer PDA and finalizes loader-specific program accounts, +- converts old BPF loader v1 programs into upgradeable-loader v3 local representation, +- deploys loader-v2/v3/v4 programs through the local loader-v4 path, +- provides small helpers for committor result diagnostics and deterministic program-clone buffer PDAs. + +This crate is on the account-availability path for transaction submission, RPC read misses that need cloning, and program loading. Changes can affect account-sync latency, transaction-scheduler pressure, Magic Program instruction compatibility, program execution readiness, and post-delegation action safety. Avoid adding blocking work, duplicate clone transactions, unnecessary serialization, unbounded logging, or extra scheduler round trips in these flows without an explicit performance tradeoff. + +## Update requirement + +Update this document in the same change whenever behavior in `magicblock-account-cloner` changes, or whenever another crate changes the clone requests or Magic Program instructions this crate consumes. This file is useful only if it reflects the current implementation. 
+ +Update it for changes to: + +- the `ChainlinkCloner` public constructor or its `Cloner` trait implementation, +- account clone sizing, chunking, cleanup, post-delegation action handling, or transaction-size limits, +- program clone flow, loader support, buffer PDA derivation, executable-check toggling, deploy/finalize authority handling, or one-slot wait behavior, +- Magic Program clone/evict/finalize instruction fields or account metas, +- `AccountCloneRequest`, `DelegationActions`, `LoadedProgram`, or `RemoteProgramLoader` semantics in `magicblock-chainlink`, +- error mapping or transaction diagnostics in `account_cloner.rs` / `util.rs`, +- tests or integration validation for account/program cloning, +- performance characteristics of clone transaction construction or scheduler submission. + +## Where it sits in the repository + +Primary files: + +| Path | Role | +|---|---| +| `magicblock-account-cloner/Cargo.toml` | Crate dependencies. Depends on Chainlink for the `Cloner` trait/request types, Magic Program/API for clone instructions, ledger for latest blockhash/slot, core for scheduler handles, and committor/rpc-client for diagnostics helpers. | +| `magicblock-account-cloner/README.md` | High-level cloning notes. Some component names are historical; treat source and this agent guide as canonical for current implementation. | +| `magicblock-account-cloner/src/lib.rs` | Main `ChainlinkCloner` implementation, transaction builders, account clone flow, program clone flow, and unit tests. | +| `magicblock-account-cloner/src/account_cloner.rs` | `AccountClonerError`, result alias, and `map_committor_request_result` helper for turning committor oneshot results into diagnostic-rich errors. | +| `magicblock-account-cloner/src/util.rs` | Buffer PDA derivation and transaction diagnostic lookup helpers. | +| `magicblock-chainlink/src/cloner/mod.rs` | Trait boundary implemented here: `Cloner`, `AccountCloneRequest`, and `DelegationActions`. 
| +| `magicblock-chainlink/src/fetch_cloner/` | Builds clone requests, resolves delegation records/actions/program dependencies, and calls this crate through the `Cloner` trait. | +| `programs/magicblock/src/clone_account/` | Magic Program processors for clone, chunk, cleanup, post-delegation actions, and program finalization instructions emitted by this crate. | +| `magicblock-magic-program-api/src/instruction.rs` | Wire enum and `AccountCloneFields` used by clone instructions. | +| `magicblock-api/src/magic_validator.rs` | Production startup wiring: constructs `ChainlinkCloner` and passes it into `ProdInnerChainlink`. | +| `test-integration/test-cloning/` | Integration coverage for account/program cloning behavior, including multi-program and post-delegation action scenarios. | + +Main consumers: + +- `magicblock-api` creates `ChainlinkCloner::new(transaction_scheduler, latest_block)` during validator startup unless Chainlink is disabled for replica mode. +- `magicblock-chainlink` owns clone decisions and invokes this crate through `Arc`-style generic wiring. +- `magicblock-accounts` aliases `ProdChainlink` for scheduled commit / undelegation integration. +- `magicblock-aperture` stores the same production Chainlink alias in shared RPC state; it reaches the cloner indirectly through Chainlink. +- `magicblock-api` and `magicblock-accounts` convert `AccountClonerError` where committor diagnostic helpers are used. + +Important upstream/downstream relationships: + +- Upstream account classification, delegation resolution, post-delegation action validation, and program resolution happen in `magicblock-chainlink`; do not duplicate those policies here. +- Downstream state mutation happens by submitting local transactions to the Magic Program through `magicblock-core`'s transaction scheduler; this crate does not write `AccountsDb` directly. +- Program clone instructions must remain compatible with `programs/magicblock` processors and Solana loader interfaces. 
+ +## Public API shape / Main public types and APIs + +The crate exports: + +```rust +mod account_cloner; +mod util; + +pub use account_cloner::*; +pub use util::derive_buffer_pubkey; + +pub struct ChainlinkCloner { ... } +``` + +### `ChainlinkCloner` + +`ChainlinkCloner` stores: + +- `tx_scheduler: TransactionSchedulerHandle` — internal scheduler used to execute validator-signed local clone transactions. +- `block: LatestBlock` — latest local blockhash/slot source used for transaction signing and post-program-clone readiness waits. + +Public constructor: + +```rust +impl ChainlinkCloner { + pub fn new( + tx_scheduler: TransactionSchedulerHandle, + block: LatestBlock, + ) -> Self +} +``` + +Production wiring in `magicblock-api/src/magic_validator.rs` wraps it in `Arc` and passes it into `InnerChainlinkImpl::try_new_from_endpoints`. + +### `Cloner` trait implementation + +`ChainlinkCloner` implements `magicblock_chainlink::cloner::Cloner`: + +- `evict_account(pubkey)` builds `InstructionUtils::evict_account_instruction(pubkey)` and submits it locally. +- `clone_account(AccountCloneRequest)` builds one or more account clone transactions and returns the last submitted signature. +- `clone_program(LoadedProgram)` builds one or more program-buffer/finalize/deploy transactions and returns the last submitted signature, or `Signature::default()` for retracted programs. + +The trait and request types live in Chainlink. Preserve that boundary: Chainlink supplies validated clone requests; this crate materializes them. + +### Constants and helper exports + +- `MAX_INLINE_DATA_SIZE: usize = 63 * 1024` controls chunk size for inline data payloads. +- Internal `MAX_INLINE_TRANSACTION_SIZE` is `u16::MAX`; transactions larger than this fail preflight with `ClonerError::CloneTransactionTooLarge` before submission. +- `derive_buffer_pubkey(program_pubkey)` derives `Pubkey::find_program_address(&[b"buffer", program_pubkey], &validator_authority_id())` for program clone buffers. 
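The size constants above can be illustrated with a minimal, standalone sketch. It is not the crate's code: `plan_chunks`, `fits_inline`, and `fits_transaction` are hypothetical helpers invented here, and only the two limits (`63 * 1024` bytes per data chunk, `u16::MAX` bytes per transaction) come from this guide. How the real cloner serializes transactions and measures their size is not shown and is assumed to happen elsewhere.

```rust
/// Chunk size taken from this guide: `MAX_INLINE_DATA_SIZE = 63 * 1024`.
const MAX_INLINE_DATA_SIZE: usize = 63 * 1024;

/// Transaction size ceiling taken from this guide (`u16::MAX`).
const MAX_INLINE_TRANSACTION_SIZE: usize = u16::MAX as usize;

/// Hypothetical helper: true when account data is small enough to be
/// attempted as a single inline clone payload.
fn fits_inline(data_len: usize) -> bool {
    data_len <= MAX_INLINE_DATA_SIZE
}

/// Hypothetical helper: true when an already-serialized transaction of
/// `serialized_len` bytes stays within the preflight size limit.
fn fits_transaction(serialized_len: usize) -> bool {
    serialized_len <= MAX_INLINE_TRANSACTION_SIZE
}

/// Hypothetical helper: splits `data_len` bytes into `(offset, len)` pairs,
/// each at most `MAX_INLINE_DATA_SIZE` long. The first pair would correspond
/// to an init chunk and the rest to continuation chunks.
fn plan_chunks(data_len: usize) -> Vec<(usize, usize)> {
    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < data_len {
        // The last chunk may be shorter than the maximum.
        let len = MAX_INLINE_DATA_SIZE.min(data_len - offset);
        chunks.push((offset, len));
        offset += len;
    }
    chunks
}
```

Under these assumptions, an account whose data is exactly `MAX_INLINE_DATA_SIZE` bytes produces one chunk, and one extra byte produces a second, one-byte chunk. The transaction-size check is kept separate from the data-size check because extra instruction payload can push an otherwise small clone over the limit.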
+ +### Error/diagnostic helper API + +`account_cloner.rs` exports: + +- `AccountClonerResult = Result`. +- `AccountClonerError::{RecvError, JoinError, CommittorServiceError}`. +- `map_committor_request_result(res, intent_committor)` which awaits a committor oneshot result and, for TableMania errors with a signature, fetches transaction logs and compute units through `BaseIntentCommittor::get_transaction` and `MagicblockRpcClient` helpers. + +This helper is not part of the primary account clone path, but changing it can affect error observability for account/commit integration flows. + +## Runtime flows + +### Production startup wiring + +```text +magicblock-api startup + -> create TransactionSchedulerHandle and LatestBlock + -> ChainlinkCloner::new(handle, latest_block) + -> ProdInnerChainlink::try_new_from_endpoints(...) + -> Chainlink fetch/clone pipeline calls Cloner methods as needed +``` + +Replica mode disables Chainlink in `magicblock-api`; in that mode `ChainlinkCloner` is not constructed for active base-layer cloning. + +### Regular account clone flow + +1. Chainlink builds an `AccountCloneRequest` containing the target pubkey, resolved `AccountSharedData`, optional post-delegation actions, and delegation metadata. +2. `clone_account` reads the latest blockhash from `LatestBlock`. +3. If `request.account.data().len() <= MAX_INLINE_DATA_SIZE`, it first builds a small clone transaction with `CloneAccount`. +4. If the small transaction fits `MAX_INLINE_TRANSACTION_SIZE`, it is submitted through `send_tx` and the signature is returned. +5. If a nominally small account has post-delegation actions that push the transaction over the size limit, the cloner falls back to the chunked large-account path. +6. Large/chunked accounts use: + 1. `CloneAccountInit` with the first data chunk and full `AccountCloneFields`, + 2. one or more `CloneAccountContinue` instructions for remaining chunks, + 3. 
an additional final empty `CloneAccountContinue` plus post-delegation executor instruction when actions are present. +7. Every chunked transaction is checked with `ensure_transactions_fit` before submission. +8. Transactions are submitted sequentially. On any submission error, the cloner sends `CleanupPartialClone` and returns `FailedToCloneRegularAccount`. +9. The returned signature is the last successfully submitted transaction signature, or default only if no transaction was submitted. + +Pitfalls: + +- `AccountCloneFields` must preserve lamports, owner, executable, delegated, confined, and remote slot from the Chainlink-resolved account. Delegated accounts must continue to be presented locally with the correct owner/flags. +- Post-delegation actions are intentionally included both in clone instructions and in a sibling post-delegation action executor instruction. Keep this aligned with `programs/magicblock/src/clone_account/process_post_delegation_actions.rs`. +- Chunked clone cleanup is best-effort. Errors from `send_cleanup` are logged but do not replace the original clone error. + +### Program clone flow + +```text +LoadedProgram from Chainlink + -> derive validator-owned buffer PDA ["buffer", program_id] + -> clone ELF/program data into buffer (small or chunked) + -> finalize into local program representation + -> loader-specific deploy/authority instructions + -> wait for one local slot before returning +``` + +1. Chainlink resolves a `LoadedProgram` and calls `clone_program`. +2. The cloner builds loader-specific transactions: + - `RemoteProgramLoader::V1` uses `FinalizeV1ProgramFromBuffer` and creates local upgradeable-loader-style program/program-data accounts. + - Other supported loaders use `FinalizeProgramFromBuffer`, `LoaderV4Instruction::Deploy`, and `SetProgramAuthority`. +3. `LoaderV4Status::Retracted` programs are skipped and return `Signature::default()`. +4. 
Program bytes are first cloned into a buffer PDA derived from validator authority and program id. +5. Small programs fit in one transaction containing buffer `CloneAccount` plus finalization instructions. +6. Large programs use `CloneAccountInit`, middle `CloneAccountContinue` chunks, and a final `CloneAccountContinue(is_last=true)` with finalization/deploy/authority instructions. +7. V1 and V4 finalization sequences temporarily disable and then re-enable executable checks via Magic Program instructions because finalization sets executable state. +8. On any transaction submission failure, cleanup targets the buffer PDA and returns `FailedToCloneProgram`. +9. After successful submission, `clone_program` waits until `LatestBlock` advances beyond the current slot before returning so the cloned program can be used. + +Pitfalls: + +- The buffer account is temporary and must stay deterministic; changing `derive_buffer_pubkey` affects cleanup, idempotency, and program clone compatibility. +- V1 programs are assumed immutable after local deployment; the code avoids a full upgrade flow and directly creates/updates program/program-data accounts. +- The one-slot wait is a readiness guarantee for program use. Removing it can introduce races where a just-cloned program is invoked before it is usable. + +### Eviction flow + +1. Chainlink requests eviction through the `Cloner` trait, typically as part of subscription/LRU or account lifecycle handling. +2. `evict_account` builds the Magic Program evict instruction and signs a local transaction with validator authority. +3. The transaction is submitted through the same scheduler path as clones. +4. Errors are wrapped as `ClonerError::FailedToEvictAccount`. + +### Committor diagnostic mapping flow + +1. A caller passes a committor oneshot receiver plus `Arc` to `map_committor_request_result`. +2. Send/receive failures become `AccountClonerError::CommittorServiceError` or `RecvError`. +3. 
Successful committor values are returned unchanged. +4. `CommittorServiceError::TableManiaError` with a signature triggers a transaction lookup for logs and compute units. +5. The final error string includes TableMania debug output plus available CUs/logs. + +This helper performs remote/committor diagnostics and should not be inserted into hot clone submission paths without considering latency impact. + +## Important internals and caveats + +### Transaction signing and submission + +All clone, evict, cleanup, finalize, and deploy transactions are signed with `magicblock_program::validator::validator_authority()`. `send_tx` captures `tx.signatures[0]`, encodes the transaction with `with_encoded`, and submits it through `TransactionSchedulerHandle::execute`. + +Do not bypass the scheduler or write accounts directly from this crate. The local transaction path keeps Magic Program validation, account locking, ledger/status writes, and event emission consistent with normal validator execution. + +### Size limits and chunking + +`MAX_INLINE_DATA_SIZE` is an approximate payload chunk size, not a guarantee that every built transaction fits Solana's serialized transaction limit. The code separately checks serialized transaction size with `bincode::serialized_size` against `u16::MAX`. Post-delegation actions can make a small data payload too large, which is why the small-account path can fall back to chunking. + +### Post-delegation actions + +Delegation actions originate from DLP delegation records and are parsed/validated in Chainlink. This crate only transports and executes them as part of clone finalization. Actions attached to non-delegated or unresolved DLP-owned accounts should be rejected before they reach this crate; if that boundary changes, inspect `magicblock-chainlink/src/fetch_cloner/mod.rs` and its tests. 
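To make the interplay between size limits and post-delegation actions concrete, here is a minimal sketch of the small-versus-chunked decision. `choose_clone_path`, `fits_transaction_limit`, and `ClonePath` are hypothetical names for illustration only; `small_tx_size` stands in for the serialized size the single-transaction clone would have, including any attached actions.

```rust
/// Values as stated in this guide; the real constants live in the crate.
pub const MAX_INLINE_DATA_SIZE: usize = 63 * 1024;
pub const MAX_INLINE_TRANSACTION_SIZE: usize = u16::MAX as usize;

/// Hypothetical label for the two documented clone paths.
#[derive(Debug, PartialEq, Eq)]
pub enum ClonePath {
    Small,
    Chunked,
}

/// Hypothetical preflight check: a transaction passes only if its
/// serialized size does not exceed the documented limit.
pub fn fits_transaction_limit(serialized_size: usize) -> bool {
    serialized_size <= MAX_INLINE_TRANSACTION_SIZE
}

/// Hypothetical decision helper. The small path requires both a small
/// data payload and a small serialized transaction; otherwise the
/// chunked path is used.
pub fn choose_clone_path(data_len: usize, small_tx_size: usize) -> ClonePath {
    if data_len <= MAX_INLINE_DATA_SIZE && fits_transaction_limit(small_tx_size) {
        ClonePath::Small
    } else {
        ClonePath::Chunked
    }
}
```

The second condition is the one that matters for actions: a 1 KiB account with a large action payload can still land on the chunked path.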
+ +### Program loaders + +The cloner treats `RemoteProgramLoader::V1` specially and sends all other non-retracted loaded programs through the V4 deployment path. Loader semantics are resolved in `magicblock-chainlink/src/remote_account_provider/program_account.rs`. If new loader variants or statuses are added, update both crates and this guide. + +### README caveat + +`magicblock-account-cloner/README.md` describes older conceptual components such as separate fetcher, updates, and dumper crates. The current repository places fetch/update/classification behavior primarily in `magicblock-chainlink`; this crate is the transaction-building and local materialization executor behind the Chainlink `Cloner` trait. + +## Important invariants + +1. This crate must not decide which accounts are safe to clone; Chainlink owns classification, delegation-record resolution, and request construction. +2. This crate must not bypass the local transaction scheduler or mutate `AccountsDb` directly. +3. `AccountCloneFields` must faithfully carry lamports, owner, executable, delegated, confined, and remote slot from the resolved account. +4. Large account clones must be completed with `CloneAccountContinue(is_last=true)` or cleaned up with `CleanupPartialClone` on failure. +5. Transaction size preflight must remain in place for chunked transactions, especially when post-delegation actions are present. +6. Post-delegation actions for delegated clones must remain synchronized between clone instructions and post-delegation action executor instructions. +7. Program clone buffer PDAs must remain deterministic and cleanup-compatible. +8. Program finalization must preserve loader-specific authority and deployment semantics. +9. Retracted programs must not be deployed locally as usable programs. +10. Program cloning must preserve the wait-until-next-slot readiness behavior unless a replacement readiness guarantee is implemented. +11. 
Any new work added to account/program clone paths must be bounded and avoid unnecessary allocations, serialization, logging, or scheduler submissions. + +## Common change areas and what to inspect + +### Changing regular account clone behavior + +Inspect first: + +- `magicblock-account-cloner/src/lib.rs` methods `clone_account`, `build_small_account_tx`, `build_large_account_txs`, `clone_fields`, and cleanup helpers; +- `magicblock-chainlink/src/cloner/mod.rs` for request/trait shape; +- `magicblock-chainlink/src/fetch_cloner/` for request construction and delegation-action validation; +- `programs/magicblock/src/clone_account/` for instruction processor expectations; +- unit tests in `magicblock-account-cloner/src/lib.rs`. + +Risks: + +- transaction-size regressions with large action payloads; +- partial clone state left behind after failures; +- delegated/confined/remote-slot flags diverging from Chainlink's resolved account state. + +### Changing post-delegation action handling + +Inspect first: + +- `build_small_account_tx` and `build_large_account_txs`; +- `programs/magicblock/src/clone_account/process_post_delegation_actions.rs`; +- `magicblock-chainlink/src/fetch_cloner/delegation.rs` and action dependency validation in `fetch_cloner/mod.rs`; +- `test-integration/test-cloning/tests/10_post_delegation_token_transfer.rs`. + +Risks: + +- executing actions before the cloned account is fully materialized; +- allowing actions on non-delegated or unresolved accounts; +- producing transactions too large to execute. 
+ +### Changing program clone or loader behavior + +Inspect first: + +- `build_program_txs`, `build_v1_program_txs`, `build_v4_program_txs`, `build_program_txs_from_finalize`, and `build_large_program_txs`; +- `derive_buffer_pubkey` in `src/util.rs`; +- `magicblock-chainlink/src/remote_account_provider/program_account.rs`; +- Magic Program finalizers in `programs/magicblock/src/clone_account/`; +- loader API dependencies in `Cargo.toml`; +- `test-integration/test-cloning/tests/08_multi_program_cloning.rs`. + +Risks: + +- ABI/loader mismatches; +- executable checks left disabled after an error; +- program authority not matching the base-layer authority; +- program becoming visible before it is usable. + +### Changing startup or Chainlink wiring + +Inspect first: + +- `magicblock-api/src/magic_validator.rs::init_chainlink`; +- type aliases `InnerChainlinkImpl` / `ChainlinkImpl` in `magicblock-api` and `magicblock-accounts`; +- `magicblock-chainlink` production aliases and disabled replication mode behavior. + +Risks: + +- constructing cloners in replica mode where Chainlink should be disabled; +- using a stale `LatestBlock` or wrong scheduler handle; +- changing account availability behavior for RPC and transaction submission. + +### Changing diagnostic/error handling helpers + +Inspect first: + +- `magicblock-account-cloner/src/account_cloner.rs`; +- `magicblock-account-cloner/src/util.rs::get_tx_diagnostics`; +- `magicblock-committor-service` error types; +- `magicblock-rpc-client` transaction log/CU helpers; +- `magicblock-api/src/errors.rs` and `magicblock-accounts/src/errors.rs` conversions. + +Risks: + +- hiding commit/table errors needed for operator debugging; +- adding slow diagnostics to latency-sensitive clone paths; +- changing public error strings used by tests or logs. 
+ +## Tests and validation + +For documentation-only changes involving this guide: + +```bash +ls .agents/context/crates/magicblock-account-cloner.md +rg "magicblock-account-cloner.md" AGENTS.md .agents/context/crate-map.md +``` + +For Rust changes in this crate, run at minimum: + +```bash +cargo fmt +cargo clippy -p magicblock-account-cloner --all-targets -- -D warnings +cargo nextest run -p magicblock-account-cloner +``` + +For changes affecting Chainlink request construction or account availability, also run focused Chainlink checks: + +```bash +cargo nextest run -p magicblock-chainlink +``` + +For integration behavior, especially account/program clone flows or post-delegation actions, use the cloning suite: + +```bash +cd test-integration +make test-cloning +``` + +Useful focused integration areas include: + +- `test-integration/test-cloning/tests/05_parallel-cloning.rs` for concurrent clone pressure, +- `test-integration/test-cloning/tests/08_multi_program_cloning.rs` for program cloning, +- `test-integration/test-cloning/tests/10_post_delegation_token_transfer.rs` for post-delegation actions. + +Broader baseline validation remains the repository standard from `.agents/rules/testing-and-validation.md`: + +```bash +cargo fmt +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance validation expectations: + +- Documentation-only changes have no runtime performance impact. +- Account/program clone changes should report whether they add transactions, serialization, allocations, logging, scheduler waits, cleanup work, or remote diagnostics to clone hot paths. +- Changes to parallel clone behavior, chunk sizing, or program loading should include a focused cloning integration check or a reason why it was not run. + +## Related docs + +- `AGENTS.md` — required agent workflow and documentation-memory rules. +- `.agents/context/overview.md` — validator runtime model and core concepts. 
+- `.agents/specs/validator-specification.md` — account cloning, delegated account representation, program cloning, and Magic Program clone instruction notes. +- `.agents/context/architecture.md` — account synchronization layer and service boundaries. +- `.agents/context/crate-map.md` — crate ownership map and pointer back to this guide. +- `.agents/rules/testing-and-validation.md` — repository validation commands and integration-test workflow. +- `.agents/context/crates/magicblock-chainlink.md` — Chainlink-side account synchronization, request construction, delegation, subscription, and fetch/clone pipeline guidance. +- `magicblock-account-cloner/README.md` — high-level cloning overview; verify historical statements against current source. +- `magicblock-chainlink/src/cloner/mod.rs` — trait and request boundary implemented by this crate. +- `programs/magicblock/src/clone_account/` — Magic Program clone processors that consume instructions built here. +- `magicblock-magic-program-api/src/instruction.rs` — clone/finalize instruction wire types. +- `test-integration/test-cloning/` — integration tests for cloning behavior. diff --git a/.agents/context/crates/magicblock-accounts-db.md b/.agents/context/crates/magicblock-accounts-db.md new file mode 100644 index 000000000..12725c1f6 --- /dev/null +++ b/.agents/context/crates/magicblock-accounts-db.md @@ -0,0 +1,315 @@ +# `magicblock-accounts-db` + +## Purpose + +`magicblock-accounts-db` is the validator's local account storage crate. It backs execution, RPC reads, account synchronization, replication, and tooling with a persistent, memory-mapped account store plus LMDB secondary indexes. 
+ +At a high level it: + +- stores `AccountSharedData` records in an append-oriented mmap file at `/accountsdb/main/accounts.db`, +- indexes accounts by pubkey, owner/program, allocation size, and previous owner metadata in LMDB, +- returns borrowed account views directly from mmap for low-allocation hot-path reads, +- supports atomic batch insertion, account removal, owner-index maintenance, program-account scanning, and repeated-read helpers, +- creates/restores/prunes compressed account snapshots used for rollback and replication bootstrap, +- supports startup maintenance such as stale-bank reset and optional best-effort defragmentation. + +This crate is on multiple performance-sensitive paths: transaction execution account loads/commits, RPC `getAccountInfo`/program-account reads, account cloning, ledger replay, replication snapshot/reset flows, and startup/shutdown persistence. Changes must preserve low-latency indexed lookups, low-allocation mmap reads, bounded LMDB transactions, and exclusive-access requirements around maintenance operations. Do not add avoidable deserialization, cloning, blocking I/O, heavy logging, or long-held locks to account read/write hot paths. + +## Update requirement + +Whenever an agent changes behavior in `magicblock-accounts-db`, or changes another crate in a way that changes AccountsDb flows/contracts, this document must be updated in the same change. 
Update it for changes to: + +- storage layout, mmap header fields, block sizing, allocation/recycling, or serialized account representation, +- LMDB tables, key/value encodings, owner/program indexes, or cursor/iterator lifetime handling, +- public APIs such as `AccountsDb`, `AccountsReader`, `AccountsScanner`, `AccountsBank`, or `AccountsDbError`, +- snapshot creation, archive validation, external snapshot insertion, rollback, checksum, or pruning behavior, +- startup reset/defragmentation rules and protected account lists, +- config fields consumed from `magicblock-config::config::AccountsDbConfig`, +- safety requirements for borrowed mmap account data, snapshots, checksums, and defragmentation, +- tests or validation commands relevant to this crate, +- performance characteristics of execution/RPC/account-sync/storage hot paths. + +## Where it sits in the repository + +Primary source files: + +| Path | Role | +|---|---| +| `magicblock-accounts-db/src/lib.rs` | Main `AccountsDb` facade, public account APIs, batch insert/rollback, snapshots, reset, defragmentation, checksum, `AccountsReader`, and `AccountsScanner`. | +| `src/storage.rs` | Memory-mapped `accounts.db` file, `StorageHeader`, block allocation, account serialization/deserialization, flush/reload, and defragmentation byte moves. | +| `src/index.rs` | LMDB index manager for pubkey lookup, owner/program scans, owner-change reconciliation, deallocation tracking, and account-move application. | +| `src/index/table.rs` | Thin typed wrapper around LMDB database operations. | +| `src/index/iterator.rs` | LMDB cursor/transaction-backed offset/pubkey iterator for all-account and program-account scans. | +| `src/index/utils.rs` | LMDB environment flags and `AccountOffsetFinder` used by repeated-read `AccountsReader`. | +| `src/snapshot.rs` | Snapshot strategy detection, archive creation/registration, external snapshot insertion, rollback extraction, and pruning. 
| +| `src/reset.rs` | Protected sysvar/native/Magic Program accounts and startup stale-bank reset allowlist. | +| `src/traits.rs` | `AccountsBank` trait used by chainlink, processor, tests, and mock banks. | +| `src/error.rs` | `AccountsDbError` and logging helper trait. | +| `src/tests.rs`, `src/index/tests.rs` | Unit coverage for account storage, owner indexes, snapshots, rollback, defragmentation, checksum, reset, and index allocation behavior. | +| `magicblock-accounts-db/README.md` | Existing crate overview and safety warning for borrowed mmap state. | + +Main consumers: + +- `magicblock-api` constructs `AccountsDb` during validator startup, may insert external snapshots from replication, optionally defragments on startup, and resets stale accounts before serving work. +- `magicblock-processor` uses `AccountsBank::get_account` for SVM callbacks, writes execution results, and pauses executors before unsafe snapshot creation. +- `magicblock-aperture` serves RPC reads and transaction submission paths from local account state. +- `magicblock-chainlink` and `magicblock-account-cloner` read/write local clones and evict/update accounts through `AccountsBank`. +- `magicblock-replicator` snapshots, resets, and replays state for replica mode. +- `test-kit`, integration tests, and tools such as `tools/ledger-stats` open/read the database for harnesses and inspection. + +Important upstream dependencies: + +- `magicblock-config::config::AccountsDbConfig` supplies `database_size`, `block_size`, `index_size`, `max_snapshots`, `defragment_on_startup`, and `reset`. +- `solana_account::AccountSharedData` supplies the owned/borrowed account representation and MagicBlock flags (`delegated`, `ephemeral`, `undelegating`, `confined`, owner-changed tracking). +- `magicblock-magic-program-api` and Solana SDK IDs define protected accounts kept during startup reset. 
+ +## Public API shape / Main public types and APIs + +### `AccountsDb` + +`AccountsDb` coordinates three subsystems: `AccountsStorage`, `AccountsDbIndex`, and `SnapshotManager`. + +Important constructors and helpers: + +- `AccountsDb::new(config, root_dir, max_slot)`: opens or creates `/accountsdb/main`, honors `config.reset`, initializes storage/index/snapshots, and calls `restore_state_if_needed(max_slot)`. +- `AccountsDb::open(directory)`: tooling/test helper using default config. +- `database_directory()`: returns the snapshots directory (the parent of `main`). + +Important account APIs: + +- `insert_account(pubkey, account)`: upserts one account and commits the LMDB transaction. +- `insert_batch(accounts)`: upserts many accounts and rolls back committed `AccountSharedData` values on failure via `unsafe { rollback() }`. +- `get_account(pubkey)`: provided by `AccountsBank`; returns an `AccountSharedData` that is often borrowed from mmap. +- `remove_account(pubkey)`, `remove_where(predicate)`, `contains_account(pubkey)`, `account_count()`. +- `get_program_accounts(program, filter)`: scans the owner/program index and reads matching accounts without a full-db deserialization pass. +- `account_matches_owners(account, owners)`: checks owner bytes directly through the account buffer. +- `reader()`: creates an `AccountsReader` with one LMDB read transaction/cursor for repeated lookups. +- `iter_all()`: iterates all indexed accounts by LMDB account table order. + +Lifecycle and maintenance APIs: + +- `slot()` / `set_slot(slot)`: read/update the slot in the mmap header. +- `unsafe take_snapshot(slot) -> checksum`: flushes storage/index, computes checksum, copies/reflinks active state, and spawns background archive registration. +- `restore_state_if_needed(target_slot)`: rolls back to the nearest snapshot when current state is ahead of the target. 
+- `insert_external_snapshot(slot, archive_bytes)`: registers or fast-forwards from a network-provided snapshot only when current DB slot is `0`. +- `reset_bank(validator_id)`: removes stale non-delegated/non-protected accounts after startup/replay. +- `unsafe defragment()`: compacts live allocations leftward and clears deallocation holes. This is best-effort and not crash-recoverable. +- `unsafe checksum()`: hashes active delegated/ephemeral/undelegating/confined borrowed accounts in key-sorted order. +- `flush()` and `storage_size()`. + +### `AccountsBank` trait + +`src/traits.rs` defines the narrow bank interface consumed by account sync and execution code: + +```rust +pub trait AccountsBank: Send + Sync + 'static { + fn get_account(&self, pubkey: &Pubkey) -> Option; + fn remove_account(&self, pubkey: &Pubkey); + fn remove_where( + &self, + predicate: impl FnMut(&Pubkey, &AccountSharedData) -> bool, + ) -> AccountsDbResult; +} +``` + +Keep this trait small. It is implemented by production `AccountsDb` and test stubs; broadening it can force account-sync, processor, and test crates to grow storage-specific knowledge. + +### `AccountsReader` and `AccountsScanner` + +- `AccountsReader::read(pubkey, reader)` and `contains(pubkey)` reuse one LMDB cursor/transaction for repeated reads. It is marked `Send`/`Sync`; cursor/transaction lifetime and drop-order assumptions in `index/utils.rs` must remain valid. +- `AccountsScanner` is the iterator returned by `get_program_accounts`; it holds an index iterator and reads accounts lazily from storage, applying the provided filter. + +### Errors + +`AccountsDbError` normalizes IO, LMDB, missing-snapshot, and internal errors. `lmdb::Error::NotFound` maps to `AccountsDbError::NotFound`; callers often rely on missing accounts not being fatal. + +## Runtime flows + +### Normal account upsert/read flow + +1. A consumer calls `insert_account` or `insert_batch` with `AccountSharedData`. +2. 
`AccountsDb::upsert` opens/reuses one LMDB write transaction. +3. Closed ephemeral accounts (`account.ephemeral() && owner == Pubkey::default()`) are removed from indexes instead of written. +4. Borrowed accounts are committed in place. If the owner changed, `AccountsDbIndex::ensure_correct_owner` repairs owner/program secondary indexes first. +5. Owned accounts are serialized to a recycled hole when possible, otherwise to a new mmap allocation from `AccountsStorage::allocate`. +6. `AccountsDbIndex::upsert_account` writes pubkey -> allocation, owner -> `(offset, pubkey)`, owner metadata, and records old allocations as recyclable holes. +7. Reads use the pubkey index to resolve an offset and `AccountsStorage::read_account` to deserialize a borrowed account from mmap. + +Pitfalls: + +- Borrowed account data may point directly into mmap. Do not hold borrowed accounts across concurrent writes/maintenance unless account locking or explicit exclusive access makes it safe. +- Owner changes must update both `owners` and `programs` indexes; otherwise `get_program_accounts` and `account_matches_owners` diverge from `get_account`. +- `insert_batch` expects the iterator to be cloneable so it can rollback previously committed accounts on failure. + +### Program-account scan flow + +1. `get_program_accounts(program, filter)` opens an LMDB read transaction and creates an `OffsetPubkeyIter` over duplicate values in the `programs` table. +2. Each iterator item yields `(offset, pubkey)` encoded in the program index. +3. `AccountsScanner::next` reads the account at that offset and applies the caller filter. + +This is an RPC/account-query hot path. Avoid changing it to full-db scans or per-account LMDB transactions unless there is no safer option. 
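The lazy shape of the program-account scan above can be sketched with standard-library types only. Everything here is a simplified stand-in: `Pubkey`, `Account`, and `Scanner` are hypothetical, a `HashMap` plays the role of mmap storage keyed by offset, and a plain iterator plays the role of the LMDB index cursor.

```rust
use std::collections::HashMap;

/// Stand-in for a 32-byte public key (hypothetical, not the Solana type).
pub type Pubkey = [u8; 32];

/// Stand-in for a stored account (hypothetical, heavily simplified).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Account {
    pub owner: Pubkey,
    pub lamports: u64,
}

/// Lazy scanner: walks `(offset, pubkey)` index entries and only reads
/// an account from storage when the iterator is advanced.
pub struct Scanner<'a, I, F> {
    index: I,
    storage: &'a HashMap<u64, Account>,
    filter: F,
}

impl<'a, I, F> Scanner<'a, I, F>
where
    I: Iterator<Item = (u64, Pubkey)>,
    F: FnMut(&Account) -> bool,
{
    pub fn new(index: I, storage: &'a HashMap<u64, Account>, filter: F) -> Self {
        Self { index, storage, filter }
    }
}

impl<'a, I, F> Iterator for Scanner<'a, I, F>
where
    I: Iterator<Item = (u64, Pubkey)>,
    F: FnMut(&Account) -> bool,
{
    type Item = (Pubkey, &'a Account);

    fn next(&mut self) -> Option<Self::Item> {
        let storage = self.storage;
        for (offset, pubkey) in self.index.by_ref() {
            // Resolve the offset, then apply the caller-provided filter.
            if let Some(account) = storage.get(&offset) {
                if (self.filter)(account) {
                    return Some((pubkey, account));
                }
            }
        }
        None
    }
}
```

The point of the design is that no account is read until the caller pulls the next item, so a caller that stops early never pays for the rest of the index.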
+ +### Snapshot and rollback flow + +```text +processor superblock / replication bootstrap + -> pause execution or otherwise guarantee no concurrent state transitions + -> unsafe AccountsDb::take_snapshot(slot) + -> flush mmap + LMDB + -> unsafe checksum over active ER-state accounts + -> create snapshot directory using reflink when available, legacy deep copy otherwise + -> background thread archives snapshot-.tar.gz and registers it +``` + +Rollback (`restore_state_if_needed`) chooses the nearest snapshot at or before the target slot, extracts it, atomically swaps it into `main`, prunes invalidated newer snapshots, and reloads storage/index handles. + +Pitfalls: + +- `take_snapshot` and `checksum` are `unsafe` because callers must guarantee state cannot change while the snapshot/checksum observes storage. `magicblock-processor` pauses executors before snapshotting. +- Snapshot archiving runs in a background thread. Tests wait for `snapshot_exists`; production callers must not assume the archive is registered synchronously after `take_snapshot` returns. +- Archive validation only checks that bytes are a valid gzip tar; it does not prove the contained accounts DB is semantically valid. +- `insert_external_snapshot` fast-forwards only for an uninitialized DB (`current_slot == 0`) and refuses duplicate archive slots. + +### Startup reset and defragmentation flow + +1. `magicblock-api` initializes ledger, opens `AccountsDb`, replays ledger if needed, then optionally calls `unsafe defragment()` before enabling normal scheduler/replication/tick/task work. +2. Non-replica modes call `reset_bank(validator_id)` after replay to remove stale ordinary accounts while preserving delegated, undelegating, ephemeral, confined, feature-owned, sysvar/native, Magic Program, validator identity, and other protected accounts. +3. Primary mode sends a replication reset message so replicas perform reset in stream order. 
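The preservation rule in step 2 above amounts to a single predicate over account properties. The sketch below is an assumption-laden simplification: `AccountTraits` and `survives_reset` are hypothetical names, and the `protected` flag lumps together sysvar/native, Magic Program, validator identity, and other protected accounts that the real code identifies by pubkey or owner.

```rust
/// Hypothetical summary of the properties that matter for startup reset.
#[derive(Debug, Default, Clone, Copy)]
pub struct AccountTraits {
    pub delegated: bool,
    pub undelegating: bool,
    pub ephemeral: bool,
    pub confined: bool,
    pub feature_owned: bool,
    /// Sysvar/native, Magic Program, validator identity, and similar.
    pub protected: bool,
}

/// Returns true when the account should be kept by a startup reset.
/// Ordinary accounts with none of these properties are treated as stale.
pub fn survives_reset(t: &AccountTraits) -> bool {
    t.delegated
        || t.undelegating
        || t.ephemeral
        || t.confined
        || t.feature_owned
        || t.protected
}
```

An account with every flag unset is the only case this predicate removes, which matches the "stale ordinary accounts" wording above.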
+ +Defragmentation updates indexes before moving bytes, then copies live allocations leftward, zeroes the old tail, flushes storage and index, and clears the deallocation table. + +Pitfalls: + +- Defragmentation is not crash-recoverable; interruption after index updates can make storage inconsistent. +- Defragmentation must not run while any reader/writer or borrowed account reference is live. +- `reset_bank` is a lifecycle operation, not a generic garbage collector. Do not remove protected or lifecycle-marked accounts casually. + +## Important internals and caveats + +### Memory-mapped storage layout + +`accounts.db` starts with a 256-byte `StorageHeader` containing atomic `write_cursor`, `slot`, `block_size`, `capacity_blocks`, and `recycled_count`. Account bytes begin immediately after the header. `block_size` is fixed when the database is created and must be one of `Block128`, `Block256`, or `Block512`. + +Allocation uses `AtomicU64::fetch_add` on the write cursor. If capacity is exceeded the cursor is best-effort rolled back with `compare_exchange`; callers must still treat `Database full` as a hard insert failure. + +### LMDB indexes + +`AccountsDbIndex` owns four LMDB tables: + +| Table | Key | Value | Purpose | +|---|---|---|---| +| `accounts-idx` | account pubkey | `(offset, blocks)` | Fast account lookup. | +| `programs-idx` | owner pubkey | duplicate `(offset, pubkey)` | Program-account scans. | +| `deallocations-idx` | block count | duplicate `(offset, blocks)` | Best-fit-ish allocation recycling and split holes. | +| `owners-idx` | account pubkey | owner pubkey | Owner-change cleanup for `programs-idx`. | + +The byte packing macro uses unaligned reads/writes for compactness. Any layout change is a persistence compatibility change and must update tests and migration/restore expectations. 
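The cursor-based allocation described under the memory-mapped storage layout can be sketched as a bump allocator with a best-effort rollback. This is an illustration under stated assumptions, not the crate's code: `BumpAllocator` is a hypothetical type, the cursor is assumed to count blocks, and the memory orderings are chosen for simplicity.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the cursor fields of the storage header.
pub struct BumpAllocator {
    write_cursor: AtomicU64,
    capacity_blocks: u64,
}

impl BumpAllocator {
    pub fn new(capacity_blocks: u64) -> Self {
        Self { write_cursor: AtomicU64::new(0), capacity_blocks }
    }

    /// Reserves `blocks` blocks and returns the starting block index,
    /// or `None` when the database is full.
    pub fn allocate(&self, blocks: u64) -> Option<u64> {
        let start = self.write_cursor.fetch_add(blocks, Ordering::AcqRel);
        match start.checked_add(blocks) {
            Some(end) if end <= self.capacity_blocks => Some(start),
            _ => {
                // Best-effort rollback: this only succeeds if no other
                // allocation moved the cursor after our fetch_add.
                let bumped = start.wrapping_add(blocks);
                let _ = self.write_cursor.compare_exchange(
                    bumped,
                    start,
                    Ordering::AcqRel,
                    Ordering::Acquire,
                );
                None
            }
        }
    }

    pub fn cursor(&self) -> u64 {
        self.write_cursor.load(Ordering::Acquire)
    }
}
```

Because the rollback can lose a race with a concurrent allocation, a failed `allocate` is still a hard failure for the caller, as the text above notes.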
+ +### Self-referential cursor wrappers + +`OffsetPubkeyIter` and `AccountOffsetFinder` store LMDB transactions and cursors in one struct using `unsafe transmute` plus field-drop-order assumptions. If these structs are edited, preserve the invariant that iter/cursor drops before the transaction it borrows from. + +### Snapshot strategy + +`SnapshotManager` detects filesystem reflink support once and prefers CoW directory copies. On filesystems without reflink, legacy copy captures active mmap bytes into memory for `accounts.db` to avoid copying stale on-disk bytes. Large active databases can make this path memory- and I/O-heavy; report this risk for snapshot changes. + +## Important invariants + +1. **Borrowed mmap account safety:** borrowed `AccountSharedData` must not be held across concurrent mutation, reset, snapshot checksum, or defragmentation unless external synchronization guarantees exclusivity. +2. **Index/storage consistency:** every live account in `accounts-idx` must point to a valid storage allocation, have an `owners-idx` entry, and have exactly one matching owner entry in `programs-idx`. +3. **Owner changes must repair secondary indexes:** updates where `account.owner_changed()` is true must call `ensure_correct_owner` before committing borrowed data. +4. **Closed ephemeral accounts are removed:** an ephemeral account with default owner represents closure and must be removed from indexes rather than serialized as live state. +5. **Snapshots/checksums require quiescence:** callers of `take_snapshot` and `checksum` must ensure no concurrent state transitions mutate accountsdb. +6. **Defragmentation requires exclusive access and is best-effort:** do not run it after scheduler/RPC/replication work can hold accounts; do not treat it as an atomic migration. +7. 
**Startup reset must preserve lifecycle/protected accounts:** delegated, undelegating, ephemeral, confined, feature-owned, sysvar/native, Magic Program, and validator identity accounts must survive reset. +8. **External snapshots do not overwrite initialized databases:** `insert_external_snapshot` must not replace a DB whose slot is already greater than zero. +9. **Hot-path reads must remain indexed and low allocation:** avoid full scans, repeated LMDB transaction setup inside loops, unnecessary owned-account conversion, or serialization in transaction/RPC read paths. +10. **Persistence layout changes are compatibility-sensitive:** storage header fields, account serialization assumptions, snapshot archive shape, and LMDB table encodings affect restart, rollback, replicas, and tools. + +## Common change areas and what to inspect + +### Account read/write behavior + +Start with `src/lib.rs::upsert`, `AccountsBank for AccountsDb`, `src/storage.rs::read_account`, and `src/index.rs::upsert_account`. Inspect processor commit paths and chainlink/cloner callers if semantics change. Verify owner indexes, ephemeral closure, allocation recycling, and rollback behavior. + +### Program-account/RPC scan behavior + +Start with `get_program_accounts`, `AccountsScanner`, `src/index/iterator.rs`, and `programs-idx` updates. Check `magicblock-aperture` RPC callers and tests that rely on owner filtering. Preserve lazy indexed scans. + +### Snapshot, rollback, and replication bootstrap + +Start with `src/snapshot.rs`, `AccountsDb::take_snapshot`, `restore_state_if_needed`, `insert_external_snapshot`, and `magicblock-processor/src/scheduler/mod.rs::handle_superblock`. Check `magicblock-api` startup snapshot insertion for replicas. Validate archive registration timing and rollback pruning. + +### Startup cleanup and maintenance + +Start with `reset_bank`, `src/reset.rs`, `defragment`, and `magicblock-api/src/magic_validator.rs::start`. Preserve protected accounts and lifecycle flags. 
Any new Magic Program/sysvar/native account that must survive reset belongs in `protected_accounts` and tests. + +### Config changes + +Start with `magicblock-config/src/config/accounts.rs`, config tests, and `AccountsDb::new`. Changing `database_size`, `block_size`, `index_size`, `max_snapshots`, `defragment_on_startup`, or `reset` behavior affects startup and persistence. Keep TOML/env naming aligned with `serde(rename_all = "kebab-case")` and existing config tests. + +### Unsafe/lifetime code + +Start with `src/storage.rs`, `src/index/iterator.rs`, `src/index/utils.rs`, `AccountSharedData::{serialize_to_mmap, deserialize_from_mmap}`, `take_snapshot`, `checksum`, and `defragment`. Require narrow reasoning and targeted tests for any unsafe edit. + +## Tests and validation + +For documentation-only changes, verify paths and cross-references are correct. For this guide, useful checks are: + +```bash +test -f .agents/context/crates/magicblock-accounts-db.md +grep -n "magicblock-accounts-db.md" AGENTS.md .agents/context/crate-map.md +``` + +For code changes in this crate, run targeted checks first: + +```bash +cargo fmt +cargo nextest run -p magicblock-accounts-db +``` + +If `cargo nextest` is unavailable, use: + +```bash +cargo test -p magicblock-accounts-db +``` + +For changes touching config parsing, also run: + +```bash +cargo nextest run -p magicblock-config accountsdb +``` + +For changes touching execution commits, snapshots, reset, or replication lifecycle, add focused consumer checks such as: + +```bash +cargo nextest run -p magicblock-processor +cd test-integration && make test-restore-ledger +``` + +Before handing off any Rust change, follow the workspace baseline in `.agents/rules/testing-and-validation.md` when practical: + +```bash +cargo fmt +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance-sensitive expectations: + +- For read/write/index changes, reason about allocations, LMDB transaction count, 
lock/transaction lifetime, and mmap deserialization. Run a focused benchmark or representative test if available; otherwise report that performance was not measured. +- For snapshot/defragment changes, report I/O, memory, and startup/shutdown impact, especially on non-reflink filesystems. +- For security/correctness, explicitly confirm signer/authority behavior is untouched, base-layer sync semantics are not weakened, and no new attacker-triggerable stall/resource-exhaustion path was introduced through untrusted RPC/transaction-triggered account operations. + +## Related docs + +- `AGENTS.md` for required agent guidance and documentation update rules. +- `.agents/context/overview.md` for validator runtime context. +- `.agents/rules/validator-goals.md` for security, correctness, performance, and persistence goals. +- `.agents/specs/validator-specification.md` for account cloning, execution, snapshots/recovery, and writable-account invariants. +- `.agents/context/architecture.md` for local persistence and execution/account-sync boundaries. +- `.agents/context/crate-map.md` for crate ownership and consumers. +- `.agents/rules/testing-and-validation.md` for repository validation workflow. +- `magicblock-accounts-db/README.md` for the crate's existing overview and mmap borrowed-state warning. +- `magicblock-config/src/config/accounts.rs` for accountsdb config fields. +- `magicblock-api/src/magic_validator.rs` and `magicblock-processor/src/scheduler/mod.rs` for startup, reset, defragmentation, and snapshot call sites. diff --git a/.agents/context/crates/magicblock-accounts.md b/.agents/context/crates/magicblock-accounts.md new file mode 100644 index 000000000..28f2fd824 --- /dev/null +++ b/.agents/context/crates/magicblock-accounts.md @@ -0,0 +1,409 @@ +# `magicblock-accounts` + +## Purpose + +`magicblock-accounts` is the validator-side glue between Magic Program scheduled intents, Chainlink account lifecycle tracking, and the committor service. 
In the current implementation it does **not** own the account-cloning manager described by its historical README; account fetching/cloning is owned by `magicblock-chainlink` and `magicblock-account-cloner`. This crate's active runtime responsibility is scheduled commit processing. + +At a high level it: + +- exposes the `ScheduledCommitsProcessor` trait used by `magicblock-api`'s slot ticker; +- implements `ScheduledCommitsProcessorImpl`, which drains accepted `ScheduledIntentBundle`s from the Magic Program global transaction scheduler; +- forwards scheduled and recovered intent bundles to `magicblock-committor-service`; +- tracks per-intent metadata needed to signal `ScheduledCommitSent` back into local validator execution after base-layer intent execution finishes; +- notifies Chainlink when accounts are about to be undelegated so base-layer undelegation completion can be watched; +- starts a one-shot recovery pass for persisted pending intent bundles after ledger replay and account-bank reset. + +This crate sits on the commit/undelegation settlement path and interacts with the transaction scheduler. It is not part of ordinary SVM transaction execution, but its ticker-triggered work can affect slot progression, local commit lifecycle state, Chainlink subscription state, and committor throughput. Avoid adding blocking work, unbounded locks, duplicate intent scheduling, or heavy per-intent processing without an explicit performance tradeoff. + +## Update requirement + +Update this document in the same change whenever behavior in `magicblock-accounts` changes, or when another crate changes the flows/contracts consumed here. This file is useful only if it reflects the current implementation. 
+ +Update it for changes to: + +- `ScheduledCommitsProcessor` trait methods or `ScheduledCommitsProcessorImpl::new` wiring; +- scheduled intent draining, metadata tracking, result processing, or `SentCommit` construction; +- pending intent recovery timing, delegation checks, or recovered scheduling behavior; +- Chainlink undelegation notification behavior; +- Magic Program scheduled intent types, `TransactionScheduler` global state, `ScheduledCommitSent`, or `SentCommit` fields; +- committor result subscription, broadcast error handling, patched-error/callback-report handling, or execution-output variants; +- startup/shutdown ordering in `magicblock-api` that affects this processor; +- validation commands or integration suites relevant to schedule intents, commits, or undelegation; +- performance characteristics of scheduled commit processing, result handling, or recovery. + +## Where it sits in the repository + +Primary files and nearby contracts: + +| Path | Role | +|---|---| +| `magicblock-accounts/Cargo.toml` | Crate dependencies. Pulls in Chainlink/cloner aliases, committor service, core scheduler types, metrics, and Magic Program scheduled intent types. | +| `magicblock-accounts/README.md` | Historical notes about an `AccountsManager`/`ensure_accounts` design. Treat source and this guide as canonical for current behavior. | +| `magicblock-accounts/src/lib.rs` | Public exports: `config::*`, `traits::*`, `errors`, and `scheduled_commits_processor`. | +| `magicblock-accounts/src/config.rs` | Defines a local `LifecycleMode` enum and `requires_ephemeral_validation`; currently no tracked consumer uses this type. Do not confuse it with `magicblock-config::config::LifecycleMode`, which is the validator config type in use. | +| `magicblock-accounts/src/traits.rs` | Defines the `ScheduledCommitsProcessor` async trait consumed by the API slot ticker. 
| +| `magicblock-accounts/src/scheduled_commits_processor.rs` | Main implementation: scheduled intent draining, committor scheduling, pending recovery, Chainlink undelegation notifications, result subscription loop, and `SentCommit` signaling. | +| `magicblock-accounts/src/errors.rs` | Error enums and result aliases for scheduled commit processing plus older account/committor error variants. | +| `magicblock-api/src/magic_validator.rs` | Production wiring. Constructs `ScheduledCommitsProcessorImpl`, starts recovery after replay/reset, passes it to the slot ticker, and stops it before stopping the committor service. | +| `magicblock-api/src/tickers.rs` | Slot ticker checks `MagicContext::has_scheduled_commits`, executes `AcceptScheduleCommits`, then calls `ScheduledCommitsProcessor::process`. | +| `programs/magicblock/src/magic_context.rs` | `MagicContext` storage and `has_scheduled_commits` fast check used by the slot ticker. | +| `programs/magicblock/src/schedule_transactions/process_accept_scheduled_commits.rs` | Magic Program instruction that moves scheduled intents from `MagicContext` into global `TransactionScheduler` state. | +| `programs/magicblock/src/magic_scheduled_base_intent.rs` | Defines `ScheduledIntentBundle`, `MagicIntentBundle`, and helpers such as `get_all_committed_pubkeys` / `has_undelegate_intent`. | +| `magicblock-committor-service/src/service.rs` | `BaseIntentCommittor` trait and `CommittorService` oneshot APIs used by this crate. | +| `magicblock-committor-service/src/committor_processor.rs` | Persists and schedules normal intents; loads pending intent bundles for recovery; schedules recovered bundles without re-persisting. | +| `test-integration/test-schedule-intent/` and `test-integration/schedulecommit/` | Integration coverage for schedule intent, commit, and commit-and-undelegate behavior. | + +Main consumers: + +- `magicblock-api` is the only production consumer of `ScheduledCommitsProcessorImpl`. 
+- `magicblock-api::tickers` depends on the `ScheduledCommitsProcessor` trait rather than the concrete implementation. +- The committor service is the downstream executor for base-layer intents. +- Chainlink is notified when accounts are entering undelegation so it can watch/update base-layer state. +- The Magic Program is both upstream (schedules/accepts intents) and downstream (receives local `ScheduledCommitSent` signals). + +Important boundaries: + +- This crate does not fetch or clone transaction accounts for ordinary execution; use `magicblock-chainlink` / `magicblock-account-cloner` for account availability changes. +- This crate does not build base-layer commit transactions; that belongs to `magicblock-committor-service`. +- This crate does not enforce final SVM writable-account access rules; that belongs to the processor/SVM path. + +## Public API shape / Main public types and APIs + +`magicblock-accounts/src/lib.rs` exports: + +```rust +mod config; +pub mod errors; +pub mod scheduled_commits_processor; +mod traits; + +pub use config::*; +pub use traits::*; +``` + +### `ScheduledCommitsProcessor` trait + +Defined in `src/traits.rs`: + +- `async fn process(&self) -> ScheduledCommitsProcessorResult<()>` drains accepted scheduled intents and hands them to the committor; +- `fn scheduled_commits_len(&self) -> usize` returns the count of accepted intents in Magic Program global scheduler state; +- `fn clear_scheduled_commits(&self)` clears that global scheduler state; +- `fn stop(&self)` cancels processor background work. + +The trait is `Send + Sync + 'static` and is used by `magicblock-api/src/tickers.rs` to keep slot ticker code generic. 
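
To illustrate why depending on the trait keeps the ticker generic, here is a simplified, dependency-free sketch. It is not the crate's code: `process` is synchronous and returns `Result<(), String>` purely to avoid an async runtime (the real method is async and returns `ScheduledCommitsProcessorResult<()>`), and `CountingProcessor` and `tick` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Simplified mirror of the trait surface described above.
/// Assumption: synchronous `process` with a `String` error type.
pub trait ScheduledCommitsProcessor: Send + Sync + 'static {
    fn process(&self) -> Result<(), String>;
    fn scheduled_commits_len(&self) -> usize;
    fn clear_scheduled_commits(&self);
    fn stop(&self);
}

/// Hypothetical test double that only tracks a pending-intent counter.
#[derive(Default)]
pub struct CountingProcessor {
    pending: AtomicUsize,
    stopped: AtomicBool,
}

impl CountingProcessor {
    pub fn with_pending(n: usize) -> Self {
        Self { pending: AtomicUsize::new(n), stopped: AtomicBool::new(false) }
    }

    pub fn is_stopped(&self) -> bool {
        self.stopped.load(Ordering::SeqCst)
    }
}

impl ScheduledCommitsProcessor for CountingProcessor {
    fn process(&self) -> Result<(), String> {
        // Draining is modeled as resetting the counter to zero.
        self.pending.swap(0, Ordering::SeqCst);
        Ok(())
    }

    fn scheduled_commits_len(&self) -> usize {
        self.pending.load(Ordering::SeqCst)
    }

    fn clear_scheduled_commits(&self) {
        self.pending.store(0, Ordering::SeqCst);
    }

    fn stop(&self) {
        self.stopped.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical caller in the spirit of the slot ticker: it sees only the
/// trait, so a production processor or a test double can be passed in.
/// Returns how many intents were pending before processing.
pub fn tick<P: ScheduledCommitsProcessor>(processor: &P) -> Result<usize, String> {
    let pending = processor.scheduled_commits_len();
    if pending > 0 {
        processor.process()?;
    }
    Ok(pending)
}
```

Because `tick` is bounded only by the trait, ticker logic can be exercised in tests without constructing a committor or Chainlink stack.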
+ +### `ScheduledCommitsProcessorImpl` + +Defined in `src/scheduled_commits_processor.rs` and constructed with: + +```rust +pub fn new( + committor: Arc, + chainlink: Arc, + internal_transaction_scheduler: TransactionSchedulerHandle, + latest_block: impl LatestBlockProvider, +) -> Self +``` + +Stored state: + +- `committor: Arc` schedules intent bundles and provides result broadcasts / pending persisted intents; +- `chainlink: Arc` receives `undelegation_requested(pubkey)` calls before commit-and-undelegate execution; +- `cancellation_token: CancellationToken` stops the result-processing loop; +- `intents_meta_map: Arc>>` maps intent IDs to local metadata needed when committor results return; +- `transaction_scheduler: magicblock_program::TransactionScheduler` accesses the Magic Program's global scheduled action store. + +Important public method outside the trait: + +- `spawn_pending_intents_recovery(self: &Arc)` starts a one-shot task that loads persisted pending intent bundles from committor storage, filters them through Chainlink delegation checks, and schedules recoverable bundles. It must run only after ledger replay and account-bank reset. + +### Type aliases + +- `InnerChainlinkImpl = ProdInnerChainlink` +- `ChainlinkImpl = ProdChainlink` + +These encode the current production Chainlink/cloner stack. If Chainlink generic wiring changes, update these aliases and this guide together. + +### Errors + +`ScheduledCommitsProcessorError` wraps: + +- `tokio::sync::oneshot::error::RecvError` when committor oneshot requests fail; +- boxed `CommittorServiceError` from scheduling, recovery, or result-subscription operations. + +`AccountsError` still contains broader account/committor/cloner variants from older APIs. Check whether a variant is actually consumed before relying on it for new behavior. 
+ +## Runtime flows + +### Normal scheduled commit flow + +```text +program invokes Magic Program schedule instruction + -> MagicContext stores ScheduledIntentBundle(s) + -> slot ticker sees MagicContext::has_scheduled_commits + -> validator-signed AcceptScheduleCommits transaction runs locally + -> Magic Program moves bundles into global TransactionScheduler state + -> ScheduledCommitsProcessorImpl::process drains them + -> committor service persists/schedules base-layer intent execution + -> committor broadcasts result + -> processor registers SentCommit and schedules local ScheduledCommitSent transaction +``` + +Ordered details: + +1. User/program code schedules commit, commit-and-undelegate, commit-finalize, and/or action intent bundles through the Magic Program. +2. `magicblock-api::init_slot_ticker` periodically reads `MAGIC_CONTEXT_PUBKEY` from `AccountsDb` and calls `MagicContext::has_scheduled_commits`. +3. If there are pending scheduled commits, `handle_scheduled_commits` builds `InstructionUtils::accept_scheduled_commits(latest_block.blockhash)` and submits it through the internal `TransactionSchedulerHandle`. +4. `process_accept_scheduled_commits` validates the validator authority signer, drains `MagicContext::scheduled_base_intents`, and calls `TransactionScheduler::default().accept_scheduled_base_intent(...)`. +5. `ScheduledCommitsProcessorImpl::process` calls `take_scheduled_intent_bundles()` on its `magicblock_program::TransactionScheduler` handle. Empty drains are no-ops. +6. For non-empty drains it increments `magicblock_metrics::metrics::inc_committor_intents_count_by` and calls `process_intent_bundles`. +7. `prepare_intent_bundles_for_scheduling` stores `ScheduledBaseIntentMeta` for every intent ID and gathers pubkeys from undelegation intents. +8. `process_undelegation_requests` concurrently calls `chainlink.undelegation_requested(pubkey)` for gathered pubkeys; failures are logged but do not abort the commit. +9. 
The committor receives the bundles via `CommittorService::schedule_intent_bundles` and returns through a oneshot when scheduling is accepted. +10. The background `result_processor` receives `BroadcastedIntentExecutionResult`s from the committor broadcast channel. +11. `process_intent_result` removes the intent metadata, builds/registers a `SentCommit`, creates or reuses the `ScheduledCommitSent` transaction, encodes it with `with_encoded`, and submits it to the internal scheduler. + +Caveats: + +- The slot ticker has a TODO about possible delay between accepting and processing scheduled commits. Do not add extra sleeps or slow work to this path. +- `process_undelegation_requests` logs subscription failures and continues. That can leave undelegating accounts in a problematic local state; changing this policy is a lifecycle decision, not a local cleanup. +- `intents_meta_map` is keyed by intent ID. Duplicate IDs or unexpected duplicate results can cause missing metadata and are logged as errors. + +### Pending intent recovery flow + +```text +validator start/restart + -> ledger replay + -> account-bank reset/cleanup + -> spawn_pending_intents_recovery + -> committor loads persisted pending bundles + -> Chainlink verifies accounts delegated on base and ER + -> recoverable bundles schedule through committor without re-persisting +``` + +Ordered details: + +1. `magicblock-api::MagicValidator::start` processes ledger replay and clears Magic Program global scheduled actions before starting the normal slot ticker. +2. After account-bank reset, and only in the branch where replay/reset is performed, the API calls `processor.spawn_pending_intents_recovery()`. +3. `recover_pending_intents` asks the committor for `get_pending_intent_bundles().await??`. +4. `recoverable_intent_bundles` checks every bundle's committed pubkeys with `chainlink.accounts_delegated_on_base_and_er(&pubkeys, AccountFetchOrigin::GetAccount)`. +5. 
Bundles with any non-delegated account, or bundles whose checks error, are skipped and logged. +6. Recoverable bundles go through `process_intent_bundles` with `CommittorService::schedule_recovered_intent_bundles`, which schedules without re-persisting rows. +7. If scheduling recovered bundles fails, metadata for those intent IDs is removed to avoid stale entries. + +Caveats: + +- Recovery must happen after replay/reset because Chainlink delegation checks read local bank state. +- Recovery currently only runs when `MagicValidator::start` enters the replay/reset path. Changing startup branches can affect pending intent durability. +- Filtering requires all committed accounts in a bundle to be delegated on both base and ER; this protects against re-sending stale or already-undelegated work. + +### Result-to-local-signal flow + +When the committor completes an intent: + +1. `BroadcastedIntentExecutionResult` includes the intent ID, success/error output, patched errors, and callback scheduling report. +2. `ScheduledBaseIntentMeta` supplies the original slot, blockhash, payer, committed pubkeys, optional prebuilt `sent_transaction`, and undelegation flag. +3. `build_sent_commit` converts execution output into chain signatures: + - `ExecutionOutput::SingleStage(signature)` becomes one signature; + - `ExecutionOutput::TwoStage { commit_signature, finalize_signature }` becomes two signatures; + - errors try to expose any available commit/finalize signatures through `err.signatures()`. +4. Patched errors and callback scheduling results are stringified into `SentCommit` fields. +5. `register_scheduled_commit_sent(sent_commit)` stores the result for the Magic Program's local `ScheduledCommitSent` processor. +6. The internal transaction scheduler executes the validator-signed `ScheduledCommitSent` transaction so local execution can observe the sent-commit result. 
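
The signature conversion in step 3 can be sketched as follows. This is an illustration under assumptions, not the body of `build_sent_commit`: `Signature` is a plain byte-array stand-in for the Solana signature type, the enum mirrors only the two variants named above, and `chain_signatures` is a hypothetical helper name.

```rust
/// Stand-in for the Solana signature type (assumption: 64 raw bytes).
pub type Signature = [u8; 64];

/// Mirrors only the two success variants named in the list above.
pub enum ExecutionOutput {
    SingleStage(Signature),
    TwoStage {
        commit_signature: Signature,
        finalize_signature: Signature,
    },
}

/// Hypothetical helper: flattens an execution output into the ordered list
/// of chain signatures that a sent-commit record would carry.
pub fn chain_signatures(output: &ExecutionOutput) -> Vec<Signature> {
    // The match is exhaustive on purpose: adding a variant to the enum
    // forces a compile error here until its conversion is decided.
    match output {
        ExecutionOutput::SingleStage(signature) => vec![*signature],
        ExecutionOutput::TwoStage {
            commit_signature,
            finalize_signature,
        } => vec![*commit_signature, *finalize_signature],
    }
}
```

Keeping the match free of a wildcard arm is one way to make new `ExecutionOutput` variants a deliberate decision rather than a silent default.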
+ +If the original intent did not carry a signed `sent_transaction`, the processor builds a new one with the current latest blockhash. This fallback is important for recovered intents reconstructed from persistence. + +### Shutdown flow + +`MagicValidator::stop` cancels the validator token, then calls `scheduled_commits_processor.stop()` before `committor_service.stop()`. Preserve this ordering: the result processor must be told to stop while the committor is still available enough to shut down cleanly, and the committor is intentionally stopped last among these services. + +## Important internals and caveats + +### Magic Program global scheduler state + +`magicblock_program::TransactionScheduler::default()` is used as a handle to global scheduled action state. The `AcceptScheduleCommits` instruction writes into that state, and `ScheduledCommitsProcessorImpl::process` drains it. During ledger replay, `magicblock-api` clears this state to avoid re-committing accepted intents replayed from the ledger. + +Do not replace this with an ordinary per-instance queue unless the Magic Program and replay semantics are updated together. + +### Metadata map and locking + +`intents_meta_map` is protected by a standard `Mutex`. Current critical sections are intentionally small: insert metadata and remove metadata by intent ID. Avoid holding this lock across `.await`, committor calls, Chainlink calls, scheduler execution, or expensive logging/formatting. + +The code uses `expect(POISONED_MUTEX_MSG)` in the normal processing paths. One recovery cleanup path handles poisoning by logging and taking the inner map. Treat mutex poisoning as a serious invariant violation. + +### Undelegation notifications are best-effort + +Before scheduling an intent bundle, the processor calls `chainlink.undelegation_requested` for accounts from commit-and-undelegate intents. Errors are aggregated and logged but do not fail scheduling. 
This favors settlement progress over local watcher correctness; if you change it, document how accounts that are already locally immutable/undelegating recover from missed base-layer subscriptions. + +### Historical README/API drift + +The README describes `AccountsManager`, `ExternalAccountsManager`, `BankAccountProvider`, `RemoteAccountCloner`, `Transwise`, and `ensure_accounts`. These are not present in the current crate source. Do not implement new features against those names without first checking current Chainlink/cloner/API ownership and updating/removing stale docs. + +### Local `LifecycleMode` drift + +`magicblock-accounts/src/config.rs` defines a `LifecycleMode` separate from `magicblock-config::config::LifecycleMode`. Current repository usages of validator lifecycle mode use `magicblock-config`, not this local type. Avoid adding new configuration wiring through the local enum unless that duplication is intentional and documented. + +## Important invariants + +1. `process()` must drain only accepted scheduled intent bundles from Magic Program global scheduler state; it must not re-read `MagicContext` directly. +2. `AcceptScheduleCommits` must be executed before `process()` so MagicContext intents are moved and cleared atomically by the Magic Program. +3. Intent metadata must be inserted before committor scheduling so result processing can build a correct `SentCommit`. +4. Intent metadata must be removed on result handling and on failed recovered scheduling; stale entries can make later duplicate IDs or results misleading. +5. Recovery must run only after ledger replay and local account-bank reset, because delegation checks depend on current local account state. +6. Recovered bundles must be filtered so all committed accounts are delegated on base and ER before scheduling. +7. Recovered bundles must use the committor recovered scheduling path, not the normal persistence path, to avoid duplicating persisted rows. +8. 
Commit-and-undelegate intents must notify Chainlink before scheduling whenever possible so base-layer undelegation completion can be tracked. +9. The result processor must not block indefinitely on slow local scheduler work without observing cancellation between results. +10. Broadcast lag from committor results is unexpected and requires investigation; silently dropping lagged results would leave local sent-commit state incomplete. +11. `SentCommit` fields must stay aligned with Magic Program `ScheduledCommitSent` expectations, including chain signatures, patched errors, callback reports, included pubkeys, payer, slot, blockhash, and undelegation flag. +12. Documentation and code must not describe this crate as the active account ensure/cloning owner unless that behavior is restored in source. + +## Common change areas and what to inspect + +### Changing scheduled commit acceptance or slot ticker behavior + +Inspect first: + +- `magicblock-api/src/tickers.rs` (`init_slot_ticker`, `handle_scheduled_commits`); +- `programs/magicblock/src/magic_context.rs` (`has_scheduled_commits`, `take_scheduled_commits`); +- `programs/magicblock/src/schedule_transactions/process_accept_scheduled_commits.rs`; +- `magicblock-accounts/src/scheduled_commits_processor.rs::process`. + +Risks: + +- accepting without processing can leave intents stranded in global scheduler state; +- processing without accepting misses MagicContext-staged intents; +- replay can re-populate global scheduled actions unless startup clear semantics are preserved. + +### Changing committor scheduling or result handling + +Inspect first: + +- `ScheduledCommitsProcessorImpl::process_intent_bundles`; +- `prepare_intent_bundles_for_scheduling`; +- `result_processor`, `process_intent_result`, and `build_sent_commit`; +- `magicblock-committor-service/src/service.rs` `BaseIntentCommittor` methods; +- `magicblock-committor-service/src/intent_execution_manager/intent_execution_engine.rs` result fields. 
+ +Risks: + +- missing result subscriptions can prevent local `ScheduledCommitSent` signaling; +- new `ExecutionOutput` variants must be converted into `SentCommit.chain_signatures` deliberately; +- callback or patched-error fields are user/operator-visible through Magic Program result state. + +### Changing undelegation lifecycle behavior + +Inspect first: + +- `prepare_intent_bundles_for_scheduling` and `process_undelegation_requests`; +- `ScheduledIntentBundle::get_undelegate_intent_pubkeys` / `has_undelegate_intent`; +- `magicblock-chainlink::undelegation_requested`; +- Magic Program code that marks accounts undelegating/immutable when scheduling commit-and-undelegate. + +Risks: + +- missing Chainlink notification can leave local state out of sync with base-layer undelegation; +- failing the whole commit on notification errors may affect settlement availability; +- ordinary ER execution must not continue mutating accounts after commit-and-undelegate has been scheduled. + +### Changing pending intent recovery + +Inspect first: + +- `spawn_pending_intents_recovery`, `recover_pending_intents`, `recoverable_intent_bundles`, and `process_recovered_intent_bundles`; +- `magicblock-api/src/magic_validator.rs` startup ordering around replay/reset; +- `magicblock-committor-service/src/committor_processor.rs::pending_intent_bundles`; +- Chainlink `accounts_delegated_on_base_and_er` behavior. + +Risks: + +- running recovery before local bank repair can schedule invalid or stale commits; +- skipping delegation checks can re-send intents for accounts no longer delegated to this ER; +- normal scheduling can duplicate persistence rows for recovered work. + +### Cleaning up stale account-manager remnants + +Inspect first: + +- `magicblock-accounts/README.md`; +- `src/config.rs` and `src/errors.rs` for unused historical APIs; +- `.agents/context/crate-map.md` and this guide; +- current owners in `magicblock-chainlink`, `magicblock-account-cloner`, and `magicblock-api`. 
+ +Risks: + +- deleting exported symbols can be a public API break even if no current workspace consumer uses them; +- account availability behavior belongs outside this crate in the current architecture. + +## Tests and validation + +For documentation-only changes: + +```bash +git diff --check -- .agents/context/crates/magicblock-accounts.md .agents/context/crate-map.md AGENTS.md +``` + +Also verify: + +- `.agents/context/crates/magicblock-accounts.md` exists; +- `.agents/context/crate-map.md` points future agents to this guide; +- `AGENTS.md` mentions the new crate guide if the example crate-guide list is kept current; +- no files under `prompts/**` are staged or committed. + +For Rust changes in this crate, run targeted checks first: + +```bash +cargo fmt +cargo clippy -p magicblock-accounts --all-targets -- -D warnings +cargo nextest run -p magicblock-accounts +``` + +For changes that affect scheduled commits, pending recovery, or undelegation, also run relevant adjacent checks when practical: + +```bash +cargo nextest run -p magicblock-api +cargo nextest run -p magicblock-program process_schedule_commit +cargo nextest run -p magicblock-committor-service +``` + +Integration suites for end-to-end behavior: + +```bash +cd test-integration +make test-schedule-intents +make test-committor +make test-task-scheduler +``` + +Use narrower committor targets from `.agents/rules/testing-and-validation.md` when full committor coverage is too slow but the change is localized. + +Broader baseline validation remains the repository standard from `.agents/rules/testing-and-validation.md`: + +```bash +cargo fmt +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance validation expectations: + +- Documentation-only changes have no runtime performance impact. 
+- Changes to slot-ticker commit processing, intent metadata handling, Chainlink undelegation notification, or result handling should report whether they add work to periodic slot processing, committor scheduling, or transaction-scheduler submission. +- If a change can increase commit latency or result-processing lag, validate with schedule-intent/committor tests or a targeted measurement and report any unmeasured risk. + +## Related docs + +- `AGENTS.md` — required agent workflow and documentation-memory rules. +- `.agents/context/overview.md` — validator runtime model and core concepts. +- `.agents/specs/validator-specification.md` — commit, undelegation, committor service, and recovery behavior. +- `.agents/context/architecture.md` — account synchronization and base-layer settlement boundaries. +- `.agents/context/crate-map.md` — crate ownership map and pointer back to this guide. +- `.agents/rules/testing-and-validation.md` — repository validation commands and integration suite names. +- `.agents/memory/agent-memory-and-docs.md` — rules for keeping agent documentation current. +- `.agents/context/crates/magicblock-account-cloner.md` — current account-cloner guide; useful because this crate aliases the production cloner stack through Chainlink. +- `.agents/context/crates/magicblock-chainlink.md` — Chainlink lifecycle/delegation guide, especially `undelegation_requested` and delegation checks. +- `magicblock-accounts/README.md` — historical summary; verify against source before using. +- `magicblock-api/src/tickers.rs` — slot ticker that invokes this crate. +- `magicblock-api/src/magic_validator.rs` — startup/shutdown and recovery wiring. +- `programs/magicblock/src/schedule_transactions/` — Magic Program scheduling and acceptance processors. +- `magicblock-committor-service/` — base-layer intent execution, persistence, and recovery source. 
diff --git a/.agents/context/crates/magicblock-aml.md b/.agents/context/crates/magicblock-aml.md new file mode 100644 index 000000000..4a15d539d --- /dev/null +++ b/.agents/context/crates/magicblock-aml.md @@ -0,0 +1,354 @@ +# `magicblock-aml` + +## Purpose + +`magicblock-aml` owns the validator's optional address risk-scoring integration for post-delegation action signers. It wraps the external Range risk API, validates signer risk scores against `magicblock-config`'s `[chainlink.risk]` settings, and persists a small local SQLite cache under the validator ledger path. + +High-level responsibilities: + +- construct an optional `RiskService` from `RiskConfig`; +- reject enabled risk checking when required configuration is invalid or incomplete; +- query Range's `GET /risk/address?network=solana&address=` endpoint with bearer authentication; +- cache address risk scores in `risk-cache.db` with TTL-based freshness; +- deduplicate concurrent in-flight requests for the same address; +- return `RiskError::HighRiskAddresses` when any checked address score is greater than or equal to the configured threshold. + +This crate sits on the `magicblock-chainlink` account synchronization path for accounts with post-delegation actions. It deliberately moves SQLite reads/writes to `spawn_blocking`, but external HTTP calls and cache misses can still delay clone/delegation-action preparation. Keep changes bounded and avoid adding synchronous I/O or unbounded work to Chainlink hot paths. + +## Update requirement + +Update this document in the same change whenever `magicblock-aml` behavior, public APIs, configuration contract, persistence layout, error semantics, or Chainlink integration changes. This guide is useful only if it reflects the current implementation. 
+ +Update it for changes to: + +- `RiskService`, `RiskError`, `RiskResult`, cache aliases, or other exported API shape; +- Range endpoint path, query parameters, auth scheme, response parsing, or expected JSON fields; +- `[chainlink.risk]` config keys/defaults in `magicblock-config` or `config.example.toml`; +- cache filename, schema, TTL handling, write timing, or SQLite concurrency behavior; +- in-flight request deduplication, cache-writer lifecycle, or error cleanup; +- which post-delegation action accounts Chainlink risk-checks; +- validation commands, tests, or performance expectations for risk checks. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-aml/Cargo.toml` | Crate manifest. Depends on `magicblock-config`, `reqwest`, `rusqlite`, `tokio`, `serde_json`, and supporting async/error crates. | +| `magicblock-aml/src/lib.rs` | Entire public API and implementation: service construction, cache reads/writes, in-flight deduplication, Range HTTP fetches, and unit tests. | +| `magicblock-config/src/config/chain.rs` | Defines `RiskConfig` under `ChainLinkConfig::risk`; uses kebab-case TOML/env field names. | +| `magicblock-config/src/consts.rs` | Default Range base URL, cache TTL, request timeout, and score threshold. | +| `config.example.toml` | Operator-facing `[chainlink.risk]` documentation and environment variable names. | +| `magicblock-chainlink/src/chainlink/mod.rs` | Creates `RiskService::try_from_config(&chainlink_config.risk, ledger_path)` during Chainlink initialization when a remote account provider exists. | +| `magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs` | Calls `RiskService::check_addresses` for unique signer pubkeys from post-delegation action instructions before action dependencies are fetched/cloned. | +| `magicblock-chainlink/src/chainlink/errors.rs` | Wraps `RiskError` as `ChainlinkError::RangeRisk`. | + +Main consumers: + +- `magicblock-chainlink` is the only runtime consumer. 
It stores `Option>` in `FetchCloner` and skips risk checks when the option is `None`. +- `magicblock-config` owns the config model consumed by this crate; do not duplicate config defaults in AML code. + +Important upstream/downstream relationships: + +- Upstream input is `RiskConfig` plus the validator ledger path. +- Runtime addresses come from parsed delegation action account metas in Chainlink, currently only metas with `is_signer = true`. +- Downstream effects are Chainlink clone success or `ChainlinkError::RangeRisk(RiskError::...)`; `magicblock-aml` does not mutate account state directly. + +## Public API shape / Main public types and APIs + +`magicblock-aml` exports everything from `src/lib.rs`. + +### Type aliases + +- `RiskResult = Result` — crate-local result type. +- `PartitionedCache = (Vec<(Option, String)>, Vec<(Option, String)>)` — helper shape used when splitting cached and uncached addresses. + +### Data structs + +- `AddressRiskAssessment { is_high_risk: bool, risk_score: u64 }` — public data shape for risk assessment. It is currently not used by `RiskService::check_addresses`, which returns `Ok(())` or an error instead of per-address assessment data. +- `RiskService` — main service handle. It owns: + - a `reqwest::Client` with configured timeout; + - an `Arc>` for the local SQLite cache; + - an async mutex-protected in-flight map keyed by address string; + - an unbounded channel to a background cache writer task; + - Range `base_url`, API key, cache TTL, and score threshold. 
+ +### Errors + +`RiskError` is part of the public contract and is converted into `ChainlinkError::RangeRisk`: + +- `MissingApiKey` when risk checks are enabled without `api_key`; +- `SqliteInit` for opening or initializing `ledger_path/risk-cache.db`; +- `CacheDirectory` exists in the enum but is not currently emitted by the implementation; +- `Request`, `InvalidJson`, `Sqlite`, `Join` for lower-level failures; +- `RiskScoreNotFound` when the Range response lacks numeric `riskScore`; +- `HighRiskAddresses(Vec)` when checked addresses meet/exceed threshold; +- `InvalidRiskScoreThreshold(u64)` when threshold is greater than `10`; +- `PoisonedLock` for poisoned SQLite mutex; +- `InFlightFetch(String)` when a shared in-flight Range request fails. + +### Constructors and methods + +- `RiskService::try_from_config(config: &RiskConfig, ledger_path: &Path) -> RiskResult>` + - returns `Ok(None)` when `config.enabled == false`; + - requires `api_key` when enabled; + - rejects `risk_score_threshold > 10`; + - opens `ledger_path/risk-cache.db` and creates `address_risk_cache` if needed; + - starts the async cache-writer task; + - trims trailing `/` from `base_url`. +- `RiskService::check_addresses(addresses: Vec) -> RiskResult<()>` + - reads fresh cache entries; + - short-circuits on high-risk cached addresses; + - fetches missing/stale addresses via Range with in-flight deduplication; + - asynchronously writes fetched scores to cache; + - returns `HighRiskAddresses` for fetched scores at or above threshold. + +Private helpers such as `fetch_risk_score`, `read_cache`, `write_cache`, `spawn_cache_writer`, and `now_unix_seconds` are implementation details. Do not make callers bypass `RiskService::check_addresses` unless you are intentionally changing the service contract. 
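
The construction rules listed for `RiskService::try_from_config` can be sketched roughly as follows. This is a minimal illustration, not the crate's code: every `Sketch*` name is hypothetical, `api_key` being an `Option<String>` is an assumption, the relative order of the API-key and threshold checks is an assumption, and the SQLite open and cache-writer spawn are left out.

```rust
// Hypothetical stand-ins. The real `RiskConfig`, `RiskService`, and `RiskError`
// live in `magicblock-config` / `magicblock-aml` and carry more fields.
#[derive(Debug, Clone)]
pub struct SketchRiskConfig {
    pub enabled: bool,
    pub api_key: Option<String>, // assumption: the key is optional in config
    pub base_url: String,
    pub risk_score_threshold: u64,
}

#[derive(Debug, PartialEq)]
pub enum SketchRiskError {
    MissingApiKey,
    InvalidRiskScoreThreshold(u64),
}

#[derive(Debug, PartialEq)]
pub struct SketchRiskService {
    pub base_url: String,
    pub api_key: String,
    pub risk_score_threshold: u64,
}

/// Disabled config short-circuits before any validation; the API key and
/// threshold are validated before any side effect would happen.
pub fn sketch_try_from_config(
    config: &SketchRiskConfig,
) -> Result<Option<SketchRiskService>, SketchRiskError> {
    if !config.enabled {
        // Disabled: no API key required, nothing opened, nothing spawned.
        return Ok(None);
    }
    let api_key = config
        .api_key
        .clone()
        .ok_or(SketchRiskError::MissingApiKey)?;
    if config.risk_score_threshold > 10 {
        return Err(SketchRiskError::InvalidRiskScoreThreshold(
            config.risk_score_threshold,
        ));
    }
    // The real service would open `risk-cache.db` and start the cache writer here.
    Ok(Some(SketchRiskService {
        // Removes every trailing '/', which is a superset of trimming one.
        base_url: config.base_url.trim_end_matches('/').to_string(),
        api_key,
        risk_score_threshold: config.risk_score_threshold,
    }))
}
```

Validating before side effects keeps invariant 1 cheap to test: a disabled config never needs a ledger directory.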
+ +## Runtime flows + +### Service initialization flow + +```text +magicblock-api startup + -> magicblock-chainlink::InnerChainlink::try_new_from_config + -> RiskService::try_from_config(chainlink_config.risk, ledger_path) + -> FetchCloner::new(..., risk_service: Option>) +``` + +1. `ChainLinkConfig::risk` is loaded by `magicblock-config` from defaults, TOML, env, and CLI-supported layers. +2. Chainlink creates a `RiskService` only if a remote account provider is active and `RiskConfig::enabled` is true. +3. `RiskService::try_from_config` validates the API key and threshold before opening the SQLite cache. +4. The cache table is created if missing: + - `address TEXT NOT NULL PRIMARY KEY` + - `risk_score INTEGER NOT NULL` + - `fetched_at_unix_s INTEGER NOT NULL` +5. A background Tokio task is spawned to receive fetched scores and write them through `spawn_blocking`. +6. `FetchCloner` stores the service as `Option>`; `None` means post-delegation action signer checks are disabled. + +Caveats: + +- The implementation opens `ledger_path/risk-cache.db`; ensure the ledger directory exists before enabling risk checks. +- `CacheDirectory` is currently not used; opening failures surface as `SqliteInit`. +- Enabling risk checks without `api_key` fails Chainlink initialization. + +### Post-delegation signer validation flow + +```text +delegated account with post-delegation actions + -> FetchCloner::clone_account_with_post_delegation_action_invariants + -> ensure_delegation_action_dependencies + -> validate_post_delegation_action_signers + -> RiskService::check_addresses(unique signer strings) +``` + +1. Chainlink parses delegation records and associated post-delegation actions. +2. Before dependency fetch/clone work, `ensure_delegation_action_dependencies` calls `validate_post_delegation_action_signers`. +3. The validator collects all `AccountMeta` pubkeys with `is_signer = true` from each action instruction. +4. Signers are sorted and deduplicated before calling AML. +5. 
If no `RiskService` is configured, or if no signers exist, validation returns `Ok(())`. +6. `RiskService::check_addresses` must succeed before Chainlink continues to dependency fetching and target cloning. +7. `HighRiskAddresses` or Range/cache errors abort the Chainlink clone path through `ChainlinkError::RangeRisk`. + +Caveats: + +- AML currently checks signer metas only. Non-signer accounts and program IDs are handled by Chainlink dependency rules, not by `RiskService`. +- A cache miss adds external HTTP latency to the account synchronization path for affected delegation-action clones. + +### Address check and cache flow + +1. `check_addresses` calls `read_cache(addresses).await`. +2. `read_cache` runs SQLite reads inside `tokio::task::spawn_blocking` and locks the single `rusqlite::Connection` with `std::sync::Mutex`. +3. Entries older than `cache_ttl` are treated as uncached; stale rows are not deleted immediately. +4. Cached scores are checked first. Any cached score `>= risk_score_threshold` returns `HighRiskAddresses` without fetching uncached addresses. +5. Uncached/stale addresses are fetched concurrently with `try_join_all`. +6. `get_or_insert_in_flight` ensures concurrent callers for the same address share one boxed future. +7. If any fetch fails, AML removes the uncached addresses from the in-flight map and returns the failure. +8. Successful fetched scores are sent to the background cache writer and are immediately checked against the threshold. +9. The cache writer upserts all scores and then removes those addresses from the in-flight map. + +Caveats: + +- Cache writes are asynchronous; tests poll the DB with `assert_eventually_cached` because `check_addresses` can return before the write is visible. +- The cache writer channel is unbounded. Avoid creating call paths that submit unbounded distinct address batches under load. +- A score equal to the threshold is high risk (`>=`, not `>`). 
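
The freshness split and the `>=` comparison from the address check flow can be illustrated with a small stdlib-only sketch. The names `sketch_partition` and `sketch_high_risk` and the in-memory `HashMap` standing in for the SQLite table are hypothetical. Treating an entry whose age equals the TTL as still fresh is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical cached row: (risk_score, fetched_at_unix_s).
pub type SketchCacheRow = (u64, u64);

/// Splits addresses into fresh cached scores and addresses that need a fetch.
/// Rows older than `ttl_s` count as uncached; they are not deleted here.
pub fn sketch_partition(
    addresses: &[String],
    cache: &HashMap<String, SketchCacheRow>,
    now_s: u64,
    ttl_s: u64,
) -> (Vec<(String, u64)>, Vec<String>) {
    let mut cached = Vec::new();
    let mut uncached = Vec::new();
    for address in addresses {
        match cache.get(address) {
            // Fresh entry: age is within the TTL.
            Some(&(score, fetched_at)) if now_s.saturating_sub(fetched_at) <= ttl_s => {
                cached.push((address.clone(), score));
            }
            // Missing or stale entry: must be fetched again.
            _ => uncached.push(address.clone()),
        }
    }
    (cached, uncached)
}

/// Returns the addresses whose score meets or exceeds the threshold (`>=`).
pub fn sketch_high_risk(scores: &[(String, u64)], threshold: u64) -> Vec<String> {
    scores
        .iter()
        .filter(|(_, score)| *score >= threshold)
        .map(|(address, _)| address.clone())
        .collect()
}
```

In this shape, a caller would run `sketch_high_risk` on the cached half first and return early if it is non-empty, which matches the documented short-circuit before any fetch.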
+ +### Range HTTP flow + +For each uncached address, `fetch_risk_score`: + +1. builds `GET {base_url}/risk/address`; +2. sends query parameters `network=solana` and `address=
`; +3. adds bearer auth using `RiskConfig::api_key`; +4. applies `error_for_status()` to reject non-2xx responses; +5. reads the response body as text; +6. parses JSON and extracts a numeric top-level `riskScore` field. + +This crate does not currently expose retries, backoff, circuit breaking, batch Range requests, or metrics. + +## Important internals and caveats + +### SQLite cache and blocking boundaries + +The cache is a single `rusqlite::Connection` behind `Arc>`. Reads and writes use `spawn_blocking`, which prevents blocking Tokio worker threads but still serializes SQLite access through one connection. Preserve this boundary when changing cache behavior. + +Do not perform SQLite operations directly on async runtime tasks. If the cache schema changes, include a migration or compatibility story for existing `risk-cache.db` files. + +### In-flight deduplication + +The in-flight map stores `Shared>>>` by address. This allows multiple concurrent `check_addresses` calls to await the same Range request. Entries are removed after successful cache writes or after failed fetch batches. + +Be careful when changing error handling: leaving failed futures in the map would cause repeated callers to observe stale failures; removing entries too early would reintroduce duplicate external calls. + +### Threshold and score assumptions + +`RiskConfig::risk_score_threshold` must be in the inclusive `0..=10` range. `try_from_config` only rejects values greater than `10`; `0` means every fetched/cached numeric score is high risk. + +The Range response parser accepts any JSON document with top-level unsigned integer `riskScore`. It does not validate upper bound on returned scores and does not parse nested assessment details. + +### Chainlink ownership boundary + +`magicblock-aml` knows nothing about delegation records, cloned accounts, post-delegation dependency ordering, or local account writes. Chainlink decides when to call AML and which addresses to check. 
Keep account synchronization and delegation-action invariants in Chainlink; keep external risk lookup and cache behavior in AML. + +## Important invariants + +1. Disabled risk config must return `Ok(None)` and must not require an API key, open SQLite, or spawn background tasks. +2. Enabled risk config must fail fast when the API key is missing or when `risk_score_threshold > 10`. +3. The cache file must remain under the validator ledger path as `risk-cache.db` unless a migration and documentation update are included. +4. SQLite reads/writes must not run directly on async runtime worker threads; keep them behind `spawn_blocking` or an equivalent non-hot-path boundary. +5. Cached scores and fetched scores must use the same high-risk comparison: `score >= risk_score_threshold`. +6. Concurrent requests for the same uncached address must remain deduplicated to prevent needless Range API amplification. +7. Failed in-flight fetches must be removed from the in-flight map so later calls can retry. +8. Cache writes may be asynchronous, but fetched scores must still be checked before `check_addresses` returns success. +9. Chainlink must be able to disable AML by passing `None`; AML must not become mandatory for ordinary account cloning. +10. Do not log or expose API keys; keep bearer auth confined to the HTTP request builder. +11. Changes that add latency, retries, batching, or backpressure must explicitly account for Chainlink account-sync hot-path impact. + +## Common change areas and what to inspect + +### Changing Range request or response handling + +Inspect first: + +- `magicblock-aml/src/lib.rs::fetch_risk_score`; +- `magicblock-aml/src/lib.rs` mock server tests; +- `config.example.toml` and `magicblock-config/src/consts.rs` if base URL or auth behavior changes. + +Risks: + +- breaking Range API compatibility; +- changing error classification surfaced through `ChainlinkError::RangeRisk`; +- accidentally logging tokens or full sensitive URLs.
+ +### Changing config defaults or keys + +Inspect first: + +- `magicblock-config/src/config/chain.rs::RiskConfig`; +- `magicblock-config/src/consts.rs` default constants; +- `magicblock-config/src/tests.rs` config parsing tests; +- `config.example.toml` operator docs; +- `RiskService::try_from_config` validation. + +Risks: + +- TOML/env field names use kebab-case, for example `risk-score-threshold` and `MBV_CHAINLINK__RISK__RISK_SCORE_THRESHOLD`; +- enabling risk by default would add external network calls to account synchronization and requires an API-key story. + +### Changing cache persistence + +Inspect first: + +- `RiskService::try_from_config` table creation; +- `read_cache`, `write_cache`, and `spawn_cache_writer`; +- unit helpers `load_cached_scores` and `assert_eventually_cached`. + +Risks: + +- schema changes can break existing cache files; +- removing `spawn_blocking` can block Tokio runtime threads; +- asynchronous writes mean immediate DB assertions can race unless tests poll. + +### Changing which addresses are checked + +Inspect first: + +- `magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs::validate_post_delegation_action_signers`; +- delegation action parsing in `magicblock-chainlink/src/chainlink/fetch_cloner/delegation.rs`; +- post-delegation action clone tests in `magicblock-chainlink/src/chainlink/fetch_cloner/tests.rs`. + +Risks: + +- checking non-signers or dependencies may materially increase Range requests; +- failing to deduplicate addresses can amplify external calls; +- moving checks later can allow expensive dependency work before rejection. + +### Adding retries, batching, metrics, or rate limiting + +Inspect first: + +- `check_addresses` concurrency and error behavior; +- `get_or_insert_in_flight` shared future lifecycle; +- `magicblock-metrics` for appropriate low-cardinality metric definitions if observability is added; +- Chainlink caller expectations around latency and fail-fast behavior. 
+ +Risks: + +- retries can delay account clone availability; +- batch APIs may change partial-failure semantics; +- metrics labels must not include addresses or API keys. + +## Tests and validation + +For documentation-only changes: + +```bash +git status --short +ls .agents/context/crates/magicblock-aml.md +grep -n "magicblock-aml" AGENTS.md .agents/context/crate-map.md .agents/context/crates/magicblock-aml.md +``` + +For `magicblock-aml` Rust/source changes, run focused crate checks: + +```bash +cargo fmt +cargo clippy -p magicblock-aml --all-targets -- -D warnings +cargo nextest run -p magicblock-aml +``` + +When changing `RiskConfig` or config defaults, also run: + +```bash +cargo nextest run -p magicblock-config +``` + +When changing Chainlink call sites or which post-delegation action accounts are checked, also run targeted Chainlink tests, for example: + +```bash +cargo nextest run -p magicblock-chainlink +``` + +Broader baseline validation remains the repository standard from `.agents/rules/testing-and-validation.md` for Rust behavior changes: + +```bash +cargo fmt +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance validation expectations: + +- Documentation-only changes have no runtime performance impact. +- Any change that increases cache misses, HTTP calls, retries, checked address cardinality, SQLite contention, or Chainlink wait time must report the account-sync hot-path impact and, when practical, validate with representative post-delegation action clone workloads. + +## Related docs + +- `AGENTS.md` — required agent workflow and documentation-memory rules. +- `.agents/context/overview.md` — validator runtime model and important concepts. +- `.agents/context/architecture.md` — account synchronization layer and cross-crate boundaries. +- `.agents/context/crate-map.md` — workspace crate ownership map and pointer back to this guide. 
+- `.agents/rules/testing-and-validation.md` — repository validation commands and reporting expectations. +- `.agents/memory/agent-memory-and-docs.md` — rules for keeping agent documentation current. +- `.agents/context/crates/magicblock-chainlink.md` — Chainlink account/delegation coordination guide; read it before changing AML call sites. +- `config.example.toml` — operator-facing `[chainlink.risk]` configuration example. +- `magicblock-config/src/config/chain.rs` — `RiskConfig` source of truth. +- `magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs` — runtime caller for post-delegation action signer risk checks. diff --git a/.agents/context/crates/magicblock-aperture.md b/.agents/context/crates/magicblock-aperture.md new file mode 100644 index 000000000..2cf7659cb --- /dev/null +++ b/.agents/context/crates/magicblock-aperture.md @@ -0,0 +1,482 @@ +# `magicblock-aperture` + +## Purpose + +`magicblock-aperture` owns the validator's external Solana-compatible ingress and event egress surface. It exposes JSON-RPC over HTTP, PubSub over WebSocket, local request caches, and dynamic Agave Geyser plugin notifications. `magicblock-api` constructs it during validator startup through `initialize_aperture`, then runs the returned `JsonRpcServer` as part of the validator service graph. 
+ +High-level responsibilities: + +- bind the HTTP JSON-RPC listener and adjacent WebSocket PubSub listener; +- parse, validate, route, and encode supported Solana JSON-RPC requests plus Magic Router compatibility methods; +- serve local account, ledger, block, transaction, node, token, and mocked Solana RPC reads; +- submit and simulate transactions through `magicblock-core` dispatch channels and the processor scheduler; +- trigger Chainlink/account-cloner ensure paths for RPC reads and transactions when the validator is primary; +- maintain short-lived transaction and blockhash caches used for replay prevention, status reads, and blockhash validity; +- maintain WebSocket subscription registries and push account, program, signature, logs, and slot notifications; +- load and notify Agave Geyser plugins from configured JSON files. + +Aperture sits directly on performance-sensitive RPC, PubSub, event-processing, account-sync, and transaction-submission paths. Keep per-request and per-event work lean: avoid blocking I/O, unbounded allocations, high-cardinality metrics labels, duplicate account fetches, and slow plugin work in hot paths. Aperture is an ingress/router layer, not the protocol source of truth for execution validity, delegation lifecycle, or settlement. + +## Update requirement + +Update this document in the same change whenever `magicblock-aperture` behavior, public APIs, request/response compatibility, configuration contract, event flow, cache behavior, Geyser integration, metrics, tests, or performance characteristics change. This guide is useful only if it reflects the current implementation. 
+ +Update it for changes to: + +- `initialize_aperture`, `JsonRpcServer`, `SharedState`, `NodeContext`, exported modules, or startup/shutdown ordering; +- HTTP or WebSocket method inventories in `JsonRpcHttpMethod` / `JsonRpcWsMethod`; +- RPC handler semantics, request validation, response encoding, JSON-RPC error codes, or HTTP status mapping; +- transaction preparation, replay prevention, blockhash checks, primary/replica gating, account ensuring, submission, or simulation; +- `TransactionsCache`, `BlocksCache`, `ExpiringCache`, subscription DBs, signature expiration, or notification encoders; +- Geyser plugin config keys, loading lifecycle, event payload fields, notification ordering, or error policy; +- `[aperture]` config keys in `magicblock-config`, `config.example.toml`, or CLI/env support; +- metrics names/labels from `magicblock-metrics` used by this crate; +- unit, integration, or manual validation commands for RPC, PubSub, Geyser, or performance-sensitive paths. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-aperture/Cargo.toml` | Crate manifest. Depends on account cloner, accounts DB, Chainlink, config, core, ledger, metrics, version, Hyper, Tokio, fastwebsockets, Agave Geyser, and Solana RPC/status types. | +| `magicblock-aperture/README.md` | Human-facing overview of HTTP, WebSocket, Geyser plugin support, and request lifecycle. Keep it aligned with this guide when operator-facing behavior changes. | +| `magicblock-aperture/src/lib.rs` | Public crate entrypoint. Exports `initialize_aperture`, `JsonRpcServer`, `error`, `server`, and `state`; binds sockets before starting event processors. | +| `magicblock-aperture/src/server/http/mod.rs` | Hyper/Tokio HTTP accept loop and per-connection task spawning. Stops accepting immediately on cancellation. | +| `magicblock-aperture/src/server/http/dispatch.rs` | Central HTTP JSON-RPC dispatcher, batch handling, CORS, `/health/primary`, metrics, and method routing. 
| +| `magicblock-aperture/src/server/websocket/mod.rs` | WebSocket listener, HTTP upgrade handling, and per-connection task spawning. | +| `magicblock-aperture/src/server/websocket/connection.rs` | Per-client WebSocket loop: request reads, responses, subscription updates, ping/inactivity handling, and shutdown. | +| `magicblock-aperture/src/server/websocket/dispatch.rs` | Per-connection subscription dispatcher and unsubscribe guard ownership. | +| `magicblock-aperture/src/requests/mod.rs` | JSON-RPC request shapes, supported HTTP/WebSocket method enums, method string mapping, and `parse_params!`. | +| `magicblock-aperture/src/requests/http/*.rs` | Individual HTTP RPC handlers and shared helpers for parsing bodies, reading/ensuring accounts, transaction preparation, and account ensuring. | +| `magicblock-aperture/src/requests/http/transaction_validation.rs` | Additional transaction-shape guardrails: rejects v0 address lookup tables and program ID indices outside the runtime packet-derived limit. | +| `magicblock-aperture/src/requests/websocket/*.rs` | Individual WebSocket subscribe handlers for account, program, signature, logs, and slot subscriptions. | +| `magicblock-aperture/src/state/*.rs` | `SharedState`, `NodeContext`, caches, subscription registries, and signature expiration. | +| `magicblock-aperture/src/encoder.rs` | Notification encoders for account, program, signature, logs, and slot PubSub payloads. | +| `magicblock-aperture/src/geyser.rs` | Dynamic Agave Geyser plugin manager and conversion from validator events to `Replica*Info*` types. | +| `magicblock-aperture/src/error.rs` | `ApertureError`, JSON-RPC `RpcError`, Solana transaction error mapping, and HTTP status override for shutdown-like failures. | +| `magicblock-aperture/tests/*.rs` | Crate-level RPC/PubSub tests using a live `JsonRpcServer` and `solana_rpc_client` / `solana_pubsub_client`. 
| +| `magicblock-api/src/magic_validator.rs` | Runtime consumer: builds `SharedState`, calls `initialize_aperture`, and runs the RPC server. | +| `magicblock-config/src/config/aperture.rs` | Defines `ApertureConfig` (`listen`, `event_processors`, `geyser_plugins`). | +| `config.example.toml` | Operator-facing `[aperture]` example, including `geyser-plugins`. | +| `test-integration/test-pubsub/` and `test-integration/test-magicblock-api/` | Integration suites that exercise validator-level PubSub and MagicBlock API behavior. | + +Main consumers: + +- `magicblock-api` is the runtime consumer and owns service orchestration around Aperture. +- External Solana/RPC clients consume the HTTP and WebSocket APIs. +- Dynamically loaded shared libraries implementing `agave_geyser_plugin_interface::GeyserPlugin` consume Geyser notifications. +- `test-kit` and `magicblock-aperture/tests/setup.rs` provide test backends for crate-local RPC tests. + +Important upstream/downstream relationships: + +- Upstream state comes from `AccountsDb`, `Ledger`, `Chainlink`, `DispatchEndpoints`, and `magicblock-config`. +- Transaction submission/simulation flows downstream into the processor scheduler via `TransactionSchedulerHandle`. +- Account read and transaction account availability flows downstream into `magicblock-chainlink` / `magicblock-account-cloner` ensure APIs. +- Event egress consumes `magicblock-core` account/transaction/block channels and feeds WebSocket subscriptions, local caches, and Geyser plugins. + +## Public API shape / Main public types and APIs + +Public surface from `src/lib.rs`: + +- `initialize_aperture(config, state, dispatch, cancel) -> ApertureResult` — binds HTTP and WebSocket listeners, constructs the server, then starts `EventProcessor` workers. Event processors start only after sockets bind successfully so startup retries do not leak background tasks. 
+- `JsonRpcServer` — returned service handle with: + - `http_addr()` and `ws_addr()` for bound listener addresses; + - `run(self)` to concurrently run HTTP and WebSocket servers until cancellation. +- `error` module — exports `ApertureError` and public `RpcError` type used in responses and by `magicblock-api` error wrapping. +- `state` module — exports `SharedState`, `NodeContext`, `ChainlinkImpl`, and `InnerChainlinkImpl` for construction by `magicblock-api` and tests. +- `server` module — public module for lower-level server types, though most internals remain `pub(crate)`. + +Key configuration/API contracts: + +- `ApertureConfig::listen` binds the HTTP listener. The WebSocket listener uses `listen.port() + 1`; if HTTP port is `0`, WebSocket also binds port `0` and the actual bound ports are available from `JsonRpcServer`. +- `ApertureConfig::event_processors` controls the number of event-processing Tokio tasks. +- `ApertureConfig::geyser_plugins` is a list of JSON config paths. Each JSON file must contain `libpath` pointing to a shared library exporting `_create_plugin`. +- `NodeContext` carries validator identity, base fee, feature set, and block time. `SharedState::new` uses block time to compute blockhash validity and initializes transaction/block/subscription caches. 
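
The listener-port contract above (`listen.port() + 1`, with port `0` staying `0`) can be sketched with the standard library alone. `sketch_ws_addr` is a hypothetical helper, not a function from the crate, and returning `None` when the HTTP port is `65535` is an assumption of this sketch; the guide does not say how the real code treats that edge.

```rust
use std::net::SocketAddr;

/// Derives the WebSocket bind address from the HTTP listen address.
/// Port 0 (OS-assigned) is kept as 0; any other port is incremented by one.
/// Assumption: overflow past 65535 is reported as `None`.
pub fn sketch_ws_addr(http: SocketAddr) -> Option<SocketAddr> {
    let port = match http.port() {
        0 => 0,
        p => p.checked_add(1)?,
    };
    Some(SocketAddr::new(http.ip(), port))
}
```

With port `0` the two listeners receive unrelated OS-assigned ports, which is why tests should read the bound addresses from `JsonRpcServer::http_addr()` and `ws_addr()` instead of deriving them.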
+ +Supported HTTP method enum (`JsonRpcHttpMethod`) includes standard reads, transaction methods, token helpers, and Magic Router compatibility methods: + +- transaction methods: `sendTransaction`, `simulateTransaction`; +- local/ledger reads: `getAccountInfo`, `getMultipleAccounts`, `getBalance`, `getBlock`, `getBlocks`, `getBlockHeight`, `getBlockTime`, `getLatestBlockhash`, `isBlockhashValid`, `getTransaction`, `getSignatureStatuses`, `getSignaturesForAddress`, `getSlot`, `getVersion`, and related Solana methods; +- token helpers: `getTokenAccountBalance`, `getTokenAccountsByDelegate`, `getTokenAccountsByOwner`; +- mocked compatibility methods such as `getHealth`, `getGenesisHash`, `getEpochInfo`, and others under `mocked.rs`; +- Magic Router compatibility: `getRoutes`, `getBlockhashForAccounts` as an alias of `getLatestBlockhash`, and `getDelegationStatus`. + +Supported WebSocket method enum (`JsonRpcWsMethod`) includes: + +- `accountSubscribe` / `accountUnsubscribe`; +- `programSubscribe` / `programUnsubscribe`; +- `signatureSubscribe` / `signatureUnsubscribe`; +- `logsSubscribe` / `logsUnsubscribe`; +- `slotSubscribe` / `slotUnsubscribe`; +- `ping`. + +Important internal service handles: + +- `HttpDispatcher` is the shared per-request context for HTTP handlers. It owns cloned handles to node context, accounts DB, ledger, Chainlink, transaction/block caches, and transaction scheduler. +- `WsDispatcher` is per WebSocket connection. It owns that client's cleanup guards and signature expirer while sharing global subscription DB and transaction cache. +- `EventProcessor` is a background worker. Each worker subscribes to account, transaction, and block event channels and forwards events to subscriptions, Geyser plugins, and caches. +- `GeyserPluginManager` owns loaded plugin trait objects and `Library` handles. The library handles must outlive plugin objects. + +## Runtime flows + +### Startup and shutdown flow + +```text +magicblock-api + -> SharedState::new(...) 
+ -> initialize_aperture(config, state, dispatch, cancel) + -> bind HTTP listener + -> derive and bind WebSocket listener + -> construct WebsocketServer and HttpServer + -> unsafe GeyserPluginManager::new(config.geyser_plugins) + -> spawn config.event_processors EventProcessor tasks + -> JsonRpcServer::run() + -> join HTTP and WebSocket accept loops +``` + +Important details: + +1. Socket binding happens before event processors start. Preserve this ordering to avoid leaked event tasks after bind failures in tests/startup retries. +2. HTTP and WebSocket accept loops stop accepting new connections when `cancel` is triggered. +3. Active HTTP/WebSocket connection tasks are not drained indefinitely; shutdown favors fast validator restart. +4. Geyser plugins are loaded during `EventProcessor::start`; plugin loading failures fail Aperture initialization. +5. Dropping `GeyserPluginManager` calls `plugin.on_unload()` for each loaded plugin. + +### HTTP request dispatch flow + +```text +TCP connection + -> Hyper HttpServer + -> HttpDispatcher::dispatch + -> extract_bytes (1 MiB cap) + -> parse_body (single or batch) + -> method-specific handler + -> ResponsePayload / ResponseErrorPayload +``` + +Important details: + +1. `OPTIONS` receives CORS headers without JSON-RPC parsing. +2. `/health/primary` returns `503 Service Unavailable` unless `CoordinationMode::current() == Primary`. +3. Request bodies are capped at 1 MiB. `Data::SingleChunk` avoids allocating for common single-chunk requests; only multi-chunk bodies allocate a `Vec`. +4. Batch requests run handlers through `FuturesOrdered`, preserving response order while allowing concurrent futures. +5. `RPC_REQUESTS_COUNT` and `RPC_REQUEST_HANDLING_TIME` are labeled by bounded method names from the enum; do not replace them with user-controlled labels. +6. JSON-RPC errors usually render with HTTP 200; shutdown-like `TransactionError::ClusterMaintenance` maps to HTTP 503 so retry-aware proxies can absorb restart gaps. 
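
The 1 MiB body cap from the dispatch flow can be pictured with the following stdlib-only sketch. It is a simplification under stated assumptions: `sketch_collect_body` and `SketchBodyTooLarge` are hypothetical names, the sketch always allocates (the real `Data::SingleChunk` path is described as avoiding that for single-chunk bodies), and accepting a body of exactly 1 MiB is an assumption.

```rust
/// 1 MiB request-body cap, as described in the HTTP dispatch flow.
pub const SKETCH_MAX_BODY_BYTES: usize = 1024 * 1024;

/// Hypothetical error for an oversized request body.
#[derive(Debug, PartialEq)]
pub struct SketchBodyTooLarge;

/// Collects body chunks, failing as soon as the running total would exceed
/// the cap, so an oversized request never grows the buffer past the limit.
pub fn sketch_collect_body<'a, I>(chunks: I) -> Result<Vec<u8>, SketchBodyTooLarge>
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let mut body = Vec::new();
    for chunk in chunks {
        if body.len() + chunk.len() > SKETCH_MAX_BODY_BYTES {
            return Err(SketchBodyTooLarge);
        }
        body.extend_from_slice(chunk);
    }
    Ok(body)
}
```

Checking the size before copying each chunk bounds per-request memory regardless of how many chunks a client sends.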
+ +### Account read ensure flow + +```text +getAccountInfo / getMultipleAccounts / simulation account reads + -> CoordinationMode check + -> Chainlink ensure_accounts when primary + -> AccountsDb get_account + -> LockedAccount race-free encode + -> JSON-RPC response with BlocksCache slot context +``` + +Important details: + +1. In replica/non-primary modes, read helpers skip on-chain interactions and return local `AccountsDb` state only. +2. `getAccountInfo` and `getMultipleAccounts` pass `mark_empty_if_not_found` so missing accounts can be represented locally, then render synthetic empty placeholder system accounts as JSON-RPC `null`. +3. Ensure failures for account reads are logged and the handler returns whatever is currently in `AccountsDb`; transaction account ensure failures are stricter. +4. Encoding uses `LockedAccount` to avoid races while reading account data. +5. `getMultipleAccounts` does **not** enforce agave's 100-pubkey-per-request limit. The handler in `requests/http/get_multiple_accounts.rs` processes every pubkey in the input array with no count cap; the only bound is the global 1 MiB request-body limit (see HTTP flow detail 3). Clients relying on agave's rejection of >100 keys will not get that error here. + +### Transaction submission flow + +```text +sendTransaction + -> require primary + -> decode base58/base64 transaction + -> validate_supported_transaction_shape + -> validate recent blockhash against BlocksCache + -> sanitize and verify signatures + -> reserve signature in TransactionsCache + -> Chainlink ensure_transaction_accounts + -> scheduler.execute (preflight) OR scheduler.schedule (skipPreflight) + -> return signature +``` + +Important details: + +1. `sendTransaction` and `simulateTransaction` are primary-only. Preserve this gating; replicas must not perform on-chain account ensures or schedule execution. +2. Replay prevention reserves the signature in `TransactionsCache` before account ensuring and scheduling. 
Duplicate signatures return `AlreadyProcessed`. +3. `TransactionsCache` TTL is 75 seconds, intentionally longer than the 60-second blockhash window to avoid replay after premature cache eviction. +4. v0 transactions with address lookup tables are currently rejected. +5. Program ID indices are limited to `1232 / size_of::()` because downstream runtime compute-budget code assumes packet-bounded program indices. +6. `skip_preflight = true` schedules fire-and-forget and increments `TRANSACTION_SKIP_PREFLIGHT`; otherwise the handler awaits scheduler execution. + +### Simulation flow + +```text +simulateTransaction + -> require primary + -> decode and validate transaction + -> optionally replace recent blockhash + -> Chainlink ensure_transaction_accounts + -> scheduler.simulate + -> optionally merge requested post-simulation account states + -> encode RpcSimulateTransactionResult +``` + +Important details: + +1. `replace_recent_blockhash` mutates the transaction before sanitization and can return `replacement_blockhash` in the response. +2. Requested account snapshots reject binary/base58 encodings and too many requested accounts. +3. If simulation fails, requested account snapshots are returned as `None` entries. +4. Current response leaves some Solana fields as `None`, including fee, loaded address data, balances, and loaded accounts data size. + +### WebSocket subscription flow + +```text +WebSocket TCP connection + -> HTTP upgrade + -> ConnectionHandler task + -> WsDispatcher per connection + -> subscribe handler registers global subscriber + -> CleanUp guard stored in connection unsubs map + -> EventProcessor sends encoded notification to connection channel + -> ConnectionHandler writes text frame +``` + +Important details: + +1. Each connection has an MPSC channel with capacity `4096` for outbound updates. +2. Connection IDs are generated with a relaxed atomic counter. +3. `CleanUp` guards provide RAII unsubscription. 
Dropping/removing the guard removes the subscriber from global registries. +4. `signatureSubscribe` is one-shot. It is removed atomically on notification and also tracked by a per-connection `SignaturesExpirer` with a 90-second TTL checked every 5 seconds. +5. The connection loop sends WebSocket pings every 30 seconds and closes connections inactive for more than 60 seconds. +6. `WsDispatcher::drop` drains pending signature subscriptions to avoid orphaned global entries. + +### Event processor and Geyser flow + +```text +processor/ledger dispatch channels + -> EventProcessor workers + -> WebSocket subscription DB notifications + -> GeyserPluginManager notifications + -> TransactionsCache / BlocksCache updates +``` + +Event ordering in the current implementation: + +1. Block update: send slot WebSocket notification, notify Geyser slot, notify Geyser block, then update `BlocksCache`. +2. Account update: send account WebSocket notification, send program WebSocket notification, then notify Geyser account. +3. Transaction status: send signature WebSocket notification, send logs WebSocket notification, notify Geyser transaction, then push final status into `TransactionsCache`. +4. Individual Geyser notification errors are logged with `warn!` and do not stop event processing. + +Geyser caveats: + +- Plugin callbacks run inline on event-processor tasks. Slow or blocking plugins can delay WebSocket notifications and cache updates handled by that task. +- Account Geyser notifications set `txn: None` and `write_version: 0`. +- Slot notifications use `SlotStatus::Rooted` and parent `slot.checked_sub(1)`. +- Block metadata currently uses placeholder `parent_blockhash`, `executed_transaction_count`, and `entry_count` values. +- Plugin JSON uses `libpath`; error text in `geyser.rs` still mentions `path` in some messages, but the implemented compatibility contract is `libpath`. 
+ +## Important internals and caveats + +### Coordination mode boundaries + +Aperture consults `CoordinationMode::current()` to decide whether on-chain interactions are allowed. Primary mode can ensure accounts and submit/simulate transactions. Replica mode should serve local reads only and reject transaction-affecting RPC methods. Do not bypass `require_primary_rpc_method` or `needs_onchain_interactions` when adding write-like or fetch-amplifying RPC methods. + +### Cache semantics + +- `ExpiringCache` evicts lazily only on `push`; `get` and `contains` do not remove expired entries. +- Updating an existing key in `ExpiringCache` does not renew its lifetime because expiration records are only queued for new keys. +- `BlocksCache` stores one latest block through `ArcSwapAny` plus a 60-second blockhash cache. `block_validity` is scaled by `SOLANA_BLOCK_TIME / NodeContext::blocktime` and `MAX_VALID_BLOCKHASH_SLOTS`. +- `SharedState::new` panics if `NodeContext::blocktime` is zero through `BlocksCache::new`. +- `TransactionsCache` stores `None` for reserved/in-flight signatures and `Some(SignatureResult)` after status events. + +### Subscriptions and encoders + +Subscription grouping depends on encoders implementing stable ordering/equality. `AccountEncoder` includes encoding and data slice; `ProgramAccountEncoder` includes filters. Changing these comparisons can merge or split subscription groups and affects memory use and notification fanout. + +`LogsSubscribe` supports all logs or `mentions(pubkey)` filtering by transaction account keys. Program subscriptions filter account data in `ProgramAccountEncoder::encode`; non-matching updates return `None` and are skipped. + +### JSON and Solana compatibility + +The crate intentionally implements a subset of Solana JSON-RPC behavior plus MagicBlock-specific methods. Some methods are mocked for compatibility. 
Do not silently change method names, response context slots, JSON-RPC error codes, or unsupported-encoding behavior without updating tests and operator/client documentation. + +### Geyser FFI safety + +`GeyserPluginManager::new` is unsafe because plugins cross a Rust trait-object FFI boundary. Plugins must be ABI-compatible with the validator's Agave/Solana and Rust toolchain versions. The manager stores `Library` handles beside plugin boxes so loaded symbols remain valid while plugins are used; preserve that lifetime relationship. + +## Important invariants + +1. HTTP listener binding must succeed before event processors are spawned. +2. The WebSocket listener must bind to `HTTP port + 1`, except when HTTP port is `0`, where both listeners request OS-assigned ports. +3. Aperture must remain a lean ingress/egress layer; it must not duplicate SVM execution, delegation lifecycle, commit, or settlement protocol logic. +4. Primary/replica gating must prevent replicas from submitting/simulating transactions or performing on-chain account ensure work. +5. `sendTransaction` must reserve signatures before scheduling to preserve replay protection. +6. `TransactionsCache` TTL must remain longer than the blockhash validity window unless replay protection is redesigned. +7. Transaction encoded bytes must be preserved for execution/replication when `replace_blockhash = false`. +8. v0 address lookup table rejection and program-index guardrails must remain aligned with runtime support. +9. Account read handlers must render synthetic empty placeholder system accounts as JSON-RPC `null`. +10. Request and metric labels must come from bounded method/config values, not client-controlled arbitrary strings. +11. Subscription cleanup guards must be retained for the life of a WebSocket subscription and dropped on unsubscribe/disconnect. +12. `signatureSubscribe` must remain one-shot and bounded by expiration to avoid unbounded memory growth. +13. 
Geyser plugin libraries must outlive plugin trait objects, and notification failures must not accidentally kill event processors unless the availability policy is intentionally changed. +14. Avoid adding blocking I/O, slow locks, unbounded serialization, or excessive cloning to HTTP dispatch, account ensure, transaction submission, event processing, or WebSocket notification hot paths. + +## Common change areas and what to inspect + +### Adding or changing an HTTP RPC method + +Inspect first: + +- `magicblock-aperture/src/requests/mod.rs` for enum and `as_str()` entries; +- `magicblock-aperture/src/server/http/dispatch.rs` for routing and metrics; +- matching `magicblock-aperture/src/requests/http/.rs` handler; +- `magicblock-aperture/tests/` and relevant integration suites; +- `.agents/specs/validator-specification.md` for protocol-level methods like delegation, routing, or commits. + +Risks: + +- missing method string mapping breaks metrics and routing; +- handler may need primary-mode gating or must avoid on-chain fetches in replica mode; +- account reads may require Chainlink ensure but should avoid fetch amplification; +- response context slot and error shape are client-visible compatibility contracts. + +### Changing transaction submission or simulation + +Inspect first: + +- `send_transaction.rs`, `simulate_transaction.rs`, `requests/http/mod.rs`, and `transaction_validation.rs`; +- `magicblock-core` transaction dispatch types and scheduler handle behavior; +- `magicblock-chainlink` account ensure behavior; +- `magicblock-aperture/tests/transactions.rs` and `transaction_primary_mode.rs`. + +Risks: + +- weakening replay protection; +- scheduling in replica mode; +- losing encoded transaction bytes needed by replication/execution paths; +- accepting transaction shapes the runtime cannot safely process; +- turning account ensure soft failures into silent execution of unavailable state. 
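The replay-protection risk above is easier to reason about with a small model of reserve-before-schedule. The sketch below is an assumption-laden illustration: `SignatureCache` and its methods are hypothetical, and expiry is checked directly on lookup rather than through the lazy-eviction `ExpiringCache` the crate uses. Only the 60-second and 75-second values come from this guide.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Blockhash validity window described in this guide.
const BLOCKHASH_WINDOW: Duration = Duration::from_secs(60);
/// Signature TTL; deliberately longer than the blockhash window.
const SIGNATURE_TTL: Duration = Duration::from_secs(75);

type Signature = [u8; 64];
/// `None` = reserved / in flight, `Some(..)` = final status recorded.
type Status = Option<Result<(), String>>;

#[derive(Debug, PartialEq)]
enum SubmitError {
    AlreadyProcessed,
}

/// Hypothetical, simplified stand-in for `TransactionsCache`.
struct SignatureCache {
    entries: HashMap<Signature, (Instant, Status)>,
}

impl SignatureCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Must be called before account ensuring and scheduling.
    fn reserve(&mut self, signature: Signature, now: Instant) -> Result<(), SubmitError> {
        let live = self.entries.get(&signature).map_or(false, |(inserted, _)| {
            now.saturating_duration_since(*inserted) < SIGNATURE_TTL
        });
        if live {
            return Err(SubmitError::AlreadyProcessed);
        }
        self.entries.insert(signature, (now, None));
        Ok(())
    }

    /// Records the final status once the transaction status event arrives.
    fn complete(&mut self, signature: &Signature, result: Result<(), String>) {
        if let Some(entry) = self.entries.get_mut(signature) {
            entry.1 = Some(result);
        }
    }

    fn status(&self, signature: &Signature) -> Option<&Status> {
        self.entries.get(signature).map(|(_, status)| status)
    }
}
```

In this model a duplicate submitted 61 seconds after the first is still rejected even though its blockhash has just expired. That margin is the reason the TTL has to stay above the blockhash window.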
+ +### Changing account or token read methods + +Inspect first: + +- `get_account_info.rs`, `get_multiple_accounts.rs`, `get_balance.rs`, `get_program_accounts.rs`, token handler files, and shared read helpers; +- `magicblock-chainlink` ensure APIs and `AccountFetchOrigin` metrics; +- `magicblock-aperture/tests/accounts.rs`. + +Risks: + +- returning placeholder accounts instead of JSON-RPC `null`; +- breaking result ordering for multi-account reads; +- adding expensive scans or remote fetches to hot read paths; +- changing SPL token layout offsets without tests. + +### Changing WebSocket/PubSub behavior + +Inspect first: + +- `server/websocket/connection.rs`, `server/websocket/dispatch.rs`, `requests/websocket/*.rs`; +- `state/subscriptions.rs`, `state/signatures.rs`, and `encoder.rs`; +- `magicblock-aperture/tests/websocket.rs` and `test-integration/test-pubsub/`. + +Risks: + +- orphaning subscriptions on disconnect; +- unbounded per-connection queues or signature subscription memory; +- changing notification JSON shape or subscription IDs; +- slow clients causing backpressure on event-processing paths. + +### Changing event processing or caches + +Inspect first: + +- `processor.rs`, `state/blocks.rs`, `state/transactions.rs`, `state/cache.rs`; +- `magicblock-ledger` latest-block watch behavior; +- transaction status event producers in `magicblock-processor`. + +Risks: + +- reordering cache updates relative to client notifications; +- invalid blockhash acceptance/rejection due to blocktime or TTL changes; +- removing lazy eviction assumptions; +- increasing event processor contention with additional shared locks. + +### Changing Geyser plugin support + +Inspect first: + +- `geyser.rs`, `processor.rs`, `magicblock-config/src/config/aperture.rs`, `config.example.toml`; +- Agave `Replica*Info*` version changes in dependency updates. 
+ +Risks: + +- ABI/version mismatch and memory unsafety; +- changing operator config from `libpath`; +- blocking event processors inside plugin callbacks; +- changing placeholder event fields that downstream plugins may already consume. + +## Tests and validation + +For documentation-only changes to this guide, verify paths and cross-references only; no Rust checks are required by the batch plan. + +Minimum targeted checks for `magicblock-aperture` Rust changes: + +```bash +cargo fmt +cargo clippy -p magicblock-aperture --all-targets -- -D warnings +cargo nextest run -p magicblock-aperture +``` + +Useful targeted tests by area: + +```bash +cargo nextest run -p magicblock-aperture --test accounts +cargo nextest run -p magicblock-aperture --test transactions +cargo nextest run -p magicblock-aperture --test transaction_primary_mode +cargo nextest run -p magicblock-aperture --test websocket +cargo nextest run -p magicblock-aperture transaction_validation +``` + +Integration checks for validator-level behavior: + +```bash +cd test-integration +make test-magicblock-api +make test-pubsub +``` + +If isolating PubSub tests, use the workflow in `.agents/rules/testing-and-validation.md`, for example: + +```bash +cd test-integration +make setup-pubsub-both +# in another terminal: +RUST_LOG=info cargo test -p test-pubsub -- --test-threads=1 --nocapture +``` + +Broader baseline validation remains the repository standard from `.agents/rules/testing-and-validation.md`: + +```bash +cargo fmt +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance validation expectations: + +- RPC/account-read changes should report whether they add account fetches, blocking work, extra serialization, or heavy scans. +- Transaction submission/simulation changes should report scheduler/account-ensure latency risk and replay-protection impact. 
+- PubSub/Geyser/event changes should consider notification throughput, queue growth, plugin callback latency, and cache-update ordering. +- If no practical performance test is run for a hot-path change, explicitly state the residual risk. + +## Related docs + +- `AGENTS.md` — required agent workflow and documentation-memory rules. +- `.agents/context/overview.md` — validator runtime model and core concepts. +- `.agents/specs/validator-specification.md` — RPC/router, account cloning, transaction execution, and lifecycle invariants. +- `.agents/context/architecture.md` — cross-crate service boundaries and ingress/event-processing role. +- `.agents/context/crate-map.md` — workspace crate ownership map and pointer back to this guide. +- `.agents/rules/testing-and-validation.md` — repository validation workflow and integration-test commands. +- `.agents/memory/agent-memory-and-docs.md` — rules for keeping crate guides current. +- `.agents/context/crates/magicblock-account-cloner.md` — account/program clone behavior used by Aperture through Chainlink. +- `.agents/context/crates/magicblock-chainlink.md` — account synchronization and ensure behavior used by RPC reads/transactions. +- `magicblock-aperture/README.md` — human-facing crate overview. +- `magicblock-config/src/config/aperture.rs` and `config.example.toml` — operator-facing Aperture configuration. +- `test-integration/test-magicblock-api/` and `test-integration/test-pubsub/` — validator-level RPC/PubSub integration suites. diff --git a/.agents/context/crates/magicblock-api.md b/.agents/context/crates/magicblock-api.md new file mode 100644 index 000000000..bb3944026 --- /dev/null +++ b/.agents/context/crates/magicblock-api.md @@ -0,0 +1,345 @@ +# `magicblock-api` + +## Purpose + +`magicblock-api` is the validator orchestration crate. 
It turns `magicblock-config::ValidatorParams` into a running validator service graph, owns the `MagicValidator` lifecycle, and wires together storage, account synchronization, RPC/pubsub, transaction execution, settlement, replication, metrics, task scheduling, and operator-facing on-chain registration flows. + +High-level responsibilities: + +- initialize persistent ledger and AccountsDb state; +- sync and verify the validator keypair stored beside the ledger; +- build genesis/sys accounts required by the local runtime; +- create Chainlink/account-cloning, committor, scheduled-commit, replication, metrics, task-scheduler, and Aperture RPC services; +- spawn transaction execution, RPC runtime, slot/system tickers, ledger truncation, and optional replication service threads/tasks; +- transition the scheduler out of `StartingUp` after replay/reset/recovery; +- register/unregister the validator in the Magic Domain Program and manage validator fee-vault setup in standalone ephemeral mode; +- stop services in an order that protects in-flight commits and flushes durable state. + +This crate sits directly on startup and shutdown paths and coordinates hot-path services owned by other crates. Changes here can introduce latency, ordering bugs, persistence/recovery failures, or primary/replica divergence even when the changed code is not itself in an execution hot loop. + +## Update requirement + +Update this document in the same change whenever `magicblock-api` behavior or contracts change. This file is useful only if it reflects the current implementation. 
+ +Update it for changes to: + +- `MagicValidator::try_from_config`, `start`, `stop`, `prepare_ledger_for_shutdown`, or validator registration/unregistration behavior; +- startup/shutdown ordering, service ownership, cancellation semantics, or thread/runtime spawning; +- ledger, AccountsDb, keypair, genesis, MagicContext, ephemeral vault, or native mint initialization; +- Chainlink, committor, scheduled commit recovery, replication, task scheduler, RPC/Aperture, metrics, or ledger truncator wiring; +- scheduler mode transitions for standalone, primary, or replica roles; +- MagicSys/commit nonce lookup behavior; +- domain registry, fee vault, claim-fees, or other base-layer operator flows; +- public APIs, exported modules, error types, validation commands, or integration-test expectations. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-api/src/lib.rs` | Public module exports: `domain_registry_manager`, `errors`, `ledger`, and `magic_validator`; internal wiring modules stay private. | +| `magicblock-api/src/magic_validator.rs` | Main `MagicValidator` implementation and startup/shutdown service graph. | +| `magicblock-api/src/domain_registry_manager.rs` | Magic Domain Program registration, sync, fetch, and unregister helpers. | +| `magicblock-api/src/ledger.rs` | Ledger open/reset helpers, ledger lockfile helpers used by the binary, and validator-keypair persistence beside the ledger. | +| `magicblock-api/src/fund_account.rs` | Local genesis/runtime account seeding for validator identity, `MagicContext`, and ephemeral vault. | +| `magicblock-api/src/genesis_utils.rs` | Local genesis account construction, native mint setup, and feature activation helpers. | +| `magicblock-api/src/magic_sys_adapter.rs` | Adapter from Magic Program `MagicSys` trait calls to `CommittorService` commit-nonce lookups. | +| `magicblock-api/src/tickers.rs` | Slot ticker for accepting scheduled commits and system metrics ticker for storage/account gauges. 
| +| `magicblock-api/src/errors.rs` | `ApiError` and `ApiResult`, including boxed conversions from downstream service errors. | +| `magicblock-api/README.md` | Short public usage example for constructing and starting `MagicValidator`. | +| `magicblock-validator/src/main.rs` | Primary consumer: parses config, creates `MagicValidator`, takes the ledger lock, starts/stops the validator, and drives TUI/headless lifecycle. | +| `test-integration/test-magicblock-api/` | Integration coverage for domain registry, claim fees, and block/transaction timestamp stability. | +| `config.example.toml` | Operator-facing configuration consumed through `ValidatorParams` and wired by this crate. | + +Main consumers: + +- `magicblock-validator` is the only production binary consumer of `MagicValidator`. +- `test-integration/test-magicblock-api` directly uses `DomainRegistryManager` and exercises runtime behavior through integration contexts. +- Other crates are mostly downstream services wired by this crate rather than consumers of it. + +Important downstream dependencies include `magicblock-accounts-db`, `magicblock-ledger`, `magicblock-processor`, `magicblock-aperture`, `magicblock-chainlink`, `magicblock-account-cloner`, `magicblock-accounts`, `magicblock-committor-service`, `magicblock-replicator`, `magicblock-task-scheduler`, `magicblock-metrics`, `magicblock-services`, `magicblock-program`, and `magicblock-validator-admin`. + +## Public API shape / Main public types and APIs + +The exported public API is intentionally small for production use: + +- `magic_validator::MagicValidator` + - `try_from_config(ValidatorParams) -> ApiResult` builds the service graph and spawns some long-lived components such as transaction execution and the RPC runtime. + - `start(&mut self) -> ApiResult<()>` performs ledger replay/reset/recovery, switches scheduler/replication mode, and starts post-start services such as slot ticking, ledger truncation, claim-fees, and the task scheduler. 
+ - `stop(self)` consumes the validator and shuts down services, joins threads/tasks where available, flushes AccountsDb and ledger, and performs ledger shutdown. + - `start_unregister_validator_on_chain(&mut self)` sends a best-effort unregister transaction for standalone ephemeral validators that use on-chain coordination. + - `prepare_ledger_for_shutdown(&mut self)` stops truncation, cancels manual compactions, and flushes the ledger before final shutdown. + - `ledger(&self) -> &Ledger` exposes the ledger for the binary to lock and report paths. +- `domain_registry_manager::DomainRegistryManager` + - builds Solana RPC clients for the Magic Domain Program; + - fetches `ErRecord` data; + - registers, syncs, and unregisters validator records; + - has static helper variants used by `MagicValidator` background startup/shutdown flows. +- `ledger` + - `ledger_lockfile` and `lock_ledger` are public for the binary to prevent multiple validators using the same ledger path. + - validator-keypair and reset/open helpers are crate-private. +- `errors::{ApiError, ApiResult}` is the crate-level error surface. + +Private but important internal APIs: + +- `MagicSysAdapter` implements `magicblock_core::traits::MagicSys` so Magic Program execution can synchronously ask the committor service for current commit nonces. +- `init_slot_ticker` periodically accepts scheduled commits through the transaction scheduler and then asks `ScheduledCommitsProcessor` to process them. +- `init_system_metrics_ticker` periodically updates storage/account gauges. 
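The synchronous request/response pattern behind `MagicSysAdapter` can be sketched with standard-library channels. This is an illustrative model under stated assumptions: `NonceRequest`, `spawn_nonce_service`, `fetch_nonces`, the numeric account IDs, and both error codes are hypothetical. The real adapter talks to `CommittorService` and surfaces failures as `InstructionError::Custom` values.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical custom error codes, standing in for `InstructionError::Custom(code)`.
const ERR_SERVICE_GONE: u32 = 1;
const ERR_TIMEOUT: u32 = 2;

/// Hypothetical request: account ids plus a one-shot reply channel.
struct NonceRequest {
    accounts: Vec<u32>,
    reply: mpsc::SyncSender<Vec<u64>>,
}

/// Toy service thread that answers each request with `id + 1` as the nonce.
fn spawn_nonce_service() -> mpsc::Sender<NonceRequest> {
    let (tx, rx) = mpsc::channel::<NonceRequest>();
    thread::spawn(move || {
        for request in rx {
            let nonces = request.accounts.iter().map(|id| u64::from(*id) + 1).collect();
            // The caller may already have timed out; ignore a closed reply channel.
            let _ = request.reply.send(nonces);
        }
    });
    tx
}

/// Sends a request and blocks the calling thread for at most `timeout`.
fn fetch_nonces(
    service: &mpsc::Sender<NonceRequest>,
    accounts: Vec<u32>,
    timeout: Duration,
) -> Result<Vec<u64>, u32> {
    let (reply, response) = mpsc::sync_channel(1);
    service
        .send(NonceRequest { accounts, reply })
        .map_err(|_| ERR_SERVICE_GONE)?;
    // A bounded wait keeps a stalled service from blocking execution forever.
    response.recv_timeout(timeout).map_err(|_| ERR_TIMEOUT)
}
```

The bounded `recv_timeout` is the important part of the design: the caller is an execution path, so an unresponsive service must turn into an error code rather than an indefinite stall.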
+ +## Runtime flows + +### Construction flow: `MagicValidator::try_from_config` + +```text +ValidatorParams + -> ledger open/reset + keypair sync + -> AccountsDb open/snapshot/genesis/sys accounts + -> committor + MagicSys adapter + -> chainlink/account cloner + -> replication service (optional) + -> metrics + system metrics ticker + -> scheduled commit processor + -> program loading + validator authority init + -> transaction scheduler/execution thread + -> Aperture RPC runtime thread + -> task scheduler construction +``` + +Current order matters: + +1. Create a cancellation token and clone the configured validator keypair. +2. Build local genesis accounts from the validator pubkey and base fee. +3. Open/reset the ledger, derive `last_slot`, and sync the validator keypair file stored beside the ledger. +4. Open AccountsDb at the ledger-derived slot. +5. Connect to a replication broker for primary/replica modes; a fresh replica may import an external snapshot and set `config.accountsdb.reset = true` so Chainlink does not prune replica state. +6. Create the committor service and install `MagicSysAdapter` with `init_magic_sys`. +7. Create Chainlink unless the role is `ReplicationMode::Replica`, where `ChainlinkImpl::disabled()` is used. +8. Create replication service if a broker was configured. +9. Insert missing genesis accounts, initialize validator identity, MagicContext, and ephemeral vault accounts. +10. Start metrics service and system metrics ticker. +11. Create the scheduled-commit processor if a committor service exists. +12. Load configured upgradeable programs and initialize Magic Program validator authority/override. +13. Build the SVM environment, transaction scheduler state, and executor count; spawn the transaction execution thread. +14. Build Aperture shared state and start the RPC server on its own Tokio runtime thread. +15. Construct the task scheduler database path next to the ledger parent and create `TaskSchedulerService`. 
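Step 7 above is a role-based decision, which a minimal sketch can make explicit. The enum below is a simplified stand-in: the real `ReplicationMode` may carry data in its variants, and `ChainlinkKind` and `chainlink_for` are hypothetical names.

```rust
/// Simplified stand-in for the validator's replication role.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ReplicationMode {
    Standalone,
    Primary,
    Replica,
}

/// Hypothetical marker for which Chainlink flavor gets constructed.
#[derive(Debug, PartialEq)]
enum ChainlinkKind {
    Enabled,
    Disabled,
}

/// Replicas wait for replicated state and must not sync or prune on their own.
fn chainlink_for(mode: ReplicationMode) -> ChainlinkKind {
    match mode {
        ReplicationMode::Standalone | ReplicationMode::Primary => ChainlinkKind::Enabled,
        ReplicationMode::Replica => ChainlinkKind::Disabled,
    }
}
```

An exhaustive `match` without a wildcard arm means that adding a new role forces a compile-time decision about its Chainlink behavior.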
+ +Caveats: + +- `try_from_config` already spawns long-lived execution and RPC resources before `start()` is called. +- Startup timing is logged via `log_timing`; preserve useful timings if changing slow startup sections. +- The RPC runtime uses roughly half available CPUs minus one worker, while the process main runtime is created in `magicblock-validator`. + +### Start flow: replay, reset, mode transition, recovery + +1. `maybe_process_ledger` skips replay when `ledger.reset` is set or AccountsDb slot is already at least the ledger slot. +2. Ledger replay uses `process_ledger` with a blockhash age derived from configured block time. +3. After replay, Magic Program scheduled actions are cleared so replayed accept-commit transactions do not re-commit. +4. Optional AccountsDb defragmentation runs before normal work starts; this is explicitly marked safe only before cleanup, scheduler mode changes, replication, slot ticks, or task recovery. +5. Standalone/primary nodes reset the bank and, for primaries, send a replication `Message::Reset`. +6. Pending commit intent recovery runs only after replay and reset. +7. Standalone nodes switch the scheduler to `SchedulerMode::Primary`; primary/replica modes spawn the replication service instead. +8. Standalone ephemeral nodes start background base-layer setup: funding check, magic fee vault init/delegation, optional startup fee claim, and optional domain registration. +9. Claim-fees periodic task, slot ticker, ledger truncator, and primary-only task scheduler are started. + +### Scheduled commit tick flow + +```text +slot ticker sleep + -> read MagicContext from AccountsDb + -> if scheduled commits exist, execute AcceptScheduleCommits tx through scheduler + -> ScheduledCommitsProcessor::process() + -> committor service schedules base-layer work +``` + +The ticker uses the same transaction scheduler path as normal execution for the validator-signed accept transaction. 
It currently has a known TODO about possible delay between accept and processing; do not hide or worsen that behavior without an explicit fix. + +### Shutdown flow: `MagicValidator::stop` + +1. Send unregister transaction in the background when standalone ephemeral on-chain coordination is enabled. +2. Set the local `exit` flag and cancel the shared cancellation token. +3. Stop scheduled commit processor, then committor service. The code comment says the committor service should be stopped last, but the current implementation stops it early after scheduled-commit processing; treat this ordering as compatibility-sensitive and inspect in-flight intent behavior before changing it. +4. Stop claim-fees task. +5. Join RPC thread, slot ticker, ledger truncator, replication thread, and transaction execution thread. +6. Flush AccountsDb. +7. Flush and shut down ledger. +8. Join unregister confirmation thread only if it has already finished; shutdown does not wait for that confirmation. + +Durable state is flushed only after workers that can admit, commit, truncate, or replicate state are stopped. + +### Domain registry and base-layer operator flow + +Standalone ephemeral validators that need on-chain interactions may: + +1. verify the validator authority has at least 5 SOL on the base layer; +2. initialize and delegate the validator magic fee vault if missing; +3. optionally claim existing validator fees on startup and periodically thereafter; +4. register or sync an `ErRecord` in the Magic Domain Program; +5. send an unregister transaction during shutdown and confirm it in a background thread with an 8s timeout and 400ms polling interval. + +The domain registry manager sends ordinary Solana transactions against the configured base-layer RPC URL. Changes here affect operator-facing registration semantics and should be tested against the `test-magicblock-api` integration suite when possible. 
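The unregister confirmation in step 5 is a bounded polling loop. The sketch below shows one way such a loop could look. `poll_until_confirmed` is a hypothetical helper, and the closure stands in for whatever RPC status check the real code performs. Only the 8-second timeout and 400 ms interval come from this guide.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Values quoted in this guide for unregister confirmation.
const UNREGISTER_TIMEOUT: Duration = Duration::from_secs(8);
const UNREGISTER_POLL_INTERVAL: Duration = Duration::from_millis(400);

/// Polls `is_confirmed` until it returns true or `timeout` elapses.
/// Returns whether confirmation was observed.
fn poll_until_confirmed<F>(mut is_confirmed: F, timeout: Duration, interval: Duration) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if is_confirmed() {
            return true;
        }
        let now = Instant::now();
        if now >= deadline {
            return false;
        }
        // Never sleep past the deadline, so the total wait stays bounded.
        thread::sleep(interval.min(deadline - now));
    }
}
```

A caller would pass `UNREGISTER_TIMEOUT` and `UNREGISTER_POLL_INTERVAL` together with a closure that checks the transaction status. Running the loop on a background thread, as the flow above describes, keeps shutdown from waiting on the result.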
+ +## Important internals and caveats + +### Startup/shutdown ordering is the main contract + +`magicblock-api` is mostly glue, but the glue order is correctness-critical. Ledger replay must finish before account reset and scheduler mode transition. Pending intent recovery must run after replay/reset. Replica startup must not let Chainlink prune state imported from the primary. Shutdown must prevent more state changes before flushing storage. + +### Replication modes intentionally diverge + +`ReplicationMode::Standalone` and `ReplicationMode::Primary` use enabled Chainlink. `ReplicationMode::Replica` uses disabled Chainlink and waits for replicated state. The unit tests in `magic_validator.rs` cover this helper. Do not “simplify” the modes into one startup path without checking replication service expectations. + +### `MagicSysAdapter` blocks on a synchronous committor response + +Magic Program execution can call `fetch_current_commit_nonces`; the adapter converts this into a committor-service sync-channel request and waits up to 30 seconds. This is not a place to add unbounded blocking, and error codes are surfaced as `InstructionError::Custom` values. + +### Local sys accounts are special + +`fund_account.rs` seeds the validator identity as privileged, `MagicContext` as delegated and Magic Program-owned, and the ephemeral vault as ephemeral and Magic Program-owned. Those flags are part of execution/access-control behavior and must stay aligned with Magic Program/SVM invariants. + +### Ledger keypair verification protects operators + +The ledger stores a validator keypair file beside the blockstore parent. With `verify_keypair` enabled, a mismatch between the configured keypair and stored ledger keypair fails startup. Do not bypass this without understanding operator safety and persisted-state identity assumptions. + +## Important invariants + +1. 
Ledger replay, optional defragmentation, bank reset, pending intent recovery, and scheduler mode transition must remain ordered so execution does not race recovery or cleanup. +2. Replicas must not use enabled Chainlink or local cleanup in ways that diverge from primary-replicated state. +3. Scheduled commit recovery must run only after replay and reset, because it reads local account state for delegation checks. +4. `MagicContext` must exist before the slot ticker can run, and it must remain Magic Program-owned and delegated locally. +5. Ephemeral vault initialization must preserve the ephemeral flag and Magic Program owner. +6. Validator identity must remain privileged in local AccountsDb. +7. Committor service wiring must stay available before Magic Program execution can fetch commit nonces or schedule settlement work. +8. Do not add blocking RPC, slow I/O, or expensive serialization to scheduler/executor hot paths from this crate; keep such work in startup/background services where possible. +9. Shutdown must cancel/stop services before flushing AccountsDb and ledger. +10. Operator-facing config keys and base-layer registration/fee-vault behavior are compatibility-sensitive. + +## Common change areas and what to inspect + +### Changing validator startup or service wiring + +Inspect first: + +- `magicblock-api/src/magic_validator.rs` around `try_from_config` and `start`; +- downstream service constructors (`initialize_aperture`, `CommittorService::try_start`, `ProdInnerChainlink::try_new_from_endpoints`, `TransactionScheduler::new`, `TaskSchedulerService::new`); +- `.agents/context/architecture.md` startup path and service boundaries; +- `magicblock-validator/src/main.rs` for main runtime, ledger lock, TUI/headless lifecycle. + +Risks: + +- starting RPC or task scheduler before replay/reset/recovery is complete; +- double-spawning services that are currently constructed in `try_from_config` but started in `start`; +- breaking primary/replica mode transitions. 
+ +### Changing shutdown behavior + +Inspect first: + +- `MagicValidator::stop`; +- `prepare_ledger_for_shutdown` and `magicblock-validator/src/main.rs` shutdown sequence; +- downstream service `stop`/`join` semantics for scheduled commits, committor, ledger truncator, replication, transaction scheduler, task scheduler, and RPC. + +Risks: + +- flushing state while workers can still mutate it; +- dropping Tokio runtimes before required async cleanup; +- waiting indefinitely for unregister confirmation or background services. + +### Changing commit, fee vault, or domain registration flows + +Inspect first: + +- `magicblock-api/src/domain_registry_manager.rs`; +- `spawn_primary_onchain_setup`, `ensure_magic_fee_vault_on_chain`, `register_validator_on_chain`, and `start_unregister_validator_on_chain`; +- `magicblock-validator-admin/src/claim_fees.rs`; +- Magic Domain Program and Delegation Program instruction builders; +- `test-integration/test-magicblock-api/tests/test_domain_registry.rs` and `test_claim_fees.rs`. + +Risks: + +- base-layer RPC commitment or confirmation changes affecting operator startup; +- fee vault not being delegated before commit scheduling; +- registering the wrong authority when replication authority override is active. + +### Changing local account/genesis setup + +Inspect first: + +- `fund_account.rs`; +- `genesis_utils.rs`; +- Magic Program API constants for `MAGIC_CONTEXT_PUBKEY` and `EPHEMERAL_VAULT_PUBKEY`; +- SVM/account access validation docs in `.agents/specs/validator-specification.md`. + +Risks: + +- missing local sys accounts causing startup or scheduled-commit ticker panics; +- changing account flags that SVM access validation relies on; +- primary/replica genesis state divergence. 
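The flag expectations for local sys accounts described under "Local sys accounts are special" can be written down as a simple check. This is a hedged sketch only: `SysAccount`, `LocalFlags`, and `check_sys_account` are hypothetical names for illustration, and the real flags live on the account types used by `fund_account.rs`, not on a struct like this.

```rust
/// Hypothetical labels for the three local sys accounts named in this guide.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SysAccount {
    ValidatorIdentity,
    MagicContext,
    EphemeralVault,
}

/// Hypothetical flattened view of the flags this guide mentions.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct LocalFlags {
    pub privileged: bool,
    pub delegated: bool,
    pub ephemeral: bool,
    pub magic_program_owned: bool,
}

/// Returns an error when the account is missing or lacks the documented flags.
pub fn check_sys_account(
    kind: SysAccount,
    found: Option<LocalFlags>,
) -> Result<(), String> {
    let flags =
        found.ok_or_else(|| format!("{:?} is missing from the local bank", kind))?;
    let ok = match kind {
        // Validator identity is seeded as privileged.
        SysAccount::ValidatorIdentity => flags.privileged,
        // MagicContext is delegated and Magic Program-owned.
        SysAccount::MagicContext => flags.delegated && flags.magic_program_owned,
        // The ephemeral vault is ephemeral and Magic Program-owned.
        SysAccount::EphemeralVault => flags.ephemeral && flags.magic_program_owned,
    };
    if ok {
        Ok(())
    } else {
        Err(format!("{:?} has unexpected flags: {:?}", kind, flags))
    }
}
```

A check like this is most useful in a test that runs after genesis setup, so that a flag regression fails the test instead of surfacing later as a ticker panic.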
+ +### Changing validation, replay, or timestamps + +Inspect first: + +- `maybe_process_ledger`; +- `magicblock-ledger::blockstore_processor::process_ledger`; +- `test-integration/test-magicblock-api/tests/test_clocks_match.rs`; +- `test-integration/test-magicblock-api/tests/test_get_block_timestamp_stability.rs`. + +Risks: + +- replaying scheduled commit accept transactions in a way that re-commits; +- inconsistent block time across transaction status, ledger block, and RPC block-time APIs. + +## Tests and validation + +For documentation-only changes: + +```bash +git status --short +ls .agents/context/crates/magicblock-api.md +``` + +Also verify `AGENTS.md` and `.agents/context/crate-map.md` mention the new guide when appropriate. Do not run `mbv-check` for markdown-only crate-guide updates unless source files were changed. + +For Rust changes in this crate, run focused checks first: + +```bash +cargo fmt +cargo clippy -p magicblock-api --all-targets -- -D warnings +cargo nextest run -p magicblock-api +``` + +Relevant integration checks: + +```bash +cd test-integration +make test-magicblock-api +``` + +When isolating integration tests, use the setup targets from `.agents/rules/testing-and-validation.md`, especially `make setup-magicblock-api-both` or the narrower devnet/ephem setup that matches the changed flow. + +Broader baseline validation remains: + +```bash +cargo fmt +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance validation expectations: + +- Documentation-only changes have no runtime performance impact. +- Startup/shutdown changes should report timing-sensitive effects and preserve `log_timing` coverage where useful. +- Changes that move work into RPC, account sync, scheduler/executor, ledger, replication, or committor paths need focused performance reasoning or measurement, because this crate controls when those hot-path services run. 
+ +## Related docs + +- `AGENTS.md` — required agent workflow and crate-guide discovery list. +- `.agents/context/overview.md` — validator purpose and core lifecycle concepts. +- `.agents/specs/validator-specification.md` — protocol lifecycle, commit/undelegation, RPC/router, startup/shutdown, and recovery behavior. +- `.agents/context/architecture.md` — cross-crate service graph and startup/shutdown boundaries. +- `.agents/context/crate-map.md` — crate ownership map and pointer back to this guide. +- `.agents/rules/testing-and-validation.md` — validation command selection and integration-test setup. +- `.agents/context/crates/magicblock-aperture.md` — RPC/pubsub service details wired by this crate. +- `.agents/context/crates/magicblock-account-cloner.md`, `.agents/context/crates/magicblock-accounts.md`, and `.agents/context/crates/magicblock-chainlink.md` — account sync and scheduled-commit dependencies wired by this crate. +- `magicblock-api/README.md` — short usage example for `MagicValidator`. +- `magicblock-validator/src/main.rs` — production binary lifecycle around `MagicValidator`. +- `test-integration/test-magicblock-api/` — integration tests for API-owned flows. diff --git a/.agents/context/crates/magicblock-chainlink.md b/.agents/context/crates/magicblock-chainlink.md new file mode 100644 index 000000000..1b2322ab9 --- /dev/null +++ b/.agents/context/crates/magicblock-chainlink.md @@ -0,0 +1,492 @@ +# `magicblock-chainlink` + +## Purpose + +`magicblock-chainlink` is the validator's base-chain account synchronization crate. It is the bridge between Solana RPC/pubsub state and the validator's local `AccountsDb`. 
+ +At a high level it: + +- fetches accounts from the base layer when RPC reads or transaction submission need them locally, +- subscribes to base-layer account/program updates and turns those updates into local clone operations, +- resolves delegation records for DLP-owned accounts and rewrites local account metadata so delegated accounts execute under their original owners, +- keeps local copies fresh while avoiding duplicate concurrent fetches/clones, +- handles program-account loading, associated-token/eATA projection, post-delegation action dependencies, and undelegation tracking, +- owns subscription capacity/LRU bookkeeping and defensive eviction signaling. + +This crate prepares local state for execution. It does **not** decide final post-execution write validity; the processor/SVM path still enforces MagicBlock writable-account invariants. + +Chainlink is on the account-availability hot path for RPC reads and transaction submission. Changes must preserve low-latency fetch/clone behavior, bounded subscription overhead, deduplication, and low contention. Do not introduce avoidable duplicate remote fetches/clones, subscription churn, blocking work, excessive logging, or heavy per-account allocations/serialization; call out any unavoidable performance tradeoff explicitly. + +## Update requirement + +Whenever an agent changes behavior in `magicblock-chainlink`, or changes another crate in a way that changes Chainlink flows, this document must be updated in the same change. This file is useful only if it reflects the current implementation. 
Update it for changes to: + +- account fetch/clone classification, +- delegation-record resolution or local delegated/confined/undelegating flags, +- subscription ownership, LRU eviction, reconnection, or update ordering, +- program loading, +- ATA/eATA projection, +- post-delegation action dependency handling, +- lifecycle-mode behavior, +- public APIs used by `magicblock-api`, `magicblock-aperture`, `magicblock-accounts`, `magicblock-account-cloner`, or `programs/magicblock`, +- tests or validation commands relevant to this crate, +- performance characteristics of fetch/clone, deduplication, subscription, LRU/eviction, or update-ordering paths. + +## Where it sits in the repository + +Primary source files: + +| Path | Role | +|---|---| +| `magicblock-chainlink/src/lib.rs` | Crate exports. Re-exports Chainlink types and `AccountFetchOrigin`. | +| `src/chainlink/mod.rs` | Public Chainlink facade, replication-mode wrapper, transaction/account ensure entrypoints, removed-account eviction listener. | +| `src/chainlink/fetch_cloner/` | Main fetch/clone pipeline, delegation handling, subscription-update processing, ATA/eATA projection, pending operation deduplication. | +| `src/remote_account_provider/` | RPC/pubsub provider, subscription ownership, LRU capacity, websocket/gRPC clients, program-account resolution. | +| `src/submux/` | Multiplexes multiple pubsub clients, deduplicates/debounces updates, reconnects clients, fans updates into one stream. | +| `src/cloner/mod.rs` | `Cloner` trait implemented by `magicblock-account-cloner`; request types passed from Chainlink to the clone executor. | +| `src/accounts_bank.rs` | Test/mock-oriented `AccountsBank` helpers for this crate. | +| `src/testing/` | Test support behind `dev-context`. | +| `tests/` | Integration-style Chainlink tests for account ensure, delegation, redelegation, ordering, and race recovery. | + +Main consumers: + +- `magicblock-api` constructs the production Chainlink stack during validator startup. 
+- `magicblock-aperture` uses Chainlink for RPC read misses and transaction submission account availability. +- `magicblock-accounts` uses Chainlink/account cloning glue for account-manager flows and scheduled commit integration. +- `magicblock-account-cloner` implements the `Cloner` trait and submits clone/program/evict transactions into the local validator. +- `programs/magicblock` uses `dev-context` Chainlink helpers in tests and validator-only program flows. + +## Main public types and APIs + +### Chainlink facade + +`src/chainlink/mod.rs` defines the main stack: + +- `InnerChainlink`: active Chainlink implementation parameterized by RPC client, pubsub client, accounts bank, and cloner. +- `ReplicationModeAwareChainlink`: wrapper with `Enabled` and `Disabled` modes. +- `ProdInnerChainlink` / `ProdChainlink`: production aliases using `ChainRpcClientImpl`, `SubMuxClient`, `AccountsDb`, and a configurable cloner. + +Important methods: + +- `try_new_from_endpoints(...)`: builds `RemoteAccountProvider`, `FetchCloner`, risk service, and subscription update channel from configured base-layer endpoints. +- `ensure_transaction_accounts(tx)`: ensures all transaction account keys, plus a possible fee-payer ephemeral balance PDA, are present locally. No-op system transfers are skipped. +- `ensure_accounts(pubkeys, mark_empty_if_not_found, fetch_origin)`: fetches/clones accounts but returns only fetch/clone status. +- `fetch_accounts(pubkeys, fetch_origin)`: ensures accounts and then reads them from the local bank. +- `accounts_delegated_on_base_and_er(pubkeys, fetch_origin)`: checks that each account is DLP-owned on base and represented as delegated/DLP-owned locally. +- `undelegation_requested(pubkey)`: called by committor/account flows before an account is undelegated so Chainlink keeps watching for base-layer completion. +- `fetch_count()` / `is_watching()`: mainly observability/testing helpers. 
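To make the `Enabled`/`Disabled` wrapper concrete, here is a minimal sketch of how such a wrapper might dispatch three of the methods listed above. Everything in it is a simplification for illustration: `ModeAwareSketch`, `InnerSketch`, the `[u8; 32]` pubkey stand-in, and the synchronous signatures are assumptions, not the real `ReplicationModeAwareChainlink` API. The disabled-mode results mirror the behavior this guide documents for that mode.

```rust
use std::collections::HashMap;

/// Stand-in for a Solana pubkey (assumption for a std-only example).
pub type Pubkey = [u8; 32];

#[derive(Debug, PartialEq, Eq)]
pub enum ChainlinkError {
    DisabledForNonPrimaryMode,
}

/// Hypothetical active implementation backed by an in-memory "bank".
pub struct InnerSketch {
    bank: HashMap<Pubkey, Vec<u8>>,
}

impl InnerSketch {
    pub fn with_account(pubkey: Pubkey, data: Vec<u8>) -> Self {
        let mut bank = HashMap::new();
        bank.insert(pubkey, data);
        Self { bank }
    }
}

/// Hypothetical wrapper with the two modes named in this guide.
pub enum ModeAwareSketch {
    Enabled(InnerSketch),
    Disabled,
}

impl ModeAwareSketch {
    /// Disabled mode: no-op success. Enabled mode: the remote fetch is elided.
    pub fn ensure_accounts(&self, _pubkeys: &[Pubkey]) -> Result<(), ChainlinkError> {
        Ok(())
    }

    /// Disabled mode: `None` for each requested account.
    pub fn fetch_accounts(&self, pubkeys: &[Pubkey]) -> Vec<Option<Vec<u8>>> {
        match self {
            ModeAwareSketch::Enabled(inner) => {
                pubkeys.iter().map(|k| inner.bank.get(k).cloned()).collect()
            }
            ModeAwareSketch::Disabled => vec![None; pubkeys.len()],
        }
    }

    /// Disabled mode: transaction account ensures are rejected outright.
    pub fn ensure_transaction_accounts(
        &self,
        keys: &[Pubkey],
    ) -> Result<(), ChainlinkError> {
        match self {
            ModeAwareSketch::Enabled(_) => self.ensure_accounts(keys),
            ModeAwareSketch::Disabled => Err(ChainlinkError::DisabledForNonPrimaryMode),
        }
    }
}
```

The asymmetry is the point of the sketch: reads degrade quietly in disabled mode, while transaction preparation fails loudly.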
+ +Disabled replication mode is intentionally conservative: + +- `ensure_accounts` is a no-op success. +- `fetch_accounts` returns `None` for each requested account. +- `ensure_transaction_accounts` errors with `DisabledForNonPrimaryMode`. +- undelegation tracking is ignored. + +### `Cloner` interface + +`src/cloner/mod.rs` defines the boundary between Chainlink and local clone execution: + +- `AccountCloneRequest` carries `pubkey`, resolved `AccountSharedData`, optional `commit_frequency_ms`, post-delegation `DelegationActions`, and optional `delegated_to_other` authority. +- `DelegationActions` wraps post-delegation action instructions from delegation records. +- `Cloner` trait methods: + - `clone_account(request)`, + - `clone_program(LoadedProgram)`, + - `evict_account(pubkey)`. + +Chainlink should build accurate clone requests; the cloner owns how those requests are materialized in the local validator. + +## Runtime flow: transaction account ensure + +`ensure_transaction_accounts` performs the normal transaction-preparation flow: + +1. Skip no-op system transfer transactions (`filters/noop_system_transfer.rs`). +2. Collect all account keys from the sanitized transaction. +3. Derive `ephemeral_balance_pda_from_payer(fee_payer, 0)` and add it if absent locally. +4. Mark all collected pubkeys as `mark_empty_if_not_found`; missing transaction accounts are cloned as empty placeholders when appropriate. +5. Call `ensure_accounts` with `AccountFetchOrigin::SendTransaction(signature)`. +6. `ensure_accounts` promotes accounts in the subscription LRU and calls `FetchCloner::fetch_and_clone_accounts_with_dedup`. + +Pitfalls: + +- This method only ensures availability. It must not loosen execution access rules. +- `mark_empty_if_not_found` is broad for transaction submission by design; changing it can affect how missing fee-payer/escrow/transaction accounts appear to execution. 
+- The fee-payer balance PDA logic must stay aligned with Magic Program ephemeral balance handling. + +## Runtime flow: fetch and clone pipeline + +The central implementation is `FetchCloner::fetch_and_clone_accounts_with_dedup` and its inner `fetch_and_clone_accounts`. + +### Deduplication and bank fast path + +Before fetching remotely: + +1. Blacklisted accounts are filtered out. +2. Existing non-undelegating accounts in `AccountsDb` are treated as ready. +3. Existing undelegating accounts are checked asynchronously by `should_refresh_undelegating_in_bank_account` to see whether base-layer undelegation completed. +4. Remaining pubkeys enter `pending_requests` ownership coordination. + +Only the first caller for a pubkey owns the fetch/clone operation. Later callers become waiters and receive the owner's result. Preserve this behavior for both correctness and performance; regressions here can amplify RPC traffic, clone transactions, and transaction-submission latency. Pending owners have: + +- generation IDs to avoid stale cleanup, +- cancellation hooks, +- a default timeout of `FETCH_CLONE_OPERATION_TIMEOUT` (60 seconds), +- waiter-specific result filtering so each caller sees only the entries for its pubkey. + +There is a second dedup layer for actual clone transactions: `pending_clones` is keyed by `(pubkey, remote_slot)`, so concurrent fetch and subscription paths do not submit duplicate local clone operations for the same account version. + +### Remote fetch + +`RemoteAccountProvider::try_get_multi` subscribes before fetching so subscription updates that arrive during the fetch can win over stale RPC data. It: + +1. Claims entries in `fetching_accounts` for pubkeys not already being fetched. +2. Sets up direct account subscriptions for claimed pubkeys. +3. Starts an RPC fetch with `min_context_slot` equal to the observed chain slot or requested slot. +4. Waits for either RPC results or a subscription update that is at least as new as the fetch start slot. +5. 
Returns results in input order. + +RPC fetches use Base64Zstd encoding, commitment from the RPC client, `min_context_slot`, timeout/retry handling, and metrics for success/found/not-found/failure. + +### Classification + +`pipeline::classify_remote_accounts` divides fetched accounts into: + +- `not_found`: missing on chain, +- `plain`: normal non-executable accounts not owned by DLP, +- `owned_by_deleg`: accounts currently owned by the Delegation Program, +- `programs`: executable accounts, +- `atas`: associated token accounts recognized by supported token-program layouts. + +`partition_not_found` further separates missing accounts into: + +- `clone_as_empty`: requested via `mark_empty_if_not_found`, +- `not_found`: left absent so later code fails naturally if it needs them. + +### Delegated account resolution + +DLP-owned accounts must be resolved with their delegation record before cloning: + +1. Derive `delegation_record_pda_from_delegated_account(account_pubkey)`. +2. Acquire a `DelegationRecord` subscription reason for the record PDA. +3. Fetch account and delegation record with slot matching via `try_get_multi_until_slots_match`. +4. Parse `DelegationRecord` and optional post-delegation actions. +5. Apply local metadata: + - owner is set to `delegation_record.owner`, + - `confined` is set when `authority == Pubkey::default()`, + - `delegated` is set when authority is this validator or confined, except raw eATA PDAs are not marked delegated directly, + - `commit_frequency_ms` is included only for accounts delegated/confined to this validator. +6. If authority belongs to another validator, `delegated_to_other` is set on the clone request. +7. Missing non-internal delegation records are reported in `FetchAndCloneResult::missing_delegation_record`. + +Important caveats: + +- Invalid delegation records are fatal for the fetch/clone operation because local ownership would be ambiguous. 
+- Post-delegation actions are parsed/decrypted only when the record authority is this validator. +- Confined accounts (`authority == Pubkey::default()`) are treated as locally delegated for execution purposes but also marked confined. +- DLP-internal accounts may be cloned without a delegation record if `is_internal_dlp_account_data` recognizes the layout. +- Delegated direct account subscriptions are cleaned up after delegation is discovered; delegated state is locally authoritative until undelegation tracking is requested. + +### Post-delegation actions + +Delegation records may carry encrypted or cleartext post-delegation actions. Chainlink: + +- parses actions from data after `DelegationRecord::size_with_discriminator()`, +- decrypts them with the validator keypair when needed, +- validates signer addresses through `RiskService` when configured, +- collects action dependencies from instruction program IDs and account metas, +- force-refreshes writable dependencies that are absent or not currently delegated, +- errors with `MissingDelegationActionAccounts` if required delegated writable dependencies cannot be resolved. + +Do not execute or ignore these actions blindly. They are part of clone-time invariants for post-delegation behavior. + +### Program account resolution + +Executable accounts are converted into `LoadedProgram` values and passed to `Cloner::clone_program`. + +Supported loader handling lives in `remote_account_provider/program_account.rs`: + +- Loader V1: deprecated; subscription updates for V1 are unexpected. +- Loader V2: single account contains metadata/data. +- Loader V3: program account plus separate program-data account; Chainlink fetches both with matching slots and holds a `ProgramData` subscription reason while resolving. +- Loader V4: single account with loader-v4 state and deployable data handling. + +Program clone restrictions: + +- `allowed_programs` from config, when non-empty, limits program cloning. 
+- native loader accounts should be blacklisted and are not cloned. +- LoaderV3 program-data subscriptions must be released on success and error paths. + +### ATA/eATA projection + +Chainlink has special handling for associated token accounts and ephemeral ATAs: + +- Base ATAs are recognized via `magicblock_core::token_programs::is_ata`. +- For each ATA, Chainlink derives the companion eATA PDA with `try_derive_eata_address_and_bump`. +- It subscribes to both ATA and eATA using `SubscriptionReason::AtaProjection`. +- If the eATA exists, has a delegation record for this validator, and can be projected, Chainlink clones a projected delegated ATA into the local bank. +- Projection preserves the base ATA's owner and data length, which is important for Token-2022 extensions. +- Missing eATAs can be remembered in `known_empty_eatas`, but only after confirmed `NotFound` while an eATA subscription is live. +- Raw eATA PDAs are not marked delegated directly; their state is projected into the corresponding base ATA. + +Pitfalls: + +- Do not rebuild Token-2022 accounts as legacy SPL Token accounts; use the projection helpers that preserve layout. +- Same-slot undelegate/redelegate cases intentionally allow a delegated refresh even when the bank already has the same remote slot. +- Undelegating ATAs may remain in bank while a companion eATA is still delegated to this validator. + +## Runtime flow: subscription updates + +Base-layer subscription updates flow through: + +```text +ChainUpdatesClient / ChainPubsubClientImpl / ChainLaserClientImpl + -> SubMuxClient + -> RemoteAccountProvider::listen_for_account_updates + -> FetchCloner::start_subscription_listener + -> FetchCloner::process_subscription_update + -> Cloner::clone_account / clone_program +``` + +Key behavior: + +- Clock sysvar updates update `chain_slot` and are not forwarded to the fetch cloner. +- Non-clock updates become `ForwardedSubscriptionUpdate` with a `SubscriptionSource` (`Account` or program source). 
+- If a subscription update arrives while an RPC fetch is pending and its slot is at least the fetch start slot, it resolves the pending fetch waiters instead of being forwarded as a separate update. +- Account-subscription updates for pubkeys no longer watched are dropped and can enqueue a removal update if stale local state exists. +- Program-subscription updates are allowed even if the pubkey is not in the direct-account LRU; delegated accounts may be tracked only by owner-program subscriptions. +- Non-advancing updates are ignored unless they represent a same-slot delegated refresh needed for undelegate/redelegate recovery. +- Delegated updates cause direct subscription cleanup; undelegation-completion updates retain/directly ensure subscriptions as appropriate and release `UndelegationTracking` ownership. + +### Greedy discovery + +If a subscription update discovers a DLP-owned account absent from the bank, Chainlink may greedily fetch and clone it if the delegation record says it belongs to this validator (or is confined). This is especially important for delegated eATA discovery and owner-program subscriptions. + +Updates delegated to other validators are ignored after discovery so this validator does not clone state it cannot execute against. + +## RemoteAccountProvider internals + +`RemoteAccountProvider` owns direct remote access and subscription state. + +### Endpoints + +Endpoint setup requires at least one RPC endpoint and at least one usable pubsub endpoint when lifecycle mode needs remote sync. + +Supported pubsub endpoint variants: + +- WebSocket via `ChainPubsubClientImpl`, +- gRPC/Laserstream via `ChainLaserClientImpl`, +- RPC endpoints are used for fetches, not pubsub. + +Startup chooses gRPC clients first when any gRPC endpoint exists because they can backfill subscriptions cheaply. WebSocket clients may be attached later as deferred clients. If gRPC startup fails and WebSocket fallback exists, startup retries with WebSocket. 
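The endpoint selection rule above can be sketched as a small planning function. This is an illustration under assumptions, not the provider's real setup code: `Endpoint`, `Transport`, `PubsubPlan`, `plan_pubsub`, and `fallback` are hypothetical names, and the sketch assumes the lifecycle mode needs remote sync.

```rust
/// Hypothetical endpoint classification (URLs are opaque strings here).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Endpoint {
    Rpc(String),
    WebSocket(String),
    Grpc(String),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transport {
    Grpc,
    WebSocket,
}

/// Which pubsub clients start first and which WebSocket clients wait.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PubsubPlan {
    pub startup: Transport,
    pub startup_urls: Vec<String>,
    pub deferred_ws: Vec<String>,
}

pub fn plan_pubsub(endpoints: &[Endpoint]) -> Result<PubsubPlan, String> {
    let mut has_rpc = false;
    let (mut ws, mut grpc) = (Vec::new(), Vec::new());
    for endpoint in endpoints {
        match endpoint {
            Endpoint::Rpc(_) => has_rpc = true, // used for fetches, not pubsub
            Endpoint::WebSocket(url) => ws.push(url.clone()),
            Endpoint::Grpc(url) => grpc.push(url.clone()),
        }
    }
    if !has_rpc {
        return Err("at least one RPC endpoint is required".to_string());
    }
    if !grpc.is_empty() {
        // gRPC goes first; WebSocket clients are attached later.
        Ok(PubsubPlan { startup: Transport::Grpc, startup_urls: grpc, deferred_ws: ws })
    } else if !ws.is_empty() {
        Ok(PubsubPlan {
            startup: Transport::WebSocket,
            startup_urls: ws,
            deferred_ws: Vec::new(),
        })
    } else {
        Err("at least one pubsub endpoint is required".to_string())
    }
}

impl PubsubPlan {
    /// Plan to retry with if gRPC startup fails and WebSocket endpoints exist.
    pub fn fallback(&self) -> Option<PubsubPlan> {
        if self.startup == Transport::Grpc && !self.deferred_ws.is_empty() {
            Some(PubsubPlan {
                startup: Transport::WebSocket,
                startup_urls: self.deferred_ws.clone(),
                deferred_ws: Vec::new(),
            })
        } else {
            None
        }
    }
}
```

Separating planning from connecting keeps the fallback decision testable without any network access.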
+ +### Chain slot + +`chain_slot` is monotonic and updated from: + +- clock account websocket updates, +- gRPC slot updates. + +Fetches use `min_context_slot` to avoid serving account data older than the freshest observed slot or required companion slot. + +### Subscription ownership reasons + +A pubkey can be held for multiple reasons: + +- `DirectAccount`: normal account monitoring and normal LRU capacity management. +- `DelegationRecord`: temporary/explicit monitoring for delegation record PDAs. +- `ProgramData`: LoaderV3 program-data accounts. +- `UndelegationTracking`: protected monitoring while an account is expected to complete undelegation on base. +- `AtaProjection`: ATA/eATA projection monitoring. + +Ownership is reference-counted per reason. Releasing one reason does not unsubscribe while other reasons remain. + +`ensure_subscription` differs from `acquire_subscription`: it does not increment an already-held reason. This is used by eATA projection to keep an LRU entry warm without unbounded refcount growth. + +### LRU and defensive eviction + +`AccountsLruCache` bounds monitored direct-account subscriptions. On capacity pressure: + +- never-evicted accounts are skipped, +- accounts currently delegated or undelegating in the bank are protected, +- accounts with `UndelegationTracking` ownership are protected, +- if no candidate can be evicted, the new subscription is unsubscribed and `NoEvictableSubscriptionCapacity` is returned. + +When an account is evicted from subscription capacity, the provider sends a removal update. `InnerChainlink::subscribe_account_removals` listens for these and may submit `Cloner::evict_account` to remove stale local state, but only if the bank account is neither delegated nor undelegating. + +Removal handling is serialized with same-pubkey subscription transitions via `evict_unwatched_with_subscription_lock`, preventing an evict transaction from being submitted after a fresh subscription re-watches the same pubkey. 
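The capacity-pressure rules above can be illustrated with a tiny bounded cache. This is a sketch under assumptions and not `AccountsLruCache`: `Entry`, `LruSketch`, and the `u32` ids are hypothetical, the `never_evict` flag is one reading of "never-evicted accounts", and removal updates and locking are left out entirely.

```rust
use std::collections::VecDeque;

/// Hypothetical per-subscription view of the protections named in this guide.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Entry {
    pub id: u32,
    pub never_evict: bool,
    pub delegated_or_undelegating: bool,
    pub undelegation_tracking: bool,
}

impl Entry {
    /// Unprotected entry, useful as a base for struct-update syntax.
    pub fn plain(id: u32) -> Self {
        Entry {
            id,
            never_evict: false,
            delegated_or_undelegating: false,
            undelegation_tracking: false,
        }
    }

    fn is_protected(&self) -> bool {
        self.never_evict || self.delegated_or_undelegating || self.undelegation_tracking
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum SubscribeError {
    NoEvictableSubscriptionCapacity,
}

/// Front of the queue is least recently used, back is most recently used.
pub struct LruSketch {
    capacity: usize,
    entries: VecDeque<Entry>,
}

impl LruSketch {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, entries: VecDeque::new() }
    }

    /// Returns the evicted id, if any, or an error when nothing can be evicted.
    pub fn subscribe(&mut self, entry: Entry) -> Result<Option<u32>, SubscribeError> {
        // Already watched: promote to most recently used.
        if let Some(pos) = self.entries.iter().position(|e| e.id == entry.id) {
            let _ = self.entries.remove(pos);
            self.entries.push_back(entry);
            return Ok(None);
        }
        if self.entries.len() < self.capacity {
            self.entries.push_back(entry);
            return Ok(None);
        }
        // Evict the least recently used entry that is not protected.
        match self.entries.iter().position(|e| !e.is_protected()) {
            Some(pos) => {
                let evicted = self.entries.remove(pos).map(|e| e.id);
                self.entries.push_back(entry);
                Ok(evicted)
            }
            // The new subscription is rejected rather than evicting protected state.
            None => Err(SubscribeError::NoEvictableSubscriptionCapacity),
        }
    }
}
```

The design choice worth noticing is that a full cache of protected entries fails the new subscription instead of silently dropping a delegated or undelegating account.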
+ +### Reconciliation + +If subscription metrics are enabled, a background task periodically runs `subscription_reconciler::reconcile_subscriptions` to compare the LRU with actual pubsub-client subscriptions, update metrics, and notify removal for subscriptions that vanished. + +## SubMuxClient internals + +`SubMuxClient` wraps multiple pubsub clients and implements `ChainPubsubClient`. + +Responsibilities: + +- fan out account subscribe/unsubscribe requests to inner clients, +- fan out program subscriptions, +- fan in updates into one receiver, +- suppress duplicate `(pubkey, slot)` updates across clients within a dedupe window, +- debounce high-frequency account streams by forwarding at most the latest update per interval, +- never debounce the clock sysvar, +- reconnect clients after abort signals and resubscribe all tracked accounts/program subscriptions, +- expose subscription union/intersection and connection metrics. + +Default timing constants: + +- output channel size: `5_000`, +- dedupe window: `2_000ms`, +- debounce interval: `2_000ms`, +- debounce detection window: 5x the selected interval by default. + +Changing SubMux behavior can affect ordering, duplicate clone submissions, and perceived account freshness. Use the ordering and redelegation tests when changing it. + +## Lifecycle mode and configuration + +`ChainlinkConfig` wraps `RemoteAccountProviderConfig` and currently includes `remove_confined_accounts`. + +`RemoteAccountProviderConfig` includes: + +- subscription LRU capacity (`DEFAULT_MAX_MONITORED_ACCOUNTS` by default), +- validator lifecycle mode, +- subscription metrics flag, +- startup program subscriptions (defaults to the Delegation Program), +- resubscription delay (`DEFAULT_RESUBSCRIPTION_DELAY_MS` by default), +- global gRPC config. + +The remote provider is constructed only when `lifecycle_mode().needs_remote_account_provider()` is true. Offline/disabled modes must keep bank-only/no-op behavior intact. 
+ +## Important invariants + +This crate is security-critical: it is the validator's only source of truth about base-layer (Solana) account state, and that truth ultimately governs which funds can move and settle. Keeping local state in sync with the base layer is a security requirement, not just a correctness/performance one (see `.agents/rules/validator-goals.md` and `.agents/specs/validator-specification.md`). Under no circumstances may a change make synchronization weaker, less stable, or more permissive than it is today: + +- Subscriptions (websocket/gRPC), fetching, delegation-record resolution, slot/`min_context_slot`/commitment handling, and clone-freshness checks must stay at least as strong and stable as now. +- The validator must never serve or execute against stale, forged, or out-of-sync state, never mark an account delegated without the authority checks below, and never miss base-layer updates that change delegation/undelegation truth. +- Because subscription/fetch updates are driven by external base-layer events and untrusted submissions, treat the dedup, slot-matching, ordering, LRU-protection, and bounded-capacity logic as security controls against races, stale-overwrite, and resource exhaustion. Do not relax them for performance. + +Preserve these invariants when editing this crate: + +1. **Never clone DLP-owned state as writable delegated state without a valid delegation record**, except explicitly recognized internal DLP accounts. +2. **Delegated local accounts must be presented with their original owner**, not the Delegation Program owner. +3. **Authority matters**: this validator can mark accounts delegated only when the record authority is this validator or the confined/default authority. +4. **Delegated and undelegating local accounts are protected from subscription-capacity eviction and defensive bank eviction.** +5. 
**Subscription update ordering must not overwrite fresher local state with older or duplicate data.** Same-slot delegated refresh is a narrow redelegation recovery exception. +6. **Fetches that need companion accounts must use matching slots or a minimum context slot** so account and delegation/program-data records are coherent. +7. **Pending request and pending clone deduplication must clean up by generation/key** to avoid stale owners unblocking or deleting newer work. +8. **Program-data subscriptions for LoaderV3 must be cleaned up on all paths.** +9. **ATA/eATA projection must preserve base ATA layout and token-program ownership.** +10. **Post-delegation action dependencies must be available before clone-time action handling.** +11. **Disabled/non-primary mode must not perform remote fetches or transaction account ensures.** +12. **This crate must not weaken processor/SVM access validation.** It only prepares local account state. +13. **Fetch/clone and subscription paths must remain performance-conscious.** Preserve deduplication, bounded waiting, LRU protections, low subscription churn, and non-blocking behavior unless a documented correctness requirement forces a tradeoff. + +## Common change areas and what to inspect + +### Account not found, stale account, or wrong owner + +Start with: + +- `InnerChainlink::ensure_accounts`, +- `FetchCloner::fetch_and_clone_accounts_with_dedup`, +- `FetchCloner::fetch_and_clone_accounts`, +- `pipeline::classify_remote_accounts`, +- `pipeline::resolve_delegated_accounts`, +- `delegation::apply_delegation_record_to_account`. + +Check whether the account is blacklisted, already in bank, undelegating, missing a delegation record, delegated to another validator, or projected from eATA. 
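When chasing a wrong-owner or wrong-flag report, it helps to have the metadata rules from "Delegated account resolution" in one place. The sketch below restates them as a pure function. It is not the real `delegation::apply_delegation_record_to_account`: the types, the `[u8; 32]` pubkey stand-in, and the function name `apply_record_sketch` are hypothetical, and tying `commit_frequency_ms` to the authority check (not to the final `delegated` flag) is an assumption.

```rust
/// Stand-in for a Solana pubkey; all zeros plays the role of `Pubkey::default()`.
pub type Pubkey = [u8; 32];
pub const DEFAULT_PUBKEY: Pubkey = [0u8; 32];

/// Hypothetical subset of a delegation record.
pub struct DelegationRecordSketch {
    pub owner: Pubkey,
    pub authority: Pubkey,
    pub commit_frequency_ms: u64,
}

/// Hypothetical local metadata produced for the clone request.
#[derive(Debug, PartialEq, Eq)]
pub struct LocalMeta {
    pub owner: Pubkey,
    pub delegated: bool,
    pub confined: bool,
    pub commit_frequency_ms: Option<u64>,
    pub delegated_to_other: Option<Pubkey>,
}

pub fn apply_record_sketch(
    record: &DelegationRecordSketch,
    validator: &Pubkey,
    is_raw_eata_pda: bool,
) -> LocalMeta {
    // Confined accounts use the default pubkey as authority.
    let confined = record.authority == DEFAULT_PUBKEY;
    // "Ours" means the authority is this validator or the account is confined.
    let ours = record.authority == *validator || confined;
    LocalMeta {
        // The local copy is presented with its original owner, not the DLP.
        owner: record.owner,
        // Raw eATA PDAs are never marked delegated directly.
        delegated: ours && !is_raw_eata_pda,
        confined,
        commit_frequency_ms: if ours { Some(record.commit_frequency_ms) } else { None },
        delegated_to_other: if ours { None } else { Some(record.authority) },
    }
}
```

Comparing the output of a function like this against what the bank actually holds is a quick way to tell a resolution bug from a stale-clone bug.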
+ +### Subscription update bugs + +Start with: + +- `RemoteAccountProvider::listen_for_account_updates`, +- `FetchCloner::process_subscription_update`, +- `RemoteAccountProvider::{acquire_subscription, release_single_subscription, release_subscription_reason_silently_for_delegated_account}`, +- `SubMuxClient` dedupe/debounce/reconnect logic, +- `subscription_reconciler`. + +Pay special attention to `SubscriptionSource::Account` vs program-source updates. + +### LRU/eviction bugs + +Start with: + +- `RemoteAccountProvider::register_subscription`, +- `CapacityEvictionProtection`, +- `InnerChainlink::subscribe_account_removals`, +- `RemoteAccountProvider::evict_unwatched_with_subscription_lock`. + +Do not evict delegated or undelegating local state. + +### Redelegation or undelegation bugs + +Start with: + +- `FetchCloner::should_refresh_undelegating_in_bank_account`, +- `FetchCloner::process_subscription_update`, +- `account_still_undelegating_on_chain.rs`, +- `undelegation_requested`, +- tests `04` through `09`. + +Same-slot cases are intentionally covered by separate tests. + +### Program clone bugs + +Start with: + +- `pipeline::resolve_programs_with_program_data`, +- `program_loader::handle_executable_sub_update`, +- `remote_account_provider/program_account.rs`, +- `allowed_programs` config. + +### ATA/eATA bugs + +Start with: + +- `ata_projection.rs`, +- `delegation::parse_raw_eata_pda`, +- `maybe_greedily_clone_discovered_delegated_account`, +- `process_subscription_update` projected clone path. + +## Tests and validation + +For documentation-only changes to this file, verify the file exists and links/paths are accurate. + +For Rust changes in `magicblock-chainlink`, run at least targeted formatting/checks and the Chainlink crate tests. 
If the change touches fetch/clone, subscription, LRU, or update-ordering hot paths, also include the smallest practical test or measurement that can expose duplicate fetches/clones, increased latency, contention, or subscription churn; if skipped, report the residual performance risk. + +Minimum targeted commands: + +```bash +cargo fmt +cargo nextest run -p magicblock-chainlink +``` + +For broader validation, use the repository baseline: + +```bash +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Relevant integration test command from `.agents/rules/testing-and-validation.md`: + +```bash +cd test-integration +make test-chainlink +``` + +Useful Chainlink test files: + +- `magicblock-chainlink/tests/basics.rs` — basic fetch/read behavior. +- `magicblock-chainlink/tests/01_ensure-accounts.rs` — account ensure behavior. +- `magicblock-chainlink/tests/03_deleg_after_sub.rs` — delegation after subscription. +- `magicblock-chainlink/tests/04_redeleg_other_separate_slots.rs` — redelegated to another validator, separate slots. +- `magicblock-chainlink/tests/05_redeleg_other_same_slot.rs` — redelegated to another validator, same slot. +- `magicblock-chainlink/tests/06_redeleg_us_separate_slots.rs` — redelegated to this validator, separate slots. +- `magicblock-chainlink/tests/07_redeleg_us_same_slot.rs` — redelegated to this validator, same slot. +- `magicblock-chainlink/tests/08_subupdate-ordering.rs` — subscription update ordering. +- `magicblock-chainlink/tests/09_waiter_reconciliation_race.rs` — concurrent waiter/reconciliation race recovery. + +When changing subscription or race behavior, prefer adding deterministic tests that wait for observable state (`pending_request_waiter_count`, bank state, subscription ownership) instead of relying on fixed sleeps. 
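As an illustration of waiting for observable state instead of sleeping for a fixed time, here is a minimal std-only polling helper. It is a sketch, not a helper that exists in this repository: `wait_until` is a hypothetical name, and the real Chainlink tests are likely async and would use their runtime's timers instead of `std::thread::sleep`.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Polls `condition` until it returns true or `timeout` elapses.
/// Returns true if the condition was observed, false on timeout.
pub fn wait_until<F>(mut condition: F, timeout: Duration, poll: Duration) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        // Check first so an already-true condition returns immediately.
        if condition() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(poll);
    }
}
```

In a test, the closure would read the state under observation, for example a waiter count or a bank lookup, so the test proceeds as soon as the state appears and fails with a clear timeout when it never does.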
diff --git a/.agents/context/crates/magicblock-committor-program.md b/.agents/context/crates/magicblock-committor-program.md new file mode 100644 index 000000000..fff164748 --- /dev/null +++ b/.agents/context/crates/magicblock-committor-program.md @@ -0,0 +1,273 @@ +# `magicblock-committor-program` + +## Purpose + +`magicblock-committor-program` is the base-layer helper program and shared Rust API used by the committor service to upload large commit payloads into temporary on-chain buffer accounts. The Delegation Program commit/finalize instructions can then read those buffers when account data or diffs are too large to fit directly in one transaction. + +High-level responsibilities: + +- define the committor program id `ComtrB2KEaWgXsW1dhr1xYL4Ht4Bjj3gXnnL6KMdABq` and Solana entrypoint; +- derive deterministic validator-authority-scoped `chunks` and `buffer` PDAs for `(authority, delegated account pubkey, commit_id)`; +- initialize temporary buffer/chunk-tracker accounts, grow buffers over multiple instructions, write account-data chunks, and close/refund temporary accounts; +- provide client-side instruction builders used by `magicblock-committor-service`; +- provide shared `Changeset`, `ChangedAccount`, `CommitableAccount`, `Chunks`, and `ChangesetChunks` types used to split account changes into retryable chunks. + +This crate is on the base-layer settlement path. Its on-chain processor is compute- and transaction-size-sensitive, and its exported wire formats/PDA seeds are compatibility-sensitive. Changes can affect commit delivery, retry/recovery behavior, and whether large state commits fit in Solana transactions. + +## Update requirement + +Update this guide in the same change whenever behavior or contracts in `magicblock-committor-program` change. 
In particular, update it for changes to: + +- `CommittorInstruction` variants, account order, Borsh layout, instruction-size constants, or program id; +- PDA seed strings, seed order, bump handling, authority scoping, or `commit_id` encoding; +- signer requirements, account-allocation checks, close/refund behavior, or buffer/chunks ownership assumptions; +- max allocation, max instruction count, max instruction data size, write chunk sizing, or realloc chunking behavior; +- `Changeset`, `ChangedAccount`, `CommitableAccount`, `Chunks`, `ChangesetChunks`, or bundle/undelegation metadata semantics; +- instruction-builder APIs consumed by `magicblock-committor-service`; +- committor-service buffer preparation, retry, cleanup, or recovery flows that rely on this crate. + +Because this crate defines on-chain wire/API contracts, also update downstream docs if another crate changes how committor buffers are prepared, consumed, retried, or cleaned up. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-committor-program/Cargo.toml` | Package metadata, Solana/Borsh dependencies, `cdylib`/`lib` crate types, and `no-entrypoint`, `custom-heap`, `custom-panic` features. | +| `magicblock-committor-program/src/lib.rs` | Public crate surface, program id declaration, entrypoint registration, processor export, and shared state/type re-exports. | +| `magicblock-committor-program/src/instruction.rs` | `CommittorInstruction` wire enum plus instruction-size constants used for transaction packing. | +| `magicblock-committor-program/src/processor.rs` | On-chain instruction processor for `Init`, `ReallocBuffer`, `Write`, and `Close`. Enforces signer and PDA checks. | +| `magicblock-committor-program/src/pdas.rs` | PDA seed helpers and `verified_seeds_and_pda!` macro for chunks/buffer accounts. | +| `magicblock-committor-program/src/instruction_builder/` | Client-side builders for init, realloc, write, and close instructions. 
Main consumer is `magicblock-committor-service`. | +| `magicblock-committor-program/src/instruction_chunks.rs` | Splits init/realloc instructions into transaction-sized groups. | +| `magicblock-committor-program/src/state/changeset.rs` | Shared account-change model, metadata extraction, bundling, undelegation markers, and `CommitableAccount` chunk iteration. | +| `magicblock-committor-program/src/state/chunks.rs` | Borsh-serialized bitfield that tracks which buffer chunks have landed. | +| `magicblock-committor-program/src/state/changeset_chunks.rs` | Iterators that turn account data/diffs into `ChangesetChunk { offset, data_chunk }` values and retry only missing chunks. | +| `magicblock-committor-program/src/utils/` | Low-level program assertions and close/refund helper. | +| `magicblock-committor-program/bin/magicblock_committor_program.so` | Checked-in program binary artifact used by validator/test setup. Treat as a deployment artifact, not source documentation. | +| `magicblock-committor-service/src/tasks/commit_task.rs` and `commit_finalize_task.rs` | Build buffer preparation stages and derive buffer PDAs consumed by Delegation Program commit/finalize-from-buffer instructions. | +| `magicblock-committor-service/src/transaction_preparator/delivery_preparator.rs` | Sends init/realloc/write instructions, retries missing chunks, handles cleanup, and persists buffer-preparation status. | +| `test-integration/test-committor-service/` | Integration coverage for committor delivery preparation, transactions, intent execution, and PDA/buffer behavior. 
| + +Main consumers: + +- `magicblock-committor-service`, which re-exports `ChangedAccount`, `Changeset`, and `ChangesetMeta`, builds buffer tasks, and sends committor-program instructions; +- `magicblock-accounts`, which uses `ChangesetMeta` in scheduled-commit error paths; +- integration tests under `test-integration/test-committor-service` and config tests that allow/deny the committor program id; +- the workspace root, which depends on this crate with `features = ["no-entrypoint"]` for client-side/library use. + +## Public API shape / Main public types and APIs + +### Crate exports + +`src/lib.rs` exports: + +- modules: `consts`, `error`, `instruction`, `instruction_chunks`, `pdas`, and `instruction_builder`; +- `process` for the Solana entrypoint/processor; +- shared state types: `ChangedAccount`, `ChangedAccountMeta`, `ChangedBundle`, `Changeset`, `ChangesetBundles`, `ChangesetMeta`, `CommitableAccount`, `ChangesetChunk`, `ChangesetChunks`, and `Chunks`. + +### Instructions and builders + +`CommittorInstruction` currently has four Borsh-serialized variants: + +1. `Init` creates the chunks PDA and buffer PDA. The buffer is rent-funded for the final desired size but initially allocated only up to `MAX_ACCOUNT_ALLOC_PER_INSTRUCTION_SIZE` bytes. +2. `ReallocBuffer` grows the buffer account by at most `MAX_ACCOUNT_ALLOC_PER_INSTRUCTION_SIZE` bytes per invocation until it reaches the requested size. +3. `Write` copies one `data_chunk` into the buffer at `offset` and marks the corresponding offset as delivered in `Chunks`. +4. `Close` zero-resizes and refunds both temporary accounts to the validator authority. 
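The split between `Init` and `ReallocBuffer` implies a simple count of how many realloc invocations a buffer of a given final size needs. The sketch below works through that arithmetic under stated assumptions: the function names are hypothetical, and the constant value `10_240` is the one this guide documents for `MAX_ACCOUNT_ALLOC_PER_INSTRUCTION_SIZE`. It does not reproduce the crate's actual builder logic.

```rust
/// Value documented in this guide for the per-instruction allocation cap.
const MAX_ACCOUNT_ALLOC_PER_INSTRUCTION_SIZE: usize = 10_240;

/// Data length the buffer has right after `Init` (hypothetical helper):
/// the final size, capped at one allocation step.
fn initial_alloc_len(final_size: usize) -> usize {
    final_size.min(MAX_ACCOUNT_ALLOC_PER_INSTRUCTION_SIZE)
}

/// Number of `ReallocBuffer` invocations needed to grow from the initial
/// allocation to `final_size`, assuming each one grows by the full cap.
fn realloc_invocations_needed(final_size: usize) -> usize {
    let remaining = final_size - initial_alloc_len(final_size);
    // Ceiling division without relying on newer std helpers.
    (remaining + MAX_ACCOUNT_ALLOC_PER_INSTRUCTION_SIZE - 1)
        / MAX_ACCOUNT_ALLOC_PER_INSTRUCTION_SIZE
}
```

For a 25,000-byte buffer this gives an initial allocation of 10,240 bytes followed by two realloc invocations (10,240 bytes, then the final 4,520 bytes).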
+ +The instruction builders in `src/instruction_builder/` derive PDAs and return ordinary `Instruction` values: + +- `create_init_ix(CreateInitIxArgs) -> (Instruction, chunks_pda, buffer_pda)`; +- `create_realloc_buffer_ixs(CreateReallocBufferIxArgs) -> Vec<Instruction>`; +- `create_write_ix(CreateWriteIxArgs) -> Instruction`; +- `create_close_ix(CreateCloseIxArgs) -> Instruction`. + +The account order documented in `CommittorInstruction` and emitted by these builders is part of the contract with `processor.rs`; keep them synchronized. + +### PDAs + +PDA derivation is deterministic and scoped by validator authority: + +```text +chunks seeds = [committor_program_id, b"comittor_chunks", authority, account_pubkey, commit_id_le_bytes] +buffer seeds = [committor_program_id, b"comittor_buffer", authority, account_pubkey, commit_id_le_bytes] +``` + +The seed strings intentionally use `comittor_*` spelling as implemented. Do not rename or correct them without a migration plan for all buffer producers/consumers. + +### Changeset and chunk types + +- `Changeset` stores committed accounts, the ER slot that requested the commit, and accounts to undelegate after commit. +- `ChangedAccount::Full` stores lamports, full data, original delegated-account owner, and `bundle_id`. `ChangedAccount::Diff` is only a placeholder; current methods `unreachable!` on it. +- `ChangesetMeta` extracts cheap per-account metadata for diagnostics without cloning full account data. +- `Changeset::into_committables(chunk_size)` creates one `CommitableAccount` per account and preserves undelegation and bundle metadata. +- `Chunks` is a compact Borsh bitfield of delivered chunks. `Chunks::new` asserts the serialized tracker fits within one allocation instruction. +- `ChangesetChunks` iterates all chunks or only missing chunks for retry. 
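The idea of a bitfield that records delivered chunks and yields only the missing ones for retry can be sketched as follows. `ChunkTracker` and its methods are hypothetical stand-ins written for this illustration; the real `Chunks` type is Borsh-serialized and its fields and method names may differ.

```rust
/// Simplified, hypothetical stand-in for a delivered-chunk bitfield.
struct ChunkTracker {
    bits: Vec<u8>,
    chunk_count: usize,
    chunk_size: usize,
}

impl ChunkTracker {
    /// Creates a tracker with every chunk marked as not yet delivered.
    fn new(chunk_count: usize, chunk_size: usize) -> Self {
        Self {
            bits: vec![0u8; (chunk_count + 7) / 8],
            chunk_count,
            chunk_size,
        }
    }

    /// Marks the chunk that starts at `offset` as delivered.
    /// Rejects offsets that are unaligned to `chunk_size` or out of range.
    fn mark_offset_delivered(&mut self, offset: usize) -> Result<(), String> {
        if self.chunk_size == 0 || offset % self.chunk_size != 0 {
            return Err(format!("offset {} is not chunk-aligned", offset));
        }
        let idx = offset / self.chunk_size;
        if idx >= self.chunk_count {
            return Err(format!("chunk index {} is out of range", idx));
        }
        self.bits[idx / 8] |= 1 << (idx % 8);
        Ok(())
    }

    /// Returns true if chunk `idx` has been marked as delivered.
    fn is_delivered(&self, idx: usize) -> bool {
        idx < self.chunk_count && (self.bits[idx / 8] & (1 << (idx % 8))) != 0
    }

    /// Indices of chunks that still need to be written, in ascending order.
    fn missing(&self) -> Vec<usize> {
        (0..self.chunk_count)
            .filter(|&idx| !self.is_delivered(idx))
            .collect()
    }
}
```

A retry loop in this style would rebuild the tracker from on-chain state and resend only the indices returned by `missing()`, which is the behavior the list above attributes to `ChangesetChunks`.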
+ +## Runtime flows + +### Buffer delivery flow + +```text +CommitTask/CommitFinalizeTask chooses *InBuffer delivery + -> PreparationTask creates Chunks from buffer_data and MAX_WRITE_CHUNK_SIZE + -> DeliveryPreparator::initialize_buffer_account + -> create_init_ix + -> create_realloc_buffer_ixs + -> chunk_realloc_ixs groups init/reallocs into transaction-sized batches + -> send init, then send realloc batches in parallel + -> DeliveryPreparator::write_buffer_with_retries + -> PreparationTask::write_instructions + -> create_write_ix for every ChangesetChunk + -> retry missing chunks using on-chain Chunks state + -> Delegation Program commit/finalize-from-buffer reads buffer PDA + -> create_close_ix cleanup closes chunks/buffer and refunds authority +``` + +The committor service persists intermediate buffer statuses around this flow. If initialization finds already-initialized accounts, it attempts cleanup, invalidates the cached blockhash, restores preparation stage, and retries once. + +### On-chain `Init` + +1. Requires exactly `[authority signer, chunks writable, buffer writable, system_program]`. +2. Verifies program id and derives both PDAs from the supplied authority, target account pubkey, commit id, and bumps. +3. Requires both PDAs to be unallocated. +4. Creates the chunks account at the requested chunks-account size. +5. Creates the buffer account with rent for the full requested size but data length capped at `MAX_ACCOUNT_ALLOC_PER_INSTRUCTION_SIZE`. +6. Serializes an empty `Chunks::new(chunk_count, chunk_size)` tracker into the chunks account. + +### On-chain `ReallocBuffer` + +1. Requires `[authority signer, buffer writable]` unless the buffer is already at least the requested size, in which case it returns `Ok(())` before signer/PDA checks. +2. Verifies the buffer PDA for the supplied authority/account/commit id/bump. +3. Grows data length by at most `MAX_ACCOUNT_ALLOC_PER_INSTRUCTION_SIZE`, capped at the final buffer size. +4. 
Does not require extra rent or system program because `Init` pre-funded rent for the full desired buffer size. + +### On-chain `Write` + +1. Requires `[authority signer, chunks writable, buffer writable]`. +2. Verifies both PDAs. +3. Checks `offset + data_chunk.len()` for overflow and bounds against current buffer length. +4. Copies bytes into the buffer. +5. Deserializes `Chunks`, marks `offset / chunk_size` delivered, and writes the tracker back. + +Offsets must be multiples of `chunk_size`; this is enforced through `Chunks::set_offset_delivered`. + +### On-chain `Close` + +1. Requires `[authority signer, chunks writable, buffer writable]`. +2. Verifies both PDAs. +3. Calls `close_and_refund_authority` on chunks and buffer accounts. +4. `close_and_refund_authority` resizes account data to zero before transferring lamports to mitigate refund/remaining-instruction account reuse attacks. + +## Important internals and caveats + +### Transaction-size constants + +`consts.rs` and `instruction.rs` carry hand-tuned constants used by `magicblock-committor-service/src/consts.rs` to calculate `MAX_WRITE_CHUNK_SIZE`. `MAX_INSTRUCTION_DATA_SIZE` is based on empirical Solana transaction-size limits, while `IX_*_SIZE` constants approximate serialized instruction overhead. Changing these values can make buffer writes exceed transaction size or underutilize transaction capacity; validate with committor transaction-preparator tests and integration tests. + +### Allocation and chunk tracker sizing + +`MAX_ACCOUNT_ALLOC_PER_INSTRUCTION_SIZE` is `10_240`. `Init` creates the initial buffer at this size or less; larger buffers require realloc instructions. `Chunks::new` panics if the chunk tracker itself would exceed one allocation instruction. This is treated as a programming/configuration bug, not a recoverable on-chain validation error. + +### Retry model + +The retry model assumes each successful `Write` updates the `Chunks` account atomically with the buffer write. 
The service can fetch chunks state, call `CommitableAccount::set_chunks`, and retry only `iter_missing()`. Do not decouple buffer writes from chunk tracking unless recovery and retry semantics are redesigned. + +### Placeholder diffs in `ChangedAccount` + +`ChangedAccount::Diff` exists as a placeholder but is not supported. Several accessors and conversions call `unreachable!` for diffs. Diff delivery currently happens in `magicblock-committor-service` by computing byte diffs for Delegation Program instructions, not by storing `ChangedAccount::Diff` in this crate. + +### Authority and PDA scope + +All temporary accounts are validator-authority scoped. The authority must sign all mutating instructions. This prevents one authority from modifying or closing another authority's buffers for the same delegated account and commit id. Do not relax signer checks or remove authority from seeds. + +## Important invariants + +1. The validator authority signer requirement for `Init`, `ReallocBuffer`, `Write`, and `Close` must be preserved. +2. Chunks and buffer PDAs must remain derived from program id, fixed seed string, authority pubkey, target account pubkey, and little-endian `commit_id`. +3. Instruction-builder account metas must match the account order expected by `processor.rs`. +4. `Init` must create only unallocated temporary accounts and must own them with the committor program id. +5. Buffer accounts must be rent-funded for the full requested size before no-rent reallocs are attempted. +6. Each `Write` must bounds-check offset and length before mutating buffer data. +7. Chunk delivery state must be updated only for offsets aligned to `chunk_size`. +8. Cleanup must verify PDAs before refunding lamports and must zero-resize accounts before lamport transfer. +9. Transaction-size and allocation constants must remain aligned with committor-service packing and compute-budget assumptions. +10. 
`ChangedAccount::Full.owner` must continue to mean the original owner of the delegated account on the base layer. +11. `bundle_id` and `accounts_to_undelegate` metadata must survive conversion into committable/bundled forms. +12. Do not introduce unbounded per-instruction work or excessive logging in the on-chain processor; every instruction is paid compute on the base layer. + +## Common change areas and what to inspect + +### Changing instruction layout or accounts + +Start with `src/instruction.rs`, `src/processor.rs`, and `src/instruction_builder/`. Then inspect `magicblock-committor-service/src/tasks/mod.rs`, `transaction_preparator/delivery_preparator.rs`, and integration tests under `test-integration/test-committor-service`. Verify Borsh layouts, account order, signer flags, PDA derivation, and transaction-size constants. + +### Changing buffer size, chunk size, or packing limits + +Inspect `src/consts.rs`, `src/instruction.rs`, `src/instruction_chunks.rs`, `src/state/chunks.rs`, `src/state/changeset_chunks.rs`, and `magicblock-committor-service/src/consts.rs`. Validate that init/realloc batches still fit under transaction-size and instruction-trace limits, and that write instructions leave room for compute-budget instructions. + +### Changing PDA derivation or authority behavior + +Inspect `src/pdas.rs`, the `verified_seeds_and_pda!` macro, all instruction builders, `CommitTask::commit_state_from_buffer_ix`, and `CommitFinalizeTask::commit_finalize_from_buffer_ix`. PDA changes are wire-contract changes and require a migration/compatibility plan. + +### Changing commit/change-set metadata + +Inspect `src/state/changeset.rs`, `magicblock-committor-service/src/tasks/task_builder.rs`, `magicblock-accounts/src/errors.rs`, and any recovery/persistence code that stores commit metadata. Preserve owner, slot, undelegation, and bundle semantics. 
+ +### Changing cleanup behavior + +Inspect `src/utils/account.rs`, `processor::process_close`, and `DeliveryPreparator::cleanup`. Preserve PDA verification, signer requirement, and zero-resize-before-refund behavior. + +## Tests and validation + +For documentation-only changes to this guide, at minimum verify file paths and cross-references are correct: + +```bash +git diff --check +``` + +For source changes in `magicblock-committor-program`, run targeted checks first: + +```bash +cargo fmt +cargo test -p magicblock-committor-program +cargo clippy -p magicblock-committor-program --all-targets -- -D warnings +``` + +For changes that affect buffer preparation or public APIs consumed by the service, also run service tests: + +```bash +cargo test -p magicblock-committor-service +cargo clippy -p magicblock-committor-service --all-targets -- -D warnings +``` + +For behavior changes in actual base-layer commit delivery, prefer integration coverage through the committor suite: + +```bash +cd test-integration +make test-committor +``` + +or a narrower committor target such as `make test-committor-preparators`, `make test-committor-commitfinalize`, or `make test-committor-intent-executor` when appropriate. + +Before handing off any code change, also run the broader baseline from `.agents/rules/testing-and-validation.md` when practical: + +```bash +cargo fmt +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance/security validation expectations: + +- Report any change that increases buffer transaction count, chunk count, compute units, RPC sends, or retry rounds. +- Confirm signer/PDA checks are not relaxed. +- Confirm base-layer transaction size still fits after instruction-format changes. +- Confirm retry and cleanup paths still recover from partially initialized buffers. + +## Related docs + +- `.agents/specs/validator-specification.md` — commit, undelegation, committor service, and buffer-task behavior. 
+- `.agents/context/architecture.md` — base-layer settlement layer and service boundaries. +- `.agents/context/crate-map.md` — crate ownership and dependency map. +- `magicblock-committor-service/README.md` — current high-level intent-execution architecture notes for the main runtime consumer. +- `test-integration/test-committor-service/` — integration tests for committor delivery and intent execution. diff --git a/.agents/context/crates/magicblock-committor-service.md b/.agents/context/crates/magicblock-committor-service.md new file mode 100644 index 000000000..887098e83 --- /dev/null +++ b/.agents/context/crates/magicblock-committor-service.md @@ -0,0 +1,360 @@ +# `magicblock-committor-service` + +## Purpose + +`magicblock-committor-service` is the validator-side settlement service that turns Magic Program scheduled intent bundles into Solana base-layer transactions. It executes commits, commit-and-undelegates, commit-finalizes, undelegates, and Magic Actions by building atomic tasks, packing them into transactions, preparing delivery resources such as buffers and address lookup tables, sending transactions through `magicblock-rpc-client`, and persisting status for operator queries and restart recovery. 
+ +High-level responsibilities: + +- expose `CommittorService` / `BaseIntentCommittor` as the async service boundary used by `magicblock-api`, `magicblock-accounts`, and account cloning; +- schedule intent bundles without executing mutually conflicting committed accounts in parallel; +- fetch Delegation Program metadata, commit nonces, rent reimbursements, and base accounts needed for task construction; +- choose commit delivery strategies: state args, diff args, state buffers, diff buffers, and optional ALTs; +- prepare and clean up committor-program buffer accounts and TableMania lookup-table reservations; +- execute single-stage or two-stage base-layer transaction flows and schedule action callbacks; +- persist commit rows, strategies, signatures, and pending intents in SQLite for status APIs and recovery. + +This crate is on the base-layer settlement hot path. Changes can affect fund safety, undelegation liveness, commit ordering/nonces, restart recovery, RPC load, transaction count, and latency. Security and correctness take priority over throughput: do not weaken signer usage, base-layer freshness/min-context-slot handling, commit nonce sequencing, scheduler conflict blocking, or buffer/ALT cleanup safety. + +## Update requirement + +Update this guide in the same change whenever behavior or contracts in `magicblock-committor-service` change. 
In particular, update it for changes to: + +- `CommittorService`, `BaseIntentCommittor`, `CommittorServiceExt`, channel messages, startup/shutdown, or cancellation semantics; +- `ChainConfig`, `ComputeBudgetConfig`, action timeout behavior, RPC/websocket construction, or configured commitment assumptions; +- intent scheduling, conflict detection, executor concurrency, backlog capacity, result broadcasting, or metrics; +- `TaskInfoFetcher` commit nonce caching, `min_context_slot` behavior, retry policy, or cache reset rules; +- task building, commit/finalize/undelegate/action task semantics, commit nonce persistence, diff/state thresholds, or rent reimbursement fetches; +- strategy selection, transaction-size limits, buffer/ALT fallback, single-stage versus two-stage choice, or action-stripping/retry logic; +- delivery preparation, committor-program buffer initialization/write/cleanup, TableMania reservations, or RPC send/retry/error mapping; +- SQLite schema, persisted statuses/strategies/signatures, pending-intent recovery windows, or recovery reconstruction; +- integration test commands, performance characteristics, or operator-facing diagnostics. + +Because this crate coordinates settlement across several crates, also update related crate guides when changing contracts with `magicblock-committor-program`, `magicblock-rpc-client`, `magicblock-table-mania`, `magicblock-accounts`, or `magicblock-magic-program-api`. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-committor-service/Cargo.toml` | Package metadata and dependencies on committor program, core, Magic Program, metrics, RPC client, TableMania, SQLite, and Solana crates. | +| `magicblock-committor-service/README.md` | High-level architecture notes for intent execution, schedulers, task builders, strategist, and delivery preparation. | +| `src/lib.rs` | Public crate surface. 
Re-exports `ComputeBudgetConfig`, `DEFAULT_ACTIONS_TIMEOUT`, committor-program changeset types, `BaseIntentCommittor`, and `CommittorService`. | +| `src/service.rs` | Actor-style service handle, `CommittorMessage`, `CommittorService::try_start`, `BaseIntentCommittor` trait, oneshot request API, and cancellation token. | +| `src/service_ext.rs` | `CommittorServiceExt` wrapper that waits for broadcasted execution results by intent id. Used in tests and synchronous-style callers. | +| `src/config.rs` and `src/compute_budget.rs` | Chain/RPC configuration, default action timeout, and per-task compute-budget helpers. | +| `src/committor_processor.rs` | Constructs `MagicblockRpcClient`, `TableMania`, `IntentPersisterImpl`, `IntentExecutionManager`, and `CacheTaskInfoFetcher`; exposes persistence queries and recovery helpers. | +| `src/intent_execution_manager.rs` | Backpressure boundary between service and execution engine; enqueues bundles and falls back to an internal DB when the channel is full. | +| `src/intent_execution_manager/intent_execution_engine.rs` | Main scheduler loop, executor semaphore (`MAX_EXECUTORS = 50`), result broadcasting, metrics, and successful-cleanup spawning. | +| `src/intent_execution_manager/intent_scheduler.rs` | Pubkey conflict scheduler for committed accounts. Maintains FIFO blocking queues and prevents duplicate/concurrent conflicting intents. | +| `src/intent_executor/` | Intent execution state machine, transaction client, factory, single-stage/two-stage executors, timeout helpers, and commit nonce fetcher/cache. | +| `src/tasks/` | Atomic base-layer task types and task builders/strategist for commit, commit-finalize, undelegate, actions, buffers, ALTs, and compute budgets. | +| `src/transaction_preparator/` | Converts a `TransactionStrategy` into a `VersionedMessage` after preparing buffers and lookup tables; owns buffer/ALT cleanup. 
| +| `src/persist/` | SQLite persistence for commit rows, bundle signatures, status/strategy enums, and conversion utilities. | +| `src/stubs/` | Feature-gated dev/test stub committor behind `dev-context-only-utils`. | +| `magicblock-api/src/magic_validator.rs` | Starts the service at validator initialization with `committor_service.sqlite`, validator keypair, RPC URL, websocket URL, compute-unit price, and action callback scheduler. | +| `magicblock-accounts/src/scheduled_commits_processor.rs` | Main runtime producer/consumer: takes scheduled intent bundles from the transaction scheduler, schedules them with the committor, consumes result broadcasts, and performs pending-intent recovery after ledger replay. | +| `magicblock-account-cloner/src/account_cloner.rs` | Uses `BaseIntentCommittor` for lookup-table reservation around account cloning and diagnostic mapping of committor errors. | +| `magicblock-api/src/magic_sys_adapter.rs` | Fetches current commit nonces through the committor service for Magic syscalls. | +| `test-integration/test-committor-service/` | Integration coverage for delivery preparators, transaction preparators, intent executor flows, and local commit execution. | + +Main upstream dependencies: + +- `magicblock-program` / `magicblock-magic-program-api` for `ScheduledIntentBundle`, intent bundle structure, validator authority, and Magic Action types; +- `magicblock-committor-program` for buffer/chunks instruction builders and changeset types; +- `magicblock-delegation-program-api` for delegation metadata PDA derivation and commit nonce/rent reimbursement reads; +- `magicblock-rpc-client` for base-layer sends, confirmations, account reads, transaction diagnostics, slot/blockhash caching, and `min_context_slot` RPC calls; +- `magicblock-table-mania` for ALT reservation, finalized table fetch, release, and GC; +- `magicblock-core` for committed-account types and `ActionsCallbackScheduler`. 
+ +## Public API shape / Main public types and APIs + +### Crate exports + +`src/lib.rs` exports: + +- `pub mod config`, `error`, `intent_execution_manager`, `intent_executor`, `persist`, `service_ext`, `tasks`, `transaction_preparator`, and `transactions`; +- `ComputeBudgetConfig` and `DEFAULT_ACTIONS_TIMEOUT`; +- `ChangedAccount`, `Changeset`, and `ChangesetMeta` re-exported from `magicblock-committor-program`; +- `BaseIntentCommittor` and `CommittorService`. + +Most modules are public for tests and consumers, but the intended runtime boundary is the service trait plus status/query helpers. Avoid adding new cross-crate call paths into internals unless the ownership boundary is intentional and documented. + +### `CommittorService` and `BaseIntentCommittor` + +`CommittorService::try_start(authority, persist_file, chain_config, chain_slot, actions_callback_executor)` creates an mpsc-backed actor with capacity `1_000` and spawns it on Tokio. The actor owns a `CommittorProcessor`, and each public method sends a `CommittorMessage` plus a oneshot response channel. `try_send` logs if the actor channel is full or closed; it does not block the caller. + +`BaseIntentCommittor` is the shared trait used by runtime consumers and stubs. Important methods: + +- `reserve_pubkeys_for_committee(committee, owner)` reserves committee-specific pubkeys in TableMania before cloning/use; +- `schedule_intent_bundles(Vec<ScheduledIntentBundle>)` schedules fresh intents and persists rows first; +- `subscribe_for_results()` returns a broadcast receiver of `BroadcastedIntentExecutionResult` values; +- `get_commit_statuses(message_id)` and `get_commit_signatures(commit_id, pubkey)` query SQLite status/signature data; +- `get_transaction(signature)` fetches base-layer transaction diagnostics; +- `fetch_current_commit_nonces(pubkeys, min_context_slot)` returns current base-layer nonces without incrementing the cache; +- `stop()` cancels the actor; `stopped()` resolves when cancellation is requested. 
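The actor boundary described above (a bounded channel, a per-request response channel, and a non-blocking `try_send` that only logs on failure) can be sketched with the standard library alone. This is an illustration under loud assumptions: the real service uses Tokio channels, and `Message`, `Handle`, `get_status`, and `start_actor` are hypothetical names, not items from this crate.

```rust
use std::sync::mpsc::{self, Receiver, SyncSender, TrySendError};
use std::thread;

/// Hypothetical request type; each request carries its own reply channel.
enum Message {
    GetStatus { id: u64, respond_to: mpsc::Sender<String> },
}

/// Hypothetical handle that callers keep; the actor owns the receiving end.
struct Handle {
    sender: SyncSender<Message>,
}

impl Handle {
    /// Sends a request without blocking and returns the reply receiver.
    /// If the send fails, the reply sender is dropped with the message,
    /// so `recv()` on the returned receiver reports an error.
    fn get_status(&self, id: u64) -> Receiver<String> {
        let (respond_to, reply) = mpsc::channel();
        match self.sender.try_send(Message::GetStatus { id, respond_to }) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => eprintln!("actor channel is full"),
            Err(TrySendError::Disconnected(_)) => eprintln!("actor channel is closed"),
        }
        reply
    }
}

/// Spawns the actor loop on a thread and returns a handle to it.
fn start_actor(capacity: usize) -> Handle {
    let (sender, receiver) = mpsc::sync_channel::<Message>(capacity);
    thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for message in receiver {
            match message {
                Message::GetStatus { id, respond_to } => {
                    // The caller may have gone away; ignore a failed reply.
                    let _ = respond_to.send(format!("status for {}", id));
                }
            }
        }
    });
    Handle { sender }
}
```

The design point this illustrates is that a full or closed request channel surfaces to the caller as a closed reply channel rather than as a blocked call, which keeps producers from stalling behind the actor.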
+ +`CommittorService` also exposes inherent helpers not on the trait: common ALT reservation/release, `get_pending_intent_bundles`, `schedule_recovered_intent_bundles`, `get_lookup_tables`, and a blocking-channel `fetch_current_commit_nonces_sync` used where an async oneshot is inconvenient. + +### `CommittorServiceExt` + +`CommittorServiceExt` wraps any `BaseIntentCommittor`, subscribes to broadcast results once, and dispatches results to one pending oneshot per intent id. `schedule_intent_bundles_waiting` registers all intent ids before scheduling and rejects duplicate ids with `RepeatingMessageError`. This prevents a fast execution result from being broadcast before the waiter exists. + +Do not use duplicate intent ids with the extension: `pending_messages` is keyed only by `ScheduledIntentBundle::id`. + +### Config and compute budgets + +`ChainConfig` stores RPC URI, optional websocket URI, Solana commitment, `ComputeBudgetConfig`, and `actions_timeout` (`DEFAULT_ACTIONS_TIMEOUT = 60s`). The validator currently constructs it in `magicblock-api` with confirmed base-layer commitment and the configured commit compute-unit price. + +`ComputeBudgetConfig::new(compute_unit_price)` controls budgets for args processing, buffer close, buffer process-and-close, finalize, undelegate, buffer init/realloc, and buffer writes. Buffer init/realloc/write budgets currently hard-code `compute_unit_price: 1_000_000` rather than the caller-provided price; treat that as current behavior when validating fee/priority-fee changes. + +### Persistence API + +`IntentPersister` is the internal persistence trait. 
`IntentPersisterImpl` wraps `CommittsDb` behind `Arc>` and creates two tables: + +- `commit_status`, keyed by `(message_id, commit_id, pubkey)`, storing account owner, slot, ER blockhash, undelegate flag, lamports/data, commit type, status, strategy, signatures, timestamps, and retry count; +- `bundle_signature`, keyed by bundle/message id, storing commit-stage and finalize-stage signatures. + +`IntentPersisterImpl::create_commit_rows` creates one row per committed/undelegated account. Empty data is persisted as `CommitType::EmptyAccount` with `data = None`; non-empty data is persisted as `CommitType::DataAccount`. + +## Runtime flows + +### Startup and service wiring + +```text +magicblock-api::MagicValidator::init_committor_service + -> CommittorService::try_start + -> CommittorActor::try_new + -> CommittorProcessor::try_new + -> MagicblockRpcClient from RPC/websocket/chain_slot + -> TableMania with default GC + -> IntentPersisterImpl at storage/committor_service.sqlite + -> CacheTaskInfoFetcher + -> IntentExecutionManager + IntentExecutionEngine + -> actor run loop spawned on Tokio +``` + +The service is initialized before the account manager starts pending-intent recovery. Pending recovery must run after ledger replay so local accounts reflect delegated state before recovered intents are checked. 
+ +### Fresh scheduled intent flow + +```text +Magic Program schedules intent in ER + -> transaction scheduler exposes ScheduledIntentBundle(s) + -> magicblock-accounts::ScheduledCommitsProcessor::process + -> CommittorService::schedule_intent_bundles + -> CommittorProcessor::schedule_intent_bundle + -> IntentPersisterImpl::start_base_intents + -> IntentExecutionManager::schedule + -> IntentExecutionEngine::main_loop + -> IntentScheduler blocks conflicts by committed pubkeys + -> IntentExecutorImpl executes selected intent + -> broadcast result + -> ScheduledCommitsProcessor consumes result and updates local/metadata state +``` + +`CommittorProcessor::schedule_intent_bundle` logs persistence failures but still tries to execute. This is intentionally loud because losing persistence weakens restart recovery; do not hide or downgrade that error path. + +### Recovery flow for pending intents + +1. `magicblock-accounts` calls `get_pending_intent_bundles()` after replay. +2. `CommittorProcessor::pending_intent_bundles` loads SQLite rows with `CommitStatus::Pending` and `created_at` inside the 14-day recovery window. +3. It fetches the current base-layer slot and reconstructs `ScheduledIntentBundle`s grouped by `message_id`. +4. Rows for a message must agree on ER slot and ER blockhash; otherwise that message is skipped. +5. Data-account rows without stored data are skipped because they cannot reconstruct a `CommittedAccount` safely. +6. `magicblock-accounts` filters recovered bundles against current delegated state, then calls `schedule_recovered_intent_bundles` so rows are not inserted again. + +Preserve the no-repersist path for recovered intents. Re-inserting rows can violate primary keys or duplicate status history. + +### Scheduling and concurrency flow + +`IntentExecutionManager::schedule` first checks whether its internal DB/backlog is empty. If it is not empty, new bundles are stored there to preserve order. 
If the channel is full, the current and remaining bundles are also stored in the DB. The current `DummyDB` is in-memory; durable recovery is handled by SQLite commit rows, not this backlog. + +`IntentExecutionEngine` repeatedly: + +1. handles completed executor join handles first, which lets blocked intents become eligible before accepting new ones; +2. receives a new bundle from the channel or DB if scheduler capacity allows; +3. asks `IntentScheduler` whether it can run now; +4. waits for one of `MAX_EXECUTORS = 50` semaphore permits; +5. creates an executor and spawns intent execution; +6. broadcasts the result, completes the scheduler entry, and cleans buffers/ALTs only after successful execution. + +The scheduler blocks on the union of `ScheduledIntentBundle::get_all_committed_pubkeys()`, including commit and commit-and-undelegate accounts in the same bundle. Standalone base actions with no committed pubkeys do not block on account keys. + +### Intent execution and task strategy flow + +```text +IntentExecutorImpl::execute + -> mark persisted rows Pending + -> TaskBuilderImpl::commit_tasks + finalize_tasks + -> fetch next commit nonces and diffable base accounts using max(remote_slot) + -> persist commit_id for each committed account + -> create commit, commit-finalize, undelegate, finalize, and action tasks + -> TaskStrategist::build_execution_strategy + -> try single transaction when total task count <= 22 and it fits + -> optimize large tasks to buffers when needed + -> use ALTs when buffers alone do not fit + -> choose two-stage when single-stage is too large or ALT latency would be worse + -> TransactionPreparator prepares buffers/ALTs and assembles VersionedMessage + -> SingleStageExecutor or TwoStageExecutor sends base-layer transactions + -> persist final status/signatures and schedule callbacks + -> reset nonce cache on errors or undelegation +``` + +For committed accounts with `data.len() > COMMIT_STATE_SIZE_THRESHOLD` (`256`), the task builder 
fetches the base account and may use diff-in-args delivery. If the base-account fetch fails, it falls back to full state args and logs a warning. This can increase transaction size and trigger buffer/ALT strategy later. + +### Delivery preparation and cleanup flow + +`TransactionPreparatorImpl::prepare_for_strategy` first compiles against dummy lookup tables to fail early if the message cannot fit. It then calls `DeliveryPreparator::prepare_for_delivery`: + +1. prepare each task concurrently, recording task-preparation metrics; +2. for buffer tasks, persist `BufferAndChunkPartiallyInitialized`, initialize/realloc buffer accounts, persist `BufferAndChunkInitialized`, write missing chunks, then persist `BufferAndChunkFullyInitialized`; +3. if a buffer account is already initialized, cleanup is attempted, the cached blockhash is invalidated, and preparation is retried once; +4. reserve ALTs in TableMania and wait for finalized lookup table accounts; +5. assemble the final versioned message with real lookup table accounts. + +Cleanup closes prepared buffers and releases TableMania pubkeys. `IntentExecutionEngine` intentionally runs cleanup only after successful execution because failed intent cleanup can race with a retried or concurrent intent using the same buffer PDA set. + +## Important internals and caveats + +### Commit nonce cache + +`CacheTaskInfoFetcher` caches commit nonces in a 1,000-entry LRU. It uses per-pubkey async mutexes acquired in sorted order to avoid A→B / B→A deadlocks, and a `retiring` map to keep evicted locks alive while in-flight requests still hold them. `fetch_next_commit_nonces` increments cached values and reserves the next nonce; `fetch_current_commit_nonces` reads/stores the current value without incrementing. After a failed commit or undelegation, `IntentExecutorImpl` resets the cache for affected pubkeys. + +Do not remove sorted lock acquisition or the retiring map without replacing the deadlock/race prevention. 
Commit nonce races can cause base-layer commit failures and stuck undelegations. + +### `min_context_slot` and freshness + +Task-info RPC reads use the maximum `remote_slot` across committed accounts as `min_context_slot` when fetching delegation metadata and diffable base accounts. This helps avoid building commits against base-layer state older than the ER account snapshot. The fetcher retries `Minimum context slot not reached` up to five times with short sleeps. Preserve this freshness check unless the broader account-sync/settlement contract changes. + +### Persistence is both status API and recovery state + +SQLite rows are used by operator/status APIs and by restart recovery. Updating status mapping is not a cosmetic change: it affects which intents are recoverable, which accounts look failed/stuck, and which signatures are returned. Keep persisted enum string conversions compatible with existing rows. + +### Buffers, ALTs, and transaction fit + +`TaskStrategist` first tries args, then buffer optimization, then ALTs. It chooses two-stage execution in cases where a single-stage ALT transaction would be slower than two no-ALT transactions. Altering thresholds such as `MAX_UNITED_TASKS_LEN = 22`, `COMMIT_STATE_SIZE_THRESHOLD = 256`, transaction-size constants, or buffer chunking changes latency, RPC transaction counts, and fit behavior. + +### Actions and callbacks + +Standalone actions are currently built through commit-task paths even when there are no committed accounts. Base actions with callbacks are extracted and scheduled through the `ActionsCallbackScheduler`. `actions_timeout` applies across action-related execution work. If action execution fails with recoverable CPI/limit errors, the executor can strip actions or move from single-stage to two-stage depending on the path; preserve error visibility through `patched_errors` and callback reports. + +### Service channel backpressure + +The public service API uses nonblocking `try_send`. 
If the service channel is full or closed, the failure is only logged: callers still receive a oneshot receiver, but it may never be answered. This is current behavior; changing it to fail synchronously would be a public contract change that needs consumer updates. + + ## Important invariants + +1. Do not execute two intent bundles concurrently when their committed-pubkey sets overlap. +2. Preserve FIFO blocking semantics across indirectly blocked intents; later intents must not bypass an earlier blocked intent sharing any key. +3. Do not schedule duplicate intent ids in the same scheduler/execution-extension context. +4. Commit nonces must be fetched with base-layer freshness (`min_context_slot`) and incremented atomically per account. +5. A failed or undelegating intent must reset cached nonces for affected accounts. +6. Fresh intent scheduling must persist rows before execution when possible; recovered scheduling must not reinsert rows. +7. Pending-intent recovery must reconstruct only rows inside the recovery window and skip inconsistent or incomplete persisted groups. +8. Buffer accounts and ALTs must be prepared before transaction assembly uses them, and released/closed only when safe. +9. Failed intent cleanup must not race with retries using the same buffer PDAs; current cleanup is success-only for that reason. +10. Transaction-size and compute-budget choices must keep produced transactions under Solana wire limits. +11. Base-layer sends must preserve explicit processed/committed confirmation semantics from `magicblock-rpc-client`. +12. Signer/authority requirements for validator-signed commits, committor-program buffers, ALTs, callbacks, and base-layer instructions must not be relaxed. +13. Persistence status/signature updates must continue to expose enough information for diagnostics, retries, and recovery. +14. Avoid adding blocking I/O or unbounded work to service actor, scheduler, executor, task-preparation, or RPC hot paths. 
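Invariants 1 to 3 can be illustrated with a small std-only model. This is a hedged sketch, not the crate's `IntentScheduler`: `ConflictScheduler`, the `Key` and `IntentId` aliases, and the per-key queue layout are all hypothetical stand-ins chosen for illustration.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Hypothetical stand-ins: a real scheduler would key on pubkeys.
type Key = u8;
type IntentId = u64;

/// Toy conflict scheduler: one FIFO queue of intent ids per key.
#[derive(Default)]
struct ConflictScheduler {
    queues: HashMap<Key, VecDeque<IntentId>>,
    keys_of: HashMap<IntentId, Vec<Key>>,
    running: HashSet<IntentId>,
}

impl ConflictScheduler {
    /// Registers an intent. Returns Ok(true) if it may run now,
    /// Ok(false) if it is blocked, Err on a duplicate id.
    fn schedule(&mut self, id: IntentId, keys: &[Key]) -> Result<bool, &'static str> {
        if self.keys_of.contains_key(&id) {
            return Err("duplicate intent id");
        }
        // Deduplicate so an intent enters each key queue at most once.
        let mut keys = keys.to_vec();
        keys.sort_unstable();
        keys.dedup();
        for key in &keys {
            self.queues.entry(*key).or_default().push_back(id);
        }
        self.keys_of.insert(id, keys);
        let ready = self.is_at_front(id);
        if ready {
            self.running.insert(id);
        }
        Ok(ready)
    }

    /// An intent is eligible only when it heads every queue it sits in.
    /// An intent with no keys is trivially eligible.
    fn is_at_front(&self, id: IntentId) -> bool {
        self.keys_of[&id]
            .iter()
            .all(|key| self.queues[key].front() == Some(&id))
    }

    /// Marks a running intent as done and returns the ids that became
    /// runnable, in ascending id order.
    fn complete(&mut self, id: IntentId) -> Vec<IntentId> {
        if !self.running.remove(&id) {
            return Vec::new();
        }
        for key in self.keys_of.remove(&id).unwrap_or_default() {
            let emptied = match self.queues.get_mut(&key) {
                Some(queue) => {
                    queue.pop_front();
                    queue.is_empty()
                }
                None => false,
            };
            if emptied {
                self.queues.remove(&key);
            }
        }
        let mut ready: Vec<IntentId> = self
            .keys_of
            .keys()
            .copied()
            .filter(|other| !self.running.contains(other) && self.is_at_front(*other))
            .collect();
        ready.sort_unstable();
        self.running.extend(ready.iter().copied());
        ready
    }
}
```

In this model, an intent on key `3` that arrives after a blocked intent on keys `2` and `3` stays blocked even though nothing is running on key `3`, which is the indirect FIFO behavior invariant 2 asks for.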
+ +## Common change areas and what to inspect + +### Changing service API, startup, or shutdown + +Start with `src/service.rs`, `src/committor_processor.rs`, and `magicblock-api/src/magic_validator.rs`. Then inspect `magicblock-accounts/src/scheduled_commits_processor.rs`, `magicblock-account-cloner/src/account_cloner.rs`, and `magicblock-api/src/magic_sys_adapter.rs`. Check oneshot behavior, channel capacity/backpressure, cancellation, and whether consumers need errors instead of logged-only failures. + +### Changing scheduling or concurrency + +Start with `src/intent_execution_manager/intent_scheduler.rs`, `intent_execution_engine.rs`, and tests in those files. Verify conflict sets include all committed accounts in mixed bundles, scheduler capacity remains bounded, semaphore permits are always released, and completion cannot corrupt blocked queues. + +### Changing commit nonce or metadata fetching + +Start with `src/intent_executor/task_info_fetcher.rs` and `src/tasks/task_builder.rs`. Inspect `magicblock-api/src/magic_sys_adapter.rs` for current nonce queries. Preserve sorted lock acquisition, cache reset behavior, `min_context_slot`, Delegation Program PDA derivation, and retry/error classification. + +### Changing task construction or strategy selection + +Start with `src/tasks/task_builder.rs`, `src/tasks/task_strategist.rs`, `src/tasks/commit_task.rs`, `src/tasks/commit_finalize_task.rs`, and `src/tasks/utils.rs`. Then inspect `magicblock-committor-program` instruction builders, `magicblock-delegation-program-api` expectations, and integration tests under `test-integration/test-committor-service`. Validate commit ids, allow-undelegation flags, action ordering, diff-vs-state delivery, buffer conversion, ALT keys, and strategy persistence. 
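When reviewing a strategy change, it can help to write the decision order down explicitly. The following is only a sketch of the ordering described in this guide (single stage first, then two plain transactions in preference to waiting on ALTs). `FitReport`, `Strategy`, and `choose_strategy` are hypothetical names, and the boolean inputs stand in for the real size checks done after buffer optimization; the actual `TaskStrategist` logic may differ in detail.

```rust
/// Mirrors the documented threshold; the surrounding types are hypothetical.
const MAX_UNITED_TASKS_LEN: usize = 22;

#[derive(Debug, PartialEq, Eq)]
enum Strategy {
    SingleStage { uses_alts: bool },
    TwoStage { uses_alts: bool },
}

/// Assumed summary of which layouts fit after buffer optimization.
struct FitReport {
    task_count: usize,
    /// One transaction fits using args/buffers only.
    single_fits_plain: bool,
    /// One transaction fits once lookup tables are used.
    single_fits_with_alts: bool,
    /// Commit and finalize stages each fit without lookup tables.
    two_stage_fits_plain: bool,
}

fn choose_strategy(report: &FitReport) -> Strategy {
    let single_allowed = report.task_count <= MAX_UNITED_TASKS_LEN;
    if single_allowed && report.single_fits_plain {
        Strategy::SingleStage { uses_alts: false }
    } else if report.two_stage_fits_plain {
        // Two plain transactions avoid waiting for finalized ALTs.
        Strategy::TwoStage { uses_alts: false }
    } else if single_allowed && report.single_fits_with_alts {
        Strategy::SingleStage { uses_alts: true }
    } else {
        Strategy::TwoStage { uses_alts: true }
    }
}
```

A table of cases like this makes it easier to state, in a change description, which inputs move from one strategy to another when a threshold changes.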
+ +### Changing delivery preparation or cleanup + +Start with `src/transaction_preparator/mod.rs` and `delivery_preparator.rs`, then inspect `.agents/context/crates/magicblock-committor-program.md`, `.agents/context/crates/magicblock-table-mania.md`, and `.agents/context/crates/magicblock-rpc-client.md`. Check buffer init/realloc/write chunking, retry handling for already-initialized buffers, cached blockhash invalidation, ALT finalized waits, cleanup-on-success only, and release of TableMania refs. + +### Changing persistence or recovery + +Start with `src/persist/db.rs`, `src/persist/commit_persister.rs`, `src/persist/types/`, and `src/committor_processor.rs` recovery helpers. Then inspect `magicblock-accounts/src/scheduled_commits_processor.rs`. Preserve schema compatibility, enum string values, `u64`/`i64` conversions, row grouping by `message_id`, 14-day recovery window, and no-repersist recovery scheduling. + +### Changing metrics or observability + +Start with metric calls in `intent_execution_engine.rs`, `delivery_preparator.rs`, and `intent_execution_client.rs`, plus definitions in `magicblock-metrics/src/metrics/mod.rs`. Keep labels low-cardinality (`intent_kind`, `error_kind`, task labels) and update `.agents/context/crates/magicblock-metrics.md` if names/labels change. 
+ +## Tests and validation + +For documentation-only changes, verify paths and links: + +```bash +git diff --check +rg "magicblock-committor-service.md|magicblock-committor-service" AGENTS.md .agents/context/crate-map.md .agents/context/crates/magicblock-committor-service.md +``` + +For code changes in this crate, run targeted unit tests first: + +```bash +cargo fmt +cargo nextest run -p magicblock-committor-service +cargo clippy -p magicblock-committor-service --all-targets -- -D warnings +``` + +For changes involving commit delivery, buffers, ALTs, action execution, or base-layer confirmations, run relevant integration suites when practical: + +```bash +cd test-integration +make test-committor +``` + +Useful narrower committor targets include: + +```bash +cd test-integration +make test-committor-preparators +make test-committor-ix-order +make test-committor-ix-multi +make test-committor-commitfinalize +make test-committor-intent-executor +make test-committor-intent-executor-recovery +``` + +For changes touching TableMania or RPC-client behavior, also run the corresponding integration suite when practical: + +```bash +cd test-integration +make test-table-mania +``` + +Before handing off Rust behavior changes, run the broader baseline from `.agents/rules/testing-and-validation.md` when time allows: + +```bash +cargo fmt +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance-sensitive changes should report expected effects on executor parallelism, RPC calls, transaction count, ALT waits, buffer writes/chunks, SQLite writes, and cleanup latency. If no measurement is practical, state the residual risk explicitly. + +Security/correctness validation should explicitly confirm that signer/authority requirements, base-layer `min_context_slot` freshness, commit nonce sequencing, scheduler conflict blocking, and recovery durability were preserved. 
+ +## Related docs + +- `.agents/context/overview.md` for validator runtime context. +- `.agents/rules/validator-goals.md` for security, settlement, recovery, and performance goals. +- `.agents/specs/validator-specification.md` for commit, undelegation, Magic Actions, committor pipeline, and recovery behavior. +- `.agents/context/architecture.md` for the base-layer settlement layer and service boundaries. +- `.agents/context/crate-map.md` for crate ownership and consumers. +- `.agents/rules/testing-and-validation.md` for workspace and integration validation expectations. +- `.agents/context/crates/magicblock-committor-program.md` for buffer/chunks on-chain helper contracts. +- `.agents/context/crates/magicblock-rpc-client.md` for base-layer send/confirm and RPC helper behavior. +- `.agents/context/crates/magicblock-table-mania.md` for ALT lifecycle and finalized-read semantics. +- `.agents/context/crates/magicblock-accounts.md` for scheduled commit processing and pending-intent recovery call sites. +- `magicblock-committor-service/README.md` for current high-level implementation notes. +- `test-integration/test-committor-service/` for integration coverage of delivery and intent execution. diff --git a/.agents/context/crates/magicblock-config.md b/.agents/context/crates/magicblock-config.md new file mode 100644 index 000000000..8c3fd72be --- /dev/null +++ b/.agents/context/crates/magicblock-config.md @@ -0,0 +1,357 @@ +# `magicblock-config` + +## Purpose + +`magicblock-config` owns the validator's typed configuration model and layered configuration loading. It is used by the validator entrypoint and most runtime services to turn defaults, TOML, environment variables, and CLI flags into one `ValidatorParams` value. 
+ +High-level responsibilities: + +- define strongly typed config sections for validator identity, lifecycle, remotes, RPC/pubsub, metrics, gRPC streams, chainlink, accounts DB, ledger, committor, task scheduler, registration metadata, and preloaded programs; +- parse config from CLI, environment variables, and TOML with deterministic precedence; +- provide small helper types for keypairs/pubkeys, bind addresses, storage paths, and remote endpoints; +- enforce post-load remote defaults so the runtime always has at least one HTTP endpoint and one websocket endpoint; +- keep operator-facing config names in `kebab-case` for TOML and mapped `MBV_...` environment variables. + +This crate sits on the startup/configuration path rather than a per-transaction hot path. However, its values control performance-sensitive services such as RPC event processors, account monitoring, gRPC subscription topology, AccountsDb sizing, ledger block timing, metrics collection, and task scheduling. Configuration changes can therefore alter runtime behavior, persistence, operator compatibility, and performance indirectly. + +## Update requirement + +Update this document in the same change whenever `magicblock-config` behavior or contracts change. This file is useful only if it reflects the current implementation. 
+ +Update it for changes to: + +- `ValidatorParams`, config section structs, defaults, serde names, or `deny_unknown_fields` behavior; +- precedence, source merging, CLI overlay semantics, environment variable mapping, or TOML file handling; +- public helper types such as `Remote`, `BindAddress`, `StorageDirectory`, `SerdeKeypair`, or `SerdePubkey`; +- operator-facing keys in `config.example.toml` or `magicblock-config/README.md`; +- remote endpoint defaults or HTTP-to-websocket derivation behavior; +- config fields consumed by startup/shutdown, persistence, replication, Chainlink, Aperture, metrics, committor, or task scheduler flows; +- validation commands, integration test coverage, or known pitfalls for config changes. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-config/src/lib.rs` | Crate root, `ValidatorParams`, layered loading in `ValidatorParams::try_new`, remote default helpers, and URL iterators. | +| `magicblock-config/src/config/mod.rs` | Config module declarations and public re-exports for section types. | +| `magicblock-config/src/config/cli.rs` | Clap-facing CLI overlay structs. Only some settings are exposed on CLI. | +| `magicblock-config/src/config/accounts.rs` | AccountsDb storage sizing, block size, snapshot, defragmentation, and reset settings. | +| `magicblock-config/src/config/aperture.rs` | RPC/websocket listen address, event processor count, and Geyser plugin config paths. | +| `magicblock-config/src/config/chain.rs` | Committor compute budget, chain registration metadata, Chainlink cloning/monitoring options, allowed program filters, and Range risk config. | +| `magicblock-config/src/config/grpc.rs` | Global gRPC stream topology limits for remote account providers. | +| `magicblock-config/src/config/ledger.rs` | Ledger block timing, superblock size, reset, keypair verification, size limit, and replay authority override. 
| +| `magicblock-config/src/config/lifecycle.rs` | `LifecycleMode` and remote-provider requirement helper. | +| `magicblock-config/src/config/metrics.rs` | Metrics bind address and collection frequency. | +| `magicblock-config/src/config/program.rs` | Startup-loadable program ID/path entries. | +| `magicblock-config/src/config/scheduler.rs` | Task scheduler reset, minimum interval, and failed-task retention/cleanup settings. | +| `magicblock-config/src/config/validator.rs` | Base fee, validator identity, replication mode, redacted replication config, and replication authority override helper. | +| `magicblock-config/src/types/` | Serde/parser helper wrappers for keypairs, pubkeys, bind addresses, remotes, and storage directory. | +| `magicblock-config/src/consts.rs` | Default values and remote aliases used by config structs and parsers. | +| `magicblock-config/src/tests.rs` | Unit tests for defaults, precedence, overlays, env mapping, example config coverage, parser helpers, and redaction. | +| `magicblock-config/README.md` | Human-facing crate overview and usage notes. | +| `config.example.toml` | Operator-facing example and coverage target for available options. | +| `test-integration/test-config/` | Integration coverage for config-to-CLI/validator behavior, including allowed program config. | + +Main consumers: + +- `magicblock-validator/src/main.rs` parses `ValidatorParams`, prints/logs resolved endpoints, and chooses TUI/headless startup. +- `magicblock-api/src/magic_validator.rs` consumes almost every section while constructing ledger, AccountsDb, replication, committor, Chainlink, Aperture, metrics, scheduler, program loading, registration, and recovery flows. +- `magicblock-aperture` consumes `ApertureConfig` for bind addresses, event processors, and Geyser plugin paths. +- `magicblock-chainlink` consumes `ChainLinkConfig`, `GrpcConfig`, `LifecycleMode`, and remote endpoint-derived settings for account sync and stream management. 
+- `magicblock-accounts-db` consumes `AccountsDbConfig` for persistent account storage sizing, reset, block size, and snapshots. +- `magicblock-task-scheduler` consumes `TaskSchedulerConfig` for SQLite reset, crank timing, and cleanup retention. +- `magicblock-aml` consumes Range risk-related values through `RiskConfig`. +- Integration tests under `test-integration/` parse or mirror config for validator startup scenarios. + +## Public API shape / Main public types and APIs + +The primary public entrypoint is: + +```rust +let params = magicblock_config::ValidatorParams::try_new(std::env::args_os())?; +``` + +`ValidatorParams` is `Clone + Deserialize + Serialize + Debug + Default` with `#[serde(default, rename_all = "kebab-case", deny_unknown_fields)]`. Its top-level fields are: + +- `config: Option` — optional TOML path parsed positionally by CLI; +- `remotes: Vec` — base-chain HTTP/websocket/gRPC endpoints; +- `lifecycle: LifecycleMode` — `ephemeral` by default; +- `storage: StorageDirectory` — root path for persistent validator data; +- `no_tui: bool` — headless mode flag for the validator binary; +- section configs: `metrics`, `grpc`, `validator`, `aperture`, `commit`, `accountsdb`, `ledger`, `chainlink`, `chain_operation`, `task_scheduler`, and `programs`. + +Important methods on `ValidatorParams`: + +- `try_new(args)` parses CLI with Clap, merges TOML, environment, and CLI providers, extracts the typed struct, then calls `ensure_http()` and `ensure_websocket()`. +- `rpc_url()` returns the first HTTP remote, falling back to `DEFAULT_REMOTE`. +- `websocket_urls()` iterates all websocket remotes. +- `grpc_urls()` iterates all gRPC remotes. +- `Display` serializes the resolved config as pretty TOML when possible, otherwise falls back to debug output. 
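The post-processing done by `ensure_http()` and `ensure_websocket()` can be pictured with a std-only model. This is a sketch under assumptions: the `Remote` enum below is a simplified stand-in that stores plain strings, the `DEFAULT_REMOTE` value shown is assumed to be the public devnet endpoint, and the hand-rolled `to_websocket` ignores IPv6 literals and userinfo, which a real URL parser would handle.

```rust
/// Simplified stand-in for the crate's `Remote` type.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Remote {
    Http(String),
    Websocket(String),
    Grpc(String),
}

// Assumed value; check `consts.rs` for the real default.
const DEFAULT_REMOTE: &str = "https://api.devnet.solana.com";

/// Derives a websocket URL from an HTTP URL: http -> ws, https -> wss,
/// and port + 1 when an explicit port is present.
fn to_websocket(http_url: &str) -> Option<String> {
    let (scheme, rest) = http_url.split_once("://")?;
    let ws_scheme = match scheme {
        "http" => "ws",
        "https" => "wss",
        _ => return None,
    };
    let (authority, path) = match rest.find('/') {
        Some(index) => rest.split_at(index),
        None => (rest, ""),
    };
    let authority = match authority.rsplit_once(':') {
        Some((host, port)) => match port.parse::<u16>() {
            Ok(port) => format!("{host}:{}", port.checked_add(1)?),
            Err(_) => authority.to_string(),
        },
        None => authority.to_string(),
    };
    Some(format!("{ws_scheme}://{authority}{path}"))
}

/// Appends the default HTTP remote when none is configured.
fn ensure_http(remotes: &mut Vec<Remote>) {
    if !remotes.iter().any(|r| matches!(r, Remote::Http(_))) {
        remotes.push(Remote::Http(DEFAULT_REMOTE.to_string()));
    }
}

/// Appends a websocket derived from the first HTTP remote when no
/// websocket remote is configured.
fn ensure_websocket(remotes: &mut Vec<Remote>) {
    if remotes.iter().any(|r| matches!(r, Remote::Websocket(_))) {
        return;
    }
    let first_http = remotes.iter().find_map(|r| match r {
        Remote::Http(url) => Some(url.as_str()),
        _ => None,
    });
    if let Some(ws) = first_http.and_then(to_websocket) {
        remotes.push(Remote::Websocket(ws));
    }
}
```

Running both helpers on a list that holds only a gRPC remote yields the default HTTP remote plus a websocket derived from it; the gRPC entry is never used as the derivation source.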
+ +Public config sections are re-exported from `magicblock_config::config`, including `AccountsDbConfig`, `ApertureConfig`, `ChainLinkConfig`, `ChainOperationConfig`, `CommittorConfig`, `GrpcConfig`, `LedgerConfig`, `LifecycleMode`, `LoadableProgram`, `RiskConfig`, `TaskSchedulerConfig`, and `ValidatorConfig`. + +Important helper types: + +- `Remote` accepts `http(s)`, `ws(s)`, and `grpc(s)` schemes plus aliases `mainnet`, `devnet`, `testnet`, `localhost`, and `dev`. `grpc`/`grpcs` parse as `Grpc` while rewriting the stored URL scheme to `http`/`https` for URL compatibility. +- `Remote::to_websocket()` derives websocket endpoints only from HTTP remotes, preserving Solana's convention of websocket port = HTTP port + 1 when an explicit port exists. +- `BindAddress` accepts socket addresses and plain port numbers. Plain ports bind to `127.0.0.1:`. `http()` and `websocket()` convert unspecified IPs to localhost for client connection URLs and derive websocket port by saturating `port + 1`. +- `SerdeKeypair` serializes keypairs as base58 strings but displays/debugs only the pubkey. Its clone uses `insecure_clone()` because runtime consumers need signer material. +- `ReplicationConfig` redacts `secret` in `Debug` and `Serialize`; do not replace this with derived implementations. + +## Runtime flows + +### Layered configuration load + +```text +CLI args + -> CliParams overlay + -> optional TOML file + -> MBV_ environment variables + -> serialized CLI overlay + -> ValidatorParams extraction + -> ensure_http + ensure_websocket + -> runtime startup +``` + +1. `ValidatorParams::try_new` parses `CliParams` with `CliParams::parse_from(args)`. +2. It starts from an empty `Figment`; serde defaults fill omitted fields during extraction. +3. If `cli.config` is present, `Toml::file(path)` is merged first. +4. 
Environment variables are merged with prefix `MBV_`, split on `__`, and normalized by replacing `_` with `-` through `Uncased`, so `MBV_LEDGER__BLOCK_TIME` maps to `ledger.block-time`. +5. The serialized CLI overlay is merged last and has highest precedence. +6. The extracted `ValidatorParams` is post-processed to guarantee at least one HTTP remote and one websocket remote. + +Precedence is therefore: CLI > environment > TOML > serde/default values. Preserve the optional CLI overlay pattern: a CLI sub-struct must not reset unmentioned TOML/env fields in the same config section. + +### Remote endpoint flow + +1. `Remote::from_str` recognizes aliases and schemes. +2. Non-standard `grpc` and `grpcs` prefixes are rewritten to `http` and `https` before URL parsing but remain classified as `Remote::Grpc`. +3. `ensure_http()` appends the default devnet HTTP URL when no HTTP remote exists. +4. `ensure_websocket()` appends a websocket derived from the first HTTP remote when no websocket remote exists. +5. `magicblock-api` builds Chainlink `Endpoints` from all remotes, committor chain RPC from `rpc_url()`, and committor websocket from the first `websocket_urls()` result. + +Caveat: gRPC remotes do not derive websocket remotes. A config with only gRPC remotes will get a default devnet HTTP and derived websocket remote unless an HTTP/websocket endpoint is also configured. + +### Startup consumption flow + +```text +magicblock-validator::main + -> ValidatorParams::try_new + -> MagicValidator::try_from_config + -> ledger/accounts/replication/chainlink/aperture/metrics/scheduler/committor startup +``` + +`magicblock-api/src/magic_validator.rs` is the main consumer. 
It uses: + +- `validator.keypair`, `validator.basefee`, and `validator.replication_mode` for genesis, identity checks, base fees, replication, and mode transitions; +- `ledger` and `storage` for ledger opening, replay, reset, keypair verification, block timing, superblocks, and truncation size; +- `accountsdb` and `storage` for account database open/reset/snapshot behavior; +- `remotes`, `chainlink`, `grpc`, and `lifecycle` for remote account providers, Chainlink configuration, gRPC stream limits, and disabled-chainlink replica mode; +- `aperture` for RPC/websocket server startup and event processing; +- `metrics` for metrics service bind address and collection cadence; +- `commit` for base-layer compute unit price in committor transactions; +- `task_scheduler` for scheduled task service initialization; +- `programs` for startup program loading; +- `chain_operation` only when registration/fee-claim behavior is enabled and lifecycle permits it. + +### CLI and file-only fields + +Only `CliParams` fields are exposed to CLI. Current CLI coverage includes config path, remotes, lifecycle, storage, no-TUI, metrics bind address, validator base fee/keypair, Aperture listen/event processors, and ledger reset. Many sections are intentionally file/env-only, such as `accountsdb`, `chainlink`, `commit`, `grpc`, `task-scheduler`, `programs`, and most ledger fields. When adding CLI flags, use `Option` plus `skip_serializing_if` so the overlay remains non-destructive. + +## Important internals and caveats + +### Serde names and environment variables + +Most structs use `rename_all = "kebab-case"`; environment variables are upper snake case with `__` for nesting. For example: + +- `ledger.block-time` -> `MBV_LEDGER__BLOCK_TIME` +- `task-scheduler.failed-task-cleanup-interval` -> `MBV_TASK_SCHEDULER__FAILED_TASK_CLEANUP_INTERVAL` +- `chainlink.risk.request-timeout` -> `MBV_CHAINLINK__RISK__REQUEST_TIMEOUT` + +Do not introduce aliases casually. 
Operator docs, `config.example.toml`, integration tooling, and deployment configs depend on stable keys. + +### Strict unknown-field behavior + +Most config structs use `deny_unknown_fields`. This catches typos and stale config but makes renames/removals breaking for operators. If a field is renamed, include migration notes and update tests/example config in the same change. + +### Secrets and debug output + +`SerdeKeypair` debug/display output is pubkey-only, while serialized config still contains the base58 keypair. `ReplicationConfig` debug/serialize redacts the `secret`. The validator logs `format!("{config:#?}")` on startup, so any new secret-bearing type must implement redaction before being included in debug output. + +### Defaults are operational behavior + +Defaults are not merely test conveniences. They set devnet remotes, local storage, development validator keypair, base fee, commit compute unit price, account DB size, ledger timing, metrics cadence, Chainlink monitoring capacity, Range risk defaults, task scheduler timings, and gRPC stream limits. Changing defaults can affect local developer flows, integration tests, startup performance, storage usage, and network traffic. + +### Config example is tested + +`magicblock-config/src/tests.rs::test_example_config_full_coverage` parses the root `config.example.toml` and asserts many values. When adding or changing fields, update the example and this test together where appropriate. + +### Lifecycle mode is cross-cutting + +`LifecycleMode` is parsed by config but changes how account sync and execution are wired elsewhere. `Offline` is the only mode whose `needs_remote_account_provider()` returns false. Replica replication mode also disables Chainlink in `magicblock-api`; do not assume lifecycle alone fully determines remote-provider usage. + +## Important invariants + +1. Preserve config precedence: CLI > environment > TOML > defaults. +2. 
Preserve non-destructive CLI overlay semantics; absent CLI fields must not reset values loaded from lower-precedence sources. +3. Preserve `kebab-case` TOML/serde field names and `MBV_` environment mapping unless an intentional operator-facing breaking change is approved and documented. +4. Keep unknown-field rejection for strict operator feedback unless deliberately changing compatibility behavior. +5. `ValidatorParams::try_new` must return at least one HTTP remote and at least one websocket remote after post-processing. +6. Do not treat gRPC remotes as HTTP/websocket substitutes for committor or JSON-RPC flows; gRPC is for streaming providers. +7. Do not log secret material through `Debug`, `Display`, or startup config logging. +8. Keep `config.example.toml`, `magicblock-config/README.md`, tests, and config structs synchronized. +9. Adding config for a runtime service must include the service consumer update; unused config is misleading operational surface area. +10. Changes to timing, sizing, event-processor, stream-limit, reset, or retention defaults must call out runtime/performance and persistence implications. + +## Common change areas and what to inspect + +### Adding a new configurable field + +Inspect first: + +- target section in `magicblock-config/src/config/*.rs`; +- `ValidatorParams` in `magicblock-config/src/lib.rs` if it is a new top-level section; +- service consumer that will read the value; +- `config.example.toml` and `magicblock-config/README.md`; +- `magicblock-config/src/tests.rs` and relevant integration tests. + +Checklist: + +- choose `kebab-case` TOML name intentionally; +- add a safe/default value or make the field explicitly optional; +- decide whether it is CLI-exposed, env/TOML-only, or TOML-only; +- preserve CLI overlay semantics with `Option` when adding CLI flags; +- add tests for precedence/env/TOML/example coverage when behavior matters. 
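The reason for `Option` fields in overlays can be shown without Figment. The sketch below is a std-only model of the precedence rule (CLI > environment > TOML > defaults), not the crate's loader: `LedgerSettings`, `LedgerOverlay`, the field names, and the default values are hypothetical and exist only to demonstrate that an absent overlay field leaves the lower-precedence value untouched.

```rust
/// Hypothetical resolved section with illustrative defaults.
#[derive(Debug, Clone, PartialEq, Eq)]
struct LedgerSettings {
    block_time_ms: u64,
    reset: bool,
}

impl Default for LedgerSettings {
    fn default() -> Self {
        // Illustrative values only, not the crate's real defaults.
        Self { block_time_ms: 50, reset: false }
    }
}

/// One configuration layer; `None` means "this source said nothing".
#[derive(Debug, Default, Clone, Copy)]
struct LedgerOverlay {
    block_time_ms: Option<u64>,
    reset: Option<bool>,
}

impl LedgerSettings {
    /// Applies a layer, overwriting only the fields it actually sets.
    fn apply(mut self, layer: LedgerOverlay) -> Self {
        if let Some(value) = layer.block_time_ms {
            self.block_time_ms = value;
        }
        if let Some(value) = layer.reset {
            self.reset = value;
        }
        self
    }
}

/// Lowest precedence first: defaults, then TOML, then env, then CLI.
fn resolve(toml: LedgerOverlay, env: LedgerOverlay, cli: LedgerOverlay) -> LedgerSettings {
    LedgerSettings::default().apply(toml).apply(env).apply(cli)
}
```

If `LedgerOverlay::reset` were a plain `bool`, a CLI invocation that never mentioned the flag would still write `false` over a `true` loaded from TOML or the environment, which is exactly the destructive overlay the checklist warns against.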
+ +### Changing remotes or endpoint parsing + +Inspect first: + +- `magicblock-config/src/types/network.rs`; +- `ValidatorParams::ensure_http`, `ensure_websocket`, `rpc_url`, `websocket_urls`, and `grpc_urls`; +- `magicblock-api/src/magic_validator.rs::init_chainlink` and `init_committor_service`; +- Chainlink endpoint parsing and gRPC stream consumers. + +Risks: + +- changing default/derived endpoints can silently point validators at a different base chain; +- websocket derivation affects pubsub/account monitoring and committor confirmation behavior; +- `grpc(s)` URL scheme rewriting is relied on by gRPC provider code. + +### Changing validator identity or replication config + +Inspect first: + +- `magicblock-config/src/config/validator.rs`; +- startup identity and replication setup in `magicblock-api/src/magic_validator.rs`; +- ledger keypair verification and replay authority override paths; +- tests for replication secret redaction. + +Risks: + +- leaking replication secrets or validator keypair material in logs; +- starting with an identity that does not match persisted ledger state; +- accidentally diverging primary and replica startup behavior. + +### Changing storage, ledger, or AccountsDb settings + +Inspect first: + +- `magicblock-config/src/config/accounts.rs` and `ledger.rs`; +- `magicblock-accounts-db/src/lib.rs` and `storage.rs`; +- `magicblock-api/src/ledger.rs` and startup/replay code; +- snapshot/defragment/reset tests. + +Risks: + +- reset flags wipe persistent state; +- defragmentation and snapshot settings interact with scheduler pauses and startup recovery; +- block time and superblock size affect blockhash validity, snapshots, and metrics. 
+ +### Changing Chainlink/gRPC/risk settings + +Inspect first: + +- `magicblock-config/src/config/chain.rs` and `grpc.rs`; +- `magicblock-api/src/magic_validator.rs::init_chainlink`; +- `magicblock-chainlink/src/remote_account_provider/`; +- `magicblock-aml` Range risk usage; +- allowed-program filtering tests in Chainlink. + +Risks: + +- subscription limits and resubscription delay affect account sync throughput and provider load; +- allowed-program semantics treat `None` and `Some(vec![])` as unrestricted in current Chainlink code; +- risk config may add external I/O and should remain explicitly disabled by default. + +### Changing CLI flags + +Inspect first: + +- `magicblock-config/src/config/cli.rs`; +- `magicblock-validator/src/main.rs` usage and help output expectations; +- config precedence and overlay tests. + +Risks: + +- non-optional CLI fields can serialize defaults and overwrite TOML/env values; +- bool flags need careful `skip_serializing_if = "is_false"` handling; +- short flags can conflict with existing options. + +## Tests and validation + +For documentation-only changes to this guide: + +```bash +git diff --check -- .agents/context/crates/magicblock-config.md .agents/context/crate-map.md AGENTS.md +``` + +Also verify: + +- `.agents/context/crates/magicblock-config.md` exists; +- `.agents/context/crate-map.md` points future agents to this guide; +- `AGENTS.md` lists the new crate guide in the crate-specific examples; +- no files under `prompts/**` are staged or committed. 
+ +For Rust/source changes in `magicblock-config`, run targeted checks first: + +```bash +cargo fmt +cargo clippy -p magicblock-config --all-targets -- -D warnings +cargo nextest run -p magicblock-config +``` + +For config changes that affect validator startup or operator config, also run the integration config suite when practical: + +```bash +cd test-integration +make test-config +``` + +Broader baseline validation remains the repository standard from `.agents/rules/testing-and-validation.md`: + +```bash +cargo fmt +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance validation expectations: + +- Documentation-only changes have no runtime performance impact. +- Changes to event processor counts, Chainlink monitoring capacity, gRPC stream limits, resubscription delay, ledger block time, AccountsDb sizing, metrics cadence, or task scheduler intervals should report expected runtime impact and include the smallest practical test or measurement for the affected service. +- Changes to reset, replay, identity, remotes, or replication config should include startup/recovery validation, not just crate unit tests. + +## Related docs + +- `AGENTS.md` — required agent workflow and documentation-memory rules. +- `.agents/context/overview.md` — validator runtime model and important concepts. +- `.agents/context/architecture.md` — startup/service orchestration and configuration ownership. +- `.agents/context/crate-map.md` — crate ownership map and pointer back to this guide. +- `.agents/rules/testing-and-validation.md` — repository validation commands and reporting expectations. +- `.agents/memory/agent-memory-and-docs.md` — rules for keeping agent documentation current. +- `magicblock-config/README.md` — human-facing config crate overview. +- `config.example.toml` — operator-facing example and tested config reference. +- `magicblock-api/src/magic_validator.rs` — primary runtime consumer of `ValidatorParams`. 
+- `magicblock-validator/src/main.rs` — binary entrypoint and config parsing. +- `test-integration/test-config/` — integration tests for config-driven validator behavior. diff --git a/.agents/context/crates/magicblock-core.md b/.agents/context/crates/magicblock-core.md new file mode 100644 index 000000000..622632066 --- /dev/null +++ b/.agents/context/crates/magicblock-core.md @@ -0,0 +1,433 @@ +# `magicblock-core` + +## Purpose + +`magicblock-core` is the validator's shared wiring and compatibility crate. It owns the channel types that connect RPC/API dispatch, transaction scheduling, executor outputs, scheduled tasks, and replication; it also provides cross-crate traits, intent payload types, global coordination-mode state, logging helpers, execution thread-local stashing, and shared token/eATA helpers. + +This crate is on several performance-sensitive paths: + +- transaction submission, simulation, replay, and replication ordering through `link::transactions`; +- account-update and transaction-status fanout through bounded `flume` channels; +- scheduler pause coordination for snapshot/checksum/reset maintenance; +- optimistic RPC/pubsub account reads through `LockedAccount`; +- Magic Program side effects that are collected through `ExecutionTlsStash` during SVM execution. + +Keep `magicblock-core` dependency-light and protocol-neutral where possible. It should define shared contracts and small helpers; it must not grow into an owner of RPC policy, account cloning, SVM execution, ledger persistence, committor delivery, or replication service orchestration. + +## Update requirement + +Update this guide in the same change whenever behavior or contracts in `magicblock-core` change. 
In particular, update it for changes to: + +- endpoint/channel topology, channel capacity, backpressure semantics, or scheduler pause behavior in `src/link.rs` and `src/link/transactions.rs`; +- public transaction modes, replay/block-boundary ordering, replication message layout, or bincode payload compatibility; +- `LockedAccount` optimistic read behavior or account encoding assumptions; +- `CoordinationMode` states, transitions, or helper semantics used by startup, primary, replica, and Magic Program code; +- `CommittedAccount`, `BaseActionCallback`, `MagicSys`, `LatestBlockProvider`, or `ActionsCallbackScheduler` contracts; +- `ExecutionTlsStash` lifecycle and the set of Magic Program side effects it carries; +- token/eATA derivation, remapping, and projection helpers; +- logging initialization style or feature-gated `tokio-console` behavior; +- validation commands or tests future agents should run for this crate. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-core/src/lib.rs` | Public module surface, `Slot`, `TransactionIndex`, and `debug_panic!`. | +| `magicblock-core/src/link.rs` | Builds paired `DispatchEndpoints` and `ValidatorChannelEndpoints` via `link()`. Defines bounded channel capacity and the shared scheduler pause semaphore. | +| `magicblock-core/src/link/transactions.rs` | Transaction scheduler handle, scheduler commands, processing modes, simulation/replay result types, transaction sanitization wrappers, and scheduler idle guard. | +| `magicblock-core/src/link/accounts.rs` | Account update channel aliases plus `LockedAccount`, the optimistic sequence-lock wrapper used for safe reads of potentially borrowed account data. | +| `magicblock-core/src/link/blocks.rs` | Block hash alias and broadcast receiver type for latest-block notifications. | +| `magicblock-core/src/link/replication.rs` | Serializable replication message envelope and transaction/block/superblock/reset payloads. 
Variant order and sentinel indices are wire/order compatibility concerns. | +| `magicblock-core/src/coordination_mode.rs` | Process-global atomic coordination mode used by scheduler, RPC, Magic Program, and task/commit scheduling gates. | +| `magicblock-core/src/intent.rs` | Commit/action payload types shared by Magic Program, committor, accounts, and persistence code. Includes ATA-to-eATA remapping for committed accounts. | +| `magicblock-core/src/traits.rs` | Cross-crate trait boundaries for Magic Program syscalls, latest-block access, and action callback scheduling. | +| `magicblock-core/src/tls.rs` | Executor thread-local stash used by Magic Program instructions to emit scheduled task requests for the processor to drain after execution. | +| `magicblock-core/src/token_programs.rs` | Shared SPL Token, Token-2022, ATA, and eATA IDs plus derivation, inspection, remapping, and projection helpers. | +| `magicblock-core/src/logger/` | Tracing initialization, test logger setup, style-specific formatters, and log-rate consolidation helpers. | +| `magicblock-core/Cargo.toml` | Dependency and feature declaration. `tokio-console` enables `console-subscriber` and Tokio tracing. 
| + +Main consumers include: + +- `magicblock-api`, which calls `link()` during validator construction and holds the `TransactionSchedulerHandle` for service wiring; +- `magicblock-aperture`, which submits/simulates transactions and consumes account/status events for RPC/pubsub; +- `magicblock-processor`, which consumes `ValidatorChannelEndpoints`, executes `SchedulerCommand`s, drains `ExecutionTlsStash`, and emits account/status/replication messages; +- `magicblock-replicator`, which consumes and publishes `link::replication::Message` and uses `wait_for_idle()` during checksum/reset operations; +- `magicblock-ledger`, which replays persisted transactions through `TransactionSchedulerHandle::replay`; +- `magicblock-accounts`, `magicblock-account-cloner`, and `magicblock-task-scheduler`, which submit validator-internal transactions or consume scheduled tasks; +- `programs/magicblock`, which uses `MagicSys`, `CommittedAccount`, `BaseActionCallback`, coordination mode, and `ExecutionTlsStash`; +- `magicblock-committor-service`, which consumes committed-account/action types and callback scheduling traits; +- `magicblock-chainlink`, which uses token/eATA helpers and consolidated logging helpers. + +## Public API shape / Main public types and APIs + +### Root exports + +- `Slot = u64` and `TransactionIndex = u32` are shared position types used by ledger, processor, RPC, and replication. +- `debug_panic!` panics only in debug builds and logs an error in release builds. Use it only for invariant violations where production should remain alive. 
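
As an illustration of the `debug_panic!` behavior described above, here is a minimal sketch. `debug_panic_sketch!` and `check_invariant` are hypothetical names, and the release branch writes to stderr only as a stand-in; how the real macro logs is not shown in this guide.

```rust
use std::panic;

/// Hypothetical stand-in for `debug_panic!`: panic in debug builds,
/// report the problem and keep running in release builds.
macro_rules! debug_panic_sketch {
    ($($arg:tt)*) => {
        if cfg!(debug_assertions) {
            panic!($($arg)*);
        } else {
            // Assumption: the real macro logs an error; stderr stands in here.
            eprintln!($($arg)*);
        }
    };
}

/// Returns true if the invariant check panicked.
fn check_invariant(queue_len: usize, capacity: usize) -> bool {
    panic::catch_unwind(|| {
        if queue_len > capacity {
            debug_panic_sketch!(
                "queue length {} exceeds capacity {}",
                queue_len,
                capacity
            );
        }
    })
    .is_err()
}

fn main() {
    // In a debug build this reports `true`; in a release build `false`.
    let panicked = check_invariant(10, 4);
    println!("invariant violation panicked: {panicked}");
}
```

The sketch makes the trade-off visible: the same violated invariant stops a debug build immediately but only leaves a trace in a release build, so it suits checks where production should stay alive.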
+ +### Channel endpoints + +`link::link()` returns the two sides of the validator channel fabric: + +```text +DispatchEndpoints (RPC/API side) + -> transaction_scheduler: TransactionSchedulerHandle + <- transaction_status: flume Receiver + <- account_update: flume Receiver + <- tasks_service: Option> + <- replication_messages: Option> + +ValidatorChannelEndpoints (processor/internal side) + <- transaction_to_process: mpsc Receiver + -> transaction_status: flume Sender + -> account_update: flume Sender + -> tasks_service: UnboundedSender + -> replication_messages: mpsc Sender + -> pause_permit: Arc +``` + +The transaction, account-update, transaction-status, and replication queues use `LINK_CAPACITY = 16384` where backpressure matters. The scheduled-task channel is unbounded because it is drained by the task scheduler service and carries requests produced from execution TLS. + +### Transaction scheduling APIs + +`link::transactions::TransactionSchedulerHandle` is cloneable and is the public entrypoint for transaction-related dispatch: + +- `schedule(txn)` verifies/sanitizes and queues fire-and-forget execution. +- `execute(txn)` verifies/sanitizes, queues execution, and awaits a one-shot `TransactionResult<()>`. +- `simulate(txn)` verifies/sanitizes, queues simulation, and awaits `TransactionSimulationResult`. +- `replay(position, txn)` verifies/sanitizes and queues replay at a specific slot/index with a `persist` flag. +- `replay_block(block)` queues an ordered replicated block boundary and awaits scheduler acknowledgement. +- `wait_for_idle()` acquires the scheduler pause semaphore and returns an `OwnedSemaphorePermit` that keeps scheduling paused while held. + +`SanitizeableTransaction` is implemented for `SanitizedTransaction`, `VersionedTransaction`, `Transaction`, and `WithEncoded`. Use `with_encoded(txn)` when an internally constructed transaction also needs bincode bytes for replication or downstream reuse. 
For unsanitized transaction types, `sanitize(true)` verifies signatures; the `false` path uses a unique hash and is intended only for cases that explicitly skip verification. + +### Transaction and replay payloads + +Important types in `link::transactions`: + +- `SchedulerCommand::{Transaction, Block}` keeps replay transactions and block boundaries in one FIFO command stream so a block cannot overtake preceding transactions. +- `TransactionProcessingMode::{Simulation, Execution, Replay}` controls executor behavior and result notification. +- `ReplayPosition { slot, index, persist }` preserves primary ordering during replication and lets startup ledger replay avoid re-recording/broadcasting. +- `TransactionStatus` contains the committed slot, sanitized transaction, metadata, and slot-local index. +- `SchedulerMode::{Primary, Replica}` is sent to the processor scheduler to switch local scheduling behavior. + +### Account event APIs + +`link::accounts::LockedAccount` wraps an account update and protects readers from torn reads when `AccountSharedData` is borrowed from memory that another thread can mutate. Use `read_locked` or `ui_encode`; do not bypass the wrapper by holding borrowed account data across asynchronous or long-running work. + +### Coordination mode + +`coordination_mode::CoordinationMode` is a process-global atomic state: + +- `StartingUp`: ledger replay phase; no validator signer and no side effects. +- `Primary`: validator signer and side-effect scheduling are enabled. +- `Replica`: no validator signer and no side effects. + +Use `CoordinationMode::current()`, `needs_validator_signer()`, `should_schedule_intents()`, and `needs_onchain_interactions()` for gates. Scheduler mode switches call `switch_to_primary_mode()` or `switch_to_replica_mode()` through processor coordination paths; tests may call them directly but must account for global state leakage. 
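
The three-state global described above can be sketched with a plain atomic. This is not the crate's implementation: `Mode`, `MODE`, `set`, and `allows_side_effects` are hypothetical names, and the memory orderings are an assumption chosen for the sketch.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical mirror of the three modes described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Mode {
    StartingUp = 0,
    Primary = 1,
    Replica = 2,
}

/// Process-global state; this sketch starts in `StartingUp`.
static MODE: AtomicU8 = AtomicU8::new(Mode::StartingUp as u8);

impl Mode {
    /// Read the current process-wide mode.
    fn current() -> Mode {
        match MODE.load(Ordering::Acquire) {
            0 => Mode::StartingUp,
            1 => Mode::Primary,
            _ => Mode::Replica,
        }
    }

    /// Switch the process-wide mode (visible to every thread).
    fn set(mode: Mode) {
        MODE.store(mode as u8, Ordering::Release);
    }

    /// Only the primary signs and schedules side effects.
    fn allows_side_effects(self) -> bool {
        matches!(self, Mode::Primary)
    }
}

fn main() {
    println!("{:?}: {}", Mode::current(), Mode::current().allows_side_effects());
    Mode::set(Mode::Primary);
    println!("{:?}: {}", Mode::current(), Mode::current().allows_side_effects());
}
```

Because the state is a single static, any test that calls the setter changes what every later test in the same process observes, which is the leakage the paragraph above warns about.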
+ +### Intent/action and trait contracts + +- `intent::CommittedAccount` serializes a committed account with `pubkey`, `Account`, and `remote_slot`. `from_account_shared` can override the owner with a parent program ID and remaps delegated ATAs to eATA form when applicable. +- `intent::BaseActionCallback` is the callback payload used for base-action results. +- `traits::MagicSys` lets the Magic Program fetch current commit nonces without depending on the concrete validator service. +- `traits::LatestBlockProvider` abstracts latest slot/blockhash/clock access for services that should not depend on `magicblock-ledger` directly. +- `traits::ActionsCallbackScheduler` abstracts callback transaction construction/scheduling and returns per-callback signatures or construction errors. + +### Execution TLS + +`tls::ExecutionTlsStash` is a thread-local queue currently used for `TaskRequest`s emitted by Magic Program task scheduling/cancel instructions. The processor clears the stash around execution and drains it after a successful transaction path. Do not use it as a cross-thread channel or persistent store. + +### Token/eATA helpers + +`token_programs` exports program IDs and helpers for legacy SPL Token, Token-2022, ATA, and eATA handling: + +- ATA derivation: `derive_ata`, `derive_ata_with_token_program`, `try_derive_supported_ata_pubkeys`. +- eATA derivation: `derive_eata`, `try_derive_eata_address_and_bump`. +- ATA detection/remapping: `is_ata`, `try_remap_ata_to_eata`. +- eATA projection: `MaybeIntoAta` and `EphemeralAta` conversion/projection helpers. + +## Runtime flows + +### Validator channel construction + +1. `magicblock-api` calls `magicblock_core::link::link()` during validator startup. +2. `link()` creates bounded MPSC queues for transaction commands and replication messages, bounded `flume` queues for account/status events, an unbounded task queue, and a shared `Semaphore(1)` pause permit. +3. 
The API/RPC side receives `DispatchEndpoints`; the processor side receives `ValidatorChannelEndpoints`. +4. `TransactionSchedulerHandle` clones can be passed to RPC, cloner, accounts, ledger replay, task scheduler, and replication services. + +Do not create parallel ad-hoc channels for the same flows without updating this contract and all consumers; ordering and backpressure expectations are centralized here. + +### Transaction submit/simulate/replay flow + +```text +RPC/service caller + -> TransactionSchedulerHandle::{schedule,execute,simulate,replay} + -> SanitizeableTransaction::sanitize_with_encoded(verify = true) + -> SchedulerCommand::Transaction(ProcessableTransaction) + -> bounded scheduler command channel + -> magicblock-processor scheduler/executor + -> status/account/task/replication outputs +``` + +`execute` and `simulate` allocate one-shot channels and await processor completion; `schedule` and `replay` only wait for queueing. Choose the lowest-overhead method that preserves caller semantics. + +### Replicated transaction and block ordering + +1. The primary emits `link::replication::Message` values from the processor scheduler. +2. Replicas receive transaction payloads and block boundaries from `magicblock-replicator`. +3. Transactions are queued through `TransactionSchedulerHandle::replay` with `ReplayPosition`. +4. Block boundaries are queued through `replay_block` as `SchedulerCommand::Block` in the same FIFO scheduler command channel. +5. The block acknowledgement resolves only after the scheduler applies the block and executors acknowledge the slot transition. + +Preserve this single ordered command channel. Moving blocks to a separate path can let block boundaries overtake transactions and break replica consistency. + +### Scheduler pause / exclusive AccountsDb access + +1. External maintenance code calls `TransactionSchedulerHandle::wait_for_idle()`. +2. 
The future waits until the processor scheduler has released the shared semaphore because all executors are idle and no pending transactions are being processed. +3. The returned `OwnedSemaphorePermit` pauses scheduling while held. +4. Maintenance performs exclusive work such as checksums, snapshots, resets, or defragmentation. +5. Dropping the permit allows scheduling to resume. + +Never hold the permit across unrelated I/O or long network operations. It blocks transaction processing and is a critical availability/performance lever. + +### Account update read flow + +1. Processor emits `AccountWithSlot { account: LockedAccount, slot }` on the account-update channel. +2. RPC/pubsub code receives the update and calls `LockedAccount::read_locked` or `ui_encode`. +3. The first read is optimistic. +4. If the captured `AccountSeqLock` indicates a concurrent write, `LockedAccount` relocks, reinitializes a fresh account view, and retries until it obtains a consistent read. + +This flow allows low-overhead reads on the fast path while protecting borrowed account data from concurrent mutation races. + +### Magic Program scheduled task flow + +1. A Magic Program task instruction runs during SVM execution. +2. The program code calls `ExecutionTlsStash::register_task(TaskRequest::...)`. +3. The processor clears TLS around execution boundaries and, after execution, drains tasks with `next_task()`. +4. Drained tasks are sent over the `tasks_service` channel to `magicblock-task-scheduler`. + +The TLS stash is per executor thread. It must be cleared on success, simulation, and failure paths to avoid leaking one transaction's side effects into another transaction on the same worker. + +### Commit/action callback flow + +1. Magic Program scheduling code constructs `CommittedAccount` and `BaseActionCallback` values. +2. Validator-side adapters implement `MagicSys` to provide commit nonces. +3. 
Accounts/committor services use `CommittedAccount` payloads to build and persist commit/undelegation work. +4. Committor executors use `ActionsCallbackScheduler` to schedule callback transactions and report `ActionResult`/`ActionError`. + +Keep these types serializable and stable enough for persistence and cross-crate use. + +## Important internals and caveats + +### Channel capacity and backpressure + +`LINK_CAPACITY` is intentionally bounded for transaction commands, account/status event channels, and replication messages. Raising it can increase memory and latency under overload; lowering it can cause premature backpressure or dropped service throughput. Scheduled tasks are currently unbounded; if that changes, inspect task scheduler behavior and Magic Program task emission semantics. + +### Sanitization and encoded transaction bytes + +`with_encoded` bincode-serializes a transaction before wrapping it. If serialization fails it maps to `TransactionError::SanitizeFailure`. Replication relies on `ProcessableTransaction::encoded` to avoid redundant serialization. Do not silently drop encoded bytes on paths that feed replication unless the downstream code has been updated. + +### Replication wire compatibility + +`link::replication::Message` derives `Serialize`/`Deserialize`; the enum variant order is explicitly part of the wire format. Do not reorder variants or change sentinel indices (`BLOCK_INDEX`, `RESET_INDEX`, `SUPERBLOCK_INDEX`) without a coordinated compatibility plan for primary/replica deployments and persisted/catch-up behavior. + +### Coordination mode is global process state + +`COORDINATION_MODE` is an atomic static. In tests it starts as `Primary`; otherwise it starts as `StartingUp`. Direct test mutations can leak across tests unless serialized or reset. Runtime switches must keep the processor scheduler's local mode and global coordination mode aligned. 
+ +### Token and eATA helpers are protocol-sensitive + +`try_remap_ata_to_eata` only remaps delegated token accounts whose pubkey matches the derived ATA for their owner/mint/token program. `EphemeralAta::try_from_account_data` supports both `EPHEMERAL_ATA_LEN` and `LEGACY_EPHEMERAL_ATA_LEN`. Changes here can affect cloning, post-delegation token transfer tests, commit account payloads, and chainlink blacklisting/ATA projection. + +### Logging initialization is process-global + +`logger::init`/`init_with_config` call `tracing_subscriber::init()` and must be called once at application startup. Tests should use `init_for_tests()`, which uses `try_init()` to tolerate multiple callers. `RUST_LOG_STYLE=EPHEM` or `DEVNET` selects custom formatters; other values use the default formatter. + +## Important invariants + +1. `link::link()` must return paired endpoints connected to the same channels and pause semaphore; dispatch and validator sides must not be mismatched. +2. Scheduler command ordering must keep replay transactions and replicated block boundaries in the same FIFO stream. +3. `TransactionSchedulerHandle::wait_for_idle()` must continue to pause scheduling while the returned permit is held; maintenance that relies on exclusive `AccountsDb` access depends on this. +4. Bounded channels on hot paths must preserve intentional backpressure and avoid unbounded memory growth. +5. `LockedAccount` readers must use sequence-lock checks for borrowed account data; do not expose APIs that encourage unsafely reading borrowed data after concurrent mutation. +6. `CoordinationMode::Primary` is the only mode that should require validator signing and schedule side effects; `StartingUp` and `Replica` must remain side-effect-free. +7. `ExecutionTlsStash` must be cleared between transaction executions on the same worker thread. +8. Replication `Message` variant order and sentinel indices must remain compatible unless the whole replication protocol is versioned/migrated. +9. 
Committed-account construction must preserve `remote_slot` and intended owner override semantics, including ATA-to-eATA remapping for delegated token accounts. +10. Token/eATA helper changes must preserve support for legacy SPL Token and Token-2022 derivations where current callers expect both. +11. `magicblock-core` should not depend on heavyweight runtime crates such as API, aperture, processor, ledger, accounts-db, chainlink, or committor service. + +## Common change areas and what to inspect + +### Changing transaction submission, simulation, or replay + +Start with: + +- `magicblock-core/src/link/transactions.rs` +- `magicblock-processor/src/scheduler/mod.rs` +- `magicblock-processor/src/executor/processing.rs` +- `magicblock-ledger/src/blockstore_processor/mod.rs` for startup replay +- `magicblock-replicator/src/service/replica.rs` and `magicblock-replicator/src/service/context.rs` +- `magicblock-aperture/src/server/http/dispatch.rs` and transaction request handlers + +Check result notification semantics, signature verification, encoded-byte propagation, backpressure, and ordering of `SchedulerCommand::Block` relative to replayed transactions. + +### Changing endpoint topology or event channels + +Start with: + +- `magicblock-core/src/link.rs` +- `magicblock-api/src/magic_validator.rs` +- `magicblock-aperture/src/processor.rs` and subscription state +- `magicblock-processor/src/scheduler/state.rs` and executor output paths +- `magicblock-task-scheduler/src/service.rs` + +Ensure every sender/receiver is wired exactly once, optional receivers are moved intentionally, and shutdown behavior remains clear. 
+ +### Changing scheduler pause or maintenance coordination + +Start with: + +- `magicblock-core/src/link/transactions.rs::wait_for_idle` +- `magicblock-core/src/link.rs` pause semaphore creation +- `magicblock-processor/src/scheduler/mod.rs` and `scheduler/coordinator.rs` +- `magicblock-processor/tests/scheduling.rs::test_wait_for_idle_coordination` +- `magicblock-replicator/src/service/context.rs` checksum/reset flows + +Verify that exclusive `AccountsDb` operations cannot race executor writes and that the permit is not held longer than necessary. + +### Changing account update/read behavior + +Start with: + +- `magicblock-core/src/link/accounts.rs` +- `magicblock-aperture/src/utils.rs` +- `magicblock-aperture/src/requests/http/mod.rs` +- `magicblock-processor/src/executor/processing.rs` + +Preserve fast-path low overhead and slow-path correctness for borrowed account data. Avoid cloning large accounts unless the caller explicitly needs ownership. + +### Changing coordination mode or primary/replica behavior + +Start with: + +- `magicblock-core/src/coordination_mode.rs` +- `magicblock-processor/src/scheduler/coordinator.rs` +- `magicblock-api/src/magic_validator.rs` startup/mode switching +- `programs/magicblock/src/schedule_transactions/process_scheduled_commit_sent.rs` +- `magicblock-aperture/tests/transaction_primary_mode.rs` +- `magicblock-task-scheduler/src/service.rs` tests that call `switch_to_primary_mode()` + +Ensure scheduler-local mode, global coordination mode, validator signer requirements, and side-effect gates stay aligned. + +### Changing Magic Program side effects or task scheduling + +Start with: + +- `magicblock-core/src/tls.rs` +- `programs/magicblock/src/schedule_task/` +- `magicblock-processor/src/executor/processing.rs` +- `magicblock-task-scheduler/src/service.rs` + +If more side-effect types are added to TLS, document when they are registered, drained, cleared, persisted, and retried. 
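
The register/drain/clear lifecycle that this section asks future changes to document can be sketched with a plain thread-local queue. `TaskSketch`, `STASH`, `clear`, and `execute` are hypothetical names; `register_task` and `next_task` are named after the calls mentioned in this guide, but their real signatures and the real `TaskRequest` payload are not shown here.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

/// Hypothetical task payload; the real `TaskRequest` is not reproduced here.
#[derive(Debug, PartialEq)]
enum TaskSketch {
    Schedule(u64),
    Cancel(u64),
}

thread_local! {
    /// One queue per executor thread; never shared across threads.
    static STASH: RefCell<VecDeque<TaskSketch>> = RefCell::new(VecDeque::new());
}

/// Called by program code during execution to record a side effect.
fn register_task(task: TaskSketch) {
    STASH.with(|s| s.borrow_mut().push_back(task));
}

/// Called by the processor after execution to drain recorded tasks in order.
fn next_task() -> Option<TaskSketch> {
    STASH.with(|s| s.borrow_mut().pop_front())
}

/// Drop anything left in the stash.
fn clear() {
    STASH.with(|s| s.borrow_mut().clear());
}

/// Run one pretend transaction and return the tasks that should be forwarded.
fn execute(emitted: Vec<TaskSketch>, succeeded: bool) -> Vec<TaskSketch> {
    clear(); // start from a clean stash
    for task in emitted {
        register_task(task);
    }
    let mut drained = Vec::new();
    if succeeded {
        while let Some(task) = next_task() {
            drained.push(task);
        }
    }
    clear(); // a failed run must not leak tasks into the next transaction
    drained
}

fn main() {
    let ok = execute(vec![TaskSketch::Schedule(1), TaskSketch::Cancel(2)], true);
    let failed = execute(vec![TaskSketch::Schedule(3)], false);
    println!("forwarded: {ok:?}, after failure: {failed:?}");
}
```

The second `clear()` is the important part of the sketch: without it, tasks registered by a failed transaction would still be queued when the next transaction runs on the same worker thread.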
+ +### Changing commit/action payloads or traits + +Start with: + +- `magicblock-core/src/intent.rs` +- `magicblock-core/src/traits.rs` +- `programs/magicblock/src/magic_sys.rs` +- `programs/magicblock/src/magic_scheduled_base_intent.rs` +- `magicblock-accounts/src/scheduled_commits_processor.rs` +- `magicblock-committor-service/src/` + +Check serialization, persistence, commit nonce behavior, callback error mapping, and base-layer settlement compatibility. + +### Changing token/eATA behavior + +Start with: + +- `magicblock-core/src/token_programs.rs` +- `magicblock-chainlink/src/chainlink/fetch_cloner/ata_projection.rs` +- `magicblock-chainlink/src/chainlink/fetch_cloner/delegation.rs` +- `magicblock-chainlink/src/testing/eatas.rs` +- `programs/magicblock/src/schedule_transactions/process_schedule_commit_tests.rs` +- `test-integration/test-cloning/tests/10_post_delegation_token_transfer.rs` + +Verify legacy token and Token-2022 behavior, eATA account length compatibility, rent assumptions, and delegated-account checks. + +### Changing logging + +Start with: + +- `magicblock-core/src/logger/mod.rs` +- `magicblock-core/src/logger/consolidate.rs` +- `magicblock-validator/src/main.rs` +- chainlink/AML tests that call `logger::init_for_tests()` + +Avoid adding high-cardinality or noisy logs to hot loops. Keep test logging idempotent. 
+ +## Tests and validation + +For documentation-only changes to this guide: + +```bash +git diff --check -- .agents/context/crates/magicblock-core.md .agents/context/crate-map.md AGENTS.md +``` + +For code changes in `magicblock-core`, run at minimum: + +```bash +cargo fmt +cargo nextest run -p magicblock-core +``` + +Because `magicblock-core` has no dedicated crate-local test suite at the time of writing, also run targeted consumer tests for the subsystem touched: + +```bash +# Transaction scheduling, replay, pause semantics +cargo nextest run -p magicblock-processor scheduling replay simulation + +# RPC transaction mode/LockedAccount consumers +cargo nextest run -p magicblock-aperture + +# Replication message/order or checksum/reset behavior +cargo nextest run -p magicblock-replicator + +# Magic Program TLS, commits, callbacks, or token/eATA helpers +cargo nextest run -p magicblock-program +cargo nextest run -p magicblock-chainlink +cargo nextest run -p magicblock-committor-service +cargo nextest run -p magicblock-task-scheduler +``` + +Then use the workspace baseline from `.agents/rules/testing-and-validation.md` when time allows: + +```bash +cargo fmt +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Integration/manual validation depends on the touched flow: + +- transaction/RPC behavior: `cd test-integration && make test-magicblock-api`; +- replication/replay behavior: run the relevant replicator/restore-ledger suites or targeted processor replay tests; +- cloning/token/eATA behavior: `cd test-integration && make test-cloning`; +- scheduled intents/commits/actions: `cd test-integration && make test-schedule-intents` and relevant committor suites; +- task scheduling: `cd test-integration && make test-task-scheduler`. + +If a change touches transaction dispatch, account updates, replication, or scheduler pause behavior, report whether performance risk was measured. 
At minimum, reason about queue backpressure, extra allocations/serialization, lock contention, and whether the change adds work to RPC/scheduler/executor hot paths. + +## Related docs + +- `AGENTS.md` for required agent workflow and documentation stewardship rules. +- `.agents/context/overview.md` for validator concepts and runtime model. +- `.agents/specs/validator-specification.md` for execution, scheduler, commit, undelegation, Magic Actions, ephemeral accounts, RPC/router, and recovery behavior. +- `.agents/context/architecture.md` for cross-crate boundaries and hot-path architecture. +- `.agents/context/crate-map.md` for workspace crate ownership and consumers. +- `.agents/rules/testing-and-validation.md` for baseline validation commands and integration suites. +- `docs/architecture.md` section `magicblock-core — the wiring loom` for additional architecture context. +- Consumer-specific guides such as `.agents/context/crates/magicblock-api.md`, `.agents/context/crates/magicblock-aperture.md`, `.agents/context/crates/magicblock-account-cloner.md`, `.agents/context/crates/magicblock-accounts.md`, `.agents/context/crates/magicblock-chainlink.md`, and `.agents/context/crates/magicblock-config.md`. diff --git a/.agents/context/crates/magicblock-ledger.md b/.agents/context/crates/magicblock-ledger.md new file mode 100644 index 000000000..0401342cf --- /dev/null +++ b/.agents/context/crates/magicblock-ledger.md @@ -0,0 +1,291 @@ +# `magicblock-ledger` + +## Purpose + +`magicblock-ledger` is the validator's local persistent ledger/history store. It wraps RocksDB column families behind typed ledger columns and provides the historical data used by RPC, replay/recovery, replication, task scheduling, tests, and operator tooling. 
+ +High-level responsibilities: + +- persist block metadata (`blocktime`, `blockhash`, latest-block cache) and confirmed transaction data; +- index transactions by signature, slot/index, and account address for RPC history queries; +- expose efficient latest-block reads/subscriptions without requiring RocksDB reads on hot paths; +- replay persisted successful transactions during validator startup when AccountsDb lags the ledger; +- truncate old ledger data with range tombstones and RocksDB compaction when the configured ledger size is reached; +- report RocksDB column-family and ledger operation metrics. + +This crate sits on storage, RPC-history, startup/recovery, replication, and execution-persistence paths. Changes can affect RPC latency, ledger replay correctness, duplicate transaction protection, persistent status visibility, disk growth, shutdown flushing, and recovery after restart. Keep RocksDB reads/writes bounded and avoid adding blocking work to transaction execution or RPC hot paths. + +## Update requirement + +Update this guide in the same change whenever behavior or contracts in `magicblock-ledger` change. 
In particular, update it for changes to: + +- `Ledger`, `LatestBlock`, `LatestBlockInner`, `SignatureInfosForAddress`, or exported error/API types; +- RocksDB column families, key encodings, serialization formats, protobuf/bincode compatibility, or column options; +- transaction write/read semantics, address-signature pagination, block assembly, status counting, or signature verification; +- ledger replay behavior in `blockstore_processor`, including replay ordering, slot/blockhash handling, or persisted transaction filtering; +- truncation thresholds, cleanup-slot locking, compaction filters, entry counters, flush/shutdown behavior, or cancellation semantics; +- metrics names/labels, RocksDB perf sampling, validation commands, or performance characteristics; +- consumers in RPC, processor, API startup, replication, task scheduling, or test tooling that alter this crate's assumptions. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-ledger/Cargo.toml` | Package metadata and dependencies on `magicblock-core`, `magicblock-metrics`, `solana-storage-proto`, RocksDB, and Solana transaction/status crates. | +| `magicblock-ledger/README.md` | Short overview of the RocksDB-backed ledger, `Ledger`, `Database`, and `LedgerColumn`. | +| `src/lib.rs` | Public crate surface. Re-exports `Ledger`, `SignatureInfosForAddress`, `PerfSample`, and `BLOCKSTORE_DIRECTORY_ROCKS_LEVEL`; defines `LatestBlock` and `LatestBlockInner`. | +| `src/store/api.rs` | Main `Ledger` implementation: open/init, block metadata, transaction/status/memo reads and writes, address-signature queries, perf samples, flush, shutdown, and latest-block access. | +| `src/database/columns.rs` | Column-family names, typed key encodings, deprecated-key compatibility, slot extraction, and column traits. | +| `src/database/db.rs` | `Database` wrapper around RocksDB, typed column construction, batches, range deletes, compaction, storage size, and oldest-slot propagation. 
| +| `src/database/ledger_column.rs` | Typed column API for bincode/protobuf/raw bytes, iterators, multi-get, RocksDB properties, metrics hooks, and cached entry counters. | +| `src/database/options.rs` | `LedgerOptions`, `LedgerColumnOptions`, storage directory constant, compression options, and currently supported primary access mode. | +| `src/database/rocks_db.rs`, `rocksdb_options.rs`, `cf_descriptors.rs`, `compaction_filter.rs` | Low-level RocksDB open/options/column descriptor setup and purged-slot compaction filtering. | +| `src/blockstore_processor/mod.rs` | Startup replay of persisted blocks/transactions through `TransactionSchedulerHandle`. | +| `src/ledger_truncator.rs` | Background size-based truncation service, range deletion, compaction, cancellation, and truncator lifecycle. | +| `src/metrics.rs` | RocksDB column-family and perf datapoints plus ledger RPC counters. | +| `tests/` and `src/store/api.rs` unit tests | Coverage for block assembly, address-signature pagination, transaction/status reads, counts, truncation, and compatibility behavior. | +| `magicblock-api/src/ledger.rs` | Opens/resets the ledger and manages validator-keypair files relative to the RocksDB directory. | +| `magicblock-api/src/magic_validator.rs` | Starts `LedgerTruncator`, initializes metrics tickers, and replays the ledger on startup when needed. | +| `magicblock-processor/src/executor/processing.rs` | Persists transaction status/metadata into the ledger after execution and broadcasts transaction notifications. | +| `magicblock-aperture/src/requests/http/` | Serves `getTransaction`, `getSignatureStatuses`, and `getSignaturesForAddress` from the ledger. | +| `magicblock-aperture/src/state/` | Seeds RPC blockhash cache from `ledger.latest_block()`. | +| `magicblock-task-scheduler/src/service.rs` | Uses `LatestBlock` as the source of recent blockhashes for scheduled task transactions. 
| +| `test-kit/src/lib.rs`, `tools/ledger-stats/` | Test harness and manual/operator inspection consumers. | + +Main consumers: + +- `magicblock-api` for startup, replay, truncator lifecycle, metrics, and shutdown; +- `magicblock-processor` for write-side transaction persistence on the execution path; +- `magicblock-aperture` for RPC-history and blockhash/status reads; +- `magicblock-replicator` and `tools/ledger-stats` for persisted history inspection/replay support; +- `magicblock-task-scheduler`, `magicblock-account-cloner`, and test support through `LatestBlock`. + +## Public API shape / Main public types and APIs + +### Crate exports + +`src/lib.rs` exposes: + +- `Ledger` and `SignatureInfosForAddress` from `store::api`; +- `LatestBlock` and `LatestBlockInner` for lock-free current block metadata; +- `blockstore_processor`, `errors`, and `ledger_truncator` modules; +- `PerfSample` and `BLOCKSTORE_DIRECTORY_ROCKS_LEVEL` for consumers that need perf samples or the RocksDB subdirectory name. + +The internal `database` and `metrics` modules are not public. Avoid making lower-level RocksDB internals public unless a consumer truly needs a new boundary; most runtime callers should go through `Ledger` or `LatestBlock`. + +### `Ledger` + +Important constructors and accessors: + +- `Ledger::open(path)` and `Ledger::open_with_options(path, LedgerOptions)` create `path/rocksdb`, adjust `ulimit -n` when enabled, open RocksDB, initialize `LatestBlock` from the highest stored blockhash, and initialize the lowest cleanup slot. +- `ledger_path()`, `banking_trace_path()`, `storage_size()`, `db(self)`, `latest_block()`, and `latest_blockhash()` expose operational handles. +- `flush()`, `shutdown(wait)`, and `cancel_manual_compactions()` are used during shutdown and maintenance. + +Primary data APIs: + +- Block data: `write_block(LatestBlockInner)`, `get_block(slot)`, `get_max_blockhash()`, `get_lowest_slot()`, blockhash/blocktime counters, and recent perf samples. 
+- Transactions: `write_transaction(...)`, `read_transaction((signature, slot))`, `get_complete_transaction(signature, highest_confirmed_slot)`, `get_transaction_status(signature, min_slot)`, `read_transaction_status`, `verify_transaction_signature`, and memo read/write helpers. +- Indexes: `get_confirmed_signatures_for_address(pubkey, highest_slot, before, until, limit)`, `read_slot_signature((slot, index))`, `get_highest_transaction_index_for_slot`, and `get_latest_transaction_position`. +- Maintenance: `set_lowest_cleanup_slot`, `delete_range_cf`, `compact_slot_range_cf`, `submit_rocksdb_cf_metrics_for_all_cfs`, and column `HasColumn` access for controlled internal maintenance. + +`write_transaction` writes status/indexes before raw transaction bytes. It expects the caller to provide the signature, slot, transaction index, writable/readonly account keys, bincode-serialized `VersionedTransaction`, and `TransactionStatusMeta` produced by execution. + +### `LatestBlock` + +`LatestBlock` is a cheap, cloneable, single-writer/multi-reader block metadata handle. It stores `LatestBlockInner { slot, blockhash, clock }` in `arc-swap` and broadcasts updates on `store`. + +- `load()` is a lock-free read used by RPC, task scheduler, processor, account cloner, and tests. +- `store(block)` atomically swaps the snapshot and notifies subscribers. +- `subscribe()` returns a `tokio::sync::broadcast::Receiver` for block updates. +- It implements `magicblock_core::traits::LatestBlockProvider` for generic consumers. + +`LatestBlockInner::new(slot, blockhash, timestamp)` sets `clock.slot` to `slot + 1`, while `LatestBlockInner.slot` remains the block's slot. Preserve that distinction because RPC/sysvar consumers may use the `Clock` value differently from the block height. 
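The `slot` versus `clock.slot` distinction described above can be illustrated with a minimal sketch. The types below are simplified, hypothetical stand-ins (a 32-byte array for the blockhash and a two-field `Clock`), not the crate's actual definitions; only the `slot + 1` relationship is taken from this guide.

```rust
/// Simplified stand-in for a Solana `Clock`; only the fields needed here (assumption).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Clock {
    pub slot: u64,
    pub unix_timestamp: i64,
}

/// Hypothetical, simplified mirror of `LatestBlockInner { slot, blockhash, clock }`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LatestBlockInner {
    pub slot: u64,
    pub blockhash: [u8; 32],
    pub clock: Clock,
}

impl LatestBlockInner {
    /// `slot` stays the block's own slot, while `clock.slot` is `slot + 1`.
    pub fn new(slot: u64, blockhash: [u8; 32], timestamp: i64) -> Self {
        Self {
            slot,
            blockhash,
            clock: Clock {
                slot: slot + 1,
                unix_timestamp: timestamp,
            },
        }
    }
}
```

A consumer that needs the block's slot should read `slot`; one that mirrors sysvar behavior would read `clock.slot`. Mixing the two silently shifts results by one slot.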
+ +### Column and storage model + +The ledger uses RocksDB column families with typed key encodings: + +| Column | Key | Value | Notes | +|---|---|---|---| +| `TransactionStatus` | `(Signature, Slot)` | protobuf `TransactionStatusMeta` | Key has deprecated-index compatibility. | +| `AddressSignatures` | `(Pubkey, Slot, u32, Signature)` | `AddressSignatureMeta` | Primary index for `getSignaturesForAddress`; not slot-keyed. | +| `SlotSignatures` | `(Slot, u32)` | `Signature` | Allows slot-order iteration and before/until pagination. | +| `Blocktime` | `Slot` | `UnixTimestamp` | Slot column. | +| `Blockhash` | `Slot` | `Hash` | Slot column and source for max/latest block. | +| `Transaction` | `(Signature, Slot)` | bincode `VersionedTransaction` bytes | Same key as `TransactionStatus`. | +| `TransactionMemos` | `(Signature, Slot)` | `String` | Deprecated signature-only key compatibility. | +| `PerfSamples` | `Slot` | bincode `PerfSample` | Slot column. | + +When adding or changing a column, update column traits, RocksDB descriptors/options, purge/truncation/compaction handling, counts, tests, and any operator tooling. Key order is part of RPC pagination and compatibility; do not reorder tuple fields casually. + +## Runtime flows + +### Transaction persistence from execution + +```text +processor executor + -> builds TransactionStatusMeta and account lock lists + -> Ledger::write_transaction(signature, slot, index, writable, readonly, encoded_tx, meta) + -> write_transaction_status writes AddressSignatures, SlotSignatures, TransactionStatus + -> raw bincode VersionedTransaction bytes are written to Transaction + -> processor broadcasts TransactionStatus to subscribers +``` + +Preserve the account-key indexing: `AddressSignatures` is what makes account history RPC work, and `SlotSignatures` is what lets pagination locate a before/until signature's transaction index even when that transaction did not include the queried address. 
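Why key order matters for slot-ordered iteration can be sketched as follows. This assumes a big-endian layout for a `(Slot, u32)` key; the real encoding lives in `src/database/columns.rs` and may differ, and `slot_signature_key` and `order_is_preserved` are hypothetical helper names used only for illustration.

```rust
/// Hypothetical encoding of a `SlotSignatures`-style key `(slot, index)`.
/// Big-endian bytes are assumed so that byte-wise order equals numeric order.
pub fn slot_signature_key(slot: u64, index: u32) -> [u8; 12] {
    let mut key = [0u8; 12];
    key[..8].copy_from_slice(&slot.to_be_bytes());
    key[8..].copy_from_slice(&index.to_be_bytes());
    key
}

/// Returns true when comparing the encoded keys byte-wise gives the same
/// result as comparing the `(slot, index)` tuples directly.
pub fn order_is_preserved(a: (u64, u32), b: (u64, u32)) -> bool {
    let key_a = slot_signature_key(a.0, a.1);
    let key_b = slot_signature_key(b.0, b.1);
    a.cmp(&b) == key_a.cmp(&key_b)
}
```

A byte-ordered store compares keys lexicographically, so swapping tuple fields or switching to a little-endian layout would change iteration order and, with it, pagination results.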
+ +### RPC history reads + +`magicblock-aperture` uses the ledger as the persistent fallback/source of truth for historical RPC methods: + +1. `getSignatureStatuses` checks the hot transaction cache first, then calls `get_transaction_status(signature, Slot::MAX)`. +2. `getTransaction` calls `get_complete_transaction(signature, u64::MAX)` and encodes the returned Solana transaction/status type. +3. `getSignaturesForAddress` clamps the limit to 1,000, then calls `get_confirmed_signatures_for_address` with optional `before` and `until` signatures. +4. `getBlock`-style reads call `get_block(slot)`, which loads block metadata, iterates slot signatures in reverse, and combines transactions with status metadata. + +RPC reads hold `lowest_cleanup_slot` read locks while accessing cleanup-sensitive ranges. Do not remove those guards: they prevent cleanup/compaction from racing with reads and returning inconsistent results. + +### Latest block and blockhash flow + +```text +slot/block producer or replay + -> Ledger::write_block(LatestBlockInner) + -> writes Blocktime and Blockhash columns + -> LatestBlock::store updates lock-free snapshot and broadcasts + -> RPC BlocksCache / task scheduler / processor / other consumers read latest block cheaply +``` + +`Ledger::open` seeds `LatestBlock` from `get_max_blockhash()` and the stored block time, defaulting to slot/hash zero when no blocks exist. + +### Startup replay + +`magicblock-api` calls `blockstore_processor::process_ledger` when the ledger's latest block is newer than AccountsDb. Replay starts at `full_process_starting_slot.saturating_sub(max_age)` so recent blockhashes are restored before replaying transactions. For slots before `full_process_starting_slot`, replay updates only latest block data; from that slot onward, only successful transactions are sanitized without signature verification and replayed through `TransactionSchedulerHandle::replay` with `ReplayPosition { persist: false }`. 
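The slot partitioning used by startup replay can be sketched as a small planning function. `ReplayStep` and `replay_plan` are hypothetical names, and the inclusive upper bound at `latest_slot` is an assumption; the real logic lives in `src/blockstore_processor/mod.rs`.

```rust
/// What replay does for a given slot (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReplayStep {
    /// Restore latest-block data only, so recent blockhashes are known.
    WarmUp(u64),
    /// Replay the slot's successful transactions with persistence disabled.
    Replay(u64),
}

/// Builds the ordered list of steps for a replay run.
/// `saturating_sub` keeps the first slot at 0 when `max_age` exceeds the start slot.
pub fn replay_plan(
    full_process_starting_slot: u64,
    max_age: u64,
    latest_slot: u64,
) -> Vec<ReplayStep> {
    let first_slot = full_process_starting_slot.saturating_sub(max_age);
    (first_slot..=latest_slot)
        .map(|slot| {
            if slot < full_process_starting_slot {
                ReplayStep::WarmUp(slot)
            } else {
                ReplayStep::Replay(slot)
            }
        })
        .collect()
}
```

With a start slot of 10 and a `max_age` of 3, slots 7 through 9 only warm up blockhashes and transaction replay begins at slot 10.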
+ +Do not persist replayed transactions again from this path, and preserve chronological replay: `get_block` returns transactions newest-first, so replay reverses them to execute in original order. + +### Truncation and compaction + +`LedgerTruncator` is a background service started by `magicblock-api` with `DEFAULT_TRUNCATION_TIME_INTERVAL` and configured ledger size. + +1. A dedicated thread runs a current-thread Tokio runtime and ticks on the truncation interval. +2. It checks `ledger.storage_size()` and skips work until the ledger is near the configured limit. +3. It estimates a safe slot range or handles an overfull ledger with `truncate_fat_ledger`. +4. It advances `lowest_cleanup_slot` / RocksDB `oldest_slot`, inserts range tombstones for slot-keyed columns, marks imprecise counters dirty, flushes tombstones, and compacts affected column families. +5. Compaction filters remove keys whose extracted slot is older than `oldest_slot`. +6. Cancellation is checked before and between manual compactions; `stop()` cancels and `join()` waits for thread exit. + +Range deletion is inclusive at the ledger API level even though RocksDB range deletes are end-exclusive. Preserve the existing `to + 1` handling and the special `(to_slot + 1, 0)` upper bound for `SlotSignatures`. + +## Important internals and caveats + +### Cleanup-slot locking + +`check_lowest_cleanup_slot(slot)` rejects reads at or below cleaned slots and returns a read guard that callers must hold across the sensitive read. `ensure_lowest_cleanup_slot()` returns the guard plus the first available slot for iterator bounds. These locks coordinate logical cleanup with RPC/history reads; weakening them can produce inconsistent user-visible history or panics during compaction. + +### Serialization compatibility + +- Typed columns use bincode through `serde`. +- `TransactionStatus` uses protobuf (`solana-storage-proto`) for stored metadata. 
+- Some columns implement `ColumnIndexDeprecation` to decode old key layouts; `iter_current_index_filtered` intentionally excludes deprecated keys for current-index iteration. +- `get_protobuf_or_bincode` exists for compatibility paths even though normal status reads use protobuf. + +Preserve compatibility for existing ledger directories unless the change explicitly includes a migration/reset plan. + +### Counts and metrics + +`LedgerColumn` caches entry counts with `DIRTY_COUNT = -1`. Sequential slot columns count via first/last slot; complex columns use RocksDB's estimated key count. Truncation decrements exact sequential counters and marks imprecise counters dirty. If a write/delete path changes a column, update the counter or intentionally mark it dirty. + +`maybe_enable_rocksdb_perf` currently returns `None`, so perf-context sampling is disabled even though reporting helpers exist. Column-family metrics are reported through `blockstore_rocksdb_cfs` with low-cardinality labels (`cf_name`, `storage`, `compression`). Do not add high-cardinality labels. + +### Open/access modes and directory layout + +`Ledger::open(path)` opens the underlying RocksDB database at `path/rocksdb`, matching `BLOCKSTORE_DIRECTORY_ROCKS_LEVEL`. `magicblock-api/src/ledger.rs` reset/lock/keypair helpers depend on this layout. `AccessType` defines primary, primary-maintenance, and secondary variants, but `Rocks::open` currently supports only `Primary`; other variants are unreachable. + +### Transaction order conventions + +`get_block(slot)` iterates `SlotSignatures` in reverse, so returned block transactions are newest-to-oldest by transaction index. Startup replay reverses them before execution. Tests assert this current API behavior; changing it affects RPC and replay consumers. + +## Important invariants + +1. 
**Do not lose committed execution history.** `write_transaction` must keep transaction bytes, status metadata, slot signatures, and address-signature indexes consistent for the same `(signature, slot, index)`. +2. **Keep latest-block reads cheap.** Hot-path consumers must continue using `LatestBlock` instead of RocksDB reads for current blockhash/slot/clock. +3. **Preserve cleanup/read coordination.** Reads over cleanup-sensitive columns must hold the lowest-cleanup read guard or use the provided helpers. +4. **Preserve key ordering.** Column key encodings must maintain the sort order required by reverse slot iteration and account-signature pagination. +5. **Preserve on-disk compatibility.** Deprecated key decoders and protobuf/bincode choices must not be removed without an explicit migration strategy. +6. **Do not replay failed transactions.** Ledger replay must only re-run successful transactions and must use `persist: false`. +7. **Do not block execution/RPC unnecessarily.** Avoid long compactions, full-column scans, excessive serialization, or unbounded iterator work on hot request or transaction paths. +8. **Shutdown must protect durability.** Flush and RocksDB background-work cancellation behavior must stay compatible with validator shutdown ordering. +9. **Metrics labels must stay bounded.** RocksDB/ledger datapoints should use fixed column/operation labels, not pubkeys, signatures, paths, or other high-cardinality values. + +## Common change areas and what to inspect + +### Adding or changing historical RPC data + +Start with `src/store/api.rs`, `src/database/columns.rs`, and the aperture handler using the data under `magicblock-aperture/src/requests/http/`. Check unit tests in `src/store/api.rs` and `magicblock-ledger/tests/get_block.rs`. Preserve `Slot::MAX` / highest-slot semantics and `getSignaturesForAddress` pagination behavior. 
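As a rough model of the pagination behavior to preserve, the sketch below pages over an in-memory, newest-first list of signatures. It is deliberately simplified: the real implementation resolves `before`/`until` through `SlotSignatures` and iterates RocksDB, whereas `paginate` is a hypothetical helper. Treating both bounds as exclusive and returning an empty page for an unknown `before` are assumptions; the 1,000 cap mirrors the clamp this guide attributes to the RPC layer.

```rust
/// Upper bound on page size, mirroring the clamp described for the RPC handler.
pub const MAX_LIMIT: usize = 1_000;

/// Pages through `newest_first`, starting after `before` (exclusive) and
/// stopping at `until` (exclusive), returning at most `limit` entries.
pub fn paginate<'a>(
    newest_first: &[&'a str],
    before: Option<&str>,
    until: Option<&str>,
    limit: usize,
) -> Vec<&'a str> {
    // Assumption: an unknown `before` signature yields an empty page.
    let start = match before {
        Some(sig) => match newest_first.iter().position(|s| *s == sig) {
            Some(i) => i + 1,
            None => return Vec::new(),
        },
        None => 0,
    };
    newest_first[start..]
        .iter()
        .copied()
        .take_while(|s| Some(*s) != until)
        .take(limit.min(MAX_LIMIT))
        .collect()
}
```

A regression test along these lines, written against the real `get_confirmed_signatures_for_address`, is a cheap way to pin the boundary behavior before changing key encodings or iteration order.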
+ +### Changing transaction persistence + +Inspect `magicblock-processor/src/executor/processing.rs`, `Ledger::write_transaction`, `write_transaction_status`, `read_transaction`, `get_complete_transaction`, and replay tests in `magicblock-processor/tests/replay.rs`. Ensure the encoded transaction bytes and metadata remain sufficient for RPC encoding and startup replay. + +### Changing latest block or blockhash behavior + +Inspect `LatestBlock`, `Ledger::write_block`, `Ledger::open`, `magicblock-aperture/src/state/blocks.rs`, `magicblock-task-scheduler/src/service.rs`, and `magicblock-processor` consumers. Confirm that blockhash validity/cache behavior and `Clock` semantics remain compatible. + +### Changing replay/recovery + +Inspect `src/blockstore_processor/mod.rs` and `magicblock-api/src/magic_validator.rs::maybe_process_ledger`. Validate with ledger restore/replay tests where possible. Preserve replay ordering, blockhash warm-up, no-signature-verify sanitization for restored transactions, and `persist: false`. + +### Changing truncation, compaction, or storage size behavior + +Inspect `src/ledger_truncator.rs`, `src/database/compaction_filter.rs`, `src/database/columns.rs`, `src/database/db.rs`, and `tests/test_ledger_truncator.rs`. Check entry-counter updates, range bounds, cancellation, flushing before compaction, and whether a column is slot-keyed or only slot-extractable. + +### Changing RocksDB options or column families + +Inspect `src/database/options.rs`, `src/database/rocksdb_options.rs`, `src/database/cf_descriptors.rs`, `src/database/rocks_db.rs`, `src/database/columns.rs`, and any tooling that opens ledgers. Update reset/path logic if the RocksDB directory name changes. + +## Tests and validation + +For documentation-only changes, verify the new guide path and cross-references in `AGENTS.md` and `.agents/context/crate-map.md`. 
+ +For code changes in this crate, start with targeted checks: + +```bash +cargo fmt +cargo test -p magicblock-ledger +``` + +Prefer `cargo nextest run -p magicblock-ledger` when available for crate tests. Add focused tests for the touched behavior, especially address-signature pagination, block assembly, replay ordering, truncation, or serialization compatibility. + +Broader validation from `.agents/rules/testing-and-validation.md` before handoff when practical: + +```bash +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Relevant integration/manual checks: + +```bash +cd test-integration +make test-restore-ledger +make test-pubsub +make test-magicblock-api +``` + +Use the restore-ledger suite for replay/recovery changes, pubsub/API suites for RPC-visible block/status behavior, and manual/operator tooling checks for format/path changes. If changes touch truncation or RocksDB options, include storage-size/truncator tests and report any unmeasured disk-growth or compaction-latency risk. + +Performance validation is important for execution writes, RPC history reads, startup replay, and truncation/compaction. If no benchmark or load-oriented check is run, explicitly report the residual risk and reason. + +Security validation for this crate is mostly about persistence correctness and attacker-triggerable resource use: untrusted RPC history queries and submitted transactions must not cause unbounded scans, unbounded memory growth, hangs, or stale/inconsistent status responses. Confirm that signer/authority checks remain owned by execution/Magic Program layers; this crate must not invent an alternate acceptance path for transactions. + +## Related docs + +- `.agents/context/overview.md` for validator runtime context. +- `.agents/rules/validator-goals.md` for security, persistence, recovery, and performance goals. 
+- `.agents/specs/validator-specification.md` for scheduler/replay, RPC, startup, shutdown, and recovery expectations. +- `.agents/context/architecture.md` for local persistence and transaction execution interactions. +- `.agents/context/crate-map.md` for crate ownership and consumers. +- `.agents/rules/testing-and-validation.md` for required validation workflow and integration commands. +- `magicblock-ledger/README.md` for a short crate overview. +- `magicblock-api/src/ledger.rs` and `magicblock-api/src/magic_validator.rs` for startup/reset/replay/truncator integration. +- `magicblock-processor/src/executor/processing.rs` for the main write-side call site. +- `magicblock-aperture/src/requests/http/` for RPC-history consumers. diff --git a/.agents/context/crates/magicblock-magic-program-api.md b/.agents/context/crates/magicblock-magic-program-api.md new file mode 100644 index 000000000..2f09e5d7b --- /dev/null +++ b/.agents/context/crates/magicblock-magic-program-api.md @@ -0,0 +1,389 @@ +# `magicblock-magic-program-api` + +## Purpose + +`magicblock-magic-program-api` is the shared wire-contract crate for the Magic Program and its built-in companion programs. It defines the program IDs, fixed accounts, PDA helpers, instruction enums, scheduling argument structs, clone metadata, task request payloads, callback response payloads, and Solana-type compatibility exports used by the validator and test programs. + +This crate is small, but it is protocol- and compatibility-sensitive. Its types are serialized with `bincode` into transactions, persisted scheduled intents, executor TLS payloads, and callback instructions. Changes can affect Magic Program CPI callers, account cloning, task scheduling, callback delivery, committor intent construction, account reset/blacklist behavior, and integration tests. Treat enum variant order, field order, constants, and feature-gated public type aliases as wire/API contracts. 
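The sensitivity to enum variant order can be shown with a hand-rolled encoder. This sketch assumes the variant index is written as a little-endian `u32` ahead of the payload, which is how bincode's default fixed-width configuration behaves; the enums and `encode_*` helpers are hypothetical and unrelated to the real `MagicBlockInstruction`.

```rust
/// Hypothetical instruction enum in its original variant order.
pub enum DemoInstructionV1 {
    Noop(u64),
    Close,
}

/// The same variants after a reorder.
pub enum DemoInstructionV2 {
    Close,
    Noop(u64),
}

/// Writes a variant index as a little-endian u32, followed by the payload bytes.
fn tagged(index: u32, payload: &[u8]) -> Vec<u8> {
    let mut out = index.to_le_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}

pub fn encode_v1(ix: &DemoInstructionV1) -> Vec<u8> {
    match ix {
        DemoInstructionV1::Noop(n) => tagged(0, &n.to_le_bytes()),
        DemoInstructionV1::Close => tagged(1, &[]),
    }
}

pub fn encode_v2(ix: &DemoInstructionV2) -> Vec<u8> {
    match ix {
        DemoInstructionV2::Close => tagged(0, &[]),
        DemoInstructionV2::Noop(n) => tagged(1, &n.to_le_bytes()),
    }
}
```

Bytes produced for `Close` under the first layout start with index 1, which the reordered layout would interpret as `Noop` and then fail or misread the payload. Appending new variants at the end avoids this class of break.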
+ +High-level responsibilities: + +- expose the Magic Program ID (`Magic111...`) plus fixed companion IDs/accounts such as `MAGIC_CONTEXT_PUBKEY`, `EPHEMERAL_VAULT_PUBKEY`, `CRANK_PROGRAM_ID`, `CALLBACK_PROGRAM_ID`, and `POST_DELEGATION_ACTION_EXECUTOR_PROGRAM_ID`; +- define `MagicBlockInstruction`, `CallbackInstruction`, and `PostDelegationActionExecutorInstruction` payloads consumed by `programs/magicblock` and validator-built internal transactions; +- define commit/action/undelegation bundle argument types used by application programs and Magic Program scheduling processors; +- define task scheduling/cancel request payloads sent from Magic Program execution into `magicblock-core` TLS and the task scheduler; +- define callback response types (`MagicResponse`, `MagicResponseV1`, `ActionReceipt`) delivered to base-layer callback programs; +- provide a `compat` boundary so public API users can opt into Solana 2.x-compatible types through the `backward-compat` feature while the default uses workspace Solana 3.x types. + +Do not put Magic Program execution logic, validation policy, persistence, RPC behavior, or committor delivery logic in this crate. Those belong in `programs/magicblock`, `magicblock-core`, `magicblock-accounts`, and the committor/service crates. + +## Update requirement + +Update this guide in the same change whenever behavior or contracts in `magicblock-magic-program-api` change. 
In particular, update it for changes to: + +- any public constant, program ID, fixed account, PDA seed, PDA derivation, or rent/context-size constant; +- `MagicBlockInstruction`, `CallbackInstruction`, `PostDelegationActionExecutorInstruction`, `AccountCloneFields`, account-modification types, or serialization behavior; +- commit, action, undelegation, intent-bundle, callback, task, or `ShortAccountMeta` argument structs/enums; +- the `backward-compat` feature or `compat` module public type aliases; +- callback response payload shape or receipt semantics; +- account metas expected by Magic Program processors or helper builders in `programs/magicblock/src/utils/instruction_utils.rs`; +- validation commands or integration suites that should be run after API changes. + +Because this crate defines shared wire formats, also update this document when another crate changes how these API types are interpreted. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-magic-program-api/Cargo.toml` | Package metadata, minimal dependencies, and `backward-compat` feature wiring for Solana 2.x-compatible public types. | +| `magicblock-magic-program-api/src/lib.rs` | Public module surface, `declare_id!`, fixed program/account IDs, `MAGIC_CONTEXT_SIZE`, and `EPHEMERAL_RENT_PER_BYTE`. | +| `magicblock-magic-program-api/src/compat.rs` | Public Solana type boundary. Default exports workspace Solana 3.x `Pubkey`, `Instruction`, `AccountMeta`, `AccountInfo`, `Signature`; `backward-compat` exports corresponding 2.x-compatible crates. | +| `magicblock-magic-program-api/src/instruction.rs` | Magic Program and companion built-in instruction enums plus account modification and clone metadata payloads. This is the most wire-sensitive file. | +| `magicblock-magic-program-api/src/args.rs` | Serializable argument structs/enums for base actions, commits, undelegation, intent bundles, action callbacks, and scheduled tasks. 
| +| `magicblock-magic-program-api/src/pda.rs` | PDA seeds/helpers for crank and callback execution signers. `CALLBACK_SIGNER` and bump are compile-time derived. | +| `magicblock-magic-program-api/src/response.rs` | Serializable callback response payloads sent by `magicblock-services` and decoded by callback programs. | +| `programs/magicblock/src/magicblock_processor.rs` | Deserializes and dispatches `MagicBlockInstruction` and `CallbackInstruction` values. Keep this aligned with instruction variants. | +| `programs/magicblock/src/utils/instruction_utils.rs` | Validator/program helper builders for many `MagicBlockInstruction` variants and account metas. | +| `programs/magicblock/src/magic_scheduled_base_intent.rs` | Interprets `MagicIntentBundleArgs`, commit/action args, `ShortAccountMeta`, fees, duplicate checks, and secure/legacy action behavior. | +| `magicblock-services/src/actions_callback_service.rs` | Builds `CallbackInstruction::ExecuteCallback` and bincode-encoded `MagicResponse` values. | +| `magicblock-account-cloner/src/lib.rs` | Builds clone, cleanup, post-delegation action, program-finalize, and task-related Magic Program instructions using this API. | +| `magicblock-processor/tests/ephemeral_accounts.rs` and `magicblock-processor/tests/post_delegation_actions.rs` | Focused processor coverage for API-driven ephemeral account and post-delegation-action behavior. | +| `test-integration/schedulecommit/` and `test-integration/test-schedule-intent/` | Integration coverage for commit scheduling, intent bundles, commit limits, security, undelegation, actions, and callbacks. 
| + +Main consumers include: + +- `programs/magicblock`, the runtime implementation that deserializes and executes these instructions; +- `magicblock-core`, which stores task requests and callback/action payload dependencies; +- `magicblock-account-cloner`, `magicblock-chainlink`, `magicblock-accounts-db`, `magicblock-api`, `magicblock-processor`, and `magicblock-services`; +- workspace/test programs such as `programs/guinea`, `test-integration/programs/schedulecommit`, and `test-integration/programs/flexi-counter`; +- integration suites under `test-integration/schedulecommit` and `test-integration/test-schedule-intent`. + +## Public API shape / Main public types and APIs + +### Root exports and constants + +`src/lib.rs` publicly exports `args`, `compat`, `instruction`, `pda`, and `response`, then re-exports `compat::{declare_id, pubkey, Pubkey}`. The crate declares the Magic Program ID and fixed companion accounts/programs: + +- `ID` / `id()` from `declare_id!("Magic11111111111111111111111111111111111111")`; +- `CRANK_PROGRAM_ID` for task/crank execution; +- `CALLBACK_PROGRAM_ID` for callback executor built-in instructions; +- `POST_DELEGATION_ACTION_EXECUTOR_PROGRAM_ID` for same-transaction post-delegation action execution after cloning; +- `MAGIC_CONTEXT_PUBKEY`, the fixed account that stores scheduled intents; +- `EPHEMERAL_VAULT_PUBKEY`, the fixed vault account for ephemeral-account rent transfers; +- `MAGIC_CONTEXT_SIZE = 5 MiB`; +- `EPHEMERAL_RENT_PER_BYTE = 32` lamports/byte. + +These values are referenced by startup funding/reset logic, chainlink clone blacklists, Magic Program processors, tests, and account builders. Renaming or deriving them differently is a protocol change. + +### Compatibility boundary + +`compat.rs` exports public Solana types. With default features it uses workspace Solana 3.x crates. With `--features backward-compat`, it uses optional `solana-program`/`solana-signature` compatibility dependencies in the `>=2.0, <3` range. 
+ +Use `crate::compat::{Instruction, AccountMeta, AccountInfo, Signature}` and `crate::Pubkey` in public API structs. Do not mix raw Solana versions in public fields unless the compatibility contract is intentionally changed. + +### Instruction enums + +`instruction.rs` defines: + +- `MagicBlockInstruction`: the primary bincode-serialized instruction enum for the Magic Program. Variants cover account modification, legacy and bundled commit scheduling, task scheduling/canceling, executable-check toggles, no-op uniqueness, ephemeral account create/resize/close, clone/chunk/cleanup, program finalization, callback attachment, account eviction, and crank execution. +- `CallbackInstruction`: instruction enum for the callback executor built-in. Current variant is `ExecuteCallback { instruction }`. +- `PostDelegationActionExecutorInstruction`: instruction enum for the post-delegation action executor built-in. Current variant is `Execute { cloned_account_pubkey, actions }`. +- `AccountCloneFields`: clone metadata (`lamports`, `owner`, `executable`, `delegated`, `confined`, `remote_slot`) that must preserve remote/local account semantics across clone instructions. +- `AccountModification` and `AccountModificationForInstruction`: validator-only account flag/owner modifications. + +`MagicBlockInstruction::try_to_vec()` is a thin `bincode::serialize` helper. Most call sites use `Instruction::new_with_bincode` directly. + +### Commit/action/intent args + +`args.rs` defines the public data model for base-layer intent scheduling: + +- `ActionArgs` carries action data plus an optional `escrow_index` sentinel defaulting to `255`. +- `BaseActionArgs` identifies the destination program, compact account metas, compute units, escrow authority index, and embedded action args. +- `CommitTypeArgs::{Standalone, WithBaseActions}` lists committed-account indices and optional post-commit base actions. 
+- `UndelegateTypeArgs::{Standalone, WithBaseActions}` lists undelegation behavior and optional base actions. +- `CommitAndUndelegateArgs` combines commit and undelegate parts. +- `MagicBaseIntentArgs` is the legacy single-intent shape used by `ScheduleBaseIntent`. +- `MagicIntentBundleArgs` is the recommended bundle shape used by `ScheduleIntentBundle`; it can include optional commit, commit-and-undelegate, commit-finalize, commit-finalize-and-undelegate, and standalone base actions. +- `ShortAccountMeta` carries only `pubkey` and `is_writable`; it intentionally has no caller-controlled `is_signer` flag. +- `AddActionCallbackArgs` attaches callback metadata to the latest scheduled action in the current slot/blockhash. + +All account references inside commit/action args are compact indices into the instruction account list. The Magic Program resolves them in `programs/magicblock/src/magic_scheduled_base_intent.rs`. + +### Task args + +Task scheduling uses: + +- `ScheduleTaskArgs` in `MagicBlockInstruction::ScheduleTask`; +- `TaskRequest::{Schedule, Cancel}` plus `ScheduleTaskRequest` and `CancelTaskRequest` to communicate side effects through `magicblock-core::tls::ExecutionTlsStash` to `magicblock-task-scheduler`; +- `TaskRequest::id()` to abstract over schedule/cancel IDs. + +### PDA helpers + +`pda.rs` defines: + +- `CRANK_SEED = b"crank-executor"` and `crank_signer_pda(authority)`, derived under `CRANK_PROGRAM_ID`; +- `CALLBACK_SEED = b"callback-executor"`; +- `CALLBACK_SIGNER` and `CALLBACK_SIGNER_BUMP`, compile-time derived under `CALLBACK_PROGRAM_ID` with no authority seed. + +Callback account metas treat `CALLBACK_SIGNER` specially: user-facing `ShortAccountMeta` cannot set signer bits, but callback builders mark this PDA as signer for inner callback instructions. 
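The signer rule described above can be sketched with plain standard-library types. This is a minimal illustration, not the crate's code: `Key`, `FullAccountMeta`, `expand_callback_meta`, and the placeholder `CALLBACK_SIGNER` bytes are hypothetical stand-ins, and the real builders live in the callback/service code.

```rust
/// Stand-in for a 32-byte public key (assumption: the real crate uses `Pubkey`).
type Key = [u8; 32];

/// Hypothetical placeholder bytes; the real `CALLBACK_SIGNER` is a derived PDA.
const CALLBACK_SIGNER: Key = [7u8; 32];

/// Mirrors the shape described in this guide: no caller-controlled signer flag.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ShortAccountMeta {
    pubkey: Key,
    is_writable: bool,
}

/// Hypothetical expanded meta, as an inner callback instruction would see it.
#[derive(Clone, Copy, Debug, PartialEq)]
struct FullAccountMeta {
    pubkey: Key,
    is_signer: bool,
    is_writable: bool,
}

/// Expands a compact meta. The signer bit is derived from the pubkey and is
/// never taken from caller input.
fn expand_callback_meta(meta: ShortAccountMeta) -> FullAccountMeta {
    FullAccountMeta {
        pubkey: meta.pubkey,
        is_signer: meta.pubkey == CALLBACK_SIGNER,
        is_writable: meta.is_writable,
    }
}
```

Because the compact struct has no signer field, a caller cannot request signer privileges for an arbitrary account; only the fixed PDA ever expands to `is_signer = true`.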
+ +### Callback responses + +`response.rs` defines: + +- `MagicResponse::V1(MagicResponseV1)` with convenience accessors `ok()`, `data()`, and `error()`; +- `MagicResponseV1 { ok, data, error, receipt }`, bincode-encoded after a callback-specific discriminator by `magicblock-services`; +- `ActionReceipt { signature }`, present when the base action transaction signature is available. + +Callback programs, such as `test-integration/programs/flexi-counter`, deserialize this payload directly. + +## Runtime flows + +### Magic Program instruction dispatch + +```text +caller / validator helper + -> Instruction::new_with_bincode(program_id, &MagicBlockInstruction::..., metas) + -> programs/magicblock/src/magicblock_processor.rs + -> bincode::deserialize::() + -> variant-specific processor +``` + +`programs/magicblock/src/magicblock_processor.rs` is the dispatch source of truth for current variant behavior. Adding a variant or changing variant fields requires updating dispatch, helper builders, tests, and this guide. Reordering variants changes bincode discriminants and must be treated as a wire compatibility break. + +### Intent bundle scheduling flow + +1. Application/test program builds `MagicIntentBundleArgs` or legacy `MagicBaseIntentArgs` with compact account indices. +2. The caller invokes `MagicBlockInstruction::ScheduleIntentBundle(args)` or `ScheduleBaseIntent(args)` against the Magic Program and passes payer, `MAGIC_CONTEXT_PUBKEY`, optional fee vault, and referenced accounts. +3. `process_schedule_intent_bundle` verifies the MagicContext account, payer signer, parent program, slot/blockhash, and fee-vault/commit-limit path. +4. `MagicIntentBundle::try_from_args` resolves indices to accounts, builds committed-account/action payloads, rejects empty bundles and duplicate committed pubkeys, and distinguishes secure (`ScheduleIntentBundle`) from legacy (`ScheduleBaseIntent`) action source handling. +5. 
Commit-and-undelegate variants mark affected local accounts as undelegating/immutable before the intent is written. +6. The scheduled intent is written into MagicContext and the precomputed `ScheduledCommitSent` signature is logged. + +Keep the argument structs compact and index-based. Changing them affects application CPI code, Magic Program validation, and committor/account recovery flows. + +### Clone and program-materialization flow + +```text +magicblock-account-cloner + -> MagicBlockInstruction::{CloneAccount, CloneAccountInit, CloneAccountContinue, CleanupPartialClone, Finalize*ProgramFromBuffer, SetProgramAuthority} + -> programs/magicblock/src/clone_account/* processors + -> local AccountsDb/program state +``` + +`AccountCloneFields` carries the non-data account properties for clone installation. It must stay aligned with cloner request construction and Magic Program clone processors. Post-delegation actions use both clone instruction fields and `PostDelegationActionExecutorInstruction::Execute` in the same transaction; the post-action executor checks the immediately previous Magic Program clone instruction. + +### Ephemeral account flow + +1. Caller invokes `CreateEphemeralAccount { data_len }`, `ResizeEphemeralAccount { new_data_len }`, or `CloseEphemeralAccount`. +2. The instruction account list includes sponsor, ephemeral account, and `EPHEMERAL_VAULT_PUBKEY`. +3. `programs/magicblock/src/ephemeral_accounts` applies rent math using `EPHEMERAL_RENT_PER_BYTE` and account static-size overhead. +4. Processor tests assert sponsor/vault lamport movement, signer requirements, PDA sponsor rules, and close/resize behavior. + +Changing the rent constant, vault pubkey, or account-meta shape affects user-visible balance semantics and tests. 
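As a rough illustration of why the rent constant is user-visible, the sketch below models rent as `(data_len + static overhead) * EPHEMERAL_RENT_PER_BYTE`. Only the rate of 32 lamports/byte comes from this guide. The formula shape, the `128`-byte overhead, and both function names are assumptions; `programs/magicblock/src/ephemeral_accounts` remains the source of truth.

```rust
/// Rate stated in this guide (lamports per byte).
const EPHEMERAL_RENT_PER_BYTE: u64 = 32;

/// ASSUMPTION: placeholder overhead in bytes; check the processor for the real value.
const ASSUMED_STATIC_OVERHEAD_BYTES: u64 = 128;

/// Rent for an ephemeral account of `data_len` bytes, or `None` on overflow.
fn ephemeral_rent(data_len: u64) -> Option<u64> {
    data_len
        .checked_add(ASSUMED_STATIC_OVERHEAD_BYTES)?
        .checked_mul(EPHEMERAL_RENT_PER_BYTE)
}

/// Signed lamport change for the sponsor on resize under this model:
/// negative means the sponsor pays more, positive means a refund.
fn resize_sponsor_delta(old_len: u64, new_len: u64) -> Option<i128> {
    let old = ephemeral_rent(old_len)? as i128;
    let new = ephemeral_rent(new_len)? as i128;
    Some(old - new)
}
```

Under these assumptions, growing an account from 100 to 200 bytes costs the sponsor 3200 lamports, so any change to the constant shifts every create, resize, and close balance that tests assert on.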
+ +### Task scheduling flow + +```text +program CPI + -> MagicBlockInstruction::{ScheduleTask, CancelTask} + -> programs/magicblock/src/schedule_task/* + -> ExecutionTlsStash registers TaskRequest + -> magicblock-task-scheduler receives schedule/cancel request + -> MagicBlockInstruction::ExecuteCrank under CRANK_PROGRAM_ID executes due instructions +``` + +`ScheduleTaskArgs` is the user-facing instruction payload; `TaskRequest` is the internal side-effect payload. Preserve both when changing scheduled task behavior. + +### Callback result flow + +1. A secure intent bundle can attach an action callback with `AddActionCallbackArgs`. +2. `process_add_action_callback` validates the latest intent, same payer, same slot/blockhash, source program, fee vault, and callback account metas. +3. Committor/service code reports action results through `ActionsCallbackService`. +4. `magicblock-services` builds a callback executor instruction under `CALLBACK_PROGRAM_ID`, wraps the destination instruction in `CallbackInstruction::ExecuteCallback`, and bincode-encodes `MagicResponse::V1` after the destination discriminator. +5. Callback programs deserialize `MagicResponse` and can check `CALLBACK_SIGNER` as the authorized PDA signer. + +Do not add user-controlled signer bits to `ShortAccountMeta`; callback signer handling is intentionally derived from `CALLBACK_SIGNER`. + +## Important internals and caveats + +### Bincode compatibility + +All major public structs/enums derive `Serialize` and `Deserialize` and are serialized with `bincode`. Enum variant order and struct field order matter. Adding fields without compatibility handling, removing variants, or reordering variants can break old transactions, persisted contexts, integration programs, or callback decoders. + +The `Unused` instruction variant is intentionally retained as an unused slot after a removed `ScheduleCommitFinalize` path. Do not delete or repurpose it casually; doing so changes discriminants or semantics. 
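To see why the placeholder slot matters, the following standard-library sketch models the enum tag the way bincode 1.x's default `serialize` is understood to write it: the variant's declaration index as a 4-byte little-endian integer. The toy enums and helper names are hypothetical and do not mirror the real `MagicBlockInstruction` layout.

```rust
/// Toy enum with the placeholder slot kept (variant names are illustrative).
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum WithPlaceholder {
    Earlier,
    Unused,
    Later,
}

/// The same toy enum after deleting the placeholder.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum PlaceholderRemoved {
    Earlier,
    Later,
}

/// Models the tag as the declaration index in little-endian u32 form
/// (assumption: bincode 1.x default fixed-int encoding).
fn encode_tag(variant_index: u32) -> [u8; 4] {
    variant_index.to_le_bytes()
}

fn tag_with_placeholder(v: WithPlaceholder) -> [u8; 4] {
    encode_tag(v as u32)
}

fn tag_placeholder_removed(v: PlaceholderRemoved) -> [u8; 4] {
    encode_tag(v as u32)
}
```

In this model `Later` encodes as tag 2 while the placeholder exists and as tag 1 once it is deleted, which is exactly the tag old payloads used for `Unused`. Every variant declared after a removed slot shifts the same way.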
+ +### Secure vs legacy action scheduling + +`ScheduleBaseIntent(MagicBaseIntentArgs)` is the legacy single-intent path and is processed with `secure = false`. `ScheduleIntentBundle(MagicIntentBundleArgs)` is the recommended path and is processed with `secure = true`, allowing action source-program validation and callback attachment. Keep these distinctions aligned with `programs/magicblock/src/magic_scheduled_base_intent.rs` and `process_add_action_callback.rs`. + +### `ShortAccountMeta` intentionally omits signer state + +Base actions and callbacks carry compact account metas without `is_signer`. Users cannot request arbitrary signer privileges. The only callback signer is derived internally when the account pubkey equals `CALLBACK_SIGNER`; post-delegation and callback executor wrappers also clear signer bits on outer metas where needed. + +### Fixed accounts are also reset/blacklist inputs + +`MAGIC_CONTEXT_PUBKEY`, `EPHEMERAL_VAULT_PUBKEY`, Magic Program IDs, callback/crank/post-action program IDs, and validator IDs are protected from ordinary clone/reset paths in `magicblock-chainlink` and `magicblock-accounts-db`. Update those consumers when adding or changing fixed Magic Program accounts. + +### No crate-local tests + +This crate currently has no local unit tests. Behavior is validated through consumers (`programs/magicblock`, `magicblock-processor`, services, and integration tests). API changes should therefore include targeted consumer tests, not only `cargo nextest run -p magicblock-magic-program-api`. + +## Important invariants + +1. `MagicBlockInstruction`, `CallbackInstruction`, `PostDelegationActionExecutorInstruction`, and public argument/response structs must remain bincode-compatible unless the change is an intentional protocol migration with all consumers updated. +2. 
Program IDs, fixed account pubkeys, PDA seeds, and PDA derivations must remain stable across validator, Magic Program, services, tests, reset/blacklist logic, and application CPI callers. +3. `ShortAccountMeta` must not grow a user-controlled signer flag for base actions/callbacks without a security review and corresponding Magic Program validation changes. +4. Clone metadata must preserve `lamports`, `owner`, `executable`, `delegated`, `confined`, and `remote_slot`; missing or reordered fields can corrupt local clone semantics. +5. Commit and undelegation argument indices must refer to instruction accounts and must remain compact `u8` indices unless all builders and validators are updated. +6. Intent bundles must continue to reject empty bundles and duplicate committed-account pubkeys in the Magic Program implementation; API changes must not bypass those checks. +7. `ScheduleBaseIntent` and `ScheduleIntentBundle` must preserve their legacy/secure behavior distinction. +8. `EPHEMERAL_RENT_PER_BYTE` and `EPHEMERAL_VAULT_PUBKEY` changes are user-visible balance/lifecycle changes and require processor/integration validation. +9. Callback response payloads must remain decodable by callback programs that expect discriminator-prefixed bincode `MagicResponse` data. +10. The `backward-compat` feature must keep public type aliases coherent; do not expose mixed Solana major-version types in one public payload. + +## Common change areas and what to inspect + +### Adding or changing a Magic Program instruction + +Start with: + +- `magicblock-magic-program-api/src/instruction.rs`; +- `programs/magicblock/src/magicblock_processor.rs`; +- `programs/magicblock/src/utils/instruction_utils.rs`; +- relevant processor module under `programs/magicblock/src/`; +- call sites in `magicblock-account-cloner`, `magicblock-services`, `programs/guinea`, and integration programs. 
+ +Check account metas, signer/writable bits, bincode compatibility, discriminant impact, and whether `Unused` must remain in place. + +### Changing commit/action/undelegation args + +Inspect: + +- `magicblock-magic-program-api/src/args.rs`; +- `programs/magicblock/src/magic_scheduled_base_intent.rs`; +- `programs/magicblock/src/schedule_transactions/process_schedule_intent_bundle.rs`; +- `programs/magicblock/src/schedule_transactions/process_add_action_callback.rs`; +- `magicblock-core/src/intent.rs`; +- `test-integration/programs/flexi-counter/src/processor/schedule_intent.rs`; +- `test-integration/schedulecommit/` suites. + +Preserve compact index resolution, duplicate-account checks, fee behavior, callback source authorization, and undelegation immutability. + +### Changing clone/program clone payloads + +Inspect: + +- `magicblock-magic-program-api/src/instruction.rs` (`AccountCloneFields` and clone variants); +- `magicblock-account-cloner/src/lib.rs`; +- `programs/magicblock/src/clone_account/`; +- `programs/magicblock/src/utils/instruction_utils.rs`; +- `magicblock-processor/tests/post_delegation_actions.rs`; +- `test-integration/test-cloning/`. + +Do not drop `remote_slot`, delegation/confined flags, executable state, or post-delegation action handling. + +### Changing ephemeral account API/constants + +Inspect: + +- `magicblock-magic-program-api/src/lib.rs`; +- `programs/magicblock/src/ephemeral_accounts/`; +- `programs/guinea/src/lib.rs` helper CPIs; +- `magicblock-api/src/fund_account.rs`; +- `magicblock-processor/tests/ephemeral_accounts.rs`. + +Validate sponsor/vault lamports, account ownership, signer requirements, PDA sponsor handling, and close/resize refunds. + +### Changing task payloads or crank PDAs + +Inspect: + +- `magicblock-magic-program-api/src/args.rs` and `src/pda.rs`; +- `programs/magicblock/src/schedule_task/`; +- `magicblock-core/src/tls.rs`; +- `magicblock-task-scheduler`; +- `programs/magicblock/src/utils/instruction_utils.rs`. 
+ +Preserve task IDs, authority semantics, TLS drainage expectations, and `CRANK_PROGRAM_ID` account metas. + +### Changing callback responses or callback PDA behavior + +Inspect: + +- `magicblock-magic-program-api/src/response.rs` and `src/pda.rs`; +- `magicblock-services/src/actions_callback_service.rs`; +- `programs/magicblock/src/schedule_transactions/process_add_action_callback.rs`; +- callback consumers such as `test-integration/programs/flexi-counter/src/processor/callback.rs`. + +Preserve discriminator-prefixing, `MagicResponse::V1` decoding, `CALLBACK_SIGNER` semantics, and callback fee/source validation. + +### Changing Solana compatibility support + +Inspect: + +- `magicblock-magic-program-api/Cargo.toml`; +- `magicblock-magic-program-api/src/compat.rs`; +- any SDK/test-program builds that enable `backward-compat`. + +Run builds/tests with and without `--features backward-compat` when public type aliases or dependencies change. + +## Tests and validation + +For documentation-only changes to this guide: + +```bash +test -f .agents/context/crates/magicblock-magic-program-api.md +rg "magicblock-magic-program-api.md" .agents/context/crate-map.md +``` + +For crate changes, minimum targeted checks: + +```bash +cargo fmt +cargo nextest run -p magicblock-magic-program-api +cargo nextest run -p magicblock-magic-program-api --features backward-compat +``` + +Because this crate has no local tests, also run targeted consumer tests for the changed API area: + +```bash +cargo nextest run -p magicblock-program +cargo nextest run -p magicblock-processor ephemeral_accounts +cargo nextest run -p magicblock-processor post_delegation_actions +``` + +For commit/action/intent changes, prefer integration coverage: + +```bash +cd test-integration +make test-schedule-intents +make test-committor-intent-bundles +``` + +For clone/program-clone API changes: + +```bash +cd test-integration +make test-cloning +``` + +For broad validation, follow 
`.agents/rules/testing-and-validation.md`: + +```bash +cargo fmt +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance expectations: this crate has no runtime loops, but its payload shape affects transaction size, serialization cost, account-meta count, scheduler pressure, clone chunking, and callback/base-action delivery. Report any API change that increases serialized instruction size or account requirements, and run the smallest relevant integration test that exercises the affected hot path. + +## Related docs + +- `.agents/specs/validator-specification.md` for Magic Program, commit, undelegation, Magic Actions, cloning, task, and ephemeral account behavior. +- `.agents/context/architecture.md` for the Magic Program scheduling versus validator-side settlement boundary. +- `.agents/context/crate-map.md` for crate ownership and consumers. +- `.agents/rules/testing-and-validation.md` for validation workflow and integration-test commands. +- `.agents/context/crates/magicblock-core.md` for `TaskRequest`, `BaseActionCallback`, and TLS/action callback contracts. +- `.agents/context/crates/magicblock-account-cloner.md` for clone/program-clone flows that emit this crate's instruction payloads. +- `programs/magicblock/src/magicblock_processor.rs` and `programs/magicblock/src/utils/instruction_utils.rs` for current instruction interpretation and helper builders. diff --git a/.agents/context/crates/magicblock-metrics.md b/.agents/context/crates/magicblock-metrics.md new file mode 100644 index 000000000..95bd9b98d --- /dev/null +++ b/.agents/context/crates/magicblock-metrics.md @@ -0,0 +1,557 @@ +# `magicblock-metrics` + +## Purpose + +`magicblock-metrics` is the validator's shared Prometheus metrics crate. It centralizes metric definitions, metric label helper types, metric mutation wrappers, and the small HTTP service that exposes the process-local Prometheus registry. 
+ +At a high level it: + +- defines the validator-wide Prometheus `Registry` with namespace `mbv`, +- declares gauges, counters, counter vectors, histograms, and histogram vectors used by runtime crates, +- registers all collectors exactly once before the metrics server starts, +- exposes a `/metrics` HTTP endpoint in Prometheus text format, +- provides typed wrapper functions so other crates do not need to construct collectors directly, +- provides reusable label traits/types such as `LabelValue`, `Outcome`, and `AccountFetchOrigin`, +- acts as the observability boundary for RPC, transaction execution, account sync, ledger, committor, RPC client, pubsub/gRPC, table-mania, and system-storage metrics. + +This crate is intentionally dependency-light and has no dependency on other workspace crates. Many performance-sensitive crates depend on it, so changes here can affect build topology, hot-path overhead, metric cardinality, and operator visibility across the validator. + +## Update requirement + +Whenever an agent changes behavior in `magicblock-metrics`, or changes another crate in a way that changes metrics exposed by this crate, this document must be updated in the same change. This file is useful only if it reflects the current implementation. + +Update this file for changes to: + +- metric names, help strings, labels, bucket ranges, metric kinds, or namespace behavior, +- wrapper functions in `magicblock-metrics/src/metrics/mod.rs`, +- label enums/traits in `magicblock-metrics/src/metrics/types.rs`, +- metrics service routing, binding, cancellation, or response behavior in `src/service.rs`, +- startup/configuration flow in `magicblock-api` or `magicblock-config` that changes how the service/ticker runs, +- any new consumer crate or major consumer flow that records metrics through this crate, +- validation commands or dashboard/scrape guidance relevant to this crate, +- performance characteristics of metrics used in hot paths. 
+ +If a change adds, removes, or renames a metric that operators may scrape or alert on, call that out explicitly in the handoff/PR notes. Metric names are an external observability contract. + +## Where it sits in the repository + +Primary source files: + +| Path | Role | +|---|---| +| `magicblock-metrics/src/lib.rs` | Crate exports. Re-exports `metrics`, `try_start_metrics_service`, and `MetricsService`. | +| `magicblock-metrics/src/metrics/mod.rs` | Collector declarations, bucket constants, registry setup, and public mutation/timer wrappers. | +| `magicblock-metrics/src/metrics/types.rs` | Shared metric label helpers (`LabelValue`, `Outcome`, `AccountFetchOrigin`) plus currently-unused account clone/commit shape enums. | +| `magicblock-metrics/src/service.rs` | Hyper/Tokio HTTP server exposing `GET /metrics`, handling cancellation, and encoding registry contents. | +| `magicblock-metrics/README.md` | Prometheus/Grafana setup notes for local scraping and visualization. | +| `magicblock-config/src/config/metrics.rs` | Validator configuration shape for metrics address and system collection frequency. | +| `magicblock-api/src/magic_validator.rs` | Starts the metrics service and system metrics ticker during validator startup. | +| `magicblock-api/src/tickers.rs` | Periodically samples ledger/account storage gauges. | + +Main consumers: + +- `magicblock-api` starts the metrics service, starts the periodic system metrics ticker, initializes transaction count from restored ledger history, and observes commit nonce wait time through `MagicSysAdapter`. +- `magicblock-aperture` records RPC request counts, RPC handling time, websocket subscription gauges, send-transaction processing time, and skipped-preflight count. +- `magicblock-processor` records slot, transaction count, failed transaction count, and maximum lock-contention queue size. 
+- `magicblock-ledger` records ledger column counts, storage size, shutdown/compaction/truncation timers, and RPC API column-family metrics. +- `magicblock-chainlink` records account fetch outcomes, chain slot, monitored/evicted accounts, pubsub/gRPC subscription state, undelegation lifecycle counts, and some investigation counters. +- `magicblock-accounts` records scheduled committor-intent count. +- `magicblock-committor-service` records intent backlog, failed intents, busy executors, execution times, CU usage, task/ALT preparation timing, task-info fetcher state, and commit nonce wait time via `magicblock-api`. +- `magicblock-rpc-client` records signature-subscribe and signature-status confirmation counters. +- `magicblock-table-mania` records lookup-table fetch/close investigation counters. + +## Public API shape + +The crate exports two public areas: + +```rust +pub mod metrics; +mod service; + +pub use service::{try_start_metrics_service, MetricsService}; +``` + +Consumers normally import either: + +```rust +use magicblock_metrics::metrics; +``` + +or specific collectors/wrappers: + +```rust +use magicblock_metrics::metrics::{TRANSACTION_COUNT, RPC_REQUEST_HANDLING_TIME}; +``` + +### Metrics registry and registration + +`metrics::REGISTRY` is a lazily-created `prometheus::Registry` with custom namespace `mbv`. A declared metric such as `transaction_count` is exported as `mbv_transaction_count`. + +`metrics::register()` registers all collectors into this registry and is guarded by `std::sync::Once`. The function is `pub(crate)`, so production code starts registration through `try_start_metrics_service(...)` rather than calling registration directly. + +Important behavior: + +- All declared collectors must be registered in `register()` or they will not appear under `/metrics`. +- Registration is one-shot per process. New tests that start the service multiple times should rely on this idempotence instead of attempting to recreate a registry. 
+- Adding a static collector without adding it to `register()` is a common mistake. +- The custom namespace means dashboards/alerts should use the `mbv_`-prefixed metric names. + +### Metrics service + +`try_start_metrics_service(addr, cancellation_token)`: + +1. calls `metrics::register()`, +2. creates a `MetricsService`, +3. spawns the service on the current Tokio runtime, +4. returns the `MetricsService` handle or an I/O error from construction. + +`MetricsService` currently stores the bind address and cancellation token. It does not expose a join handle; `magicblock-api` stores it to keep service ownership tied to validator lifetime. + +`start_metrics_server`: + +- binds a `tokio::net::TcpListener` to the configured address, +- logs `Metrics server started` when bind succeeds, +- accepts TCP connections until the cancellation token is cancelled, +- spawns one Tokio task per accepted HTTP/1 connection, +- logs accept errors and connection-close debug messages, +- logs `Metrics server shutdown` after cancellation. + +`metrics_service_router` behavior: + +- `GET /metrics` returns Prometheus text encoding of `metrics::REGISTRY.gather()`. +- All other methods/paths return `404 Not Found` with an empty body. +- It records optional `host` and `user-agent` headers into the tracing span. +- It consumes the entire request body before returning so HTTP keep-alive does not leave unread bytes in the connection buffer. +- If text encoding fails, it logs a warning and returns an empty metrics body rather than panicking. + +Pitfalls: + +- Do not add heavy per-request work to `/metrics`. Scrapes can happen frequently, and `gather()` already walks every collector. +- Do not add stateful side effects to scrape handling. Prometheus scrapes should observe, not mutate validator state. +- If adding new routes, keep `/metrics` simple and preserve `404` behavior for unknown paths unless there is a clear operator need. 
+- The service requires an active Tokio runtime because it uses `tokio::spawn` and `TcpListener`. + +### Configuration and startup + +Metrics are configured through `ValidatorParams.metrics`, backed by `magicblock-config/src/config/metrics.rs`: + +```toml +[metrics] +address = "127.0.0.1:9090" +collect-frequency = "30s" +``` + +Current defaults come from `magicblock-config/src/consts.rs`: + +- `DEFAULT_METRICS_ADDR = "0.0.0.0:9000"`, +- `DEFAULT_METRICS_COLLECT_FREQUENCY_SEC = 30`. + +`config.example.toml` currently documents a sample address of `127.0.0.1:9090`; check the config defaults before assuming the example value is the runtime default. + +Startup flow in `magicblock-api/src/magic_validator.rs`: + +1. storage and core services are initialized, +2. `magicblock_metrics::try_start_metrics_service(config.metrics.address.0, token.clone())` starts the HTTP service, +3. `init_system_metrics_ticker(config.metrics.collect_frequency, &ledger, &accountsdb, token.clone())` starts periodic system/storage gauge updates, +4. `TRANSACTION_COUNT.inc_by(ledger.count_transactions()? as u64)` seeds transaction count from persisted ledger history before new execution starts. + +Shutdown is cancellation-token based: the system metrics ticker exits when the token is cancelled, and the metrics server stops accepting when the same token is cancelled. + +## Metric groups currently defined + +### Generic bucket constants + +Durations are recorded in seconds because Prometheus histograms use seconds. Shared bucket arrays cover: + +- 10-90 microseconds, +- 100-900 microseconds, +- 1-9 milliseconds, +- 10-90 milliseconds, +- 100-900 milliseconds, +- 1-9 seconds. + +Several histograms use custom buckets when their expected durations are longer, such as ledger compaction/truncation and committor intent execution. + +When adding a histogram, choose buckets around the expected latency distribution. 
Do not blindly reuse short microsecond/millisecond buckets for operations that normally take seconds or minutes, and do not use only large buckets for hot-path microsecond work. + +### Slot and clone cache + +| Wrapper/collector | Meaning | +|---|---| +| `set_slot(slot)` / `SLOT_GAUGE` | Local validator slot. Updated by the processor scheduler. | +| `set_chain_slot(value)` / `CHAIN_SLOT_GAUGE` | Observed base-chain slot. Updated by Chainlink's chain-slot wrapper. | +| `set_cached_clone_outputs_count(count)` / `CACHED_CLONE_OUTPUTS_COUNT` | Number of cached clone outputs in the remote account cloner worker. | + +### Ledger and storage + +| Wrapper/collector | Meaning | +|---|---| +| `set_ledger_size(size)` | Ledger storage size in bytes. | +| `set_ledger_block_times_count`, `set_ledger_blockhashes_count`, `set_ledger_slot_signatures_count`, `set_ledger_address_signatures_count`, `set_ledger_transaction_status_count`, `set_ledger_transaction_successful_status_count`, `set_ledger_transaction_failed_status_count`, `set_ledger_transactions_count`, `set_ledger_transaction_memos_count`, `set_ledger_perf_samples_count` | Per-column count gauges updated by ledger metrics collection. | +| `observe_columns_count_duration(f)` | Times ledger column count computation. | +| `start_ledger_truncator_compaction_timer()` | Timer for RocksDB compaction during truncation. | +| `observe_ledger_truncator_delete(f)` | Times deletion of RocksDB slot ranges. | +| `start_ledger_disable_compactions_timer()` | Timer for disabling manual compaction. | +| `start_ledger_shutdown_timer()` | Timer for ledger shutdown. | + +System/storage gauge updates are driven from `magicblock-api/src/tickers.rs` at `metrics.collect-frequency`. This means gauges such as ledger size and accounts size are sampled, not updated continuously. + +### Accounts and account sync + +| Wrapper/collector | Meaning | +|---|---| +| `set_accounts_size(value)` | Persisted account storage size in bytes. 
| +| `set_accounts_count(value)` | Number of accounts in `AccountsDb`. | +| `inc_pending_clone_requests()` / `dec_pending_clone_requests()` | In-memory pending account clone request gauge. Must remain balanced. | +| `set_monitored_accounts_count(count)` | Absolute count of monitored accounts; callers must pass total count, not delta. | +| `inc_evicted_accounts_count()` | Cumulative count of monitored accounts forcefully removed from monitor list/database. | +| `inc_account_fetches_success(count)` | Successful network account fetch count. | +| `inc_account_fetches_failed(count)` | Failed network account fetch count. | +| `inc_account_fetches_found(origin, count)` | Network fetches that found accounts, labelled by `AccountFetchOrigin`. | +| `inc_account_fetches_not_found(origin, count)` | Network fetches that did not find accounts, labelled by `AccountFetchOrigin`. | +| `inc_undelegation_requested()` | Chainlink observed an undelegation request. | +| `inc_undelegation_completed()` | Chainlink detected undelegation completion. | +| `inc_unstuck_undelegation_count()` | Undelegating account was already undelegated on chain. | + +Important caveats: + +- `PENDING_ACCOUNT_CLONES_GAUGE` is a gauge. Every increment must have a matching decrement on all success, error, cancellation, and timeout paths. +- `MONITORED_ACCOUNTS_GAUGE` is set to an absolute count. Do not call it with a delta. +- Account fetch found/not-found counters include an `origin` label. Keep origin cardinality low and stable. +- `AccountFetchOrigin::SendTransaction(Signature)` intentionally labels as only `send_transaction`; the signature is available through `signature()` for logging/correlation but must not become a Prometheus label. + +### RPC and aperture + +| Wrapper/collector | Meaning | +|---|---| +| `ENSURE_ACCOUNTS_TIME` (`kind` label) | Time spent ensuring account presence. Consumers start timers directly. | +| `TRANSACTION_PROCESSING_TIME` | Total time spent in RPC send-transaction processing. 
| +| `RPC_REQUEST_HANDLING_TIME` (`name` label) | Time spent handling named RPC requests. | +| `TRANSACTION_SKIP_PREFLIGHT` | Count of transactions submitted with preflight skipped. | +| `RPC_REQUESTS_COUNT` (`name` label) | Count of RPC requests by method name. | +| `RPC_WS_SUBSCRIPTIONS_COUNT` (`name` label) | Active RPC websocket subscriptions by subscription kind. | + +Pitfalls: + +- RPC method names are labels. Use bounded, canonical method names, not request parameters or user-controlled free-form values. +- Timing wrappers used in hot RPC paths must add minimal overhead. Prefer starting a Prometheus timer once around an existing operation instead of adding multiple nested metrics in tight loops. + +### Transaction execution + +| Wrapper/collector | Meaning | +|---|---| +| `TRANSACTION_COUNT` | Total executed transactions. Incremented in processor execution; also seeded from ledger on startup. | +| `FAILED_TRANSACTIONS_COUNT` | Total failed transactions. | +| `MAX_LOCK_CONTENTION_QUEUE_SIZE` | Maximum observed queue size for account-lock contention. | +| `set_slot(slot)` | Local slot gauge, updated by scheduler/slot flow. | + +Pitfalls: + +- Transaction execution and scheduler paths are hot. Do not add expensive label construction, serialization, allocation-heavy formatting, or high-cardinality labels here. +- `TRANSACTION_COUNT` is currently public and sometimes used directly rather than through a wrapper. If changing it, audit all direct imports. + +### Committor service and settlement + +| Wrapper/collector | Meaning | +|---|---| +| `inc_committor_intents_count()` / `inc_committor_intents_count_by(by)` | Scheduled committor intents. | +| `set_committor_intents_backlog_count(value)` | Number of intents in backlog. | +| `inc_committor_failed_intents_count(intent_kind, error_kind)` | Failed intents labelled by intent kind and error kind. | +| `set_committor_executors_busy_count(value)` | Busy intent executor count. 
| +| `observe_committor_intent_execution_time_histogram(seconds, kind, outcome)` | Intent execution duration by intent kind and outcome kind. | +| `set_commmittor_intent_cu_usage(value)` | Compute units used for an intent. Note the current function name has three `m`s in `commmittor`; preserve compatibility unless doing a deliberate rename. | +| `observe_committor_intent_task_preparation_time(task_type)` | Timer for task preparation, labelled by task type. | +| `observe_committor_intent_alt_preparation_time()` | Timer for address lookup table preparation. | +| `start_fetch_commit_nonces_wait_timer()` | Timer around waiting for current commit nonce responses from the committor service. | + +Pitfalls: + +- `LabelValue` is implemented by committor error/output/task types. Make sure new variants return low-cardinality static strings. +- Do not label failed intents with raw error messages, pubkeys, signatures, transaction IDs, or other unbounded values. +- Long-running commit/settlement operations need buckets that make operator alerts meaningful; update bucket ranges if expected durations change materially. + +### RPC client, table-mania, task-info, and investigation counters + +Current counters include: + +- `inc_remote_account_provider_a_count()`, +- `inc_task_info_fetcher_a_count()`, +- `set_task_info_fetcher_retiring_count(count)`, +- `inc_table_mania_a_count()`, +- `inc_table_mania_close_a_count()`, +- `inc_rpc_client_signature_ws_subscribe_count()`, +- `inc_rpc_client_signature_ws_notification_count()`, +- `inc_rpc_client_signature_ws_fallback_count()`, +- `inc_rpc_client_signature_status_batch_count()`, +- `inc_rpc_client_signature_status_batch_signatures_count(count)`. + +Some names/help strings are investigation-oriented (`*_a_count`, "Get mupltiple account count"). Treat them as current implementation, but if you formalize or rename them, update dashboards/alerts and this guide in the same change. 
+ +### Pubsub clients and gRPC streams + +| Wrapper/collector | Meaning | +|---|---| +| `set_connected_pubsub_clients_count(count)` | Total connected pubsub clients. | +| `set_connected_direct_pubsub_clients_count(count)` | Pubsub clients that subscribe immediately when requested. If this goes to zero, account updates may be missed. | +| `set_pubsub_client_uptime(client_id, connected)` | Per-client connection state, `1` for connected and `0` for disconnected. | +| `set_pubsub_client_reconnect_backoff_duration_seconds(client_id, duration_secs)` | Current reconnect backoff. | +| `set_pubsub_client_failed_reconnect_attempts(client_id, attempts)` | Current failed reconnect attempts. | +| `set_pubsub_client_resubscribe_delay(client_id, delay_ms)` | Current resubscription delay in milliseconds. | +| `set_pubsub_client_resubscribed_count(client_id, count)` | Number of subscriptions resubscribed before completion/failure. | +| `set_pubsub_client_connections_count(client_id, count)` | Pooled websocket connection count for a client. | +| `inc_pubsub_unsubscribe_timeout_count(client_id, scope)` | Unsubscribe timeout count. | +| `inc_pubsub_idle_connections_pruned_count(client_id, count)` | Idle pooled connections pruned. | +| `set_grpc_optimized_streams_gauge(client_id, count)` | Optimized gRPC stream count. | +| `set_grpc_unoptimized_streams_gauge(client_id, count)` | Unoptimized gRPC stream count. | +| `set_grpc_total_streams_gauge(client_id, count)` | Total gRPC streams. | + +Pitfalls: + +- `client_id`, `scope`, and `program` labels must stay bounded. Do not include endpoint URLs with secrets, arbitrary pubkeys unless intentionally bounded, or untrusted free-form strings unless sanitized and cardinality-controlled. +- Pubsub/gRPC metrics are used to detect lost account-update connectivity. Do not silently remove or reset them without replacement guidance. 
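One way to keep `client_id` bounded is to derive it from the client's position in the configured endpoint list instead of from the endpoint URL. The sketch below is illustrative only: `client_label`, `client_labels`, and the `pubsub-N` format are hypothetical names, not taken from the crate.

```rust
/// Hypothetical helper: builds a bounded `client_id` label from the index of
/// a configured endpoint. The URL is deliberately ignored so that API keys or
/// query strings can never leak into a Prometheus label.
fn client_label(index: usize, _endpoint_url: &str) -> String {
    format!("pubsub-{}", index)
}

/// Hypothetical helper: one label per configured endpoint, in config order.
/// Cardinality is bounded by the number of configured endpoints.
fn client_labels(endpoints: &[&str]) -> Vec<String> {
    endpoints
        .iter()
        .enumerate()
        .map(|(index, url)| client_label(index, url))
        .collect()
}
```

The label is computed once when the client is constructed, so the allocation stays off hot paths.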
+ +## Label helper types + +### `LabelValue` + +`LabelValue` is a small trait used to convert strongly-typed values into stable Prometheus label strings: + +```rust +pub trait LabelValue { + fn value(&self) -> &str; +} +``` + +It is implemented for: + +- `&str`, +- `String`, +- `Result` where both sides implement `LabelValue`, +- `AccountFetchOrigin`, +- downstream consumer types such as committor execution outputs and errors. + +Use `LabelValue` when a metric needs a label derived from an enum-like type. New implementations should return stable, low-cardinality strings. Avoid allocating strings on hot paths where a static `&str` would work. + +### `Outcome` + +`Outcome` has two variants: + +- `Success` -> `success`, +- `Error` -> `error`. + +Use `Outcome::from_success(bool)` for binary success/error label values. Do not create separate labels for individual errors unless operators need that distinction and cardinality is controlled. + +### `AccountFetchOrigin` + +`AccountFetchOrigin` identifies why Chainlink fetched account data: + +- `GetMultipleAccounts` -> `get_multiple_accounts`, +- `GetAccount` -> `get_account`, +- `SendTransaction(Signature)` -> `send_transaction`, +- `ProjectAta` -> `project_ata`. + +The `SendTransaction` signature is intentionally not part of the label. Use `signature()` for tracing/log correlation only. + +### `AccountClone` and `AccountCommit` + +`AccountClone<'a>` and `AccountCommit<'a>` describe account-clone and account-commit shapes, including fee payer, undelegated, delegated, program, commit-only, and commit-and-undelegate variants. They are currently defined in `types.rs` for shared metric modelling but are not broadly used by the current wrappers. If you wire them into live metrics, update this guide with their metric names and label behavior. 
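As a rough illustration of the `LabelValue` and `Outcome` guidance in this section, the sketch below re-declares the trait locally and implements it for a stand-in `Outcome` and a hypothetical downstream error type. `DeliveryError` and its variants are invented for the example; the real committor types may look different.

```rust
/// Mirrors the trait shape shown in this guide.
pub trait LabelValue {
    fn value(&self) -> &str;
}

/// Local stand-in for the crate's `Outcome` helper.
pub enum Outcome {
    Success,
    Error,
}

impl Outcome {
    pub fn from_success(success: bool) -> Self {
        if success {
            Outcome::Success
        } else {
            Outcome::Error
        }
    }
}

impl LabelValue for Outcome {
    fn value(&self) -> &str {
        match self {
            Outcome::Success => "success",
            Outcome::Error => "error",
        }
    }
}

/// Hypothetical downstream error type. The payloads are unbounded, so they
/// must stay out of the label and go to logs/traces instead.
pub enum DeliveryError {
    Timeout { signature: String },
    Rejected { reason: String },
}

impl LabelValue for DeliveryError {
    fn value(&self) -> &str {
        // One static string per variant: no allocation, bounded cardinality.
        match self {
            DeliveryError::Timeout { .. } => "timeout",
            DeliveryError::Rejected { .. } => "rejected",
        }
    }
}
```

Matching on the variant only, and never on its payload, is what keeps the label set fixed as error messages change.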
+ +## Runtime flows + +### Validator startup and scrape flow + +```text +MagicValidator::try_from_config + -> try_start_metrics_service(address, cancellation_token) + -> metrics::register() once + -> spawn metrics HTTP server + -> init_system_metrics_ticker(collect_frequency, ledger, accountsdb, token) + -> seed TRANSACTION_COUNT from ledger count + -> Prometheus scrapes GET /metrics + -> REGISTRY.gather() + -> TextEncoder::encode_to_string(...) + -> HTTP 200 text body +``` + +### Periodic system gauge flow + +```text +system metrics ticker + -> sleep(config.metrics.collect_frequency) + -> ledger.storage_size() -> set_ledger_size + -> accountsdb.storage_size() -> set_accounts_size + -> accountsdb.account_count() -> set_accounts_count + -> repeat until cancellation token is cancelled +``` + +This flow samples relatively heavyweight storage values off the critical execution path. Do not move these operations into RPC request handling or transaction execution. + +### Hot-path instrumentation flow + +Most runtime crates instrument an existing operation by: + +1. starting a timer or incrementing a counter at the operation boundary, +2. avoiding dynamic label values, +3. recording once on completion/drop, +4. leaving detailed logs/traces to `tracing` rather than Prometheus labels. + +Examples: + +- RPC `sendTransaction` starts `TRANSACTION_PROCESSING_TIME` once around request processing. +- Processor increments `TRANSACTION_COUNT` once per execution. +- Chainlink increments account-fetch counters by batch counts rather than per-account string labels. +- Committor records intent execution time with typed kind/outcome labels. + +## Important invariants + +Preserve these invariants when editing this crate: + +1. **Metric names are operator-facing API.** Renames/removals require explicit documentation and migration notes. +2. **Every declared collector must be registered exactly once.** Add new collectors to `register()`. +3. 
**The registry namespace is `mbv`.** Scraped metric names are namespace-prefixed. +4. **Labels must be bounded and low-cardinality.** Never use signatures, pubkeys, account addresses, transaction IDs, raw errors, endpoint URLs with secrets, or user input as labels unless a bounded cardinality design is documented. +5. **Hot-path metrics must be cheap.** Avoid allocations, formatting, locks beyond Prometheus collector internals, and repeated label lookup in tight loops when a batch/outer operation metric is sufficient. +6. **Gauge wrappers must preserve set-vs-delta semantics.** Some wrappers set absolute counts; others increment/decrement. Do not mix these up. +7. **Increment/decrement gauges must be balanced on all control-flow paths.** This is especially important for pending clone request metrics. +8. **Histograms must use seconds and meaningful buckets.** Bucket choices should match expected latency ranges. +9. **Scrape handling must not mutate validator state.** `/metrics` observes registry values only. +10. **Metrics service shutdown must remain cancellation-token driven.** Do not introduce shutdown paths that can block validator shutdown indefinitely. +11. **This crate should stay dependency-light.** Do not add dependencies on runtime workspace crates; metrics should be a leaf observability helper that others can depend on. +12. **Do not hide critical failures by removing metrics.** Account-update connectivity, commit backlog/failure, RPC latency, transaction count/failure, and storage-size metrics are operationally important. + +## Common change areas and what to inspect + +### Adding a new metric + +Inspect and update: + +1. `magicblock-metrics/src/metrics/mod.rs` for the `lazy_static!` collector declaration. +2. `register()` in the same file. +3. A wrapper function, unless direct collector access is already the pattern for that metric group. +4. The consumer crate where the metric is recorded. +5. This guide's metric group section. +6. 
Any Prometheus/Grafana docs if the metric is intended for dashboards. + +Checklist: + +- Is the metric a counter, gauge, histogram, or vector of those? +- Are labels low-cardinality and static/enum-like? +- Are histogram buckets useful for the expected duration? +- Is the metric recorded outside tight loops when possible? +- Is the help string accurate and spelled correctly? +- Does the name follow existing snake_case naming? +- Does the metric need a wrapper to encode semantics? + +### Renaming or removing a metric + +Treat this as an operator-visible breaking change. + +Inspect: + +- `magicblock-metrics/src/metrics/mod.rs`, +- consumers listed above, +- `magicblock-metrics/README.md`, +- dashboards/Prometheus configs if present, +- integration tests or scripts that scrape metrics, such as subscription-limit tests. + +If a metric is obsolete, prefer a staged approach when possible: keep the old metric while adding the replacement, or document the exact replacement and update all repository references. + +### Adding or changing labels + +Inspect: + +- `metrics/types.rs` for reusable label enums, +- downstream `LabelValue` implementations, +- consumer paths to ensure they pass enum/static values rather than dynamic strings. + +Pitfalls: + +- Prometheus vector label cardinality multiplies memory/time cost. +- A label that includes pubkeys/signatures can create unbounded time-series growth. +- `String` implements `LabelValue`, but that does not make arbitrary strings safe labels. + +### Changing metrics service behavior + +Inspect: + +- `magicblock-metrics/src/service.rs`, +- `magicblock-api/src/magic_validator.rs` startup/shutdown ownership, +- `magicblock-config/src/config/metrics.rs` and config tests, +- `config.example.toml`, +- local Prometheus scrape configuration. + +Preserve: + +- `GET /metrics` compatibility, +- cancellation-token shutdown, +- body consumption for keep-alive correctness, +- low scrape overhead. 
+ +### Changing periodic system metrics + +Inspect: + +- `magicblock-api/src/tickers.rs`, +- ledger/account storage APIs called by the ticker, +- `MetricsConfig.collect_frequency` defaults and tests. + +Do not put storage-size/account-count calls on critical RPC or execution paths. They can involve storage/index work and are intentionally sampled. + +### Instrumenting hot paths + +Before adding metrics to scheduler, executor, RPC, Chainlink fetch/clone, committor send/confirm, pubsub/gRPC, ledger, or account storage hot paths: + +- prefer counters/gauges with no labels or low-cardinality labels, +- avoid string formatting and heap allocation, +- aggregate counts where possible, +- use timers at operation boundaries, not inside per-account/per-instruction loops unless there is a clear need, +- document any unavoidable overhead. + +## Tests and validation + +For documentation-only changes to this file, verify the file exists and links/paths are accurate. + +Minimum targeted commands for Rust changes in `magicblock-metrics`: + +```bash +cargo fmt +cargo nextest run -p magicblock-metrics +``` + +For configuration/startup changes involving the metrics config or service startup, also run targeted config/API tests where practical: + +```bash +cargo nextest run -p magicblock-config metrics +cargo nextest run -p magicblock-api +``` + +For broader validation, use the repository baseline from `.agents/rules/testing-and-validation.md`: + +```bash +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Manual scrape validation when changing `service.rs` or startup wiring: + +```bash +# Start the validator with a known metrics address, then: +curl -s http://127.0.0.1:9000/metrics | head +curl -i http://127.0.0.1:9000/not-metrics +``` + +Expected behavior: + +- `/metrics` returns Prometheus text including `mbv_` metric names, +- unknown routes return `404`, +- shutdown cancellation stops the service without hanging validator shutdown. 
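The first expectation can also be checked programmatically against a captured scrape body. The following is a minimal sketch that assumes the standard Prometheus text exposition format; `sample_names` and `has_namespaced_metric` are hypothetical helpers, not part of the repository.

```rust
/// Extracts metric names from a Prometheus text-format body, skipping blank
/// lines and `# HELP` / `# TYPE` comment lines.
fn sample_names(body: &str) -> Vec<&str> {
    body.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        // A sample line is `name{labels} value` or `name value`; the name
        // ends at the first `{` or whitespace character.
        .filter_map(|line| line.split(|c: char| c == '{' || c.is_whitespace()).next())
        .collect()
}

/// Returns true when at least one sample carries the `mbv_` namespace prefix.
fn has_namespaced_metric(body: &str) -> bool {
    sample_names(body).iter().any(|name| name.starts_with("mbv_"))
}
```

A check like this only proves that the registry is being served; it does not validate individual metric values.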
+ +When changing metrics on performance-sensitive paths, include at least a reasoned performance assessment. If no benchmark or load test is run, report the residual risk explicitly. + +## Related docs + +- `.agents/context/overview.md` — high-level validator purpose and hot-path caution. +- `.agents/context/architecture.md` — observability as a background/service concern and cross-crate boundaries. +- `.agents/context/crate-map.md` — crate ownership and consumers. +- `.agents/rules/testing-and-validation.md` — baseline Rust validation workflow. +- `magicblock-metrics/README.md` — local Prometheus/Grafana setup. diff --git a/.agents/context/crates/magicblock-rpc-client.md b/.agents/context/crates/magicblock-rpc-client.md new file mode 100644 index 000000000..381db39b1 --- /dev/null +++ b/.agents/context/crates/magicblock-rpc-client.md @@ -0,0 +1,345 @@ +# `magicblock-rpc-client` + +## Purpose + +`magicblock-rpc-client` is the validator's shared async wrapper around Solana's nonblocking `RpcClient` for base-layer reads, transaction submission, and transaction confirmation. It sits on the base-layer settlement path used by the committor service and `magicblock-table-mania`, and on supporting account-fetch/admin flows used by account cloning, task-info fetching, and validator registration helpers. 
+ +High-level responsibilities: + +- wrap an existing `Arc` in a cheap-to-clone `MagicblockRpcClient`; +- send base-layer transactions with MagicBlock defaults (`skip_preflight: true`, base64 encoding) and optional processed/committed confirmation; +- coalesce concurrent signature-status polling and optionally use `signatureSubscribe` websocket notifications before falling back to batched polling; +- cache recent blockhashes and slots briefly to reduce base-layer RPC load in high-volume settlement flows; +- batch `getMultipleAccounts` requests so callers do not exceed RPC provider input limits; +- expose address lookup table account helpers used by `magicblock-table-mania`; +- expose retry/error-mapping helpers in `utils` so committor callers can map Solana transaction errors into domain errors while preserving retry policy. + +This crate is performance-sensitive because it is used while preparing and delivering base-layer commit, undelegation, action, and lookup-table transactions. Changes must avoid increasing confirmation latency, duplicate RPC calls, unbounded task spawning, lock contention in the shared confirmation state, or RPC-provider amplification. + +## Update requirement + +Update this guide in the same change whenever behavior or contracts in `magicblock-rpc-client` change. 
In particular, update it for changes to: + +- `MagicblockRpcClient` constructors, cached blockhash/slot behavior, commitment handling, or `chain_slot` observation; +- `SEND_TRANSACTION_CONFIG`, send/confirm defaults, timeout intervals, or `MagicBlockSendTransactionConfig` semantics; +- signature confirmation behavior in `src/signature_confirmer.rs`, including websocket fallback, polling batch size, cache TTL, waiter coalescing, metrics, or cancellation behavior; +- `get_multiple_accounts*`, lookup-table helpers, transaction log/CU helpers, or account-not-found handling; +- retry and error-mapping traits/functions in `src/utils.rs`; +- metrics emitted through `magicblock-metrics` for RPC-client confirmation paths; +- validation commands or integration suites relevant to base-layer send/confirm behavior. + +Because this crate is a shared settlement dependency, also update this file when another crate changes how `MagicblockRpcClient` is constructed or how its send/confirm outcomes are interpreted. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-rpc-client/Cargo.toml` | Package metadata and dependencies on Solana RPC/pubsub/status crates plus `magicblock-metrics`. | +| `magicblock-rpc-client/src/lib.rs` | Public crate surface, error/result types, send configuration/outcome types, `MagicblockRpcClient`, blockhash/slot caches, account/lookup-table helpers, transaction send/confirm methods, and transaction log/CU helpers. | +| `magicblock-rpc-client/src/signature_confirmer.rs` | Internal confirmation engine. Coalesces waiters, batches `getSignatureStatuses`, caches completed statuses, optionally uses websocket `signatureSubscribe`, and records RPC-client metrics. Unit tests live in this file. | +| `magicblock-rpc-client/src/utils.rs` | Public retry and error-mapping utilities used by committor send paths. 
| +| `magicblock-metrics/src/metrics/mod.rs` | Defines RPC-client confirmation counters: websocket subscribe/notification/fallback counts and signature-status batch counters. | +| `magicblock-committor-service/src/committor_processor.rs` | Builds `MagicblockRpcClient` from chain config, optional websocket URI, and optional observed chain-slot atom. | +| `magicblock-committor-service/src/intent_executor/intent_execution_client.rs` | Sends prepared base-layer intent transactions with `ensure_committed()` and records CU metrics through `get_transaction`. | +| `magicblock-committor-service/src/transaction_preparator/delivery_preparator.rs` | Sends delivery/preparation transactions with `ensure_committed()`. | +| `magicblock-committor-service/src/intent_executor/task_info_fetcher.rs` | Fetches committed accounts with `get_multiple_accounts_with_config` and `min_context_slot`. | +| `magicblock-table-mania/src/lookup_table_rc.rs` and `magicblock-table-mania/src/manager.rs` | Create/extend/deactivate/close ALTs, fetch lookup-table metadata/addresses, and choose send confirmation policy. | +| `magicblock-api/src/domain_registry_manager.rs` | Uses `MagicblockRpcClient` for validator domain-registry transaction submission. | +| `magicblock-account-cloner/src/util.rs` | Uses static transaction log/CU extraction helpers for clone diagnostics. | + +Main consumers: + +- `magicblock-committor-service` for commit/undelegation/action transaction delivery, recovery-related slot reads, task-info fetching, and post-send metrics; +- `magicblock-table-mania` for ALT lifecycle reads and transactions; +- `magicblock-account-cloner` for diagnostic helpers around transaction logs and compute units; +- `magicblock-api` and `magicblock-validator-admin` for operator/admin transaction helpers; +- integration suites `test-integration/test-committor-service` and `test-integration/test-table-mania`. 
+ +## Public API shape / Main public types and APIs + +### Crate exports + +`src/lib.rs` exposes the wrapper and public utilities directly: + +- `pub mod utils`; +- `MagicblockRpcClient`; +- `MagicBlockRpcClientError` and `MagicBlockRpcClientResult`; +- `MagicBlockSendTransactionConfig`; +- `MagicBlockSendTransactionOutcome`; +- `SEND_TRANSACTION_ENCODING` and `SEND_TRANSACTION_CONFIG`. + +The crate name is `magicblock-rpc-client`; the main type is spelled `MagicblockRpcClient` with a lowercase `b` in `block`. Preserve that spelling in public APIs unless intentionally performing a breaking rename. + +### `MagicblockRpcClient` + +`MagicblockRpcClient` stores: + +- `Arc` as the underlying Solana RPC client; +- an internal blockhash/slot cache protected by Tokio mutexes; +- optional `Arc` `chain_slot` for sharing the highest observed base-layer slot with other services; +- an internal `SignatureConfirmer` for send confirmation. + +Constructors: + +- `MagicblockRpcClient::new(client)` wraps an RPC client with no chain-slot atom and no websocket URL; +- `new_with_chain_slot(client, chain_slot)` updates the shared atom whenever slots/blockhash context slots are observed; +- `new_with_websocket(client, websocket_url)` enables websocket-first signature confirmation when a URL is provided; +- `new_with_chain_slot_and_websocket(client, chain_slot, websocket_url)` combines both options; +- `impl From` creates a wrapper by placing the client in an `Arc`. 
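For the chain-slot constructors, the shared atom should only ever move forward when a slot is observed. A minimal sketch of that update, assuming the atom is an `Arc<AtomicU64>` and that relaxed ordering is sufficient (neither detail is confirmed by this guide); `observe_slot` is a hypothetical name:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical helper: records an observed base-layer slot and returns the
/// highest slot seen so far. The stored value never decreases.
fn observe_slot(chain_slot: &Arc<AtomicU64>, observed: u64) -> u64 {
    // `fetch_max` stores max(previous, observed) atomically and returns the
    // previous value, so a stale observation cannot move the slot backward.
    let previous = chain_slot.fetch_max(observed, Ordering::Relaxed);
    previous.max(observed)
}
```

A plain `store` would let a late response from a lagging RPC node overwrite a newer slot, which is why an atomic max is used here.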
+ +Common methods: + +- `get_latest_blockhash()` fetches/caches `getLatestBlockhash` results and records the response context slot; +- `invalidate_cached_blockhash()` clears only the latest-blockhash cache entry; +- `get_slot()`, `wait_for_next_slot()`, and `wait_for_higher_slot(slot)` read or wait on observed/fetched base-layer slot values; +- `get_account(pubkey)` returns `Ok(None)` for Solana `AccountNotFound` user errors; +- `get_multiple_accounts*` chunks requests, defaulting to 100 pubkeys per RPC call; +- `get_lookup_table_meta` and `get_lookup_table_addresses` deserialize ALT accounts using `AddressLookupTable::deserialize`; +- `request_airdrop`, `get_transaction`, `get_transaction_logs`, and `get_transaction_cus` are convenience pass-through/extraction helpers; +- `get_inner()` returns the underlying `Arc` for consumers that need a native call. + +### Send configuration and outcome + +`SEND_TRANSACTION_CONFIG` is the default for actual submission: + +- `skip_preflight: true`; +- `encoding: Some(UiTransactionEncoding::Base64)`; +- no preflight commitment, retry override, or min context slot. + +`MagicBlockSendTransactionConfig` controls behavior after submission: + +- `Send` submits and returns a `MagicBlockSendTransactionOutcome` without status checks; +- `SendAndConfirm` waits for processed status and optionally for the client's configured commitment level; +- `ensure_sent()` returns `Send`; +- `ensure_processed()` waits for processed status, with a 2s blockhash-valid wait hint and a 50s processed timeout; +- `ensure_committed()` waits for processed status and then up to 8s for the client's commitment level; +- `ensures_committed()` reports whether a config includes the commitment-level wait. + +`MagicBlockSendTransactionOutcome` carries the signature plus optional processed/confirmed transaction errors. 
Use `into_result()` when transaction errors should become `MagicBlockRpcClientError::SentTransactionError`; use `into_signature_and_error()` when callers need to log or handle the signature and transaction error separately. + +### Errors and retry utilities + +`MagicBlockRpcClientError` distinguishes native RPC failures, latest-blockhash/slot failures, ALT deserialize failures, send failures, status timeout/confirmation failures, and submitted transaction errors. `signature()` returns the associated transaction signature for errors that have one. + +`utils.rs` exposes: + +- `send_transaction_with_retries(make_send_fut, mapper, stop_predicate)`, which repeatedly calls an async send operation until success, an unrecoverable mapped error, or a caller-supplied stop condition; +- `SendErrorMapper`, which maps transport/send errors into a caller's execution error and decides retry delay versus break; +- `TransactionErrorMapper`, which lets domain crates map Solana `TransactionError` values while falling back safely for unknown errors; +- `map_magicblock_client_error` and `try_map_client_error`, used by committor code to preserve domain-specific transaction-error handling; +- `decide_rpc_error_flow` and `decide_rpc_native_flow`, the shared retry policy for known retryable RPC conditions. + +Retry policy is intentionally conservative: IO errors, HTTP 5xx send errors, latest-blockhash fetch errors, and signature-status timeout/confirmation failures may retry; unknown transaction errors and most client errors break unless mapped by the caller. + +## Runtime flows + +### Base-layer transaction send and confirmation + +```text +caller builds/signs tx + -> MagicblockRpcClient::send_transaction(tx, config) + -> RpcClient::send_transaction_with_config(tx, SEND_TRANSACTION_CONFIG) + -> if config == Send: return signature-only outcome + -> wait_for_processed_status(signature, tx.recent_blockhash, ...) + -> optionally wait_for_confirmed_status(signature, client.commitment(), ...) 
+ -> return outcome or SentTransactionError +``` + +Important details: + +1. `send_transaction` always submits with `skip_preflight: true`; validation must come from callers, transaction construction, and post-send status checks. +2. The processed wait uses `CommitmentConfig::processed()` regardless of the underlying client commitment. +3. The committed wait uses `self.client.commitment()`, so construction sites must choose the desired commitment level on the underlying Solana `RpcClient`. +4. Durable nonce transactions are explicitly unsupported by this helper because confirmation uses the transaction's recent blockhash. +5. If processed status returns a transaction error, `send_transaction` returns `SentTransactionError` immediately and does not continue to the committed wait. + +### Signature confirmation with websocket fallback + +When a websocket URL is configured: + +1. `SignatureConfirmer::wait_for_status` tries `wait_with_websocket_then_poll`. +2. The confirmer obtains or creates a cached `PubsubClient` under a mutex. +3. It subscribes with `signatureSubscribe` and the requested commitment. +4. It races notifications against the fallback delay. +5. A `ProcessedSignature` notification is converted into `TransactionResult<()>`, unsubscribe is called, and the status is returned. +6. Connection/subscription timeout, connection failure, subscription failure, stream end, or fallback delay causes batched polling for the remaining timeout. + +Metrics incremented in this path: + +- `mbv_rpc_client_signature_ws_subscribe_count`; +- `mbv_rpc_client_signature_ws_notification_count`; +- `mbv_rpc_client_signature_ws_fallback_count`. + +Do not make websocket confirmation the only path. Polling fallback is required for robustness across RPC providers and transient websocket failures. + +### Batched signature-status polling + +Without websocket, or after websocket fallback: + +1. A waiter is registered in shared `PollState` under the target signature and desired commitment. 
+2. Completed statuses are looked up from a short-lived cache first. +3. If no polling worker is running, one Tokio task is spawned. +4. The worker sleeps the coalesce delay, snapshots pending signatures, and calls `get_signature_statuses` in chunks of `batch_size`. +5. Fetched statuses are cached and delivered only to waiters whose requested commitment is satisfied by `TransactionStatus::satisfies_commitment`. +6. Waiters whose timeout expires remove themselves from `PollState`. +7. The worker exits after there are no remaining waiters. + +Defaults are currently: + +- batch size: `256` signatures; +- status cache TTL: `30s`; +- websocket fallback delay: `2s`; +- poll coalesce delay: `25ms`. + +Metrics incremented in this path: + +- `mbv_rpc_client_signature_status_batch_count`; +- `mbv_rpc_client_signature_status_batch_signatures_count`. + +### Blockhash and slot caching + +`get_latest_blockhash()` caches the latest blockhash for `5s` and keeps recent blockhash metadata for up to the processed-status timeout window. Recent metadata is used to improve status-timeout error messages when the observed slot is still within the blockhash validity window. + +Slot reads are cached for `400ms` in `get_cached_slot()`, and both fetched slots and blockhash context slots update the optional shared `chain_slot` atom with `fetch_max`. `wait_for_next_slot()` and `wait_for_higher_slot(slot)` prefer the observed atom when present and otherwise poll the cached slot every `100ms`. + +### Batched account and ALT reads + +`get_multiple_accounts_with_config` chunks input pubkeys by `max_per_fetch.unwrap_or(100)` and concurrently joins all chunk futures. The default exists because at least some RPC providers reject more than 100 `getMultipleAccounts` inputs. Callers that require `min_context_slot` or specific commitment must pass a full `RpcAccountInfoConfig`. 
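The chunking contract can be sketched as follows. This is a simplified, synchronous illustration: the real method joins chunk futures concurrently, and `fetch_chunk`, `fetch_all`, and the `u32` key type are hypothetical stand-ins for the RPC call and `Pubkey`.

```rust
/// Default taken from this guide: 100 keys per `getMultipleAccounts` call.
const DEFAULT_MAX_PER_FETCH: usize = 100;

/// Hypothetical stand-in for one RPC round trip. Returns one optional
/// "account" per key, in the same order; odd keys model missing accounts.
fn fetch_chunk(keys: &[u32]) -> Vec<Option<u32>> {
    keys.iter()
        .map(|key| if key % 2 == 0 { Some(*key) } else { None })
        .collect()
}

/// Splits the input into provider-sized chunks and concatenates the results,
/// preserving input order and output cardinality.
fn fetch_all(keys: &[u32], max_per_fetch: Option<usize>) -> Vec<Option<u32>> {
    // Guard against a zero chunk size, which would make `chunks` panic.
    let chunk_size = max_per_fetch.unwrap_or(DEFAULT_MAX_PER_FETCH).max(1);
    keys.chunks(chunk_size)
        .flat_map(|chunk| fetch_chunk(chunk))
        .collect()
}
```

Whatever the chunk size, the output has one entry per input key at the same index, which is the property callers rely on.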
+ +`get_lookup_table_meta` and `get_lookup_table_addresses` first fetch the raw account through `get_account`; missing accounts return `Ok(None)`, while present accounts must deserialize as Solana ALT state or return `LookupTableDeserialize`. + +## Important internals and caveats + +### Confirmation state and task spawning + +`SignatureConfirmer` has one shared `PollState` per `MagicblockRpcClient` clone set. The state stores waiters by signature, completed status cache entries, and a `worker_running` flag. The polling worker is intentionally demand-driven: it is spawned only when the first waiter is registered and exits when no waiters remain. + +Avoid holding the `PollState` mutex across RPC calls. The current implementation snapshots signatures while locked, drops the lock for network I/O, then re-locks to apply statuses. Preserve that shape to avoid blocking new waiters and timeout cleanup behind remote RPC latency. + +### Commitment handling + +`status_result_for_commitment` only returns a result when the fetched `TransactionStatus` satisfies the requested commitment. This applies to errors as well as successes: a processed transaction error does not satisfy a confirmed waiter until Solana reports a status that satisfies confirmed commitment. Unit coverage exists for this behavior in `transaction_errors_wait_for_requested_commitment`. + +### Blockhash validity and retries + +`wait_for_processed_status` currently accepts a `_blockhash_valid_timeout` parameter but does not actively wait for blockhash validity through that value. It relies on `SignatureConfirmer` timeout behavior plus cached blockhash metadata for better error text. Do not assume the parameter enforces a separate blockhash-valid wait without changing the implementation and tests. + +### Account batching concurrency + +`get_multiple_accounts_with_config` launches one future per chunk and joins all chunks concurrently. 
This reduces wall-clock latency for large fetches but can amplify provider load for very large input lists. If changing chunk size or concurrency, consider both provider limits and settlement/task-info latency. + +### Observability contract + +The RPC-client metrics live in `magicblock-metrics` and are exported with the `mbv_` namespace. Metric names and meanings are operator-facing; changing them requires updating `.agents/context/crates/magicblock-metrics.md` and handoff notes. + +## Important invariants + +1. `send_transaction` must keep transaction submission and confirmation semantics explicit: `Send` must not wait for status, and `SendAndConfirm` must not silently skip requested confirmation. +2. The default send path must continue to use the shared `SEND_TRANSACTION_CONFIG` unless callers intentionally opt into a new API; consumers rely on skipped preflight for committor/table-mania delivery behavior. +3. Confirmation waits must respect the requested commitment and must not deliver a cached status that fails `TransactionStatus::satisfies_commitment`. +4. Websocket confirmation must retain batched polling fallback for provider compatibility and transient websocket failures. +5. Polling must remain coalesced and batched; do not replace it with one `getSignatureStatuses` loop per waiter. +6. Do not hold async mutexes across Solana RPC/pubsub network calls. +7. Blockhash and slot caches must be short-lived and monotonic where applicable; `chain_slot` updates must never move the observed slot backward. +8. `get_multiple_accounts*` must preserve input order and output cardinality after chunking. +9. `get_account` must continue to map Solana `AccountNotFound` user errors to `Ok(None)` rather than a hard error. +10. ALT helpers must fail loudly on deserialization errors instead of treating malformed accounts as missing. +11. Retry helpers must preserve caller-owned domain error mapping and must not hide unrecoverable transaction errors behind infinite retries. +12. 
Public error variants that carry signatures must continue to return them from `MagicBlockRpcClientError::signature()`. + +## Common change areas and what to inspect + +### Changing send/confirm behavior + +Start with: + +- `magicblock-rpc-client/src/lib.rs` (`MagicBlockSendTransactionConfig`, `send_transaction`, `wait_for_processed_status`, `wait_for_confirmed_status`); +- `magicblock-rpc-client/src/signature_confirmer.rs` (`wait_for_status`, websocket and polling paths); +- `magicblock-committor-service/src/intent_executor/intent_execution_client.rs` and `transaction_preparator/delivery_preparator.rs`; +- `magicblock-table-mania/src/lookup_table_rc.rs`. + +Check that committor and table-mania still know whether a transaction was merely sent, processed, or committed, and that status timeouts remain retryable where intended. + +### Changing retry/error mapping + +Start with: + +- `magicblock-rpc-client/src/utils.rs`; +- `magicblock-committor-service/src/intent_executor/error.rs`; +- `magicblock-committor-service/src/intent_executor/intent_execution_client.rs`; +- `magicblock-committor-service/src/transaction_preparator/delivery_preparator.rs`. + +Preserve the distinction between transport/RPC retryability and Solana transaction execution errors. Domain-specific errors should be mapped through `TransactionErrorMapper`; unknown transaction errors should surface with the original signature when available. + +### Changing account or lookup-table reads + +Start with: + +- `get_account`, `get_multiple_accounts*`, `get_lookup_table_meta`, and `get_lookup_table_addresses` in `src/lib.rs`; +- `magicblock-committor-service/src/intent_executor/task_info_fetcher.rs` for `min_context_slot` reads; +- `magicblock-table-mania/src/find_tables.rs`, `manager.rs`, and `lookup_table_rc.rs`. + +Preserve provider chunking limits, order/cardinality, requested commitment/config, and `None` semantics for missing accounts. 
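The order and cardinality requirement for chunked reads can be illustrated with a minimal sketch. This is not the crate's implementation: `fetch_in_chunks`, the `Account` alias, and the `fetch_chunk` callback are hypothetical names, chunks are fetched sequentially rather than concurrently, and errors are plain strings for brevity.

```rust
/// Hypothetical stand-in for an account payload; the real client returns Solana account values.
type Account = Vec<u8>;

/// Fetches `keys` in chunks of at most `chunk_size`, calling `fetch_chunk` once per chunk.
/// `fetch_chunk` is an assumed callback standing in for one provider request. It must
/// return one `Option<Account>` per key it was given, in the same order.
pub fn fetch_in_chunks<K, F>(
    keys: &[K],
    chunk_size: usize,
    mut fetch_chunk: F,
) -> Result<Vec<Option<Account>>, String>
where
    F: FnMut(&[K]) -> Result<Vec<Option<Account>>, String>,
{
    // `slice::chunks` panics on a zero chunk size, so reject it up front.
    if chunk_size == 0 {
        return Err("chunk_size must be non-zero".to_string());
    }
    let mut out = Vec::with_capacity(keys.len());
    for chunk in keys.chunks(chunk_size) {
        let fetched = fetch_chunk(chunk)?;
        // Cardinality check: a short or long response would shift every later position.
        if fetched.len() != chunk.len() {
            return Err(format!(
                "provider returned {} results for {} keys",
                fetched.len(),
                chunk.len()
            ));
        }
        // Appending chunk results in iteration order preserves the input order.
        out.extend(fetched);
    }
    Ok(out)
}
```

Missing accounts stay as `None` at their original positions, so a caller can zip the output with the input keys without any extra bookkeeping.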
+ +### Changing slot/blockhash caching + +Start with: + +- `BLOCKHASH_CACHE_TTL`, `SLOT_CACHE_TTL`, `BlockhashCache`, and `CachedSlot` in `src/lib.rs`; +- committor construction in `magicblock-committor-service/src/committor_processor.rs`; +- `magicblock-table-mania` call sites that fetch latest blockhashes immediately before signing. + +Do not lengthen cache lifetimes without considering Solana blockhash expiration and commit-delivery retry behavior. + +### Changing metrics + +Start with: + +- `magicblock-rpc-client/src/signature_confirmer.rs` metric calls; +- `magicblock-metrics/src/metrics/mod.rs` collector declarations, registration, and wrappers; +- `.agents/context/crates/magicblock-metrics.md` for metric documentation expectations. + +Keep labels bounded; current RPC-client metrics are counters without labels. + +## Tests and validation + +For documentation-only changes: + +```bash +git diff --check +rg "magicblock-rpc-client.md|magicblock-rpc-client" AGENTS.md .agents/context/crate-map.md .agents/context/crates/magicblock-rpc-client.md +``` + +For code changes in this crate, run targeted checks first: + +```bash +cargo fmt +cargo nextest run -p magicblock-rpc-client +``` + +For confirmation, committor, or lookup-table behavior changes, also run the relevant integration suites when practical: + +```bash +cd test-integration +make test-committor +make test-table-mania +``` + +For broader Rust validation, follow `.agents/rules/testing-and-validation.md`: + +```bash +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance-sensitive changes should include targeted reasoning or measurement around RPC call counts, confirmation latency, waiter batching, websocket fallback rates, and task spawning. If no performance measurement is practical, report the residual risk explicitly. + +## Related docs + +- `.agents/context/overview.md` for validator runtime context. 
+- `.agents/specs/validator-specification.md` for base-layer commit/undelegation and RPC/router expectations. +- `.agents/context/architecture.md` for the base-layer settlement layer and service boundaries. +- `.agents/context/crate-map.md` for crate ownership and consumers. +- `.agents/rules/testing-and-validation.md` for required validation workflow. +- `.agents/context/crates/magicblock-metrics.md` for the metrics emitted by `SignatureConfirmer`. +- `.agents/context/crates/magicblock-account-cloner.md`, `.agents/context/crates/magicblock-api.md`, and future committor/table-mania guides for major consumer flows. diff --git a/.agents/context/crates/magicblock-services.md b/.agents/context/crates/magicblock-services.md new file mode 100644 index 000000000..67b244c61 --- /dev/null +++ b/.agents/context/crates/magicblock-services.md @@ -0,0 +1,248 @@ +# `magicblock-services` + +## Purpose + +`magicblock-services` contains small reusable service adapters that run alongside the validator and communicate through existing validator/RPC contracts. Its current responsibility is the action-callback adapter used by the committor pipeline to notify user callback programs about Magic Action results. + +High-level responsibilities: + +- implement `magicblock_core::traits::ActionsCallbackScheduler` for validator-runtime use; +- turn `BaseActionCallback` payloads from committed Magic Actions into local callback transactions; +- wrap callback results in the `magicblock-magic-program-api` `MagicResponse::V1` wire shape; +- submit callback transactions asynchronously through Solana's nonblocking `RpcClient`. + +This crate is settlement-adjacent and can affect Magic Action user experience. It is not the committor itself, does not own intent execution or persistence, and must stay generic enough to be used as a service adapter rather than a protocol owner. + +## Update requirement + +Update this guide in the same change whenever behavior or contracts in `magicblock-services` change. 
In particular, update it for changes to: + +- exported modules or public constructors in `magicblock-services/src/lib.rs` or `actions_callback_service.rs`; +- the `ActionsCallbackService` transaction layout, signer handling, blockhash source, or callback response encoding; +- how callback signatures, errors, receipts, discriminators, payloads, or account metas are propagated; +- asynchronous send behavior, logging, retry/confirmation semantics, or Tokio runtime assumptions; +- call-site wiring in `magicblock-api` or committor expectations around `ActionsCallbackScheduler`; +- validation commands or integration suites relevant to action callbacks. + +Because callbacks are a cross-crate contract between Magic Program scheduling, committor execution, and user callback programs, also update this file when another crate changes `BaseActionCallback`, `MagicResponse`, `CallbackInstruction`, or `ActionsCallbackScheduler` semantics. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-services/Cargo.toml` | Package metadata and dependencies. Depends on `magicblock-core`, `magicblock-magic-program-api`, Solana transaction/RPC crates, Tokio, and tracing. | +| `magicblock-services/src/lib.rs` | Public module surface. Currently exports only `actions_callback_service`. | +| `magicblock-services/src/actions_callback_service.rs` | Implements `ActionsCallbackService` and `ActionsCallbackScheduler`. Builds and sends callback transactions. | +| `magicblock-api/src/magic_validator.rs` | Runtime wiring. `init_committor_service` creates `ActionsCallbackService` with the validator keypair, local latest-block provider, and an RPC client pointed at the validator aperture HTTP endpoint. | +| `magicblock-core/src/traits.rs` | Defines `ActionsCallbackScheduler`, `ActionResult`, `ActionError`, and `CallbackScheduleError`. | +| `magicblock-core/src/intent.rs` | Defines `BaseActionCallback`, the callback payload this crate consumes. 
| +| `magicblock-committor-service/src/intent_executor/utils.rs` | Extracts callbacks from action tasks and invokes the scheduler after success, action failure, or timeout. | +| `programs/magicblock/src/schedule_transactions/process_add_action_callback.rs` | Magic Program path that attaches callbacks to scheduled base actions. | + +Main consumers: + +- `magicblock-api`, which constructs the concrete callback service for the validator; +- `magicblock-committor-service`, which is generic over `ActionsCallbackScheduler` and calls it from intent execution; +- callback-capable Magic Action flows scheduled through `programs/magicblock`. + +There are no crate-local tests or README files for `magicblock-services` at the time of writing. Use the committor integration tests when callback behavior changes. + +## Public API shape / Main public types and APIs + +### Module surface + +`src/lib.rs` exposes: + +- `pub mod actions_callback_service`. + +Keep this surface small. New shared services should be added only when they are truly generic validator service adapters and not better owned by API orchestration, committor, RPC-client, or Magic Program crates. + +### `ActionsCallbackService` + +`ActionsCallbackService` stores: + +- `Arc<RpcClient>` for callback transaction submission; +- a validator `Keypair` authority used as fee payer and transaction signer; +- `latest_block: L`, where `L: LatestBlockProvider`, used to obtain the recent blockhash for each callback transaction. + +Public constructor: + +```rust +ActionsCallbackService::new(rpc_client, authority, latest_block) +``` + +The service implements `Clone` when `L: Clone`. Cloning uses `authority.insecure_clone()` because the concrete service must satisfy `ActionsCallbackScheduler: Clone + Send + Sync + 'static`. Do not accidentally replace this with shared mutable keypair state or a non-`Clone` service without updating all committor generic bounds and runtime wiring. 
+ +### `ActionsCallbackScheduler` implementation + +`schedule(callbacks, signature, result)` returns one result per input callback: + +- construction/signing failures become `Err(CallbackScheduleError)` at the matching position; +- successfully built transactions return their precomputed callback transaction signature immediately; +- valid transactions are sent later in a spawned Tokio task. + +The return value reports scheduling/build success, not confirmed on-chain callback execution. The spawned task logs send failures but does not retry or update the returned result after the fact. + +## Runtime flows + +### Validator startup wiring + +1. `magicblock-api::MagicValidator::init_committor_service` creates a Solana `RpcClient` pointed at `config.aperture.listen.http()`. +2. It constructs `ActionsCallbackService::new(...)` with the validator keypair and `LatestBlock` handle. +3. The service is passed into `CommittorService::try_start(...)` as the concrete `ActionsCallbackScheduler`. +4. The committor keeps using the trait boundary; it does not depend directly on this crate's concrete type. + +Preserve this separation. `magicblock-services` should not reach back into validator orchestration or committor internals. + +### Callback scheduling and transaction construction + +```text +committor intent executor + -> ActionsCallbackScheduler::schedule(callbacks, base_action_signature, result) + -> build callback transactions with latest local blockhash + -> return callback transaction signatures/errors + -> tokio::spawn sends valid transactions via RpcClient +``` + +For each callback: + +1. `build_transactions` reads `latest_block.blockhash()` once for the batch. +2. It converts `ActionResult` into `Result<(), String>` so the callback program receives a serializable success/error response. +3. It adds a Magic Program `Noop(counter)` instruction before the callback instruction. The static `AtomicU64` counter makes otherwise identical callback transactions unique. +4. 
It builds the outer `CallbackInstruction::ExecuteCallback` instruction for `CALLBACK_PROGRAM_ID`. +5. It signs a legacy `VersionedTransaction` with the validator authority as payer. + +The current implementation uses `Ordering::Relaxed` for uniqueness only; no ordering or synchronization semantics are implied. + +### Inner callback instruction encoding + +`build_inner_instruction` creates the user-program instruction that the callback program will invoke: + +1. It wraps the outcome in `MagicResponse::V1(MagicResponseV1 { ok, data, error, receipt })`. +2. `data` starts with `callback.discriminator` and appends `bincode::serialize(&response)`. +3. `receipt` is present only when the committor supplied a base-action transaction signature. +4. Account metas are copied from `callback.account_metas_per_program`. +5. Only `CALLBACK_SIGNER` is marked as signer in the inner instruction; all other metas preserve writability but are not made signers here. + +### Outer callback instruction accounts + +`build_callback_instruction` wraps the inner instruction for the callback program. The outer accounts are ordered as: + +1. validator authority, readonly signer; +2. `CALLBACK_SIGNER`, readonly non-signer PDA; +3. destination program ID, readonly non-signer; +4. all inner instruction accounts, with `is_signer` forcibly set to `false` because a PDA cannot sign the outer transaction directly. + +Do not reorder these accounts without checking `magicblock-magic-program-api` and the callback program processor that consumes `CallbackInstruction::ExecuteCallback`. + +## Important internals and caveats + +### Fire-and-forget send semantics + +`schedule` uses `tokio::spawn` and `join_all` over `rpc_client.send_transaction(tx)` for all valid callback transactions. This requires a live Tokio runtime at the call site. The validator and committor run inside Tokio today; if a future caller invokes the scheduler outside a runtime, scheduling will panic. 
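The fire-and-forget contract described above can be sketched as follows. This is an illustration under stated assumptions, not the service's code: it swaps `tokio::spawn` for `std::thread::spawn` so the example needs no external crates, `SignedTx`, `NOOP_COUNTER`, and this `schedule` signature are hypothetical, and the join handle is returned only so the example can be observed deterministically.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a signed callback transaction.
#[derive(Debug, Clone, PartialEq)]
pub struct SignedTx {
    pub signature: String,
}

// Process-wide counter that makes otherwise identical transactions distinct.
// Relaxed ordering is enough because only uniqueness matters, not ordering.
static NOOP_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Builds one result per callback, in input order, then hands the valid
/// transactions to a background thread. The returned results describe local
/// build success only; send failures are logged and never fed back.
pub fn schedule<S>(callbacks: &[&str], send: S) -> (Vec<Result<String, String>>, JoinHandle<()>)
where
    S: Fn(&SignedTx) -> Result<(), String> + Send + 'static,
{
    let mut results = Vec::with_capacity(callbacks.len());
    let mut to_send = Vec::new();
    for payload in callbacks {
        if payload.is_empty() {
            // A local build failure stays at the matching position.
            results.push(Err("empty callback payload".to_string()));
            continue;
        }
        let nonce = NOOP_COUNTER.fetch_add(1, Ordering::Relaxed);
        let tx = SignedTx { signature: format!("{}-{}", payload, nonce) };
        results.push(Ok(tx.signature.clone()));
        to_send.push(tx);
    }
    // Background handoff: no retry, no confirmation, errors are only logged.
    let handle = thread::spawn(move || {
        for tx in &to_send {
            if let Err(err) = send(tx) {
                eprintln!("callback send failed: signature={} error={}", tx.signature, err);
            }
        }
    });
    (results, handle)
}
```

Even when every send fails, the caller still sees `Ok(signature)` for each transaction that was built, which is why the returned signatures must not be read as proof of delivery.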
+ +Callback sends are not confirmed and are not retried. This keeps the committor callback handoff lightweight, but it means callback delivery is best-effort after transaction construction. If stronger delivery is required, that is a product/architecture change touching committor reporting, persistence, and possibly `magicblock-rpc-client`; do not silently add blocking confirmation in this crate. + +### Local RPC target + +`magicblock-api` currently points this service at the validator's own aperture HTTP endpoint, not directly at the base-layer RPC URL used by the committor. This preserves the local callback execution path. Changing the endpoint changes where callback transactions are executed and must be reviewed as a protocol/architecture change. + +### Wire compatibility + +The callback instruction data combines a program-specific discriminator with a bincode-encoded `MagicResponse`. This is a user-program-facing wire contract. Changes to response versioning, serialization, or account meta treatment must be coordinated with `magicblock-magic-program-api`, Magic Program validation, and downstream callback program expectations. + +### Error reporting boundary + +`CallbackScheduleError` only covers local serialization/signing failures. RPC send failures are logged asynchronously with the callback transaction signature and do not flow back into `IntentExecutionReport` after `schedule` returns. + +## Important invariants + +1. `schedule` must return exactly one `Result` for each input callback, preserving input order. +2. Callback response data must preserve the `discriminator || bincode(MagicResponse::V1)` layout unless a coordinated wire-format migration is implemented. +3. The validator authority must remain the outer transaction payer and signer unless startup/wallet semantics are intentionally changed. +4. `CALLBACK_SIGNER` may be a signer only inside the inner instruction; it must be non-signer in the outer transaction account metas. +5. 
Inner account meta writability must be propagated from `BaseActionCallback`; this crate should not reinterpret callback account authorization. +6. The `Noop(counter)` uniqueness instruction must keep otherwise duplicate callback transactions from producing identical signatures. +7. Do not add blocking RPC confirmation or retry loops to the committor hot path without an explicit architecture decision and performance review. +8. Keep the crate dependency-light. Generic service adapters should not pull in validator orchestration, persistence, or large protocol owners. + +## Common change areas and what to inspect + +### Changing callback transaction shape + +Start with: + +- `magicblock-services/src/actions_callback_service.rs`; +- `magicblock-magic-program-api` callback instruction and response types; +- callback program processing for `CALLBACK_PROGRAM_ID`; +- `programs/magicblock/src/schedule_transactions/process_add_action_callback.rs`. + +Check account order, signer flags, discriminator/response encoding, and whether user callback programs require compatibility migration. + +### Changing delivery guarantees + +Start with: + +- `ActionsCallbackService::schedule`; +- `magicblock-committor-service/src/intent_executor/utils.rs::handle_actions_result`; +- `IntentExecutionReport` callback report handling; +- `magicblock-rpc-client` send/confirm APIs if confirmation is needed. + +Do not make `schedule` block on network confirmation unless the committor timeout, persistence, and report semantics are updated intentionally. + +### Changing validator wiring + +Start with: + +- `magicblock-api/src/magic_validator.rs::init_committor_service`; +- `magicblock-config` endpoint settings used by `config.aperture.listen.http()` and `config.rpc_url()`; +- `magicblock-committor-service::CommittorService::try_start` generic bounds. + +Be explicit about whether callbacks should be sent to local aperture or base-layer RPC. 
+ +### Adding another shared service adapter + +Start with `magicblock-services/src/lib.rs` and ask whether the new adapter belongs here. Prefer this crate only for small generic services that implement shared traits. Put orchestration in `magicblock-api`, settlement logic in `magicblock-committor-service`, RPC policy in `magicblock-rpc-client`, and protocol wire types in `magicblock-magic-program-api`. + +## Tests and validation + +For documentation-only changes touching this guide: + +```bash +git diff -- .agents/context/crates/magicblock-services.md .agents/context/crate-map.md AGENTS.md +``` + +For Rust changes in `magicblock-services`, run at minimum: + +```bash +cargo fmt +cargo nextest run -p magicblock-services +``` + +Because this crate has no crate-local tests, callback behavior changes should also run the relevant committor tests, especially suites that exercise action callbacks and timeouts: + +```bash +cargo nextest run -p magicblock-committor-service +cd test-integration && make test-committor +``` + +Before handoff, run or justify skipping the broader baseline from `.agents/rules/testing-and-validation.md`: + +```bash +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance-risk reporting: callback scheduling is settlement-adjacent and currently fire-and-forget. If a change adds synchronous RPC work, retries, confirmation, persistence, extra serialization, or unbounded spawning/logging, report the expected impact on committor throughput and callback latency. + +## Related docs + +- `AGENTS.md` for required agent guidance and documentation stewardship rules. +- `.agents/specs/validator-specification.md` for Magic Actions, commit/undelegation, committor, and callback-related protocol context. +- `.agents/context/architecture.md` for service boundaries and base-layer settlement architecture. +- `.agents/context/crate-map.md` for workspace crate ownership and consumers. 
+- `.agents/rules/testing-and-validation.md` for baseline validation commands. +- `.agents/context/crates/magicblock-core.md` for `ActionsCallbackScheduler`, `BaseActionCallback`, and related shared trait/type contracts. +- `.agents/context/crates/magicblock-api.md` for validator startup and service wiring. +- `.agents/context/crates/magicblock-magic-program-api.md` for callback instruction and response wire types. +- `.agents/context/crates/magicblock-rpc-client.md` if callback delivery begins using shared send/confirm helpers. diff --git a/.agents/context/crates/magicblock-table-mania.md b/.agents/context/crates/magicblock-table-mania.md new file mode 100644 index 000000000..d84c9d574 --- /dev/null +++ b/.agents/context/crates/magicblock-table-mania.md @@ -0,0 +1,267 @@ +# `magicblock-table-mania` + +## Purpose + +`magicblock-table-mania` manages Solana Address Lookup Tables (ALTs) for MagicBlock's base-layer settlement pipeline. The committor service uses it while preparing commit, undelegation, finalize, action, and buffer-delivery transactions that would otherwise exceed transaction account limits. + +High-level responsibilities: + +- create lookup tables with validator-derived table authorities; +- extend active tables with pubkeys needed by pending base-layer transactions; +- track local reference counts so shared tables are kept alive until all requesters release their pubkeys; +- fetch finalized `AddressLookupTableAccount` values after local create/extend transactions have landed and finalized remotely; +- deactivate and close released tables through an optional background garbage collector; +- expose low-level table helpers for integration tests and table-discovery tooling. + +This crate is on the base-layer settlement preparation path and is performance-sensitive. Changes must avoid unnecessary RPC amplification, long-held async locks, duplicate table creation, or extra finalization waits that slow commit delivery. 
+ +## Update requirement + +Update this guide in the same change whenever behavior or contracts in `magicblock-table-mania` change. In particular, update it for changes to: + +- public exports in `src/lib.rs`, `TableMania`, `LookupTableRc`, `GarbageCollectorConfig`, `TableManiaComputeBudgets`, or `TableManiaError`; +- reservation/release semantics, reference-count behavior, or active/released table lifecycle; +- create/extend/deactivate/close instruction layout, signer requirements, authority derivation, compute budgets, priority fees, or send-confirmation policy; +- local-to-remote readiness waits in `try_get_active_address_lookup_table_accounts`; +- retry/fallback behavior for failed table extension, chain reconciliation, or close salvage; +- `randomize_lookup_table_slot` feature or `RANDOMIZE_LOOKUP_TABLE_SLOT` environment behavior; +- metrics emitted through `magicblock-metrics`; +- committor call-site expectations or integration-test workflows for table preparation. + +Because this crate is a shared settlement dependency, also update this file when another crate changes how ALTs are reserved, fetched, released, or cleaned up. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-table-mania/Cargo.toml` | Package metadata, dependencies on `magicblock-rpc-client` and `magicblock-metrics`, and the `randomize_lookup_table_slot` feature used by parallel integration tests. | +| `magicblock-table-mania/src/lib.rs` | Public crate surface. Re-exports compute-budget types/constants, `find_open_tables`, `LookupTableRc`, `MAX_ENTRIES_AS_PART_OF_EXTEND`, and all manager exports. | +| `magicblock-table-mania/src/manager.rs` | High-level `TableMania` manager, reservation/release flows, local/remote readiness waits, extension fallback/retry policy, and optional garbage collector. 
| +| `magicblock-table-mania/src/lookup_table_rc.rs` | Ref-counted lookup-table representation, deterministic derived authority, create/extend/deactivate/close transactions, chain reconciliation, and low-level table reads. | +| `magicblock-table-mania/src/compute_budget.rs` | Compute-unit limits and priority fees used for table init, extend, deactivate, and close transactions. | +| `magicblock-table-mania/src/derive_keypair.rs` | Deterministic derived table-authority keypair generation from the validator authority, slot, and sub-slot. | +| `magicblock-table-mania/src/find_tables.rs` | Helper for finding table accounts by recomputing derived authorities and table addresses over a slot/sub-slot range. | +| `magicblock-table-mania/src/error.rs` | `TableManiaError`, `TableManiaResult`, signature extraction, and invalid-instruction-data classification used by extension fallback. | +| `magicblock-committor-service/src/committor_processor.rs` | Constructs one `TableMania` with `GarbageCollectorConfig::default()` for the committor processor. | +| `magicblock-committor-service/src/transaction_preparator/delivery_preparator.rs` | Main runtime consumer. Reserves lookup-table pubkeys, fetches finalized ALT accounts, and releases pubkeys during cleanup. | +| `test-integration/test-table-mania/` | Integration tests for create/extend/deactivate/close, reserve, release, ensure, and table-discovery behavior. | +| `magicblock-metrics/src/metrics/mod.rs` | Defines `table_mania_a_count` and `table_mania_closed_a_count`. | + +Main consumers: + +- `magicblock-committor-service`, especially `DeliveryPreparator`, for ALT preparation before base-layer transaction delivery; +- `magicblock-account-cloner`, indirectly through committor errors that may carry a `TableManiaError` signature for diagnostics; +- `test-integration/test-table-mania` and committor integration tests. 
+ +## Public API shape / Main public types and APIs + +### Crate exports + +`src/lib.rs` exports: + +- `pub mod error`; +- compute-budget constants and types from `compute_budget.rs`; +- `find_open_tables` and `FindOpenTablesOutcome`; +- `LookupTableRc` and `MAX_ENTRIES_AS_PART_OF_EXTEND`; +- manager exports including `TableMania` and `GarbageCollectorConfig`. + +### `TableMania` + +`TableMania` is cheap to clone because it wraps shared state in `Arc` locks. It stores: + +- `active_tables`, a lock-guarded collection shared through `Arc`, for tables that may satisfy reservations; +- `released_tables`, a lock-guarded collection shared through `Arc`, for tables with no remaining reservations and pending GC; +- one authority pubkey, a `MagicblockRpcClient`, the table-slot randomization setting, and default compute budgets. + +Important methods: + +- `new(rpc_client, authority, garbage_collector_config)` creates the manager and optionally spawns a non-terminating Tokio GC task; +- `reserve_pubkeys(authority, pubkeys)` increments refs for existing keys and creates/extends tables for missing keys; +- `ensure_pubkeys_table(authority, pubkeys)` ensures keys are present, but does not increase refs for keys already present; +- `try_get_active_address_lookup_table_accounts(pubkeys, wait_for_local_table_match, wait_for_remote_table_match)` returns finalized `AddressLookupTableAccount`s for reserved/present keys; +- `release_pubkeys(pubkeys)` decrements refs and moves fully unreserved tables to `released_tables`; +- inspection helpers such as `active_tables_count`, `released_tables_count`, `active_table_addresses`, `released_table_addresses`, `active_table_pubkeys`, and `get_pubkey_refcount`. + +The authority passed to `reserve_pubkeys` and `ensure_pubkeys_table` must match the authority used to construct the `TableMania`; mismatches return `TableManiaError::InvalidAuthority`. 
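The reserve/release bookkeeping and the authority check described above can be modelled with a small, self-contained sketch. It is a toy model rather than the manager's real state: `RefcountModel`, `ModelError`, the `Key` alias, and `is_releasable` are hypothetical names, a single map stands in for one table, and no RPC or locking is involved.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a Solana `Pubkey`.
type Key = u32;

#[derive(Debug, PartialEq)]
pub enum ModelError {
    InvalidAuthority,
}

/// Toy model of one manager: a single authority and per-key reference counts.
pub struct RefcountModel {
    authority: Key,
    refs: HashMap<Key, usize>,
}

impl RefcountModel {
    pub fn new(authority: Key) -> Self {
        Self { authority, refs: HashMap::new() }
    }

    /// Checkout: bumps the count of every requested key, inserting missing keys at 1.
    pub fn reserve(&mut self, authority: Key, keys: &[Key]) -> Result<(), ModelError> {
        if authority != self.authority {
            return Err(ModelError::InvalidAuthority);
        }
        for key in keys {
            *self.refs.entry(*key).or_insert(0) += 1;
        }
        Ok(())
    }

    /// Decrements counts without underflow; unknown or already unreserved keys are left alone.
    pub fn release(&mut self, keys: &[Key]) {
        for key in keys {
            if let Some(count) = self.refs.get_mut(key) {
                *count = count.saturating_sub(1);
            }
        }
    }

    /// True when no key holds a reservation, i.e. the table could move to the released set.
    pub fn is_releasable(&self) -> bool {
        self.refs.values().all(|count| *count == 0)
    }

    /// Current reference count for `key`, if the key is tracked at all.
    pub fn refcount(&self, key: Key) -> Option<usize> {
        self.refs.get(&key).copied()
    }
}
```

The model makes one property easy to see: a table becomes releasable only after every overlapping requester has released its keys, so a missing release keeps the table active indefinitely.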
+ +### `LookupTableRc` + +`LookupTableRc` is the low-level table state enum: + +- `Active` stores the derived table authority keypair, table address, ref-counted pubkeys, creation slot/sub-slot, init signature, extend signatures, and an `extendable` flag; +- `Deactivated` stores the derived authority, table address, deactivation slot, and deactivate signature. + +Important methods include `init`, `extend`, `extend_respecting_capacity`, `reconcile_with_chain`, `deactivate`, `is_deactivated_on_chain`, `close`, `is_closed`, `get_meta`, `get_chain_pubkeys`, and `get_chain_pubkeys_for`. + +`MAX_ENTRIES_AS_PART_OF_EXTEND` is currently `24`, based on transaction-size constraints with compute-budget instructions. Do not increase it without validating transaction fit and compute behavior. + +### Compute budgets and errors + +`TableManiaComputeBudgets::default()` supplies per-operation budgets for init, extend, deactivate, and close. Each table transaction prepends both `set_compute_unit_limit` and `set_compute_unit_price` instructions. `TableManiaError` wraps `MagicBlockRpcClientError` and preserves signatures where possible for diagnostic log/CU lookups. + +## Runtime flows + +### Committor ALT preparation flow + +```text +DeliveryPreparator::prepare_for_delivery + -> prepare_lookup_tables + -> TableMania::reserve_pubkeys + -> TableMania::try_get_active_address_lookup_table_accounts + -> VersionedMessage compilation uses returned AddressLookupTableAccount values + -> DeliveryPreparator::cleanup releases lookup-table pubkeys +``` + +`DeliveryPreparator` currently waits up to 50 seconds for local table matches and up to 50 seconds for finalized remote table matches. The remote wait is intentional: lookup table create/extend transactions must be finalized before the resulting ALT accounts are usable in base-layer transactions. + +### Reservation and table creation/extension + +1. `reserve_pubkeys` first tries to reserve keys already present in active tables. +2. 
Missing keys go through `reserve_new_pubkeys` under manager-level active-table locks. +3. The manager tries to extend the last active, not-full table before creating a new table. +4. Each create/extend writes at most `MAX_ENTRIES_AS_PART_OF_EXTEND` keys per transaction. +5. Successful low-level table operations insert local keys and set initial refcounts. +6. If an existing table extension fails, the manager reconciles local state with chain and either retries, marks the table non-extendable and creates a new table for invalid-instruction-data failures, or returns the error after the retry budget is exhausted. + +Locking is part of correctness here. The write lock on `active_tables` prevents concurrent tasks from creating/extending the same table in conflicting ways. Avoid awaiting under locks unless the lock is deliberately serializing table mutation. + +### Fetching finalized lookup table accounts + +1. `try_get_active_address_lookup_table_accounts` waits until every requested key is present in local active tables. +2. For each matching table, it records the latest local update send time for the matched keys. +3. Before the first finalized remote fetch, it delays based on an estimated finalization depth: 32 slots at 400 ms plus a 200 ms buffer. +4. It polls `get_multiple_accounts_with_commitment(..., CommitmentConfig::finalized(), ...)` until every matched table exists remotely and contains every locally matched key. +5. It returns `AddressLookupTableAccount` values populated from the finalized remote table contents. + +Do not relax finalized commitment in this flow unless the consuming transaction path is explicitly changed and tested; non-finalized ALT updates may not be usable by later base-layer transactions. + +### Release and garbage collection + +1. `release_pubkeys` decrements local refcounts for all released keys. +2. While holding the active-table write lock, it drains active tables and moves tables with no remaining reservations to `released_tables`. 
+3. If configured, the background GC periodically calls `deactivate_tables` and `close_tables`. +4. Deactivation sends a table-deactivate transaction and converts the `LookupTableRc` to `Deactivated` with the current slot. +5. Closing first checks the Solana deactivation window using `MAX_ENTRIES` slot hashes, then sends a close transaction and verifies the account no longer exists. +6. Close errors are retried on later GC cycles; if a close error indicates a prior close may have landed, `is_closed` salvages the state so the table can be removed from `released_tables`. + +The GC task logs errors instead of bubbling them because there is no request context to fail. Future edits must preserve retry-on-next-cycle behavior or replace it with an explicit operational recovery path. + +## Important internals and caveats + +### Derived authorities and table addresses + +Table authority keypairs are deterministically derived from the validator authority, slot, and sub-slot by signing the seed material and hashing the signature. This lets `find_open_tables` recompute candidate table addresses later. `create_new_table_and_extend` increments a process-wide `SUB_SLOT` when multiple tables are created in the same slot. + +The `randomize_lookup_table_slot` feature, or `RANDOMIZE_LOOKUP_TABLE_SLOT` without the feature, randomizes the sub-slot source to avoid table-address collisions in parallel tests. Integration test crates enable the feature. + +### Local state may be ahead of finalized chain state + +A table create/extend can be locally recorded immediately after `send_transaction` returns successfully, but the finalized table account may lag. `latest_update_sent_at` and the remote-readiness wait are designed to avoid fetching too early. Removing this delay can cause transaction compilation/delivery failures that are difficult to reproduce. + +### Refcounts and `ensure_pubkeys_table` + +`reserve_pubkeys` represents a checkout and increments refs for existing keys. 
`ensure_pubkeys_table` is intended for existence checks: it does not increase refs for already-present keys, but newly inserted keys are created through the same low-level path and therefore start with a refcount of 1 in the current implementation. Preserve the tested behavior unless changing the public contract and tests together. + +### Metrics + +The crate currently increments: + +- `table_mania_a_count` before finalized `getMultipleAccounts` polling in remote table readiness; +- `table_mania_closed_a_count` before `get_account` checks used to verify table closure. + +Metric names and cardinality are operator-facing. Prefer adding new metrics in `magicblock-metrics` rather than ad-hoc instrumentation. + +## Important invariants + +1. A `TableMania` instance must use exactly one authority. Mutating methods must reject a different authority. +2. Lookup table create/extend/deactivate/close transactions must be signed by the validator authority and the deterministic derived table authority where required by the ALT program. +3. No create or extend transaction may include more than `MAX_ENTRIES_AS_PART_OF_EXTEND` pubkeys. +4. A deactivated table must never be extended. +5. Refcounts must not underflow; releasing an unreserved key must be a no-op. +6. Tables may move from active to released only when all contained pubkeys have zero reservations. +7. Returned `AddressLookupTableAccount`s must reflect finalized remote table contents containing all requested keys. +8. Extension failure handling must preserve local/remote consistency by reconciling chain state before retrying or falling back to a new table. +9. GC must tolerate transient RPC or chain errors and retry deactivate/close work later. +10. Changes must avoid unnecessary RPC calls and lock contention on the settlement preparation path. 
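Invariants 5 and 6 above can be illustrated with a minimal, std-only sketch. This is not the crate's `RefcountedPubkeys` implementation: the type `RefcountSketch`, its method names, and the use of string keys instead of `Pubkey`s are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical, std-only stand-in for a refcounted key set.
/// Keys are strings here; the real crate tracks `Pubkey`s.
#[derive(Default)]
pub struct RefcountSketch {
    refs: HashMap<String, u64>,
}

impl RefcountSketch {
    /// Track a key that was just written to a table. It starts at 1;
    /// calling this for an already tracked key changes nothing.
    pub fn insert_new(&mut self, key: &str) {
        self.refs.entry(key.to_string()).or_insert(1);
    }

    /// Check out an existing key. Returns false if the key is not tracked.
    pub fn reserve(&mut self, key: &str) -> bool {
        match self.refs.get_mut(key) {
            Some(count) => {
                *count += 1;
                true
            }
            None => false,
        }
    }

    /// Release one reservation. Unknown keys and keys already at zero
    /// are left untouched, so the count can never underflow.
    pub fn release(&mut self, key: &str) {
        if let Some(count) = self.refs.get_mut(key) {
            *count = count.saturating_sub(1);
        }
    }

    /// Current count for a key, or `None` if the key is not tracked.
    pub fn count(&self, key: &str) -> Option<u64> {
        self.refs.get(key).copied()
    }

    /// A table may move to the released set only when this is false.
    pub fn has_reservations(&self) -> bool {
        self.refs.values().any(|count| *count > 0)
    }
}
```

The saturating decrement is one way to make an over-release harmless. The real code may enforce the same invariant differently, so check `src/lookup_table_rc.rs` before relying on this shape.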
+ +## Common change areas and what to inspect + +### Changing reservation, release, or concurrency behavior + +Start with `magicblock-table-mania/src/manager.rs` (`reserve_pubkeys`, `reserve_new_pubkeys`, `extend_table`, `release_pubkeys`) and `src/lookup_table_rc.rs` (`RefcountedPubkeys`, `reserve_pubkey`, `release_pubkey`, `has_reservations`). Then inspect `test-integration/test-table-mania/tests/ix_reserve_pubkeys.rs`, `ix_release_pubkeys.rs`, and `ix_ensure_pubkey_table.rs`. + +Check for duplicate table creation, missing releases, refcount changes for overlapping requests, and awaits while holding locks. + +### Changing ALT transaction construction + +Inspect `src/lookup_table_rc.rs` methods `init`, `extend`, `deactivate`, and `close`, plus `src/compute_budget.rs`. Also inspect `magicblock-rpc-client` send-confirmation behavior because `LookupTableRc::get_send_transaction_config` uses processed confirmation for processed RPC clients and committed confirmation for confirmed/finalized clients. + +Validate transaction fit, signer lists, instruction indexes, compute unit limits, priority fees, and error classification. The invalid-instruction-data fallback assumes the extend instruction is at index `2` in existing-table extend transactions. + +### Changing remote readiness or timeouts + +Inspect `try_get_active_address_lookup_table_accounts`, `remote_table_finalization_delay`, and `DeliveryPreparator::prepare_lookup_tables`. Preserve finalized commitment unless intentionally changing the base-layer transaction delivery contract. + +### Changing table cleanup + +Inspect `release_pubkeys`, `launch_garbage_collector`, `deactivate_tables`, `close_tables`, and `LookupTableRc::close`. If changing close behavior, run or reason about the long `test_table_close` feature path; table deactivation can take minutes. 
+ +### Changing derived-address behavior or table discovery + +Inspect `src/derive_keypair.rs`, `LookupTableRc::derive_keypair`, `create_new_table_and_extend`, and `src/find_tables.rs`. Any change affects ability to discover tables by slot/sub-slot and may strand existing tables. + +## Tests and validation + +For documentation-only changes, verify the guide path and cross-references are correct: + +```bash +test -f .agents/context/crates/magicblock-table-mania.md +rg "magicblock-table-mania.md" AGENTS.md .agents/context/crate-map.md +``` + +For crate code changes, run targeted unit tests first: + +```bash +cargo fmt +cargo nextest run -p magicblock-table-mania +``` + +For behavior touching ALTs on a validator, run the integration suite: + +```bash +cd test-integration +make test-table-mania +``` + +For close/deactivation changes, also consider the long feature-gated close path: + +```bash +cd test-integration +cargo test -p test-table-mania --features test_table_close -- --test-threads=1 --nocapture +``` + +Because this crate is on the committor settlement path, changes that affect reservation, remote readiness, compute budgets, or RPC behavior should also run relevant committor preparation/delivery tests, for example: + +```bash +cd test-integration +make test-committor-preparators +``` + +Before handing off Rust behavior changes, run the broader baseline from `.agents/rules/testing-and-validation.md` when time allows: + +```bash +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Always report whether performance-sensitive settlement preparation behavior was measured or only reasoned about. + +## Related docs + +- `.agents/specs/validator-specification.md` for base-layer settlement, committor, and address lookup table context. +- `.agents/context/architecture.md` for the base-layer settlement crate boundary. +- `.agents/context/crate-map.md` for crate ownership and consumers. 
+- `.agents/rules/testing-and-validation.md` for workspace and integration validation expectations. +- `.agents/context/crates/magicblock-rpc-client.md` for the RPC wrapper used by all table transactions and remote reads. +- `test-integration/test-table-mania/` for integration coverage of table lifecycle and reservation behavior. +- `magicblock-committor-service/src/transaction_preparator/delivery_preparator.rs` for the main runtime call site. diff --git a/.agents/context/crates/magicblock-task-scheduler.md b/.agents/context/crates/magicblock-task-scheduler.md new file mode 100644 index 000000000..86bf5e395 --- /dev/null +++ b/.agents/context/crates/magicblock-task-scheduler.md @@ -0,0 +1,300 @@ +# `magicblock-task-scheduler` + +## Purpose + +`magicblock-task-scheduler` is the validator-side service that turns Magic Program scheduled-task requests into recurring local crank transactions. Programs schedule or cancel tasks during normal ER execution; the processor forwards those `TaskRequest`s to this crate, which persists them in SQLite, delays them until their next execution time, and submits validator-signed crank transactions back through the validator RPC endpoint. + +High-level responsibilities: + +- persist scheduled task definitions and failure records in `task_scheduler.sqlite`; +- load and reschedule persisted tasks when the primary validator starts; +- receive `ScheduleTaskRequest` and `CancelTaskRequest` values from the transaction executor channel; +- maintain an in-memory `DelayQueue` for due tasks, including replacement/cancellation state; +- submit crank transactions that call Magic Program `ExecuteTask` with validator authority and crank signer accounts; +- record execution success, final completion, retryable failures, permanent failures, and failed scheduling records; +- periodically clean up old failed scheduling/execution records according to validator config. 
+ +This crate sits on the scheduled-task/crank path and can affect transaction execution latency indirectly by how quickly it drains executor-produced task requests and how much RPC/SQLite work it performs. It is also persistence-sensitive: task definitions and failure records survive restart unless `task_scheduler.reset` is enabled. + +## Update requirement + +Update this guide in the same change whenever behavior or contracts in `magicblock-task-scheduler` change. In particular, update it for changes to: + +- public exports in `src/lib.rs`, `SchedulerDatabase`, `TaskSchedulerService`, or `TaskSchedulerError`; +- SQLite schema, database path, WAL/PRAGMA settings, retention behavior, optimistic `updated_at` concurrency, or restart recovery semantics; +- scheduling/cancellation authorization, interval clamping, iteration handling, retry/backoff policy, or stale completion handling; +- crank transaction layout, signer requirements, Magic Program instruction construction, blockhash source, send configuration, or RPC endpoint selection; +- startup/shutdown wiring in `magicblock-api`, primary/replica gating, cancellation handling, or draining of in-flight crank completions; +- `TaskSchedulerConfig` fields/defaults/env keys or README configuration examples; +- task scheduler unit tests, integration-test setup, or validation commands. + +Because this crate consumes task requests emitted by Magic Program execution, also update this file when `magicblock-program`, `magicblock-magic-program-api`, `magicblock-core`, or `magicblock-processor` changes `ScheduleTask`, `CancelTask`, `ExecuteTask`, `TaskRequest`, `ExecutionTlsStash`, or the scheduled-task channel semantics. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-task-scheduler/Cargo.toml` | Package metadata and dependencies on config, core channels, ledger latest blockhash, Magic Program instruction helpers, Solana RPC/transaction crates, Tokio, and SQLite. 
| +| `magicblock-task-scheduler/README.md` | Operator-facing overview, `[task-scheduler]` config example, and performance notes. Keep it aligned with config and service behavior. | +| `magicblock-task-scheduler/src/lib.rs` | Public crate surface. Re-exports `SchedulerDatabase`, `TaskSchedulerError`, and `TaskSchedulerService`. | +| `magicblock-task-scheduler/src/db.rs` | SQLite persistence layer, task/failure record types, task serialization, optimistic concurrency tokens, and batch crank-completion transaction. | +| `magicblock-task-scheduler/src/service.rs` | Runtime service, startup recovery, request processing, delay queue, crank send batching, retry/backoff, cleanup ticker, and cancellation handling. | +| `magicblock-task-scheduler/src/errors.rs` | `TaskSchedulerError` and `TaskSchedulerResult`. Wraps SQLite, bincode, RPC, I/O, invalid config, and unauthorized replacement errors. | +| `magicblock-config/src/config/scheduler.rs` | `TaskSchedulerConfig` (`reset`, `min_interval`, failed-record retention and cleanup interval). | +| `magicblock-api/src/magic_validator.rs` | Constructs the service at startup and starts it only after the validator leaves `StartingUp` in primary mode. | +| `magicblock-core/src/link/transactions.rs` | Defines `ScheduledTasksTx`/`ScheduledTasksRx`, the channel carrying `TaskRequest`s from executor to service. | +| `magicblock-processor/src/executor/processing.rs` | Drains `ExecutionTlsStash` after transaction execution and sends scheduled-task requests to this crate. | +| `programs/magicblock/src/schedule_task/` | Magic Program processors that validate schedule/cancel/execute-task instructions and register `TaskRequest`s. | +| `test-integration/test-task-scheduler/` | Integration tests for scheduling, cancellation, rescheduling, signing, schedule errors, unauthorized reschedule, crank signer use, and scheduled-commit interaction. 
| + +Main consumers: + +- `magicblock-api`, which owns construction/startup and passes the local aperture HTTP URL as the RPC endpoint; +- `magicblock-processor`, which sends task requests after successful instruction execution through `ScheduledTasksTx`; +- Magic Program task instructions, which define the wire/request semantics this crate persists and executes; +- task scheduler integration tests, which inspect the SQLite database directly through `SchedulerDatabase`. + +## Public API shape / Main public types and APIs + +### Crate exports + +`src/lib.rs` exposes: + +- `pub mod db`, `pub mod errors`, and `pub mod service`; +- `pub use db::SchedulerDatabase`; +- `pub use errors::TaskSchedulerError`; +- `pub use service::TaskSchedulerService`. + +### `SchedulerDatabase` + +`SchedulerDatabase` wraps a single `rusqlite::Connection` in `Arc<Mutex<Connection>>`. It is cloneable, but all DB operations serialize through that mutex. + +Important API: + +- `SchedulerDatabase::path(path)` returns `path.join("task_scheduler.sqlite")`; +- `new(path)` opens SQLite, enables WAL, `synchronous=NORMAL`, `busy_timeout=5000`, and a larger page cache, then creates `tasks`, `failed_scheduling`, and `failed_tasks` tables if missing; +- `insert_task`, `get_task`, `get_tasks`, `get_task_ids`, `remove_task`, and `unschedule_task` manage scheduled task rows; +- `insert_failed_scheduling`, `insert_failed_task`, `get_failed_schedulings`, and `get_failed_tasks` manage diagnostic failure records; +- `apply_crank_batch_completion(...)` atomically applies one batch of success updates, success removals, failed moves, and retry checks using optimistic `tasks.updated_at` tokens; +- `delete_failed_records_older_than(cutoff)` removes old rows from both failure tables in one transaction. + +`DbTask` is the persisted runtime task shape.
It stores task IDs and timestamps as `i64`, serializes the task's instruction `Vec` with `bincode`, stores authority as a stringified `Pubkey`, and uses `executions_left`, `last_execution_millis`, and `updated_at` to drive future scheduling. + +### `TaskSchedulerService` + +`TaskSchedulerService::new(path, config, rpc_url, scheduled_tasks, block, slot_interval, token)` creates the service. It may remove the DB file first when `config.reset` is true, validates that `config.min_interval` fits in `u32::MAX` milliseconds, opens the database, and constructs a nonblocking Solana `RpcClient` pointed at `rpc_url`. + +`start(self)` loads persisted tasks and spawns the main Tokio task, returning the task's `JoinHandle`. + +Internally the service owns: + +- the SQLite database; +- a `ScheduledTasksRx` channel from the processor; +- a `LatestBlock` handle for crank transaction blockhashes; +- a `DelayQueue` plus `task_queue_keys` for cancellation/replacement; +- `task_versions` keyed by task ID to reject stale in-flight completions; +- retry counters and exponential backoff state; +- a cancellation token and failed-record cleanup interval; +- an `AtomicU64` counter used to create unique noop instructions in crank transactions. + +The type has manual `unsafe impl Send`/`Sync` with an explicit safety comment: the service is moved into one Tokio task by `start()` and is not cloned. Do not make it shared/mutated from multiple tasks without revisiting this assumption. + +### Errors + +`TaskSchedulerError` wraps invalid configuration, SQLite, bincode, Solana RPC client errors, I/O, unauthorized task replacement, and a currently unused `SizeMismatch` variant. Schedule/cancel processing errors are normally recorded as failed scheduling and treated as recoverable; service-level failures returned from the main loop cause `magicblock-api` to log and exit the process.
+ +## Runtime flows + +### Startup and primary-mode gating + +```text +MagicValidator::try_new + -> SchedulerDatabase::path(storage parent) + -> TaskSchedulerService::new(..., aperture HTTP URL, ScheduledTasksRx, LatestBlock, slot interval) +MagicValidator::start + -> wait until CoordinationMode != StartingUp + -> if Primary: tokio::spawn(task_scheduler.start()) + -> if Replica: do not start task scheduler +``` + +On `start()`, `load_persisted_tasks` reads all rows from `tasks`, removes invalid rows (`execution_interval_millis <= 0`, `>= u32::MAX`, or `executions_left <= 0`), and inserts valid rows into the delay queue. Restarted tasks are delayed until the later of their next scheduled time and two slot intervals. That two-slot minimum avoids cranking before the validator has produced a fresh blockhash after restart. + +### Schedule request flow + +```text +Magic Program ScheduleTask instruction + -> ExecutionTlsStash::register_task(TaskRequest::Schedule) + -> executor process_scheduled_tasks sends over ScheduledTasksTx + -> TaskSchedulerService::process_schedule_request + -> SQLite upsert + DelayQueue insert +``` + +Processing details: + +1. Invalid intervals are ignored by the service. The Magic Program also validates intervals, but the service keeps this guard for persisted/channel inputs. +2. Valid intervals are clamped to at least `config.min_interval` and at most `u32::MAX` milliseconds. +3. If the task ID already exists, only the same authority may replace it; a different authority records a failed scheduling row and leaves the original task intact. +4. `insert_task` writes a monotonic `updated_at` token, replacing any existing row. +5. The service removes any queued old instance, clears retry state, records the new version, and inserts the replacement task with zero delay so it can run immediately. 
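Steps 1 to 3 of the processing details can be sketched as two small pure functions. This is a rough illustration under stated assumptions, not the service code: the function names, the `i64`/`u64` parameter types, the use of strings for authorities, and the choice to treat only non-positive intervals as invalid are all hypothetical.

```rust
/// Upper bound named in the guide: intervals are capped at `u32::MAX` ms.
const MAX_INTERVAL_MILLIS: u64 = u32::MAX as u64;

/// Returns the interval the scheduler would use, or `None` when the
/// request is ignored. Assumption: "invalid" means non-positive here.
pub fn effective_interval_millis(
    requested_millis: i64,
    min_interval_millis: u64,
) -> Option<u64> {
    if requested_millis <= 0 {
        return None;
    }
    let requested = requested_millis as u64;
    // Raise to the configured floor first, then cap at the ceiling.
    Some(requested.max(min_interval_millis).min(MAX_INTERVAL_MILLIS))
}

/// Replacement rule: a new task ID is always accepted; an existing
/// task ID may be replaced only by the same authority.
pub fn may_replace(existing_authority: Option<&str>, requester: &str) -> bool {
    match existing_authority {
        None => true,
        Some(owner) => owner == requester,
    }
}
```

Applying the floor before the ceiling keeps the result within `u32::MAX` even if the configured minimum were misconfigured; the guide states that `TaskSchedulerService::new` already rejects such a configuration.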
+ +### Cancel request flow + +```text +Magic Program CancelTask instruction + -> ExecutionTlsStash::register_task(TaskRequest::Cancel) + -> executor sends over ScheduledTasksTx + -> TaskSchedulerService::process_cancel_request + -> remove runtime state and SQLite row when authority matches +``` + +If the task is missing, runtime queue/retry state is cleaned and the request succeeds. If the authority does not match the persisted task authority, the service logs and returns success without removing the task. This mirrors the service's defensive behavior; signer validation happens in the Magic Program. + +### Crank execution flow + +1. The main loop waits for `DelayQueue` expirations. +2. When one task expires, it drains all currently expired tasks in the same tick into one batch. +3. The service spawns a Tokio task to send the batch so the main loop can continue receiving schedule/cancel requests and cleanup ticks. +4. `send_crank_batch` reads the latest blockhash, then uses a `JoinSet` to send one transaction per task concurrently. +5. Each transaction includes a Magic Program noop instruction with a unique counter and an `execute_task_instruction(task.authority, task.instructions.clone())` instruction, signed by `validator_authority()` with `validator_authority_id()` as payer. +6. The batch result is sent back over an internal unbounded channel. +7. `on_crank_batch_completed` prepares success/failure DB mutations, applies them through `apply_crank_batch_completion`, then updates the delay queue only for rows whose optimistic `updated_at` token still matches. + +A successful first execution anchors `last_execution_millis` at completion time; recurring executions preserve fixed-rate cadence by adding the interval to the previous `last_execution_millis`. Overdue recurring executions are requeued with zero delay. + +### Failure, retry, and stale completion flow + +- Only `TaskSchedulerError::Rpc(_)` is retryable for crank execution. 
+- Retryable failures use exponential backoff based on `max(slot_interval, 100ms)`, capped at 5 seconds, for at most 10 retries. +- Non-retryable failures and exhausted retries delete the task from `tasks` and insert a `failed_tasks` row. +- Stale in-flight completions are ignored using both SQLite `updated_at` checks and in-memory `task_versions`. This protects task replacement/cancellation that races with an already spawned crank send. +- On cancellation-token shutdown, the service breaks the select loop, drops the internal sender, and drains completed crank batches still present on the internal receiver before returning. + +### Failed-record cleanup flow + +The service creates a Tokio interval from `failed_task_cleanup_interval.max(1ms)` with `MissedTickBehavior::Delay`. On each tick it computes `now - failed_task_retention` and deletes older rows from both `failed_scheduling` and `failed_tasks` in a single transaction. Cleanup failures are logged and do not stop the service. + +## Important internals and caveats + +### SQLite persistence and optimistic concurrency + +The task row's `updated_at` is a version token as well as a timestamp. `insert_task` ensures replacement tokens are monotonic even when the system clock does not advance. Batch completion updates/deletes include `WHERE id = ? AND updated_at = ?`; if a row changed during an in-flight crank send, completion maps omit that task and runtime state is left untouched. + +Do not replace batch completion with per-task commits without considering throughput and race behavior. The current one-transaction batch is intentional. + +### Unbounded channels and concurrent sends + +The service uses the processor's unbounded scheduled-task channel and an internal unbounded crank-completion channel. Crank sends inside a batch are parallelized with a `JoinSet` and are currently not explicitly bounded beyond the number of tasks that expire together. 
Heavy scheduled-task workloads can therefore amplify RPC sends; preserve or improve this behavior carefully and report performance risk when changing it. + +### Validator authority and crank signer assumptions + +Crank transactions are built with `validator_authority()` and include Magic Program `ExecuteTask` instruction helpers that derive the required crank signer PDA from task authority. The Magic Program verifies validator/crank signer constraints. Do not change signer layout or payer selection in this crate without checking `programs/magicblock/src/schedule_task/process_execute_task.rs` and integration tests such as `test_use_crank_signer.rs`. + +### Primary-only execution + +`magicblock-api` starts the task scheduler only in primary mode. Replica behavior must remain intentional: replicas should not independently crank scheduled tasks unless the validator lifecycle/coordination model is explicitly changed. + +## Important invariants + +1. Persisted task rows must remain recoverable across restart unless `task_scheduler.reset` removes the database file. +2. Invalid or completed persisted tasks must be removed on startup, not requeued forever. +3. A task ID may be replaced only by the same authority; unauthorized replacement must not mutate the existing task. +4. Cancel requests must remove a task only when the persisted authority matches the cancel authority. +5. `execution_interval_millis` must remain in the valid Magic Program/service range and must be clamped to `config.min_interval` for runtime scheduling. +6. `updated_at` tokens must be preserved on queued/in-flight `DbTask`s and checked before applying completion state. +7. Stale crank completions must not mutate a replacement or resurrect a cancelled task. +8. Successful recurring tasks must decrement `executions_left`, update `last_execution_millis`, and preserve fixed-rate cadence. +9. Final successful executions, unretryable failures, and exhausted retries must remove the active task row. +10. 
Retryable RPC failures must use bounded backoff and must not busy-loop the delay queue. +11. Crank transactions must use a fresh/latest blockhash and validator authority signer from the local validator context. +12. Shutdown must drain already completed crank batch results from the internal channel before returning. +13. Changes must avoid unnecessary SQLite transactions, long-held mutexes, unbounded logging, and RPC amplification on the scheduled-task path. + +## Common change areas and what to inspect + +### Changing schedule/cancel semantics + +Start with `magicblock-task-scheduler/src/service.rs` (`process_request`, `process_schedule_request`, `process_cancel_request`) and `src/db.rs` (`insert_task`, `remove_task`, `get_task`). Then inspect `programs/magicblock/src/schedule_task/process_schedule_task.rs`, `process_cancel_task.rs`, and `magicblock-magic-program-api/src/args.rs`. + +Validate authority checks, invalid intervals, iterations, task replacement, cancellation races, and failure-record behavior. Integration tests to inspect include `test_schedule_task.rs`, `test_reschedule_task.rs`, `test_cancel_ongoing_task.rs`, `test_schedule_error.rs`, and `test_unauthorized_reschedule.rs`. + +### Changing crank transaction construction or send behavior + +Start with `send_crank_batch`, `on_crank_batch_completed`, and `programs/magicblock/src/utils/instruction_utils.rs` for `execute_task_instruction`. Inspect Magic Program execute-task validation in `programs/magicblock/src/schedule_task/process_execute_task.rs`. + +Check validator authority, crank signer PDA, noop uniqueness, blockhash source, payer, transaction signing, RPC endpoint, retry classification, and concurrency. Run or inspect `test_schedule_magic_cpi_crank.rs`, `test_schedule_task_signed.rs`, and `test_use_crank_signer.rs`. + +### Changing persistence, recovery, or schema + +Start with `magicblock-task-scheduler/src/db.rs` and `load_persisted_tasks` in `service.rs`. 
Also inspect `magicblock-api/src/magic_validator.rs` for the database path and `magicblock-config/src/config/scheduler.rs` for reset/retention config. + +Schema changes need migration/recovery thought; the current code only creates missing tables and does not run versioned migrations. Preserve bincode compatibility for stored `Vec` or add an explicit migration/compatibility path. + +### Changing retry/backoff or cleanup + +Inspect constants in `service.rs` (`MAX_TASK_EXECUTION_RETRIES`, `TASK_EXECUTION_RETRY_BASE_DELAY`, `TASK_EXECUTION_RETRY_MAX_DELAY`), `is_retryable_task_execution_error`, `task_execution_retry_delay`, `prepare_crank_failure_outcome`, `apply_crank_failure_outcome`, and the failed-record cleanup select branch. + +Check that transient RPC failures do not delete tasks too eagerly, permanent errors do not retry forever, and cleanup cannot stop the scheduler. + +### Changing startup/shutdown or mode behavior + +Inspect `TaskSchedulerService::start`, `run`, and `magicblock-api/src/magic_validator.rs` around task scheduler initialization and primary-mode gating. Preserve cancellation-token handling and the behavior that scheduler startup failures cause the validator process to exit rather than silently running without task cranking. 
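When reviewing retry/backoff changes, it can help to have the stated policy written as arithmetic: a base of `max(slot_interval, 100ms)`, a 5 second cap, and at most 10 retries. The sketch below assumes the delay doubles on each retry and that attempts are numbered from 1; neither detail is confirmed by this guide. The constant and function names are hypothetical and are not the ones in `service.rs`.

```rust
use std::time::Duration;

// Values taken from the guide; names are illustrative only.
const MAX_RETRIES: u32 = 10;
const BASE_FLOOR: Duration = Duration::from_millis(100);
const MAX_DELAY: Duration = Duration::from_secs(5);

/// Delay before retry number `attempt` (1-based), or `None` once the
/// retry budget is exhausted. Assumes the delay doubles per attempt.
pub fn retry_delay(slot_interval: Duration, attempt: u32) -> Option<Duration> {
    if attempt == 0 || attempt > MAX_RETRIES {
        return None;
    }
    let base = slot_interval.max(BASE_FLOOR);
    // 2^(attempt - 1); the shift stays below 32 because attempt <= 10.
    let factor = 1u32 << (attempt - 1);
    // Fall back to the cap on overflow, then apply the cap.
    Some(base.checked_mul(factor).unwrap_or(MAX_DELAY).min(MAX_DELAY))
}
```

Under these assumptions a 400 ms slot interval yields 400 ms, 800 ms, 1.6 s, 3.2 s, and then the 5 s cap for every later attempt until the budget runs out.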
+ +## Tests and validation + +For documentation-only changes, verify paths and cross-references: + +```bash +test -f .agents/context/crates/magicblock-task-scheduler.md +grep -n "magicblock-task-scheduler.md" .agents/context/crate-map.md AGENTS.md +``` + +For code changes in this crate, run targeted unit tests first: + +```bash +cargo fmt +cargo nextest run -p magicblock-task-scheduler +``` + +For config changes, also run: + +```bash +cargo nextest run -p magicblock-config task_scheduler +``` + +For runtime behavior changes, run the integration suite that starts validators: + +```bash +cd test-integration +make test-task-scheduler +``` + +For isolated debugging, use the workflow in `.agents/rules/testing-and-validation.md`: + +```bash +cd test-integration +make setup-task-scheduler-devnet +# in another terminal, run a focused cargo nextest command in test-task-scheduler +``` + +Before handing off Rust changes, run the broader baseline when practical: + +```bash +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance-sensitive changes should report whether concurrency, SQLite transaction count, lock hold time, RPC send volume, or scheduler startup/recovery latency was measured or only reasoned about. + +## Related docs + +- `AGENTS.md` for repository-wide agent rules and the requirement to keep `.agents/` current. +- `.agents/specs/validator-specification.md` for Magic Program scheduled tasks in the broader validator model and startup/shutdown expectations. +- `.agents/context/architecture.md` for background service boundaries and validator startup flow. +- `.agents/context/crate-map.md` for crate ownership and dependency discovery. +- `.agents/rules/testing-and-validation.md` for task scheduler integration commands and validation reporting. +- `.agents/context/crates/magicblock-api.md` for validator orchestration/startup responsibilities. 
+- `.agents/context/crates/magicblock-config.md` for config loading and env/TOML behavior. +- `.agents/context/crates/magicblock-core.md` for shared channels and `ExecutionTlsStash`/link responsibilities. +- `.agents/context/crates/magicblock-magic-program-api.md` for task request and Magic Program instruction wire types. +- `magicblock-task-scheduler/README.md` for operator-facing scheduler configuration and performance notes. +- `test-integration/test-task-scheduler/` for end-to-end scheduled-task behavior. diff --git a/.agents/context/crates/magicblock-validator-admin.md b/.agents/context/crates/magicblock-validator-admin.md new file mode 100644 index 000000000..a12130199 --- /dev/null +++ b/.agents/context/crates/magicblock-validator-admin.md @@ -0,0 +1,245 @@ +# `magicblock-validator-admin` + +## Purpose + +`magicblock-validator-admin` contains small operator/admin helpers that the validator uses to manage base-layer validator state. Its current implemented responsibility is claiming accrued Delegation Program validator fees from the validator fees vault. + +High-level responsibilities: + +- expose `claim_fees(url)` for one-shot validator fee claiming against a base-layer RPC endpoint; +- expose `ClaimFeesTask` for periodic fee-claim attempts owned by `magicblock-api::magic_validator::MagicValidator`; +- construct and sign the Delegation Program `validator_claim_fees` instruction using the validator authority from `magicblock-program`; +- avoid submitting fee-claim transactions when the fees vault balance is below the local minimum threshold; +- provide cooperative startup/shutdown behavior for the periodic Tokio task. + +This crate sits on validator startup, background administration, and base-layer transaction paths. It is not part of per-transaction ER execution, but changes can affect operator cost collection, base-layer RPC load, startup latency, and graceful shutdown. 
+ +## Update requirement + +Update this guide in the same change whenever behavior or contracts in `magicblock-validator-admin` change. In particular, update it for changes to: + +- public exports in `src/lib.rs` or `src/claim_fees.rs`; +- `ClaimFeesTask` lifecycle, cancellation, tick scheduling, duplicate-start behavior, or shutdown timeout; +- `claim_fees` RPC commitment, fee-vault derivation, threshold, signer, payer, instruction construction, or error mapping; +- use of `magicblock-program::validator::validator_authority()` or Delegation Program APIs; +- startup/shutdown wiring in `magicblock-api`, especially `chain_operation.claim_fees_frequency` gating; +- configuration docs or defaults that affect fee claiming; +- tests, integration setup, or validation commands for validator fee claiming. + +Because this crate sends operator/admin transactions to the base layer, also update this file when another crate changes validator authority initialization, Delegation Program fee-vault semantics, or how `MagicValidator` starts/stops admin background work. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-validator-admin/Cargo.toml` | Package metadata and dependencies on Delegation Program APIs, Magic Program validator authority helpers, Solana RPC/transaction crates, Tokio, cancellation tokens, and `magicblock-rpc-client` error types. | +| `magicblock-validator-admin/src/lib.rs` | Public crate surface. Currently exports only `pub mod claim_fees`. | +| `magicblock-validator-admin/src/claim_fees.rs` | Fee-claim implementation, `ClaimFeesTask`, periodic loop, minimum claim threshold, one-shot RPC transaction construction, and error mapping. | +| `magicblock-api/src/magic_validator.rs` | Main consumer. Owns `ClaimFeesTask`, calls one-shot `claim_fees` during on-chain setup, starts periodic claims for standalone validators, and stops the task during shutdown. 
| +| `magicblock-config/src/config/chain.rs` | Defines `ChainOperationConfig::claim_fees_frequency`; zero disables fee claiming and the default is 24 hours. | +| `config.example.toml` | Operator-facing `[chain-operation] claim-fees-frequency` example and environment variable name. | +| `test-integration/test-magicblock-api/tests/test_claim_fees.rs` | Integration coverage for instruction creation, `ClaimFeesTask` construction/defaults, fee-vault funding, direct fee-claim transaction, and RPC connectivity. | + +Main consumers: + +- `magicblock-api`, which owns runtime integration and lifecycle ordering; +- integration tests under `test-integration/test-magicblock-api`; +- future operator/admin code that needs validator-management helper transactions. + +Important upstream dependencies: + +- `magicblock-delegation-program-api` (imported as `dlp_api`) for `validator_claim_fees` and validator fees vault PDA derivation; +- `magicblock-program` for the validator authority keypair used to sign fee-claim transactions; +- `solana_rpc_client::nonblocking::rpc_client::RpcClient` for balance, blockhash, and send/confirm calls; +- `magicblock-rpc-client` for the shared `MagicBlockRpcClientError` type used by this crate's public result. + +## Public API shape / Main public types and APIs + +### Crate exports + +`src/lib.rs` exposes: + +- `pub mod claim_fees`. + +All public APIs currently live under `magicblock_validator_admin::claim_fees`. + +### `ClaimFeesTask` + +`ClaimFeesTask` is a small Tokio task handle plus cancellation token: + +- `ClaimFeesTask::new()` and `Default` create an idle task with `handle: None`; +- `start(tick_period, url)` spawns `run_claim_fees_loop` unless the task has already been started; +- `stop().await` cancels the token, waits up to two seconds for the JoinHandle, and logs if the task does not stop within the grace period; +- `handle` is public and currently used by tests to verify idle construction; the cancellation token is private. 
+ +`start` schedules the first claim for `Instant::now() + tick_period`, not immediately. `MagicValidator` separately performs a one-shot startup claim during on-chain setup when fee claiming is enabled. + +### `claim_fees` + +`claim_fees(url: String) -> Result<(), MagicBlockRpcClientError>` performs one fee-claim attempt: + +1. Creates a Solana nonblocking `RpcClient` with `CommitmentConfig::confirmed()`. +2. Loads the validator authority keypair via `magicblock_program::validator::validator_authority()`. +3. Derives the validator fees vault PDA with `dlp_api::pda::validator_fees_vault_pda_from_validator`. +4. Reads the vault balance. +5. Returns `Ok(())` without sending a transaction if the balance is `<= MIN_CLAIMABLE_LAMPORTS` (`100_000_000`). +6. Builds `dlp_api::instruction_builder::validator_claim_fees(validator, None)`. +7. Fetches a latest blockhash. +8. Signs a transaction with the validator as payer and signer. +9. Sends and confirms the transaction. + +Error mapping is intentionally narrow and currently wraps Solana RPC failures as `RpcClientError`, `GetLatestBlockhash`, or `SendTransaction` variants from `magicblock-rpc-client`. + +## Runtime flows + +### Startup one-shot claim + +```text +MagicValidator::spawn_primary_onchain_setup + -> ensure validator is funded on the base chain + -> ensure Magic fee vault exists + -> if chain_operation.claim_fees_frequency is non-zero: + claim_fees(rpc_url).await + log but do not abort on failure + -> optionally register validator on-chain +``` + +The startup claim runs only inside the primary on-chain setup path. Failure is logged and does not stop startup, unlike the funding/vault setup failures immediately before it. 
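The gating and failure handling of the startup claim can be sketched as follows. This is an illustration only: `StartupClaim` and `startup_claim` are hypothetical names, the claim is passed in as a closure rather than the real async `claim_fees(url)` call, and the error type is simplified to `String`.

```rust
use std::time::Duration;

/// Hypothetical outcome type used only for this illustration.
#[derive(Debug, PartialEq)]
pub enum StartupClaim {
    /// The configured frequency is zero, so no claim is attempted.
    Disabled,
    /// The one-shot claim attempt returned `Ok`.
    Claimed,
    /// The claim failed; the error is logged and startup continues.
    FailedButContinuing(String),
}

/// Runs the one-shot claim only when fee claiming is enabled, and never
/// turns a claim failure into a startup failure.
pub fn startup_claim<F>(frequency: Duration, claim: F) -> StartupClaim
where
    F: FnOnce() -> Result<(), String>,
{
    if frequency.is_zero() {
        return StartupClaim::Disabled;
    }
    match claim() {
        Ok(()) => StartupClaim::Claimed,
        Err(err) => {
            // Keep the failure observable instead of swallowing it.
            eprintln!("startup fee claim failed: {err}");
            StartupClaim::FailedButContinuing(err)
        }
    }
}
```

The point of returning a value instead of a `Result` is that the caller has no error to propagate with `?`, which matches the "log but do not abort" behavior in the flow above.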
+ +### Periodic background claim + +```text +MagicValidator::start + -> after ledger replay/reset and primary/standalone mode setup + -> if is_standalone && chain_operation.claim_fees_frequency is non-zero: + claim_fees_task.start(frequency, config.rpc_url()) + +ClaimFeesTask loop + -> wait one full tick period before first interval tick + -> call claim_fees(url.clone()) on each tick + -> log errors and continue + -> exit when cancellation token is cancelled +``` + +The periodic task uses the validator's configured RPC URL. Do not move it onto transaction execution or scheduler threads. + +### Shutdown + +```text +MagicValidator::stop + -> stop scheduled-commit processor + -> stop committor service + -> claim_fees_task.stop().await + -> join RPC thread and remaining validator services +``` + +`ClaimFeesTask::stop` is cooperative. It cancels the loop and waits briefly for the task, but it does not abort the Tokio task if the underlying RPC call is still blocked. Changes that increase claim RPC duration can therefore add up to the two-second grace period to shutdown latency, after which the spawned task is left to finish on its own. + +## Important internals and caveats + +### Minimum claim threshold + +`MIN_CLAIMABLE_LAMPORTS` is `100_000_000`. Balances at or below the threshold are skipped to avoid spending transaction fees on small claims. This threshold is crate-local and is not currently configurable. + +### Validator authority and fee vault + +The fee-claim transaction is signed by `validator_authority()` from `magicblock-program`, and the validator pubkey is also used as the transaction payer. The fees vault PDA must be derived from the same validator pubkey. If validator identity initialization changes, verify this helper still signs with the intended base-layer authority. + +### RPC client choice + +This crate currently constructs a raw Solana nonblocking `RpcClient` for one-shot fee claims instead of using the `MagicblockRpcClient` wrapper.
It still reuses `MagicBlockRpcClientError` for public error compatibility with other base-layer helper code. If confirmation behavior, retry policy, or metrics are needed here, inspect `.agents/context/crates/magicblock-rpc-client.md` before changing the client type. + +### Configuration gating lives outside this crate + +`ClaimFeesTask::start` assumes the caller already chose a non-zero tick period. The enable/disable policy lives in `magicblock-api` and `magicblock-config` via `[chain-operation] claim-fees-frequency`; zero disables both startup and periodic fee claiming where checked. + +## Important invariants + +1. Fee claims must use the validator authority keypair that matches the validator fees vault PDA. +2. The validator pubkey must remain the payer and signer for `validator_claim_fees` transactions unless Delegation Program requirements change. +3. The crate must not send a claim transaction when the vault balance is at or below `MIN_CLAIMABLE_LAMPORTS`. +4. Startup fee-claim failures must remain observable through logs; do not silently swallow errors. +5. Periodic fee-claim failures must not crash the validator or terminate the loop unless the task is explicitly cancelled. +6. `ClaimFeesTask::start` must not spawn multiple loops for the same task instance. +7. `ClaimFeesTask::stop` must cancel and join cooperatively so validator shutdown can make progress. +8. Do not perform fee-claim RPC work on scheduler/executor hot paths. +9. Keep configuration semantics aligned across `magicblock-config`, `config.example.toml`, and `magicblock-api` lifecycle wiring. 
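Invariant 3 is easy to get wrong by one lamport, so here is a small sketch of the comparison. The constant value comes from this guide; `should_send_claim` is a hypothetical helper name, not a function in the crate.

```rust
/// Threshold value as documented in this guide.
pub const MIN_CLAIMABLE_LAMPORTS: u64 = 100_000_000;

/// Returns `true` only when the vault balance is strictly above the
/// threshold. A balance equal to the threshold is skipped.
pub fn should_send_claim(vault_balance: u64) -> bool {
    vault_balance > MIN_CLAIMABLE_LAMPORTS
}
```

A balance of exactly `100_000_000` lamports therefore produces no transaction, while `100_000_001` does.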
+ +## Common change areas and what to inspect + +### Changing fee-claim transaction semantics + +Start with: + +- `magicblock-validator-admin/src/claim_fees.rs` (`claim_fees`, `MIN_CLAIMABLE_LAMPORTS`); +- Delegation Program API helpers used through `dlp_api::instruction_builder::validator_claim_fees` and `dlp_api::pda::validator_fees_vault_pda_from_validator`; +- `magicblock-program` validator authority helpers; +- `test-integration/test-magicblock-api/tests/test_claim_fees.rs`. + +Check signer/payer requirements, vault PDA derivation, commitment level, blockhash freshness, and whether error mapping still tells operators what failed. + +### Changing periodic scheduling or shutdown + +Start with: + +- `ClaimFeesTask::start`, `run_claim_fees_loop`, and `ClaimFeesTask::stop`; +- `magicblock-api/src/magic_validator.rs` fields, startup, and shutdown sections using `claim_fees_task`; +- `magicblock-config/src/config/chain.rs` and `config.example.toml` for frequency semantics. + +Preserve duplicate-start protection, cancellation responsiveness, and the fact that the periodic loop waits one full period before its first tick. + +### Adding new admin helpers + +Keep this crate focused on validator/operator management helpers. New helpers should have explicit lifecycle owners in `magicblock-api` or operator tooling, clear signer requirements, bounded RPC behavior, and targeted validation. Avoid embedding general RPC client wrappers or core protocol execution logic here. 
+ +## Tests and validation + +For documentation-only changes, verify the new guide path and cross-references: + +```bash +ls .agents/context/crates/magicblock-validator-admin.md +rg "magicblock-validator-admin.md" AGENTS.md .agents/context/crate-map.md +``` + +For Rust changes in this crate, run at least: + +```bash +cargo fmt +cargo nextest run -p magicblock-validator-admin +``` + +For lifecycle or config integration changes, also run relevant API/config tests: + +```bash +cargo nextest run -p magicblock-api +cargo nextest run -p magicblock-config claim_fees +``` + +For end-to-end fee-claim behavior, use the MagicBlock API integration suite or the specific test when the devnet validator harness is available: + +```bash +cd test-integration +make test-magicblock-api +# or, with the required validators already started: +RUST_LOG=info cargo test -p test-magicblock-api --test test_claim_fees -- --test-threads=1 --nocapture +``` + +Broader baseline before handing off code changes remains: + +```bash +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance-sensitive risk to report: this crate can add base-layer RPC load and startup/shutdown latency. If fee-claim frequency, retry behavior, confirmation behavior, or task cancellation changes, state whether RPC-load and shutdown-latency risk was measured or only reasoned about. + +## Related docs + +- `.agents/context/overview.md` — validator runtime model and operator/admin flows. +- `.agents/context/architecture.md` — process/service orchestration and startup/shutdown boundaries. +- `.agents/context/crate-map.md` — repository crate ownership map. +- `.agents/rules/testing-and-validation.md` — repository validation workflow. +- `.agents/context/crates/magicblock-api.md` — `MagicValidator` startup/shutdown owner for this crate's task. +- `.agents/context/crates/magicblock-config.md` — `[chain-operation]` config model and env/TOML behavior. 
+- `.agents/context/crates/magicblock-rpc-client.md` — shared base-layer RPC wrapper and error type used by related admin/settlement helpers. +- `config.example.toml` — operator-facing `claim-fees-frequency` example. diff --git a/.agents/context/crates/magicblock-validator.md b/.agents/context/crates/magicblock-validator.md new file mode 100644 index 000000000..253f6d698 --- /dev/null +++ b/.agents/context/crates/magicblock-validator.md @@ -0,0 +1,294 @@ +# `magicblock-validator` + +## Purpose + +`magicblock-validator` is the production binary crate for the MagicBlock validator. It is intentionally thin: it parses `magicblock-config::ValidatorParams`, initializes process-level logging and the main Tokio runtime, constructs `magicblock_api::magic_validator::MagicValidator`, holds the ledger lock for the lifetime of the process, and drives either the headless shutdown-wait loop or the feature-gated embedded TUI. + +High-level responsibilities: + +- create the process main Tokio runtime with a small worker pool for async I/O and timers; +- load layered CLI/env/TOML/default configuration through `ValidatorParams::try_new`; +- initialize logging through `magicblock-core` or the TUI embedded logger; +- create, start, stop, and report readiness for `MagicValidator`; +- take and retain the ledger write lock so only one validator process uses the same ledger directory; +- expose operator-facing startup information in headless mode; +- wait for `SIGTERM`/`SIGINT` in headless mode and then run the shutdown sequence; +- launch the embedded TUI only when compiled with the `tui` feature and not run with `--no-tui`. + +This crate sits on startup and shutdown paths, not on per-transaction execution hot loops. Changes here can still affect operator compatibility, runtime sizing, service ordering, graceful shutdown, ledger durability, and optional TUI behavior. 
+ +## Update requirement + +Update this guide in the same change whenever `magicblock-validator` behavior or contracts change. In particular, update it for changes to: + +- CLI invocation, feature flags, `--no-tui` semantics, or logging behavior; +- main Tokio runtime construction, worker-thread sizing, or runtime/thread names; +- ordering around `ValidatorParams::try_new`, `MagicValidator::try_from_config`, ledger locking, `api.start()`, TUI/headless execution, unregistration, ledger preparation, or `api.stop()`; +- `run_no_tui` startup output, version reporting, endpoint reporting, or shutdown-signal handling; +- `shutdown.rs` signal support or platform-specific graceful shutdown behavior; +- dependencies on `magicblock-api`, `magicblock-config`, `magicblock-core`, `magicblock-version`, or `magicblock-tui-client`; +- validation commands, run commands, packaging, or docs that operators use to start the binary. + +Also update this file if another crate changes a contract consumed directly here, such as `MagicValidator` lifecycle methods, `ValidatorParams` fields, ledger lock helpers, or `TuiConfig`. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-validator/Cargo.toml` | Binary package metadata, `default-run = "magicblock-validator"`, dependencies, and feature flags. `tokio-console` enables Tokio tracing support in `magicblock-core`; `tui` pulls in `magicblock-tui-client`. | +| `magicblock-validator/src/main.rs` | Process entrypoint. Builds the Tokio runtime, parses config, initializes logging, creates/starts/stops `MagicValidator`, holds the ledger lock, and chooses TUI versus headless mode. | +| `magicblock-validator/src/shutdown.rs` | Headless graceful-shutdown waiter. On Unix waits for `SIGTERM` or Ctrl-C; on non-Unix waits for Ctrl-C. 
| +| `magicblock-api/src/magic_validator.rs` | Downstream lifecycle owner consumed by this binary: `try_from_config`, `start`, `start_unregister_validator_on_chain`, `prepare_ledger_for_shutdown`, `stop`, and `ledger`. | +| `magicblock-api/src/ledger.rs` | Provides public `ledger_lockfile` and `lock_ledger` helpers used here to prevent concurrent ledger use. | +| `magicblock-config/src/lib.rs` and `magicblock-config/src/config/cli.rs` | Define `ValidatorParams`, layered config loading, and CLI flags consumed by the binary, including `--no-tui`. | +| `tools/magicblock-tui-client/` | Optional embedded TUI implementation used only behind the `tui` feature. | +| `README.md`, `docs/architecture.md`, `docs/tui-externalization.md` | Operator and architecture documentation that describe running the validator and the embedded/external TUI split. | + +Main consumers: + +- End users/operators run this crate as the `magicblock-validator` binary. +- Docker/npm/binary packaging should treat this as the process entrypoint. +- Integration test harnesses and manual workflows may launch this binary indirectly, but runtime behavior is primarily exercised through `magicblock-api` and RPC tests. + +Important upstream dependencies: + +- `magicblock-api` owns the real validator service graph and lifecycle. +- `magicblock-config` owns config parsing and CLI/env/TOML semantics. +- `magicblock-core` owns logging initialization and optional `tokio-console` feature plumbing. +- `magicblock-version` owns version/git-version display. +- `magicblock-tui-client` owns optional UI rendering and RPC enrichment. + +## Public API shape / Main public types and APIs + +This is a binary crate, not a library crate. Its public surface is the executable behavior and compile-time features. + +### Entrypoint and runtime + +- `main()` computes `workers = (num_cpus::get() / 4).max(1)`, builds a multi-thread Tokio runtime with `enable_all()` and thread name `async-runtime`, then `block_on(run())`. 
+- The worker split is intentional: the main runtime handles general async I/O/timer-bound services while RPC and CPU-bound transaction execution use separate runtimes/threads in downstream crates. Do not casually increase this runtime to consume all CPUs; that can steal capacity from RPC and executor paths. + +### Configuration and logging + +- `run()` passes `std::env::args_os()` to `ValidatorParams::try_new` and exits with status `1` after printing to stderr if config loading fails. +- Headless builds always call `init_logger()`, which delegates to `magicblock_core::logger::init_with_config` with `LogStyle::from_env()`. +- With the `tui` feature: + - if `config.no_tui` is true, the normal logger is initialized and the binary runs headless; + - otherwise `magicblock_tui_client::init_embedded_logger()` supplies a log receiver for the embedded TUI. + +### Validator lifecycle calls + +The binary calls into `MagicValidator` in this order: + +1. `MagicValidator::try_from_config(config).await` to construct the service graph. +2. `ledger::ledger_lockfile(api.ledger().ledger_path())` and `ledger::lock_ledger(...)` to hold the ledger lock for the rest of the process. +3. `api.start().await` to enter primary/replica runtime mode. +4. TUI or headless run loop. +5. `api.start_unregister_validator_on_chain().await` before final shutdown. +6. `api.prepare_ledger_for_shutdown()` to cancel ledger compactions and flush before stop. +7. `api.stop().await` to consume the validator, stop services, join workers, flush AccountsDb/ledger, and shut down the ledger. + +Preserve this order unless the `magicblock-api` lifecycle contract changes and both documents are updated together. + +### Headless run loop + +`run_no_tui(...)` prints startup information including: + +- `magicblock_version::Version` and `git_version`; +- local RPC and WebSocket endpoints; +- remote RPC endpoint; +- validator identity; +- ledger location. + +It then waits on `Shutdown::wait()`. 
`print_info` uses plain `println!` only when `RUST_LOG` is unset or exactly `quiet`; otherwise it emits `tracing::info!` so operators can hide startup banners with logging filters such as `RUST_LOG=warn`. + +### Feature flags + +| Feature | Effect | +|---|---| +| default | Headless binary only. | +| `tui` | Adds `magicblock-tui-client` and enables the embedded TUI path after validator startup. | +| `tokio-console` | Adds `console-subscriber`, enables Tokio tracing, and forwards `magicblock-core/tokio-console`. | + +## Runtime flows + +### Headless startup and shutdown + +```text +main + -> build main Tokio runtime + -> run + -> ValidatorParams::try_new(args) + -> init_logger + -> MagicValidator::try_from_config(config) + -> create and hold ledger write lock + -> api.start() + -> run_no_tui(...) + -> print startup/operator info + -> wait for SIGTERM/SIGINT (or Ctrl-C on non-Unix) + -> api.start_unregister_validator_on_chain() + -> api.prepare_ledger_for_shutdown() + -> api.stop() + -> drop runtime +``` + +Pitfalls: + +- The ledger lock guard must remain in scope while the validator runs. Dropping it early would allow another process to open the same ledger directory. +- Shutdown only begins after the headless wait or TUI returns. If a new UI/run loop is added, it must return promptly on operator shutdown. +- `start_unregister_validator_on_chain` is intentionally called before `prepare_ledger_for_shutdown` and `stop`; `MagicValidator::stop` also calls it defensively and no-ops if already started. 
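The first step of the flow above, building the main runtime, depends on the worker-count formula `(num_cpus::get() / 4).max(1)`. The sketch below isolates that arithmetic. `main_runtime_workers` and `detected_cpus` are hypothetical names, and `std::thread::available_parallelism` is used here only as a standard-library stand-in for `num_cpus::get()`.

```rust
use std::thread;

/// One quarter of the available CPUs, with a floor of one worker.
pub fn main_runtime_workers(cpus: usize) -> usize {
    (cpus / 4).max(1)
}

/// Standard-library approximation of the CPU count; falls back to 1
/// when the platform cannot report it.
pub fn detected_cpus() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}
```

Integer division means machines with one to seven CPUs all get a single worker, and a 16-CPU machine gets four, which leaves the remaining capacity for RPC and execution threads.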
+ +### Embedded TUI path + +```text +run with --features tui and without --no-tui + -> init_embedded_logger() + -> construct/start MagicValidator and lock ledger + -> build TuiConfig from local endpoints, remote RPC, identity, ledger path, block time, lifecycle, base fee, version + -> enrich_config_from_rpc(&mut TuiConfig) + -> run_tui(tui_config, validator_log_rx) + -> on TUI return/error: continue normal unregister/ledger-prep/stop shutdown +``` + +Pitfalls: + +- The embedded TUI is UI-facing only. Do not move validator service ownership or protocol logic into `tools/magicblock-tui-client`. +- TUI config fields are captured before `MagicValidator::try_from_config(config)` consumes the config. If new display fields are needed, extract them before moving `config`. +- `--no-tui` only matters in builds compiled with the `tui` feature; headless-only builds silence the unused variable and always use normal logging. + +### Configuration failure path + +If `ValidatorParams::try_new` fails, the binary prints `Failed to read validator config: ...` to stderr and exits with status `1`. If `MagicValidator::try_from_config` fails, it logs the error and exits with status `1`. These are operator-facing process contracts; avoid converting them into panics or silent returns. + +## Important internals and caveats + +### Keep the binary thin + +`magicblock-validator` should remain a process wrapper around `magicblock-api`. Cross-service wiring, account synchronization, scheduler/executor logic, settlement, replication, metrics, and RPC behavior belong in their owning crates. If a change needs to alter runtime behavior after construction, inspect `magicblock-api/src/magic_validator.rs` first. + +### Runtime sizing is part of performance behavior + +The main runtime intentionally uses about one quarter of available CPUs with a minimum of one worker. 
The comment in `main.rs` documents that the remaining capacity is reserved for blocking I/O, RPC, and transaction scheduler/executor work. Any change to this split should call out expected impact on startup/shutdown services, RPC latency, and execution throughput. + +### Ledger locking is process-level safety + +The binary obtains the lock after `MagicValidator` is constructed because it needs the resolved ledger path. It must keep the `RwLockWriteGuard` alive until after the run loop exits. `ledger::lock_ledger` exits the process with an operator-facing message if another validator already holds the lock. + +### Shutdown is cooperative with downstream services + +`shutdown.rs` only detects process signals. Actual service cancellation, committor shutdown ordering, thread joins, AccountsDb flush, ledger flush, and RocksDB shutdown are owned by `MagicValidator::stop`. Do not duplicate that logic in the binary. + +### Operator-facing output is compatibility-sensitive + +Startup output in `run_no_tui` is useful for local development, automation, and debugging. If changing labels, hiding fields, or routing output differently, update operator docs and consider tests or manual validation of both `RUST_LOG` modes. + +## Important invariants + +1. The binary must not own protocol, account, execution, RPC, settlement, or persistence logic beyond process lifecycle calls into `magicblock-api`. +2. `ValidatorParams` must be parsed before any config-dependent logging, TUI choice, endpoint reporting, or validator construction. +3. The ledger write lock guard must stay alive while `MagicValidator` is running. +4. `api.start()` must complete before the binary reports readiness or launches the embedded TUI. +5. Shutdown must call `start_unregister_validator_on_chain`, `prepare_ledger_for_shutdown`, and `stop` in the established order unless the `magicblock-api` lifecycle changes. +6. 
The main Tokio runtime must leave CPU capacity for RPC and transaction execution domains; avoid moving blocking or CPU-bound work onto it. +7. The embedded TUI must remain feature-gated and must not be required for headless operation. +8. `SIGTERM` and Ctrl-C should initiate graceful shutdown on Unix; Ctrl-C should remain supported on non-Unix. +9. Version, endpoint, identity, and ledger-path reporting should remain accurate and derived from the resolved config/runtime state. +10. Error paths during config loading or validator construction should fail fast and visibly for operators. + +## Common change areas and what to inspect + +### Changing startup or shutdown order + +Start with: + +- `magicblock-validator/src/main.rs` (`run`, `run_no_tui`); +- `magicblock-api/src/magic_validator.rs` (`try_from_config`, `start`, `start_unregister_validator_on_chain`, `prepare_ledger_for_shutdown`, `stop`); +- `.agents/context/crates/magicblock-api.md` startup/shutdown sections. + +Check that ledger locking, readiness reporting, unregister behavior, compaction cancellation, service cancellation, and durable flushes still happen in a safe order. + +### Changing CLI/config behavior + +Start with: + +- `magicblock-config/src/lib.rs` (`ValidatorParams::try_new`); +- `magicblock-config/src/config/cli.rs` (`CliParams`, `CliValidatorConfig`, `CliApertureConfig`, `CliLedgerConfig`); +- `.agents/context/crates/magicblock-config.md`. + +Do not add ad-hoc CLI parsing in the binary; keep config layering in `magicblock-config`. + +### Changing logging or startup output + +Start with: + +- `magicblock-validator/src/main.rs` (`init_logger`, `print_info`, `run_no_tui`); +- `magicblock-core/src/logger/`; +- `tools/magicblock-tui-client/` if embedded TUI logs are affected. + +Validate both unset/`quiet` `RUST_LOG` behavior and filtered tracing behavior. 
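When validating both modes, it helps to state the routing rule as a predicate. This is a sketch of the documented rule only; `uses_plain_println` is a hypothetical name and the real `print_info` may be structured differently.

```rust
/// Returns `true` when startup info would go through plain `println!`:
/// `RUST_LOG` unset, or set to exactly `quiet`. Any other value routes
/// the output through tracing, where log filters apply.
pub fn uses_plain_println(rust_log: Option<&str>) -> bool {
    matches!(rust_log, None | Some("quiet"))
}
```

At a call site, the value could be read with `std::env::var("RUST_LOG").ok()` and passed as `.as_deref()`. Note that the match is exact, so a value such as `quiet,info` does not count as `quiet`.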
+ +### Changing TUI integration + +Start with: + +- `magicblock-validator/Cargo.toml` feature flags; +- `magicblock-validator/src/main.rs` `#[cfg(feature = "tui")]` blocks; +- `tools/magicblock-tui-client/README.md` and `docs/tui-externalization.md`. + +Keep the default binary headless and preserve `--no-tui` behavior for feature-enabled builds. + +### Changing runtime sizing or async behavior + +Start with: + +- `main()` runtime builder; +- `magicblock-api/src/magic_validator.rs` for downstream RPC/execution runtime/thread creation; +- `.agents/context/architecture.md` and `.agents/context/crates/magicblock-api.md` for execution-domain boundaries. + +Avoid blocking calls in the main async runtime unless they are already isolated in downstream service code. + +## Tests and validation + +For documentation-only changes touching this guide: + +```bash +git diff --check -- .agents/context/crates/magicblock-validator.md .agents/context/crate-map.md AGENTS.md +``` + +For code changes in this crate, run at minimum: + +```bash +cargo fmt +cargo check -p magicblock-validator --no-default-features +cargo check -p magicblock-validator --features tui +cargo nextest run -p magicblock-validator +``` + +If logging or `tokio-console` behavior changes, also check: + +```bash +cargo check -p magicblock-validator --features tokio-console +``` + +For broader validation before handoff, follow `.agents/rules/testing-and-validation.md`: + +```bash +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +For lifecycle or startup/shutdown changes, perform a manual smoke run with a disposable storage directory or the relevant integration harness. At minimum verify the process starts, prints expected endpoints/identity/ledger path, accepts Ctrl-C/SIGTERM, and exits after flushing without leaving a second process able to use the same ledger concurrently. 
+ +Performance-sensitive risk to report: runtime sizing, blocking work in `run`, and lifecycle ordering can indirectly affect RPC/execution throughput or shutdown latency. If those areas change, state whether any performance or smoke validation was run. + +## Related docs + +- `.agents/context/overview.md` — validator runtime model and non-negotiable agent rules. +- `.agents/context/architecture.md` — process/service orchestration and startup/shutdown boundaries. +- `.agents/context/crate-map.md` — repository crate ownership map. +- `.agents/rules/testing-and-validation.md` — repository validation workflow. +- `.agents/context/crates/magicblock-api.md` — downstream `MagicValidator` lifecycle and service graph. +- `.agents/context/crates/magicblock-config.md` — config layering and CLI/env/TOML behavior. +- `.agents/context/crates/magicblock-core.md` — logging and shared runtime infrastructure. +- `README.md` — operator-facing build/run overview. +- `docs/architecture.md` — high-level architecture, including this binary as the entrypoint. +- `docs/tui-externalization.md` and `tools/magicblock-tui-client/README.md` — embedded/external TUI behavior. diff --git a/.agents/context/crates/magicblock-version.md b/.agents/context/crates/magicblock-version.md new file mode 100644 index 000000000..345b5dcfe --- /dev/null +++ b/.agents/context/crates/magicblock-version.md @@ -0,0 +1,253 @@ +# `magicblock-version` + +## Purpose + +`magicblock-version` is the validator's shared build/version metadata crate. It provides the `Version` value used by the validator binary, the JSON-RPC `getVersion` response, and operator-facing TUI/headless startup displays. 
+ +High-level responsibilities: + +- derive the MagicBlock package semver from the workspace package version at compile time; +- expose Solana compatibility metadata, including the Agave RPC API version and current `solana-feature-set` identifier; +- expose build source metadata, including a CI-provided commit prefix and a compile-time git description from `git-version`; +- identify the client implementation as MagicBlock while preserving Solana/Jito/Firedancer client ID compatibility; +- keep the version type serializable, sanitizable, and ABI-example-compatible for RPC/operator consumers. + +This crate is small and dependency-light. It is not on the transaction execution hot path, but it is operator- and RPC-facing: field names, formatting, and client IDs are compatibility-sensitive because `magicblock-aperture`, `magicblock-validator`, and TUI/operator tooling depend on them. + +## Update requirement + +Update this guide in the same change whenever `magicblock-version` behavior or contracts change. In particular, update it for changes to: + +- `Version` fields, serialization behavior, `Display` / `Debug` formatting, or the `semver!` / `version!` macros; +- how `major`, `minor`, `patch`, `commit`, `feature_set`, `client`, `solana_core`, or `git_version` are computed; +- the `ClientId` numeric mapping or accepted unknown-client behavior; +- build-script cfg handling for stable/beta/nightly/dev Rust toolchains; +- consumers that expose version data through RPC, startup logs, the embedded/external TUI, packaging, or operator docs; +- validation commands or tests used to check version compatibility. + +Also update this file if another crate changes a contract that consumes this crate, such as the `getVersion` JSON shape in `magicblock-aperture` or startup/TUI version display in `magicblock-validator`. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `magicblock-version/Cargo.toml` | Crate manifest. 
Depends on `semver`, `serde`, Solana feature/RPC/sanitize/ABI crates, and `git-version`; uses `rustc_version` in the build script. | +| `magicblock-version/build.rs` | Emits `rustc-check-cfg` declarations for specialization-related cfgs based on the active compiler channel. | +| `magicblock-version/src/lib.rs` | Defines `ClientId`, public `Version`, default metadata computation, formatting, macros, `Sanitize`, and unit tests. | +| `Cargo.toml` | Workspace package version is the source for `Version.major`, `minor`, and `patch`. | +| `magicblock-aperture/src/requests/http/get_version.rs` | RPC consumer. Builds the `getVersion` response from `Version::default()`. | +| `magicblock-aperture/tests/node.rs` and `magicblock-aperture/tests/batches.rs` | RPC tests that assert `getVersion` returns Solana version/feature-set information and works in batches. | +| `magicblock-validator/src/main.rs` | Binary consumer. Prints headless startup version/git metadata and passes version fields into the embedded TUI config. | +| `tools/magicblock-tui-client/src/app.rs` and `tools/magicblock-tui-client/src/state.rs` | TUI consumer. External TUI starts with its own package version, then enriches display with the validator's `getVersion` `solana-core` value when reachable. | + +Main consumers: + +- `magicblock-aperture` exposes version metadata through Solana-compatible JSON-RPC. +- `magicblock-validator` uses it for headless startup output and embedded TUI configuration. +- Operator tooling and external Solana clients indirectly consume the JSON returned by `getVersion`. + +Important upstream dependencies: + +- `solana-feature-set::ID` supplies the feature-set fingerprint exposed to RPC clients. +- `solana_rpc_client_api::response::RpcApiVersion::default()` supplies the Solana core/API version string. +- `git-version::git_version!()` supplies `Version.git_version` at compile time. +- `CI_COMMIT`, when set, is parsed by `compute_commit` as the first four bytes of a hex SHA-1 prefix. 
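The two numeric fingerprints described above use opposite byte orders, which the following sketch makes concrete. It is an illustration of the documented behavior, not the crate's code: `commit_from_ci` and `feature_set_from_id` are hypothetical names, and the real `compute_commit` signature may differ.

```rust
/// Parses the first eight hex characters (four bytes) of a commit string
/// as a big-endian `u32`. Absent, short, or non-hex input yields 0.
pub fn commit_from_ci(ci_commit: Option<&str>) -> u32 {
    ci_commit
        .and_then(|sha| sha.get(..8))
        .and_then(|prefix| u32::from_str_radix(prefix, 16).ok())
        .unwrap_or(0)
}

/// Interprets the first four bytes of a 32-byte ID as a little-endian `u32`.
pub fn feature_set_from_id(id: &[u8; 32]) -> u32 {
    u32::from_le_bytes([id[0], id[1], id[2], id[3]])
}
```

A local value such as `HEAD` is shorter than eight characters and not valid hex, so it degrades to `0` instead of failing the build.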
+ +## Public API shape / Main public types and APIs + +Public surface from `src/lib.rs`: + +- `pub struct Version` with public fields: + - `major`, `minor`, `patch`: parsed from `CARGO_PKG_VERSION_*`; + - `commit`: first four bytes of `CI_COMMIT` interpreted as big-endian hex, or `0` when absent/invalid; + - `feature_set`: first four bytes of `solana_feature_set::ID`, interpreted as little-endian; + - `solana_core`: default Solana RPC API version string; + - `git_version`: `git-version` output string. +- `client: u16` is intentionally private but participates in serialization because it is a field of `Version`. `Version::default()` sets it to the MagicBlock client ID. +- `Version::as_semver_version() -> semver::Version` returns only the MagicBlock package semver. +- `impl Default for Version` is the canonical constructor. Consumers should prefer `Version::default()` instead of recomputing build metadata. +- `impl Display` renders `major.minor.patch` only. This is used for `magicblock-core` in `getVersion` and for validator startup display. +- `impl Debug` renders `major.minor.patch (src:; feat:, client:)`. The `version!()` macro returns this debug string. +- `semver!()` returns a formatted `Display` string for `Version::default()`. +- `version!()` returns a formatted `Debug` string for `Version::default()`. +- `impl Sanitize for Version` is intentionally empty, matching Solana's lightweight sanitize marker pattern for this metadata type. + +`ClientId` is crate-private but compatibility-sensitive: + +| Numeric ID | Meaning | +|---|---| +| `0` | SolanaLabs | +| `1` | JitoLabs | +| `2` | Firedancer | +| `3` | MagicBlock | +| `4..=u16::MAX` | Preserved as `Unknown(id)` | + +`TryFrom<ClientId> for u16` rejects `Unknown(0..=3)` so known IDs cannot be smuggled through the unknown variant.
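The mapping in the table and the rejection rule can be sketched as a pair of conversions. This is modeled on the description above rather than copied from the crate: the variant names follow the table's "Meaning" column, and the `String` error type and message are assumptions.

```rust
use std::convert::TryFrom;

/// Illustrative client identifier; variant names follow the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClientId {
    SolanaLabs,
    JitoLabs,
    Firedancer,
    MagicBlock,
    Unknown(u16),
}

impl From<u16> for ClientId {
    /// Every numeric ID decodes; values above 3 are preserved as `Unknown`.
    fn from(id: u16) -> Self {
        match id {
            0 => Self::SolanaLabs,
            1 => Self::JitoLabs,
            2 => Self::Firedancer,
            3 => Self::MagicBlock,
            other => Self::Unknown(other),
        }
    }
}

impl TryFrom<ClientId> for u16 {
    type Error = String; // assumed error type for this sketch

    /// Encoding fails for `Unknown` values that collide with known IDs.
    fn try_from(client: ClientId) -> Result<Self, Self::Error> {
        match client {
            ClientId::SolanaLabs => Ok(0),
            ClientId::JitoLabs => Ok(1),
            ClientId::Firedancer => Ok(2),
            ClientId::MagicBlock => Ok(3),
            ClientId::Unknown(id @ 0..=3) => {
                Err(format!("ClientId::Unknown({id}) is reserved"))
            }
            ClientId::Unknown(id) => Ok(id),
        }
    }
}
```

Decoding is total while encoding is fallible, so a round trip through `u16` always succeeds for values produced by `ClientId::from`, and a hand-built `Unknown(2)` cannot masquerade as Firedancer.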
+ +## Runtime flows + +### Constructing default version metadata + +```text +Version::default() + -> parse CARGO_PKG_VERSION_MAJOR/MINOR/PATCH + -> compute feature_set from solana_feature_set::ID[..4] + -> compute commit from option_env!("CI_COMMIT") first 8 hex chars, else 0 + -> set client to ClientId::MagicBlock numeric ID 3 + -> read Solana RPC API version from RpcApiVersion::default() + -> read git_version from git_version::git_version!() +``` + +Pitfalls: + +- `CI_COMMIT` is optional and may be non-hex (for example `HEAD` in local builds). Invalid values must continue to degrade to `0` rather than failing startup or build. +- `feature_set` uses little-endian bytes from the feature-set ID; changing byte order would change the RPC-visible value. +- `Display` intentionally omits git and feature metadata. Consumers that need git metadata must use `Version.git_version` or `Debug`/`version!()`. + +### `getVersion` RPC flow + +```text +HTTP getVersion request + -> magicblock-aperture HttpDispatcher::get_version + -> Version::default() + -> JSON result fields: + solana-core = version.solana_core + feature-set = version.feature_set + git-commit = version.git_version + magicblock-core = version.to_string() +``` + +The RPC field names are external compatibility surface. Do not rename them or swap `git-commit` from `git_version` to the numeric `commit` field without updating RPC tests, docs, and downstream tooling expectations. + +### Validator/TUI display flow + +```text +magicblock-validator startup + -> Version::default() + -> headless output: "Validator version: (Git: )" + -> embedded TUI config: version=, git_version= + +external TUI startup + -> starts with tools/magicblock-tui-client package version/GIT_HASH + -> calls getVersion when validator RPC is reachable + -> appends validator to the displayed version string +``` + +Preserve the distinction between the validator crate's own version metadata and the external TUI binary's package/GIT_HASH metadata. 
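Both flows above depend on `Display` carrying only the package semver while `Debug` carries the extra build metadata. A minimal sketch of that split follows. `DemoVersion` is a hypothetical stand-in for `Version`, and the exact `Debug` layout here (zero-padded hex for the commit, numeric client ID) is an assumption, not a copy of the crate's format string.

```rust
use std::fmt;

/// Hypothetical stand-in for `Version` with only the fields needed here.
struct DemoVersion {
    major: u16,
    minor: u16,
    patch: u16,
    commit: u32,
    feature_set: u32,
    client: u16,
}

impl fmt::Display for DemoVersion {
    // Semver only: this is what an RPC field or startup line would show.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{}.{}", self.major, self.minor, self.patch)
    }
}

impl fmt::Debug for DemoVersion {
    // Semver plus build metadata; the layout is assumed for illustration.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{}.{}.{} (src:{:08x}; feat:{}, client:{})",
            self.major, self.minor, self.patch, self.commit, self.feature_set, self.client
        )
    }
}
```

A consumer that formats with `{}` never sees commit or feature-set data, so anything that needs it has to ask for it explicitly.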
+ +## Important internals and caveats + +### Build-script cfg declarations + +`build.rs` uses `rustc_version::version_meta()` to emit check-cfg declarations for `RUSTC_WITH_SPECIALIZATION` or `RUSTC_WITHOUT_SPECIALIZATION`. `src/lib.rs` has `#![cfg_attr(RUSTC_WITH_SPECIALIZATION, feature(min_specialization))]` to mirror Solana's version crate pattern. + +Current behavior only declares the cfg names for stable/beta/nightly and sets `RUSTC_NEEDS_PROC_MACRO_HYGIENE` on dev toolchains. It does not currently emit `cargo:rustc-cfg=RUSTC_WITH_SPECIALIZATION` for nightly or `RUSTC_WITHOUT_SPECIALIZATION` for stable/beta. If this changes, validate on the supported toolchains because cfg spelling and Cargo output syntax directly affect builds. + +### Commit metadata sources + +There are two commit-like fields: + +- `commit: u32` is derived from `CI_COMMIT` and appears in `Debug` as `src:`. +- `git_version: String` comes from `git-version` and is exposed by RPC as `git-commit` and by startup/TUI display as Git metadata. + +Do not assume they are identical. Local builds may have `commit == 0` while `git_version` contains a git description. + +### Serialization compatibility + +`Version` derives `Serialize`/`Deserialize`, and the private `client` field is still serialized by serde. Adding, removing, renaming, or changing field visibility/defaults can affect any Solana-compatible client or persisted/test payload that deserializes this type. + +## Important invariants + +1. `Version::default()` must be cheap, deterministic within one binary build, and side-effect free. It should not perform I/O, network calls, or runtime git commands. +2. `Display` must remain the MagicBlock package semver string (`major.minor.patch`) unless every consumer display/RPC expectation is updated. +3. The MagicBlock client ID must remain `3` unless coordinated with Solana-client compatibility expectations and all tests/docs are updated. +4. 
Unknown client IDs `>= 4` must round-trip through `ClientId::Unknown` and `TryFrom for u16`. +5. Known client IDs must not be accepted through `ClientId::Unknown(0..=3)`. +6. Invalid or absent `CI_COMMIT` must not fail builds or validator startup; it should continue to produce `commit == 0`. +7. `getVersion` response field names and meanings must remain stable for Solana/RPC/operator tooling. +8. Keep this crate dependency-light. Do not add runtime dependencies that pull heavy validator services into version reporting. + +## Common change areas and what to inspect + +### Changing package/version reporting + +Start with: + +- `Cargo.toml` `[workspace.package].version`; +- `magicblock-version/src/lib.rs` `Version::default`, `Display`, and `as_semver_version`; +- `magicblock-aperture/src/requests/http/get_version.rs` for RPC field mapping; +- `magicblock-validator/src/main.rs` for startup/TUI display. + +Check that `magicblock-core` in RPC continues to mean the MagicBlock validator semver, while `solana-core` continues to mean the Solana RPC API version string. + +### Changing client IDs + +Start with: + +- `ClientId` enum; +- `impl From for ClientId`; +- `impl TryFrom for u16`; +- `test_client_id`. + +Preserve known-ID rejection through `Unknown` and update this guide with any new numeric assignment. + +### Changing commit/git metadata + +Start with: + +- `compute_commit` and `test_compute_commit`; +- `Version::default` `CI_COMMIT` and `git_version::git_version!()` usage; +- `magicblock-aperture/src/requests/http/get_version.rs` `git-commit` mapping; +- `magicblock-validator/src/main.rs` startup display. + +Be explicit about whether a change affects the numeric `commit`, string `git_version`, or RPC field named `git-commit`. 
+ +### Changing build-script/toolchain behavior + +Start with: + +- `magicblock-version/build.rs`; +- `magicblock-version/Cargo.toml` `unexpected_cfgs` check-cfg allowlist; +- the crate-level `cfg_attr(RUSTC_WITH_SPECIALIZATION, feature(min_specialization))`. + +Validate with the repository-supported toolchain. If you alter emitted cfgs, ensure Cargo output keys use the intended spelling (`cargo:...` versus `cargo::...`) for the minimum supported Cargo version. + +## Tests and validation + +For documentation-only changes touching this guide, at minimum verify changed paths and cross-references: + +```bash +git diff --check +``` + +For changes to `magicblock-version` itself, run: + +```bash +cargo fmt +cargo nextest run -p magicblock-version +``` + +For RPC-visible changes, also run the relevant Aperture tests: + +```bash +cargo nextest run -p magicblock-aperture --test node test_get_version +cargo nextest run -p magicblock-aperture --test batches test_batch_requests +``` + +Before handing off Rust behavior changes, follow the broader baseline from `.agents/rules/testing-and-validation.md` when time allows: + +```bash +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance risk is normally low because this crate only builds small metadata values. Still report any new dependency, I/O, process spawning, locking, or allocation-heavy behavior added to `Version::default()` because `getVersion` and startup display call it synchronously. + +## Related docs + +- `.agents/context/overview.md` for the validator's high-level runtime model. +- `.agents/context/architecture.md` for process/API ingress boundaries. +- `.agents/context/crate-map.md` for workspace crate ownership and consumers. +- `.agents/rules/testing-and-validation.md` for repository validation workflow. +- `.agents/context/crates/magicblock-aperture.md` for the RPC layer that exposes `getVersion`. 
+- `.agents/context/crates/magicblock-validator.md` for startup/headless/TUI version display. diff --git a/.agents/context/crates/storage-proto.md b/.agents/context/crates/storage-proto.md new file mode 100644 index 000000000..5149c3966 --- /dev/null +++ b/.agents/context/crates/storage-proto.md @@ -0,0 +1,252 @@ +# `storage-proto` / `solana-storage-proto` + +## Purpose + +`storage-proto/` contains the workspace crate whose package and Rust crate name are `solana-storage-proto` / `solana_storage_proto`. It provides protobuf definitions, generated protobuf Rust modules, and conversion glue for ledger transaction-status storage. + +High-level responsibilities: + +- compile `storage-proto/proto/*.proto` into `prost` message types during the crate build; +- expose generated modules for confirmed blocks, address-signature indexes, and entry summaries through `solana_storage_proto::convert`; +- convert between generated protobuf messages and Solana transaction-status types such as `TransactionStatusMeta`, `VersionedConfirmedBlock`, `TransactionByAddrInfo`, and `EntrySummary`; +- preserve compatibility with older serialized ledger metadata through `Stored*` helper structs and `default_on_eof` serde defaults; +- provide the protobuf message type used by `magicblock-ledger` for the `transaction_status` RocksDB column. + +This crate is on the ledger persistence/read path through `magicblock-ledger`. It is not execution logic, but its wire formats and conversion semantics are persistence- and RPC-history-sensitive. Avoid unnecessary allocations or decode/encode work in conversion paths because ledger writes happen when transactions are recorded and ledger reads feed RPC/history queries. + +## Update requirement + +Update this guide in the same change whenever `storage-proto` behavior or contracts change. 
In particular, update it for changes to: + +- protobuf files under `storage-proto/proto/`, field numbers, enum values, optionality, or package names; +- build-time code generation in `storage-proto/build.rs`, including generated module names or `tonic_prost_build` options; +- public modules or conversion impls in `storage-proto/src/convert.rs`; +- `Stored*` compatibility structs in `storage-proto/src/lib.rs`, especially defaults used to read older serialized ledger data; +- transaction or instruction error enum mappings, including MagicBlock-specific variants such as `CommitCancelled`; +- how `magicblock-ledger` writes, reads, or migrates protobuf values from RocksDB; +- validation commands for protobuf generation, conversion round-trips, or ledger persistence compatibility. + +Also update this file if another crate changes a type or persistence contract consumed here, such as Solana transaction-status APIs used by the conversion impls. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `storage-proto/Cargo.toml` | Package manifest. The package is named `solana-storage-proto`; the Rust library is `solana_storage_proto`. Depends on Solana transaction/status types, `prost`, `bincode`, `serde`, `bs58`, and uses `tonic-prost-build` plus `protobuf-src` at build time. | +| `storage-proto/README.md` | Short operator/developer note: generated structs come from `proto/*.proto`; edit proto files to update them. | +| `storage-proto/build.rs` | Sets `PROTOC` from `protobuf_src::protoc()` on non-Windows when not already set, registers proto files for rebuild, and compiles them with client generation enabled and server generation disabled. Adds test-only `enum_iterator::Sequence` derives for error enums. | +| `storage-proto/proto/confirmed_block.proto` | Wire schema for confirmed blocks, transactions, transaction status metadata, token balances, return data, rewards, block time/height, and address table lookups. 
| +| `storage-proto/proto/transaction_by_addr.proto` | Wire schema for per-address transaction history entries and the explicit transaction/instruction error enum mappings. | +| `storage-proto/proto/entries.proto` | Wire schema for compact entry summaries. | +| `storage-proto/src/lib.rs` | Public crate root. Exposes `convert`, stored compatibility structs, and conversions between stored/bincode-compatible forms and Solana status types. | +| `storage-proto/src/convert.rs` | Includes generated protobuf modules from `OUT_DIR` and implements `From` / `TryFrom` conversions between generated messages and Solana transaction-status types. Contains conversion tests. | +| `Cargo.toml` | Workspace dependency and `[patch.crates-io]` entry point this crate uses to replace upstream `solana-storage-proto`. Also pins `prost` with a comment to keep it in sync with codegen. | +| `test-integration/Cargo.toml` | Integration workspace patch for the same local `solana-storage-proto` replacement. | +| `magicblock-ledger/src/database/columns.rs` | Defines `cf::TransactionStatus` as a `ProtobufColumn` whose value type is `solana_storage_proto::convert::generated::TransactionStatusMeta`. | +| `magicblock-ledger/src/database/ledger_column.rs` | Encodes/decodes `ProtobufColumn` values with `prost::Message`; includes fallback helpers for older bincode values. | +| `magicblock-ledger/src/store/api.rs` | Writes `TransactionStatusMeta` through this crate's generated type and converts protobuf values back to Solana status metadata for ledger/RPC reads. | + +Main consumers: + +- `magicblock-ledger` is the direct workspace consumer and uses the generated `TransactionStatusMeta` as persisted RocksDB data. +- Solana dependencies patched by the workspace may depend on `solana-storage-proto`; the root and integration `[patch.crates-io]` entries force them to this local crate. 
+- RPC/history consumers indirectly depend on this crate through ledger methods that read transaction status, confirmed transactions, and address-signature history. + +Important upstream dependencies: + +- `prost` and `tonic-prost-build` define generated message APIs and encode/decode behavior. +- `protobuf-src` supplies a bundled `protoc` on non-Windows; Windows users must provide `PROTOC` manually per `Cargo.toml` comments. +- Solana crates such as `solana-transaction-status`, `solana-transaction-error`, `solana-message`, `solana-transaction`, `solana-pubkey`, `solana-signature`, and `solana-hash` define the canonical runtime/RPC types converted by this crate. + +## Public API shape / Main public types and APIs + +Public surface from `src/lib.rs`: + +- `pub mod convert;` is the main public module. +- `StoredExtendedReward`, `StoredExtendedRewards`, `StoredTokenAmount`, `StoredTransactionTokenBalance`, `StoredTransactionStatusMeta`, and `StoredTransactionReturnData` are serde-compatible helper types for older bincode-era representations. +- `default_on_eof` is private but important: it lets deserialization of older stored data default fields that may not exist in older serialized bytes. +- Conversions preserve older fields and explicitly reject deprecated bincode status serialization when loaded addresses are present, because the old format cannot represent them. + +Public surface from `src/convert.rs`: + +- `convert::generated` includes `OUT_DIR/solana.storage.confirmed_block.rs` generated from `confirmed_block.proto`. +- `convert::tx_by_addr` includes `OUT_DIR/solana.storage.transaction_by_addr.rs` generated from `transaction_by_addr.proto`. +- `convert::entries` includes `OUT_DIR/solana.storage.entries.rs` generated from `entries.proto`. 
+- `From` conversions cover Solana-to-protobuf paths for rewards, confirmed blocks, transactions, versioned messages, transaction status metadata, token balances, address table lookups, return data, compiled/inner instructions, transaction-by-address entries, and entry summaries. +- `TryFrom` conversions cover protobuf-to-Solana paths that can fail due to invalid bincode error payloads, invalid signatures/pubkeys/hashes, or unmapped enum values. + +Key public/generated types consumed outside the crate: + +| Type/module | Use | +|---|---| +| `convert::generated::TransactionStatusMeta` | Persisted value type for `magicblock-ledger`'s `transaction_status` column. | +| `convert::generated::ConfirmedBlock` / `ConfirmedTransaction` | Protobuf representation of block and transaction data. | +| `convert::tx_by_addr::TransactionByAddrInfo` | Protobuf representation for address-signature history entries. | +| `convert::entries::Entry` | Protobuf representation for entry summaries. | + +## Runtime flows + +### Build-time protobuf generation + +1. Cargo runs `storage-proto/build.rs`. +2. If `PROTOC` is unset and the target is not Windows, the build script points `PROTOC` at `protobuf_src::protoc()`. +3. The build script registers `confirmed_block.proto`, `entries.proto`, and `transaction_by_addr.proto` with `cargo:rerun-if-changed`. +4. `tonic_prost_build::configure()` compiles the schemas with clients enabled and servers disabled. +5. Generated Rust files are emitted under `OUT_DIR` and included by `convert::{generated, tx_by_addr, entries}`. + +Do not check generated files into the repository unless the build strategy changes. The source of truth is `storage-proto/proto/*.proto` plus `build.rs`. 
+ +### Ledger transaction-status write/read path + +```text +processor/API records transaction + -> magicblock-ledger::Ledger::write_transaction_status + -> solana_storage_proto::convert::generated::TransactionStatusMeta::from(TransactionStatusMeta) + -> LedgerColumn::put_protobuf + -> prost encodes bytes into RocksDB + +RPC/history read + -> LedgerColumn::get_protobuf + -> prost decodes generated::TransactionStatusMeta + -> TransactionStatusMeta::try_from(generated value) + -> ledger/RPC returns Solana transaction-status data +``` + +Pitfalls: + +- `magicblock-ledger` stores only status metadata as protobuf in this path. It still stores `VersionedTransaction` bytes with bincode in the `Transaction` column. +- `get_protobuf_or_bincode` exists for fallback-compatible reads, but `Ledger::read_transaction_status` currently uses `get_protobuf`; do not assume all ledger columns can transparently read old formats. +- Conversion failures surface as ledger errors and can break RPC history reads for persisted transactions. + +### Transaction-by-address and error conversion + +1. `TransactionByAddrInfo` converts to `tx_by_addr::TransactionByAddrInfo` by serializing signatures as bytes and optional fields as protobuf messages. +2. `TransactionError` maps to explicit `TransactionErrorType`/`InstructionErrorType` enum values. +3. Some transaction errors require extra `TransactionDetails` or `InstructionError` payloads, for example duplicate instruction indexes, rent account indexes, restricted program account indexes, custom instruction errors, and instruction error indexes. +4. Reverse conversion validates enum values and returns `Err(&'static str)` for unmapped/invalid variants. + +The numeric enum values in `transaction_by_addr.proto` are compatibility-sensitive. If Solana or MagicBlock adds/removes transaction errors, update the proto enum, both conversion directions, and the `test_error_enums` coverage together. 
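The mapping pattern described above can be sketched with a toy pair of enums. `DemoError` and `DemoErrorType` are hypothetical, and the discriminants are illustrative only; they are not the values in `transaction_by_addr.proto`.

```rust
use std::convert::TryFrom;

/// Hypothetical runtime error type standing in for `TransactionError`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DemoError {
    AccountInUse,
    InsufficientFunds,
    CommitCancelled,
}

/// Hypothetical wire enum. Discriminants are part of the persisted format,
/// so they are spelled out and never renumbered.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
enum DemoErrorType {
    AccountInUse = 0,
    InsufficientFunds = 1,
    CommitCancelled = 2,
}

// Runtime -> wire is infallible: every runtime variant has a wire value.
impl From<DemoError> for DemoErrorType {
    fn from(err: DemoError) -> Self {
        match err {
            DemoError::AccountInUse => Self::AccountInUse,
            DemoError::InsufficientFunds => Self::InsufficientFunds,
            DemoError::CommitCancelled => Self::CommitCancelled,
        }
    }
}

// Wire -> runtime is fallible: stored bytes may carry an unmapped number.
impl TryFrom<i32> for DemoError {
    type Error = &'static str;

    fn try_from(raw: i32) -> Result<Self, Self::Error> {
        const IN_USE: i32 = DemoErrorType::AccountInUse as i32;
        const NO_FUNDS: i32 = DemoErrorType::InsufficientFunds as i32;
        const CANCELLED: i32 = DemoErrorType::CommitCancelled as i32;
        match raw {
            IN_USE => Ok(Self::AccountInUse),
            NO_FUNDS => Ok(Self::InsufficientFunds),
            CANCELLED => Ok(Self::CommitCancelled),
            _ => Err("unmapped error discriminant"),
        }
    }
}
```

Adding a variant in this sketch means touching the wire enum and both `match` blocks, which is the same three-way update the real proto enum and conversions need.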
+ +## Important internals and caveats + +### Protobuf schema compatibility + +Treat proto field numbers and enum discriminants as persisted wire format. New fields should generally use new field numbers and optional/repeated fields where old data must remain readable. Renaming a Rust field generated by `prost` is less important than changing a number or type, but generated names still affect conversion code. + +### `Stored*` compatibility helpers + +`StoredTransactionStatusMeta` and related `Stored*` structs model older bincode-compatible representations. Several fields use `#[serde(deserialize_with = "default_on_eof")]` so older serialized values can still be read after fields were added. Preserve these defaults when adding stored fields unless you have a deliberate migration plan. + +### Loaded addresses and deprecated bincode status metadata + +`TryFrom for StoredTransactionStatusMeta` rejects values with non-empty `loaded_addresses` because the deprecated bincode format cannot represent them. The protobuf path in `convert::generated::TransactionStatusMeta` does include loaded writable/read-only addresses. Do not route modern v0 transaction metadata through the deprecated stored/bincode conversion unless loaded addresses are impossible. + +### Panics versus fallible conversion + +Most protobuf-to-Solana conversions are fallible where malformed external-sized data is expected, but some conversions use `expect`/`unwrap` for fields that should be produced only by this crate's own encoding path, such as required message headers or pubkey/hash byte lengths. If you make these types ingest untrusted or externally produced protobuf bytes, consider whether those conversions need to become fallible and update consumers/tests accordingly. + +### Workspace patching + +The root workspace patches `solana-storage-proto` to `./storage-proto` because Solana dependencies may otherwise pull an upstream crate version with incompatible protobuf tooling. 
Keep the root and `test-integration` patches aligned when dependency or build-tool versions change. + +## Important invariants + +1. The crate directory is `storage-proto/`, but the package/crate contract is `solana-storage-proto` / `solana_storage_proto`; preserve this naming unless all workspace patches and consumers are migrated together. +2. Proto field numbers and enum numeric values must remain backward-compatible with persisted ledger bytes. +3. `prost` versions, `tonic-prost-build` output, and workspace comments about `solana-storage-proto` codegen must stay in sync. +4. `TransactionStatusMeta` conversion must preserve status, fee, balances, inner instructions, logs, token balances, rewards, loaded addresses, return data, compute units, and cost units semantics expected by Solana RPC/history consumers. +5. Transaction and instruction error mappings must be exhaustive for the generated enums covered by tests and must preserve MagicBlock-specific `TransactionError::CommitCancelled`. +6. Older stored data must remain readable where compatibility helpers exist; added serde fields should default safely for missing older data. +7. Ledger write/read paths must avoid extra encode/decode cycles, excessive allocation, and heavy logging in hot transaction-history paths. +8. Generated protobuf modules must remain included from `OUT_DIR`; `proto/*.proto` and `build.rs` are the editable sources of truth. 
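Invariant 6 can be illustrated with a hand-rolled reader. This is a sketch of the idea only: the crate itself relies on serde with `default_on_eof`, and `StoredDemoMeta`, `read_u64`, and `decode` are hypothetical names with an assumed little-endian layout.

```rust
use std::convert::TryInto;

/// Hypothetical record: `fee` existed from the start, `compute_units`
/// was appended in a later format revision.
#[derive(Debug, PartialEq, Eq)]
struct StoredDemoMeta {
    fee: u64,
    compute_units: Option<u64>,
}

/// Read a little-endian u64 at `*offset` and advance the offset.
/// Returns `None` when the buffer ends before eight bytes are available.
fn read_u64(bytes: &[u8], offset: &mut usize) -> Option<u64> {
    let end = offset.checked_add(8)?;
    let chunk: [u8; 8] = bytes.get(*offset..end)?.try_into().ok()?;
    *offset = end;
    Some(u64::from_le_bytes(chunk))
}

/// Decode a record, defaulting the newer trailing field when old bytes
/// simply stop after the original fields.
fn decode(bytes: &[u8]) -> Result<StoredDemoMeta, &'static str> {
    let mut offset = 0;
    // Original field: its absence is a real error.
    let fee = read_u64(bytes, &mut offset).ok_or("missing required fee")?;
    // Appended field: end of input here means "written by an older version".
    let compute_units = read_u64(bytes, &mut offset);
    Ok(StoredDemoMeta { fee, compute_units })
}
```

The design point is that only fields appended after the original layout may default; a missing original field still fails, so corruption is not silently read as old data.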
+ +## Common change areas and what to inspect + +### Add or change transaction-status fields + +Start with: + +- `storage-proto/proto/confirmed_block.proto` +- `storage-proto/src/convert.rs` conversions for `TransactionStatusMeta` +- `storage-proto/src/lib.rs` `StoredTransactionStatusMeta` if older bincode compatibility is affected +- `magicblock-ledger/src/store/api.rs` read/write paths and tests around `create_transaction_status_meta` + +Check that old data remains readable, RPC-facing metadata still matches Solana expectations, and loaded-address behavior is not regressed. + +### Update transaction or instruction error support + +Start with: + +- `storage-proto/proto/transaction_by_addr.proto` +- `impl TryFrom for TransactionError` +- `impl From for tx_by_addr::TransactionError` +- `test_transaction_error_encode` and `test_error_enums` in `storage-proto/src/convert.rs` + +Preserve numeric enum values. Add new enum values at the end unless a deliberate migration requires otherwise. + +### Change protobuf generation or build dependencies + +Start with: + +- `storage-proto/build.rs` +- `storage-proto/Cargo.toml` +- root `Cargo.toml` workspace `prost`, `protobuf-src`, and `[patch.crates-io]` entries +- `test-integration/Cargo.toml` patches + +Validate on a clean build if possible so stale `OUT_DIR` artifacts do not hide codegen problems. + +### Change ledger usage of generated types + +Start with: + +- `magicblock-ledger/src/database/columns.rs` `ProtobufColumn` implementation for `TransactionStatus` +- `magicblock-ledger/src/database/ledger_column.rs` protobuf encode/decode helpers +- `magicblock-ledger/src/store/api.rs` transaction status read/write functions + +This can affect persistence compatibility and RPC history. Run both storage-proto conversion tests and ledger tests. 
+ +## Tests and validation + +For documentation-only changes: + +```bash +git diff --check -- .agents/context/crates/storage-proto.md .agents/context/crate-map.md AGENTS.md +``` + +For changes to this crate: + +```bash +cargo fmt +cargo nextest run -p solana-storage-proto +``` + +For codegen/schema changes, prefer a clean targeted build/test: + +```bash +cargo clean -p solana-storage-proto +cargo nextest run -p solana-storage-proto +``` + +For ledger persistence or consumer changes, also run: + +```bash +cargo nextest run -p magicblock-ledger +``` + +Broader baseline before handing off Rust behavior changes, time permitting: + +```bash +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance-sensitive paths touched: ledger status writes and history reads. If conversion logic, allocation patterns, or ledger encode/decode behavior changes, report whether targeted ledger tests were run and whether any performance validation was skipped. + +## Related docs + +- `AGENTS.md` for required agent-documentation workflow. +- `.agents/context/architecture.md` for the local persistence layer and ledger role. +- `.agents/context/crate-map.md` for crate ownership and consumers. +- `.agents/rules/testing-and-validation.md` for repository validation expectations. +- `storage-proto/README.md` for the short protobuf generation note. +- `magicblock-ledger/src/database/columns.rs`, `magicblock-ledger/src/database/ledger_column.rs`, and `magicblock-ledger/src/store/api.rs` for direct consumers. diff --git a/.agents/context/crates/test-kit.md b/.agents/context/crates/test-kit.md new file mode 100644 index 000000000..3d1aa576b --- /dev/null +++ b/.agents/context/crates/test-kit.md @@ -0,0 +1,320 @@ +# `test-kit` + +## Purpose + +`test-kit/` is the shared Rust test-support crate for validator unit tests, crate tests, and integration-test crates. 
It provides a lightweight in-process validator execution harness plus common logging and Solana convenience re-exports so tests can exercise scheduler, SVM execution, AccountsDb, Ledger, Magic Program, and RPC-adjacent behavior without duplicating setup code. + +High-level responsibilities: + +- create temporary `AccountsDb` and `Ledger` instances for tests; +- wire `magicblock-core::link` channels into a `magicblock-processor::TransactionScheduler`; +- build an SVM environment, load the `guinea` test program, fund test payers, and expose helpers for account setup; +- submit transactions through execute, schedule, simulate, and replay paths; +- expose test logging macros and devnet-availability skip helpers; +- re-export common Solana instruction/signing types and the `guinea` test program for concise tests. + +This crate is not production runtime code, but it wraps performance- and correctness-sensitive runtime crates. Changes here can hide or create false failures in scheduler/execution, ledger, Magic Program, RPC, and integration suites. Keep helpers faithful to production semantics unless a test-only shortcut is deliberate and documented in this guide. + +## Update requirement + +Update this guide in the same change whenever `test-kit` behavior or contracts change. In particular, update it for changes to: + +- `ExecutionTestEnv` construction, scheduler startup/mode handling, shutdown behavior, default fee, block time, or superblock sizing; +- helper semantics that mark accounts delegated, fund payers, advance slots, load programs, or write directly to `AccountsDb`/`Ledger`; +- public helper methods, re-exports, macros, or logging environment variables; +- the required location or build assumptions for `../programs/elfs/guinea.so`; +- validation commands or important consumer suites that should be run after harness changes. 
+ +Also update this file if another crate changes an API that alters how this harness wires runtime services, such as processor scheduler handles, ledger block APIs, account flags, or core channel endpoints. + +## Where it sits in the repository + +| Path | Role | +|---|---| +| `test-kit/Cargo.toml` | Package manifest. Depends on `guinea`, `magicblock-accounts-db`, `magicblock-core`, `magicblock-ledger`, `magicblock-processor`, Solana transaction/instruction/signing crates, `tempfile`, `tokio`, and tracing/logging crates. | +| `test-kit/src/lib.rs` | Main public API. Defines `ExecutionTestEnv`, `CommitableAccount`, constants, transaction/account helpers, scheduler lifecycle helpers, and public re-exports. | +| `test-kit/src/macros.rs` | Logging/devnet helpers plus exported `init_logger!` and `skip_if_devnet_down!` macros. | +| `programs/guinea/` | Test-only program re-exported as `test_kit::guinea` and loaded into `ExecutionTestEnv`. | +| `programs/elfs/guinea.so` | Runtime artifact loaded by `ExecutionTestEnv` using relative path `../programs/elfs/guinea.so`. Tests that instantiate the harness require this ELF to exist at that path relative to the test process working directory. | +| `magicblock-processor/tests/` | Main direct consumer of `ExecutionTestEnv` for execution, scheduling, replay, replica ordering, simulation, fees, security, and ephemeral-account tests. | +| `magicblock-aperture/tests/` | Builds RPC test environments around `ExecutionTestEnv` and uses re-exported instruction/signing conveniences. | +| `magicblock-ledger/src/**` and `magicblock-ledger/tests/` | Use `init_logger!` for ledger unit/integration tests. | +| `programs/magicblock/src/**` | Uses `init_logger!` in Magic Program unit tests. | +| `test-integration/**` | Uses `init_logger!`, `Signer`, `Instruction`, `AccountMeta`, and `guinea` across cloning, committor, config, restore-ledger, schedule-intent, table-mania, and MagicBlock API suites. 
| + +Main consumers: + +- `magicblock-processor` tests use the full execution harness most heavily. +- `magicblock-aperture` tests embed `ExecutionTestEnv` behind live JSON-RPC/pubsub servers. +- `programs/magicblock`, `magicblock-ledger`, `magicblock-committor-service`, and integration crates mostly use logging macros and re-exports. + +Important upstream dependencies: + +- `magicblock-core::link` channel types and `TransactionSchedulerHandle` define the harness boundary for transaction submission and event observation. +- `magicblock-processor::{build_svm_env, TransactionScheduler, TransactionSchedulerState}` define execution semantics. +- `magicblock-accounts-db` and `magicblock-ledger` provide temporary persistence state. +- Solana account, instruction, keypair, signer, transaction, and status types define test API compatibility. + +## Public API shape / Main public types and APIs + +Public re-exports from `src/lib.rs`: + +- `pub use guinea;` exposes `test_kit::guinea::{ID, GuineaInstruction, ...}`. +- `pub use solana_instruction::*;` exposes `Instruction`, `AccountMeta`, and related instruction types. +- `pub use solana_signer::Signer;` keeps tests concise when calling `.pubkey()` on keypairs. +- `pub mod macros;` exposes logging/devnet helper functions in addition to exported macros. + +Key constants: + +| Item | Meaning | +|---|---| +| `ExecutionTestEnv::BASE_FEE` | Default base fee, currently `1000` lamports. | +| `BLOCK_TIME` | Scheduler block time used by the harness, currently `50ms`. | +| `SUPERBLOCK_SIZE` | Default superblock interval, currently `72000`. | + +### `ExecutionTestEnv` + +`ExecutionTestEnv` owns an in-process execution stack for tests: + +- `payers: Vec` — one generated payer per configured executor. +- `accountsdb: Arc` and `ledger: Arc` — temporary persistent state. +- `transaction_scheduler: TransactionSchedulerHandle` — async submission API exposed through core dispatch channels. 
+- `dispatch: DispatchEndpoints` — event/submission endpoints for tests that need account/status streams. +- `scheduler: Option` and `run_scheduler()` — deferred-start support for tests that need to enqueue work before launching the scheduler. +- `shutdown: CancellationToken` and `Drop` — cancels and joins the scheduler thread on environment drop. + +Important constructors: + +| Constructor | Use | +|---|---| +| `ExecutionTestEnv::new()` | Default primary-mode environment with `BASE_FEE`, one executor, and immediate scheduler startup. | +| `ExecutionTestEnv::new_with_config(fee, executors, defer_startup)` | Primary-mode environment with custom fee/executor count and optional deferred scheduler startup. | +| `ExecutionTestEnv::new_replica_mode(executors, defer_startup)` | Replica-mode environment for replay-ordering tests; does not pre-send `Primary` mode. | +| `ExecutionTestEnv::new_replica_mode_with_superblock_size(executors, defer_startup, superblock_size)` | Replica-mode variant with custom superblock interval. | + +Important methods: + +- Scheduler/mode helpers: `run_scheduler`, `switch_to_primary_mode`, `wait_for_scheduler_ready`, `yield_to_scheduler`, `wait_for_next_slot`. +- Slot helper: `advance_slot` writes a new `LatestBlockInner`, updates AccountsDb slot, and yields the current thread. +- Account helpers: `create_account_with_config`, `create_account`, `fund_account`, `fund_account_with_owner`, `get_account`, `try_get_account`, `get_payer`. +- Transaction helpers: `build_transaction`, `build_transaction_with_signers`, `execute_transaction`, `schedule_transaction`, `simulate_transaction`, `replay_transaction`, `get_transaction`. 
+ +### `CommitableAccount` + +`CommitableAccount<'db>` is a mutable account snapshot returned by `get_account`, `try_get_account`, and `get_payer`: + +- it dereferences to `AccountSharedData` and supports mutable account edits; +- changes are local to the wrapper until `commit()` reinserts the account into `AccountsDb`; +- dropping without `commit()` discards modifications. + +### Macros and logging helpers + +`test-kit/src/macros.rs` provides: + +- `init_logger_for_tests()` — initializes `LogTracer` and a tracing subscriber, respecting `RUST_LOG`; if `RUST_LOG_STYLE=test` and `TEST_FILE_PATH` is set, it appends the test file stem at debug level. +- `init_logger_for_test_path(full_path_to_test_file)` — legacy path-aware logger used by `init_logger!`; if `RUST_LOG` ends with `,` or is `info`, it appends `=`. +- `init_logger!()` — exported macro that passes `std::file!()` to `init_logger_for_test_path`. +- `is_devnet_up().await` — checks `https://api.devnet.solana.com` with a nonblocking RPC client. +- `skip_if_devnet_down!()` — exported async-test macro that logs a warning and returns early when devnet is unavailable. + +Logger initialization intentionally ignores repeated-initialization errors so tests can call it freely. + +## Runtime flows + +### Primary-mode execution harness setup + +1. `ExecutionTestEnv::new()` calls `new_with_config(BASE_FEE, 1, false)`. +2. Construction initializes tracing through `init_logger!()`. +3. A temporary directory is created and used to open both `AccountsDb` and `Ledger`. +4. `magicblock_core::link::link()` creates dispatch endpoints and validator channels. +5. The current ledger blockhash seeds `build_svm_env(&accountsdb, blockhash, fee)`. +6. One payer keypair is generated per configured executor. +7. A mode channel is created and `SchedulerMode::Primary` is pre-sent for primary-mode constructors. +8. The harness advances to slot `1` before loading programs. +9. 
`load_upgradeable_programs` loads `(guinea::ID, "../programs/elfs/guinea.so")` into AccountsDb. +10. `TransactionSchedulerState` is assembled from AccountsDb, Ledger, channel receivers/senders, SVM environment, feature set, shutdown token, mode receiver, pause permit, block time, and superblock size. +11. A `TransactionScheduler` is created. It is spawned immediately unless `defer_startup` is true. +12. Each payer is funded with `LAMPORTS_PER_SOL` through `fund_account`, which creates delegated system accounts. + +Pitfalls: + +- The relative guinea ELF path is part of the current test harness contract. If tests run from a different working directory or before SBF artifacts are built, harness construction can fail. +- `payers` length equals the configured executor count. Most tests use at least one executor; adding zero-executor use requires auditing payer indexing and scheduler behavior. + +### Deferred scheduler flow + +1. Construct with `defer_startup = true`; the scheduler is stored in `ExecutionTestEnv::scheduler` and no scheduler thread is running. +2. Tests may enqueue transactions through `schedule_transaction` or inspect channels before execution begins. +3. `run_scheduler()` takes the stored scheduler and spawns it, storing the join handle. +4. Calling `run_scheduler()` again after the scheduler has been taken is a no-op. + +Use this only for tests that intentionally need deterministic pre-start scheduling. For ordinary execution tests, prefer immediate startup. + +### Replica and replay flow + +1. `new_replica_mode` constructs the same storage/channel stack but does not pre-send `SchedulerMode::Primary`. +2. The scheduler starts in replica-oriented behavior and accepts replay submissions via `replay_transaction`. +3. `replay_transaction(persist, txn)` submits a `ReplayPosition { slot: 0, index: 0, persist }`. +4. `switch_to_primary_mode()` can later send `SchedulerMode::Primary` when a test needs a transition. 
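The difference between primary-mode and replica-mode construction can be sketched with a plain channel. This is an illustrative model only: `ToyHarness`, `Mode`, and `poll_mode` are hypothetical names, and it assumes the scheduler stays in replica mode until it receives a `Primary` message, which is how the steps above read.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for `SchedulerMode`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Replica,
    Primary,
}

/// Hypothetical model of the harness mode channel plus the scheduler's view of it.
struct ToyHarness {
    mode_tx: Sender<Mode>,
    mode_rx: Receiver<Mode>,
    current: Mode,
}

impl ToyHarness {
    /// Primary-mode constructors pre-send `Primary` before the scheduler starts.
    fn new_primary() -> Self {
        let harness = Self::new_replica();
        harness.switch_to_primary_mode();
        harness
    }

    /// Replica-mode constructors leave the mode channel empty.
    fn new_replica() -> Self {
        let (mode_tx, mode_rx) = channel();
        Self { mode_tx, mode_rx, current: Mode::Replica }
    }

    /// Explicit transition requested by a test.
    fn switch_to_primary_mode(&self) {
        self.mode_tx.send(Mode::Primary).expect("receiver is owned by the harness");
    }

    /// Models one scheduler tick: apply any pending mode changes, then report the mode.
    fn poll_mode(&mut self) -> Mode {
        while let Ok(mode) = self.mode_rx.try_recv() {
            self.current = mode;
        }
        self.current
    }
}
```

In this model a replica harness never becomes primary on its own; only an explicit `switch_to_primary_mode()` call changes what the scheduler observes.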
+ +Replica-mode tests depend on ordering and persistence semantics; avoid changing replay defaults without updating `magicblock-processor/tests/replay.rs` and `magicblock-processor/tests/replica_ordering.rs`. + +### Account and transaction helper flow + +```text +test creates/funds account + -> helper writes AccountSharedData directly into AccountsDb + -> helper marks many created/funded accounts delegated + -> test builds transaction with current ledger latest_blockhash + -> TransactionSchedulerHandle execute/simulate/schedule/replay path + -> processor updates AccountsDb/Ledger/events + -> test reads status/account helpers +``` + +Direct AccountsDb writes are a test shortcut. They bypass cloning, delegation-record fetching, and RPC/account-sync behavior. Use integration tests with live validators when the behavior under test depends on those layers. + +## Important internals and caveats + +### Delegated-by-default account helpers + +`create_account_with_config` and `fund_account_with_owner` call `account.set_delegated(true)` before inserting into AccountsDb; `create_account` and `fund_account` inherit that behavior. This makes local execution helpers convenient because MagicBlock SVM access validation allows delegated writable accounts. If a test needs an undelegated or confined account, it must fetch the account with `get_account`, mutate flags/owner/data, and `commit()` the change. + +### Payer rotation + +`build_transaction` and `build_transaction_with_signers` use an atomic counter and rotate across `payers[index % payers.len()]`. This lets multi-executor tests avoid a single payer becoming a universal write-lock bottleneck. Preserve this behavior when changing transaction builders unless the affected scheduling tests are updated deliberately. + +### Commitable account snapshots + +`get_account` clones the current account into a `CommitableAccount`; mutations do not affect AccountsDb until `commit()`. 
This pattern is easy to misuse in tests: always call `commit()` after changing lamports, data, owner, or account flags. + +### Scheduler readiness waits + +`wait_for_scheduler_ready` waits for the latest block slot to advance and times out after five seconds. `yield_to_scheduler` only yields and sleeps briefly. Choose the stronger readiness helper when a test depends on the scheduler processing mode changes or slot ticks; use the lighter helper for replay tests that should not require slot advancement. + +### Logger initialization + +Both logger helpers ignore repeated initialization failures from `LogTracer` and tracing subscriber setup. This is intentional for parallel test binaries. Do not replace it with panicking initialization unless every caller is audited. + +### Devnet skip helper + +`skip_if_devnet_down!()` performs a real network call to public Solana devnet and returns from the current async test when unavailable. Keep this out of unit tests and deterministic local tests; prefer it only for tests that already require devnet. + +## Important invariants + +1. `test-kit` must remain test-only support; do not make production runtime crates depend on it outside dev/test contexts. +2. `ExecutionTestEnv` must cancel the scheduler and join its thread on drop to avoid leaking background workers between tests. +3. The harness must wire scheduler channels consistently with `magicblock-core::link`; otherwise tests may pass while production orchestration differs. +4. Account helpers that currently create delegated accounts must keep that behavior or all consumers relying on writable local execution must be updated. +5. Transaction builders must sign with the selected payer and use `ledger.latest_blockhash()` so tests exercise current blockhash behavior. +6. `simulate_transaction` must not persist account or ledger side effects; consumers use it to verify simulation isolation. +7. 
Replica-mode constructors must not silently switch to primary mode; replay-ordering tests rely on the mode distinction. +8. `CommitableAccount` mutations must remain explicit via `commit()`; implicit write-back on drop would change many tests' semantics. +9. Logger macros must stay safe to call multiple times in one test process. +10. The guinea program ID and loaded ELF must stay aligned with `test_kit::guinea`; stale ELF paths or mismatched program IDs make harness results misleading. + +## Common change areas and what to inspect + +### Change execution harness startup or scheduler wiring + +Start with: + +- `test-kit/src/lib.rs` constructors and `TransactionSchedulerState` assembly; +- `magicblock-core/src/link/**` for channel/API changes; +- `magicblock-processor/src/scheduler/**` for scheduler state and mode handling; +- `magicblock-processor/tests/scheduling.rs`, `replay.rs`, and `replica_ordering.rs`. + +Check deferred startup, primary mode pre-send, replica mode, shutdown, and event channel behavior. + +### Change account setup helpers or default flags + +Start with: + +- `create_account_with_config`, `fund_account_with_owner`, `get_account`, `try_get_account`, and `CommitableAccount::commit`; +- `magicblock-processor/tests/security.rs`, `fees.rs`, `ephemeral_accounts.rs`, and `execution.rs`; +- MagicBlock SVM access-validation rules in `.agents/specs/validator-specification.md`. + +Be explicit about whether helper-created accounts should be delegated, undelegated, confined, ephemeral, or system-owned. + +### Change transaction builders or payer behavior + +Start with: + +- `build_transaction`, `build_transaction_with_signers`, and `payer_index` handling; +- scheduling tests that depend on avoiding payer lock contention; +- fee tests that inspect payer balances and gasless mode. + +Preserve payer rotation unless deliberately changing lock-conflict behavior in tests. 
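Payer rotation, as described in this guide, is an atomic counter indexed modulo the payer count. A minimal sketch follows, assuming string labels in place of keypairs; `PayerPool` and `next_payer` are hypothetical names, and the `Relaxed` ordering is an assumption rather than a statement about the real builders.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical model of the harness payer set; strings stand in for keypairs.
struct PayerPool {
    payers: Vec<String>,
    payer_index: AtomicUsize,
}

impl PayerPool {
    /// One payer per configured executor; zero executors is rejected in this sketch.
    fn new(executors: usize) -> Self {
        assert!(executors > 0, "rotation needs at least one payer");
        let payers = (0..executors).map(|i| format!("payer-{i}")).collect();
        Self { payers, payer_index: AtomicUsize::new(0) }
    }

    /// Picks the next payer round-robin so consecutive transactions
    /// do not all write-lock the same fee payer.
    fn next_payer(&self) -> &str {
        let index = self.payer_index.fetch_add(1, Ordering::Relaxed);
        &self.payers[index % self.payers.len()]
    }
}
```

With three executors, four consecutive builds in this model pick payers 0, 1, 2, and then 0 again, which is why a change that pins a single payer would alter lock-conflict behavior in scheduling tests.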
+ +### Change logging macros or devnet helpers + +Start with: + +- `test-kit/src/macros.rs`; +- `magicblock-ledger` and `programs/magicblock` unit tests using `init_logger!`; +- `test-integration/**` tests using `init_logger!` or devnet skips. + +Keep repeated initialization non-fatal and document any environment-variable changes. + +### Change guinea loading or re-exports + +Start with: + +- `test-kit/src/lib.rs` `pub use guinea` and `load_upgradeable_programs` call; +- `programs/guinea/` and generated/build artifacts under `programs/elfs/`; +- processor, aperture, and integration tests importing `test_kit::guinea`, `Instruction`, `AccountMeta`, or `Signer`. + +Validate from the same working directory used by CI/test commands so the relative ELF path is exercised. + +## Tests and validation + +For documentation-only changes: + +```bash +git diff --check -- .agents/context/crates/test-kit.md .agents/context/crate-map.md AGENTS.md prompts/ai/agents/03_batch-crates-plan.md +``` + +For changes to this crate: + +```bash +cargo fmt +cargo nextest run -p test-kit +``` + +Because `test-kit` has no meaningful standalone tests today, also run the smallest consumer suite affected by the change. Common targeted choices: + +```bash +cargo nextest run -p magicblock-processor +cargo nextest run -p magicblock-ledger +cargo nextest run -p magicblock-aperture +cargo nextest run -p magicblock-committor-service +``` + +For integration-test helper or logging changes, run the relevant integration suite from `test-integration/`, for example: + +```bash +cd test-integration +make test-cloning +make test-magicblock-api +make test-restore-ledger +make test-schedule-intents +``` + +Broader baseline before handing off Rust behavior changes, time permitting: + +```bash +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Performance-sensitive paths touched: the harness exercises scheduler/execution/ledger paths but is not production code. 
If a change alters scheduler configuration, executor count, payer rotation, block timing, channel wiring, or direct storage setup, report whether processor scheduling/execution tests were run and whether any remaining performance/concurrency risk is unmeasured. + +## Related docs + +- `AGENTS.md` for required agent-documentation workflow. +- `.agents/specs/validator-specification.md` for delegated/ephemeral access rules and scheduler behavior that tests often exercise. +- `.agents/context/architecture.md` for execution and local persistence layer boundaries. +- `.agents/context/crate-map.md` for crate ownership and consumers. +- `.agents/rules/testing-and-validation.md` for repository validation expectations. +- `.agents/context/crates/magicblock-core.md` for channel and shared-type boundaries used by the harness. +- `.agents/context/crates/magicblock-magic-program-api.md` and `.agents/context/crates/magicblock-task-scheduler.md` for protocol/task tests that commonly consume `test-kit` helpers. +- `magicblock-processor/tests/` and `magicblock-aperture/tests/setup.rs` for representative full-harness call sites. diff --git a/.agents/context/overview.md b/.agents/context/overview.md new file mode 100644 index 000000000..8bb44e2b3 --- /dev/null +++ b/.agents/context/overview.md @@ -0,0 +1,66 @@ +# MagicBlock Validator Overview + +This is the shortest orientation document for agents. It explains what this repository is for and points to the files that contain the deeper details. Avoid duplicating detailed protocol or crate architecture here. + +## What this validator is + +The MagicBlock validator is a specialized Solana Virtual Machine (SVM) runtime for Ephemeral Rollups (ERs). It executes transactions locally against delegated and ephemeral state, while preserving a path for state to be synchronized back to Solana. 
+ +In practical terms, the validator: + +- accepts Solana-style RPC and transaction traffic, +- clones required base-layer accounts/programs into local state, +- executes transactions with MagicBlock-specific SVM rules, +- persists local account and ledger state, +- schedules commits/undelegations back to the base layer, +- supports task scheduling, replication, metrics, and operator/admin flows. + +## Core concepts + +| Concept | Short meaning | +|---|---| +| Ephemeral Rollup | A low-latency SVM execution environment attached to Solana. | +| Delegated account | A Solana state account locked by the Delegation Program and assigned to an ER validator for local execution. | +| Ephemeral account | An ER-only account sponsored by delegated state. | +| Commit | Synchronize ER state back to Solana while keeping the account delegated. | +| Commit and undelegate | Synchronize ER state and return ownership to the original program on Solana. | +| Magic Program | ER program used for commit scheduling, intent bundles, tasks, ephemeral accounts, and validator operations. | +| Committor | Validator-side service that realizes scheduled base-layer intents. | + +## Which agent doc to read + +- `.agents/rules/validator-goals.md` — read when deciding whether a change is aligned with product/system goals. +- `.agents/specs/validator-specification.md` — read before changing protocol behavior: delegation, cloning, execution rules, commits, undelegation, Magic Actions, ephemeral accounts, RPC/router behavior, or recovery. +- `.agents/context/architecture.md` — read before changing service wiring or interactions between crates. +- `.agents/context/crate-map.md` — read to find which crate owns an area and which other crates may be affected. +- `.agents/rules/testing-and-validation.md` — read before deciding how to validate a change. +- `.agents/memory/agent-memory-and-docs.md` — read when a task may discover or change durable repository knowledge that future agents should remember. 
+ +## Non-negotiable agent rules + +Before changing behavior, check the relevant agent docs and preserve the validator's goals, invariants, and performance expectations. + +**Security comes before everything else, including performance.** This validator handles real funds: a security regression can cause the validator or its customers to lose money. Under no circumstances may a change weaken security. Concretely: + +- **Signer requirements must never be relaxed.** Anything that currently requires a signer must keep requiring it. Never remove, weaken, or bypass signer/authority checks for convenience or performance. +- **State must stay in sync with the Solana base layer.** Subscriptions, fetching, delegation-record resolution, and account synchronization must remain at least as correct and stable as they are now. Never allow local state to silently diverge from base-layer truth. +- **Attacker-triggerable conditions are forbidden.** Do not introduce or expose conditions an attacker could trigger, including race conditions, timing/ordering attacks, validator stalls/deadlocks/hangs, resource exhaustion, or any path that lets untrusted input corrupt state or settlement. + +If you cannot make a change without weakening any of the above, stop and surface the conflict explicitly rather than proceeding. A performance win never justifies a security loss. + +The validator is performance-sensitive infrastructure. Do not degrade critical RPC, account sync, scheduling, execution, persistence, or settlement paths unless there is no viable alternative; if such a tradeoff is unavoidable, state it explicitly with expected impact and mitigation. + +When behavior changes, update the relevant `.agents/` file in the same change. These docs must remain synchronized with the real implementation; stale guidance is worse than no guidance. 
+ +When you discover durable knowledge that is missing or wrong in `./.agents/` (for example a feature behavior, protocol invariant, crate responsibility, validation workflow, recurring pitfall, or performance constraint), update the most relevant existing document, or create a focused new one if no suitable document exists. If you cannot make the documentation update, report the exact follow-up before handing off. + +## Things to be especially careful with + +- **Security-critical paths above all** (see non-negotiable rules): signer/authority enforcement, base-layer sync correctness, and any attacker-triggerable race, timing, stall, or exhaustion condition. +- Writable account access rules for delegated, ephemeral, confined, and explicitly allowed accounts. +- Delegation and undelegation lifecycle transitions. +- Commit intent durability and restart recovery. +- Account cloning distinctions between delegated, undelegated/read-only, fee-payer, program, and large accounts. +- Scheduler/account-lock correctness and executor parallelism. +- Latency, throughput, allocation, lock-contention, or I/O regressions in hot paths. +- Startup/shutdown ordering and persistent store flushing. diff --git a/.agents/memory/agent-memory-and-docs.md b/.agents/memory/agent-memory-and-docs.md new file mode 100644 index 000000000..321847032 --- /dev/null +++ b/.agents/memory/agent-memory-and-docs.md @@ -0,0 +1,58 @@ +# Agent Memory and Documentation Stewardship + +This file defines how agents keep repository knowledge current. Treat the files in `./.agents/` as the repository's persistent agent memory: when an agent discovers durable information that future agents should rely on, the agent must update these documents in the same change whenever practical. + +## Core rule + +Whenever you discover that the current `./.agents/` guidance is missing, incomplete, inaccurate, or stale, update it before finishing the task. 
+ +This applies even when the discovery is incidental to another task. Do not leave known gaps for a future agent unless you are blocked from editing documentation; if blocked, report the exact missing update and where it should go. + +**Documented elsewhere is not an excuse to skip the update.** A durable fact being present in the source code, a code comment, an unrelated `.agents/` file, an external repo, or any other location does *not* satisfy this rule. The test is not "does this fact exist somewhere?" — it is "would an agent who opens the single most relevant `.agents/` document for this concern find it there?" If the answer is no, you must capture it in that document, even if a related or partial mention already lives in a different file. Each `.agents/` document must be self-sufficient for an agent working in the area it covers; never rely on the reader having read another file. When the same fact is genuinely relevant in two places, put the full explanation in the most specific canonical file and add a short pointer (not a silent omission) from the other. + +Concretely: if you investigate code to answer a question and find that the mechanism, behavior, or invariant you relied on is *not* spelled out in the crate/spec/rules document an agent would consult for that area, document it there now — regardless of whether a higher-level or differently-scoped file happens to mention it. + +**This rule applies to read-only and question-answering tasks too, not only code changes.** If you investigate the code to answer a question and learn a durable fact — especially a divergence from agave/Solana upstream behavior (e.g. a missing limit, different default, or relaxed validation) — capture it before finishing, then report it per the Final response requirement below. 
+ +## What must be captured + +Update or create agent documentation when you discover durable information such as: + +- a feature, behavior, invariant, lifecycle rule, or protocol detail that is not documented yet; +- a documented behavior that is wrong, misleading, renamed, removed, or implemented differently; +- a new testing, validation, debugging, benchmarking, or operational workflow; +- a crate responsibility, API boundary, dependency, startup/shutdown interaction, or hot-path performance consideration; +- a recurring pitfall, failure mode, race condition, recovery requirement, or security/correctness constraint; +- a new crate-specific area that needs its own guide under `.agents/context/crates/`; +- any other knowledge that future agents should remember to make safe, correct, and efficient changes. + +Do not document one-off observations that are only relevant to the current local environment unless they reveal a reusable workflow, constraint, or repository behavior. + +## Where to put updates + +Prefer updating the most specific existing file: + +- `.agents/rules/validator-goals.md` for goals, correctness constraints, and decision criteria. +- `.agents/specs/validator-specification.md` for protocol-level behavior and lifecycle rules. +- `.agents/context/architecture.md` for cross-crate service interactions and boundaries. +- `.agents/context/crate-map.md` for crate ownership, dependencies, consumers, and where to start. +- `.agents/rules/testing-and-validation.md` for validation commands, debugging workflows, and test selection. +- `.agents/context/crates/.md` for crate-specific behavior, APIs, invariants, pitfalls, or tests. + +If no suitable document exists, create a new focused file in `.agents/` or `.agents/context/crates/`. When adding, removing, renaming, or reorganizing agent documentation, update `AGENTS.md` so the entrypoint remains accurate. + +## How to update + +Keep updates concise and operational: + +1. 
State the behavior or workflow future agents need to know. +2. Include the owning crate/path/API when relevant. +3. Include validation commands or tests when the discovery changes how work should be checked. +4. Call out performance-sensitive paths and tradeoffs if relevant. +5. Avoid duplicating large blocks across files; link or point to the canonical file instead. + +When behavior changes in code, update the docs in the same change as the implementation. When the task is documentation-only, verify that file paths and cross-references remain accurate. + +## Final response requirement + +When finishing a task, report whether agent documentation was updated. If it was not updated, state why no durable agent-memory update was needed, or list the blocked documentation follow-up explicitly. diff --git a/.agents/personas/README.md b/.agents/personas/README.md new file mode 100644 index 000000000..8d1b5290f --- /dev/null +++ b/.agents/personas/README.md @@ -0,0 +1,3 @@ +# Agent Personas + +This directory is reserved for specialized agent profiles, such as QA, security-review, or performance-review personas. Add a focused Markdown file here when a durable specialized profile is needed. diff --git a/.agents/rules/testing-and-validation.md b/.agents/rules/testing-and-validation.md new file mode 100644 index 000000000..89205a221 --- /dev/null +++ b/.agents/rules/testing-and-validation.md @@ -0,0 +1,214 @@ +# Testing and Validation + +This file tells agents how to validate changes. Keep it focused on commands and workflow. If validation behavior changes, update this file in the same change. + +## Baseline rule + +Every code change must be validated. Use the repository-local `mbv-check` skill for Rust changes once it is available in the agent environment. The intent of that skill is to run this validator's standard Rust quality gate and help fix any failures. + +The validator is performance-sensitive. 
When a change touches critical RPC, account synchronization, scheduler/executor, AccountsDb/ledger, replication, or committor paths, validation should include the smallest available test or measurement that can reveal latency, throughput, contention, allocation, or I/O regressions. If no practical performance validation is run, say so and explain the residual risk. + +The validator is also security-critical, and security outranks performance (see `.agents/rules/validator-goals.md` and `.agents/specs/validator-specification.md`). Before handing off any change, explicitly verify it does not: + +- relax a signer/authority requirement that exists today, +- weaken base-layer synchronization (account fetching, subscriptions, delegation-record resolution, slot/commitment/freshness handling), +- introduce an attacker-triggerable condition (race, time-of-check/time-of-use gap, ordering/timing attack, validator stall/deadlock/hang, or unbounded resource consumption). + +When a change touches signer/authority checks, account-sync correctness, lock acquisition/ordering, or any path driven by untrusted RPC/transaction input, add or run the test that exercises the security-relevant behavior (for example concurrency/race tests, delegation/sync ordering tests, or auth-rejection tests). If you cannot validate a security-relevant path, say so and call out the residual risk explicitly — do not treat it as low priority. + +Until or unless the skill provides a more specific command set, treat the required baseline as: + +```bash +make fmt +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Run tests with `cargo nextest`. If `nextest` is not available in the environment, fall back to `cargo test` for the same scope (for example `cargo test --workspace`). This is the single source of truth for that choice; other sections assume it. 
+ +For small or targeted changes, run the smallest relevant test first, then run the broader checks before handing off if time allows. + +For documentation-only changes, at minimum verify the changed files are in the right location and that links/filenames mentioned in `AGENTS.md` and `.agents/` stay in sync. + +When you discover a new reliable way to test, debug, benchmark, or validate the codebase, update this file or the relevant crate-specific guide before finishing. If the approach is specific to one crate, prefer `.agents/context/crates/.md` and link from broader docs only when needed. + +## Choosing what to run + +Use the crate map and touched files to pick tests: + +- Runtime/execution changes: test the affected crate and relevant processor/account tests. +- RPC changes: test the affected aperture/API area and any matching integration suite. +- Delegation/cloning changes: run chainlink/cloning integration tests. +- Commit/undelegation changes: run committor, schedule intent, and related integration tests. +- Config changes: run config tests and at least one validator startup path. +- Task scheduler changes: run task scheduler tests. +- Ledger/recovery changes: run ledger restore tests. + +Always report exactly which commands were run and whether they passed. Also report any performance-sensitive paths touched, what performance validation was performed, and any unavoidable performance tradeoff or unmeasured risk. + +## Workspace checks + +Common root-level commands: + +```bash +make fmt +cargo clippy --workspace --all-targets -- -D warnings +cargo nextest run --workspace +``` + +Useful targeted forms: + +```bash +cargo nextest run -p +cargo nextest run -p --no-capture +``` + +`cargo test` and `cargo nextest` both run tests; use one of them rather than both for the same scope. Prefer `cargo nextest` when available. 
Use `cargo test` when `nextest` is unavailable, when you need libtest flags such as `--exact` or `--test-threads=1`, or when matching the integration runner's behavior exactly. + +## Integration test structure + +Integration tests live under `test-integration/`. The integration workspace has its own `Cargo.toml`, test crates, test programs, and Makefile. + +Run all integration tests: + +```bash +cd test-integration +make test +``` + +Run one integration suite through the runner: + +```bash +cd test-integration +make test-chainlink +make test-cloning +make test-restore-ledger +make test-magicblock-api +make test-table-mania +make test-committor +make test-pubsub +make test-config +make test-schedule-intents +make test-task-scheduler +``` + +The Makefile uses `RUN_TESTS=` internally. You can also invoke it directly: + +```bash +cd test-integration +RUN_TESTS=cloning make test +RUN_TESTS=committor_intent_executor make test +``` + +The integration runner builds required SBF programs via `make programs` as dependencies of `make test`. + +## Isolating one integration test + +For fast debugging, start only the validators needed by a suite in one terminal, then run the desired Rust test directly in another terminal. + +### 1. Start validators and leave them running + +From terminal A: + +```bash +cd test-integration +make setup--devnet +# or +make setup--ephem +# or +make setup--both +``` + +Examples: + +```bash +cd test-integration +make setup-cloning-both +make setup-chainlink-devnet +make setup-magicblock-api-both +make setup-pubsub-both +make setup-schedule-intents-both +make setup-task-scheduler-devnet +``` + +The setup targets set `SETUP_ONLY` and then wait for Ctrl-C. Keep that terminal open while running the isolated test. + +Available setup targets are listed by: + +```bash +cd test-integration +make list +``` + +### 2. Run the specific test directly + +From terminal B, run the test in the relevant integration crate. 
+ +Prefer `cargo nextest` when available: + +```bash +cd test-integration +RUST_LOG=info cargo nextest run -p --test --no-capture +``` + +Or with an exact nextest expression: + +```bash +cd test-integration +RUST_LOG=info cargo nextest run -p --test -E 'test()' --no-capture +``` + +Use `cargo test` instead when you need libtest-only flags such as `--test-threads=1` or `--exact`: + +```bash +cd test-integration +RUST_LOG=info cargo test -p --test -- --test-threads=1 --nocapture +``` + +### 3. Stop validators + +Return to terminal A and press Ctrl-C. Do not leave local validators running after the test. + +## Suite names and setup targets + +Common suite names used by the Makefile: + +| Area | Runner target | Setup target(s) | Test crate/package | +|---|---|---|---| +| Chainlink/account sync | `make test-chainlink` | `make setup-chainlink-devnet` | `test-chainlink` | +| Cloning | `make test-cloning` | `make setup-cloning-devnet`, `make setup-cloning-ephem`, `make setup-cloning-both` | `test-cloning` | +| Ledger restore/recovery | `make test-restore-ledger` | `make setup-restore-ledger-devnet` | `test-ledger-restore` | +| MagicBlock API | `make test-magicblock-api` | `make setup-magicblock-api-devnet`, `make setup-magicblock-api-ephem`, `make setup-magicblock-api-both` | `test-magicblock-api` | +| Pubsub | `make test-pubsub` | `make setup-pubsub-devnet`, `make setup-pubsub-ephem`, `make setup-pubsub-both` | `test-pubsub` | +| Config | `make test-config` | `make setup-config-devnet` | `test-config` | +| Schedule intents | `make test-schedule-intents` | `make setup-schedule-intents-devnet`, `make setup-schedule-intents-ephem`, `make setup-schedule-intents-both` | `test-schedule-intent` | +| Task scheduler | `make test-task-scheduler` | `make setup-task-scheduler-devnet` | `test-task-scheduler` | +| TableMania | `make test-table-mania` | `make setup-table-mania-devnet` | `test-table-mania` | +| Committor | `make test-committor` and narrower committor targets | `make 
setup-committor-devnet` | `test-committor-service` | + +Committor has narrower Make targets for long suites: + +```bash +make test-committor-ix-singles +make test-committor-preparators +make test-committor-ix-order +make test-committor-ix-multi +make test-committor-bundles +make test-committor-intent-bundles +make test-committor-bundles-heavy +make test-committor-commitfinalize +make test-committor-intent-executor +make test-committor-intent-executor-recovery +``` + +## Reporting validation + +When finishing a task, include: + +- commands run, +- pass/fail result, +- if skipped, why it was skipped, +- any remaining risk, especially for integration tests that require validators or long-running suites, +- any performance-sensitive paths touched and whether performance regression risk was measured, reasoned about, or left unmeasured, +- any security-relevant paths touched (signer/authority enforcement, base-layer sync, locking/concurrency, untrusted-input handling), confirmation that no security property was weakened, and how that was checked or what residual risk remains, +- whether agent documentation was updated for any durable discovery, or why no update was needed. diff --git a/.agents/rules/validator-goals.md b/.agents/rules/validator-goals.md new file mode 100644 index 000000000..05d89a831 --- /dev/null +++ b/.agents/rules/validator-goals.md @@ -0,0 +1,156 @@ +# MagicBlock Validator Goals + +This file states the goals that should guide implementation decisions. It intentionally avoids protocol details; use `.agents/specs/validator-specification.md` for exact behavior and `.agents/context/architecture.md` for crate/service interactions. + +## Overriding goal: security + +Security is the highest-priority goal and overrides every other goal in this document, including performance. The validator custodies and settles real funds; a single security regression can make the validator or its customers lose money. 
When any other goal conflicts with security, security wins. + +No change may, under any circumstances: + +- **Relax signer/authority requirements.** Whatever requires a signer today must keep requiring it. Never drop, weaken, or work around signer or authority checks. +- **Let local state drift out of sync with the Solana base layer.** Subscriptions, fetching, delegation-record resolution, and account synchronization must stay at least as correct and stable as they are now. +- **Create attacker-triggerable failure modes.** No race conditions, timing/ordering attacks, validator stalls/deadlocks/hangs, resource exhaustion, or any condition that untrusted input can exploit to corrupt state, bypass validation, or affect settlement. + +If a desired change cannot be made without compromising any of the above, do not make it; surface the conflict explicitly instead. + +## Primary goal + +The validator must provide a Solana-compatible, low-latency execution environment for delegated Solana state, then synchronize that state back to Solana safely through MagicBlock's commit and undelegation flows. + +A good change preserves: + +1. **Security** — signer/authority enforcement, base-layer sync integrity, and freedom from attacker-triggerable conditions. This is non-negotiable and takes precedence over everything below. +2. **Compatibility** — clients and programs should work like normal Solana where the ER model allows it. +3. **ER correctness** — only accounts that are allowed to change in the ER may change in the ER. +4. **Settlement safety** — commits, undelegations, and base-layer actions must be explicit, durable, and recoverable. +5. **Performance** — critical RPC, account synchronization, scheduling, execution, persistence, and settlement paths must remain low-latency and high-throughput, but never at the cost of security. + +## Product goals + +### Real-time execution + +The validator should support applications that need very low latency and high throughput. 
+ +Prefer changes that: + +- keep RPC, scheduling, and execution paths lean, +- avoid blocking critical loops, +- preserve parallel execution for unrelated accounts, +- avoid unnecessary allocations, lock contention, I/O, polling, cloning, serialization, and logging in hot paths, +- maintain clear separation between async service work and CPU-bound transaction execution. + +Do not accept performance regressions unless there is no viable alternative. If correctness or safety requires a slower path, document the tradeoff, expected impact, and mitigation in the change handoff. + +### Low-cost user experience + +The ER is expected to support gasless-feeling or low-cost application flows. + +Prefer changes that: + +- do not blindly import base-layer fee assumptions into ER execution, +- preserve fee-payer and sponsor behavior where the ER intentionally differs from Solana, +- keep commit/settlement fees explicit in the commit pipeline. + +### Solana composability + +Delegated state should still compose with Solana programs and read-only Solana state. + +Prefer changes that: + +- keep standard program execution familiar, +- preserve read access to cloned base-layer accounts/programs, +- avoid fragmenting application state into incompatible ER-only copies unless the account is explicitly ephemeral. + +### Safe settlement + +Local ER state must have a safe path back to the base layer. + +Prefer changes that: + +- keep commit and undelegation scheduling explicit, +- preserve commit ordering/nonce requirements, +- recover pending settlement work after restart, +- do not make local success look like base-layer settlement before the settlement pipeline has actually run. + +### High availability and observability + +The validator should remain operable in production-like deployments. 
+ +Prefer changes that: + +- keep replication semantics intentional, +- make failures observable, +- avoid hiding commit, clone, scheduler, or RPC errors, +- preserve metrics/admin/operator visibility. + +## Security goals + +Security constraints are mandatory and override performance and convenience. See the overriding goal at the top of this file. In practice: + +### Signer and authority enforcement + +- Every instruction/operation that requires a signer or specific authority today must keep requiring it. This includes Magic Program operations, ephemeral-account create signers, validator-signed `AcceptScheduleCommits`, commit/undelegation authority, and admin/operator entrypoints. +- Never replace an authenticated path with an unauthenticated one, infer authority from untrusted input, or skip checks on an assumed "fast path." + +### Base-layer synchronization integrity + +- The validator's correctness depends on local state faithfully tracking Solana. Subscriptions, fetching, delegation-record resolution, slot/commitment handling, and freshness checks must stay at least as strong as today. +- Do not introduce paths where the validator serves or executes against stale, forged, or out-of-sync account state, or where it can miss base-layer updates that affect delegation/undelegation truth. + +### Resistance to attacker-triggerable conditions + +- Untrusted clients submit transactions and RPC requests; assume inputs are adversarial. No change may add a condition an attacker can trigger to harm the validator or other users. +- Specifically forbidden: race conditions, time-of-check/time-of-use gaps, ordering/timing attacks, validator stalls/deadlocks/hangs, unbounded resource consumption (CPU, memory, disk, subscriptions, locks), and any path letting one user's input affect another user's funds or state. +- Preserve existing atomicity, locking, deduplication, and bounds; they are often security controls, not just performance optimizations. 
+ +## Correctness goals + +### Account access + +The central correctness goal is that ordinary ER execution must not mutate arbitrary Solana state. Writable accounts must satisfy the MagicBlock access model described in `.agents/specs/validator-specification.md`. + +### Lifecycle transitions + +Delegation, commit, commit-and-undelegate, and ephemeral account lifecycle transitions must remain coherent across: + +- local account flags/representation, +- SVM execution, +- Magic Program scheduling, +- committor service execution, +- restart recovery. + +### Persistence and recovery + +Persistent state is part of the system contract. Changes must not break recovery for: + +- local account state, +- ledger/history, +- pending commit intents, +- scheduled tasks, +- replica/primary state where applicable. + +## Non-goals + +Do not optimize for making the validator identical to a normal Solana validator when that conflicts with ER semantics. + +Do not bypass delegation records, access validation, commit nonces, or intent persistence to simplify implementation. + +Do not trade away any security property — signer/authority enforcement, base-layer sync integrity, or resistance to attacker-triggerable conditions — for performance, simplicity, or compatibility. There is no acceptable tradeoff here. + +Do not add protocol behavior in an isolated crate without updating the docs and checking the cross-crate architecture. + +## Change checklist + +Before finishing a feature, bug fix, or refactor, ask: + +0. **Security first:** Did I keep every existing signer/authority requirement? Did I keep base-layer synchronization (subscriptions/fetching/delegation resolution) at least as correct and stable? Did I avoid introducing any attacker-triggerable race, timing, stall, deadlock, or resource-exhaustion condition? If any answer is "no," the change must not ship. +1. Did I preserve the delegated/ephemeral writable-account invariant? +2. 
Did I preserve the distinction between base-layer state, cloned local state, and ER-only state? +3. Did I preserve commit and undelegation lifecycle behavior? +4. Did I preserve restart recovery for any pending or persisted work? +5. Did I avoid blocking critical scheduler/RPC/executor paths? +6. Did I avoid degrading critical-path latency, throughput, memory use, lock contention, and I/O behavior? +7. If a performance tradeoff was unavoidable, did I explicitly call out why, the expected impact, and mitigation? +8. Did I update the relevant file in `.agents/` if behavior changed? +9. Did I update or create agent documentation for any newly discovered durable behavior, workflow, pitfall, invariant, crate responsibility, validation approach, or stale/missing guidance? diff --git a/.agents/skills/README.md b/.agents/skills/README.md new file mode 100644 index 000000000..3d7376fbe --- /dev/null +++ b/.agents/skills/README.md @@ -0,0 +1,5 @@ +# Agent Skills + +This directory is reserved for executable scripts or concrete capabilities agents can run, such as validation helpers, deployment scripts, migrations, or other tool-like workflows. + +Do not put general policy, reference context, memory, or specifications here; use `../rules/`, `../context/`, `../memory/`, or `../specs/` instead. diff --git a/.agents/skills/mbv-check/SKILL.md b/.agents/skills/mbv-check/SKILL.md new file mode 100644 index 000000000..70b78b20f --- /dev/null +++ b/.agents/skills/mbv-check/SKILL.md @@ -0,0 +1,165 @@ +--- +name: mbv-check +description: Formats, lints, and tests the magicblock-validator Rust workspace using its exact cargo/make commands. Runs nightly rustfmt, workspace clippy, and nextest with optional error fixing. Use for Rust code quality checks and testing in this repository. +--- + +# MagicBlock Validator Check Skill + +Automated formatting, linting, and testing tailored to the `magicblock-validator` +workspace. This repository is **Rust only**; there are no JS/TS fallbacks. 
+ +Before running anything, note that `.agents/rules/testing-and-validation.md` +is the source of truth for validation. This skill encodes the exact commands the +repository uses so you do not have to re-derive them. + +## Overview + +Runs three sequential checks: +1. **Format** — nightly rustfmt with the repo's strict config +2. **Lint** — workspace clippy with warnings denied +3. **Test** — `cargo nextest` across the workspace + +## Usage + +When this skill loads, immediately run format, lint, and test with the default +mode (`fix-lint`). Do not ask the user for input. + +Assume all required tooling is installed and available, including `cargo`, the +nightly toolchain, `cargo fmt`, `cargo clippy`, and `cargo nextest`. Do not check +whether tools are installed and do not hedge about availability. + +All commands run from the workspace root unless explicitly noted (integration +tests run from `test-integration/`). + +### Options (optional, user may specify) + +- **fix-lint**: Fix clippy findings automatically (default mode) +- **fix-tests**: Attempt to fix test failures automatically +- **no-fix**: Don't fix anything, only report + +If no option is specified, use `fix-lint`. + +## Workflow + +### 1. Format + +This repo formats with **nightly** rustfmt and a stricter config +(`rustfmt-nightly.toml`: `imports_granularity = "Crate"`, +`group_imports = "StdExternalCrate"`). Use the repo target: + +```bash +make fmt +# equivalent to: +cargo +nightly fmt -- --config-path rustfmt-nightly.toml +``` + +To only verify without writing (matches CI): + +```bash +make ci-fmt +# equivalent to: +cargo +nightly fmt --check -- --config-path rustfmt-nightly.toml +``` + +Format always runs first regardless of options. + +### 2.
Lint + +```bash +make lint +# equivalent to: +cargo clippy --all-targets -- -D warnings +``` + +Prefer the full-workspace form for broader changes (matches the documented +baseline in `.agents/rules/testing-and-validation.md`): + +```bash +cargo clippy --workspace --all-targets -- -D warnings +``` + +- With `fix-lint` (default): `cargo clippy --fix --allow-dirty --allow-staged --workspace --all-targets`, then re-run the deny-warnings check above to confirm it is clean. +- With `no-fix`: report findings only. + +### 3. Test + +Use `cargo nextest`. The repo's `make test` runs the workspace suite **and** the +integration suite; the integration suite is slow and needs SBF programs/validators, +so for a normal quality gate run the **workspace unit/integration-crate tests only**: + +```bash +cargo nextest run --workspace +``` + +Targeted forms: + +```bash +cargo nextest run -p +cargo nextest run -p --no-capture +``` + +If `nextest` is genuinely unavailable, fall back to `cargo test --workspace`. + +- With `fix-tests`: attempt focused fixes for real failures; do not mask root causes. +- With `no-fix`: report failures only. + +## Choosing what to run (targeted changes) + +For small changes, run the smallest relevant crate test first, then widen. Map +touched areas to crates/suites (see `.agents/context/crate-map.md` and +`.agents/rules/testing-and-validation.md`): + +- Runtime/execution: `magicblock-processor`, account/runtime tests +- RPC: `magicblock-aperture`, `magicblock-api` +- Delegation/cloning: `magicblock-chainlink`, `magicblock-account-cloner` +- Commit/undelegation: `magicblock-committor-service`, schedule-intent tests +- Config: `magicblock-config` +- Task scheduler: `magicblock-task-scheduler` +- Ledger/recovery: `magicblock-ledger` + +## Integration tests (only when relevant / requested) + +Integration tests live under `test-integration/`, have their own workspace, build +SBF programs, and spin up validators. They are NOT part of the default gate. 
Run +them when a change touches an integration-covered path or the user asks. + +```bash +cd test-integration +make test # all integration suites (builds programs first) +make test-chainlink +make test-cloning +make test-committor +make test-magicblock-api +make test-pubsub +make test-schedule-intents +make test-task-scheduler +make test-table-mania +make test-restore-ledger +make test-config +``` + +To isolate a single integration test, start validators in one terminal +(`make setup--{devnet,ephem,both}`), then run the test directly in another +(`cargo nextest run -p --test --no-capture`). Stop the +validators with Ctrl-C afterward. See `.agents/rules/testing-and-validation.md` for the +full suite/setup-target table and details. + +## Notes & guardrails + +- Treat all commands here as ready to run locally; no install/availability checks. +- Format halts nothing; lint and test failures are logged but don't abort the skill. +- This validator is **performance-sensitive and security-critical**. When a change + touches critical RPC, account sync, scheduler/executor, AccountsDb/ledger, + replication, committor, signer/authority, or locking/concurrency paths, also run + the smallest test that exercises that behavior and report any unmeasured + perf/security risk (per `.agents/rules/testing-and-validation.md`). +- Use `fix-tests` carefully; automatic fixes may not address root causes. + +## Reporting + +When finishing, report: +- exact commands run and pass/fail for each, +- anything skipped and why (especially integration suites), +- any performance-sensitive or security-relevant paths touched and how risk was + checked or what residual risk remains, +- whether `.agents/` docs needed updates for any durable discovery. 
diff --git a/.agents/skills/mbv-run-single-integration-test/SKILL.md b/.agents/skills/mbv-run-single-integration-test/SKILL.md new file mode 100644 index 000000000..a5de3c7ec --- /dev/null +++ b/.agents/skills/mbv-run-single-integration-test/SKILL.md @@ -0,0 +1,71 @@ +--- +name: mbv-run-single-integration-test +description: Runs a single magicblock-validator integration test with the correct validator setup. Brings up only the devnet and/or ephemeral validators a suite needs (mirroring CI), then runs one targeted test from the owning crate. Use when you need to run or debug one specific integration test or test function locally. +--- + +# Run Single Integration Test Skill + +Run one integration test in the `magicblock-validator` repo with the correct +validator topology. Integration tests live under `test-integration/`, build SBF +programs, and spin up validators, so a single test still needs the right +validators running first. + +`.agents/rules/testing-and-validation.md` is the source of truth for the suite → +setup-target → test-crate table and the exact commands. **Read its "Isolating +one integration test" and "Suite names and setup targets" sections** rather than +duplicating them here. This skill just drives that workflow. + +Assume all required tooling is installed (`cargo`, `cargo nextest`, `make`); do +not check. + +## Workflow + +1. **Identify the suite** from the target test path, using the table in + `.agents/rules/testing-and-validation.md`. If unsure, confirm against + `test-integration/test-runner/bin/run_tests.rs` (suite name + whether it needs + devnet, ephem, or both). +2. **Build programs** if the `.so` files may be missing/stale: + `cd test-integration && make programs`. +3. **Terminal A — start only the needed validators** and leave running: + `make setup--{devnet,ephem,both}` (sets `SETUP_ONLY`, waits for Ctrl-C; + `make list` shows targets). +4. 
**Terminal B — run the single test** from the owning crate (prefer nextest; + use `cargo test` when you need `--test-threads=1`/`--exact`). See the examples + below and the doc for the exact forms. +5. **Stop validators**: Ctrl-C in terminal A. Do not leave them running. + +## Examples + +`task-scheduler` (devnet only): + +```bash +# Terminal A +cd test-integration && make setup-task-scheduler-devnet +# Terminal B +cd test-integration +cargo test -p test-task-scheduler --test test_schedule_magic_cpi_crank test_crank_can_execute_program_that_cpis_into_magic -- --test-threads=1 --nocapture +``` + +Schedule intents (devnet + ephem): + +```bash +# Terminal A +cd test-integration && make setup-schedule-intents-both +# Terminal B +cd test-integration +cargo test -p test-schedule-intent --test 01_invocations test_schedule_commit_directly_with_single_ix -- --test-threads=1 --nocapture +``` + +## Troubleshooting + +- Missing chain account/PDA after setup → inspect the suite config; a missing + program in the devnet config can fail setup transactions before the asserted + step. For `task-scheduler`, ensure every program the test touches is in + `test-integration/configs/schedule-task.devnet.toml`. +- Builds but setup fails → rerun `make programs` in `test-integration`. + +## Reporting + +Report the suite + setup target used, the exact test command and its pass/fail +result, that validators were stopped, and whether any `.agents/` doc needed +updates. diff --git a/.agents/specs/validator-specification.md b/.agents/specs/validator-specification.md new file mode 100644 index 000000000..a7217ef9a --- /dev/null +++ b/.agents/specs/validator-specification.md @@ -0,0 +1,374 @@ +# MagicBlock Validator Specification Notes + +This document captures specification-level behavior that AI agents should understand before modifying the validator. Treat it as a working spec for fixes, not as a complete formal protocol document.
+ +## Security invariants (highest priority) + +These invariants override all other behavior described in this document. The validator handles real funds; violating any of them can cause the validator or its customers to lose money. Under no circumstances may a change weaken them. + +1. **Signers stay required.** Every signature/authority check that exists today must remain. This includes (non-exhaustively): the MagicContext payer signer on `ScheduleIntentBundle`, the validator-signed `AcceptScheduleCommits`, ephemeral-account create signer (required on create to prevent pubkey squatting), delegation/commit/undelegation authorities, Magic Action escrow authority, and admin/operator entrypoints. Never add an unsigned or weaker-authority path to an operation that is authenticated today. +2. **Local state stays in sync with the base layer.** Account fetching, websocket/gRPC subscriptions, delegation-record resolution, slot/`min_context_slot` and commitment handling, and clone-freshness checks must remain at least as strong and stable as the current implementation. The validator must not serve or execute against stale, forged, or out-of-sync state, and must not miss base-layer updates that change delegation/undelegation truth. +3. **No attacker-triggerable conditions.** All RPC and transaction inputs are untrusted and potentially adversarial. Do not introduce race conditions, time-of-check/time-of-use gaps, ordering/timing attacks, validator stalls/deadlocks/hangs, unbounded resource consumption, or any path where one user's input can corrupt state, bypass validation, or affect another user's funds. Existing atomic lock acquisition, deduplication, slot-matching, and bounded capacity are security controls — preserve them. + +If a change cannot satisfy all three, do not make it; surface the conflict explicitly. The sections below describe specific mechanisms; read them through the lens of these invariants. 
+ +## Terminology + +| Term | Meaning | +|---|---| +| Base layer | Solana mainnet/devnet/local validator where original accounts and programs live. | +| ER | Ephemeral Rollup: a MagicBlock SVM execution runtime operated by an ER validator. | +| Delegation Program | Base-layer program `DELeGGvXpWV2fqJUhqcF5ZSYMS4JTLjteaAMARRSaeSh` that locks delegated account ownership and stores delegation metadata. | +| Magic Program | ER-local program `programs/magicblock` used for commits, undelegations, scheduled tasks, ephemeral accounts, cloning, and validator-only operations. | +| MagicContext | Account used by the Magic Program to stage scheduled base-layer intents. | +| Delegated account | A base-layer state account whose ownership is locked by the Delegation Program and assigned to an ER validator. Locally it is cloned and presented with its original owner. | +| Undelegated account | A normal Solana account not delegated to this ER. It may be cloned for reads but must not be mutated by ordinary ER execution. | +| Ephemeral account | An ER-only account sponsored by a delegated account; it can be created/resized/closed inside the ER. | +| Commit | Synchronize ER state back to the base layer while keeping account delegation active. | +| Commit and undelegate | Synchronize ER state back to the base layer and return account ownership to the original program. | +| Magic Action | A base-layer instruction/action attached to a commit and run after committed state is available. | + +## Delegation specification + +### What delegation means + +Delegation transfers ownership of one or more program PDAs/state accounts to MagicBlock's Delegation Program on the base layer. The account becomes locked on Solana and associated with an ER validator. The original owner is recorded so the ER can execute the original program against the delegated state. + +Important properties: + +- Delegation is performed on the base layer, usually by a program-specific delegation hook. 
+- The ER account is not necessarily created at delegation time. +- Delegated accounts must already exist on Solana before delegation. +- Program accounts are not delegated. Program accounts are cloned when needed. +- A delegation attempt should fail if the account is already delegated. +- Public docs describe delegation configuration as including the ER validator, account lifetime, and synchronization frequency. + +### Validator assignment + +Delegation should identify the ER validator. MagicBlock public docs list development validators by region and show that programs may pass a specific validator account in delegation config. Local development uses validator identity `mAGicPQYBMvcYveUZA5F5UNNwyHvfYh5xkLS2Fr1mev` for `localhost:7799`. + +The router/API can expose delegation status for a single account via `getDelegationStatus`, returning at least: + +- `isDelegated` +- `fqdn` when known +- `delegationRecord.authority` +- `delegationRecord.owner` +- `delegationRecord.delegationSlot` +- `delegationRecord.lamports` + +### Local representation + +When a delegated account is cloned into the ER: + +- The validator fetches account data and delegation metadata from the base layer. +- The account is installed into local AccountsDb. +- The account is presented to the local SVM with its original owner so application programs can use it normally. +- Delegation-related flags/metadata must remain available for access validation and lifecycle handling. + +## Account cloning specification + +### Just-in-time cloning + +The validator clones accounts when transactions or RPC reads need accounts that are not present or are stale locally. + +Triggers include: + +- A transaction submitted to the validator. +- A read request that misses local AccountsDb. +- Remote account update notifications that indicate cached clone state may be stale. 
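
The clone triggers listed above (account missing locally, or local copy possibly stale) can be sketched as a small decision function. This is an illustration only: `LocalClone`, `cloned_at_slot`, `CloneDecision`, and `clone_decision` are hypothetical names that do not come from this repository, and comparing slots to detect staleness is an assumption about how "stale" might be tracked.

```rust
/// Hypothetical bookkeeping for one locally cloned account (illustrative only,
/// not a type from this repository).
#[derive(Debug, Clone, Copy)]
pub struct LocalClone {
    /// Base-layer slot at which the local copy was fetched (assumed field).
    pub cloned_at_slot: u64,
}

/// What a clone request should do for one account.
#[derive(Debug, PartialEq, Eq)]
pub enum CloneDecision {
    UseLocal,
    FetchFromBaseLayer,
}

/// Decide whether the local copy can be used or a base-layer fetch is needed.
/// `last_remote_update_slot` is the slot of the most recent remote update
/// notification seen for the account, if any (assumed input).
pub fn clone_decision(
    local: Option<LocalClone>,
    last_remote_update_slot: Option<u64>,
) -> CloneDecision {
    match (local, last_remote_update_slot) {
        // Not present locally: a transaction or read missed AccountsDb.
        (None, _) => CloneDecision::FetchFromBaseLayer,
        // A remote update arrived after the last clone: the copy may be stale.
        (Some(clone), Some(remote_slot)) if remote_slot > clone.cloned_at_slot => {
            CloneDecision::FetchFromBaseLayer
        }
        // Present and no newer remote update observed.
        (Some(_), _) => CloneDecision::UseLocal,
    }
}
```

The sketch leans toward refetching when in doubt, since serving stale state is the failure the security invariants rule out; the real cloner may track freshness differently.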
+ +### Account flavors + +The account cloner distinguishes these important flavors: + +| Flavor | Meaning | ER behavior | +|---|---|---| +| Fee payer | On-curve system account with no data and not properly delegated. | Can be used to pay transaction costs; cloner may special-case lamports. | +| Undelegated | Non-delegated account with data. | May be cloned/read; should not be written by ordinary ER transactions. | +| Delegated | Account with a valid delegation record. | May be cloned and locally modified if assigned/valid for this ER. | + +### Cloning freshness + +The cloner tracks remote updates and clone outputs. If a remote account has changed since the last clone, the next clone request should fetch a newer base-layer version. Fetching uses base-layer RPC and delegation record lookups; websocket subscriptions track future changes. + +### Large accounts and programs + +The Magic Program API includes validator-only clone instructions: + +- `CloneAccount` for accounts that fit in one transaction. +- `CloneAccountInit` / `CloneAccountContinue` / `CleanupPartialClone` for large accounts. + +Program accounts are cloned/redeployed locally via loader-specific paths. Program accounts are not delegated. + +## Transaction routing and execution specification + +### Routing model + +MagicBlock's product docs specify the routing behavior for ordinary program instructions: + +- If **all writable accounts are delegated**, execute on the ER. + - Newly delegated accounts are cloned from the base layer on first ER use. + - Already cloned delegated accounts are reused. + - Undelegated non-writable accounts may be cloned for reads. +- If **all writable accounts are undelegated**, execute on the base layer. +- If a transaction mixes delegated and undelegated writable accounts, it fails or should not be routed to ER execution. + +Inside this repository, RPC/router-equivalent paths must preserve the same effective invariant even if routing is performed outside the validator. 
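
The routing rules above can be sketched as a pure function over the writable accounts of one transaction. This is a minimal sketch under stated assumptions, not the router's actual code: `Route`, `WritableAccount`, and `route` are hypothetical names, fee-payer special cases are ignored, and the behavior for an empty writable set is not specified by the text above, so the sketch fails closed and rejects it.

```rust
/// Where an ordinary program instruction should run (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    EphemeralRollup,
    BaseLayer,
    Rejected,
}

/// Hypothetical view of one writable account in a transaction.
#[derive(Debug, Clone, Copy)]
pub struct WritableAccount {
    pub delegated: bool,
}

/// Apply the routing model: all delegated goes to the ER, all undelegated goes
/// to the base layer, and a mix is rejected.
pub fn route(writable: &[WritableAccount]) -> Route {
    // Assumption: an empty writable set is outside the documented rules,
    // so this sketch refuses to route it.
    if writable.is_empty() {
        return Route::Rejected;
    }
    let delegated = writable.iter().filter(|a| a.delegated).count();
    if delegated == writable.len() {
        Route::EphemeralRollup
    } else if delegated == 0 {
        Route::BaseLayer
    } else {
        // Mixed delegated and undelegated writable accounts.
        Route::Rejected
    }
}
```

Read-only accounts do not appear in the input at all, which mirrors the rule that undelegated non-writable accounts may still be cloned for reads.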
+ +### Scheduler and locking + +The transaction scheduler runs on a dedicated OS thread and uses a pool of executor workers. This is a critical performance path: preserve low-latency scheduling, bounded lock contention, and parallel execution for unrelated accounts. Do not add blocking I/O, unbounded work, excessive logging, or avoidable allocation/serialization to scheduler or executor hot paths unless there is no viable alternative, and explicitly document any unavoidable tradeoff. + +This path is also security-critical. Atomic, all-or-nothing account lock acquisition (step 3 below) is what prevents concurrent transactions from racing on the same accounts; it must never be relaxed into partial or non-atomic locking. Because untrusted clients drive this loop, also guard against attacker-triggerable stalls and resource exhaustion: keep work bounded, never let one transaction block the scheduler indefinitely, and never let lock release/queueing leave the scheduler deadlocked. + +Required behavior: + +1. Receive a processable transaction. +2. Select an available executor. +3. Acquire all account locks atomically. +4. If any lock conflicts, release partial locks and queue the transaction behind the blocking executor. +5. If locks succeed, execute on the assigned executor. +6. On completion, commit account changes, write ledger/status data, emit events, and mark the executor ready. + +Account locks are bitmask-based: + +- One `u64` per account. +- MSB represents write lock. +- Low bits represent reader executor IDs. +- This implies a hard cap of 63 executor IDs. + +### SVM access validation + +The forked SVM includes MagicBlock-specific access validation after execution: + +> Writable accounts must be delegated, ephemeral, or confined, except for explicitly allowed cases such as fee payers, Magic Program instruction allowlists, and special post-delegation action executor patterns. 
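
The invariant quoted above can be read as a predicate over each writable account. The sketch below is illustrative only and is not the forked SVM's implementation: `AccountFlags`, `WriteException`, and `writable_access_allowed` are hypothetical names, and the sketch assumes the caller has already decided whether one of the explicitly allowed exceptions genuinely applies, which is where most of the real validation work would live.

```rust
/// Hypothetical per-account flags relevant to access validation.
#[derive(Debug, Clone, Copy, Default)]
pub struct AccountFlags {
    pub delegated: bool,
    pub ephemeral: bool,
    pub confined: bool,
}

/// Explicitly allowed exceptions named in the invariant (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WriteException {
    None,
    FeePayer,
    MagicProgramAllowlist,
    PostDelegationActionExecutor,
}

/// Returns true when a write to the account is consistent with the invariant:
/// the account is delegated, ephemeral, or confined, or an explicit exception
/// was established by the caller.
pub fn writable_access_allowed(flags: AccountFlags, exception: WriteException) -> bool {
    let allowed_by_flags = flags.delegated || flags.ephemeral || flags.confined;
    allowed_by_flags || exception != WriteException::None
}
```

Keeping the exception as an explicit enum value rather than a boolean makes every bypass visible at the call site, which fits the requirement that no write path be allowed implicitly.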
+ +Any change touching account flags, account loading, SVM commit/rollback, or transaction sanitization must preserve this invariant. This is a security boundary: weakening it would let ordinary (untrusted) ER transactions mutate state they must not touch, which can lose funds. Do not relax it for performance or convenience. + +#### Privileged accounts + +Accounts carry a `privileged` flag (defined in the forked `solana-account`, accessed via `privileged()` / `set_privileged()`). The validator marks exactly one account privileged: the **validator identity (authority) account**, set in `init_validator_identity` (`magicblock-api/src/fund_account.rs`), called once during startup in `magic_validator.rs`. No other account is ever flagged privileged in this repo. + +In the executor's commit-to-local-state path (`magicblock-processor/src/executor/processing.rs`) the flag grants two bypasses: + +- **Persistence bypass** — privileged accounts are always written back to AccountsDb, even when not dirty (normal accounts persist only if dirty). +- **Integrity-check bypass** — when the fee payer is privileged, `verify_account_states` returns early, skipping the confined-account integrity checks that otherwise apply. + +This is why the validator identity must remain privileged: validator-internal/system transactions (e.g. funding, identity operations) bypass the access-validation checks that constrain untrusted ER transactions. Do not flag additional accounts privileged, and do not remove the validator identity's privilege. (Distinct from the "privileged instruction" concept in `programs/magicblock/src/schedule_task/mod.rs`, which is about instructions disallowed inside cranks, not the account flag.) + +### Sysvars + +The validator supports a subset of Solana sysvars. 
Current documented support includes: + +- `clock` +- `epoch_schedule` +- `fees` with currently imperfect fee values +- `recent_blockhashes` with currently imperfect fee values +- `rent` +- `last_restart_slot` set to `0` when enabled by feature set + +Other sysvars may be stubbed or unsupported depending on whether they make sense for the ER runtime. + +## Commit specification + +### What commit means + +Commit synchronizes account state from the ER back to the base layer while leaving the account delegated. After finalization, the PDA remains locked on the base layer under the Delegation Program. + +A commit is scheduled from ER execution, not performed as a direct synchronous base-layer write by the user instruction. + +### User/program entrypoint + +Programs schedule commits by invoking the Magic Program from ER instructions. Current SDK examples use `MagicIntentBundleBuilder`: + +- `.commit(&[account_infos])` to commit accounts. +- `.commit_and_undelegate(&[account_infos])` to commit and undelegate. +- `.add_post_commit_actions([...])` to attach Magic Actions. +- `.build_and_invoke()` to invoke the Magic Program. + +For Anchor accounts, programs must ensure modified account state is serialized before the commit CPI sees it; examples call `counter.exit(&crate::ID)?` before scheduling commit after mutation. + +### Magic Program scheduling instructions + +The Magic Program API defines these relevant scheduling instructions: + +- `ScheduleCommit` +- `ScheduleCommitAndUndelegate` +- `ScheduleBaseIntent(MagicBaseIntentArgs)` +- `ScheduleIntentBundle(MagicIntentBundleArgs)` +- `AcceptScheduleCommits` +- `ScheduledCommitSent((intent_id, bump))` + +The older single-purpose `ScheduleCommit` and `ScheduleCommitAndUndelegate` paths are described as two-stage scheduling: + +1. User/program instruction stages the intent in MagicContext. +2. 
A validator-signed `AcceptScheduleCommits` moves staged commits into the global scheduled commits map at the start of a slot so the validator can realize them immediately after. + +`ScheduleIntentBundle` is the recommended bundled path for multiple independent intents with shared account overhead. It stores a scheduled intent bundle in MagicContext and logs the precomputed `ScheduledCommitSent` signature. + +### MagicContext and intent IDs + +When scheduling an intent bundle, the Magic Program: + +1. Verifies the MagicContext account. +2. Verifies the payer signer. +3. Deserializes MagicContext. +4. Allocates the next intent ID. +5. Constructs the requested commit/action/undelegation bundle. +6. Applies cross-intent validation. +7. Charges delegated payer/fee vault or checks commit limits. +8. Adds the scheduled action to MagicContext. +9. Writes MagicContext back to account data. + +### Intent validation + +Bundle validation must reject invalid bundles, including: + +- Empty intent bundles. +- Duplicate committed account pubkeys across the whole bundle. +- Cross references where the same account is both committed and commit-and-undelegated in incompatible fields. + +### Commit fees and limits + +The Magic Program code documents: + +- `ACTUAL_COMMIT_LIMIT = 25` for commits covered by user DLP PDAs. +- `COMMIT_FEE_LAMPORTS = 100_000` fixed fee per commit, matching Delegation Program constants. +- Base actions have compute-unit price handling. + +Changes to commit construction or scheduling should account for fee charging, fee vault behavior, and commit limits. + +## Undelegation specification + +### What undelegation means + +Undelegation commits the latest ER state and returns account ownership from the Delegation Program to the original program on the base layer. + +Product docs describe the flow as: + +- **ER**: schedule commit for account(s); mark accounts as undelegating/owned by Delegation Program locally. 
+- **Base layer**: CPI callback gets/finalizes/recreates or updates the account from ER state and restores original program ownership. + +### Local immutability after scheduling + +When `ScheduleIntentBundle` includes undelegation, the Magic Program marks each undelegated account locally via `mark_account_as_undelegated`. Code comments state: + +> Once account is undelegated we need to make it immutable in our validator. + +This is a critical lifecycle invariant. Do not allow normal ER transactions to keep mutating an account after commit-and-undelegate has been scheduled. + +### Callback discriminator + +The public docs specify an undelegation callback discriminator: + +```text +[196, 28, 41, 206, 48, 37, 51, 167] +``` + +Programs must include the corresponding instruction processor, or use MagicBlock SDK macros that inject it. The callback is triggered by the Delegation Program on the base layer to revert account ownership after ER undelegation. + +## Committor service specification + +### Responsibility + +The committor service realizes scheduled base-layer intents. Its inputs are scheduled commits/intents; its outputs are Solana base-layer transactions and confirmation results. + +### Pipeline + +The documented commit pipeline is: + +1. Magic Program schedules intents in MagicContext. +2. `magicblock-accounts` / scheduled commit processing picks up intents each slot. +3. `magicblock-committor-service` schedules and executes intents. +4. Task building creates atomic base-layer tasks such as commit, undelegate, finalize, and action. +5. Task strategy packs tasks into valid transactions. +6. Delivery preparation handles address lookup tables and commit buffers. +7. RPC client sends and confirms transactions. +8. SQLite persister preserves state across restarts. + +### Task strategy + +Commit tasks may be represented in different forms: + +- `ArgsTask`: commit data passed as instruction args. 
+- `BufferTask`: commit data uploaded through buffers, used for large changesets. + +The strategist may convert args tasks into buffer tasks when needed to fit transaction constraints. + +### Parallelism and blocking + +The committor service uses scheduling because intents can block one another. It can run multiple intent executors in parallel, but scheduling must respect conflicts/dependencies between intents. Preserve this parallelism and avoid avoidable throughput regressions in commit preparation, task packing, buffer upload, ALT handling, send/confirm loops, and persistence; call out any unavoidable performance tradeoff explicitly. + +## Magic Actions specification + +Magic Actions attach base-layer call instructions that run automatically after an ER commit. They allow committed state to drive base-layer workflows. + +Properties: + +- Actions are scheduled with a commit or as standalone base actions. +- Actions specify destination program, short account metas, args, escrow authority, and compute units. +- Secure action handling should identify the source program where applicable. +- Actions run on the base layer after the relevant committed state is available. + +## Ephemeral account specification + +Ephemeral accounts are ER-only accounts. + +Properties from public docs and Magic Program API: + +- Created, resized, and closed in the ER through Magic Program instructions. +- Owned by the calling program. +- Funded by a sponsor account that must be delegated. +- Rent is currently specified as `32 lamports/byte` with `60` bytes overhead. +- Sponsor pays additional rent on grow and receives refunds on shrink/close. +- Ephemeral account signer is required on create to prevent pubkey squatting, but not for resize/close. 
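The rent figures in the list above can be turned into a small worked calculation. This is a sketch under one stated assumption: the `60` bytes of overhead are added to the data length before multiplying by `32` lamports/byte. The Magic Program may compute rent differently. The constant and function names are hypothetical and chosen for this example only.

```rust
/// From the spec text: 32 lamports per byte.
const RENT_LAMPORTS_PER_BYTE: u64 = 32;
/// From the spec text: 60 bytes of overhead per account.
const ACCOUNT_OVERHEAD_BYTES: u64 = 60;

/// Assumed formula: (data_len + overhead) * lamports_per_byte.
/// Returns `None` on arithmetic overflow instead of wrapping.
fn ephemeral_rent(data_len: u64) -> Option<u64> {
    data_len
        .checked_add(ACCOUNT_OVERHEAD_BYTES)?
        .checked_mul(RENT_LAMPORTS_PER_BYTE)
}

/// Signed change in rent when resizing an ephemeral account.
/// Positive: the sponsor pays the difference (grow).
/// Negative: the sponsor is refunded the difference (shrink).
fn resize_rent_delta(old_len: u64, new_len: u64) -> Option<i128> {
    let old_rent = ephemeral_rent(old_len)? as i128;
    let new_rent = ephemeral_rent(new_len)? as i128;
    Some(new_rent - old_rent)
}
```

Under that assumption, a 100-byte account costs `(100 + 60) * 32 = 5120` lamports, growing it to 200 bytes charges the sponsor another `3200`, and shrinking back refunds the same amount. Checked arithmetic is used because `data_len` is caller-supplied, so an oversized value should be rejected rather than wrap.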
+ +Magic Program instructions: + +- `CreateEphemeralAccount { data_len }` +- `ResizeEphemeralAccount { new_data_len }` +- `CloseEphemeralAccount` + +## RPC and router specification + +The MagicBlock Router API implements most standard Solana JSON-RPC methods and adds MagicBlock-specific methods. + +Important ER/router methods include: + +- `getDelegationStatus` +- `getBlockhashForAccounts` +- `getRoutes` +- router-aware `getAccountInfo` and signature status methods + +The validator RPC layer should remain aligned with Solana JSON-RPC expectations where it implements standard methods, while supporting MagicBlock-specific delegation and local-clone behavior. RPC changes must preserve low-latency request handling and avoid unnecessary blocking, clone/fetch amplification, or heavy per-request work on hot paths. + +## Startup, shutdown, and recovery specification + +### Startup + +The documented validator startup sequence is: + +1. Open ledger. +2. Sync keypair. +3. Open AccountsDb. +4. Connect replication broker and optionally fetch snapshot. +5. Initialize committor service. +6. Initialize chainlink. +7. Initialize genesis accounts. +8. Initialize metrics. +9. Load programs. +10. Spawn transaction scheduler in `StartingUp` mode. +11. Spawn RPC thread. +12. Initialize task scheduler. +13. On `start()`: optionally replay ledger, defragment AccountsDb, reset stale accounts, recover pending commit intents. +14. Switch scheduler to `Primary` or `Replica` mode. + +### Shutdown + +Shutdown ordering matters: + +1. Trigger cancellation tokens. +2. Stop services while preserving in-flight intent safety. +3. Stop committor service last among services where needed for in-flight intents. +4. Join threads. +5. Flush AccountsDb and ledger. 
diff --git a/.codex/skills/run-targeted-integration-test/SKILL.md b/.codex/skills/run-targeted-integration-test/SKILL.md deleted file mode 100644 index 4011297ea..000000000 --- a/.codex/skills/run-targeted-integration-test/SKILL.md +++ /dev/null @@ -1,156 +0,0 @@ ---- -name: run-targeted-integration-test -description: Run a single integration test in the magicblock-validator repo with the correct validator setup. Use when Codex needs to run one specific integration test or test function locally, bring up only the devnet and/or ephemeral validators required for a suite, mirror the suite setup used by CI, or determine which config files and commands match a target test path. ---- - -# Run Targeted Integration Test - -## Quick Start - -Prefer the repo's `test-runner` setup-only mode over hand-built validator commands. It uses the same suite-to-config mapping as CI and avoids guessing which `.toml` files belong to the target test. - -Use this workflow: - -1. Identify the suite from the target test path. -2. Read `test-integration/test-runner/bin/run_tests.rs` to find the suite's devnet and ephem configs. -3. Build programs if the required `.so` files may be missing or stale: - -```bash -cd test-integration -make programs -``` - -4. Start only the validators the suite needs: - -```bash -cd test-integration -env RUN_TESTS= SETUP_ONLY=devnet cargo run --package test-runner --bin run-tests -``` - -```bash -cd test-integration -env RUN_TESTS= SETUP_ONLY=ephem cargo run --package test-runner --bin run-tests -``` - -```bash -cd test-integration -env RUN_TESTS= SETUP_ONLY=both cargo run --package test-runner --bin run-tests -``` - -5. In another shell, run only the target test from the owning crate: - -```bash -cargo test --test --profile test -- --test-threads=1 --nocapture -``` - -6. Stop the validator processes with `Ctrl-C` after the test finishes. - -Always use `--test-threads=1` for targeted integration runs unless there is a clear reason not to. 
- -## Source Of Truth - -Use `test-integration/test-runner/bin/run_tests.rs` as the source of truth for: - -- The suite name to pass in `RUN_TESTS` -- Whether the suite needs devnet only, ephem only, or both -- Which config files in `test-integration/configs/` the suite uses - -Read `test-integration/test-runner/src/env_config.rs` when you need to confirm `RUN_TESTS` and `SETUP_ONLY` behavior. - -`SETUP_ONLY` accepts: - -- `devnet` -- `ephem` -- `both` - -## Common Mappings - -Use these known mappings first. - -### `task-scheduler` - -Path: - -- `test-integration/test-task-scheduler` - -Topology: - -- devnet only - -Config: - -- `test-integration/configs/schedule-task.devnet.toml` - -Setup command: - -```bash -cd test-integration -env RUN_TESTS=task-scheduler SETUP_ONLY=devnet cargo run --package test-runner --bin run-tests -``` - -Concrete example: - -```bash -cd test-integration/test-task-scheduler -cargo test --test test_schedule_magic_cpi_crank test_crank_can_execute_program_that_cpis_into_magic --profile test -- --test-threads=1 --nocapture -``` - -Do not start an ephem validator for this suite. - -### `schedulecommit` - -Paths: - -- `test-integration/schedulecommit/test-security` -- `test-integration/schedulecommit/test-scenarios` - -Topology: - -- devnet plus ephem - -Configs: - -- devnet: `test-integration/configs/schedulecommit-conf.devnet.toml` -- ephem: `test-integration/configs/schedulecommit-conf-fees.ephem.toml` - -Setup command: - -```bash -cd test-integration -env RUN_TESTS=schedulecommit SETUP_ONLY=both cargo run --package test-runner --bin run-tests -``` - -Concrete example: - -```bash -cd test-integration/schedulecommit/test-security -cargo test --test 01_invocations test_schedule_commit_directly_with_single_ix --profile test -- --test-threads=1 --nocapture -``` - -## Manual Fallback - -Use manual validator startup only when the user explicitly asks for raw config-based setup instead of `test-runner`. - -When doing that: - -1. 
Pick the matching `.devnet.toml` and `.ephem.toml` files from `test-integration/configs/`. -2. Start the chain validator with the repo's prewired script or equivalent `solana-test-validator` command. -3. Start the ephemeral validator with: - -```bash -cargo run -- -``` - -4. Run the targeted `cargo test` command from the owning crate. - -Prefer `test-runner` unless there is a specific reason to bypass it. - -## Troubleshooting - -If the test fails with a missing chain account or missing PDA after setup transactions, inspect the suite config first. A missing program in the devnet config can cause setup transactions to fail earlier than the observed assertion. - -For `task-scheduler`, remember that the suite config must load every program the test touches on chain. If a new targeted test uses another program, update `test-integration/configs/schedule-task.devnet.toml` rather than guessing from a generic validator script. - -If the test binary builds but the validator setup fails, rebuild the artifacts with `make programs` in `test-integration`. - -If you are unsure which suite owns the test, derive it from the directory and then confirm it in `run_tests.rs` before starting validators. diff --git a/.codex/skills/run-targeted-integration-test/agents/openai.yaml b/.codex/skills/run-targeted-integration-test/agents/openai.yaml deleted file mode 100644 index 42254be35..000000000 --- a/.codex/skills/run-targeted-integration-test/agents/openai.yaml +++ /dev/null @@ -1,4 +0,0 @@ -interface: - display_name: "Run Targeted Integration Test" - short_description: "Run one repo integration test with the right validator setup." - default_prompt: "Run a specific integration test in this repo with the full validator setup and verify the result." 
diff --git a/.gitignore b/.gitignore index d4b683a07..3c84060fe 100644 --- a/.gitignore +++ b/.gitignore @@ -35,7 +35,6 @@ _integration_test_bins/ # AI related **/CLAUDE.md .claude/ -AGENTS.md # Local configs CODEBASE_MAP.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 000000000..b55c6d4ee --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,60 @@ +# Agent Guide + +This repository contains AI-agent guidance in `./.agents/`. These files describe the validator's intended behavior, goals, protocol-level expectations, architecture, crate ownership, validation workflow, and documentation-memory rules. + +## Required acknowledgement + +At the start of any task that may change code, behavior, tests, documentation, configuration, or architecture, the agent **must first read the index at `./.agents/README.md` and explicitly say so** before proceeding. Do not pre-read every document — the index is a routing map that tells you which document to open for a given concern. + +Use wording like: + +> I read the agent index in `./.agents/README.md` and will open the relevant detailed docs as needed. + +If the task is only a trivial file operation and no `./.agents/` file is relevant, say that explicitly. + +## Announce available `mbv-*` skills + +At startup, before doing task work, the agent **must make the user aware of every `mbv-*` skill** — these are the skills customized for the magicblock-validator. List each by name with its one-line description; do **not** read the skill bodies. Discover them from the skill set available in the environment and/or by listing `.agents/skills/mbv-*/SKILL.md`. + +Currently available: + +- `mbv-check` — formats, lints, and tests the magicblock-validator Rust workspace (nightly rustfmt, workspace clippy, nextest) with optional error fixing. +- `mbv-run-single-integration-test` — runs a single integration test with the correct validator setup (brings up only the devnet and/or ephem validators the suite needs, then runs one targeted test). 
+ +Only load a skill (read its `SKILL.md`) when the task actually calls for it. + +## Directory layout + +- `.agents/rules/` — invariant behavioral and decision-making rules agents must follow. +- `.agents/context/` — static reference context, including overview, architecture, crate map, and crate-specific guides. +- `.agents/memory/` — durable project-memory and documentation-stewardship rules. +- `.agents/specs/` — active protocol/specification notes. +- `.agents/skills/` — executable scripts or capabilities agents can run, when present. +- `.agents/personas/` — specialized agent profiles when this repository needs them. + +## Start here + +Read `.agents/README.md` first. It is a compact index whose routing table maps each concern (goals, protocol, architecture, crate ownership, validation, memory, per-crate guides) to the single document that covers it. **Open a detailed document only when your task touches that concern** — this keeps the context window small. + +Common routes (see the index for the full table): + +- behavior/protocol change → `.agents/specs/validator-specification.md` +- is this change aligned? → `.agents/rules/validator-goals.md` +- service wiring/interactions → `.agents/context/architecture.md` +- which crate owns this? → `.agents/context/crate-map.md`, then `.agents/context/crates/.md` +- how to validate → `.agents/rules/testing-and-validation.md` +- captured durable knowledge → `.agents/memory/agent-memory-and-docs.md` + +Before changing code, consult the matching `./.agents` material so the change does not violate the validator's goals, invariants, performance requirements, or specification. This acknowledgement is required; do not proceed silently. + +The validator is performance-sensitive infrastructure. Changes must not degrade critical-path performance unless there is no viable alternative; if a tradeoff is unavoidable, call it out explicitly with the reason, expected impact, and any mitigation. 
+ +When a feature is added, removed, or changed, the relevant file in `./.agents/` **MUST be updated** to match the current implementation. These files cannot go out of sync with reality; if they do, they lose their usefulness for future agents and maintainers. + +When an agent discovers durable repository knowledge that is missing, incomplete, inaccurate, or stale in `./.agents/`—including feature behavior, protocol details, crate responsibilities, validation/debugging workflows, pitfalls, or performance constraints—the agent **MUST** update the most relevant existing document or create a focused new document if none exists. If documentation cannot be updated, the agent must report the blocked follow-up explicitly. + +**This obligation is not limited to code-changing tasks.** It applies equally to read-only and question-answering tasks: if you investigate the code to answer a question and learn a durable fact the docs lack or get wrong—especially a divergence from agave/Solana upstream behavior (a missing limit, different default, relaxed validation)—capture it before finishing. In every task's final reply, state whether agent docs were updated, or why no update was needed. + +**A fact being documented elsewhere does not excuse skipping the update.** The fact already existing in the source code, a code comment, an external repo, or even a *different* `./.agents/` file does not satisfy this obligation. The bar is: would an agent who opens only the single most relevant `./.agents/` document for that concern find the fact there? If not, add it to that document—even if a related or partial mention already lives somewhere else. Each `./.agents/` document must stand on its own for the area it covers. + +If anything is added to, removed from, renamed, or reorganized inside `./.agents/`, update this `AGENTS.md` file in the same change so this entrypoint remains accurate.