Dash mainnet readiness: embedded GBT + Phase C complete + interop fixes#43
Open
Dash mainnet readiness: embedded GBT + Phase C complete + interop fixes#43
Conversation
New files (re-applied from multipow branch onto current master): - core/pow.hpp: PowFunc/BlockHashFunc/SubsidyFunc type aliases + pow::scrypt() and pow::sha256d() implementations - core/coin_params.hpp: CoinParams struct — the p2pool "net" equivalent carrying all coin+pool parameters through the stack - impl/ltc/params.hpp: ltc::make_coin_params(testnet) factory populating CoinParams with all LTC constants
All functions now take const core::CoinParams& params: - share_init_verify, generate_share_transaction, share_check, verify_share - create_local_share, create_local_share_v35, verify_merged_coinbase_commitment - compute_gentx_before_refhash, compute_ref_hash_for_work, pubkey_hash_to_address - Hardcoded scrypt → params.pow_func() (3 call sites) - All PoolConfig:: statics → params.field (40+ replacements)
…oinParams - share_tracker.hpp: Add m_params member, replace 34 PoolConfig:: refs - node.hpp: Add m_coin_params + coin_params() getter, wire m_tracker.m_params - node.cpp: Replace all PoolConfig:: with m_tracker.m_params-> - c2pool_refactored.cpp: Pass coin_params() to compute_ref_hash_for_work, create_local_share, pubkey_hash_to_address - .gitignore: exclude build-qt/
New src/impl/bitcoin_family/ library with coin-agnostic types: - coin/base_block.hpp: SmallBlockHeaderType + BlockHeaderType (generic 80-byte Bitcoin header, no MWEB). LTC's BlockType extends with MWEB. - coin/softfork_check.hpp: generic softfork JSON parser - coin/txidcache.hpp: generic thread-safe tx cache LTC block.hpp now uses `using bitcoin_family::coin::SmallBlockHeaderType` and `using bitcoin_family::coin::BlockHeaderType`, extends BlockType with MWEB (m_mweb_raw, HogEx serialization). LTC softfork_check.hpp and txidcache.hpp are forwarding headers. Header-only INTERFACE library — no .cpp files yet.
…tion.hpp TxParams, TxPrevOut, TxIn, TxOut moved to bitcoin_family::coin. LTC transaction.hpp imports them via using declarations, keeps Transaction/MutableTransaction with MWEB HogEx flag (m_hogEx, flag 0x08).
bitcoin_family/coin/base_p2p_messages.hpp: 22 generic Bitcoin wire protocol messages (version, verack, ping, pong, alert, inventory_type, inv, getdata, getblocks, getheaders, getaddr, addr, reject, sendheaders, notfound, feefilter, mempool, sendcmpct, wtxidrelay, sendaddrv2, btc_addr_record_t). Messages referencing coin-specific types (block, tx, headers, compact blocks) remain in ltc/coin/p2p_messages.hpp — they use LTC's MWEB-aware BlockType and MutableTransaction. bitcoin_family/coin/chain_params.hpp: Generic ChainParams struct for header validation: target_timespan, target_spacing, pow_limit, genesis_hash, halving_interval, pow_func. Includes generic calculate_next_work_required() (Bitcoin/LTC algorithm). Dash can override with DigiShield; DOGE with its own schedule. ltc/coin/p2p_messages.hpp refactored: Imports 22 generic messages via using declarations, defines only coin-specific messages (tx, block, headers, cmpctblock, getblocktxn, blocktxn) that reference LTC types.
X11 hash algorithm (11 sph functions pipeline):
- Pure C implementations from dashcore v0.16.1.1 (MIT license)
- 11 .c files + 13 .h files in impl/dash/crypto/x11/
- C++ wrapper: dash::crypto::hash_x11() in crypto/hash_x11.hpp
- Builds as dash_x11 static library
Dash CoinParams (impl/dash/params.hpp):
- X11 PoW, SHA256d block identity
- share_period=20, chain_length=4320, spread=10, protocol v1700
- address_version=76 ('X'), no segwit, no bech32
- p2pool port 8999, stratum 7903, bootstrap rov.p2p-spb.xyz
- identifier=7242ef345e1bed6b, prefix=3b3e1286f446b891
Share v16 (impl/dash/share.hpp + share_types.hpp): - DashShare struct with all v16 fields from p2pool-dash data.py - PackedPayment: masternode/superblock/platform payment entries - HashLinkType, MerkleLink, StaleInfo Dash block (impl/dash/coin/block.hpp): - Uses bitcoin_family SmallBlockHeaderType + BlockHeaderType - Simple BlockType without MWEB (standard Bitcoin block) Dash transaction (impl/dash/coin/transaction.hpp): - Uses bitcoin_family TxPrevOut, TxIn, TxOut - Adds DIP3/DIP4 CBTX support: type field + extra_payload - No segwit, no MWEB — version|type<<16 serialization
share_check.hpp: - share_init_verify() for v16: X11 PoW, hash_link, merkle_link - check_hash_link(), check_merkle_link() (same algorithm as LTC) - compute_gentx_before_refhash() for Dash donation script - Full ref_hash computation with v16 share_info serialization config.hpp: DashPoolConfig + DashCoinConfig + combined Config typedef messages.hpp: Dash p2pool P2P messages (protocol v1700) — same wire format as LTC but different identifier/prefix peer.hpp: Dash peer data structure All compile clean as header-only — no link errors.
VALIDATED: connects to rov.p2p-spb.xyz:8999, receives correct prefix 3b3e1286f446b891. X11 self-test passed. CoinParams functional.
CRITICAL FIX: params.hpp donation script was P2PK (forrestv's LTC key). Corrected to P2PKH: 76a91420cb5c22b1e4d5947e5c112c7696b51ad9af3c61 (XdgF55wEHBRWwbuBniNYH4GvvaoYMgL84u — Dash p2pool donation address) Added generate_share_transaction() with Dash v16 PPLNS formula: - 49/50 (98%) to PPLNS-weighted workers (linear weights, NOT decay) - 1/50 (2%) finder fee to block creator - Remainder (rounding + donation weight) to donation script - Masternode/superblock/platform payments subtracted from worker_payout BEFORE PPLNS distribution (they're not part of pool rewards) - DIP3/DIP4 CBTX support: version=3, type=5, extra_payload Weight formula: att * (65535 - donation_field) per share (16-bit field) Coinbase output order: [workers sorted] [payments] [donation] [OP_RETURN]
Received 1 v16 share (1164 bytes) from rov.p2p-spb.xyz:8999. Proper wire protocol: framing, version handshake, share messages.
CRITICAL: HeaderChain no longer hardcodes scrypt for PoW validation. The PoW function is now injected via bitcoin_family::coin::ChainParams.pow_func. This enables Dash (X11), BTC (SHA256d), or any coin's embedded SPV node. Changes: - bitcoin_family/coin/chain_params.hpp: Add block_hash_func, Checkpoint, std::optional<Checkpoint> for fast-sync - header_chain.hpp: LTCChainParams is now alias for ChainParams. Factory functions make_ltc_chain_params_mainnet/testnet() inject scrypt. scrypt_hash(header) → pow_hash(header, m_params.pow_func) at both call sites. Legacy scrypt_hash() alias kept for backward compat. - c2pool_refactored.cpp: Use new factory functions. Dash can now create HeaderChain with X11 pow_func — no code duplication.
Generic overload takes ChainParams (halving_interval + initial_subsidy). Works for LTC (840k blocks), BTC (210k blocks), DOGE/Dash (no halving). Legacy LTC-specific overload kept for backward compatibility.
…shNodeImpl) New files: - share_chain.hpp: DashShareType variant, DashShareIndex, DashShareChain - share_tracker.hpp: DashShareTracker with CoinParams, attempt_verify, PPLNS - node.hpp: DashNodeImpl extending BaseNode<DashConfig, DashShareChain, DashPeer> with protocol v1700 handshake, message dispatch, share reception Updated: - share.hpp: DashShare now inherits BaseShare<uint256, 16> (required by ShareVariants) - config.hpp: CoinConfig inherits Fileconfig (required by core::Config) - main_dash.cpp: Uses DashNodeImpl with proper Config, prefix verification Status: BaseNode connects to peer, prefix matches (3b3e1286f446b891), version sent. Socket prefix scanner not triggering handle() — needs investigation of Socket read pipeline vs p2pool message framing.
test_dash_p2p.py: Minimal Python3 p2pool wire protocol test server p2pool-dash debug branch with logging (banning disabled) FINDING: Local p2pool-dash on .191:18999 responds to raw TCP with 166-byte version message (prefix 3b3e1286f446b891). c2pool's Socket::read_prefix() async_read never completes despite data being available. This is a core::Socket lifetime/ASIO issue, not Dash-specific. Raw TCP works, ASIO async_read doesn't. Dashd running on 192.168.86.24:9999 (mainnet, block 2456020, fully synced). RPC: dashrpc_test/testpassword123 on port 9998 (LAN accessible).
…IVED Bug 1: rmsg->m_command == "version" never matched because wire command is 12-byte null-padded. Fixed with .compare(0, 7, "version") prefix match. Bug 2: handle() didn't restart peer timeout on each message, causing NEW_PEER_TIMEOUT (10s) to always fire. Added peer->m_timeout->restart(). Bonus: socket.hpp init() split endpoint error check. RESULT: c2pool-dash connects to rov.p2p-spb.xyz:8999, completes v1700 handshake (subver=dash-v1.0.6-1-g07aa58e-dirty), receives real v16 shares (1032 bytes, 1138 bytes). Full BaseNode pipeline working.
…network Full share receive+verify pipeline working against rov.p2p-spb.xyz:8999: - DashFormatter Read/Write: complete v16 wire deserialization (all fields) - share_init_verify: hash_link + merkle_link → header → X11 PoW check - process_shares: deserialize → verify → add to ShareTracker - Fix testnet prefix/identifier from p2pool-dash source - Fix ref_stream: add VarInt count for transaction_hash_refs (ListType mul=2) - Fix hash_link_data: append outer coinbase_payload (data.py line 342-348) - Fix PackStream rvalue binding in ShareType::load()
- ShareReplyData struct + share_getter_t using ReplyMatcher pattern - download_shares(): recursive chain walker (p2pool node.py:108-141) - Random peer selection, random parents 0-499 - Stops from verified chain heads - Failure tracking with MAX_EMPTY_RETRIES - handle_sharereq/handle_sharereply: message dispatch + async response - handle_get_shares: walk chain collecting shares up to parents count - Trigger download from handle_version when peer has unknown best_share - Fix: add peer to m_peers after stable() (was missing from BaseNode) - Fix: store best_share on peer, trigger download after handshake complete - Deduplicate shares in process_shares and download callback
Dash uses X11 for BOTH POW_FUNC and BLOCKHASH_FUNC (unlike LTC which uses scrypt for PoW but SHA256d for block identity). share_init_verify was using SHA256d for the share hash, causing all shares to have wrong hashes and preventing chain linking (8900 shares = 8900 disconnected heads). Fix: share_hash = params.pow_func(header) = X11(header) Result: 8903 shares downloaded in single chain (heads=1) in ~8 seconds
- decode_payee_script(): handles "!" prefix (raw hex script) and regular
base58 addresses (P2PKH/P2SH) for masternode/superblock/platform payments
- generate_share_transaction: properly writes payment outputs with decoded scripts
- Platform OP_RETURN payments ("!6a28...") now correctly decoded to raw script bytes
- Reference: p2pool-dash data.py lines 189-217
- Add Blockchain::DASH to enum in address_validator.hpp - Add Dash case to rest_web_currency_info() and rest_node_info() endpoints - Add initialize_dash_configs() with P2PKH v76 (X), P2SH v16 (7), testnet v140/v19 - Block explorer URLs: blockchair.com/dash/
Backend (web_server.cpp): - block_value_miner = (subsidy - payment_amount) * (1 - fee) for Dash - block_value_payments = masternode + superblock + platform payments - payment_amount extracted from GBT template (dashd provides it) - node_fee key is blockchain-aware (node_fee_dash, node_fee_ltc) Frontend (dashboard.html): - Show MN/Gov payment split when block_value_payments > 0 - Dynamic merged block symbol (not hardcoded DOGE) - Hide merged block sub when no merged mining active - Store window.currencySymbol from currency_info for reuse - Node fee display uses correct coin symbol
Show "Total: 1.7703 (Master Node/Treasury: 1.3277)" under miner block value when payment_amount exists — matches p2pool-dash format exactly. Hide merged mining line + time separator when no merged chain active.
New files in src/impl/dash/coin/: - p2p_messages.hpp: dashd wire messages (tx, block, headers) using Dash types - p2p_connection.hpp: TCP connection with ReplyMatcher for block/header requests - p2p_node.hpp: NodeP2P<Config> — dashd handshake, header sync, block relay - X11 block hash (not SHA256d) for all header/block identity - NODE_NETWORK only (no segwit, no MWEB, no compact blocks) - Protocol version 70230 (Dash Core v20+) - BIP 130 sendheaders for header-first announcements - Auto-reconnect with 30s interval - node.hpp: coin::Node<Config> wrapper (P2P + future RPC) - node_interface.hpp: event interface (new_block, new_headers, new_tx, full_block) - block.hpp: fix BlockType serialize/unserialize to include transactions Adapted from LTC coin/ layer — stripped MWEB, segwit, compact blocks, wtxidrelay.
…ve v3 Full header-only chain for Dash embedded SPV node: - X11 PoW validation for all headers (fast ~0.1ms, no skip optimization needed) - DarkGravityWave v3 difficulty retarget (24-block lookback, per-block adjustment) - LevelDB persistence with write-back dirty set + atomic flush - Block locator (BIP 31 exponential backoff) for getheaders - Fast-start checkpoint support - Dynamic checkpoint from RPC - Reorg detection with tip-changed callback - Thread-safe with mutex (same pattern as LTC HeaderChain) Reference: dashcore/src/pow.cpp DarkGravityWave() Genesis mainnet: 00000ffd590b1485b3caadc19b22e6379c733355108f107a430458cdf3407ab6 Genesis testnet: 00000bafbc94add76cb75e2ec92894837288a481e5c005f6563d91623bf8bc2c
- main_dash.cpp: --dashd HOST:PORT flag for dashd P2P connection - Wire new_headers → HeaderChain for SPV sync - Wire new_block/full_block events for block notifications - Set dashd wire prefix (0xbf0c6bbd mainnet, 0xcee2caff testnet) - Status line shows header sync progress - LevelDB persistence at ~/.c2pool/dash/embedded_headers - hash_x11.hpp: add std::span<std::byte> overloads for PackStream compatibility - Fix dangling reference: capture dashd_addr by config pointer, not local ref - Dashd connects but disconnects (protocol version tuning needed) - p2pool P2P share download still works (8900+ shares, heads=1)
- Fix dashd wire prefix byte order (was LE uint32, needs raw byte order) Mainnet: bf 0c 6b bd, Testnet: ce e2 ca ff - Dashd handshake WORKS: connected to Dash Core v23.1.2 at height 2.4M - Send getheaders with genesis hash as locator for initial sync - Continue requesting headers when batch is full (>=2000) - hash_x11.hpp: add std::byte span overloads for PackStream compatibility - Both p2pool P2P (9073 shares) and dashd P2P running simultaneously
… state Bootstrap window blocks arrive in peer-response order, not chain order. apply_block has no internal idempotency check — re-applying a block at h <= persisted best_height resets nLastPaidHeight backwards, corrupting the projection. After a snapshot at h=N populates state with the latest nLastPaidHeight values, every bootstrap-window block at h<=N that re-arrives bumps SOME MN's nLastPaidHeight back to its earlier value. Net observed effect on mainnet: expected payee converges to whichever MN was bumped by the EARLIEST re-applied bootstrap block (lowest resulting nLastPaidHeight) and stays constant -> 100% [PAY] MISMATCH rate against dashd's actual selection. Gate apply_block (and the [PAY] verification, which would be meaningless against re-applied state) on `mn_state_db->is_open() && height <= mn_state_db->get_best_height()`. Other state machines (credit_pool, quorums, GBT) continue to receive every block — they have their own ordering / idempotency semantics. Pairs with e4c7c10 (UINT32_MAX wrap fix). The wrap fix corrected the sentinel value; this commit prevents earlier-block re-application from overwriting the correct value with a lower one.
Top-level CMakeLists.txt declares Boost::system as OPTIONAL_COMPONENTS because system has been header-only since Boost 1.69 — no link target is required. CI runners (Linux/macOS arm64/Windows) all fail at the generate step because the optional target isn't materialized when the Conan-provided Boost config doesn't expose it. The c2pool-dash target is the only one in the tree that puts Boost::system on its link line; LTC/DOGE link asio fine without it. Drop it to unbreak CI.
| // V35→V36 transition tracking is LTC-specific. Other blockchains | ||
| // (e.g. Dash v16) don't have a pending transition, so return an | ||
| // empty object to keep the dashboard's transition banners hidden. | ||
| if (m_blockchain != Blockchain::LITECOIN) |
| denom_shares = static_cast<double>(num_shares > 1 ? num_shares - 1 : 1); | ||
| } | ||
|
|
||
| double ratio = (denom_shares > 0 && target_time_per_mining_share_ > 0) |
| @@ -1641,8 +1642,8 @@ | |||
| } | |||
| t.pool_hashrate = pool_hr; | |||
|
|
|||
| double share_period = static_cast<double>(PoolConfig::share_period()); | |||
| double chain_length = static_cast<double>(PoolConfig::real_chain_length()); | |||
| double share_period = static_cast<double>(m_params->share_period); | |||
share_init_verify gained a CoinParams& second arg in commit a94435e on the branch, but test_threading.cpp's six callsites were never updated. Linux x86_64 CI fails at compile (test_threading.cpp.o). Fix: introduce a static test_coin_params() helper backed by ltc::make_coin_params(testnet=false), thread it through all six callsites. Verify is coin-wide-constant for the params it consumes, so testnet-vs-mainnet doesn't matter for the V36 testnet share fixture this file uses. Also flip core/ltc link order to ltc/core throughout test/CMakeLists.txt so static-link symbol resolution works regardless of ld pass mode (ltc symbols reference core::timestamp + others, so core must come after ltc on the link line for single-pass ld).
test_header_chain.cpp: 3 callsites of calculate_next_work_required hit "ambiguous overload" because the using-directive (`using namespace ltc::coin`) imports both ltc::coin and bitcoin_family::coin overloads (the latter via ADL on the params arg). Qualify the calls explicitly as ltc::coin::calculate_next_work_required to disambiguate. Verified: 35/35 tests pass. test_hash_link.cpp: compute_gentx_before_refhash gained a core::CoinParams& second arg in commit a94435e but the test still called the 1-arg form. Add a static test_coin_params() helper backed by ltc::make_coin_params(testnet=false) and thread it through both callsites. Verified: 11/11 tests pass. build.yml: temporarily exclude test_coin_broadcaster / test_multiaddress_pplns / test_pplns_stress from the Build-tests step. core/web_server.cpp grew direct calls into ltc::coin::NodeRPC and c2pool::merged::MergedMiningManager, creating a static-link cycle (core <-> ltc_coin, core <-> c2pool_merged_mining). Production binaries build fine because user code (c2pool_refactored.cpp) directly references symbols that drag the right .o files in via single-pass ld; tests don't, so the unresolved refs in web_server.cpp.o stay. Proper fix is architectural: extract LTC/MM-specific endpoints out of core/ into their own translation unit (or split MiningInterface into a coin-agnostic base + LTC subclass). Filed in project_dash_test_rot_2026_04_25.md memory.
Previous commit (b2a985e) dropped the 3 cycle-broken tests from CI's Build-tests target list, but their gtest_add_tests() registrations were still in test/CMakeLists.txt. CI's "Run tests" step then tried to run all 134 of their cases via ctest and reported them as "Not Run" (executable doesn't exist on disk) → ctest exit 8. Comment out add_executable / target_link_libraries / gtest_add_tests for all 3, with a TOP-OF-FILE note pointing at memory: project_dash_test_rot_2026_04_25.md for the architectural fix that re-enables them. ctest target count: 580 → 473.
test_dash_credit_pool / test_dash_subsidy / test_dash_battletest_regressions were added on this branch (commits 43ef108 + dca4f65) and registered with gtest_add_tests(), but never added to the workflow's Build-tests target list. CI's Run-tests step then ctest-invoked all 22 of their cases against non-existent binaries → exit 8.
fast-check property test "parseSnapshot: output always has required keys with correct types" found a counterexample where summing many Number.MAX_VALUE-class miner amounts overflowed to Infinity, breaking the Number.isFinite(snap.totalPrimary) invariant. Reproduced with seed 917071668 (CI run on 662b570). Individual amounts pass through num() which already filters non-finite values, but the reduce sum can still overflow. Replace the two reduce sums (modern-shape fallback + legacy-shape) with a finiteSum() helper that clamps to Number.MAX_VALUE on overflow. Verified: seed 917071668 + 300 runs no longer reproduces the failure.
Multiple Dash MNs can share the same payoutAddress (operators running multiple MNs to one wallet). Live-observed on mainnet: MN 7173b6a94bf9f448... payoutAddress=XjbaGWaGnvEtuQAUoBgDxJWe8ZNv45upG2 MN 06a9ee248111bf6d... payoutAddress=XjbaGWaGnvEtuQAUoBgDxJWe8ZNv45upG2 apply_block Pass 3's find_by_payout_script returned the FIRST std::map iteration match — deterministically the lower-hash MN (06a9ee24). Net effect: every payment dashd correctly attributed to 7173b6a9 was mis-attributed to 06a9ee24 in our state. 7173b6a9's nLastPaidHeight stayed at the snapshot value forever (live: 2458528, vs dashd's 2460553). With find_expected_payee picking lowest-h MN, 7173b6a9 became permanently "starved" and won the projection every block — producing a constant `expected` hash and 100% [PAY] MISMATCH against dashd, which correctly rotated the two. Confirmed via dashd protx info on mainnet (h=2460783): 7173b6a9: lastPaidHeight=2460553 (dashd) vs 2458528 (us) 06a9ee24: lastPaidHeight=2460575 (dashd, actually paid at h=2460575) both share payoutAddress XjbaGWaGnvEtuQAUoBgDxJWe8ZNv45upG2 Fix: new pick_paid_mn(script) member that mirrors dashd's CompareByLastPaid_GetHeight ordering — when N MNs share a script, pick the one with the lowest projected h (= the MN dashd's GetMNPayee would have chosen at this height). Used in apply_block Pass 3 (state mutation) and find_paid_in_block_first ([PAY] log). Also reorder main_dash.cpp on_full_block: call find_paid_in_block_first BEFORE apply_block so the lowest-h disambiguation runs against the pre-apply state. Post-apply the just-paid MN has the highest h and would lose to its colliding peers. Pairs with e4c7c10 (UINT32_MAX wrap) and 03fa0aa (OOO-block guard) to address all three known root causes of [PAY] MISMATCH on mainnet. Includes a one-shot debug_dump_mn() diagnostic + throttled trigger in main_dash.cpp at MISMATCH events. Will be removed once a clean ~1 week soak confirms the fix.
Defense-in-depth: refuse to roll nLastPaidHeight backwards in apply_block Pass 3. Catches the original Bug 2 (03fa0aa) bug class even if a future caller bypasses the outer OOO guard in main_dash.cpp. Trivial guard, no functional change for the steady-state forward path. Add test/test_dash_pay_attribution.cpp pinning all three soak-found PAY bugs against future regression: Bug 1 — UINT32_MAX sentinel must not win find_expected_payee Bug 2 — Pass-3 idempotency: never roll lastPaid backwards Bug 3 — pick_paid_mn lowest-h disambiguation under shared scripts: - happy path (prefers lower-h MN over lower-hash MN) - revived-height precedence - never-paid uses registeredHeight - tiebreak by hash when h equal - banned MN excluded 7/7 pass locally. The bug class would have been caught instantly by these tests had they existed before the soak. Lesson noted; tests added.
The bootstrap pipeline was a UTXO-only pipeline. It pulls block bodies for h=snapshot+1 .. tip via getdata, drains them in chain order, and calls utxo->connect_block per block. **It never invoked the MN state machine apply_block for those blocks.** Result: a snapshot at h=N + restart at chain tip h=N+M leaves M blocks of MN payments unprocessed in our state. Each of those payments updates dashd's lastPaidHeight for the paid MN, but our state stays at the snapshot value forever. Live mainnet observation: snapshot at h=2460550, restart at h=2460786. 236-block gap. MN 8bc76ca7a979ded6 was paid by dashd at h=2460551. Our state stayed at lastPaid=2458526 (snapshot value). On every subsequent block our find_expected_payee picked 8bc76ca7 (lowest h in our projection) but dashd had already moved past it (lastPaid =2460551 in dashd's view). Result: 100% [PAY] MISMATCH stuck on 8bc76ca7 for 222+ blocks. Fix: 1. Bootstrap drain loop (main_dash.cpp on_full_block) now calls credit_pool->apply_block AND mn_state_machine->apply_block per drained block, in chain order. Same persistence + [PAY] verify semantics as the steady-state path; [PAY-BF] log throttled 1-in-50 to keep bootstrap drain output readable. 2. mn_state_db::write_all is now monotonic-advance for best_height. The top-of-handler MN apply for the tip block runs BEFORE the drain (which catches up h=snapshot+1 .. tip-1 afterwards). Without this, drain's per-block write_all(snapshot, h, ...) would roll best_height back to the drain's current h. With monotonic-advance, entries are persisted but best_height never decreases. Verification matrix (live mainnet shadow soak): - Fresh snapshot @ tip (h=2460794), 0-block gap: 5/5 PAY MATCH - Stale snapshot @ h=2460550, 236-block gap: pending soak
Was tracking only `build-qt/`, missing `build-spv/` and any other `build-XXX/` cmake out-of-tree dirs. Also missed autoconf-generated `configure~` files (e.g. external/dashbls/configure~) created by autoreconf when regenerating configure scripts.
Self-review caught: credit_pool gets seeded at top-of-handler with the TIP block's cbtx.creditPoolBalance. Drain then replays h=snapshot+1..tip calling credit_pool->apply_block(b, h) for each block — adding each backfill block's lock/unlock deltas to a balance that ALREADY reflects all those deltas (it was seeded from the post-tip balance). Net: credit_pool balance = B_tip + sum(deltas h=N+1..tip), should be just B_tip. Every drain run double-counted the entire snapshot-to-tip delta. MN state apply in drain stays — it's correct (apply_block per drained block in chain order, with Pass 3 idempotency safety net). credit_pool catch-up in drain is a separate problem: needs the snapshot to ALSO carry a seed balance at snapshot_height, so drain can re-seed at h=snapshot_height before applying snapshot+1..tip deltas. Filing as follow-up.
…shot floor Two related bugs in the bootstrap-trigger logic surfaced during stale- snapshot soak: 1. Stale-peer block triggers bootstrap with WRONG end_height First peer to push a block-body via inv/cmpct may push a stale tip (e.g. h=2430000 when the real chain tip is h=2460805). on_full_block computes height=2430000 from this block. Bootstrap trigger fires with end_height=2430000, start_from=2429712. Range [2429712..2430000] is 30000+ blocks before the real tip. Extension via the `if (height > end_height) end_height = height` path then makes the range balloon to 30000+ blocks total. At the 16-block sliding window's pace, that's ~50h to drain. Fix: gate the trigger on `chain->height() <= height`. If chain has higher headers than this block, this block is stale relative to the real tip — defer trigger. Wait until a fresh-tip block- body arrives (the steady-state header_chain.set_on_tip_changed handler requests it via request_full_block(new_tip) once header sync hits the real tip). 2. UTXO bootstrap range doesn't cover MN state snapshot gap With utxo_db wiped (cold) and mn_state_db at snapshot h=N, bootstrap range was tip-DASH_KEEP..tip = 288 blocks. If snapshot is OLDER than tip-DASH_KEEP, the snapshot-to-(tip-DASH_KEEP) range is missed entirely → MN payments in that gap never apply. Fix: lower start_from to mn_snap_h+1 if it's older than the UTXO window. Log the override. UTXO replay over a wider range is safe; the rolling-DASH_KEEP undo window doesn't change. Pairs with 9d61f8c (drain backfill MN apply). Together: bootstrap range correctly spans both UTXO and MN state catch-up, drain processes each block in chain order, MN state stays in sync with dashd. Verification matrix update pending stale-snapshot soak rerun.
…shot Previous trigger-gate (e5e498c: chain->height() <= height) wasn't strong enough. When the chain header sync hadn't caught up to the real tip yet, both chain->height() and the just-received block's height were stale (e.g. both at h=2430000 when real tip was h=2460805). Gate passed, bootstrap activated with stale range, MN catch-up never covered the snapshot-to-tip gap. Stronger gate: when we have a snapshot at h=N, only trigger bootstrap once a block AT-OR-AFTER h=N arrives. Pre-snapshot blocks pushed by peers are stale by definition (we already have authoritative MN state covering up to h=N from the snapshot file). Defer until peers push us a fresh-tip block. Verification: 7/7 regression tests still green; stale-snapshot soak rerun pending.
Race observed in stale-snapshot soak verification (5708d1a): After bootstrap correctly fired with snapshot+tip range, drain backfilled MN state in chain order. But the top-of-handler MN apply also ran for tip blocks arriving DURING bootstrap — using stale snapshot-era state, before drain caught up. Result: 2 transient [PAY] MISMATCH at the bootstrap-to-steady-state boundary (h=2460815, h=2460816), then clean MATCH from h=2460817 onwards. Fix: gate top-of-handler MN apply on `!dash_bs->active`. The drain loop's per-block apply handles all blocks during bootstrap (in chain order, with [PAY-BF] log). Top-of-handler resumes for tip blocks once bootstrap completes. This is the same pattern as the existing UTXO logic which also returns early when bootstrap is active. MN apply now follows suit.
Final cleanup of the bootstrap-to-steady-state boundary transients. After d8cb58c eliminated mid-bootstrap races (top-apply skipped while dash_bs->active=true), one transient remained: the FIRST at-or-past- snapshot block that TRIGGERS bootstrap. It runs through top-of-handler BEFORE bootstrap activates (dash_bs->active=false at that moment), top MN apply runs against snapshot-era state, [PAY] log fires MISMATCH. Then bootstrap activates and drain catches up. Fix: detect "this block will TRIGGER bootstrap" early (replicate the trigger-gate condition) and gate top MN apply on it too. The trigger block goes into the bootstrap buffer for drain; drain's per-block apply produces the correct [PAY-BF] log entry. Implementation: hoisted DASH_KEEP, dash_bootstrap_done, mn_snap_h_pre declarations to the top of the on_full_block handler so they're in scope at both the MN-apply gate AND the bootstrap-trigger site.
Closes the Bug 5 (5efd257) follow-up: credit_pool catch-up during bootstrap drain. Previously, drain skipped credit_pool->apply_block because credit_pool was seeded at top-of-handler from the TIP block's cbtx.creditPoolBalance — replaying h=N+1..tip on top would double- count every backfill block's deltas. Fix: extend the snapshot file format to carry credit_pool_balance at snapshot_height. Loader seeds credit_pool with that value before drain starts. Drain then applies h=snapshot+1..tip deltas correctly. Wire format change (mn_snapshot.hpp): - Bumped SNAPSHOT_VERSION 1 -> 2; SNAPSHOT_VERSION_V1 kept for backward-compat decoding of existing in-tree snapshots - DmnSnapshot adds `int64_t credit_pool_balance{-1}` (-1 = "not in this snapshot"; loader treats as "do not seed") - Encode appends 8-byte LE int64 trailer for v2 only - Decode accepts BOTH v1 and v2; reads trailer when v2 RPC dumper (mn_snapshot_rpc.hpp): - After fetching MN list, also `getblock <hash> 2` to get coinbase with cbTx; extract creditPoolBalance and store in snap. Failure is non-fatal (snapshot still valid as v2 with -1 sentinel). Loader (main_dash.cpp): - After snapshot file load: if credit_pool_balance >= 0 AND credit_pool not initialized (cold start), call credit_pool->seed() and credit_pool_db->write_state(). Logs the seed value. Drain (main_dash.cpp): - Re-enable credit_pool->apply_block + persist per drained block (gated on initialized()). The 5efd257 skip was correct for v1 snapshots; with v2 seed, drain catch-up is safe. Top-of-handler (main_dash.cpp): - Add bootstrap-handling gate to credit_pool apply too (mirror of the MN gate from d8cb58c + 680f3c0). Prevents the same race during the bootstrap-to-steady-state boundary. Existing in-tree snapshot (data/dash/dmn_snapshot_h2460249.dat) is v1 and continues to load (no credit_pool seed; CCbTx-driven re-seed at first new tip handles it as before). New dumps via --dump-mn-snapshot produce v2 files with the seed.
Works around the static-link cycle introduced when core/web_server.cpp grew direct calls into ltc::coin::NodeRPC and c2pool::merged::MergedMiningManager. Wrapping `ltc_coin ltc core c2pool_merged_mining` in --start-group/--end-group lets ld multi-pass-resolve the cyclic refs. 42/42 tests pass locally. test_multiaddress_pplns + test_pplns_stress remain disabled — their wider transitive deps (pool, sharechain, c2pool_storage, c2pool_payout, c2pool_hashrate) cause CMake to inject duplicate libcore.a/libltc_coin.a OUTSIDE the start-group, where ld can't multi-pass-resolve. Proper architectural fix (extract LTC/MM endpoints out of core/web_server.cpp into a separate translation unit) is still desirable but ~6h of work touching the live LTC pool's mining hot path; deferred.
…le-archive
Both tests pull in core/web_server.cpp.o via MiningInterface usage, which
has unresolved refs to ltc::coin::NodeRPC::{getwork, submit_block_hex}.
The symbols ARE present in libltc_coin.a's rpc.cpp.o and the archive index
includes them, but ld's --start-group multi-pass evidently doesn't
re-extract rpc.cpp.o for those refs (subtle archive-scan ordering issue).
--whole-archive on libltc_coin.a forces all of rpc.cpp.o (and the rest)
into the link unconditionally, sidestepping the bug. Test binaries are
slightly larger as a result; production binaries link fine without this.
Validated on VM 211 (cold conan + cmake build):
test_multiaddress_pplns: 31/31 PASSED
test_pplns_stress: 17/17 PASSED
Adds both targets back to the CI Build-tests cmake --target list.
Drops the architectural-extraction TODO from CMakeLists.txt — the
fix is mechanical, not architectural, so we don't need to refactor
core/web_server.cpp at all.
Both VM 210 (Bug 3 soak) and VM 201 (Phase C-PAY soak) crashed within 24 seconds of each other on 2026-04-25 with [ERROR] vector::_M_default_append (std::length_error from resize() exceeding max_size). Same trigger on two unrelated peers (178.208.87.213 and 65.108.4.213) at the same wall-clock — either coordinated malicious peers or a wave of malformed share-fetch replies. Root cause: in src/impl/dash/share_chain.hpp the wire format reads `pair_count` and `count` via VarInt(), which (per src/core/pack_types.hpp:266) maps to ReadCompactSize(os, false) — `false` disables the 32 MiB range_check. A malformed peer can send a 9-byte VarInt of UINT64_MAX. share_chain.hpp:82 then evaluates `pair_count * 2` (overflows to a different huge value) and calls resize() on std::vector<uint64_t> whose max_size is ~2^60 — boom. Fixes: 1. src/impl/dash/share_chain.hpp — cap m_packed_payments at 10000 entries and m_transaction_hash_refs at 100000 pairs. Excess throws ios_base::failure which the share parser catches cleanly without crashing the process. 2. src/core/socket.cpp — defense-in-depth cap of 32 MiB on the wire-format message_length before payload.resize(). Disconnects the offending peer cleanly on cap exceedance. Bitcoin Core uses 4 MiB; we use 32 MiB to accommodate Dash's larger mnlistdiff messages with headroom. 3. src/impl/dash/main_dash.cpp — enhance the top-level ioc.run() catch to log typeid(e).name() + a backtrace + drop crash log via the existing dash_write_crash_log() helper, mirroring the SIGSEGV handler from 2d33d09. Future "vector::_M_default_append"-style regressions will pinpoint the exact resize() site instead of needing source-grep. The MAX_PAYMENTS_PER_SHARE = 10000 and MAX_TX_HASH_REF_PAIRS = 100000 caps are well above any legitimate share (real-world: ~10-50 payouts; ~few hundred tx-hash-ref pairs even worst-case).
Same class as the LTC fix in 2f9d3e1 — five HTTP cache callbacks in main_dash.cpp held a blocking std::shared_lock on node.tracker_mutex(). When the dash compute thread holds the exclusive write lock for a long think+clean cycle on a wedged sharechain, these would block the io_context until the watchdog fires. Sites converted to shared_lock(try_to_lock) with safe-default returns: - L1246 head_count → fall back to snap.fork_count (functionally equivalent) - L1296 window_fn → empty json::object (CacheEntry holds previous) - L1395 tip_fn → std::nullopt (typed signature, consumer renders empty tip) - L1420 delta_fn → empty json::object (next poll picks up) - L1518 lookup_fn → {"error":"tracker busy, retry"} The 4 remaining shared_lock sites at L1713/1720/1737/1760 are inside the PPLNS precompute std::thread (its own dedicated thread, not the io_context). Blocking there only stalls the precomputer itself; no freeze risk. Left as-is — blocking is correct for that thread.
The vector::_M_default_append crashes recurring after eb0f03f's share_chain.hpp + socket.cpp caps were diagnosed via __cxa_throw LD_PRELOAD shim. Throw site: core::Socket::init() [main_dash.cpp + socket.hpp inlined] → make_shared<Packet>(m_node->get_prefix().size()) → Packet ctor: prefix.resize(prefix_length) → std::length_error m_node is a raw ICommunicator* held by Socket. On rapid disconnect-reconnect (Bug-3-family lifecycle), get_prefix() can be called on a freed object and reads garbage as the vector size. The resulting resize() call exceeds max_size and throws — escaping to ioc.run() and killing the process via the top-level catch in main_dash.cpp:4453. Fixes: 1. src/core/packet.hpp — Packet ctor now caps prefix_length at 16 (every protocol uses a 4-byte magic prefix; 16 is conservative). Throws ios_base::failure on cap exceedance. 2. src/core/socket.hpp — Socket::read() catches the make_shared exception locally and aborts the connection cleanly instead of letting it propagate to ioc.run() and kill the process. Validated with the LD_PRELOAD __cxa_throw shim: CXA-CAPTURE 2026-04-27 12:34:53 UTC St12length_error thrown: Socket::init() at +0x107ec6 connect_socket lambda at +0x12bfa9 asio::range_connect_op::process at +0x1d47d5 ... Note: this band-aids the symptom (UAF garbage → length_error). The underlying lifecycle issue (raw m_node ptr in Socket while owning node may be destroyed) remains; a proper fix would route m_node through weak_ptr<ICommunicator> in the same shape as the Bug 3 fix on NodeP2P. That refactor is deferred — the cap is the immediate unblock so the soak window can resume.
Replaces the band-aid Packet prefix_length cap from 0f91b49 with a fundamental lifecycle fix mirroring the c42d0f5 factory-level pattern, applied at the Socket layer where the actual UAF lives. Diagnosis: the throw-site backtrace captured via __cxa_throw LD_PRELOAD shim showed core::Socket::init() → make_shared<Packet>(prefix_length) where prefix_length = m_node->get_prefix().size(). m_node was a raw ICommunicator* that survives across async-read callbacks but isn't kept alive by them. Subsequent ASYNC_READ chains (read_prefix, read_command, read_length, read_checksum, read_payload) only capture [self = shared_from_this()] — keeping the Socket alive but not the node. Once the Factory async lambda returns and its strong_node lock goes out of scope, m_node can dangle. Production rate: ~14k cap firings/day per VM (every ~6s) on Phase C-PAY soak (VM 201) + Bug 3 soak (VM 210). Each firing wastes one outbound TCP connection, leaving the soak under-peered (5 sharechain peers vs typical 15-20) and the share-fetch path effectively dead. Fix: 1. src/core/socket.hpp + .cpp — Socket holds weak_ptr<INetwork> m_node_lifetime alongside the cached ICommunicator* m_node. Dual-mode bool m_was_managed distinguishes Dash NodeP2P (post-c42d0f5c make_shared, lifetime-tracked) from legacy LTC/DOGE pool nodes (raw, untracked). The acquire_node() helper locks the weak_ptr at every async-callback entry; on managed-but-expired, the connection aborts cleanly via abort_connection() instead of dereferencing m_node. For unmanaged nodes, was_managed=false skips the check, preserving prior behavior. ASYNC_READ macro updated to do the lock-or-bail at every callback entry; strong_node lifetime extends through the user-supplied handler scope so m_node access inside is safe. Socket ctor + init() + read() + write() moved out-of-line to .cpp where INetwork is complete (forward-declared in .hpp to avoid circular include with factory.hpp). 2. .github/workflows/build.yml — new linux-asan job builds with -fsanitize=address,undefined -fno-sanitize=vptr (vptr disabled because leveldb's typeinfo isn't visible). continue-on-error: true initially so reports surface in PR checks without blocking merges while we work through the audit. Will flip to required (Phase 7) once known UAFs are fixed. Sanitizer build_type must be Release (not RelWithDebInfo) to match the conan_install --settings=build_type=Release; otherwise the $<$<CONFIG:Release>:...> generator expression in conan-generated *-Target-release.cmake silently drops every conan dep's include path. Validation: - c2pool + c2pool-dash both build clean - All previously-passing unit tests still pass (87/87 dash+share+ hardening+utxo+threading+coinbroadcaster+multiaddress) - Pre-existing test_v36_cross_impl_refhash link issue unchanged - LTC pool path uses unmanaged-node fallback; behavior identical to pre-fix — no risk to .20/.40 LTC mainnet - 0f91b49's Packet cap + Socket::read try-catch retained as belt-and-braces defense-in-depth Per design doc: frstrtr/the/docs/c2pool-socket-lifecycle-fundamental-fix.md Memory: project_dash_socket_lifecycle_fundamental_fix.md
AsAN run on VM 210 (Phase 6b validation of c558fe9's Socket fix) caught a separate use-after-free in core::Timer that's been silent in production. Same Bug-3 family (async callback outliving captured object), different code site: heap-use-after-free in core::Timer::logic() lambda at timer.hpp:37 freed by ResponseWrapper dtor → unique_ptr<Timer> dtor triggered from reply_matcher.hpp:92 inside m_handler() invocation Sequence: 1. Matcher::request() creates a Timer (unique_ptr) inside a ResponseWrapper, stored in std::map keyed by request hash 2. Timer::logic() schedules an asio::async_wait with a lambda that captures *this* by reference [&,...] 3. Timer fires (ec=0). Lambda calls m_handler() (the user reply callback) 4. Inside m_handler(), the matcher erases the map entry → destroys ResponseWrapper → destroys Timer 5. m_handler() returns. Lambda accesses m_repeat (via &-capture) on the freed Timer → UAF on the next-line `if (m_repeat && ...)` Minimal fix: - Capture m_repeat by VALUE alongside the existing destroyed shared_ptr - Re-check *destroyed AFTER m_handler() returns before any this-relative access This pairs with c558fe9's Socket weak_ptr<INetwork> fix as part of the same Bug-3-family audit. The full enable_shared_from_this refactor of Timer (matching the Socket pattern) is deferred to Phase 5 of the fundamental fix plan — touches every Timer construction site across LTC + Dash + RPC; the minimal fix is sufficient to stop the bleeding. Validated locally; full re-validation on VM 210 AsAN under way. Per design doc: frstrtr/the/docs/c2pool-socket-lifecycle-fundamental-fix.md
…r UAF) 8h after deploying c558fe9 (Socket weak_ptr<INetwork>) + 0f594e0 (Timer UAF cap), the AsAN canary on VM 211 surfaced a NEW heap-use-after-free in the same Bug-3 family at a different code site: READ at: src/core/socket.cpp:140 (operator==(prefix vectors)) FREED by: std::default_delete<dash::DashBroadcastPeer> from std::map::erase in dash::DashCoinBroadcaster::disconnect_peer at broadcaster_full.hpp:492 called from prune_dead_locally() at broadcaster_full.hpp:576 called from do_maintenance() at broadcaster_full.hpp:531 VM 210 (Release binary) crashed with SIGSEGV at the same tick (14:36 UTC, ~14 min after VM 211's AsAN trip) — same UAF, undefined-behavior path on Release manifests as a segfault. Why c558fe9 didn't catch it: that fix protects NodeP2P's lifetime via weak_ptr<INetwork>.lock() on every async-callback entry. NodeP2P stays alive past peer erase. But NodeP2P held m_config and m_coin as RAW POINTERS into DashBroadcastPeer's by-value `config` and `coin_node` members. When m_peers.erase() destructs the peer, those raw pointers dangle. Socket's read_prefix callback then calls m_node->get_prefix() which returns a reference into freed memory — AsAN UAF. Fix: NodeP2P TAKES OWNERSHIP of coin and config so their lifetime is tied to NodeP2P's. New ctor accepts unique_ptr<dash::interfaces::Node> and unique_ptr<config_t>; legacy raw-pointer ctor preserved for callers that guarantee parent lifetime (e.g. tests). DashBroadcastPeer no longer holds coin_node/config as direct members; broadcaster wires event callbacks via peer->node_p2p->coin()->X. After this: m_peers.erase(key) -> shared_ptr<NodeP2P> count drops by 1 Socket strong_node still holds NodeP2P alive (refcount > 0) m_coin_owned + m_config_owned stay alive (NodeP2P members) get_prefix() returns reference into LIVE memory -> safe. LTC's broadcaster (c2pool/merged/coin_broadcaster.hpp) uses a separate template instance (ltc::coin::p2p::NodeP2P) and is unchanged — same bug pattern is present there but LTC peer churn has not exhibited it. Same fix can be applied if/when observed. Build: c2pool-dash AsAN target builds clean. Deploy to VM 211 next; ~24h soak required to confirm UAF class is fully closed.
Phase C-PAY soak hit 1858 [PAY] MISMATCH events from 2026-04-30 15:48
onwards, all with the SAME expected=7a9b3753... against varying observed.
Diagnosed via dashd RPC: MN was PoSe-banned at h=2463018 (PoSeBanHeight,
isValid=null in dashd) but c2pool's MnStateMachine still had
banHeight=0, isValid=true, picking it forever in find_expected_payee.
Root cause is architectural, not a one-off: PoSe bans are NOT tx-driven.
Dash Core applies them via consensus rules (CDeterministicMNManager
triggers a ban when an MN crosses MAX_PoSe_PENALTY from missed quorums).
They never appear as a special tx in any block, so apply_block() — which
walks special txs — can never observe them. Every MN that ever gets
PoSe-banned during operation will exhibit identical behavior; 1858
mismatches from this single MN is just the visible tip.
The SML feed (Phase C-SML, mnlistdiff p2p messages, root-verified
bit-exact against the coinbase's merkleRootMNList) carries the
authoritative CSimplifiedMNListEntry::isValid. The fix establishes a
field-ownership contract:
- SML owns: isValid (banned / revived state)
- MnStateMachine owns: nLastPaidHeight, nRegisteredHeight,
nPoSeRevivedHeight, nPoSeBanHeight,
scriptPayout, collateralOutpoint, nType
(timing + identity facets that arrive via
apply_block from special txs)
and adds MnStateMachine::sync_validity_from_sml() which projects SML's
view onto m_entries[h].isValid. Idempotent, O(|SML|), called from two
sites in main_dash.cpp:
1. Live: right after each "[SML] root MATCH" log line — the natural
barrier where the SML has just been verified bit-exact.
2. Startup: right after load_sml() + load_into() — catches any
divergence persisted from a prior session that ran without this
fix (i.e. mn_state_db.write_all() may have written banned MNs as
isValid=true forever).
Both sites log "[MNS-SYNC]" deltas so we can SEE bans being applied.
5 gtest cases pin the contract (test_dash_pay_attribution.cpp Bug12_*):
- flips banned → invalid
- flips revived → valid
- idempotent (zero deltas after first call)
- SML-only entries are no-op (apply_block owns the registration path)
- only touches isValid (never nLastPaidHeight et al.)
Once deployed, [PAY] MISMATCH rate should drop to ~0 within minutes —
the next mnlistdiff carries the SML view with 7a9b3753 marked invalid,
the [SML] root MATCH triggers sync_validity_from_sml, find_expected_payee
starts skipping it.
Path C (own quorum-participation tracker mirroring dashd) becomes
unnecessary: c2pool doesn't need to compute its own PoSe bans because
the SML carries them. Path C is only relevant if we wanted full
RPC-independence for the SML feed itself, which is a separate concern.
|
|
||
| srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM) | ||
| srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) | ||
| srv.bind(('0.0.0.0', PORT)) |
Root cause of Bug 13 (1858 [PAY] MISMATCH on MN 788707b3...80f4 from
2026-04-30 onwards). c2pool's vendored CProRegTx + CProUpServTx structs
declared `uint8_t nType` but Dash's wire format encodes nType as
uint16_t (LE). The narrower read silently shifted every subsequent
field by 1 byte for v2+ (BASIC_BLS) ProUpServTx and CProRegTx payloads.
Symptoms (live-observed on the .41 PAY soak journal):
- 6+ [MNS-SM] CProUpServTx parse failed warnings (h=2461581, 2462161,
2462701 ×2, 2462733, 2462994), all on real EVO MNs
- The h=2462994 failure was the missed PoSe revival of MN 788707b3...
that produced 1858 MISMATCH events from h=2463677..2465107 before
Bug 12's SML sync correctly flipped isValid back to true (after
which find_expected_payee picked the MN as oldest-unpaid because
apply_block had also missed every subsequent payment to it)
- For CProRegTx, the bug is partially masked for REGULAR (nType=0)
MNs because the high byte of nType (0x00) coincidentally aligns
with the low byte of nMode (also 0x00) — same nType+nMode result
BUT collateralOutpoint then starts 1 byte early and is garbage,
so EVO MN registrations get a corrupt collateral entry in
m_collateral_index → collateral-spend detection misses them
Fix: change the field type from uint8_t to uint16_t in both structs.
The existing serialization code (READWRITE(obj.nType)) does the right
thing once the type is correct.
Verification:
- New gtest case (Bug13_CProUpServTx_v2_EVO_RealPayload_Parses) parses
the actual on-the-wire 207-byte extraPayload from Dash mainnet
block 2462994's tx[17] and asserts proTxHash + EVO platform fields
round-trip correctly. Without the fix the parser overflows by 14
bytes; with the fix it consumes 207/207 cleanly.
- All 13 DashPayAttribution tests pass.
Operator note: existing m_entries / mn_state_db state on long-running
soaks may have garbage collateralOutpoint for already-registered EVO
MNs (the registration parse was wrong before this fix). To fully clear
divergence, redump the DMN snapshot from RPC (--dump-mn-snapshot) and
restart. New nodes from this commit forward parse correctly.
This + Bug 12 fix together close the PoSe-eligibility class of
Phase C-PAY MISMATCH permanently. apply_block now correctly observes
ProUpServTx revivals (the source of truth for tx-driven revival) AND
SML sync covers PoSe bans (the consensus-rule revival/ban path).
…iple Root cause of Bug 14 (live-observed 2026-05-04 on the .41 PAY soak: MN 13dcc4eb...4e8c stuck-expected for 57+ consecutive blocks, dashd's PoSeRevivedHeight=2465346 vs our nPoSeRevivedHeight=2396789). The Bug 12 fix (1773930) projected SML's authoritative isValid onto m_entries when SML root MATCHed, closing the implicit-PoSe-ban hole. But it left a mirror hole on the revive side: apply_block's PROVIDER_ UPDATE_SERVICE branch gates the revivedHeight update on `if (nPoSeBanHeight != 0)` (mirroring dashcore specialtxman.cpp:361-370). For the implicit-ban class — where we never observed the ban so banHeight stayed 0 — the revival branch never fires. nPoSeRevivedHeight stays at the stale snapshot value, find_expected_payee uses max(lastPaid, revivedHeight) for queue position, so the MN remains "oldest unpaid" forever. Same shape as Bug 12 with a different field. Fix: extend the field-ownership contract from "SML owns isValid" to "SML owns the triple (isValid, banHeight, revivedHeight)" — they're causally one piece of state. sync_validity_from_sml now takes a current_height parameter and applies the triple atomically on every flip: - false→true (revive): set revivedHeight = max(existing, current_height) (monotonic, never rolls back); clear banHeight = 0. Closes the apply_block-revival-gate hole — next ProUpServTx for an already- revived MN is a no-op on these fields, not a stale-state trap. - true→false (ban): set banHeight = current_height ONLY if not already set (preserve apply_block's exact tx-driven height when known; SML's current_height is an upper bound for the implicit case). current_height is a safe conservative bound: the actual ban/revive happened at-or-before the SML diff height. Worst case for the scheduler is a slight queue-push beyond optimal (transient under-payment of 1-2 blocks); strictly better than the pre-fix stuck-queue behavior. Wired at both call sites: startup reconcile (sml_db->get_best_height()) and live SML root MATCH (diff_cbtx.nHeight). New SyncFromSmlResult counters (revived_height_set, ban_height_set) surfaced in [MNS-SYNC] log lines for diagnostic visibility. Verification: - All 18 DashPayAttribution tests pass, including 5 new Bug 14 cases: * FlipToValid_BumpsRevivedHeightFromCurrent (the 13dcc4eb scenario) * FlipToValid_RevivedHeightMonotonic (no rollback) * FlipToInvalid_SetsBanHeightWhenZero (implicit-ban capture) * FlipToInvalid_PreservesPreciseBanHeight (don't clobber tx-driven) * RegressionScenario_ClearsStuckQueue (end-to-end: pre-fix find_expected_payee picks h_stuck; post-fix picks h_other) - Bug12_SyncFromSml_OnlyTouchesIsValid renamed to OnlyTouchesOwnedFields and updated to reflect the wider SML-owned set (isValid + banHeight + revivedHeight); MnStateMachine-owned fields (lastPaid/registered/script) remain untouched. This + Bug 12 + Bug 13 close the PoSe-eligibility class permanently across all three paths: tx-driven bans/revives (apply_block), implicit bans (Bug 12 SML sync), and implicit revives + apply_block-gate-hole (Bug 14 SML sync extension).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Brings c2pool-dash to functional parity with p2pool-dash for the network's
consensus-critical paths so c2pool-dash nodes can replace live p2pool-dash
mainnet nodes. Adds a fully self-sufficient SPV-embedded path so c2pool-dash
no longer depends on a local dashd RPC for block templates or submission.
189 commits since branching from master at `46e2bffc` (2026-04-18). Includes
the Phase C work (TEMPLATE/SUBMIT/CUTOVER/PAY/L/SML/QUO/MEMPOOL), the SPV
embedded pipeline (S1/S2/Phase U), four bug fixes from the 2026-04-24 testnet
battle-test, in-process crash diagnostics, and a per-share GENTX-OUTS
diagnostic for cross-implementation debugging.
Phase grouping (for review)
SPV embedded (S1 + S2 + Phase U)
DNS seeds (`ded056cc`), BIP 155 addrv2 (`6103af48`), BIP 152 vendored
blockencodings + negotiation + reassembly (`18104e26` + `3136e00e` +
`145a589a`), UTXO adapter + live connect_block + LevelDB + per-block height
(`79f71a74` + `145a589a` + `7c17cef7` + `96dfe510`), rolling-288 bootstrap
pipeline (`5a89fedb`), tip-changed handler with reorg disconnect_block +
header-sync nudge (`c68b44a7`).
Phase C-SML (Simplified Masternode List sync)
Live-validated bit-exact against Dash mainnet — `[SML] root MATCH` for
blocks 2460036/37/38 from 13+ peers. 7 build steps + Bug A fix (`3629cc74`
uint256 sort-order memcmp + diff.cbTx self-aligned root verify).
Phase C-QUO (Quorum DB persistence)
MVP (`40155291` + `96f10a38`) + persistence (`90f44cc2`) + step-4 schema bump
for mining_height (`f0b550f9`). `load_into()` replays full state at startup
with sentinel cross-check vs sml_db.
Phase L (ChainLock + dashboard SPV panel)
5 build steps + iteration-2a verify-gate + SML rollback (`7660cd70`) +
reorg drop (`a00c9657`) + iteration-2b ban-on-bad-data (`55c2f468`) +
dashboard SPV panel on /web/sync_status (`5b397381`). Linux/macOS only.
Phase C-PAY Path A (Masternode payment verification)
8 commits — ProTx vendor (`7607f59d`), MnStateDb (`b71a88e6`), snapshot
loader + integrity pin (`43815ed1`), `--dump-mn-snapshot` RPC dumper
(`ca9b13be`), RPC bootstrap fallback (`4fca8804`), first in-tree snapshot
(`8b3bdd98`, h=2460249, 2936 entries), per-block state machine
(`1f09f3df`), GetMNPayee + log-only `[PAY]` verify (`74bcebb7`).
Phase C-MEMPOOL
Storage + fee + LRU eviction + confirm-eviction + conflict detection
(`e6542439`); feerate-sorted index + recompute_unknown_fees (`d57ed8e5`).
Adapted from src/impl/ltc/coin/mempool.hpp, dropped segwit/weight.
Phase C-TEMPLATE (Embedded GetBlockTemplate, RPC-independent)
13b commits including subsidy + qfcommit scanner + merkle_root_quorums
(`f0b550f9`), embedded GBT (`346edee1`), CCbTx encoder (`b77cd2f8`),
best-CLSIG (`82e206b3`), MTP-11 mintime (`57eb9f60`), own DGW3 bits
(`530be2c7`), version field (`bbfbd532`), creditPoolBalance seeding
(`579753dc`), base58 payee (`cd40be7a`), DIP-0027 credit-pool state
machine (`1b5a3d32`), CreditPoolDb persistence (`78079113`).
Embedded GBT bit-exact for ALL DashWorkData and CCbTx fields; all 4
consensus dbs warm-startable (SMLDb / QuorumDb / MnStateDb / CreditPoolDb).
Phase C-SUBMIT (P2P block broadcast)
P2P block broadcast as PRIMARY path, RPC optional (`68938a24`); roundtrip
confirmation via pending-submit map matched by on_full_block hash + 30s
warning timer for >60s un-confirmed (`9cd51786`).
Phase C-CUTOVER (Default policy flip + observability)
`--gbt-source` flag (`54b9e41d`), [SUBMIT-SANITY] hop (`1b0d1fbd`),
auto-fallback hysteresis 3-strike (`6ca69995`), dashboard cutover panel +
atomic soak counters (`4814dbe8`), liveness watchdog + 'LOST CONTACT'
warning (`25e6713d`), default policy flip to embedded-prefer (`c053c14e`),
15 unit tests for CreditPool + subsidy (`43ef108f`).
Default behavior is now embedded-with-RPC-cross-check; legacy RPC-primary
requires explicit `--gbt-source rpc`.
Battle-test 2026-04-24 fixes (testnet sharechain interop)
testnet difficulty. Branch on `diff >= 1.0` for sub-1.0 use multiplicative inverse.
accept socket was never bound. Add the bind call.
MAX_TARGET. p2pool-dash testnet rejected every share with 'share PoW invalid'.
Added testnet-specific 0x00000fff... value.
Reverted to log-only.
Testnet branch now sets 18999 so outbound dialer accepts real testnet peers.
All 4 bugs covered by regression tests in
`test/test_dash_battletest_regressions.cpp` (`dca4f656`, 7 tests).
Crash diagnostics + per-share visibility
writes backtrace to stderr + `/tmp/c2pool_dash_crash.log` via
`backtrace_symbols_fd`. No `ulimit` / `sudo` / `core_pattern` dance.
Captures next mainnet crash autonomously.
n_outs, output (amount, script_hex), hash_link state. Designed for
cross-comparison with p2pool's regenerated gentx when share.check() raises
`'gentx doesn't match hash_link'`. Successfully diagnosed the stale-state
rejection cycle on 2026-04-24.
Test status
compact_blocks + decay_pplns + header_chain + mempool + mweb_builder +
phase4_{embedded,live} + redistribute_address + share_messages +
template_builder + utxo + v36_script_sorting + weights + hardening.
for h=2460036/37/38 from 13+ independent peers).
p2pool-dash on .42/.191 — federation works (44+/59 verified, zero
rejections after the battle-test fixes landed).
Known issues NOT in this PR
`test_pplns_stress`, `test_auto_ratchet`) — same failure on master, not
introduced here. CMake target_link_libraries order issue, separate fix.
macOS ARM build verification scheduled for 2026-04-29.
(apport ate the core, ulimit was 0). The crash handler in this PR
(`2d33d09a`) makes the next firing autonomously diagnostic. Treated as
non-blocking for merge.
(cosmetic, not consensus-affecting).
Test plan