Add threshold sign to end of reshare#148
Conversation
|
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
📚 Learning: 2026-04-28T14:49:17.416Z
📚 Learning: 2026-04-29T16:47:26.051Z
📚 Learning: 2026-04-30T20:31:34.188Z
🔇 Additional comments (2)
📝 Walkthrough

Adds reshare signing and readiness machinery plus timing constants; tightens many Dkg generic bounds to require a concrete PubPoly type.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Node1 as Node-1 (new committee)
    participant NewNode as New Committee Member
    participant SignCoord as SignCoordinator
    participant SessionMgr as SessionStateManager
    participant Bulletin as BulletinStore
    participant Background as Poller
    NewNode->>SessionMgr: mark_reshare_signature_ready(key)
    Node1->>SignCoord: create RingReshareUpdateStatement
    SignCoord->>SignCoord: retry signing loop (unique request_id per attempt)
    SignCoord->>SignCoord: local validation via validate_ring_reshare_update_statement
    SignCoord->>Node1: request threshold signature (with context RingReshareUpdate)
    alt signature obtained
        Node1->>Bulletin: post bulletin update with signature metadata
        SignCoord->>Background: spawn poller to wait for bulletin confirmation or timeout
        Background->>Bulletin: poll for bulletin confirmed (poll interval)
        Background->>SessionMgr: on confirmation or timeout -> unmark_ring_pss & remove_session
    else non-retryable failure
        SignCoord->>Node1: abort signing, return error
    end
```
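The retry loop in the diagram above ("unique request_id per attempt") can be sketched as follows. The names `attempt_request_id` and `sign_with_retry`, and the closure-based signer, are illustrative stand-ins, not the actual `SignCoordinator` API:

```rust
/// Derive a unique request id per retry attempt so a failed attempt's
/// id can never collide with a later one (illustrative id scheme).
fn attempt_request_id(base_request_id: &str, attempt: u32) -> String {
    format!("{base_request_id}-attempt-{attempt}")
}

/// Retry a fallible signing step up to `max_attempts` times, minting a
/// fresh request id each time. Returns the id that succeeded.
fn sign_with_retry<F>(
    base_request_id: &str,
    max_attempts: u32,
    mut try_sign: F,
) -> Result<String, String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut last_err = String::from("no attempts made");
    for attempt in 1..=max_attempts {
        let request_id = attempt_request_id(base_request_id, attempt);
        match try_sign(&request_id) {
            Ok(()) => return Ok(request_id),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

Minting a fresh id per attempt keeps stale responses from an abandoned attempt from being accepted by a later one.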
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bin/orbis-node/src/dkg/coordinator/phases.rs (1)
1007-1241: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the ring claimed until the bulletin update actually finishes.
This new retry/signing path runs after the earlier `unmark_ring_pss()` call, so the ring becomes claimable again while node 1 is still collecting threshold shares or before the bulletin write succeeds. That allows a second refresh/reshare to overlap with an in-flight finalization of the same ring.

Suggested fix

```diff
-    // Clear the in-progress ceremony flag now that Phase 4 has succeeded.
-    // (Phase4Complete was already set by check_and_trigger_phase4 before this
-    // function was called; no second update_phase needed here.)
-    if let Some(ring_key) = kind.ring_key() {
-        coord
-            .app_state
-            .dkg_session_state
-            .unmark_ring_pss(ring_key)
-            .await;
-    }
-
     let ring_pk_bytes = CryptoSerialize::to_bytes(&aggregate_pk).map_err(|e| {
         DkgError::Serialization(format!("Failed to serialize aggregate public key: {}", e))
     })?;
@@
     if let SessionKind::Reshare { ring_pk_hex, new_threshold, .. } = &kind {
         ...
     }
+
+    // Only release the claim once the success path is fully finalized.
+    // The error path already goes through remove_session()/cleanup.
+    if let Some(ring_key) = kind.ring_key() {
+        coord
+            .app_state
+            .dkg_session_state
+            .unmark_ring_pss(ring_key)
+            .await;
+    }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/orbis-node/src/dkg/coordinator/phases.rs` around lines 1007 - 1241, The ring is being unmarked (via unmark_ring_pss) before node 1 finishes collecting signatures and updating the bulletin, making the ring claimable during the in-flight reshare finalization; ensure the ring remains claimed until the bulletin update completes by moving or deferring the unmark call: keep the ring PSS claim active across collect_reshare_update_signature_with_retry(...) and the subsequent coord.app_state.bulletin.update(...) call (use session_id and ring_pk_hex to identify the ring), and only call unmark_ring_pss (or equivalent) after the bulletin.update Future resolves successfully (and on error ensure you clean up/unmark appropriately to avoid dead claims).
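The ordering the comment asks for can be reduced to a standalone sketch: hold the ring claim across signature collection and the bulletin write, and release it only afterwards. The session-state struct and the two closure-based steps below are toy stand-ins (the real code in `phases.rs` is async):

```rust
use std::collections::HashSet;

/// Toy stand-in for the session state's ring-claim set.
struct SessionState {
    claimed_rings: HashSet<String>,
}

impl SessionState {
    fn mark_ring_pss(&mut self, ring_key: &str) {
        self.claimed_rings.insert(ring_key.to_string());
    }
    fn unmark_ring_pss(&mut self, ring_key: &str) {
        self.claimed_rings.remove(ring_key);
    }
    fn is_claimed(&self, ring_key: &str) -> bool {
        self.claimed_rings.contains(ring_key)
    }
}

/// Finalize a reshare: collect the signature, then write the bulletin,
/// and only release the ring claim after both steps have run. On error
/// the claim is also released so no dead claim lingers (cleanup path).
fn finalize_reshare<S, B>(
    state: &mut SessionState,
    ring_key: &str,
    collect_signature: S,
    bulletin_update: B,
) -> Result<(), String>
where
    S: FnOnce() -> Result<Vec<u8>, String>,
    B: FnOnce(&[u8]) -> Result<(), String>,
{
    let result = collect_signature().and_then(|sig| bulletin_update(&sig));
    // The claim was held across both steps; release it only now.
    state.unmark_ring_pss(ring_key);
    result
}
```

The key property is that no second refresh/reshare can claim the ring while the signature collection or bulletin write is still in flight.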
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/orbis-node/src/dkg/session_state.rs`:
- Around line 583-591: When tearing down a session via remove_session(), also
clear any reshare-ready markers so abandoned sessions can't pass
validate_ring_reshare_update_statement(); specifically, remove the corresponding
entry from the reshare_signature_ready set (the same set manipulated by
mark_reshare_signature_ready and checked by is_reshare_signature_ready) for that
session's (session_id, bulletin_post_id, payload hashes) tuple. Locate
remove_session and add an awaited write lock on reshare_signature_ready to
delete all keys tied to the session being removed (or the single key if
available) so markers are purged on normal session teardown rather than waiting
for cleanup/expiration workers.
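The teardown behavior this prompt describes can be sketched with a simplified readiness key and plain maps in place of the real awaited write locks (all names and fields here are illustrative):

```rust
use std::collections::HashMap;

/// Illustrative readiness key: which (session, bulletin post) tuple a
/// node has marked as ready to sign. The real key also carries payload
/// hashes.
#[derive(Clone, PartialEq, Eq, Hash)]
struct ReshareSignatureReadyKey {
    session_id: String,
    bulletin_post_id: String,
}

struct DkgSessionState {
    sessions: HashMap<String, ()>,
    reshare_signature_ready: HashMap<ReshareSignatureReadyKey, bool>,
}

impl DkgSessionState {
    /// Tear down a session and purge every reshare-ready marker tied to
    /// it, so an abandoned session can never pass the readiness check.
    fn remove_session(&mut self, session_id: &str) {
        self.sessions.remove(session_id);
        self.reshare_signature_ready
            .retain(|key, _| key.session_id != session_id);
    }

    fn is_reshare_signature_ready(&self, key: &ReshareSignatureReadyKey) -> bool {
        self.reshare_signature_ready.get(key).copied().unwrap_or(false)
    }
}
```

Purging on normal teardown means correctness does not depend on a cleanup/expiration worker eventually running.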
---
Outside diff comments:
In `@bin/orbis-node/src/dkg/coordinator/phases.rs`:
- Around line 1007-1241: The ring is being unmarked (via unmark_ring_pss) before
node 1 finishes collecting signatures and updating the bulletin, making the ring
claimable during the in-flight reshare finalization; ensure the ring remains
claimed until the bulletin update completes by moving or deferring the unmark
call: keep the ring PSS claim active across
collect_reshare_update_signature_with_retry(...) and the subsequent
coord.app_state.bulletin.update(...) call (use session_id and ring_pk_hex to
identify the ring), and only call unmark_ring_pss (or equivalent) after the
bulletin.update Future resolves successfully (and on error ensure you clean
up/unmark appropriately to avoid dead claims).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a8a745e6-9d85-4cf6-b77d-8eb15d4199d2
📒 Files selected for processing (15)
- bin/orbis-node/src/constants.rs
- bin/orbis-node/src/dkg/coordinator/message_handlers.rs
- bin/orbis-node/src/dkg/coordinator/mod.rs
- bin/orbis-node/src/dkg/coordinator/network.rs
- bin/orbis-node/src/dkg/coordinator/phases.rs
- bin/orbis-node/src/dkg/protocol_handler.rs
- bin/orbis-node/src/dkg/service.rs
- bin/orbis-node/src/dkg/session_state.rs
- bin/orbis-node/src/dkg/tests/reshare.rs
- bin/orbis-node/src/helpers/create_routers.rs
- bin/orbis-node/src/helpers/test_helpers.rs
- bin/orbis-node/src/pss/mod.rs
- bin/orbis-node/src/sign/coordinator.rs
- bin/orbis-node/src/sign/helpers.rs
- bin/orbis-node/src/sign/messages.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: lint
- GitHub Check: build
🔇 Additional comments (8)
bin/orbis-node/src/constants.rs (1)
287-291: Good addition of centralized retry controls.

Keeping these reshare-signature retry parameters in constants improves operability and avoids magic numbers in coordinator logic.
bin/orbis-node/src/sign/messages.rs (1)
10-11: Protocol message extension is well-structured.

The new domain separator, canonical statement type, and explicit `SignContext` variant cleanly isolate the reshare-update signing pathway.

Also applies to: 30-51, 68-70
bin/orbis-node/src/dkg/service.rs (1)
39-44: Type-bound tightening looks consistent and safe.

This keeps `DkgServiceImpl` aligned with the coordinator/type expectations introduced across the DKG stack.

bin/orbis-node/src/dkg/coordinator/network.rs (1)
7-8: Generic constraints are consistently propagated.

Nice job making coordinator network helpers use the same `PubPoly` contract end-to-end.

Also applies to: 34-39, 63-68, 101-106, 201-206
bin/orbis-node/src/helpers/test_helpers.rs (1)
346-386: Test network wiring update is appropriate.

Enabling DKG+PRE+Sign handlers in this helper aligns test topology with the new reshare signing behavior.
bin/orbis-node/src/dkg/protocol_handler.rs (1)
15-20: Bound update is consistent with the rest of the DKG pipeline.

Looks good for maintaining a single concrete `PubPoly` type through protocol handling.

bin/orbis-node/src/pss/mod.rs (1)
44-45: PSS generic-bound updates are coherent.

These changes keep refresh/reshare scheduler paths aligned with the concrete DKG public polynomial type used elsewhere.
Also applies to: 62-63, 94-95, 124-125, 240-241, 448-449
bin/orbis-node/src/dkg/coordinator/message_handlers.rs (1)
18-19: Handler-bound unification is clean and consistent.

Good compile-time alignment across all DKG coordinator message paths.
Also applies to: 59-64, 533-538, 793-798, 923-928, 1008-1013, 1126-1131, 1197-1202
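The bound unification these comments praise can be illustrated with a toy trait carrying an associated public-polynomial type; the real `Dkg`/`PubPoly` contracts in the orbis crates are more involved, so treat this as a shape sketch only:

```rust
/// Toy stand-in for a DKG trait with an associated public-polynomial
/// type (illustrative, not the real orbis `Dkg` trait).
trait Dkg {
    type PubPoly: Clone + PartialEq + std::fmt::Debug;
    fn public_poly(&self) -> Self::PubPoly;
}

struct ToyDkg;
impl Dkg for ToyDkg {
    type PubPoly = Vec<u64>;
    fn public_poly(&self) -> Vec<u64> {
        vec![1, 2, 3]
    }
}

/// With the same `D: Dkg` bound propagated end-to-end, every helper in
/// the pipeline sees one concrete `PubPoly` type, so values flow
/// between phases without conversions or mismatched generics.
fn phase_output<D: Dkg>(dkg: &D) -> D::PubPoly {
    dkg.public_poly()
}

fn phase_check<D: Dkg>(dkg: &D, poly: &D::PubPoly) -> bool {
    &dkg.public_poly() == poly
}
```

The compile-time benefit is exactly what the review notes: a value produced by one phase type-checks as input to the next with no glue code.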
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/orbis-node/src/dkg/coordinator/phases.rs`:
- Around line 1248-1260: The ring claim is being released too early for
non-node1 signers; instead of immediately calling
dkg_session_state.unmark_ring_pss(ring_key) and coord.remove_session(session_id)
at end of Phase 4, ensure the claim is held until peers have observed the
finalized bulletin (or wait for an explicit post-update completion signal).
Modify the logic around SessionKind::Reshare and the Session removal to (a)
detect whether this node is the sole waiter (new_node_id == 1) or not, and (b)
for non-node1 paths defer awaiting unmark_ring_pss and remove_session until
either a bulletin-finalized confirmation is received or a retry/ack from
initiate_signing completes; use or add a completion signal/flag in app_state
(e.g., a bulletin_observed or post_update_notify) to drive
unmark_ring_pss(ring_key) and coord.remove_session(session_id) so the ring
remains claimed while other nodes may still retry initiate_signing().
- Around line 1061-1071: The ReshareSignatureReadyKey is being created with the
raw session ring_pk_hex which can mismatch the normalized storage key used by
validate_ring_reshare_update_statement; update the call that constructs
ReshareSignatureReadyKey for mark_reshare_signature_ready (the block calling
coord.app_state.dkg_session_state.mark_reshare_signature_ready) to set ring_key
to the previously computed storage_key variable instead of ring_pk_hex, keeping
all other fields (session_id, bulletin_post_id, current_payload_sha256,
updated_payload_sha256) the same so the readiness check uses the canonical key.
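The normalization mismatch described above comes down to one rule: the marker and the validator must derive the ring key through the same canonical helper. The exact normalization the real `storage_key` performs is an assumption here; this sketch just demonstrates the shape:

```rust
/// Normalize a ring public-key hex string into a canonical storage-key
/// form (lowercase, no `0x` prefix). The specific normalization is an
/// illustrative assumption; what matters is that
/// mark_reshare_signature_ready and the validator both use it.
fn canonical_ring_key(ring_pk_hex: &str) -> String {
    let trimmed = ring_pk_hex.trim();
    trimmed
        .strip_prefix("0x")
        .unwrap_or(trimmed)
        .to_ascii_lowercase()
}
```

If the readiness key were built from the raw `ring_pk_hex` while validation looks up the normalized form, the lookup would miss even though both sides refer to the same ring.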
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5fff1fff-c825-4a7c-83e7-779ce575c287
📒 Files selected for processing (1)
bin/orbis-node/src/dkg/coordinator/phases.rs
📜 Review details
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 147
File: bin/orbis-node/src/store_secret/service.rs:0-0
Timestamp: 2026-04-29T16:47:26.051Z
Learning: In `bin/orbis-node/src/store_secret/service.rs`, the pre-store `verify_encryption` check (which previously passed ring key, effective key, and policy metadata into `validate_encrypted_document`) was intentionally removed as part of PR `#147` (closes issue `#139`). The rationale: running this verification before storing requires exposing ring key and policy metadata to the validation function, which leaks privacy. Since a caller could always store cryptographically junk data regardless of this check, removing it does not weaken the protocol's security invariants — it only improves privacy. Proof field validation (shared_point/challenge/response) is deferred to downstream consumers.
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 142
File: bin/orbis-node/src/dkg/helpers.rs:126-145
Timestamp: 2026-04-28T14:49:17.416Z
Learning: In `bin/orbis-node/src/dkg/helpers.rs`, `load_reshare_ring_payload` intentionally prefers the local `RingIndex` bulletin_post_id over the wire-provided one for old-committee nodes. This is by design: local `RingIndex` and `RingShareBundle` are part of the validity boundary. An old-committee member with a stale or missing local post ID is considered unhealthy; falling back to the wire-supplied post ID would mask stale-state drift rather than surface it. Pure receivers (nodes not in the old committee, with no local index entry) are the only case where the wire `bulletin_post_id` is used as a fallback, which is explicitly allowed.
📚 Learning: 2026-04-28T14:49:17.416Z (duplicate of the `load_reshare_ring_payload` learning above)
Applied to files:
bin/orbis-node/src/dkg/coordinator/phases.rs
📚 Learning: 2026-04-29T16:47:26.051Z (duplicate of the `verify_encryption` learning above)
Applied to files:
bin/orbis-node/src/dkg/coordinator/phases.rs
🔇 Additional comments (1)
bin/orbis-node/src/dkg/coordinator/phases.rs (1)
39-68: Nice retry isolation.

Attempt-scoped request IDs here are the right shape for avoiding reuse across retries, and the test locks that sequencing in.
Also applies to: 1276-1308
Actionable comments posted: 1
♻️ Duplicate comments (1)
bin/orbis-node/src/dkg/coordinator/phases.rs (1)
1088-1090: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't fall back to `next_peer_ids` for the final committee order.

Phase 2 treats `reshare_params.new_peer_ids` as the sorted new-committee order used for reshare routing, but this branch silently falls back to the wire-order `next_peer_ids`. If those orders differ, `RingConfig` and the final `RingPayload` will use a different node ordering than the share distribution did, which can break signer node IDs during finalization. Fail fast if the sorted list is missing.

Suggested fix

```diff
-    let sorted_new_peer_ids = reshare_new_peer_ids
-        .clone()
-        .unwrap_or_else(|| next_peer_ids.clone());
+    let sorted_new_peer_ids = reshare_new_peer_ids.clone().ok_or_else(|| {
+        DkgError::ProtocolError(
+            "Reshare: missing sorted new-committee peer list".to_string(),
+        )
+    })?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/orbis-node/src/dkg/coordinator/phases.rs` around lines 1088 - 1090, The code falls back from reshare_params.new_peer_ids to next_peer_ids when computing sorted_new_peer_ids, which can cause RingConfig/RingPayload ordering mismatches; change the logic in the phase where sorted_new_peer_ids is computed so that if reshare_new_peer_ids (reshare_params.new_peer_ids) is None you return an error (fail fast) instead of using next_peer_ids, ensuring RingConfig and RingPayload use the same ordering as the share distribution; update error propagation in the surrounding function (where sorted_new_peer_ids, reshare_new_peer_ids, next_peer_ids, RingConfig, and RingPayload are referenced) to return a descriptive Err when the sorted list is missing.
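The fail-fast shape suggested above, reduced to a standalone sketch with the error type and peer-id type simplified (the real `DkgError` has more variants):

```rust
/// Simplified error type standing in for the real DkgError.
#[derive(Debug, PartialEq)]
enum DkgError {
    ProtocolError(String),
}

/// Return the sorted new-committee list, or a protocol error if it is
/// missing -- never silently substitute the wire-order peer list.
fn sorted_new_peer_ids(
    reshare_new_peer_ids: Option<Vec<u32>>,
) -> Result<Vec<u32>, DkgError> {
    reshare_new_peer_ids.ok_or_else(|| {
        DkgError::ProtocolError(
            "Reshare: missing sorted new-committee peer list".to_string(),
        )
    })
}
```

Surfacing the missing list as an error keeps the share distribution, `RingConfig`, and `RingPayload` on one agreed ordering instead of masking a drift.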
🧹 Nitpick comments (1)
bin/orbis-node/src/dkg/coordinator/phases.rs (1)
711-717: Use the same session-removal path in the background cleanup task.

This cleanup task calls `app_state.dkg_session_state.remove_session()` directly while error paths throughout the file use `coord.remove_session()`. Since `DkgCoordinator::remove_session()` is a thin pass-through wrapper with no extra logic, both are functionally equivalent. However, for consistency and maintainability, route this through the coordinator abstraction like the rest of the codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/orbis-node/src/dkg/coordinator/phases.rs` around lines 711 - 717, The cleanup path should call the coordinator's removal helper instead of touching the session state directly: replace the direct call to app_state.dkg_session_state.remove_session(&session_id).await with the coordinator wrapper (e.g. coord.remove_session(session_id).await or coord.remove_session(&session_id).await matching the existing DkgCoordinator::remove_session signature) and keep the preceding unmark_ring_pss logic as-is so the task uses the same coordinator abstraction as the rest of the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@bin/orbis-node/src/dkg/coordinator/phases.rs`:
- Around line 1088-1090: The code falls back from reshare_params.new_peer_ids to
next_peer_ids when computing sorted_new_peer_ids, which can cause
RingConfig/RingPayload ordering mismatches; change the logic in the phase where
sorted_new_peer_ids is computed so that if reshare_new_peer_ids
(reshare_params.new_peer_ids) is None you return an error (fail fast) instead of
using next_peer_ids, ensuring RingConfig and RingPayload use the same ordering
as the share distribution; update error propagation in the surrounding function
(where sorted_new_peer_ids, reshare_new_peer_ids, next_peer_ids, RingConfig, and
RingPayload are referenced) to return a descriptive Err when the sorted list is
missing.
---
Nitpick comments:
In `@bin/orbis-node/src/dkg/coordinator/phases.rs`:
- Around line 711-717: The cleanup path should call the coordinator's removal
helper instead of touching the session state directly: replace the direct call
to app_state.dkg_session_state.remove_session(&session_id).await with the
coordinator wrapper (e.g. coord.remove_session(session_id).await or
coord.remove_session(&session_id).await matching the existing
DkgCoordinator::remove_session signature) and keep the preceding unmark_ring_pss
logic as-is so the task uses the same coordinator abstraction as the rest of the
file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 16eedcb5-05eb-4bb6-8a9e-bfb63dc29d52
📒 Files selected for processing (2)
- bin/orbis-node/src/constants.rs
- bin/orbis-node/src/dkg/coordinator/phases.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: lint
- GitHub Check: build
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-04-28T14:49:17.416Z (duplicate of the `load_reshare_ring_payload` learning above)
Applied to files:
bin/orbis-node/src/constants.rs
bin/orbis-node/src/dkg/coordinator/phases.rs
📚 Learning: 2026-04-29T16:47:26.051Z (duplicate of the `verify_encryption` learning above)
Applied to files:
bin/orbis-node/src/dkg/coordinator/phases.rs
🔇 Additional comments (1)
bin/orbis-node/src/constants.rs (1)
287-300: These reshare timing constants fit the new flow well.

The retry window and bulletin-confirmation window are documented clearly, and centralizing them here makes the new Phase 4 behavior easier to tune.
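How such centralized timing constants typically relate can be sketched as below; the names and values are illustrative stand-ins, not the actual constants added in `bin/orbis-node/src/constants.rs`:

```rust
use std::time::Duration;

// Illustrative stand-ins for centralized reshare timing constants.
const RESHARE_SIGN_MAX_ATTEMPTS: u32 = 5;
const RESHARE_SIGN_RETRY_INTERVAL: Duration = Duration::from_secs(2);
const RESHARE_BULLETIN_POLL_INTERVAL: Duration = Duration::from_secs(1);
const RESHARE_BULLETIN_CONFIRM_TIMEOUT: Duration = Duration::from_secs(30);

/// Upper bound on how long the signing retry loop can run before it
/// gives up (attempts times the per-attempt wait).
fn max_retry_window() -> Duration {
    RESHARE_SIGN_RETRY_INTERVAL * RESHARE_SIGN_MAX_ATTEMPTS
}

/// Number of bulletin polls the background task makes before its
/// confirmation window times out.
fn max_poll_attempts() -> u32 {
    (RESHARE_BULLETIN_CONFIRM_TIMEOUT.as_secs()
        / RESHARE_BULLETIN_POLL_INTERVAL.as_secs()) as u32
}
```

Keeping these in one module makes the relationship between the retry window and the confirmation window explicit when tuning Phase 4.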
Related #112