
Add threshold sign to end of reshare #148

Merged
JesseAbram merged 7 commits into develop from
Add-threhsold-sign-to-end-of-reshare
Apr 30, 2026

Conversation

Collaborator

@JesseAbram JesseAbram commented Apr 29, 2026

Related #112


coderabbitai Bot commented Apr 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 51c06cbd-29f6-4d60-9d2b-ff5228a6dee9

📥 Commits

Reviewing files that changed from the base of the PR and between 402563e and 8f3603f.

📒 Files selected for processing (2)
  • bin/orbis-node/src/dkg/session_state.rs
  • bin/orbis-node/src/sign/helpers.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/orbis-node/src/dkg/session_state.rs
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint
  • GitHub Check: build
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 148
File: bin/orbis-node/src/dkg/coordinator/phases.rs:689-695
Timestamp: 2026-04-30T20:31:34.188Z
Learning: In `bin/orbis-node/src/dkg/coordinator/phases.rs`, `wait_for_reshare_bulletin_finalized` intentionally gates PSS-claim release only on `next_peer_ids` and `new_threshold` being `None`. This is by design: an upcoming PR (related to PR `#148`) will make the threshold signature the trigger that clears those fields on the bulletin, so this poll condition will be sufficient once that change lands. Do not flag the absence of explicit signature-metadata verification in this function as a bug.
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 142
File: bin/orbis-node/src/dkg/helpers.rs:126-145
Timestamp: 2026-04-28T14:49:17.416Z
Learning: In `bin/orbis-node/src/dkg/helpers.rs`, `load_reshare_ring_payload` intentionally prefers the local `RingIndex` bulletin_post_id over the wire-provided one for old-committee nodes. This is by design: local `RingIndex` and `RingShareBundle` are part of the validity boundary. An old-committee member with a stale or missing local post ID is considered unhealthy; falling back to the wire-supplied post ID would mask stale-state drift rather than surface it. Pure receivers (nodes not in the old committee, with no local index entry) are the only case where the wire `bulletin_post_id` is used as a fallback, which is explicitly allowed.
📚 Learning: 2026-04-28T14:49:17.416Z
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 142
File: bin/orbis-node/src/dkg/helpers.rs:126-145
Timestamp: 2026-04-28T14:49:17.416Z
Learning: In `bin/orbis-node/src/dkg/helpers.rs`, `load_reshare_ring_payload` intentionally prefers the local `RingIndex` bulletin_post_id over the wire-provided one for old-committee nodes. This is by design: local `RingIndex` and `RingShareBundle` are part of the validity boundary. An old-committee member with a stale or missing local post ID is considered unhealthy; falling back to the wire-supplied post ID would mask stale-state drift rather than surface it. Pure receivers (nodes not in the old committee, with no local index entry) are the only case where the wire `bulletin_post_id` is used as a fallback, which is explicitly allowed.

Applied to files:

  • bin/orbis-node/src/sign/helpers.rs
📚 Learning: 2026-04-29T16:47:26.051Z
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 147
File: bin/orbis-node/src/store_secret/service.rs:0-0
Timestamp: 2026-04-29T16:47:26.051Z
Learning: In `bin/orbis-node/src/store_secret/service.rs`, the pre-store `verify_encryption` check (which previously passed ring key, effective key, and policy metadata into `validate_encrypted_document`) was intentionally removed as part of PR `#147` (closes issue `#139`). The rationale: running this verification before storing requires exposing ring key and policy metadata to the validation function, which leaks privacy. Since a caller could always store cryptographically junk data regardless of this check, removing it does not weaken the protocol's security invariants — it only improves privacy. Proof field validation (shared_point/challenge/response) is deferred to downstream consumers.

Applied to files:

  • bin/orbis-node/src/sign/helpers.rs
📚 Learning: 2026-04-30T20:31:34.188Z
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 148
File: bin/orbis-node/src/dkg/coordinator/phases.rs:689-695
Timestamp: 2026-04-30T20:31:34.188Z
Learning: In `bin/orbis-node/src/dkg/coordinator/phases.rs`, `wait_for_reshare_bulletin_finalized` intentionally gates PSS-claim release only on `next_peer_ids` and `new_threshold` being `None`. This is by design: an upcoming PR (related to PR `#148`) will make the threshold signature the trigger that clears those fields on the bulletin, so this poll condition will be sufficient once that change lands. Do not flag the absence of explicit signature-metadata verification in this function as a bug.

Applied to files:

  • bin/orbis-node/src/sign/helpers.rs
🔇 Additional comments (2)
bin/orbis-node/src/sign/helpers.rs (2)

111-252: Validator logic looks aligned with the reshare flow.

The fallback to the current committee/threshold when only one announcement field is present, plus the readiness key binding to the bulletin post and both payload hashes, makes this path reject mixed-state updates cleanly.

Based on learnings: wait_for_reshare_bulletin_finalized intentionally treats next_peer_ids and new_threshold as an either-or announcement until a later threshold-signature update clears both fields.


695-767: Good coverage for the one-sided reshare cases.

These tests hit the intended matrix: no pending announcement, missing ready marker, committee-only reshare, and threshold-only reshare.


📝 Walkthrough

Walkthrough

Adds reshare signing and readiness machinery plus timing constants; tightens many Dkg generic bounds to require PubPoly = PubPolyImpl; introduces reshare-signature retry logic, bulletin-confirmation background poller that defers session unmark/removal, session-state readiness markers, new sign message types/validation, and related test updates.

Changes

Cohort / File(s) Summary
Constants
bin/orbis-node/src/constants.rs
Adds reshare/pss timing and retry constants: RESHARE_SIGNATURE_MAX_ATTEMPTS, RESHARE_SIGNATURE_RETRY_DELAY, RESHARE_BULLETIN_CONFIRM_POLL_INTERVAL, RESHARE_BULLETIN_CONFIRM_TIMEOUT.
DKG coordinator internals
bin/orbis-node/src/dkg/coordinator/message_handlers.rs, bin/orbis-node/src/dkg/coordinator/mod.rs, bin/orbis-node/src/dkg/coordinator/network.rs, bin/orbis-node/src/dkg/coordinator/phases.rs
Tightens Dkg trait bounds to require PubPoly = PubPolyImpl; adds reshare signing retry helper, retry classification, changes Phase-4 reshare bulletin signing flow, and defers unmark/remove via background poller.
DKG surface & helpers
bin/orbis-node/src/dkg/protocol_handler.rs, bin/orbis-node/src/dkg/service.rs, bin/orbis-node/src/helpers/create_routers.rs, bin/orbis-node/src/helpers/test_helpers.rs, bin/orbis-node/src/dkg/network.rs*
Updates generic where bounds to include PubPoly = PubPolyImpl for public/internal router/service/coordinator entrypoints and helper functions.
Session state
bin/orbis-node/src/dkg/session_state.rs
Adds ReshareSignatureReadyKey, shared readiness HashSet, mark_reshare_signature_ready, is_reshare_signature_ready, and cleanup integration to remove markers on session removal/expiration; adds test.
Tests (reshare)
bin/orbis-node/src/dkg/tests/reshare.rs
Expands test routers to include all handlers and adds a bulletin-observation gate polling for final RingPayload state during end-to-end reshare test.
PSS entrypoints
bin/orbis-node/src/pss/mod.rs
Adds PubPoly = PubPolyImpl constraint to PSS scheduler/entrypoint generics; imports PubPolyImpl.
Signing coordinator & messages
bin/orbis-node/src/sign/coordinator.rs, bin/orbis-node/src/sign/helpers.rs, bin/orbis-node/src/sign/messages.rs
Introduces RingReshareUpdate sign context and message types; canonical serialization/context helpers; validate_ring_reshare_update_statement enforcing bulletin/state checks and local readiness; signing flow uses reshare-specific context and nonces; adds unit tests.
Minor
bin/orbis-node/src/dkg/session_state.rs (other small edits)
Small visibility/type adjustments and imports to support new types and bounds.

Sequence Diagram(s)

sequenceDiagram
  participant Node1 as Node-1 (new committee)
  participant NewNode as New Committee Member
  participant SignCoord as SignCoordinator
  participant SessionMgr as SessionStateManager
  participant Bulletin as BulletinStore
  participant Background as Poller

  NewNode->>SessionMgr: mark_reshare_signature_ready(key)
  Node1->>SignCoord: create RingReshareUpdateStatement
  SignCoord->>SignCoord: retry signing loop (unique request_id per attempt)
  SignCoord->>SignCoord: local validation via validate_ring_reshare_update_statement
  SignCoord->>Node1: request threshold signature (with context RingReshareUpdate)
  alt signature obtained
    Node1->>Bulletin: post bulletin update with signature metadata
    SignCoord->>Background: spawn poller to wait for bulletin confirmation or timeout
    Background->>Bulletin: poll for bulletin confirmed (poll interval)
    Background->>SessionMgr: on confirmation or timeout -> unmark_ring_pss & remove_session
  else non-retryable failure
    SignCoord->>Node1: abort signing, return error
  end
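The deferred-cleanup step at the bottom of the diagram can be sketched as a poll loop: the session unmark/removal happens only once the bulletin update is observed, or the confirmation timeout elapses. This is a minimal blocking sketch — the function name, intervals, and closure-based confirmation check are assumptions; the real implementation spawns an async background task.

```rust
use std::time::{Duration, Instant};

// Illustrative intervals; the real values are the RESHARE_BULLETIN_CONFIRM_*
// constants in bin/orbis-node/src/constants.rs.
const POLL_INTERVAL: Duration = Duration::from_millis(10);
const CONFIRM_TIMEOUT: Duration = Duration::from_millis(200);

/// Poll until `check` reports the bulletin update as confirmed, or the
/// timeout elapses. Either way, the caller then runs unmark_ring_pss and
/// remove_session, so no dead claim is left behind.
fn poll_until_confirmed(mut check: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + CONFIRM_TIMEOUT;
    loop {
        if check() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        std::thread::sleep(POLL_INTERVAL);
    }
}

fn main() {
    let mut polls = 0;
    let confirmed = poll_until_confirmed(|| {
        polls += 1;
        polls >= 3 // pretend the bulletin shows up on the third poll
    });
    println!("confirmed={confirmed} after {polls} polls");
}
```

The timeout branch still proceeding to cleanup is the design choice that prevents a crashed or unreachable bulletin from leaving the ring permanently claimed.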

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes



@JesseAbram JesseAbram marked this pull request as ready for review April 30, 2026 14:35

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bin/orbis-node/src/dkg/coordinator/phases.rs (1)

1007-1241: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the ring claimed until the bulletin update actually finishes.

This new retry/signing path runs after the earlier unmark_ring_pss() call, so the ring becomes claimable again while node 1 is still collecting threshold shares or before the bulletin write succeeds. That allows a second refresh/reshare to overlap with an in-flight finalization of the same ring.

Suggested fix
-    // Clear the in-progress ceremony flag now that Phase 4 has succeeded.
-    // (Phase4Complete was already set by check_and_trigger_phase4 before this
-    // function was called; no second update_phase needed here.)
-    if let Some(ring_key) = kind.ring_key() {
-        coord
-            .app_state
-            .dkg_session_state
-            .unmark_ring_pss(ring_key)
-            .await;
-    }
-
     let ring_pk_bytes = CryptoSerialize::to_bytes(&aggregate_pk).map_err(|e| {
         DkgError::Serialization(format!("Failed to serialize aggregate public key: {}", e))
     })?;
@@
     if let SessionKind::Reshare {
         ring_pk_hex,
         new_threshold,
         ..
     } = &kind
     {
         ...
     }
+
+    // Only release the claim once the success path is fully finalized.
+    // The error path already goes through remove_session()/cleanup.
+    if let Some(ring_key) = kind.ring_key() {
+        coord
+            .app_state
+            .dkg_session_state
+            .unmark_ring_pss(ring_key)
+            .await;
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/orbis-node/src/dkg/coordinator/phases.rs` around lines 1007 - 1241, The
ring is being unmarked (via unmark_ring_pss) before node 1 finishes collecting
signatures and updating the bulletin, making the ring claimable during the
in-flight reshare finalization; ensure the ring remains claimed until the
bulletin update completes by moving or deferring the unmark call: keep the ring
PSS claim active across collect_reshare_update_signature_with_retry(...) and the
subsequent coord.app_state.bulletin.update(...) call (use session_id and
ring_pk_hex to identify the ring), and only call unmark_ring_pss (or equivalent)
after the bulletin.update Future resolves successfully (and on error ensure you
clean up/unmark appropriately to avoid dead claims).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/orbis-node/src/dkg/session_state.rs`:
- Around line 583-591: When tearing down a session via remove_session(), also
clear any reshare-ready markers so abandoned sessions can't pass
validate_ring_reshare_update_statement(); specifically, remove the corresponding
entry from the reshare_signature_ready set (the same set manipulated by
mark_reshare_signature_ready and checked by is_reshare_signature_ready) for that
session's (session_id, bulletin_post_id, payload hashes) tuple. Locate
remove_session and add an awaited write lock on reshare_signature_ready to
delete all keys tied to the session being removed (or the single key if
available) so markers are purged on normal session teardown rather than waiting
for cleanup/expiration workers.
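The marker bookkeeping the comment above asks for can be sketched as follows. Field names beyond those mentioned in the comment, the `String`-typed ids, and the synchronous `HashSet` (the real store sits behind an async write lock) are all assumptions; the point is that `remove_session` purges every marker tied to the session.

```rust
use std::collections::HashSet;

// Key layout assumed from the review comment's
// (session_id, bulletin_post_id, payload hashes) tuple.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct ReshareSignatureReadyKey {
    pub session_id: String,
    pub bulletin_post_id: String,
    pub current_payload_sha256: [u8; 32],
    pub updated_payload_sha256: [u8; 32],
}

#[derive(Default)]
pub struct SessionState {
    reshare_signature_ready: HashSet<ReshareSignatureReadyKey>,
}

impl SessionState {
    pub fn mark_reshare_signature_ready(&mut self, key: ReshareSignatureReadyKey) {
        self.reshare_signature_ready.insert(key);
    }

    pub fn is_reshare_signature_ready(&self, key: &ReshareSignatureReadyKey) -> bool {
        self.reshare_signature_ready.contains(key)
    }

    /// Purge every marker for the session so an abandoned session cannot
    /// later pass validate_ring_reshare_update_statement().
    pub fn remove_session(&mut self, session_id: &str) {
        self.reshare_signature_ready
            .retain(|k| k.session_id != session_id);
    }
}

fn main() {
    let key = ReshareSignatureReadyKey {
        session_id: "s1".into(),
        bulletin_post_id: "post-1".into(),
        current_payload_sha256: [0u8; 32],
        updated_payload_sha256: [1u8; 32],
    };
    let mut state = SessionState::default();
    state.mark_reshare_signature_ready(key.clone());
    state.remove_session("s1");
    println!("still ready after teardown: {}", state.is_reshare_signature_ready(&key));
}
```

Using `retain` keyed on `session_id` removes all markers for the session in one pass, which covers the "all keys tied to the session" case the comment raises.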

---

Outside diff comments:
In `@bin/orbis-node/src/dkg/coordinator/phases.rs`:
- Around line 1007-1241: The ring is being unmarked (via unmark_ring_pss) before
node 1 finishes collecting signatures and updating the bulletin, making the ring
claimable during the in-flight reshare finalization; ensure the ring remains
claimed until the bulletin update completes by moving or deferring the unmark
call: keep the ring PSS claim active across
collect_reshare_update_signature_with_retry(...) and the subsequent
coord.app_state.bulletin.update(...) call (use session_id and ring_pk_hex to
identify the ring), and only call unmark_ring_pss (or equivalent) after the
bulletin.update Future resolves successfully (and on error ensure you clean
up/unmark appropriately to avoid dead claims).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a8a745e6-9d85-4cf6-b77d-8eb15d4199d2

📥 Commits

Reviewing files that changed from the base of the PR and between adeb899 and 31c3892.

📒 Files selected for processing (15)
  • bin/orbis-node/src/constants.rs
  • bin/orbis-node/src/dkg/coordinator/message_handlers.rs
  • bin/orbis-node/src/dkg/coordinator/mod.rs
  • bin/orbis-node/src/dkg/coordinator/network.rs
  • bin/orbis-node/src/dkg/coordinator/phases.rs
  • bin/orbis-node/src/dkg/protocol_handler.rs
  • bin/orbis-node/src/dkg/service.rs
  • bin/orbis-node/src/dkg/session_state.rs
  • bin/orbis-node/src/dkg/tests/reshare.rs
  • bin/orbis-node/src/helpers/create_routers.rs
  • bin/orbis-node/src/helpers/test_helpers.rs
  • bin/orbis-node/src/pss/mod.rs
  • bin/orbis-node/src/sign/coordinator.rs
  • bin/orbis-node/src/sign/helpers.rs
  • bin/orbis-node/src/sign/messages.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint
  • GitHub Check: build
🔇 Additional comments (8)
bin/orbis-node/src/constants.rs (1)

287-291: Good addition of centralized retry controls.

Keeping these reshare-signature retry parameters in constants improves operability and avoids magic numbers in coordinator logic.
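As a rough sketch of what such centralized retry controls look like — the numeric values and the derived-window helper below are illustrative assumptions, not the actual contents of `bin/orbis-node/src/constants.rs`:

```rust
use std::time::Duration;

// Illustrative values only; the real numbers live in constants.rs.
pub const RESHARE_SIGNATURE_MAX_ATTEMPTS: u32 = 3;
pub const RESHARE_SIGNATURE_RETRY_DELAY: Duration = Duration::from_secs(2);
pub const RESHARE_BULLETIN_CONFIRM_POLL_INTERVAL: Duration = Duration::from_millis(500);
pub const RESHARE_BULLETIN_CONFIRM_TIMEOUT: Duration = Duration::from_secs(60);

/// Worst-case time spent sleeping between signing attempts: one delay per
/// retry, i.e. (max attempts - 1) delays.
pub fn worst_case_retry_window() -> Duration {
    RESHARE_SIGNATURE_RETRY_DELAY * (RESHARE_SIGNATURE_MAX_ATTEMPTS - 1)
}

fn main() {
    println!("retry window: {}s", worst_case_retry_window().as_secs());
}
```

One advantage of centralizing these is that relationships between them (poll interval strictly shorter than confirmation timeout, retry window shorter than session expiry) can be asserted in one place.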

bin/orbis-node/src/sign/messages.rs (1)

10-11: Protocol message extension is well-structured.

The new domain separator, canonical statement type, and explicit SignContext variant cleanly isolate the reshare-update signing pathway.

Also applies to: 30-51, 68-70
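Domain separation of the kind praised above usually means prefixing the canonical statement bytes with a fixed tag before signing, so a signature over a reshare-update statement can never validate as any other message type. A sketch under invented names — the tag string and field layout here are illustration only, not the crate's actual wire format:

```rust
// Invented tag; the crate's real domain separator differs.
const RESHARE_UPDATE_DOMAIN: &[u8] = b"orbis/sign/ring-reshare-update/v1";

/// Build canonical bytes for a reshare-update statement. Length-prefixing
/// the variable-size field keeps the encoding unambiguous: no two distinct
/// inputs concatenate to the same byte string.
fn canonical_statement_bytes(
    bulletin_post_id: &str,
    current_sha256: &[u8; 32],
    updated_sha256: &[u8; 32],
) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(RESHARE_UPDATE_DOMAIN);
    out.extend_from_slice(&(bulletin_post_id.len() as u32).to_be_bytes());
    out.extend_from_slice(bulletin_post_id.as_bytes());
    out.extend_from_slice(current_sha256); // fixed-width, no prefix needed
    out.extend_from_slice(updated_sha256);
    out
}

fn main() {
    let a = canonical_statement_bytes("post-1", &[0u8; 32], &[1u8; 32]);
    let b = canonical_statement_bytes("post-2", &[0u8; 32], &[1u8; 32]);
    println!("distinct statements encode differently: {}", a != b);
}
```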

bin/orbis-node/src/dkg/service.rs (1)

39-44: Type-bound tightening looks consistent and safe.

This keeps DkgServiceImpl aligned with the coordinator/type expectations introduced across the DKG stack.

bin/orbis-node/src/dkg/coordinator/network.rs (1)

7-8: Generic constraints are consistently propagated.

Nice job making coordinator network helpers use the same PubPoly contract end-to-end.

Also applies to: 34-39, 63-68, 101-106, 201-206

bin/orbis-node/src/helpers/test_helpers.rs (1)

346-386: Test network wiring update is appropriate.

Enabling DKG+PRE+Sign handlers in this helper aligns test topology with the new reshare signing behavior.

bin/orbis-node/src/dkg/protocol_handler.rs (1)

15-20: Bound update is consistent with the rest of the DKG pipeline.

Looks good for maintaining a single concrete PubPoly type through protocol handling.

bin/orbis-node/src/pss/mod.rs (1)

44-45: PSS generic-bound updates are coherent.

These changes keep refresh/reshare scheduler paths aligned with the concrete DKG public polynomial type used elsewhere.

Also applies to: 62-63, 94-95, 124-125, 240-241, 448-449

bin/orbis-node/src/dkg/coordinator/message_handlers.rs (1)

18-19: Handler-bound unification is clean and consistent.

Good compile-time alignment across all DKG coordinator message paths.

Also applies to: 59-64, 533-538, 793-798, 923-928, 1008-1013, 1126-1131, 1197-1202

Comment thread bin/orbis-node/src/dkg/session_state.rs

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/orbis-node/src/dkg/coordinator/phases.rs`:
- Around line 1248-1260: The ring claim is being released too early for
non-node1 signers; instead of immediately calling
dkg_session_state.unmark_ring_pss(ring_key) and coord.remove_session(session_id)
at end of Phase 4, ensure the claim is held until peers have observed the
finalized bulletin (or wait for an explicit post-update completion signal).
Modify the logic around SessionKind::Reshare and the Session removal to (a)
detect whether this node is the sole waiter (new_node_id == 1) or not, and (b)
for non-node1 paths block/unawait unmark_ring_pss and remove_session until
either a bulletin-finalized confirmation is received or a retry/ack from
initiate_signing completes; use or add a completion signal/flag in app_state
(e.g., a bulletin_observed or post_update_notify) to drive
unmark_ring_pss(ring_key) and coord.remove_session(session_id) so the ring
remains claimed while other nodes may still retry initiate_signing().
- Around line 1061-1071: The ReshareSignatureReadyKey is being created with the
raw session ring_pk_hex which can mismatch the normalized storage key used by
validate_ring_reshare_update_statement; update the call that constructs
ReshareSignatureReadyKey for mark_reshare_signature_ready (the block calling
coord.app_state.dkg_session_state.mark_reshare_signature_ready) to set ring_key
to the previously computed storage_key variable instead of ring_pk_hex, keeping
all other fields (session_id, bulletin_post_id, current_payload_sha256,
updated_payload_sha256) the same so the readiness check uses the canonical key.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5fff1fff-c825-4a7c-83e7-779ce575c287

📥 Commits

Reviewing files that changed from the base of the PR and between 31c3892 and e70cf78.

📒 Files selected for processing (1)
  • bin/orbis-node/src/dkg/coordinator/phases.rs
📜 Review details
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 147
File: bin/orbis-node/src/store_secret/service.rs:0-0
Timestamp: 2026-04-29T16:47:26.051Z
Learning: In `bin/orbis-node/src/store_secret/service.rs`, the pre-store `verify_encryption` check (which previously passed ring key, effective key, and policy metadata into `validate_encrypted_document`) was intentionally removed as part of PR `#147` (closes issue `#139`). The rationale: running this verification before storing requires exposing ring key and policy metadata to the validation function, which leaks privacy. Since a caller could always store cryptographically junk data regardless of this check, removing it does not weaken the protocol's security invariants — it only improves privacy. Proof field validation (shared_point/challenge/response) is deferred to downstream consumers.
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 142
File: bin/orbis-node/src/dkg/helpers.rs:126-145
Timestamp: 2026-04-28T14:49:17.416Z
Learning: In `bin/orbis-node/src/dkg/helpers.rs`, `load_reshare_ring_payload` intentionally prefers the local `RingIndex` bulletin_post_id over the wire-provided one for old-committee nodes. This is by design: local `RingIndex` and `RingShareBundle` are part of the validity boundary. An old-committee member with a stale or missing local post ID is considered unhealthy; falling back to the wire-supplied post ID would mask stale-state drift rather than surface it. Pure receivers (nodes not in the old committee, with no local index entry) are the only case where the wire `bulletin_post_id` is used as a fallback, which is explicitly allowed.
📚 Learning: 2026-04-28T14:49:17.416Z
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 142
File: bin/orbis-node/src/dkg/helpers.rs:126-145
Timestamp: 2026-04-28T14:49:17.416Z
Learning: In `bin/orbis-node/src/dkg/helpers.rs`, `load_reshare_ring_payload` intentionally prefers the local `RingIndex` bulletin_post_id over the wire-provided one for old-committee nodes. This is by design: local `RingIndex` and `RingShareBundle` are part of the validity boundary. An old-committee member with a stale or missing local post ID is considered unhealthy; falling back to the wire-supplied post ID would mask stale-state drift rather than surface it. Pure receivers (nodes not in the old committee, with no local index entry) are the only case where the wire `bulletin_post_id` is used as a fallback, which is explicitly allowed.

Applied to files:

  • bin/orbis-node/src/dkg/coordinator/phases.rs
📚 Learning: 2026-04-29T16:47:26.051Z
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 147
File: bin/orbis-node/src/store_secret/service.rs:0-0
Timestamp: 2026-04-29T16:47:26.051Z
Learning: In `bin/orbis-node/src/store_secret/service.rs`, the pre-store `verify_encryption` check (which previously passed ring key, effective key, and policy metadata into `validate_encrypted_document`) was intentionally removed as part of PR `#147` (closes issue `#139`). The rationale: running this verification before storing requires exposing ring key and policy metadata to the validation function, which leaks privacy. Since a caller could always store cryptographically junk data regardless of this check, removing it does not weaken the protocol's security invariants — it only improves privacy. Proof field validation (shared_point/challenge/response) is deferred to downstream consumers.

Applied to files:

  • bin/orbis-node/src/dkg/coordinator/phases.rs
🔇 Additional comments (1)
bin/orbis-node/src/dkg/coordinator/phases.rs (1)

39-68: Nice retry isolation.

Attempt-scoped request IDs here are the right shape for avoiding reuse across retries, and the test locks that sequencing in.

Also applies to: 1276-1308
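The attempt-scoped request-id pattern praised above can be sketched as follows: each signing attempt derives a fresh id, so a straggling response to attempt N can never be mistaken for attempt N+1's answer, and the retryable/fatal split decides whether the loop continues. The id format and error classification here are assumptions, not the crate's actual types.

```rust
const MAX_ATTEMPTS: u32 = 3;

/// Stand-in for the coordinator's retry classification.
enum SignError {
    Retryable,
    Fatal,
}

/// Run up to MAX_ATTEMPTS signing rounds, minting a unique request id per
/// attempt. Fatal errors abort immediately; exhausting attempts yields None.
fn collect_with_retry(
    session_id: &str,
    mut try_sign: impl FnMut(&str) -> Result<String, SignError>,
) -> Option<String> {
    for attempt in 0..MAX_ATTEMPTS {
        let request_id = format!("{session_id}/attempt-{attempt}");
        match try_sign(&request_id) {
            Ok(sig) => return Some(sig),
            Err(SignError::Retryable) => continue,
            Err(SignError::Fatal) => return None, // abort, surface the error
        }
    }
    None
}

fn main() {
    // Simulated signer: fails twice with a retryable error, then succeeds.
    let mut calls = 0;
    let sig = collect_with_retry("reshare-1", |req| {
        calls += 1;
        if calls < 3 {
            Err(SignError::Retryable)
        } else {
            Ok(format!("sig:{req}"))
        }
    });
    println!("{sig:?}");
}
```

Because the id changes per attempt, any reply cache or in-flight tracking keyed on request id automatically discards responses from abandoned attempts.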

Comment thread bin/orbis-node/src/dkg/coordinator/phases.rs
Comment thread bin/orbis-node/src/dkg/coordinator/phases.rs Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
bin/orbis-node/src/dkg/coordinator/phases.rs (1)

1088-1090: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't fall back to next_peer_ids for the final committee order.

Phase 2 treats reshare_params.new_peer_ids as the sorted new-committee order used for reshare routing, but this branch silently falls back to the wire-order next_peer_ids. If those orders differ, RingConfig and the final RingPayload will use a different node ordering than the share distribution did, which can break signer node IDs during finalization. Fail fast if the sorted list is missing.

Suggested fix
-            let sorted_new_peer_ids = reshare_new_peer_ids
-                .clone()
-                .unwrap_or_else(|| next_peer_ids.clone());
+            let sorted_new_peer_ids = reshare_new_peer_ids.clone().ok_or_else(|| {
+                DkgError::ProtocolError(
+                    "Reshare: missing sorted new-committee peer list".to_string(),
+                )
+            })?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/orbis-node/src/dkg/coordinator/phases.rs` around lines 1088 - 1090, The
code falls back from reshare_params.new_peer_ids to next_peer_ids when computing
sorted_new_peer_ids, which can cause RingConfig/RingPayload ordering mismatches;
change the logic in the phase where sorted_new_peer_ids is computed so that if
reshare_new_peer_ids (reshare_params.new_peer_ids) is None you return an error
(fail fast) instead of using next_peer_ids, ensuring RingConfig and RingPayload
use the same ordering as the share distribution; update error propagation in the
surrounding function (where sorted_new_peer_ids, reshare_new_peer_ids,
next_peer_ids, RingConfig, and RingPayload are referenced) to return a
descriptive Err when the sorted list is missing.
🧹 Nitpick comments (1)
bin/orbis-node/src/dkg/coordinator/phases.rs (1)

711-717: Use the same session-removal path in the background cleanup task.

This cleanup task calls app_state.dkg_session_state.remove_session() directly while error paths throughout the file use coord.remove_session(). Since DkgCoordinator::remove_session() is a thin pass-through wrapper with no extra logic, both are functionally equivalent. However, for consistency and maintainability, route this through the coordinator abstraction like the rest of the codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/orbis-node/src/dkg/coordinator/phases.rs` around lines 711 - 717, The
cleanup path should call the coordinator's removal helper instead of touching
the session state directly: replace the direct call to
app_state.dkg_session_state.remove_session(&session_id).await with the
coordinator wrapper (e.g. coord.remove_session(session_id).await or
coord.remove_session(&session_id).await matching the existing
DkgCoordinator::remove_session signature) and keep the preceding unmark_ring_pss
logic as-is so the task uses the same coordinator abstraction as the rest of the
file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@bin/orbis-node/src/dkg/coordinator/phases.rs`:
- Around line 1088-1090: The code falls back from reshare_params.new_peer_ids to
next_peer_ids when computing sorted_new_peer_ids, which can cause
RingConfig/RingPayload ordering mismatches; change the logic in the phase where
sorted_new_peer_ids is computed so that if reshare_new_peer_ids
(reshare_params.new_peer_ids) is None you return an error (fail fast) instead of
using next_peer_ids, ensuring RingConfig and RingPayload use the same ordering
as the share distribution; update error propagation in the surrounding function
(where sorted_new_peer_ids, reshare_new_peer_ids, next_peer_ids, RingConfig, and
RingPayload are referenced) to return a descriptive Err when the sorted list is
missing.

---

Nitpick comments:
In `@bin/orbis-node/src/dkg/coordinator/phases.rs`:
- Around line 711-717: The cleanup path should call the coordinator's removal
helper instead of touching the session state directly: replace the direct call
to app_state.dkg_session_state.remove_session(&session_id).await with the
coordinator wrapper (e.g. coord.remove_session(session_id).await or
coord.remove_session(&session_id).await matching the existing
DkgCoordinator::remove_session signature) and keep the preceding unmark_ring_pss
logic as-is so the task uses the same coordinator abstraction as the rest of the
file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 16eedcb5-05eb-4bb6-8a9e-bfb63dc29d52

📥 Commits

Reviewing files that changed from the base of the PR and between e70cf78 and 402563e.

📒 Files selected for processing (2)
  • bin/orbis-node/src/constants.rs
  • bin/orbis-node/src/dkg/coordinator/phases.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint
  • GitHub Check: build
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-04-28T14:49:17.416Z
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 142
File: bin/orbis-node/src/dkg/helpers.rs:126-145
Timestamp: 2026-04-28T14:49:17.416Z
Learning: In `bin/orbis-node/src/dkg/helpers.rs`, `load_reshare_ring_payload` intentionally prefers the local `RingIndex` bulletin_post_id over the wire-provided one for old-committee nodes. This is by design: local `RingIndex` and `RingShareBundle` are part of the validity boundary. An old-committee member with a stale or missing local post ID is considered unhealthy; falling back to the wire-supplied post ID would mask stale-state drift rather than surface it. Pure receivers (nodes not in the old committee, with no local index entry) are the only case where the wire `bulletin_post_id` is used as a fallback, which is explicitly allowed.

Applied to files:

  • bin/orbis-node/src/constants.rs
  • bin/orbis-node/src/dkg/coordinator/phases.rs
📚 Learning: 2026-04-29T16:47:26.051Z
Learnt from: JesseAbram
Repo: sourcenetwork/orbis-rs PR: 147
File: bin/orbis-node/src/store_secret/service.rs:0-0
Timestamp: 2026-04-29T16:47:26.051Z
Learning: In `bin/orbis-node/src/store_secret/service.rs`, the pre-store `verify_encryption` check (which previously passed ring key, effective key, and policy metadata into `validate_encrypted_document`) was intentionally removed as part of PR `#147` (closes issue `#139`). The rationale: running this verification before storing requires exposing ring key and policy metadata to the validation function, which leaks privacy. Since a caller could always store cryptographically junk data regardless of this check, removing it does not weaken the protocol's security invariants — it only improves privacy. Proof field validation (shared_point/challenge/response) is deferred to downstream consumers.

Applied to files:

  • bin/orbis-node/src/dkg/coordinator/phases.rs
🔇 Additional comments (1)
bin/orbis-node/src/constants.rs (1)

287-300: These reshare timing constants fit the new flow well.

The retry window and bulletin-confirmation window are documented clearly, and centralizing them here makes the new Phase 4 behavior easier to tune.

Comment thread bin/orbis-node/src/dkg/coordinator/phases.rs
@JesseAbram JesseAbram merged commit aa42cbe into develop Apr 30, 2026
3 checks passed
