[Stack 21/27] Fix K-means k divergence: preserve vote-encounter row order#2453
- Plan: replaced investigation section with resolved findings, updated checklist (K-inv DONE), added PR #2453 to cross-reference table
- Journal: added session 12 entry with investigation methodology, root cause (natsorted row ordering), fix, and cold-start blob results

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
This PR fixes cold-start K-means k divergence between Python and Clojure by aligning the participant (row) ordering of the rating matrix with Clojure’s insertion/vote-encounter order, ensuring downstream clustering initialization matches.
Changes:
- Preserve participant row encounter order in `Conversation.update_votes()` and when filtering moderated participants in `_apply_moderation()`.
- Update ordering-related unit tests to expect encounter-ordered participant rows while keeping comment columns natsorted.
- Update Clojure-regression clustering test behavior (remove broad xfail; xfail incremental blobs only) and re-record the vw cold-start blob/goldens; add investigation documentation.
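The order-preserving moderation filter described above can be sketched as follows. This is a minimal illustration, not the code in conversation.py: it assumes the rating matrix is a pandas DataFrame indexed by participant ID, and the helper name `filter_rows_preserving_order` is hypothetical.

```python
import pandas as pd


def filter_rows_preserving_order(raw_rating_mat: pd.DataFrame, moderated_pids: set) -> pd.DataFrame:
    """Drop moderated participants while keeping surviving rows in their existing order.

    Hypothetical helper: a list comprehension over the current index keeps the
    vote-encounter order, whereas natsorted(kept) would reorder rows by PID.
    """
    kept = [pid for pid in raw_rating_mat.index if pid not in moderated_pids]
    return raw_rating_mat.loc[kept]


# Rows are in vote-encounter order (PID 7 voted first), not PID order.
raw = pd.DataFrame(
    {"c1": [1.0, -1.0, 0.0, 1.0], "c2": [0.0, 1.0, 1.0, -1.0]},
    index=[7, 3, 12, 1],
)
filtered = filter_rows_preserving_order(raw, moderated_pids={3})
print(list(filtered.index))  # [7, 12, 1]; a PID sort would give [1, 7, 12]
```

`DataFrame.loc` with a list label indexer returns rows in the order of that list, so the surviving rows keep their encounter order.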
Reviewed changes
Copilot reviewed 7 out of 9 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| delphi/polismath/conversation/conversation.py | Preserves participant row order based on the vote stream; keeps columns natsorted; moderation row filtering is now order-preserving. |
| delphi/tests/test_conversation.py | Adjusts ordering expectations/tests for participant encounter order (columns still natsorted). |
| delphi/tests/test_legacy_clojure_regression.py | Removes the previous xfail and conditionally xfails the clustering comparison for incremental blobs. |
| delphi/real_data/r6vbnhffkxbd7ifmfbdrd-vw/r6vbnhffkxbd7ifmfbdrd_math_blob_cold_start.json | Re-recorded vw cold-start blob/golden outputs (including ordering-sensitive downstream values). |
| delphi/docs/PLAN_DISCREPANCY_FIXES.md | Marks the K-divergence row-order fix as done and summarizes results. |
| delphi/docs/HANDOFF_K_DIVERGENCE_INVESTIGATION.md | Adds an investigation write-up documenting root cause and resolution. |
| delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md | Journals the investigation and fix details/results. |
```python
# Row order: preserve first-appearance order from votes.
#
# Clojure builds the rating matrix incrementally — each new participant
# gets a row appended in the order they first appear in the vote stream
# (conversation.clj, named_matrix.clj: NamedMatrix preserves insertion
# order via IndexHash backed by java.util.Vector). The base-cluster IDs
# are assigned by map-indexed on this row order, so the order directly
# determines group-level k-means initialization via first-k-distinct.
#
# Using natsort (PID-numeric order) instead would change the k-means
# seed points and produce different silhouette scores / different k.
# See delphi/docs/HANDOFF_K_DIVERGENCE_INVESTIGATION.md for the full
# analysis showing this is the root cause of k divergence on vw.
new_rows_ordered = []
for pid, _, _ in vote_updates:
    if pid in new_rows and pid not in existing_rows_set:
        existing_rows_set.add(pid)
        new_rows_ordered.append(pid)
all_rows = list(existing_rows) + new_rows_ordered
```
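To see the effect of the loop above in isolation, here is a self-contained sketch. The function name `append_rows_in_encounter_order` and the `(pid, tid, vote)` tuple layout are assumptions for illustration. Unlike the hunk, it treats any PID not already present as new instead of consulting a precomputed `new_rows` set, which yields the same order when `new_rows` is exactly the set of unseen PIDs.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple

Vote = Tuple[Hashable, Hashable, float]  # assumed layout: (pid, tid, vote)


def append_rows_in_encounter_order(
    existing_rows: Sequence[Hashable], vote_updates: Iterable[Vote]
) -> List[Hashable]:
    """Keep existing rows first, then append unseen PIDs in first-appearance order."""
    seen = set(existing_rows)
    new_rows_ordered: List[Hashable] = []
    for pid, _tid, _vote in vote_updates:
        if pid not in seen:
            seen.add(pid)
            new_rows_ordered.append(pid)
    return list(existing_rows) + new_rows_ordered


existing = [5, 2]
votes = [(10, 0, 1.0), (2, 1, -1.0), (3, 0, 1.0), (10, 2, 1.0), (1, 1, 0.0)]
rows = append_rows_in_encounter_order(existing, votes)
print(rows)  # [5, 2, 10, 3, 1]

# Sorting the new PIDs numerically (what natsorted does for plain ints) instead:
pid_sorted = list(existing) + sorted({p for p, _, _ in votes} - set(existing))
print(pid_sorted)  # [5, 2, 1, 3, 10]
```

The two results contain the same PIDs; only the row positions differ, which is exactly the difference that later feeds first-k-distinct.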
```python
# Incremental blobs are progressive snapshots — in-conv sets differ
# from single-shot computation, so clustering comparison is not valid.
if blob_type == 'incremental':
    pytest.xfail("Incremental blobs have different in-conv from single-shot")
```
```diff
 ], ids=lambda test_desc, *args: test_desc if isinstance(test_desc, str) else str(test_desc))
-def test_natural_sorting_homogeneous_types(self, test_desc, ptpt_ids, comment_ids, expected_ptpt_types, expected_ptpts_sorted, expected_comment_types, expected_comments_sorted):
-    """Test natural sorting with homogeneous ID types (all same type).
+def test_natural_sorting_homogeneous_types(self, test_desc, ptpt_ids, comment_ids, expected_ptpt_types, expected_ptpts_ordered, expected_comment_types, expected_comments_sorted):
```
Delphi Coverage Report
Pull request overview
Copilot reviewed 7 out of 9 changed files in this pull request and generated 1 comment.
```diff
 ], ids=lambda test_desc, *args: test_desc if isinstance(test_desc, str) else str(test_desc))
-def test_natural_sorting_homogeneous_types(self, test_desc, ptpt_ids, comment_ids, expected_ptpt_types, expected_ptpts_sorted, expected_comment_types, expected_comments_sorted):
-    """Test natural sorting with homogeneous ID types (all same type).
+def test_natural_sorting_homogeneous_types(self, test_desc, ptpt_ids, comment_ids, expected_ptpt_types, expected_ptpts_ordered, expected_comment_types, expected_comments_sorted):
```
This test now validates encounter-order preservation rather than “natural sorting”, but the function name still refers to natural sorting. Renaming the test to reflect the new behavior would make failures easier to interpret.
```diff
-def test_natural_sorting_homogeneous_types(self, test_desc, ptpt_ids, comment_ids, expected_ptpt_types, expected_ptpts_ordered, expected_comment_types, expected_comments_sorted):
+def test_homogeneous_types_encounter_order_and_natural_sorting(self, test_desc, ptpt_ids, comment_ids, expected_ptpt_types, expected_ptpts_ordered, expected_comment_types, expected_comments_sorted):
```
…nt rows

Root cause: Python's natsorted() sorted rating-matrix rows by PID, while Clojure's NamedMatrix preserves vote-encounter order (insertion order via java.util.Vector). The different row ordering cascades through base-cluster ID assignment into group-level k-means first-k-distinct initialization, producing different local optima and different silhouette landscapes.

On vw: PID-sorted order → k=4 (sil=0.508); encounter order → k=2 (sil=0.487). The Clojure blob has k=2. After the fix, Python also picks k=2.

Changes:
- update_votes(): track first-appearance order from vote_updates and append new PIDs in encounter order instead of natsorted order
- _apply_moderation(): preserve raw_rating_mat row order with a list comprehension instead of natsorted
- Column (comment ID) ordering remains natsorted: permuting columns leaves PCA eigenvalues and projected coordinates unchanged (eigenvector components are simply permuted along with the columns)

Results on cold-start blobs:
- vw: k=2 exact match (was k=4), sizes [50,17] exact
- biodiversity: k=2 exact match, sizes [81,19] exact
- bg2018: k=2 match, sizes close ([52,48] vs [51,49])
- FLI: k=3 vs k=2; inherent PCA divergence (94.5% NaN sparsity, silhouette gap 0.001), not fixable without replicating power iteration

Also: re-recorded the vw cold-start blob and golden snapshots, updated ordering tests, removed the group_clustering xfail, and added an investigation script.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
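The claim that column order is harmless, while row order only reorders the projections, can be checked numerically. This is a minimal sketch assuming a dense, complete matrix and a plain eigendecomposition via `numpy.linalg.eigh`, not the pipeline's own PCA (which has to handle missing votes); the helpers `pca_2d` and `align_signs` are hypothetical.

```python
import numpy as np


def pca_2d(M: np.ndarray):
    """Top-2 PCA via eigendecomposition of the sample covariance (dense data only)."""
    centered = M - M.mean(axis=0)
    cov = centered.T @ centered / (M.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:2]      # indices of the two largest
    return vals[top], vecs[:, top], centered @ vecs[:, top]


def align_signs(reference: np.ndarray, other: np.ndarray) -> np.ndarray:
    """Eigenvectors are defined only up to sign; flip columns of `other` to match `reference`."""
    signs = np.sign(np.sum(reference * other, axis=0))
    return other * signs


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))
vals, vecs, proj = pca_2d(X)

# Permuting columns (comment IDs): same eigenvalues, same projected coordinates.
col_perm = np.array([3, 0, 5, 1, 4, 2])
vals_c, vecs_c, proj_c = pca_2d(X[:, col_perm])
print(np.allclose(vals, vals_c), np.allclose(proj, align_signs(proj, proj_c)))

# Permuting rows (participants): same components, projections merely reordered.
row_perm = rng.permutation(X.shape[0])
vals_r, vecs_r, proj_r = pca_2d(X[row_perm])
print(np.allclose(vals, vals_r), np.allclose(proj[row_perm], align_signs(proj[row_perm], proj_r)))
```

In this idealized setting PCA agrees under both permutations; what row order changes is only the sequence in which downstream steps, such as first-k-distinct, encounter the projected points.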
Superseded by the spr-managed PR stack. See the new stack starting at #2508.
Summary
`natsorted()` (PID-numeric order), while Clojure's NamedMatrix preserves insertion order; the different row ordering cascades into different first-k-distinct initialization seeds for group-level k-means.

Investigation findings
The divergence chain: rating_mat row order → PCA projection order → base-cluster ID assignment → group k-means first-k-distinct init → different local optima → different silhouette landscape → different k.
PCA components are identical (cosine similarity = 1.0), and both the silhouette implementation and the k-means algorithm match; only the ORDER of the data feeding first-k-distinct differed.
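A toy illustration of that last point: with identical data, changing only the row order changes the first-k-distinct seeds, and plain Lloyd iterations then settle in different local optima. This is a sketch under stated assumptions, not the Polis code. It assumes first-k-distinct means "take the first k distinct rows, in row order, as initial centers," uses unweighted 1-D points instead of weighted base clusters in PCA space, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def first_k_distinct(points: Sequence[Point], k: int) -> List[Point]:
    """Assumed init rule: the first k distinct points, scanned in row order."""
    seeds: List[Point] = []
    for p in points:
        if p not in seeds:
            seeds.append(p)
            if len(seeds) == k:
                break
    return seeds


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Plain (unweighted) Lloyd iterations seeded by first_k_distinct; returns a label per row."""
    centers = first_k_distinct(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def partition(points: Sequence[Point], labels: List[int]) -> Set[FrozenSet[Point]]:
    """Label-independent view of a clustering: the set of member sets."""
    groups: Dict[int, Set[Point]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, set()).add(p)
    return {frozenset(g) for g in groups.values()}


# Same five points, two row orders.
rows_a = [(-10.0,), (0.0,), (-9.0,), (9.0,), (10.0,)]
rows_b = [(10.0,), (0.0,), (-10.0,), (-9.0,), (9.0,)]
assert set(rows_a) == set(rows_b)

print(first_k_distinct(rows_a, 2), partition(rows_a, kmeans(rows_a, 2)))
print(first_k_distinct(rows_b, 2), partition(rows_b, kmeans(rows_b, 2)))
```

Order A seeds at -10 and 0 and converges to {-10, -9} / {0, 9, 10}; order B seeds at 10 and 0 and converges to {9, 10} / {-10, -9, 0}. Both are fixed points of Lloyd's algorithm, so the seeds alone decide the outcome. In the real pipeline, the PR reports that such different optima produced different silhouette scores and therefore a different k.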
Changes
- `conversation.py:update_votes()` preserves vote-encounter order for participant rows instead of `natsorted()`
- `conversation.py:_apply_moderation()` preserves row order with a list comprehension
- Column (comment ID) ordering stays `natsorted`, which doesn't affect clustering
- Removed the `test_group_clustering` xfail
- Added `scripts/investigate_k_divergence.py` diagnostic tool

Cold-start blob results
Test plan
🤖 Generated with Claude Code