[Clj parity PR 0] Per-discrepancy test infrastructure #2401
jucor wants to merge 47 commits into compdemocracy:jc/kmeans_analysis_docs from
Conversation
Force-pushed 5bf8859 to 4056619
Force-pushed f9aaa5a to d2ddb21
Pull request overview
This PR advances the Python↔Clojure parity effort by expanding the regression/testing infrastructure, aligning clustering data structures toward Clojure’s two-level hierarchy, and adding supporting CLI tooling and documentation for comparison/analysis workflows.
Changes:
- Update clustering and serialization to support hierarchical clustering outputs (`base-clusters` folded + `group-clusters` referencing base cluster IDs) and adjust downstream computations/tests accordingly.
- Improve test execution ergonomics: dataset filtering (`--datasets`), xdist loadgroup grouping, and updated smoke/regression tests.
- Add/extend parity tooling & docs: Clojure comparison CLI, PCA benchmark script, and extensive discrepancy analysis/journal documentation.
Reviewed changes
Copilot reviewed 51 out of 55 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| delphi/tests/test_repness_smoke.py | Switch dataset parametrization helper and use unfolded clusters for repness calls. |
| delphi/tests/test_regression.py | Enable PCA sign-flip tolerance in comparer + add unit tests for comparer behavior. |
| delphi/tests/test_pipeline_integrity.py | Use dataset param helper for grouped parallel runs. |
| delphi/tests/test_pca_unit.py | Remove legacy PCA tests and add NaN/edge-case coverage around mean-fill behavior. |
| delphi/tests/test_pca_smoke.py | Switch to dataset param helper for grouped parallel runs. |
| delphi/tests/test_pca_edge_cases.py | Simplify to focus on pca_project_dataframe robustness. |
| delphi/tests/test_math_pipeline_runs_e2e.py | Update mock dataset path. |
| delphi/tests/test_legacy_repness_comparison.py | Switch dataset parametrization helper. |
| delphi/tests/test_legacy_clojure_output.py | Remove legacy script-style test file. |
| delphi/tests/test_golden_data.py | Remove redundant golden data comparison test. |
| delphi/tests/test_edge_cases.py | Update repness expectation for insufficient-data case. |
| delphi/tests/test_datasets.py | Update DatasetInfo tests for cold-start blob support. |
| delphi/tests/test_conversation_smoke.py | Switch dataset parametrization helper. |
| delphi/tests/test_conversation.py | Skip recompute in tests that only validate sorting/moderation; adjust synthetic votes to meet thresholds; filterwarnings for PCA warnings. |
| delphi/tests/test_clusters.py | Update expectations for init_clusters when k > n_points. |
| delphi/tests/legacy_compare_with_clojure.py | Remove legacy comparison script. |
| delphi/tests/conftest.py | Add --datasets option, xdist grouping helpers, and collection filtering. |
| delphi/tests/README.md | Document dataset selection + parallel execution options. |
| delphi/scripts/regression_recorder.py | Add --include-local and pass through to dataset discovery. |
| delphi/scripts/regression_comparer.py | Add CLI options (ignore PCA sign flip, include-local, tolerance knobs) and pass through to comparer. |
| delphi/scripts/clojure_comparer.py | New CLI to compare Python outputs against Clojure math blobs. |
| delphi/pyproject.toml | Add Click dependency; set pytest addopts to --dist=loadgroup; add marker documentation. |
| delphi/polismath/regression/datasets.py | Add cold-start blob handling and prefer-cold-start selection in get_dataset_files. |
| delphi/polismath/regression/__init__.py | Re-export Clojure comparer utilities. |
| delphi/polismath/pca_kmeans_rep/clusters.py | Switch init strategy to “first k distinct points” and add sklearn-based kmeans/silhouette helpers. |
| delphi/polismath/conversation/conversation.py | Implement two-level clustering flow; add unfolding helpers and fold/unfold base clusters; adjust repness/participant info/group votes to use unfolded groups. |
| delphi/polismath/benchmarks/bench_pca.py | New PCA benchmarking/profiling helper. |
| delphi/docs/SUBGROUP_CLUSTERING_THIRD_LEVEL.md | Document (not implement) Clojure’s subgroup clustering level. |
| delphi/docs/CLOJURE_TWO_LEVEL_CLUSTERING.md | Document Clojure clustering hierarchy and formats. |
| delphi/docs/CLOJURE_COMPARISON.md | Document Clojure comparison testing and known differences. |
| delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md | Add parity fixes journal and baseline tracking. |
| delphi/CLAUDE.md | Update documentation warning and add testing/regression guidance. |
| deep-analysis-for-julien/09-fix-plan.md | Add fix plan for parity work. |
| deep-analysis-for-julien/08-dead-code.md | Add dead-code analysis. |
| deep-analysis-for-julien/07-discrepancies.md | Add discrepancy catalog. |
| deep-analysis-for-julien/06-comment-routing.md | Add comment routing analysis. |
| deep-analysis-for-julien/05-participant-filtering.md | Add participant filtering / priority analysis. |
| deep-analysis-for-julien/04-repness-analysis.md | Add repness analysis. |
| deep-analysis-for-julien/03-clustering-analysis.md | Add clustering analysis. |
| deep-analysis-for-julien/02-pca-analysis.md | Add PCA analysis. |
| deep-analysis-for-julien/01-overview-and-architecture.md | Add overall architecture overview. |
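The `clusters.py` change in the table above switches initialization to "first k distinct points" to match Clojure. A minimal sketch of that strategy (function name and shapes are illustrative, not the actual API):

```python
import numpy as np

def first_k_distinct_init(points: np.ndarray, k: int) -> np.ndarray:
    """Pick the first k distinct rows of `points` as initial centers.

    Mirrors the deterministic "first k distinct points" initialization
    described for Clojure; if fewer than k distinct points exist, the
    effective k is reduced to the number of distinct points.
    """
    centers: list[np.ndarray] = []
    for row in points:
        # Only keep rows not already chosen as a center
        if not any(np.array_equal(row, c) for c in centers):
            centers.append(row)
        if len(centers) == k:
            break
    return np.array(centers)

pts = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
centers = first_k_distinct_init(pts, 2)  # deterministic: [[0,0],[1,1]]
```

Unlike k-means++, this is fully deterministic with no seed, which is what makes cross-implementation comparison tractable.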
Comments suppressed due to low confidence (1)
delphi/scripts/clojure_comparer.py:268
- This Click CLI attempts to signal failures by `return 1`, but Click commands run in standalone mode by default and won't use the callback's return value as the process exit code. As written, the script will typically exit with status 0 even when comparisons fail. Use `raise SystemExit(1)`, `ctx.exit(1)`, or follow the pattern used in `scripts/regression_comparer.py` (call `exit(1)`/`exit(0)` explicitly), or call `main(standalone_mode=False)` and `sys.exit` the returned code.
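A minimal sketch of the fix described above (command name, option, and `run_comparison` are hypothetical stand-ins, not the script's real API):

```python
import click

def run_comparison(dataset: str) -> bool:
    """Hypothetical stand-in for the real comparison logic."""
    return False  # pretend the comparison failed

@click.command()
@click.option("--dataset", default="example")
def cli(dataset: str) -> None:
    # In standalone mode Click discards the callback's return value,
    # so raise SystemExit to make the process exit code reflect failure.
    if not run_comparison(dataset):
        raise SystemExit(1)

if __name__ == "__main__":
    cli()
```

`raise SystemExit(1)` propagates through Click in both standalone and non-standalone modes, so the shell sees a non-zero status on comparison failure.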
delphi/tests/conftest.py (outdated)

```python
def _extract_dataset_from_test(item) -> str | None:
    """Extract dataset name from test item's parameter, if present."""
    # Check for parametrized marker with dataset/dataset_name parameter
    for marker in item.iter_markers("parametrize"):
        argnames = marker.args[0] if marker.args else ""
        if "dataset" in argnames:
            # Get the parameter value from callspec
            if hasattr(item, 'callspec'):
                for param_name in ['dataset', 'dataset_name']:
                    if param_name in item.callspec.params:
                        return item.callspec.params[param_name]
    return None
```
`--datasets` filtering won't apply to tests parametrized with `dataset_name` only. `_extract_dataset_from_test` only considers parametrizations whose argnames contain the substring `"dataset"`, so `@pytest.mark.parametrize("dataset_name", ...)` will return `None` and those tests won't be deselected. Consider checking for both `dataset` and `dataset_name` in argnames, or simpler: if `item.callspec` exists, read `item.callspec.params.get('dataset')` / `.get('dataset_name')` directly without relying on the parametrize marker.
```python
import random
from copy import deepcopy
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, pairwise_distances
```
`pairwise_distances` is imported from `sklearn.metrics` but never used in this module. Please remove it to avoid unused-import lint noise and reduce import overhead.

Suggested change:

```python
from sklearn.metrics import silhouette_score
```
```python
from polismath.regression.clojure_comparer import ClojureComparer, load_clojure_math_blob
from polismath.regression import list_available_datasets, get_dataset_files
from polismath.conversation import Conversation
from tests.common_utils import load_votes
```

This script imports `load_votes` from `tests.common_utils`. That makes the CLI depend on test-only code and will break if delphi is installed/distributed without the `tests/` tree. Prefer moving the CSV-loading helper into a non-test module (e.g., `polismath.regression` or `polismath.benchmarks.benchmark_utils`) or duplicating a small loader locally in this script.
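A minimal local loader of the kind the comment suggests duplicating. The column names (`voter-id`, `comment-id`, `vote`) are hypothetical: the real CSV schema lives in `tests.common_utils` and may differ.

```python
import pandas as pd

def load_votes(csv_path: str) -> pd.DataFrame:
    """Minimal vote-matrix loader (hypothetical column names).

    Assumes a long-format CSV with voter, comment, and vote columns,
    and pivots it into a participants x comments matrix with NaN for
    missing votes.
    """
    df = pd.read_csv(csv_path)
    return df.pivot_table(index="voter-id", columns="comment-id", values="vote")
```

Keeping a copy this small in the script avoids the packaging dependency on `tests/` at the cost of one duplicated function.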
```python
click.echo(f"✗ {dataset}: FAIL")
click.echo(f"\nLikely cause: Initialization algorithm difference")
click.echo(f"  Python: K-means++ (seed 42)")
click.echo(f"  Clojure: First k distinct points")
click.echo(f"{'='*60}")
```

The failure explanation printed here says Python uses "K-means++ (seed 42)", but the current clustering implementation uses first-k-distinct initialization (matching Clojure) via `use_first_k_init=True` in `kmeans_sklearn()`. This message should be updated so the CLI output doesn't mislead debugging.
```python
# Legacy field for backward compatibility (remove after full migration)
result['group_clusters'] = self.group_clusters
```

`to_dict()` now serializes `group-clusters` (and the supposedly "legacy" `group_clusters`) with members as base-cluster IDs rather than participant IDs. There is existing Python code that derives participant→group assignments by iterating `group-clusters[*].members` and treating them as participant IDs (e.g. `umap_narrative/polismath_commentgraph/utils/group_data.py`). To avoid breaking downstream consumers, consider either (a) keeping `group_clusters` as the unfolded (participant-id) representation, or (b) adding an explicit unfolded field (e.g. `group-clusters-unfolded`) and updating any in-repo consumers accordingly.

Suggested change:

```python
# Legacy field for backward compatibility:
# expose group_clusters with members as participant IDs (unfolded).
legacy_group_clusters = deepcopy(self.group_clusters) if self.group_clusters else None
if legacy_group_clusters and self.base_clusters:
    # Build mapping: base-cluster ID -> list of participant IDs
    base_to_participants = {}
    for pid, base_cid in self.base_clusters.items():
        base_to_participants.setdefault(base_cid, []).append(pid)
    # Replace group members (base-cluster IDs) with participant IDs
    for group in legacy_group_clusters:
        members = group.get('members', []) or []
        unfolded_members: List[Any] = []
        for base_cid in members:
            unfolded_members.extend(base_to_participants.get(base_cid, []))
        group['members'] = unfolded_members
result['group_clusters'] = legacy_group_clusters
```
Force-pushed a0b0dab to 97087a2
…arker
- Remove substring-based `_extract_dataset_from_test` and `pytest_collection_modifyitems` filtering
- Remove `get_available_dataset_params()` (import-time evaluation)
- Add `@pytest.mark.use_discovered_datasets` marker for dynamic parametrization that respects `--datasets` and `--include-local`
- Hardcoded `@pytest.mark.parametrize` tests are no longer affected by the `--datasets` CLI flag
- Unify parameter name to `dataset_name` across all test files
- Add `test_dataset_selection.py` with unit and pytester integration tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cluster centers are derived from PCA projections, so they inherit the sign ambiguity. This fix ensures sign flips are detected and corrected for `.center` paths in addition to `.pca.comps` and `.proj.` paths. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, sign flips were detected per-projection-vector, which fails when only some components are flipped (e.g., PC1 unchanged, PC2 flipped). Now detects flips at the component level (.pca.comps[N]) and stores them, then applies per-dimension correction to projections and cluster centers. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
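A sketch of per-component sign-flip detection as described in the commit above (function name and array shapes are illustrative; components are rows, projections are participants × dimensions):

```python
import numpy as np

def detect_component_flips(golden_comps: np.ndarray, current_comps: np.ndarray) -> np.ndarray:
    """Return a +1/-1 factor per principal component.

    A component counts as flipped when it matches the golden component
    better after negation (the usual sign ambiguity of PCA eigenvectors).
    Detection is per component, so PC1 unchanged + PC2 flipped is handled.
    """
    flips = np.ones(len(golden_comps))
    for i, (g, c) in enumerate(zip(golden_comps, current_comps)):
        if np.linalg.norm(g + c) < np.linalg.norm(g - c):
            flips[i] = -1.0
    return flips

golden = np.array([[1.0, 0.0], [0.0, 1.0]])
current = np.array([[1.0, 0.0], [0.0, -1.0]])  # PC2 flipped
flips = detect_component_flips(golden, current)

# Per-dimension correction applies to projections and cluster centers alike:
proj = np.array([[0.5, 2.0], [-1.0, 3.0]])
corrected = proj * flips  # broadcasts the flip factors over rows
```

Storing the per-component factors once and reusing them for `.proj.` paths and `.center` paths keeps the two corrections consistent.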
… tests

TLDR: it works! Differences were just numerical errors, artificially looking huge on small values. Looking at the whole cloud of projections confirmed that the post-sklearn results match the pre-sklearn results perfectly well.

Problem:
--------
The regression comparer was failing on datasets like FLI and pakistan with thousands of "differences" showing relative errors up to 874%. Investigation revealed these were false positives: the high relative errors occurred on near-zero values (e.g., golden=4.54e-06 vs current=4.42e-05) where even tiny absolute differences (3e-04) produce huge relative errors.

Diagnosis:
----------
Comparing sklearn SVD-based PCA against power iteration golden snapshots:
- The projection point clouds are visually identical (see scatter plots)
- Important values (Q70-Q100 percentile) match within 0.2%
- Only near-zero values (Q0-Q7 percentile, ~0.0x median) show large rel errors
- These near-zero values represent participants at the origin who don't affect visualization or clustering

The element-wise (abs_tol, rel_tol) approach fundamentally cannot handle this case: it either fails on small values or is too loose for large values.

Solution:
---------
Added projection comparison metrics that measure what actually matters:

| Metric                 | Threshold | What it measures                  |
|------------------------|-----------|-----------------------------------|
| Max \|error\| / range  | < 1%      | Worst displacement as % of axis   |
| Mean \|error\| / range | < 0.1%    | Average displacement as % of axis |
| R² (all coordinates)   | > 0.9999  | Variance explained (99.99%)       |
| R² (per dimension)     | > 0.999   | Per-PC fit quality (99.9%)        |
| Procrustes disparity   | < 1e-4    | Shape similarity after alignment  |

Results for FLI dataset:
- Max |error| / range: 0.0617% (was flagging 28% rel error on Q1 values)
- R²: 0.9999992
- Procrustes: 8.2e-07

All 7 local datasets now pass regression tests.
Also added:
- Quantile context in error reports (computed from ALL values, not just failures)
- Explanation when element-wise diffs exist but metrics confirm a match
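The range-scaled error and R² metrics from the commit above can be sketched as follows (function name and return keys are illustrative, not the comparer's real API):

```python
import numpy as np

def projection_metrics(golden: np.ndarray, current: np.ndarray) -> dict:
    """Cloud-level comparison metrics for PCA projections.

    Relative error on near-zero coordinates is meaningless, so instead
    measure displacement as a fraction of each axis range, plus an
    overall R^2 across all coordinates.
    """
    err = np.abs(current - golden)
    data_range = golden.max(axis=0) - golden.min(axis=0)
    data_range = np.where(data_range == 0, 1.0, data_range)  # guard div by zero
    scaled = err / data_range
    ss_res = ((current - golden) ** 2).sum()
    ss_tot = ((golden - golden.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return {
        "max_err_over_range": float(scaled.max()),
        "mean_err_over_range": float(scaled.mean()),
        "r2": float(r2),
    }
```

A 3e-04 absolute error on a coordinate near zero then contributes only ~0.01% of a typical axis range, instead of an 874% relative error.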
- Exclude `.pca.center` from PCA sign-flip handling (not sign-ambiguous)
- Use AND logic for `overall_match` (don't let projection metrics override stage failures)
- Guard against division by zero when projection `data_range` is 0
- Fix return type annotation on `_log_projection_metrics`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Note: we will update the golden records in a later commit.
I took care to keep the "sparsity-aware" scaling, computed in a vectorized way.
It's not needed anymore: the PCA is done by sklearn, and the edge cases are already handled by the wrapped_pca caller, i.e. pca_project_dataframe. Less code, less complexity, less maintenance.
Re-recorded golden snapshots using the new sklearn SVD-based PCA. These snapshots now serve as the baseline for regression tests.
- test_basic_outputs: xfail for D9/D5/D7 (wrong z-score thresholds and repness formulas produce empty comment_repness)
- test_group_clustering: xfail for D2/D3 (wrong participant threshold and missing k-smoother produce different cluster counts)
- test_comment_priorities: update xfail reason to reference D12
- Re-record golden snapshots after two-level clustering changes

Test baseline: 11 passed, 6 xfailed, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot review fixes:
- clusters.py: adjust k down when there are fewer distinct points than requested
- conversation.py: group_clusters members are base-cluster IDs (not participants)
- pca.py: fallback comps shape uses min(n_comps, n_cols), not min(n_comps, 2)

Bug fix:
- conversation.py: convert pandas Index to list in repness (JSON serialization)

Test fixes:
- test_clusters: init_clusters returns empty members
- test_conversation: test_recompute uses 10 comments to meet vote threshold
- test_datasets: add missing has_comments arg and cold_start blob fixture
- test_edge_cases: update group_repness expectation for no-clusters case
- test_repness_smoke: mark test_repness_structure as xfail (pre-existing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…am consumers

Group clusters store base-cluster IDs in 'members' (matching Clojure's two-level clustering architecture), but downstream functions (conv_repness, participant_stats, group_votes) need participant IDs to join against the vote matrix.

Add _unfolded_group_clusters() helper (equivalent to Clojure's clusters/group-members) and use it in all 5 call sites:
- _compute_repness
- _compute_participant_info
- _compute_group_votes
- to_dict group-votes
- to_dynamo_dict group_votes

Also re-record golden snapshots and remove xfail from test_repness_structure (now passes with correct unfolding).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test_group_clustering: unfold Python clusters via _unfolded_group_clusters() (both sides now use two-level clustering), tighten thresholds to 0.99 Jaccard / 0.01 distribution tolerance, require overall_match
- test_comment_priorities: require exact match (1e-6 tolerance) for all comment IDs instead of 70% at 20% tolerance
- clojure_comparer: fix docstring to reflect that Python also uses two-level clustering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
to_dict(), get_full_data(), and to_dynamo_dict() were writing self.group_clusters directly, whose 'members' contain base-cluster IDs (integers 0..N). Downstream consumers (group_data.py, Clojure compat, client apps) expect participant IDs.

The internal helper _unfolded_group_clusters() already existed and was used for repness/participant_info/group_votes computation, but the serialization sites were missed. Fix all of them to unfold before serializing:
- to_dict(): group-clusters, group_clusters, base-clusters
- get_full_data(): group_clusters
- to_dynamo_dict(): base_clusters / group_clusters

Add test_serialization_unfolding.py (TDD: 6 tests fail on the bug, 8/8 pass after the fix) using real recompute() pipeline output.

Re-record golden snapshots to reflect the corrected output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comprehensive 9-document analysis covering:
- Architecture and data flow comparison
- PCA implementation details (power iteration vs SVD)
- Two-level clustering with k-smoother analysis
- Representativeness metrics with formula verification
- Participant filtering and comment priority systems
- TypeScript comment routing (prioritized vs topical)
- 15 identified discrepancies rated by severity
- Dead code inventory across the Python codebase
- Prioritized fix plan to bring Python to Clojure parity

Key findings: 5 CRITICAL discrepancies (in-conv threshold, proportion test formula, repness metric, z-score thresholds, missing comment priorities), 3 HIGH, and 7 MEDIUM/LOW.

https://claude.ai/code/session_01FEPFmVHKz1eoqzvXSTmu14
- Doc 02: Correct Clojure sparsity-aware projection code to match the actual implementation (raw votes with nils skipped, not imputed). Add new discrepancy D1b documenting a subtle projection input difference.
- Doc 03: Fix the same-clustering convergence check description: it checks every pairwise distance < threshold, not the sum.
- Doc 07: Add D1b projection input discrepancy (LOW severity).

https://claude.ai/code/session_01FEPFmVHKz1eoqzvXSTmu14
- Remove AGENTS.md and CLAUDE.md/GEMINI.md symlinks
- Create delphi/CLAUDE.md as a real file with upstream content plus testing, regression, and sklearn migration sections
- Add outdated-docs warning
- Move uv-specific instructions to CLAUDE.local.md (gitignored)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comprehensive plan covering all 15 documented discrepancies (D1-D15) between the Delphi Python math pipeline and the Clojure reference implementation. Fixes are ordered by pipeline execution order with one PR per fix, stacked for reviewability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create tests/test_discrepancy_fixes.py with 30 tests across 12 classes, one per discrepancy (D2, D4, D5, D6, D7, D8, D9, D10, D11, D12, D15) plus synthetic edge cases. All discrepancy tests are xfailed — designed to fail before each fix and pass after. Parametrized by all datasets with Clojure reference blobs.

Current results: 5 passed, 2 skipped, 18 xfailed, 5 xpassed.

Create docs/JOURNAL_OF_DISCREPANCIES_FIX.md with initial baseline, test results, Clojure blob structure notes, and design decisions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
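The per-discrepancy TDD pattern described above can be sketched like this (the test name, reason text, and helper functions are hypothetical illustrations, not the repo's actual tests):

```python
import pytest

# Hypothetical helpers standing in for the real pipeline and reference blob.
def compute_repness_count() -> int:
    raise NotImplementedError  # fails until the D9 fix lands

def clojure_reference_count() -> int:
    return 7

# The test is xfailed while the discrepancy exists; once the fix lands,
# the marker is removed and the test must pass. strict=False tolerates
# datasets where the discrepancy happens not to manifest (xpass).
@pytest.mark.xfail(reason="D9: z-score thresholds differ from Clojure", strict=False)
def test_d9_repness_comment_count():
    assert compute_repness_count() == clojure_reference_count()
```

With `strict=False`, an unexpected pass is reported as xpass rather than a failure, which matches the "5 xpassed" in the baseline above.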
Instead of erroring when the math worker is running, automatically pause it during generation and resume it afterwards. This prevents the race condition (existing worker picking up fake conversation votes) without requiring manual intervention. Also give each poller container a predictable name and force-remove it on exit to prevent orphans. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move pause/unpause into per-dataset processing so each run handles it independently. Use a marker file (/tmp/polis-math-coldstart-paused) to track whether we caused the pause, and check for sibling coldstart containers before unpausing. Only the last concurrent run to finish unpauses the math worker, and only if it was paused by us (not manually by the user). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused `pairwise_distances` import from clusters.py
- Move `load_votes` import from test code to the `benchmark_utils` module so the clojure_comparer script doesn't depend on test-only code
- Fix outdated "K-means++" references — Python now uses first-k-distinct initialization matching Clojure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document the 10 xpassed tests (all strict=False):
- D2 in-conv × 2 on vw (thresholds coincide on small dataset)
- D6 two_prop_test × 1 (pseudocount diff too small)
- D9 repness_not_empty × 7 (test too weak — checks non-empty, not correct count. TODO: tighten when fixing D9)
Superseded by #2420 (moved head branch to upstream for simpler stacked PR workflow).
Summary

First PR in the discrepancy fix series described in `docs/PLAN_DISCREPANCY_FIXES.md`. Sets up TDD test infrastructure: all discrepancy tests are `xfail` now and will be un-xfailed as each fix lands in subsequent PRs.

Changes

- `tests/test_discrepancy_fixes.py` — 30 tests across 12 classes, one per discrepancy (D2, D4, D5, D6, D7, D8, D9, D10, D11, D12, D15) plus synthetic edge cases. Parametrized by all datasets with Clojure reference blobs.
- `docs/JOURNAL_OF_DISCREPANCIES_FIX.md` — Tracking document with initial baseline, test results, Clojure blob structure, and design decisions.

Test results

strict=False

Test plan

- `test_discrepancy_fixes.py` runs green (all failures are xfail)
- `test_legacy_clojure_regression.py`

🤖 Generated with Claude Code