68 changes: 43 additions & 25 deletions delphi/docs/PLAN_DISCREPANCY_FIXES.md
@@ -4,13 +4,24 @@

The Delphi Python math pipeline has 15 documented discrepancies with the Clojure reference implementation (see `deep-analysis-for-julien/07-discrepancies.md` and `deep-analysis-for-julien/09-fix-plan.md`). We need to fix them one-by-one with a TDD approach: **first extend the regression test to verify the discrepancy exists, then fix it, then verify the test passes**.

Each fix will be a separate PR to keep reviews manageable. PRs will be **stacked** (each builds on the previous), since fixes are ordered by pipeline execution order — fixing upstream affects downstream. PRs should be clearly labeled as stacked with their dependency chain, so reviewers know the order. We can use `git rebase -i` to clean up commit history before merging to main.
Each fix will be a separate PR to keep reviews manageable. PRs are **stacked** (each builds on the previous), since fixes are ordered by pipeline execution order — fixing upstream affects downstream. The stack is managed via `.claude/STACK` and the `/pr-stack` skill.

**PR naming convention**: Clojure parity fix PRs use the title prefix `[Clj parity PR N]` (e.g., `[Clj parity PR 0] Per-discrepancy test infrastructure`, `[Clj parity PR 1] Fix D2: in-conv participant threshold`).
**PR naming**: Titles use `[Stack N/M]` prefix (auto-managed by `.claude/skills/pr-stack/update-stack-titles.sh`). The descriptive part of the title should be self-explanatory.

**Prerequisite**: The current `kmeans_work` branch has changes from `edge`. These should be separated into their own PR(s) first, before we start the discrepancy fix PRs. The discrepancy fix PRs should be based on the cleaned-up branch.
### Stack ↔ Plan Cross-Reference

**Action**: Before starting any fix, analyze the diff of `kmeans_work` vs `upstream/edge` to understand what's changed. Group those changes into logical PR(s) — e.g., test infrastructure improvements, doc updates, minor bug fixes. Each should be reviewable independently. The discrepancy fix PRs (PR 1+) then stack on top of this clean base.
The full PR stack includes infrastructure PRs (Stack 1-7) followed by discrepancy fixes.
This plan's "PR N" labels map to actual GitHub PRs as follows:

| Plan label | GitHub PR | Stack | Title |
|-----------|-----------|-------|-------|
| PR 0 (infra) | #2417–#2420 | Stack 1-7 | Test cleanup, clustering, cold-start tooling, analysis docs |
| PR 1 (D2) | #2421 | Stack 8/10 | Fix D2: in-conv participant threshold + D2c vote count source |
| PR 2 (D4) | #2435 | Stack 9/10 | Fix D4: pseudocount formula |
| (perf) | #2436 | Stack 10/10 | Speed up regression tests |
Comment on lines +19 to +21
Copilot AI Mar 23, 2026


This cross-reference table hard-codes stack positions like Stack 8/10 and 10/10, but this PR is titled Stack 11/25, so the doc becomes stale as soon as the stack grows. Suggestion: avoid fixed denominators (e.g., use Stack 8 / Stack 10), or reference the canonical stack file (.claude/STACK) and keep this table focused on PR numbers ↔ GitHub PR links rather than mutable stack indices.

Suggested change
| PR 1 (D2) | #2421 | Stack 8/10 | Fix D2: in-conv participant threshold + D2c vote count source |
| PR 2 (D4) | #2435 | Stack 9/10 | Fix D4: pseudocount formula |
| (perf) | #2436 | Stack 10/10 | Speed up regression tests |
| PR 1 (D2) | #2421 | Stack 8 | Fix D2: in-conv participant threshold + D2c vote count source |
| PR 2 (D4) | #2435 | Stack 9 | Fix D4: pseudocount formula |
| (perf) | #2436 | Stack 10 | Speed up regression tests |

| PR 3 (D9) | — | — | *Next: Fix D9 z-score thresholds* |

Future fix PRs will be appended to the stack as they're created.

### Session Continuity

@@ -38,6 +49,7 @@ Because this work will span multiple Claude Code sessions, we maintain:
- **All datasets, not just biodiversity**: Every fix must pass on ALL datasets. biodiversity is just one reference among many.
- **Synthetic edge-case tests**: Every time we discover an edge case specific to one conversation, extract it into a synthetic unit test with made-up data (never real data from private datasets). These run fast and document the intent clearly.
- **E2E awareness**: GitHub Actions has Cypress E2E tests (`cypress-tests.yml`) testing UI workflows, and `python-ci.yml` running pytest regression. The Cypress tests don't test math output values directly, but `python-ci.yml` will break if clustering/repness changes. Formula-level fixes (D4, D5, D6, D7, D8, D9) are pure computation — no E2E risk. Selection logic changes (D10, D11) and priority computation (D12) could affect what the TypeScript server returns. We decide case-by-case which PRs need E2E verification.
- **Remove dead code after replacement**: When a function is replaced by a new implementation (e.g. vectorized version), the old function must be deleted and all callers updated — not left as dead code. Do this in the same PR or a follow-up, after benchmarks and tests confirm the replacement works.

### Datasets Available (sorted by size, smallest first)

@@ -399,27 +411,33 @@ By this point, we should have good test coverage from all the per-discrepancy tests.

## Discrepancy Coverage Checklist

| ID | Discrepancy | PR | Status |
|----|-------------|-----|--------|
| D1 | PCA sign flips | PR 13 | Fix (sign consistency) |
| D1b | Projection input | PR 13 | Fix with D1 |
| D2 | In-conv threshold | **PR 1** | **DONE** ✓ |
| D2b | Base-cluster sort order | **PR 1** | **DONE** ✓ |
| D2c | Vote count source (raw vs filtered matrix) | **PR 1** | **DONE** ✓ |
| D2d | In-conv monotonicity (once in, always in) | **PR 1** | **DONE** ✓ (5 guard tests, T1-T5) |
| D3 | K-smoother buffer | PR 10 | Fix |
| D4 | Pseudocount formula | **PR 2** | **DONE** ✓ |
| D5 | Proportion test | PR 4 | Fix |
| D6 | Two-proportion test | PR 5 | Fix |
| D7 | Repness metric | PR 6 | Fix (with flag for old formula) |
| D8 | Finalize cmt stats | PR 7 | Fix |
| D9 | Z-score thresholds | **PR 3** | Fix |
| D10 | Rep comment selection | PR 8 | Fix (with legacy env var) |
| D11 | Consensus selection | PR 9 | Fix (with legacy env var) |
| D12 | Comment priorities | PR 11 | Fix (implement from scratch) |
| D13 | Subgroup clustering | — | **Deferred** (unused) |
| D14 | Large conv optimization | — | **Deferred** (Python fast enough) |
| D15 | Moderation handling | PR 12 | Fix |
| ID | Discrepancy | Plan PR | GitHub PR | Status |
|----|-------------|---------|-----------|--------|
| D1 | PCA sign flips | PR 13 | — | Fix (sign consistency) |
| D1b | Projection input | PR 13 | — | Fix with D1 |
| D2 | In-conv threshold | **PR 1** | **#2421** | **DONE** ✓ |
| D2b | Base-cluster sort order | **PR 1** | **#2421** | **DONE** ✓ |
| D2c | Vote count source (raw vs filtered matrix) | **PR 1** | **#2421** | **DONE** ✓ |
| D2d | In-conv monotonicity (once in, always in) | **PR 1** | **#2421** | **DONE** ✓ (5 guard tests, T1-T5) |
| D3 | K-smoother buffer | PR 10 | — | Fix |
| D4 | Pseudocount formula | **PR 2** | **#2435** | **DONE** ✓ |
| D5 | Proportion test | PR 4 | — | Fix |
| D6 | Two-proportion test | PR 5 | — | Fix |
| D7 | Repness metric | PR 6 | — | Fix (with flag for old formula) |
| D8 | Finalize cmt stats | PR 7 | — | Fix |
| D9 | Z-score thresholds | PR 3 | — | Fix (next) |
| D10 | Rep comment selection | PR 8 | — | Fix (with legacy env var) |
| D11 | Consensus selection | PR 9 | — | Fix (with legacy env var) |
| D12 | Comment priorities | PR 11 | — | Fix (implement from scratch) |
| D13 | Subgroup clustering | — | — | **Deferred** (unused) |
| D14 | Large conv optimization | — | — | **Deferred** (Python fast enough) |
| D15 | Moderation handling | PR 12 | — | Fix |

### Non-discrepancy PRs in the stack

| GitHub PR | Stack | Description |
|-----------|-------|-------------|
| #2436 | 10/10 | Speed up regression tests (benchmark off, skip intermediate stages) |

---

2 changes: 1 addition & 1 deletion delphi/notebooks/run_analysis.py
@@ -45,7 +45,7 @@ def check_environment():

# Import polismath modules
from polismath.conversation.conversation import Conversation
from polismath.pca_kmeans_rep.repness import conv_repness, participant_stats
from polismath.pca_kmeans_rep.repness import conv_repness
from polismath.pca_kmeans_rep.corr import compute_correlation

def load_votes(votes_path):
163 changes: 75 additions & 88 deletions delphi/polismath/conversation/conversation.py
@@ -20,7 +20,7 @@
kmeans_sklearn,
calculate_silhouette_sklearn
)
from polismath.pca_kmeans_rep.repness import conv_repness, participant_stats
from polismath.pca_kmeans_rep.repness import conv_repness
from polismath.pca_kmeans_rep.corr import compute_correlation


@@ -792,114 +792,101 @@ def _compute_participant_info_optimized(self, vote_matrix: pd.DataFrame, group_c

# OPTIMIZATION 3: Precompute group vote matrices and average votes

# Precompute group vote matrices and their valid comment masks
group_vote_matrices = {}
# Precompute group average votes and valid comment masks
group_avg_votes = {}
group_valid_masks = {}

for group_id, member_indices in group_member_indices.items():
if len(member_indices) >= 3: # Only calculate for groups with enough members
# Extract the group vote matrix
group_vote_matrix = matrix_values[member_indices, :]
group_vote_matrices[group_id] = group_vote_matrix


# Calculate average votes per comment for this group
group_avg_votes[group_id] = np.mean(group_vote_matrix, axis=0)

# Precompute which comments have at least 3 votes from this group
group_valid_masks[group_id] = np.sum(group_vote_matrix != 0, axis=0) >= 3

# OPTIMIZATION 4: Use vectorized operations for participant stats
# VECTORIZED: Compute vote counts for ALL participants at once

process_start = time.time()
batch_start = time.time()

for p_idx, participant_id in enumerate(vote_matrix.index):
if p_idx >= matrix_values.shape[0]:

n_agree_all = np.sum(matrix_values > 0, axis=1) # (N,)
n_disagree_all = np.sum(matrix_values < 0, axis=1) # (N,)
n_pass_all = np.sum(matrix_values == 0, axis=1) # (N,)
n_votes_all = n_agree_all + n_disagree_all # (N,)

# Mask: participants with at least one real vote
has_votes = n_votes_all > 0 # (N,) bool

# VECTORIZED: Compute per-group correlations for ALL participants at once
# Store as {group_id: corr_array} where corr_array is (N,)
group_corr_arrays = {}

for group_id, member_indices in group_member_indices.items():
if len(member_indices) < 3 or group_id not in group_avg_votes:
# All correlations default to 0.0
group_corr_arrays[group_id] = np.zeros(participant_count)
continue

# Print progress for large participant sets
if participant_count > 100 and p_idx % 100 == 0:
now = time.time()
elapsed = now - process_start
batch_time = now - batch_start
batch_start = now
percent = (p_idx / participant_count) * 100
logger.info(f"Processed {p_idx}/{participant_count} participants ({percent:.1f}%) - " +
f"Elapsed: {elapsed:.2f}s, Batch: {batch_time:.4f}s")

# Get participant votes
participant_votes = matrix_values[p_idx, :]

# Count votes using vectorized operations
n_agree = np.sum(participant_votes > 0)
n_disagree = np.sum(participant_votes < 0)
n_pass = np.sum(participant_votes == 0)
n_votes = n_agree + n_disagree

# Skip participants with no votes
if n_votes == 0:

valid_mask = group_valid_masks[group_id]
n_valid = int(np.sum(valid_mask))

if n_valid < 3:
group_corr_arrays[group_id] = np.zeros(participant_count)
continue

# Find participant's group using precomputed mapping
participant_group = ptpt_group_map.get(participant_id)

# OPTIMIZATION 5: Efficient group correlation calculation

# Calculate agreement with each group - optimized version
group_agreements = {}

for group_id, member_indices in group_member_indices.items():
if len(member_indices) < 3:
# Skip groups with too few members
group_agreements[group_id] = 0.0
continue

if group_id not in group_avg_votes or group_id not in group_valid_masks:
group_agreements[group_id] = 0.0
continue

# Use precomputed data
g_votes = group_avg_votes[group_id]
valid_mask = group_valid_masks[group_id]

if np.sum(valid_mask) >= 3: # At least 3 valid comments
# Extract only valid comment votes
p_votes = participant_votes[valid_mask]
g_votes_valid = g_votes[valid_mask]

# Fast correlation calculation
p_std = np.std(p_votes)
g_std = np.std(g_votes_valid)

if p_std > 0 and g_std > 0:
# Use numpy's built-in correlation (faster and more numerically stable)
correlation = np.corrcoef(p_votes, g_votes_valid)[0, 1]

if not np.isnan(correlation):
group_agreements[group_id] = correlation
else:
group_agreements[group_id] = 0.0
else:
group_agreements[group_id] = 0.0
else:
group_agreements[group_id] = 0.0

# Store participant stats

# P: all participants' votes on valid comments — (N, n_valid)
P = matrix_values[:, valid_mask]
# g: group average on valid comments — (n_valid,)
g = group_avg_votes[group_id][valid_mask]

p_mean = P.mean(axis=1) # (N,)
g_mean = g.mean() # scalar
p_std = P.std(axis=1) # (N,)
g_std = g.std() # scalar
Comment on lines +839 to +847
Copilot AI Mar 23, 2026


Inside the per-group loop, P = matrix_values[:, valid_mask] uses boolean indexing, which allocates a new (N × n_valid) array for each group. For many groups and/or many valid columns this can become a major memory/time cost and partially offsets the vectorization gains. A concrete optimization is to compute correlations only for participants with votes (use has_votes to slice rows), then write results into a full-length (N,) array (zeros for non-voters), which reduces the size of P and the amount of work in mean/std/dot products.

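Acting on this reviewer suggestion, the slice-then-scatter pattern could be sketched as follows. This is a standalone sketch with invented data, not the PR's actual code, and the group-average step is simplified to a whole-matrix column mean rather than a per-group mean:

```python
import numpy as np

# Invented 3x4 vote matrix; row 1 is a non-voter (all zeros)
matrix_values = np.array([
    [1, 0, -1, 1],
    [0, 0, 0, 0],
    [-1, 1, 1, 0],
], dtype=float)

has_votes = (matrix_values != 0).any(axis=1)      # (N,) bool
valid_mask = np.array([True, True, False, True])  # comments with enough votes

# Slice rows to voters BEFORE the heavy ops: P is (n_voters, n_valid)
P = matrix_values[np.ix_(has_votes, valid_mask)]
g = matrix_values[:, valid_mask].mean(axis=0)     # simplified "group" average
n_valid = int(valid_mask.sum())

p_mean = P.mean(axis=1)
p_std = P.std(axis=1)
cross_mean = (P @ g) / n_valid

# np.where evaluates both branches; suppress the p_std == 0 warnings
with np.errstate(invalid="ignore", divide="ignore"):
    voter_corr = np.where(
        p_std > 0,
        (cross_mean - p_mean * g.mean()) / (p_std * g.std()),
        0.0,
    )
voter_corr = np.nan_to_num(voter_corr, nan=0.0)

# Scatter back: full-length result array, zeros for non-voters
corr = np.zeros(matrix_values.shape[0])
corr[has_votes] = voter_corr
```

The per-group `P` allocations then scale with the number of voters rather than all participants, which is the memory saving the comment describes.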

if g_std == 0:
group_corr_arrays[group_id] = np.zeros(participant_count)
continue

# Pearson correlation: (mean(P*g) - mean(P)*mean(g)) / (std(P)*std(g))
cross_mean = (P @ g) / n_valid # (N,)

# np.where evaluates both branches; suppress divide-by-zero for p_std==0
with np.errstate(invalid='ignore', divide='ignore'):
corr = np.where(
p_std > 0,
(cross_mean - p_mean * g_mean) / (p_std * g_std),
0.0,
)
corr = np.nan_to_num(corr, nan=0.0)
group_corr_arrays[group_id] = corr

# Assemble result dicts (zero computation — just indexing)
group_ids = list(group_member_indices.keys())

for p_idx, participant_id in enumerate(vote_matrix.index):
if not has_votes[p_idx]:
continue

result['stats'][participant_id] = {
'n_agree': int(n_agree),
'n_disagree': int(n_disagree),
'n_pass': int(n_pass),
'n_votes': int(n_votes),
'group': participant_group,
'group_correlations': group_agreements
'n_agree': int(n_agree_all[p_idx]),
'n_disagree': int(n_disagree_all[p_idx]),
'n_pass': int(n_pass_all[p_idx]),
'n_votes': int(n_votes_all[p_idx]),
'group': ptpt_group_map.get(participant_id),
'group_correlations': {
gid: float(group_corr_arrays[gid][p_idx])
for gid in group_ids
}
}

total_time = time.time() - start_time
process_time = time.time() - process_start
logger.info(f"Participant stats completed in {total_time:.2f}s (preparation: {prep_time:.2f}s, processing: {process_time:.2f}s)")
logger.info(f"Processed {len(result['stats'])} participants with {len(group_clusters)} groups")

return result

def _compute_participant_info(self) -> None:
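The vectorized correlation code above relies on the closed-form per-row Pearson formula. As a sanity check, a standalone sketch (made-up 4×5 vote data, not the project's API) can verify it against `np.corrcoef` row by row:

```python
import numpy as np

# Made-up vote matrix: 4 participants x 5 comments, votes in {-1, 0, 1}
matrix_values = np.array([
    [1, -1, 0, 1, 1],
    [-1, 1, 1, 0, -1],
    [1, 1, -1, -1, 0],
    [0, -1, 1, 1, -1],
], dtype=float)

g = matrix_values.mean(axis=0)        # group average vote per comment
n_valid = matrix_values.shape[1]

p_mean = matrix_values.mean(axis=1)   # (N,)
p_std = matrix_values.std(axis=1)     # (N,) population std; ddof cancels in r
cross_mean = (matrix_values @ g) / n_valid

# Closed-form Pearson r per row: (E[xy] - E[x]E[y]) / (std(x) * std(y))
with np.errstate(invalid="ignore", divide="ignore"):
    corr = np.where(
        p_std > 0,
        (cross_mean - p_mean * g.mean()) / (p_std * g.std()),
        0.0,
    )
corr = np.nan_to_num(corr, nan=0.0)

# Reference: per-row np.corrcoef agrees to machine precision
expected = np.array([np.corrcoef(row, g)[0, 1] for row in matrix_values])
assert np.allclose(corr, expected)
```

Because Pearson r is scale-invariant, the choice of population vs. sample standard deviation cancels, so the closed form matches `np.corrcoef` exactly.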
3 changes: 1 addition & 2 deletions delphi/polismath/pca_kmeans_rep/__init__.py
@@ -10,14 +10,13 @@

from polismath.pca_kmeans_rep.pca import pca_project_dataframe
from polismath.pca_kmeans_rep.clusters import cluster_dataframe, Cluster
from polismath.pca_kmeans_rep.repness import conv_repness, participant_stats
from polismath.pca_kmeans_rep.repness import conv_repness
from polismath.pca_kmeans_rep.corr import compute_correlation

__all__ = [
'pca_project_dataframe',
'cluster_dataframe',
'Cluster',
'conv_repness',
'participant_stats',
'compute_correlation',
]
Comment on lines +13 to 22
Copilot AI Mar 23, 2026


Removing participant_stats from the package exports is a breaking public API change (imports like from polismath.pca_kmeans_rep import participant_stats will fail). If downstream callers exist, consider keeping a participant_stats shim (e.g., thin wrapper calling the new optimized implementation) and marking it as deprecated, or document the breaking change explicitly (release notes / migration guidance).

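If the shim route suggested here is chosen, it could be as small as the following sketch. `_optimized_participant_stats` is a placeholder name for however the new code path is exposed; it stands in for the logic that this PR moved into `Conversation._compute_participant_info_optimized`:

```python
import warnings
from typing import Any

def _optimized_participant_stats(*args: Any, **kwargs: Any) -> dict:
    # Placeholder for the new optimized code path; returns the same
    # {"stats": {...}} shape the old function produced.
    return {"stats": {}}

def participant_stats(*args: Any, **kwargs: Any) -> dict:
    """Deprecated shim preserving the old public export."""
    warnings.warn(
        "participant_stats is deprecated; participant statistics are now "
        "computed inside Conversation._compute_participant_info_optimized",
        DeprecationWarning,
        stacklevel=2,
    )
    return _optimized_participant_stats(*args, **kwargs)
```

Keeping the shim in `__all__` for one release, then deleting it, gives downstream callers a migration window without leaving dead code behind permanently.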