
[Stack 3/25] Clojure comparison test consolidation and cleanup #2418

Merged
jucor merged 5 commits into edge from jc/kmeans_clustering_tooling
Mar 27, 2026

Conversation

@jucor
Collaborator

@jucor jucor commented Mar 5, 2026

Summary

Stacked on #2458 (CI: run server and E2E tests on PRs targeting jc/** branches). Please review and merge #2458 first.
Next in stack: #2431 (Two-level hierarchical clustering matching Clojure architecture)

  • Fix e2e test mock data path and rename run_math_pipeline_test.py to test_math_pipeline_runs_e2e.py to follow pytest conventions
  • Create shared ClojureComparer library (polismath/regression/clojure_comparer.py) for comparing Python vs Clojure math outputs
  • Add CLI comparison tool (scripts/clojure_comparer.py)
  • Consolidate three overlapping test/comparison files into test_legacy_clojure_regression.py
  • Remove broken legacy files: legacy_compare_with_clojure.py, test_legacy_clojure_output.py, compare_implementations.py
  • Add CLOJURE_COMPARISON.md documentation

Test plan

  • 212 passed, 2 skipped, 4 xfailed, 0 failures
  • No code changes to computation logic; test-only and tooling

🤖 Generated with Claude Code

@jucor jucor force-pushed the jc/kmeans_clustering_tooling branch from e35f31a to ad8a92c March 6, 2026 15:31
@jucor jucor force-pushed the jc/pca_test_cleanup branch from 05eaeae to 359a2dd March 10, 2026 11:12
@jucor jucor force-pushed the jc/kmeans_clustering_tooling branch from ad8a92c to 00c659e March 10, 2026 11:12
@jucor jucor force-pushed the jc/pca_test_cleanup branch from 359a2dd to ecd5b9b March 10, 2026 12:29
@jucor jucor force-pushed the jc/kmeans_clustering_tooling branch from 00c659e to 69cf22e March 10, 2026 12:29
@jucor jucor marked this pull request as draft March 10, 2026 13:10
@jucor jucor force-pushed the jc/kmeans_clustering_tooling branch 2 times, most recently from ecebe3b to ea26a44 March 10, 2026 15:18
@jucor jucor changed the title Two-level clustering, Clojure comparison tooling, and test consolidation Clojure comparison test consolidation and cleanup Mar 10, 2026
@jucor jucor marked this pull request as ready for review March 10, 2026 16:06
@jucor jucor requested review from ballPointPenguin and whilo March 10, 2026 16:08
@jucor jucor changed the title Clojure comparison test consolidation and cleanup [Stack 2/8] Clojure comparison test consolidation and cleanup Mar 10, 2026
@jucor jucor changed the title [Stack 2/8] Clojure comparison test consolidation and cleanup [Stack 2/9] Clojure comparison test consolidation and cleanup Mar 11, 2026
@jucor jucor changed the title [Stack 2/9] Clojure comparison test consolidation and cleanup [Stack 2/10] Clojure comparison test consolidation and cleanup Mar 11, 2026
@jucor jucor changed the title [Stack 2/10] Clojure comparison test consolidation and cleanup [Stack 2/11] Clojure comparison test consolidation and cleanup Mar 11, 2026
@jucor jucor requested a review from Copilot March 13, 2026 12:37
Contributor

Copilot AI left a comment


Pull request overview

This PR continues the test/tooling cleanup around Python-vs-Clojure regression comparisons by consolidating legacy tests, adding a shared comparison utility module, and introducing a CLI to run comparisons outside pytest.

Changes:

  • Fix e2e mock dataset path and align test filename with pytest naming conventions.
  • Add polismath.regression.clojure_comparer (shared comparison utilities) and a scripts/clojure_comparer.py CLI.
  • Remove older overlapping/broken legacy comparison scripts and add documentation for running/understanding the comparisons.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 11 comments.

File Description
delphi/tests/test_math_pipeline_runs_e2e.py Updates mock dataset directory used by the e2e pipeline test.
delphi/tests/test_legacy_clojure_regression.py Switches clustering comparison to use the shared comparer and adds a clojure_comparison marker.
delphi/tests/test_legacy_clojure_output.py Removes a legacy “inspect Clojure output” script/test file.
delphi/tests/legacy_compare_with_clojure.py Removes a legacy direct-comparison script.
delphi/scripts/compare_implementations.py Removes an older comparison CLI/tooling script.
delphi/scripts/clojure_comparer.py Adds a new Click-based CLI for Python vs Clojure comparisons.
delphi/pyproject.toml Registers the new clojure_comparison pytest marker.
delphi/polismath/regression/clojure_comparer.py Adds the shared comparer utilities (cluster distribution/membership + projection comparison).
delphi/polismath/regression/__init__.py Re-exports the new Clojure comparer helpers from the regression package.
delphi/docs/CLOJURE_COMPARISON.md Adds documentation describing the comparison suite and usage.


@jucor jucor changed the title [Stack 2/11] Clojure comparison test consolidation and cleanup [Stack 2/12] Clojure comparison test consolidation and cleanup Mar 13, 2026
@jucor jucor changed the title [Stack 2/12] Clojure comparison test consolidation and cleanup [Stack 2/13] Clojure comparison test consolidation and cleanup Mar 13, 2026
@jucor jucor changed the title [Stack 2/13] Clojure comparison test consolidation and cleanup [Stack 2/15] Clojure comparison test consolidation and cleanup Mar 16, 2026
@jucor jucor changed the title [Stack 2/15] Clojure comparison test consolidation and cleanup [Stack 2/16] Clojure comparison test consolidation and cleanup Mar 16, 2026
@jucor jucor changed the title [Stack 2/16] Clojure comparison test consolidation and cleanup [Stack 2/17] Clojure comparison test consolidation and cleanup Mar 16, 2026
@jucor jucor changed the title [Stack 2/17] Clojure comparison test consolidation and cleanup [Stack 2/24] Clojure comparison test consolidation and cleanup Mar 17, 2026
@jucor jucor changed the title [Stack 2/24] Clojure comparison test consolidation and cleanup [Stack 2/25] Clojure comparison test consolidation and cleanup Mar 17, 2026
@jucor jucor changed the title [Stack 2/25] Clojure comparison test consolidation and cleanup [Stack 2/24] Clojure comparison test consolidation and cleanup Mar 19, 2026
@jucor jucor changed the title [Stack 2/24] Clojure comparison test consolidation and cleanup [Stack 2/25] Clojure comparison test consolidation and cleanup Mar 19, 2026
Base automatically changed from jc/pca_test_cleanup to edge March 19, 2026 14:59
Collaborator

@whilo whilo left a comment


Review

Good consolidation PR. The main additions are solid:

The ClojureComparer library is well-designed: Wasserstein distance for cluster distributions, Jaccard similarity for membership, and handling of the 8 sign/axis ambiguities in projection comparison. It's not just test scaffolding: PR #2431 extends it with unfold_clojure_group_clusters() for fair two-level clustering comparison, confirming it earns its place as a reusable library. The find_best_cluster_mapping() greedy approach is a reasonable tradeoff over optimal assignment.
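For readers unfamiliar with the matching step mentioned above, here is a minimal sketch of Jaccard similarity combined with a greedy cluster mapping. It assumes clusters are given as `{cluster_id: set_of_participant_ids}`; the names `jaccard` and `greedy_cluster_mapping` are hypothetical, and this is not the code or signature of `find_best_cluster_mapping()`.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Intersection over union of two member sets; two empty sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def greedy_cluster_mapping(
    py_clusters: Dict[int, Set[int]],
    clj_clusters: Dict[int, Set[int]],
) -> Dict[int, Tuple[int, float]]:
    """Pair Python clusters with Clojure clusters greedily by descending Jaccard.

    Returns {python_cluster_id: (clojure_cluster_id, jaccard)}.
    """
    # Score every (python, clojure) pair and visit the best-scoring pairs first.
    pairs: List[Tuple[float, int, int]] = sorted(
        (
            (jaccard(p_members, c_members), p_id, c_id)
            for p_id, p_members in py_clusters.items()
            for c_id, c_members in clj_clusters.items()
        ),
        reverse=True,
    )
    mapping: Dict[int, Tuple[int, float]] = {}
    used_clj: Set[int] = set()
    for score, p_id, c_id in pairs:
        # Each cluster on either side is used at most once.
        if p_id in mapping or c_id in used_clj:
            continue
        mapping[p_id] = (c_id, score)
        used_clj.add(c_id)
    return mapping


py = {0: {1, 2, 3}, 1: {4, 5}}
clj = {7: {4, 5}, 9: {1, 2, 3, 6}}
print(greedy_cluster_mapping(py, clj))
```

Greedy pairing commits to the highest-scoring pair first and never revisits it, so it can miss the globally best assignment that an optimal method such as the Hungarian algorithm would find; that is the tradeoff the review refers to.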

Test cleanup (deleting legacy_compare_with_clojure.py, test_legacy_clojure_output.py, compare_implementations.py, test_golden_data.py) removes dead weight. The refactored test_group_clustering using ClojureComparer is cleaner and shares comparison logic with the CLI tool.

test_math_pipeline_runs_e2e.py is rough — the re.search(r'LIMIT (\\d+)') regex intercept and the # FIX: comments scattered throughout signal it's a placeholder. The assertions (len(items) > 0) are minimal smoke tests rather than correctness checks. That said, given Clojure will eventually be removed and this test is explicitly framed as temporary comparison infrastructure, the approach is acceptable. It catches gross failures without requiring a live DB.
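The "regex intercept" pattern described above can be sketched as a fake query function that reads the row limit out of the SQL text. This is an illustrative assumption, not the test's actual code: `fake_query` and `MOCK_ROWS` are hypothetical stand-ins for a mocked database call.

```python
import re
from typing import List, Sequence

# Hypothetical in-memory rows standing in for a real database table.
MOCK_ROWS = [{"tid": i, "txt": f"comment {i}"} for i in range(10)]


def fake_query(sql: str, rows: Sequence[dict] = MOCK_ROWS) -> List[dict]:
    """Return mock rows, truncated to N when the SQL text contains 'LIMIT N'."""
    match = re.search(r"LIMIT (\d+)", sql)
    if match:
        return list(rows[: int(match.group(1))])
    return list(rows)


print(len(fake_query("SELECT * FROM comments LIMIT 3")))
```

Because the pattern is case-sensitive, a query written with lowercase `limit` would silently return every row, one reason this kind of interception reads as placeholder scaffolding rather than a faithful database stand-in.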

One small style note: clojure_comparer.py uses float | None union syntax (Python 3.10+) while the rest of the codebase uses Optional[float]. Minor inconsistency but worth noting for contributors.

Approved — the infrastructure justifies the PR, and the known rough edges in the e2e test are acceptable given the Clojure deprecation trajectory.

whilo approved these changes Mar 20, 2026
@jucor jucor changed the base branch from edge to jc/ci-stack-coverage March 23, 2026 16:12
@jucor jucor changed the title [Stack 2/25] Clojure comparison test consolidation and cleanup [Stack 3/25] Clojure comparison test consolidation and cleanup Mar 23, 2026
@jucor jucor force-pushed the jc/kmeans_clustering_tooling branch from 8f8c26d to a2d5e3c March 23, 2026 16:15
@jucor jucor requested a review from Copilot March 23, 2026 16:19
Contributor

Copilot AI left a comment


Pull request overview

Consolidates and modernizes the Python↔Clojure regression comparison tooling by centralizing comparison logic into a shared library, adding a CLI, and removing legacy scripts/tests.

Changes:

  • Added polismath.regression.clojure_comparer comparison utilities and exposed them via polismath.regression exports.
  • Updated/expanded the consolidated pytest suite and introduced a clojure_comparison marker.
  • Added scripts/clojure_comparer.py, updated E2E mock data path, and removed multiple legacy comparison scripts/files; added docs.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
delphi/tests/test_legacy_clojure_regression.py Switches clustering comparison to shared ClojureComparer utilities and adds a pytest marker.
delphi/tests/test_legacy_clojure_output.py Removes legacy Clojure output analysis script.
delphi/tests/run_math_pipeline_test.py Adjusts mock dataset directory path for E2E pipeline test.
delphi/tests/legacy_compare_with_clojure.py Removes legacy direct comparison script.
delphi/scripts/compare_implementations.py Removes legacy implementation comparison CLI.
delphi/scripts/clojure_comparer.py Adds a new Click-based CLI for Python↔Clojure comparisons.
delphi/pyproject.toml Registers the clojure_comparison pytest marker.
delphi/polismath/regression/clojure_comparer.py Adds shared comparison logic (clusters + projections) including Wasserstein/Jaccard metrics.
delphi/polismath/regression/__init__.py Re-exports the new comparer utilities as part of the regression package API.
delphi/docs/CLOJURE_COMPARISON.md Documents how to run/interpret comparisons and explains known algorithmic differences.


Base automatically changed from jc/ci-stack-coverage to edge March 24, 2026 18:08
jucor and others added 3 commits March 24, 2026 18:33
Consolidate three overlapping test files into a coherent suite:
- Removed: tests/test_legacy_clojure_output.py (structure analysis)
- Removed: tests/legacy_compare_with_clojure.py (direct comparison script)
- Enhanced: tests/test_legacy_clojure_regression.py (main pytest suite)

Created shared utilities and CLI tool:
- polismath/regression/clojure_comparer.py: Comparison utilities with
  ClojureComparer class, cluster distribution/membership comparison,
  PCA projection comparison with transformations
- scripts/clojure_comparer.py: Interactive CLI tool for detailed
  comparison (similar UX to regression_comparer.py)
- docs/CLOJURE_COMPARISON.md: Comprehensive documentation
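The "PCA projection comparison with transformations" item can be illustrated for 2-D projections: each principal component is defined only up to sign, and the two axes may come out in either order, giving 2 x 2 x 2 = 8 variants. The sketch below, with hypothetical names `candidate_transforms` and `best_projection_match`, tries all 8 and keeps the one with the lowest RMSE. It assumes both inputs list the same participants in the same order; RMSE as the error metric is also an assumption.

```python
import itertools
import math
from typing import Iterator, List, Optional, Tuple

Point = Tuple[float, float]
Label = Tuple[bool, int, int]  # (axes swapped?, sign applied to x, sign applied to y)


def candidate_transforms(points: List[Point]) -> Iterator[Tuple[Label, List[Point]]]:
    """Yield the 8 sign/axis variants of a 2-D projection: 2 axis orders x 2 x 2 signs."""
    for swap in (False, True):
        for sx, sy in itertools.product((1, -1), repeat=2):
            out = []
            for x, y in points:
                if swap:
                    x, y = y, x
                out.append((sx * x, sy * y))
            yield (swap, sx, sy), out


def best_projection_match(py_points: List[Point], clj_points: List[Point]) -> Tuple[Label, float]:
    """Return the variant of py_points with the lowest RMSE against clj_points.

    Assumes both lists hold the same participants in the same order.
    """
    best: Optional[Tuple[Label, float]] = None
    for label, pts in candidate_transforms(py_points):
        sq = [(ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(pts, clj_points)]
        rmse = math.sqrt(sum(sq) / len(sq))
        if best is None or rmse < best[1]:
            best = (label, rmse)
    assert best is not None
    return best


print(best_projection_match([(1.0, 2.0), (3.0, -1.0)], [(-2.0, 1.0), (1.0, 3.0)]))
```

Searching all 8 variants means two projections that differ only by a sign flip or axis swap compare as identical instead of failing spuriously.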

Organized following codebase patterns:
- Placed utilities in polismath/regression/ alongside other regression
  utilities (comparer.py, recorder.py, datasets.py)
- Updated imports in consuming files
- Added exports to polismath/regression/__init__.py for clean imports

Added pytest marker:
- @pytest.mark.clojure_comparison for selective test execution
- Can exclude with: pytest -m "not clojure_comparison"

Key discovery:
- Clojure uses two-level clustering (participants→base-clusters→groups)
- Python uses single-level clustering (participants→groups directly)
- This architectural difference explains 0% Jaccard similarity
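One way to put the two architectures on the same footing before computing Jaccard similarity is to expand each Clojure group into the participants of its base clusters. This is a minimal sketch assuming dict-shaped inputs (`{base_id: participant_ids}` and `{group_id: base_ids}`); `unfold_group_clusters` is a hypothetical name and not the signature of the `unfold_clojure_group_clusters()` helper that PR #2431 adds. One plausible reading of the 0% figure is that group members were base-cluster IDs while Python's members were participant IDs, so the sets barely overlap until they are unfolded.

```python
from typing import Dict, Iterable, Set


def unfold_group_clusters(
    base_clusters: Dict[int, Iterable[int]],
    group_clusters: Dict[int, Iterable[int]],
) -> Dict[int, Set[int]]:
    """Turn group -> base-cluster membership into group -> participant membership."""
    unfolded: Dict[int, Set[int]] = {}
    for group_id, base_ids in group_clusters.items():
        members: Set[int] = set()
        for base_id in base_ids:
            # Each base cluster contributes all of its participants to the group.
            members.update(base_clusters[base_id])
        unfolded[group_id] = members
    return unfolded


base = {0: [1, 2], 1: [3], 2: [4, 5]}
groups = {0: [0, 1], 1: [2]}
print(unfold_group_clusters(base, groups))
```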

Tests run and detect differences (as expected until initialization is aligned).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@jucor jucor force-pushed the jc/kmeans_clustering_tooling branch from a2d5e3c to 57d1d33 March 24, 2026 18:35
@jucor jucor requested a review from Copilot March 26, 2026 21:25
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.



jucor and others added 2 commits March 26, 2026 22:06
…r.py)

This script originated as tests/test_real_data_comparison.py (March 2025),
was renamed to scripts/compare_implementations.py in #2282 (Nov 2025),
and has been broken since that rename (SyntaxError: global TOLERANCE
declared after prior use, plus undefined variable on line 486).
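The first error can be reproduced in isolation. The snippet below is a made-up minimal example, not code from compare_implementations.py, and `compiles` is a hypothetical helper: CPython rejects a function that reads a name before declaring it `global`, and it does so at compile time.

```python
SNIPPET = '''
TOLERANCE = 0.1

def main():
    print(TOLERANCE)      # name used first...
    global TOLERANCE      # ...then declared global: rejected by the compiler
    TOLERANCE = 0.2
'''


def compiles(source: str) -> bool:
    """Return True if the source compiles; print the SyntaxError message otherwise."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as exc:
        print(f"SyntaxError: {exc.msg}")
        return False
    return True


compiles(SNIPPET)
```

The error is raised by `compile()` itself, before any line of the module executes, so a script containing this pattern cannot run at all.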

Its replacement, scripts/clojure_comparer.py, was introduced earlier in
this PR. It uses the shared ClojureComparer library, auto-discovers
datasets, supports --include-local, and does rigorous cluster comparison
(Jaccard membership + L1 distribution). The one feature
compare_implementations.py had that clojure_comparer.py lacks — multi-
tolerance comment priority comparison — is moot until D12 is implemented,
at which point it should be added to the shared ClojureComparer library.
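A minimal sketch of an L1 comparison between cluster-size distributions, assuming cluster labels are arbitrary (so sizes are sorted largest-first before comparing) and every cluster list is non-empty. `normalized_sizes` and `l1_distribution_distance` are hypothetical names, not the library's API. For context on the later Wasserstein change: `scipy.stats.wasserstein_distance(u_values, v_values)` treats its first two arguments as sample values (weights go in the separate `u_weights`/`v_weights` parameters), so passing probability vectors as values does not compare the distributions they describe.

```python
from typing import List, Sequence


def normalized_sizes(sizes: Sequence[int]) -> List[float]:
    """Sort cluster sizes largest-first and scale them to sum to 1."""
    total = sum(sizes)
    return [s / total for s in sorted(sizes, reverse=True)]


def l1_distribution_distance(py_sizes: Sequence[int], clj_sizes: Sequence[int]) -> float:
    """L1 distance between two cluster-size distributions; 0.0 means identical profiles."""
    p = normalized_sizes(py_sizes)
    q = normalized_sizes(clj_sizes)
    # Pad the side with fewer clusters with zero-mass entries so lengths match.
    n = max(len(p), len(q))
    p += [0.0] * (n - len(p))
    q += [0.0] * (n - len(q))
    return sum(abs(a - b) for a, b in zip(p, q))


print(l1_distribution_distance([50, 30, 20], [40, 40, 20]))
```

For example, sizes `[50, 30, 20]` against `[40, 40, 20]` give a distance of about 0.2.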

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused imports in test_legacy_clojure_regression.py
- Add click as declared dependency in pyproject.toml
- Fix redundant sys.exit with Click standalone mode
- Fix circular import: import from .datasets directly, not package init
- Remove SciPy hard dep: don't re-export clojure_comparer from __init__.py
- Replace Wasserstein distance with L1 distance (Copilot: scipy was misused
  on probability vectors; L1 is well-defined for normalized distributions)
- match_status now requires num_clusters_match and complete mapping
  (Copilot: could false-positive PASS with different cluster counts)
- Add 23 unit tests for compare_cluster_distributions,
  compare_cluster_membership, compare_projections, and
  compute_distribution_similarity with synthetic inputs
- Fix doc file paths in CLOJURE_COMPARISON.md ({report_id}_math_blob.json)
- Fix float|None -> Optional style inconsistency (Christian's review)
- CLI: try/except for test code import with clear error message
- Also: remove stale PYTEST_ADDOPTS="-n auto" from ~/.exports
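The stricter `match_status` rule can be sketched as follows. The commit states only the two new requirements (matching cluster counts and a complete mapping); the `min_jaccard` threshold, the function signature, and the `"FAIL"` string are assumptions for illustration, with the mapping shaped like `{python_id: (clojure_id, jaccard)}`.

```python
from typing import Dict, Tuple


def match_status(
    num_py_clusters: int,
    num_clj_clusters: int,
    mapping: Dict[int, Tuple[int, float]],
    min_jaccard: float = 0.8,  # hypothetical threshold, not from the PR
) -> str:
    """PASS only if cluster counts agree, every cluster is mapped, and overlaps are high."""
    num_clusters_match = num_py_clusters == num_clj_clusters
    # Complete mapping: every Python cluster was paired with some Clojure cluster.
    mapping_complete = len(mapping) == num_py_clusters
    overlaps_ok = all(score >= min_jaccard for _, score in mapping.values())
    return "PASS" if (num_clusters_match and mapping_complete and overlaps_ok) else "FAIL"


# Two perfectly matched clusters, but Clojure found three: no longer a PASS.
print(match_status(2, 3, {0: (0, 1.0), 1: (1, 1.0)}))
```

With 2 Python clusters against 3 Clojure clusters, a perfect mapping of the 2 no longer yields PASS, which is the false positive the commit describes.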

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jucor jucor force-pushed the jc/kmeans_clustering_tooling branch from 57d1d33 to d41e3b8 March 27, 2026 01:15
@github-actions

Delphi Coverage Report

File Stmts Miss Cover
__init__.py 2 0 100%
benchmarks/bench_pca.py 76 76 0%
benchmarks/bench_repness.py 81 81 0%
benchmarks/bench_update_votes.py 38 38 0%
benchmarks/benchmark_utils.py 34 34 0%
components/__init__.py 1 0 100%
components/config.py 165 133 19%
conversation/__init__.py 2 0 100%
conversation/conversation.py 1036 353 66%
conversation/manager.py 131 42 68%
database/__init__.py 1 0 100%
database/dynamodb.py 387 234 40%
database/postgres.py 305 205 33%
pca_kmeans_rep/__init__.py 5 0 100%
pca_kmeans_rep/clusters.py 233 8 97%
pca_kmeans_rep/corr.py 98 17 83%
pca_kmeans_rep/pca.py 50 15 70%
pca_kmeans_rep/repness.py 361 48 87%
pca_kmeans_rep/stats.py 107 22 79%
regression/__init__.py 4 0 100%
regression/clojure_comparer.py 157 12 92%
regression/comparer.py 887 406 54%
regression/datasets.py 95 21 78%
regression/recorder.py 36 27 25%
regression/utils.py 137 38 72%
run_math_pipeline.py 260 114 56%
umap_narrative/500_generate_embedding_umap_cluster.py 210 109 48%
umap_narrative/501_calculate_comment_extremity.py 112 54 52%
umap_narrative/502_calculate_priorities.py 135 135 0%
umap_narrative/700_datamapplot_for_layer.py 502 502 0%
umap_narrative/701_static_datamapplot_for_layer.py 310 310 0%
umap_narrative/702_consensus_divisive_datamapplot.py 432 432 0%
umap_narrative/801_narrative_report_batch.py 785 785 0%
umap_narrative/802_process_batch_results.py 265 265 0%
umap_narrative/803_check_batch_status.py 175 175 0%
umap_narrative/llm_factory_constructor/__init__.py 2 2 0%
umap_narrative/llm_factory_constructor/model_provider.py 157 157 0%
umap_narrative/polismath_commentgraph/__init__.py 1 0 100%
umap_narrative/polismath_commentgraph/cli.py 270 270 0%
umap_narrative/polismath_commentgraph/core/__init__.py 3 3 0%
umap_narrative/polismath_commentgraph/core/clustering.py 108 108 0%
umap_narrative/polismath_commentgraph/core/embedding.py 104 104 0%
umap_narrative/polismath_commentgraph/lambda_handler.py 219 219 0%
umap_narrative/polismath_commentgraph/schemas/__init__.py 2 0 100%
umap_narrative/polismath_commentgraph/schemas/dynamo_models.py 160 9 94%
umap_narrative/polismath_commentgraph/tests/conftest.py 17 17 0%
umap_narrative/polismath_commentgraph/tests/test_clustering.py 74 74 0%
umap_narrative/polismath_commentgraph/tests/test_embedding.py 55 55 0%
umap_narrative/polismath_commentgraph/tests/test_storage.py 87 87 0%
umap_narrative/polismath_commentgraph/utils/__init__.py 3 0 100%
umap_narrative/polismath_commentgraph/utils/converter.py 283 237 16%
umap_narrative/polismath_commentgraph/utils/group_data.py 354 336 5%
umap_narrative/polismath_commentgraph/utils/storage.py 584 477 18%
umap_narrative/reset_conversation.py 159 50 69%
umap_narrative/run_pipeline.py 453 312 31%
utils/general.py 62 41 34%
Total 10772 7249 33%

@jucor jucor merged commit 1f3113f into edge Mar 27, 2026
4 checks passed
@jucor jucor deleted the jc/kmeans_clustering_tooling branch March 27, 2026 02:06

Labels

None yet

Projects

None yet

4 participants