[Stack 3/25] Clojure comparison test consolidation and cleanup#2418
[Stack 3/25] Clojure comparison test consolidation and cleanup#2418
Conversation
e35f31a to
ad8a92c
Compare
05eaeae to
359a2dd
Compare
ad8a92c to
00c659e
Compare
359a2dd to
ecd5b9b
Compare
00c659e to
69cf22e
Compare
ecebe3b to
ea26a44
Compare
There was a problem hiding this comment.
Pull request overview
This PR continues the test/tooling cleanup around Python-vs-Clojure regression comparisons by consolidating legacy tests, adding a shared comparison utility module, and introducing a CLI to run comparisons outside pytest.
Changes:
- Fix e2e mock dataset path and align test filename with pytest naming conventions.
- Add
polismath.regression.clojure_comparer(shared comparison utilities) and ascripts/clojure_comparer.pyCLI. - Remove older overlapping/broken legacy comparison scripts and add documentation for running/understanding the comparisons.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| delphi/tests/test_math_pipeline_runs_e2e.py | Updates mock dataset directory used by the e2e pipeline test. |
| delphi/tests/test_legacy_clojure_regression.py | Switches clustering comparison to use the shared comparer and adds a clojure_comparison marker. |
| delphi/tests/test_legacy_clojure_output.py | Removes a legacy “inspect Clojure output” script/test file. |
| delphi/tests/legacy_compare_with_clojure.py | Removes a legacy direct-comparison script. |
| delphi/scripts/compare_implementations.py | Removes an older comparison CLI/tooling script. |
| delphi/scripts/clojure_comparer.py | Adds a new Click-based CLI for Python vs Clojure comparisons. |
| delphi/pyproject.toml | Registers the new clojure_comparison pytest marker. |
| delphi/polismath/regression/clojure_comparer.py | Adds the shared comparer utilities (cluster distribution/membership + projection comparison). |
| delphi/polismath/regression/init.py | Re-exports the new Clojure comparer helpers from the regression package. |
| delphi/docs/CLOJURE_COMPARISON.md | Adds documentation describing the comparison suite and usage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
whilo
left a comment
There was a problem hiding this comment.
Review
Good consolidation PR. The main additions are solid:
** library** is well-designed — Wasserstein distance for cluster distributions, Jaccard similarity for membership, handling of the 8 sign/axis ambiguities in projection comparison. It's not just test scaffolding: PR #2431 extends it with unfold_clojure_group_clusters() for fair two-level clustering comparison, confirming it earns its place as a reusable library. The find_best_cluster_mapping() greedy approach is a reasonable tradeoff over optimal assignment.
Test cleanup (deleting legacy_compare_with_clojure.py, test_legacy_clojure_output.py, compare_implementations.py, test_golden_data.py) removes dead weight. The refactored test_group_clustering using ClojureComparer is cleaner and shares comparison logic with the CLI tool.
test_math_pipeline_runs_e2e.py is rough — the re.search(r'LIMIT (\\d+)') regex intercept and the # FIX: comments scattered throughout signal it's a placeholder. The assertions (len(items) > 0) are minimal smoke tests rather than correctness checks. That said, given Clojure will eventually be removed and this test is explicitly framed as temporary comparison infrastructure, the approach is acceptable. It catches gross failures without requiring a live DB.
One small style note: clojure_comparer.py uses float | None union syntax (Python 3.10+) while the rest of the codebase uses Optional[float]. Minor inconsistency but worth noting for contributors.
Approved — the infrastructure justifies the PR, and the known rough edges in the e2e test are acceptable given the Clojure deprecation trajectory.
8f8c26d to
a2d5e3c
Compare
There was a problem hiding this comment.
Pull request overview
Consolidates and modernizes the Python↔Clojure regression comparison tooling by centralizing comparison logic into a shared library, adding a CLI, and removing legacy scripts/tests.
Changes:
- Added
polismath.regression.clojure_comparercomparison utilities and exposed them viapolismath.regressionexports. - Updated/expanded the consolidated pytest suite and introduced a
clojure_comparisonmarker. - Added
scripts/clojure_comparer.py, updated E2E mock data path, and removed multiple legacy comparison scripts/files; added docs.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| delphi/tests/test_legacy_clojure_regression.py | Switches clustering comparison to shared ClojureComparer utilities and adds a pytest marker. |
| delphi/tests/test_legacy_clojure_output.py | Removes legacy Clojure output analysis script. |
| delphi/tests/run_math_pipeline_test.py | Adjusts mock dataset directory path for E2E pipeline test. |
| delphi/tests/legacy_compare_with_clojure.py | Removes legacy direct comparison script. |
| delphi/scripts/compare_implementations.py | Removes legacy implementation comparison CLI. |
| delphi/scripts/clojure_comparer.py | Adds a new Click-based CLI for Python↔Clojure comparisons. |
| delphi/pyproject.toml | Registers the clojure_comparison pytest marker. |
| delphi/polismath/regression/clojure_comparer.py | Adds shared comparison logic (clusters + projections) including Wasserstein/Jaccard metrics. |
| delphi/polismath/regression/init.py | Re-exports the new comparer utilities as part of the regression package API. |
| delphi/docs/CLOJURE_COMPARISON.md | Documents how to run/interpret comparisons and explains known algorithmic differences. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Consolidate three overlapping test files into a coherent suite: - Removed: tests/test_legacy_clojure_output.py (structure analysis) - Removed: tests/legacy_compare_with_clojure.py (direct comparison script) - Enhanced: tests/test_legacy_clojure_regression.py (main pytest suite) Created shared utilities and CLI tool: - polismath/regression/clojure_comparer.py: Comparison utilities with ClojureComparer class, cluster distribution/membership comparison, PCA projection comparison with transformations - scripts/clojure_comparer.py: Interactive CLI tool for detailed comparison (similar UX to regression_comparer.py) - docs/CLOJURE_COMPARISON.md: Comprehensive documentation Organized following codebase patterns: - Placed utilities in polismath/regression/ alongside other regression utilities (comparer.py, recorder.py, datasets.py) - Updated imports in consuming files - Added exports to polismath/regression/__init__.py for clean imports Added pytest marker: - @pytest.mark.clojure_comparison for selective test execution - Can exclude with: pytest -m "not clojure_comparison" Key discovery: - Clojure uses two-level clustering (participants→base-clusters→groups) - Python uses single-level clustering (participants→groups directly) - This architectural difference explains 0% Jaccard similarity Tests run and detect differences (as expected until initialization is aligned). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
a2d5e3c to
57d1d33
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…r.py) This script originated as tests/test_real_data_comparison.py (March 2025), was renamed to scripts/compare_implementations.py in #2282 (Nov 2025), and has been broken since that rename (SyntaxError: global TOLERANCE declared after prior use, plus undefined variable on line 486). Its replacement, scripts/clojure_comparer.py, was introduced earlier in this PR. It uses the shared ClojureComparer library, auto-discovers datasets, supports --include-local, and does rigorous cluster comparison (Jaccard membership + L1 distribution). The one feature compare_implementations.py had that clojure_comparer.py lacks — multi- tolerance comment priority comparison — is moot until D12 is implemented, at which point it should be added to the shared ClojureComparer library. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused imports in test_legacy_clojure_regression.py
- Add click as declared dependency in pyproject.toml
- Fix redundant sys.exit with Click standalone mode
- Fix circular import: import from .datasets directly, not package init
- Remove SciPy hard dep: don't re-export clojure_comparer from __init__.py
- Replace Wasserstein distance with L1 distance (Copilot: scipy was misused
on probability vectors; L1 is well-defined for normalized distributions)
- match_status now requires num_clusters_match and complete mapping
(Copilot: could false-positive PASS with different cluster counts)
- Add 23 unit tests for compare_cluster_distributions,
compare_cluster_membership, compare_projections, and
compute_distribution_similarity with synthetic inputs
- Fix doc file paths in CLOJURE_COMPARISON.md ({report_id}_math_blob.json)
- Fix float|None -> Optional style inconsistency (Christian's review)
- CLI: try/except for test code import with clear error message
- Also: remove stale PYTEST_ADDOPTS="-n auto" from ~/.exports
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
57d1d33 to
d41e3b8
Compare
Delphi Coverage Report
|
Summary
run_math_pipeline_test.py→test_math_pipeline_runs_e2e.pyto follow pytest conventionsClojureComparerlibrary (polismath/regression/clojure_comparer.py) for comparing Python vs Clojure math outputsscripts/clojure_comparer.py)test_legacy_clojure_regression.pylegacy_compare_with_clojure.py,test_legacy_clojure_output.py,compare_implementations.pyCLOJURE_COMPARISON.mddocumentationTest plan
🤖 Generated with Claude Code