[Stack 12/27] Speed up regression tests#2436

Closed
jucor wants to merge 2 commits into jc/clj-parity-d4-fix from jc/regression-test-perf

Conversation

@jucor
Collaborator

@jucor jucor commented Mar 11, 2026

Summary

Stacked on #2435 (Fix D4: pseudocount formula). Please review and merge #2435 first.
Next in stack: #2437 (Vectorize participant info computation (3-15x speedup))

  • Default benchmark=False in compare_with_golden(): benchmark mode ran the pipeline 3x for timing statistics, which is unnecessary for correctness checks. The regression_comparer.py script already had --benchmark as opt-in, so this aligns the default.
  • Add skip_intermediate_stages parameter to compute_all_stages(). test_conversation_regression now skips stages 1-4 (empty, load-only, PCA-only, PCA+clustering) since it only checks overall_match. test_conversation_stages_individually still runs all stages for granular failure detection.
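For illustration, a minimal sketch of how a skip_intermediate_stages flag could work. This is not the code from delphi/polismath/regression/utils.py: the stage names, the `runners` argument, and the `overall_match()` helper are hypothetical stand-ins, and only the parameter name and the "missing stages are ignored" behavior come from this PR.

```python
from typing import Callable, Dict

# Hypothetical stage names, loosely following the PR description; the real
# identifiers in the regression utilities may differ.
INTERMEDIATE_STAGES = ("empty", "load_only", "pca_only", "pca_clustering")
FINAL_STAGE = "full_recompute"


def compute_all_stages(
    runners: Dict[str, Callable[[], dict]],
    skip_intermediate_stages: bool = False,
) -> Dict[str, dict]:
    """Run each stage and collect its output, optionally skipping stages 1-4."""
    results = {}
    for name, run in runners.items():
        if skip_intermediate_stages and name in INTERMEDIATE_STAGES:
            continue  # a skipped stage is simply absent from the results
        results[name] = run()
    return results


def overall_match(computed: Dict[str, dict], golden: Dict[str, dict]) -> bool:
    """Compare only the stages that were computed; missing stages are ignored."""
    return all(computed[name] == golden[name] for name in computed)


calls = []  # records which stages actually ran


def make_runner(name: str) -> Callable[[], dict]:
    def run() -> dict:
        calls.append(name)
        return {"stage": name}
    return run


stage_names = INTERMEDIATE_STAGES + (FINAL_STAGE,)
runners = {name: make_runner(name) for name in stage_names}
golden = {name: {"stage": name} for name in stage_names}

fast = compute_all_stages(runners, skip_intermediate_stages=True)
print(list(fast), overall_match(fast, golden))
```

In this sketch the overall check runs one stage instead of five, while a per-stage test would call `compute_all_stages(runners)` with the default and still exercise everything.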

Measured speedup on one of the large private test conversations

| Test | Before | After | Speedup |
|------|--------|-------|---------|
| test_conversation_regression | 317s | 23s | 13.9x |
| test_conversation_stages_individually | 60s | 32s | 1.9x |

The regression test's ~14x speedup comes from two combined effects: no longer running the pipeline 3x (benchmark), and skipping 4 redundant intermediate stages.
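As a quick sanity check on the reported figures, the speedups can be recomputed from the rounded timings. The dictionary below just restates the measurements above; from rounded seconds the first ratio comes out at about 13.8x, so the reported 13.9x was presumably computed from unrounded timings (an assumption, not something stated in the PR).

```python
# (before, after) wall times in seconds, as reported above.
timings = {
    "test_conversation_regression": (317, 23),
    "test_conversation_stages_individually": (60, 32),
}

# Speedup is simply the before/after ratio.
speedups = {name: before / after for name, (before, after) in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```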

Test plan

  • All 9 public regression tests pass (vw + biodiversity)
  • Private dataset tests pass (--include-local)
  • Timing verified on large private dataset

🤖 Generated with Claude Code

@jucor jucor force-pushed the jc/regression-test-perf branch from b6518b8 to 241ab18 Compare March 11, 2026 12:40
@jucor jucor changed the title [Clj parity PR 3] Speed up regression tests Speed up regression tests Mar 11, 2026
@jucor jucor changed the title Speed up regression tests [Stack 10/10] Speed up regression tests Mar 11, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR speeds up Delphi’s Python regression tests by avoiding unnecessary benchmark reruns and allowing the regression pipeline to skip redundant intermediate computation stages when only an overall match is needed.

Changes:

  • Set ConversationComparer.compare_with_golden() to default benchmark=False (benchmarking remains opt-in).
  • Add skip_intermediate_stages plumbing from compare_with_golden() through to compute_all_stages() / compute_all_stages_with_benchmark().
  • Update the main regression pytest to skip stages 1–4 while keeping the per-stage test unchanged.
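A rough sketch of the opt-in benchmark default described in these changes, assuming a simplified, hypothetical `compare_with_golden()` that takes the pipeline as a callable. The real ConversationComparer method has a different signature; only the `benchmark=False` default and the 3x rerun count are taken from this PR.

```python
import statistics
import time
from typing import Callable


def compare_with_golden(
    run_pipeline: Callable[[], dict],
    golden: dict,
    benchmark: bool = False,  # opt-in, matching the new default
    benchmark_runs: int = 3,  # the "3x" mentioned in the PR description
) -> dict:
    """Return a match verdict, adding timing statistics only when requested."""
    n_runs = benchmark_runs if benchmark else 1
    durations = []
    result = None
    for _ in range(n_runs):
        start = time.perf_counter()
        result = run_pipeline()
        durations.append(time.perf_counter() - start)

    report = {"overall_match": result == golden, "runs": n_runs}
    if benchmark:
        report["mean_s"] = statistics.mean(durations)
        report["pstdev_s"] = statistics.pstdev(durations)
    return report


pipeline_calls = 0


def fake_pipeline() -> dict:
    """Stand-in for the real pipeline; counts how often it is invoked."""
    global pipeline_calls
    pipeline_calls += 1
    return {"clusters": 3}


correctness = compare_with_golden(fake_pipeline, {"clusters": 3})
timed = compare_with_golden(fake_pipeline, {"clusters": 3}, benchmark=True)
print(correctness["runs"], timed["runs"], pipeline_calls)
```

With the default, a correctness check pays for a single pipeline run; callers that want timing statistics pass `benchmark=True` explicitly.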

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
|------|-------------|
| delphi/tests/test_regression.py | Uses skip_intermediate_stages=True for the overall regression test to reduce runtime. |
| delphi/polismath/regression/utils.py | Adds skip_intermediate_stages to compute_all_stages() and threads it through benchmark runs. |
| delphi/polismath/regression/comparer.py | Changes default benchmark behavior and forwards skip_intermediate_stages into stage computation. |
| delphi/docs/PLAN_DISCREPANCY_FIXES.md | Updates documentation to reflect the current stack/PR mapping and naming. |

@jucor jucor changed the title [Stack 10/10] Speed up regression tests [Stack 10/11] Speed up regression tests Mar 11, 2026
@jucor jucor force-pushed the jc/clj-parity-d4-fix branch from cb557b3 to f0516e8 Compare March 13, 2026 13:09
@jucor jucor force-pushed the jc/regression-test-perf branch from 013624a to 19d4a5b Compare March 13, 2026 13:09
@jucor jucor changed the title [Stack 10/11] Speed up regression tests [Stack 10/12] Speed up regression tests Mar 13, 2026
@jucor jucor force-pushed the jc/clj-parity-d4-fix branch from f0516e8 to ebd71ca Compare March 13, 2026 13:46
@jucor jucor force-pushed the jc/regression-test-perf branch from 19d4a5b to 4a57515 Compare March 13, 2026 13:50
@jucor jucor force-pushed the jc/clj-parity-d4-fix branch from ebd71ca to 758355c Compare March 13, 2026 14:13
@jucor jucor changed the title [Stack 10/12] Speed up regression tests [Stack 10/13] Speed up regression tests Mar 13, 2026
@jucor jucor force-pushed the jc/regression-test-perf branch from 4a57515 to 967ffec Compare March 13, 2026 14:13
@jucor jucor force-pushed the jc/clj-parity-d4-fix branch from 758355c to 35d24b1 Compare March 13, 2026 15:56
@jucor jucor force-pushed the jc/regression-test-perf branch from 967ffec to 4b803ca Compare March 13, 2026 15:56
@jucor jucor changed the title [Stack 10/13] Speed up regression tests [Stack 10/15] Speed up regression tests Mar 16, 2026
@jucor jucor force-pushed the jc/clj-parity-d4-fix branch from 35d24b1 to d295389 Compare March 16, 2026 16:04
@jucor jucor force-pushed the jc/regression-test-perf branch from 4b803ca to f4252a0 Compare March 16, 2026 16:04
@jucor jucor changed the title [Stack 10/15] Speed up regression tests [Stack 10/16] Speed up regression tests Mar 16, 2026
@jucor jucor changed the title [Stack 10/16] Speed up regression tests [Stack 10/17] Speed up regression tests Mar 16, 2026
@jucor jucor changed the title [Stack 10/17] Speed up regression tests [Stack 10/24] Speed up regression tests Mar 17, 2026
@jucor jucor changed the title [Stack 10/24] Speed up regression tests [Stack 10/25] Speed up regression tests Mar 17, 2026
@jucor jucor force-pushed the jc/clj-parity-d4-fix branch from d295389 to 7e6ccc1 Compare March 19, 2026 10:43
@jucor jucor force-pushed the jc/regression-test-perf branch from f4252a0 to 3f09ab0 Compare March 19, 2026 10:43
@jucor jucor force-pushed the jc/clj-parity-d4-fix branch from 303fd4e to 081bdb0 Compare March 23, 2026 17:47
@jucor jucor force-pushed the jc/regression-test-perf branch from d519507 to bd78b5e Compare March 23, 2026 17:47
@jucor jucor force-pushed the jc/clj-parity-d4-fix branch from 081bdb0 to 6093159 Compare March 26, 2026 21:24
@jucor jucor force-pushed the jc/regression-test-perf branch from bd78b5e to 478fa73 Compare March 26, 2026 21:24
@jucor jucor force-pushed the jc/clj-parity-d4-fix branch from 6093159 to cca501b Compare March 27, 2026 01:15
@jucor jucor force-pushed the jc/regression-test-perf branch 2 times, most recently from ce0c874 to da4b4de Compare March 27, 2026 02:10
@jucor jucor force-pushed the jc/clj-parity-d4-fix branch 2 times, most recently from 49be8f6 to c620c1e Compare March 27, 2026 10:41
@jucor jucor force-pushed the jc/regression-test-perf branch from da4b4de to a9b000e Compare March 27, 2026 10:41
@jucor jucor changed the title [Stack 10/25] Speed up regression tests [Stack 11/26] Speed up regression tests Mar 30, 2026
@jucor jucor force-pushed the jc/clj-parity-d4-fix branch from c620c1e to 707c63d Compare March 30, 2026 12:48
@jucor jucor force-pushed the jc/regression-test-perf branch from a9b000e to 41dec2d Compare March 30, 2026 12:48
@jucor jucor changed the title [Stack 11/26] Speed up regression tests [Stack 12/27] Speed up regression tests Mar 30, 2026
@jucor jucor force-pushed the jc/clj-parity-d4-fix branch from 707c63d to ddb4e01 Compare March 30, 2026 12:54
@jucor jucor force-pushed the jc/regression-test-perf branch from 41dec2d to b3d7506 Compare March 30, 2026 12:54
@jucor jucor requested a review from Copilot March 30, 2026 16:25
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.


@jucor jucor force-pushed the jc/clj-parity-d4-fix branch from ddb4e01 to 8fa1cf3 Compare March 30, 2026 16:49
@jucor jucor force-pushed the jc/regression-test-perf branch from b3d7506 to 5611792 Compare March 30, 2026 16:49
jucor and others added 2 commits March 30, 2026 17:58
…nd synthetic test tid range

- Fix error message "No full blob" → "No incremental blob" in datasets.py
- Remove hardcoded blob_type='incremental' in test helpers (use default)
- Update docstrings from '*-full' to '*-incremental'/'*-cold_start'
- Fix synthetic D2c test: range(5,11) → range(4,10) for valid tids
- Add comment explaining xdist_group strategy for blob variants

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two changes that reduce regression test wall time significantly:

1. Default benchmark=False in compare_with_golden(). Benchmark mode
   runs the full pipeline 3x for timing statistics — useful for perf
   analysis but unnecessary for correctness checks. The regression_comparer
   script already had benchmark as opt-in (--benchmark flag), so this
   aligns the default. Callers that need benchmarking can still pass
   benchmark=True explicitly.

2. Add skip_intermediate_stages parameter to compute_all_stages().
   test_conversation_regression now skips stages 1-4 (empty, load-only,
   PCA-only, PCA+clustering) since it only checks overall_match.
   test_conversation_stages_individually still runs all stages.
   Skipped stages are silently ignored in the comparison (existing
   behavior for missing stages).

For large private datasets, this cuts test_conversation_regression time
from ~5x to ~1x of a single pipeline run (e.g. one large conversation
went from ~317s to ~60s).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jucor jucor force-pushed the jc/regression-test-perf branch from 5611792 to 6a96d48 Compare March 30, 2026 17:05
@jucor jucor force-pushed the jc/clj-parity-d4-fix branch from 8fa1cf3 to 5893382 Compare March 30, 2026 17:05
@github-actions

Delphi Coverage Report

| File | Stmts | Miss | Cover |
|------|-------|------|-------|
| __init__.py | 2 | 0 | 100% |
| benchmarks/bench_pca.py | 76 | 76 | 0% |
| benchmarks/bench_repness.py | 81 | 81 | 0% |
| benchmarks/bench_update_votes.py | 38 | 38 | 0% |
| benchmarks/benchmark_utils.py | 34 | 34 | 0% |
| components/__init__.py | 1 | 0 | 100% |
| components/config.py | 165 | 133 | 19% |
| conversation/__init__.py | 2 | 0 | 100% |
| conversation/conversation.py | 1117 | 328 | 71% |
| conversation/manager.py | 131 | 42 | 68% |
| database/__init__.py | 1 | 0 | 100% |
| database/dynamodb.py | 387 | 234 | 40% |
| database/postgres.py | 305 | 205 | 33% |
| pca_kmeans_rep/__init__.py | 5 | 0 | 100% |
| pca_kmeans_rep/clusters.py | 257 | 22 | 91% |
| pca_kmeans_rep/corr.py | 98 | 17 | 83% |
| pca_kmeans_rep/pca.py | 52 | 16 | 69% |
| pca_kmeans_rep/repness.py | 361 | 51 | 86% |
| pca_kmeans_rep/stats.py | 107 | 22 | 79% |
| regression/__init__.py | 4 | 0 | 100% |
| regression/clojure_comparer.py | 188 | 17 | 91% |
| regression/comparer.py | 887 | 720 | 19% |
| regression/datasets.py | 135 | 27 | 80% |
| regression/recorder.py | 36 | 27 | 25% |
| regression/utils.py | 138 | 119 | 14% |
| run_math_pipeline.py | 260 | 114 | 56% |
| umap_narrative/500_generate_embedding_umap_cluster.py | 210 | 109 | 48% |
| umap_narrative/501_calculate_comment_extremity.py | 112 | 54 | 52% |
| umap_narrative/502_calculate_priorities.py | 135 | 135 | 0% |
| umap_narrative/700_datamapplot_for_layer.py | 502 | 502 | 0% |
| umap_narrative/701_static_datamapplot_for_layer.py | 310 | 310 | 0% |
| umap_narrative/702_consensus_divisive_datamapplot.py | 432 | 432 | 0% |
| umap_narrative/801_narrative_report_batch.py | 785 | 785 | 0% |
| umap_narrative/802_process_batch_results.py | 265 | 265 | 0% |
| umap_narrative/803_check_batch_status.py | 175 | 175 | 0% |
| umap_narrative/llm_factory_constructor/__init__.py | 2 | 2 | 0% |
| umap_narrative/llm_factory_constructor/model_provider.py | 157 | 157 | 0% |
| umap_narrative/polismath_commentgraph/__init__.py | 1 | 0 | 100% |
| umap_narrative/polismath_commentgraph/cli.py | 270 | 270 | 0% |
| umap_narrative/polismath_commentgraph/core/__init__.py | 3 | 3 | 0% |
| umap_narrative/polismath_commentgraph/core/clustering.py | 108 | 108 | 0% |
| umap_narrative/polismath_commentgraph/core/embedding.py | 104 | 104 | 0% |
| umap_narrative/polismath_commentgraph/lambda_handler.py | 219 | 219 | 0% |
| umap_narrative/polismath_commentgraph/schemas/__init__.py | 2 | 0 | 100% |
| umap_narrative/polismath_commentgraph/schemas/dynamo_models.py | 160 | 9 | 94% |
| umap_narrative/polismath_commentgraph/tests/conftest.py | 17 | 17 | 0% |
| umap_narrative/polismath_commentgraph/tests/test_clustering.py | 74 | 74 | 0% |
| umap_narrative/polismath_commentgraph/tests/test_embedding.py | 55 | 55 | 0% |
| umap_narrative/polismath_commentgraph/tests/test_storage.py | 87 | 87 | 0% |
| umap_narrative/polismath_commentgraph/utils/__init__.py | 3 | 0 | 100% |
| umap_narrative/polismath_commentgraph/utils/converter.py | 283 | 237 | 16% |
| umap_narrative/polismath_commentgraph/utils/group_data.py | 354 | 336 | 5% |
| umap_narrative/polismath_commentgraph/utils/storage.py | 584 | 477 | 18% |
| umap_narrative/reset_conversation.py | 159 | 50 | 69% |
| umap_narrative/run_pipeline.py | 453 | 312 | 31% |
| utils/general.py | 62 | 41 | 34% |
| Total | 10951 | 7648 | 30% |

@jucor
Collaborator Author

jucor commented Mar 30, 2026

Superseded by spr-managed PR stack. See the new stack starting at #2508.

@jucor jucor closed this Mar 30, 2026