[Stack 12/27] Speed up regression tests by jucor · Pull Request #2436 · compdemocracy/polis

jucor · 2026-03-11T12:33:09Z

Summary

Stacked on #2435 (Fix D4: pseudocount formula). Please review and merge #2435 first.
Next in stack: #2437 (Vectorize participant info computation (3-15x speedup))

Default benchmark=False in compare_with_golden() — benchmark mode ran the pipeline 3x for timing statistics, unnecessary for correctness checks. The regression_comparer.py script already had --benchmark as opt-in, so this aligns the default.
Add skip_intermediate_stages parameter to compute_all_stages() — test_conversation_regression now skips stages 1-4 (empty, load-only, PCA-only, PCA+clustering) since it only checks overall_match. test_conversation_stages_individually still runs all stages for granular failure detection.
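
As a rough illustration of the second change, the sketch below shows how a `skip_intermediate_stages` flag can bypass the early stages entirely. It is not the code in `delphi/polismath/regression/utils.py`: the stage names, the `stage_fns` argument, and the `INTERMEDIATE_STAGES` / `FINAL_STAGE` constants are assumptions made for the example.

```python
from typing import Any, Callable, Dict

# Hypothetical stage names, loosely following the description above
# (empty, load-only, PCA-only, PCA+clustering, then the full pipeline).
INTERMEDIATE_STAGES = ("empty", "after_load", "after_pca", "after_clustering")
FINAL_STAGE = "full"


def compute_all_stages(
    stage_fns: Dict[str, Callable[[], Any]],
    skip_intermediate_stages: bool = False,
) -> Dict[str, Any]:
    """Run each stage function and collect its output by stage name.

    When skip_intermediate_stages is True, the intermediate stages are
    never executed, so they add no runtime and are absent from the result.
    """
    results: Dict[str, Any] = {}
    for name, fn in stage_fns.items():
        if skip_intermediate_stages and name in INTERMEDIATE_STAGES:
            continue  # stage is skipped, not computed and discarded
        results[name] = fn()
    return results
```

With the flag set, the returned dict holds only the final stage, which is all a test that checks `overall_match` needs under these assumptions.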

Measured speedup on one of the large private test conversations

Test	Before	After	Speedup
`test_conversation_regression`	317s	23s	13.9x
`test_conversation_stages_individually`	60s	32s	1.9x

The regression test's ~14x speedup comes from two combined effects: no longer running the pipeline 3x (benchmark), and skipping 4 redundant intermediate stages.
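
The first of these effects can be pictured with a minimal sketch of opt-in benchmarking: one run when only correctness matters, several timed runs when statistics are requested. The `run_pipeline` helper, its `runs` parameter, and the returned keys are hypothetical and are not taken from `compare_with_golden()`.

```python
import statistics
import time
from typing import Any, Callable, Dict


def run_pipeline(
    pipeline: Callable[[], Any],
    benchmark: bool = False,
    runs: int = 3,
) -> Dict[str, Any]:
    """Run once for correctness, or `runs` times when benchmarking."""
    n_runs = runs if benchmark else 1
    durations = []
    result = None
    for _ in range(n_runs):
        start = time.perf_counter()
        result = pipeline()
        durations.append(time.perf_counter() - start)

    out: Dict[str, Any] = {"result": result, "n_runs": n_runs}
    if benchmark:
        # Timing statistics are only meaningful with repeated runs.
        out["mean_s"] = statistics.mean(durations)
        out["stdev_s"] = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return out
```

With `benchmark=False` as the default, a correctness check pays for exactly one pipeline execution; callers that want timing data pass `benchmark=True` explicitly.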

Test plan

All 9 public regression tests pass (vw + biodiversity)
Private dataset tests pass (--include-local)
Timing verified on large private dataset

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR speeds up Delphi’s Python regression tests by avoiding unnecessary benchmark reruns and allowing the regression pipeline to skip redundant intermediate computation stages when only an overall match is needed.

Changes:

Set ConversationComparer.compare_with_golden() to default benchmark=False (benchmarking remains opt-in).
Add skip_intermediate_stages plumbing through compare_with_golden() → compute_all_stages() / compute_all_stages_with_benchmark().
Update the main regression pytest to skip stages 1–4 while keeping the per-stage test unchanged.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
`delphi/tests/test_regression.py`	Uses `skip_intermediate_stages=True` for the overall regression test to reduce runtime.
`delphi/polismath/regression/utils.py`	Adds `skip_intermediate_stages` to `compute_all_stages()` and threads it through benchmark runs.
`delphi/polismath/regression/comparer.py`	Changes default benchmark behavior and forwards `skip_intermediate_stages` into stage computation.
`delphi/docs/PLAN_DISCREPANCY_FIXES.md`	Updates documentation to reflect the current stack/PR mapping and naming.

delphi/polismath/regression/comparer.py

delphi/polismath/regression/utils.py

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

…nd synthetic test tid range

- Fix error message "No full blob" → "No incremental blob" in datasets.py
- Remove hardcoded blob_type='incremental' in test helpers (use default)
- Update docstrings from '*-full' to '*-incremental'/'*-cold_start'
- Fix synthetic D2c test: range(5,11) → range(4,10) for valid tids
- Add comment explaining xdist_group strategy for blob variants

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Two changes that reduce regression test wall time significantly:

1. Default benchmark=False in compare_with_golden(). Benchmark mode runs the full pipeline 3x for timing statistics, which is useful for perf analysis but unnecessary for correctness checks. The regression_comparer script already had benchmark as opt-in (--benchmark flag), so this aligns the default. Callers that need benchmarking can still pass benchmark=True explicitly.

2. Add skip_intermediate_stages parameter to compute_all_stages(). test_conversation_regression now skips stages 1-4 (empty, load-only, PCA-only, PCA+clustering) since it only checks overall_match. test_conversation_stages_individually still runs all stages. Skipped stages are silently ignored in the comparison (existing behavior for missing stages).

For large private datasets, this cuts test_conversation_regression time from ~5x to ~1x of a single pipeline run (e.g. one large conversation went from ~317s to ~60s).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
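
The note that skipped stages are silently ignored in the comparison can be sketched as follows. This is an illustration only: `compare_stages`, its arguments, and the equality-based check are assumptions, not the logic in `delphi/polismath/regression/comparer.py`.

```python
from typing import Any, Dict


def compare_stages(golden: Dict[str, Any], computed: Dict[str, Any]) -> Dict[str, Any]:
    """Compare only the stages present in `computed`.

    Stages recorded in the golden snapshot but absent from `computed`
    (for example because they were skipped) are left out of the verdict.
    """
    per_stage = {
        name: computed[name] == expected
        for name, expected in golden.items()
        if name in computed
    }
    return {"stages": per_stage, "overall_match": all(per_stage.values())}
```

One caveat of this simplified version: if no stage at all is computed, `all()` over an empty collection is `True`, so a real comparer would want to require at least the final stage.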

github-actions · 2026-03-30T17:56:55Z

Delphi Coverage Report

File	Stmts	Miss	Cover
__init__.py	2	0	100%
benchmarks/bench_pca.py	76	76	0%
benchmarks/bench_repness.py	81	81	0%
benchmarks/bench_update_votes.py	38	38	0%
benchmarks/benchmark_utils.py	34	34	0%
components/__init__.py	1	0	100%
components/config.py	165	133	19%
conversation/__init__.py	2	0	100%
conversation/conversation.py	1117	328	71%
conversation/manager.py	131	42	68%
database/__init__.py	1	0	100%
database/dynamodb.py	387	234	40%
database/postgres.py	305	205	33%
pca_kmeans_rep/__init__.py	5	0	100%
pca_kmeans_rep/clusters.py	257	22	91%
pca_kmeans_rep/corr.py	98	17	83%
pca_kmeans_rep/pca.py	52	16	69%
pca_kmeans_rep/repness.py	361	51	86%
pca_kmeans_rep/stats.py	107	22	79%
regression/__init__.py	4	0	100%
regression/clojure_comparer.py	188	17	91%
regression/comparer.py	887	720	19%
regression/datasets.py	135	27	80%
regression/recorder.py	36	27	25%
regression/utils.py	138	119	14%
run_math_pipeline.py	260	114	56%
umap_narrative/500_generate_embedding_umap_cluster.py	210	109	48%
umap_narrative/501_calculate_comment_extremity.py	112	54	52%
umap_narrative/502_calculate_priorities.py	135	135	0%
umap_narrative/700_datamapplot_for_layer.py	502	502	0%
umap_narrative/701_static_datamapplot_for_layer.py	310	310	0%
umap_narrative/702_consensus_divisive_datamapplot.py	432	432	0%
umap_narrative/801_narrative_report_batch.py	785	785	0%
umap_narrative/802_process_batch_results.py	265	265	0%
umap_narrative/803_check_batch_status.py	175	175	0%
umap_narrative/llm_factory_constructor/__init__.py	2	2	0%
umap_narrative/llm_factory_constructor/model_provider.py	157	157	0%
umap_narrative/polismath_commentgraph/__init__.py	1	0	100%
umap_narrative/polismath_commentgraph/cli.py	270	270	0%
umap_narrative/polismath_commentgraph/core/__init__.py	3	3	0%
umap_narrative/polismath_commentgraph/core/clustering.py	108	108	0%
umap_narrative/polismath_commentgraph/core/embedding.py	104	104	0%
umap_narrative/polismath_commentgraph/lambda_handler.py	219	219	0%
umap_narrative/polismath_commentgraph/schemas/__init__.py	2	0	100%
umap_narrative/polismath_commentgraph/schemas/dynamo_models.py	160	9	94%
umap_narrative/polismath_commentgraph/tests/conftest.py	17	17	0%
umap_narrative/polismath_commentgraph/tests/test_clustering.py	74	74	0%
umap_narrative/polismath_commentgraph/tests/test_embedding.py	55	55	0%
umap_narrative/polismath_commentgraph/tests/test_storage.py	87	87	0%
umap_narrative/polismath_commentgraph/utils/__init__.py	3	0	100%
umap_narrative/polismath_commentgraph/utils/converter.py	283	237	16%
umap_narrative/polismath_commentgraph/utils/group_data.py	354	336	5%
umap_narrative/polismath_commentgraph/utils/storage.py	584	477	18%
umap_narrative/reset_conversation.py	159	50	69%
umap_narrative/run_pipeline.py	453	312	31%
utils/general.py	62	41	34%
Total	10951	7648	30%

jucor · 2026-03-30T22:54:35Z

Superseded by spr-managed PR stack. See the new stack starting at #2508.

jucor force-pushed the jc/regression-test-perf branch from b6518b8 to 241ab18 Compare March 11, 2026 12:40

jucor changed the title ~~[Clj parity PR 3] Speed up regression tests~~ Speed up regression tests Mar 11, 2026

jucor changed the title ~~Speed up regression tests~~ [Stack 10/10] Speed up regression tests Mar 11, 2026

jucor mentioned this pull request Mar 11, 2026

[Stack 11/27] Fix D4: pseudocount formula #2435

Closed

5 tasks

jucor requested review from ballPointPenguin, Copilot and whilo March 11, 2026 12:56

Copilot started reviewing on behalf of jucor March 11, 2026 12:56

Copilot AI reviewed Mar 11, 2026

delphi/polismath/regression/comparer.py (review thread resolved)

delphi/polismath/regression/utils.py (review thread resolved)

jucor changed the title ~~[Stack 10/10] Speed up regression tests~~ [Stack 10/11] Speed up regression tests Mar 11, 2026

jucor force-pushed the jc/clj-parity-d4-fix branch from cb557b3 to f0516e8 Compare March 13, 2026 13:09

jucor force-pushed the jc/regression-test-perf branch from 013624a to 19d4a5b Compare March 13, 2026 13:09

jucor changed the title ~~[Stack 10/11] Speed up regression tests~~ [Stack 10/12] Speed up regression tests Mar 13, 2026

jucor mentioned this pull request Mar 13, 2026

[Stack 13/27] Vectorize participant info computation (3-15x speedup) #2437

Closed

5 tasks

jucor force-pushed the jc/clj-parity-d4-fix branch from f0516e8 to ebd71ca Compare March 13, 2026 13:46

jucor force-pushed the jc/regression-test-perf branch from 19d4a5b to 4a57515 Compare March 13, 2026 13:50

jucor force-pushed the jc/clj-parity-d4-fix branch from ebd71ca to 758355c Compare March 13, 2026 14:13

jucor changed the title ~~[Stack 10/12] Speed up regression tests~~ [Stack 10/13] Speed up regression tests Mar 13, 2026

jucor force-pushed the jc/regression-test-perf branch from 4a57515 to 967ffec Compare March 13, 2026 14:13

jucor force-pushed the jc/clj-parity-d4-fix branch from 758355c to 35d24b1 Compare March 13, 2026 15:56

jucor force-pushed the jc/regression-test-perf branch from 967ffec to 4b803ca Compare March 13, 2026 15:56

jucor changed the title ~~[Stack 10/13] Speed up regression tests~~ [Stack 10/15] Speed up regression tests Mar 16, 2026

jucor force-pushed the jc/clj-parity-d4-fix branch from 35d24b1 to d295389 Compare March 16, 2026 16:04

jucor force-pushed the jc/regression-test-perf branch from 4b803ca to f4252a0 Compare March 16, 2026 16:04

jucor changed the title ~~[Stack 10/15] Speed up regression tests~~ [Stack 10/16] Speed up regression tests Mar 16, 2026

jucor changed the title ~~[Stack 10/16] Speed up regression tests~~ [Stack 10/17] Speed up regression tests Mar 16, 2026

jucor changed the title ~~[Stack 10/17] Speed up regression tests~~ [Stack 10/24] Speed up regression tests Mar 17, 2026

jucor changed the title ~~[Stack 10/24] Speed up regression tests~~ [Stack 10/25] Speed up regression tests Mar 17, 2026

jucor force-pushed the jc/clj-parity-d4-fix branch from d295389 to 7e6ccc1 Compare March 19, 2026 10:43

jucor force-pushed the jc/regression-test-perf branch from f4252a0 to 3f09ab0 Compare March 19, 2026 10:43

jucor force-pushed the jc/clj-parity-d4-fix branch from 303fd4e to 081bdb0 Compare March 23, 2026 17:47

jucor force-pushed the jc/regression-test-perf branch from d519507 to bd78b5e Compare March 23, 2026 17:47

jucor force-pushed the jc/clj-parity-d4-fix branch from 081bdb0 to 6093159 Compare March 26, 2026 21:24

jucor force-pushed the jc/regression-test-perf branch from bd78b5e to 478fa73 Compare March 26, 2026 21:24

jucor force-pushed the jc/clj-parity-d4-fix branch from 6093159 to cca501b Compare March 27, 2026 01:15

jucor force-pushed the jc/regression-test-perf branch 2 times, most recently from ce0c874 to da4b4de Compare March 27, 2026 02:10

jucor force-pushed the jc/clj-parity-d4-fix branch 2 times, most recently from 49be8f6 to c620c1e Compare March 27, 2026 10:41

jucor force-pushed the jc/regression-test-perf branch from da4b4de to a9b000e Compare March 27, 2026 10:41

jucor changed the title ~~[Stack 10/25] Speed up regression tests~~ [Stack 11/26] Speed up regression tests Mar 30, 2026

jucor force-pushed the jc/clj-parity-d4-fix branch from c620c1e to 707c63d Compare March 30, 2026 12:48

jucor force-pushed the jc/regression-test-perf branch from a9b000e to 41dec2d Compare March 30, 2026 12:48

jucor changed the title ~~[Stack 11/26] Speed up regression tests~~ [Stack 12/27] Speed up regression tests Mar 30, 2026

jucor force-pushed the jc/clj-parity-d4-fix branch from 707c63d to ddb4e01 Compare March 30, 2026 12:54

jucor force-pushed the jc/regression-test-perf branch from 41dec2d to b3d7506 Compare March 30, 2026 12:54

jucor requested a review from Copilot March 30, 2026 16:25

Copilot started reviewing on behalf of jucor March 30, 2026 16:26

Copilot AI reviewed Mar 30, 2026

jucor force-pushed the jc/clj-parity-d4-fix branch from ddb4e01 to 8fa1cf3 Compare March 30, 2026 16:49

jucor force-pushed the jc/regression-test-perf branch from b3d7506 to 5611792 Compare March 30, 2026 16:49

jucor and others added 2 commits March 30, 2026 17:58

jucor force-pushed the jc/regression-test-perf branch from 5611792 to 6a96d48 Compare March 30, 2026 17:05

jucor force-pushed the jc/clj-parity-d4-fix branch from 8fa1cf3 to 5893382 Compare March 30, 2026 17:05

This was referenced Mar 30, 2026

IGNORE -- crash from spr #2496

Closed

IGNORE -- crash from spr #2498

Closed

jucor closed this Mar 30, 2026

jucor wants to merge 2 commits into jc/clj-parity-d4-fix from jc/regression-test-perf
