[Stack 8/17] Speed up regression tests by jucor · Pull Request #2515 · compdemocracy/polis

jucor · 2026-03-30T22:25:14Z

Summary

- Default `benchmark=False` in `compare_with_golden()`: benchmark mode ran the pipeline 3x for timing statistics, which is unnecessary for correctness checks. The `regression_comparer.py` script already had `--benchmark` as opt-in, so this aligns the default.
- Add a `skip_intermediate_stages` parameter to `compute_all_stages()`: `test_conversation_regression` now skips stages 1-4 (empty, load-only, PCA-only, PCA+clustering) since it only checks `overall_match`. `test_conversation_stages_individually` still runs all stages for granular failure detection.
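The effect of the two changes above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names match the description, but the signatures, the stage names, the `run_stage` callback, and the returned dictionary are all hypothetical stand-ins chosen so the example is self-contained.

```python
# Hypothetical stage names; the PR description lists stages 1-4 only informally.
INTERMEDIATE_STAGES = ["empty", "load_only", "pca_only", "pca_clustering"]
FINAL_STAGE = "full_pipeline"


def compute_all_stages(run_stage, skip_intermediate_stages=False):
    """Run the pipeline stages, optionally skipping the four intermediate ones."""
    if skip_intermediate_stages:
        stages = [FINAL_STAGE]
    else:
        stages = INTERMEDIATE_STAGES + [FINAL_STAGE]
    return {name: run_stage(name) for name in stages}


def compare_with_golden(run_stage, golden, benchmark=False,
                        skip_intermediate_stages=False):
    """Compare the final stage against a golden snapshot.

    With benchmark=True the whole pipeline runs 3 times (for timing
    statistics); with the new default it runs once.
    """
    pipeline_runs = 3 if benchmark else 1
    results = {}
    for _ in range(pipeline_runs):
        results = compute_all_stages(run_stage, skip_intermediate_stages)
    return {
        "overall_match": results[FINAL_STAGE] == golden[FINAL_STAGE],
        "stages_run": list(results),
        "pipeline_runs": pipeline_runs,
    }


# Count stage executions with a fake stage runner (an assumption for the demo).
calls = []


def fake_stage(name):
    calls.append(name)
    return f"output:{name}"


golden = {FINAL_STAGE: "output:full_pipeline"}

fast = compare_with_golden(fake_stage, golden, skip_intermediate_stages=True)
fast_calls = len(calls)   # one stage, one run

calls.clear()
slow = compare_with_golden(fake_stage, golden, benchmark=True)
slow_calls = len(calls)   # five stages, three runs
```

In this toy model the old behaviour executes 15 stage runs and the new one executes 1. The real stages differ in cost, so the count is not a prediction of wall-clock speedup.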

Measured speedup on one of the large private test conversations

| Test | Before | After | Speedup |
|------|--------|-------|---------|
| `test_conversation_regression` | 317s | 23s | 13.9x |
| `test_conversation_stages_individually` | 60s | 32s | 1.9x |

The regression test's ~14x speedup comes from two combined effects: no longer running the pipeline 3x (benchmark), and skipping 4 redundant intermediate stages.
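As a quick check of the arithmetic, the speedups can be recomputed from the rounded timings in the table. The dictionary layout below is only for illustration. The rounded figures give about 13.8x for the regression test, slightly under the reported 13.9x; one plausible explanation (an assumption, not stated in the PR) is that the reported ratio was computed from unrounded timings.

```python
# (before, after) wall-clock times in seconds, as rounded in the table.
timings = {
    "test_conversation_regression": (317, 23),
    "test_conversation_stages_individually": (60, 32),
}

# Speedup is simply time before divided by time after.
speedups = {name: before / after for name, (before, after) in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```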

Test plan

- [x] All 9 public regression tests pass (vw + biodiversity)
- [x] Private dataset tests pass (`--include-local`)
- [x] Timing verified on large private dataset

🤖 Generated with Claude Code

Squashed commits

- Address Copilot review: fix stale terminology, hardcoded blob_type, and synthetic test tid range
- Speed up regression tests: disable benchmark, skip intermediate stages

commit-id:f39f3218

Stack:

⚠️ Part of a stack created by spr. Do not merge manually using the UI - doing so may have unexpected results.


github-actions · 2026-03-31T00:58:20Z

Delphi Coverage Report

| File | Stmts | Miss | Cover |
|------|-------|------|-------|
| `__init__.py` | 2 | 0 | 100% |
| `benchmarks/bench_pca.py` | 76 | 76 | 0% |
| `benchmarks/bench_repness.py` | 81 | 81 | 0% |
| `benchmarks/bench_update_votes.py` | 38 | 38 | 0% |
| `benchmarks/benchmark_utils.py` | 34 | 34 | 0% |
| `components/__init__.py` | 1 | 0 | 100% |
| `components/config.py` | 165 | 133 | 19% |
| `conversation/__init__.py` | 2 | 0 | 100% |
| `conversation/conversation.py` | 1117 | 328 | 71% |
| `conversation/manager.py` | 131 | 42 | 68% |
| `database/__init__.py` | 1 | 0 | 100% |
| `database/dynamodb.py` | 387 | 234 | 40% |
| `database/postgres.py` | 305 | 205 | 33% |
| `pca_kmeans_rep/__init__.py` | 5 | 0 | 100% |
| `pca_kmeans_rep/clusters.py` | 257 | 22 | 91% |
| `pca_kmeans_rep/corr.py` | 98 | 17 | 83% |
| `pca_kmeans_rep/pca.py` | 52 | 16 | 69% |
| `pca_kmeans_rep/repness.py` | 361 | 51 | 86% |
| `pca_kmeans_rep/stats.py` | 107 | 22 | 79% |
| `regression/__init__.py` | 4 | 0 | 100% |
| `regression/clojure_comparer.py` | 188 | 17 | 91% |
| `regression/comparer.py` | 887 | 720 | 19% |
| `regression/datasets.py` | 135 | 27 | 80% |
| `regression/recorder.py` | 36 | 27 | 25% |
| `regression/utils.py` | 138 | 119 | 14% |
| `run_math_pipeline.py` | 260 | 114 | 56% |
| `umap_narrative/500_generate_embedding_umap_cluster.py` | 210 | 109 | 48% |
| `umap_narrative/501_calculate_comment_extremity.py` | 112 | 54 | 52% |
| `umap_narrative/502_calculate_priorities.py` | 135 | 135 | 0% |
| `umap_narrative/700_datamapplot_for_layer.py` | 502 | 502 | 0% |
| `umap_narrative/701_static_datamapplot_for_layer.py` | 310 | 310 | 0% |
| `umap_narrative/702_consensus_divisive_datamapplot.py` | 432 | 432 | 0% |
| `umap_narrative/801_narrative_report_batch.py` | 785 | 785 | 0% |
| `umap_narrative/802_process_batch_results.py` | 265 | 265 | 0% |
| `umap_narrative/803_check_batch_status.py` | 175 | 175 | 0% |
| `umap_narrative/llm_factory_constructor/__init__.py` | 2 | 2 | 0% |
| `umap_narrative/llm_factory_constructor/model_provider.py` | 157 | 157 | 0% |
| `umap_narrative/polismath_commentgraph/__init__.py` | 1 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/cli.py` | 270 | 270 | 0% |
| `umap_narrative/polismath_commentgraph/core/__init__.py` | 3 | 3 | 0% |
| `umap_narrative/polismath_commentgraph/core/clustering.py` | 108 | 108 | 0% |
| `umap_narrative/polismath_commentgraph/core/embedding.py` | 104 | 104 | 0% |
| `umap_narrative/polismath_commentgraph/lambda_handler.py` | 219 | 219 | 0% |
| `umap_narrative/polismath_commentgraph/schemas/__init__.py` | 2 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/schemas/dynamo_models.py` | 160 | 9 | 94% |
| `umap_narrative/polismath_commentgraph/tests/conftest.py` | 17 | 17 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_clustering.py` | 74 | 74 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_embedding.py` | 55 | 55 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_storage.py` | 87 | 87 | 0% |
| `umap_narrative/polismath_commentgraph/utils/__init__.py` | 3 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/utils/converter.py` | 283 | 237 | 16% |
| `umap_narrative/polismath_commentgraph/utils/group_data.py` | 354 | 336 | 5% |
| `umap_narrative/polismath_commentgraph/utils/storage.py` | 584 | 477 | 18% |
| `umap_narrative/reset_conversation.py` | 159 | 50 | 69% |
| `umap_narrative/run_pipeline.py` | 453 | 312 | 31% |
| `utils/general.py` | 62 | 41 | 34% |
| Total | 10951 | 7648 | 30% |

jucor changed the title ~~Speed up regression tests~~ [Stack 8/17] Speed up regression tests Mar 30, 2026

jucor force-pushed the spr/edge/f39f3218 branch from 591e196 to 510205d Compare March 30, 2026 22:39

jucor force-pushed the spr/edge/6ae3ee43 branch from 4ad6046 to 603f0ac Compare March 30, 2026 22:47

jucor force-pushed the spr/edge/f39f3218 branch from 510205d to b1ebec4 Compare March 30, 2026 22:47

jucor force-pushed the spr/edge/f39f3218 branch from b1ebec4 to 8637399 Compare March 31, 2026 00:35

jucor force-pushed the spr/edge/6ae3ee43 branch from 603f0ac to b9dcc89 Compare March 31, 2026 00:35

jucor wants to merge 1 commit into spr/edge/6ae3ee43 from spr/edge/f39f3218
