
[Stack 8/17] Speed up regression tests#2515

Open
jucor wants to merge 1 commit into spr/edge/6ae3ee43 from spr/edge/f39f3218

jucor (Collaborator) commented Mar 30, 2026

## Summary

- Default `benchmark=False` in `compare_with_golden()`. Benchmark mode ran the pipeline 3x to collect timing statistics, which correctness checks do not need. The `regression_comparer.py` script already made `--benchmark` opt-in, so this aligns the default with it.
- Add a `skip_intermediate_stages` parameter to `compute_all_stages()`. `test_conversation_regression` now skips stages 1-4 (empty, load-only, PCA-only, PCA+clustering) because it only checks `overall_match`. `test_conversation_stages_individually` still runs all stages for granular failure detection.
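The two changes can be pictured with the minimal sketch below. It is a hypothetical illustration, not the code in this PR: the stage names, the `run_stage` callback, the `golden` dict and both function signatures are assumptions; only the names `compare_with_golden()`, `compute_all_stages()`, `benchmark`, `skip_intermediate_stages` and `overall_match` come from the description.

```python
from typing import Callable, Dict, Sequence

# Hypothetical stage names: the first four mirror the intermediate stages
# listed above; "full" stands in for whatever final stage(s) the real
# pipeline computes.
ALL_STAGES: Sequence[str] = ("empty", "load_only", "pca_only", "pca_clustering", "full")
INTERMEDIATE_STAGES = frozenset(ALL_STAGES[:4])


def compute_all_stages(
    run_stage: Callable[[str], dict],
    stages: Sequence[str] = ALL_STAGES,
    skip_intermediate_stages: bool = False,
) -> Dict[str, dict]:
    """Run each stage in order, optionally skipping the intermediate ones."""
    results: Dict[str, dict] = {}
    for name in stages:
        if skip_intermediate_stages and name in INTERMEDIATE_STAGES:
            continue  # an overall_match check only needs the final stage
        results[name] = run_stage(name)
    return results


def compare_with_golden(
    run_stage: Callable[[str], dict],
    golden: Dict[str, dict],
    benchmark: bool = False,  # new default: one run is enough for correctness
    skip_intermediate_stages: bool = False,
) -> bool:
    """Return overall_match: True if every computed stage equals its golden output."""
    # Benchmark mode repeats the whole computation 3x for timing statistics
    # (the timing itself is omitted from this sketch).
    runs = 3 if benchmark else 1
    results: Dict[str, dict] = {}
    for _ in range(runs):
        results = compute_all_stages(
            run_stage, skip_intermediate_stages=skip_intermediate_stages
        )
    return all(golden.get(name) == output for name, output in results.items())
```

With a stub `run_stage` that records each call, the defaults run all five hypothetical stages once, while `benchmark=True, skip_intermediate_stages=True` runs only the final stage, three times. That is the combination this PR moves away from for the regression test.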

### Measured speedup on one of the large private test conversations

| Test | Before | After | Speedup |
|------|--------|-------|---------|
| `test_conversation_regression` | 317s | 23s | **13.9x** |
| `test_conversation_stages_individually` | 60s | 32s | **1.9x** |

The regression test's ~14x speedup comes from two combined effects: it no longer runs the pipeline 3x (benchmark mode), and it skips the 4 redundant intermediate stages.

## Test plan

- [x] All 9 public regression tests pass (vw + biodiversity)
- [x] Private dataset tests pass (`--include-local`)
- [x] Timing verified on large private dataset

🤖 Generated with Claude Code

## Squashed commits

- Address Copilot review: fix stale terminology, hardcoded `blob_type`, and synthetic test `tid` range
- Speed up regression tests: disable benchmark, skip intermediate stages

commit-id:f39f3218


Stack:


⚠️ Part of a stack created by spr. Do not merge manually using the UI - doing so may have unexpected results.

jucor changed the title from "Speed up regression tests" to "[Stack 8/17] Speed up regression tests" on Mar 30, 2026
jucor force-pushed the spr/edge/f39f3218 branch from 591e196 to 510205d on March 30, 2026 22:39
jucor force-pushed the spr/edge/6ae3ee43 branch from 4ad6046 to 603f0ac on March 30, 2026 22:47
jucor force-pushed the spr/edge/f39f3218 branch from 510205d to b1ebec4 on March 30, 2026 22:47
jucor force-pushed the spr/edge/f39f3218 branch from b1ebec4 to 8637399 on March 31, 2026 00:35
jucor force-pushed the spr/edge/6ae3ee43 branch from 603f0ac to b9dcc89 on March 31, 2026 00:35
github-actions commented:

## Delphi Coverage Report

| File | Stmts | Miss | Cover |
|------|------:|-----:|------:|
| `__init__.py` | 2 | 0 | 100% |
| `benchmarks/bench_pca.py` | 76 | 76 | 0% |
| `benchmarks/bench_repness.py` | 81 | 81 | 0% |
| `benchmarks/bench_update_votes.py` | 38 | 38 | 0% |
| `benchmarks/benchmark_utils.py` | 34 | 34 | 0% |
| `components/__init__.py` | 1 | 0 | 100% |
| `components/config.py` | 165 | 133 | 19% |
| `conversation/__init__.py` | 2 | 0 | 100% |
| `conversation/conversation.py` | 1117 | 328 | 71% |
| `conversation/manager.py` | 131 | 42 | 68% |
| `database/__init__.py` | 1 | 0 | 100% |
| `database/dynamodb.py` | 387 | 234 | 40% |
| `database/postgres.py` | 305 | 205 | 33% |
| `pca_kmeans_rep/__init__.py` | 5 | 0 | 100% |
| `pca_kmeans_rep/clusters.py` | 257 | 22 | 91% |
| `pca_kmeans_rep/corr.py` | 98 | 17 | 83% |
| `pca_kmeans_rep/pca.py` | 52 | 16 | 69% |
| `pca_kmeans_rep/repness.py` | 361 | 51 | 86% |
| `pca_kmeans_rep/stats.py` | 107 | 22 | 79% |
| `regression/__init__.py` | 4 | 0 | 100% |
| `regression/clojure_comparer.py` | 188 | 17 | 91% |
| `regression/comparer.py` | 887 | 720 | 19% |
| `regression/datasets.py` | 135 | 27 | 80% |
| `regression/recorder.py` | 36 | 27 | 25% |
| `regression/utils.py` | 138 | 119 | 14% |
| `run_math_pipeline.py` | 260 | 114 | 56% |
| `umap_narrative/500_generate_embedding_umap_cluster.py` | 210 | 109 | 48% |
| `umap_narrative/501_calculate_comment_extremity.py` | 112 | 54 | 52% |
| `umap_narrative/502_calculate_priorities.py` | 135 | 135 | 0% |
| `umap_narrative/700_datamapplot_for_layer.py` | 502 | 502 | 0% |
| `umap_narrative/701_static_datamapplot_for_layer.py` | 310 | 310 | 0% |
| `umap_narrative/702_consensus_divisive_datamapplot.py` | 432 | 432 | 0% |
| `umap_narrative/801_narrative_report_batch.py` | 785 | 785 | 0% |
| `umap_narrative/802_process_batch_results.py` | 265 | 265 | 0% |
| `umap_narrative/803_check_batch_status.py` | 175 | 175 | 0% |
| `umap_narrative/llm_factory_constructor/__init__.py` | 2 | 2 | 0% |
| `umap_narrative/llm_factory_constructor/model_provider.py` | 157 | 157 | 0% |
| `umap_narrative/polismath_commentgraph/__init__.py` | 1 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/cli.py` | 270 | 270 | 0% |
| `umap_narrative/polismath_commentgraph/core/__init__.py` | 3 | 3 | 0% |
| `umap_narrative/polismath_commentgraph/core/clustering.py` | 108 | 108 | 0% |
| `umap_narrative/polismath_commentgraph/core/embedding.py` | 104 | 104 | 0% |
| `umap_narrative/polismath_commentgraph/lambda_handler.py` | 219 | 219 | 0% |
| `umap_narrative/polismath_commentgraph/schemas/__init__.py` | 2 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/schemas/dynamo_models.py` | 160 | 9 | 94% |
| `umap_narrative/polismath_commentgraph/tests/conftest.py` | 17 | 17 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_clustering.py` | 74 | 74 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_embedding.py` | 55 | 55 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_storage.py` | 87 | 87 | 0% |
| `umap_narrative/polismath_commentgraph/utils/__init__.py` | 3 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/utils/converter.py` | 283 | 237 | 16% |
| `umap_narrative/polismath_commentgraph/utils/group_data.py` | 354 | 336 | 5% |
| `umap_narrative/polismath_commentgraph/utils/storage.py` | 584 | 477 | 18% |
| `umap_narrative/reset_conversation.py` | 159 | 50 | 69% |
| `umap_narrative/run_pipeline.py` | 453 | 312 | 31% |
| `utils/general.py` | 62 | 41 | 34% |
| **Total** | **10951** | **7648** | **30%** |
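The Cover column appears consistent with statement coverage computed as (Stmts - Miss) / Stmts, rounded to the nearest whole percent. The quick check below assumes that formula; `cover_percent` is a hypothetical helper, and coverage.py's own display logic (for example its special handling of values very close to 0% or 100%) is not reproduced here.

```python
def cover_percent(stmts: int, miss: int) -> int:
    """Statement coverage as a whole percent: share of statements executed."""
    if stmts == 0:
        return 100  # assumption: a file with no statements counts as fully covered
    return round(100 * (stmts - miss) / stmts)


# A few rows copied from the report above: (file, stmts, miss, reported cover).
ROWS = [
    ("conversation/conversation.py", 1117, 328, 71),
    ("conversation/manager.py", 131, 42, 68),
    ("pca_kmeans_rep/repness.py", 361, 51, 86),
    ("Total", 10951, 7648, 30),
]

for name, stmts, miss, reported in ROWS:
    print(f"{name}: computed {cover_percent(stmts, miss)}%, reported {reported}%")
```

Truncating instead of rounding would give 70% and 67% for the first two rows, so the report appears to round rather than truncate.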
