Skip to content

[Stack 2/17] Add SKIP_GOLDEN env var to disable golden snapshot tests#2509

Open
jucor wants to merge 1 commit intospr/edge/1a8eb714from
spr/edge/d39cf65d
Open

[Stack 2/17] Add SKIP_GOLDEN env var to disable golden snapshot tests#2509
jucor wants to merge 1 commit intospr/edge/1a8eb714from
spr/edge/d39cf65d

Conversation

@jucor
Copy link
Copy Markdown
Collaborator

@jucor jucor commented Mar 30, 2026

Summary

Add SKIP_GOLDEN=1 environment variable to disable golden snapshot regression tests.

During stacked PR development, golden snapshots become stale as computation changes cascade through the stack. Rather than re-recording snapshots at every rebase (which causes conflict cascades in jj/git), we skip them until the stack is merged into edge.

Changes

  • test_regression.py: Add @_skip_golden decorator to test_conversation_regression and test_conversation_stages_individually — the only two tests that compare against golden snapshots. Other dataset-using tests (Clojure comparison, smoke tests) are unaffected.
  • python-ci.yml: Set SKIP_GOLDEN=1 in CI so the stacked PRs don't fail on stale snapshots.

Usage

SKIP_GOLDEN=1 pytest tests/          # skip golden snapshot tests
pytest tests/                         # run everything (default)

Test plan

  • SKIP_GOLDEN=1 pytest tests/test_regression.py -v: 4 skipped, 5 passed
  • pytest tests/test_regression.py -v: all 9 collected (golden tests run normally)

Squashed commits

  • Add SKIP_GOLDEN env var to disable golden snapshot regression tests

commit-id:d39cf65d


Stack:


⚠️ Part of a stack created by spr. Do not merge manually using the UI - doing so may have unexpected results.

@jucor jucor changed the title Add SKIP_GOLDEN env var to disable golden snapshot tests [Stack 2/17] Add SKIP_GOLDEN env var to disable golden snapshot tests Mar 30, 2026
@jucor jucor force-pushed the spr/edge/d39cf65d branch from 8ca05ce to c6e1d7d Compare March 30, 2026 22:39
## Summary


Add `SKIP_GOLDEN=1` environment variable to disable golden snapshot regression tests.

During stacked PR development, golden snapshots become stale as computation changes cascade through the stack. Rather than re-recording snapshots at every rebase (which causes conflict cascades in jj/git), we skip them until the stack is merged into `edge`.

### Changes

- **`test_regression.py`**: Add `@_skip_golden` decorator to `test_conversation_regression` and `test_conversation_stages_individually` — the only two tests that compare against golden snapshots. Other dataset-using tests (Clojure comparison, smoke tests) are unaffected.
- **`python-ci.yml`**: Set `SKIP_GOLDEN=1` in CI so the stacked PRs don't fail on stale snapshots.

### Usage

```bash
SKIP_GOLDEN=1 pytest tests/          # skip golden snapshot tests
pytest tests/                         # run everything (default)
```

## Test plan

- [x] `SKIP_GOLDEN=1 pytest tests/test_regression.py -v`: 4 skipped, 5 passed
- [x] `pytest tests/test_regression.py -v`: all 9 collected (golden tests run normally)


## Squashed commits

- Add SKIP_GOLDEN env var to disable golden snapshot regression tests

commit-id:d39cf65d
@jucor jucor force-pushed the spr/edge/1a8eb714 branch from 7f16641 to b02e443 Compare March 30, 2026 22:47
@jucor jucor force-pushed the spr/edge/d39cf65d branch from c6e1d7d to 47e9569 Compare March 30, 2026 22:47
@github-actions
Copy link
Copy Markdown

Delphi Coverage Report

File Stmts Miss Cover
init.py 2 0 100%
benchmarks/bench_pca.py 76 76 0%
benchmarks/bench_repness.py 81 81 0%
benchmarks/bench_update_votes.py 38 38 0%
benchmarks/benchmark_utils.py 34 34 0%
components/init.py 1 0 100%
components/config.py 165 133 19%
conversation/init.py 2 0 100%
conversation/conversation.py 1118 336 70%
conversation/manager.py 131 42 68%
database/init.py 1 0 100%
database/dynamodb.py 387 233 40%
database/postgres.py 305 205 33%
pca_kmeans_rep/init.py 5 0 100%
pca_kmeans_rep/clusters.py 257 22 91%
pca_kmeans_rep/corr.py 98 17 83%
pca_kmeans_rep/pca.py 52 16 69%
pca_kmeans_rep/repness.py 361 48 87%
pca_kmeans_rep/stats.py 107 22 79%
regression/init.py 4 0 100%
regression/clojure_comparer.py 188 17 91%
regression/comparer.py 887 720 19%
regression/datasets.py 103 22 79%
regression/recorder.py 36 27 25%
regression/utils.py 137 118 14%
run_math_pipeline.py 260 114 56%
umap_narrative/500_generate_embedding_umap_cluster.py 210 109 48%
umap_narrative/501_calculate_comment_extremity.py 112 54 52%
umap_narrative/502_calculate_priorities.py 135 135 0%
umap_narrative/700_datamapplot_for_layer.py 502 502 0%
umap_narrative/701_static_datamapplot_for_layer.py 310 310 0%
umap_narrative/702_consensus_divisive_datamapplot.py 432 432 0%
umap_narrative/801_narrative_report_batch.py 785 785 0%
umap_narrative/802_process_batch_results.py 265 265 0%
umap_narrative/803_check_batch_status.py 175 175 0%
umap_narrative/llm_factory_constructor/init.py 2 2 0%
umap_narrative/llm_factory_constructor/model_provider.py 157 157 0%
umap_narrative/polismath_commentgraph/init.py 1 0 100%
umap_narrative/polismath_commentgraph/cli.py 270 270 0%
umap_narrative/polismath_commentgraph/core/init.py 3 3 0%
umap_narrative/polismath_commentgraph/core/clustering.py 108 108 0%
umap_narrative/polismath_commentgraph/core/embedding.py 104 104 0%
umap_narrative/polismath_commentgraph/lambda_handler.py 219 219 0%
umap_narrative/polismath_commentgraph/schemas/init.py 2 0 100%
umap_narrative/polismath_commentgraph/schemas/dynamo_models.py 160 9 94%
umap_narrative/polismath_commentgraph/tests/conftest.py 17 17 0%
umap_narrative/polismath_commentgraph/tests/test_clustering.py 74 74 0%
umap_narrative/polismath_commentgraph/tests/test_embedding.py 55 55 0%
umap_narrative/polismath_commentgraph/tests/test_storage.py 87 87 0%
umap_narrative/polismath_commentgraph/utils/init.py 3 0 100%
umap_narrative/polismath_commentgraph/utils/converter.py 283 237 16%
umap_narrative/polismath_commentgraph/utils/group_data.py 354 336 5%
umap_narrative/polismath_commentgraph/utils/storage.py 584 477 18%
umap_narrative/reset_conversation.py 159 50 69%
umap_narrative/run_pipeline.py 453 312 31%
utils/general.py 62 41 34%
Total 10919 7646 30%

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-out switch for golden snapshot regression tests to make stacked PR development smoother by avoiding frequent snapshot re-recording while intermediate computation changes are in flight.

Changes:

  • Add a SKIP_GOLDEN=1-controlled pytest skip marker and apply it to the two golden snapshot comparison tests.
  • Set SKIP_GOLDEN=1 in the GitHub Actions Python CI workflow so the stack doesn’t fail on stale golden snapshots.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
delphi/tests/test_regression.py Introduces a skipif marker driven by SKIP_GOLDEN and applies it to the golden snapshot tests.
.github/workflows/python-ci.yml Exports SKIP_GOLDEN=1 into the test container environment during CI pytest execution.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 92 to 98
-e AWS_ACCESS_KEY_ID=dummy \
-e AWS_SECRET_ACCESS_KEY=dummy \
-e POSTGRES_HOST=postgres \
-e POSTGRES_PASSWORD=PdwPNS2mDN73Vfbc \
-e POSTGRES_DB=polis-test \
-e SKIP_GOLDEN=1 \
delphi \
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SKIP_GOLDEN=1 is being set unconditionally for the CI pytest run, which means the golden snapshot regression tests will never run in CI for any PR or push to edge/stable. That reduces regression detection significantly. Consider gating this env var to only the stacked PR branches (e.g., spr/**) or adding a separate job/workflow (push-to-edge or scheduled) that runs the golden snapshot tests without SKIP_GOLDEN so coverage isn’t lost long-term.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants