[Stack 11/17] Fix D9: z-score thresholds from two-tailed to one-tailed#2518

Open
jucor wants to merge 1 commit into spr/edge/b9062b50 from spr/edge/0194003d

@jucor (Collaborator) commented Mar 30, 2026

Summary

  • Fix D9: change z-score significance thresholds from two-tailed to one-tailed, matching Clojure's stats.clj
  • Z_90: 1.645 → 1.2816, Z_95: 1.96 → 1.6449
  • Also resolves an internal inconsistency — Python's own stats.py already used the correct one-tailed values

Why one-tailed?

The proportion tests in Polis check whether a comment's agree (or disagree) rate is significantly above 0.5, which is a directional hypothesis. A one-tailed threshold is correct because each test looks at only one direction at a time. The two-tailed values were more conservative (about 28% higher for Z_90 and about 19% higher for Z_95), so fewer comments passed significance.
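The threshold values quoted in the Summary can be reproduced from the standard normal quantile function. The following is a minimal sketch using only Python's standard library; the helper names `one_tailed_z` and `two_tailed_z` are hypothetical and are not taken from the Polis codebase.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def one_tailed_z(confidence: float) -> float:
    """Critical value with the whole rejection region in the upper tail."""
    return STANDARD_NORMAL.inv_cdf(confidence)


def two_tailed_z(confidence: float) -> float:
    """Critical value with the rejection region split across both tails."""
    return STANDARD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)


for confidence in (0.90, 0.95):
    one = one_tailed_z(confidence)
    two = two_tailed_z(confidence)
    # The ratio shows how much stricter the two-tailed threshold is.
    print(f"{confidence:.0%}: one-tailed {one:.4f}, "
          f"two-tailed {two:.4f}, ratio {two / one:.3f}")
```

Note that the one-tailed 95% value and the two-tailed 90% value coincide (both are the 0.95 quantile), which is why 1.645 appears as the old `Z_90` and 1.6449 as the new `Z_95`.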

Test plan

  • TDD: removed xfail from 3 D9 tests, confirmed red (3 failures), applied fix, confirmed green
  • Discrepancy tests: 63 passed, 6 skipped, 50 xfailed (all 7 datasets including private)
  • Regression tests: 19 passed (all 7 datasets, golden snapshots re-recorded)
  • Repness unit tests: 36 passed (boundary values updated to match new thresholds)
  • 4 pre-existing failures unrelated to D9 (PCA incremental blobs, DB-dependent tests)

🤖 Generated with Claude Code

Squashed commits

  • Plan: add task parallelization analysis for remaining fixes
  • Fix D9: match Clojure z-sig semantics (strict >, no abs) and remove dead stats.py
  • Re-record vw golden snapshot after D9 z-sig semantics change
  • Update plan: mark D9 as done, note stats.py removal for next PR
  • Add mathematical rigor and exhaustive testing guidance to fix plan
  • Plan: move PR 14 earlier (prerequisite for blob tests) + add handoff doc
  • Re-record golden snapshots after upstream cascade
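The "strict >, no abs" semantics mentioned in the squashed commits can be illustrated as follows. This is a sketch under stated assumptions: `prop_test_z` is a textbook one-proportion z statistic against 0.5, not necessarily the formula Polis uses, and the function names `z_sig_90` and `z_sig_95` are hypothetical. Only the threshold values come from this PR.

```python
import math

# One-tailed thresholds from this PR.
Z_90 = 1.2816
Z_95 = 1.6449


def prop_test_z(successes: int, n: int) -> float:
    """Textbook z statistic for H0: p = 0.5 (an assumption for illustration)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    return (p_hat - 0.5) / math.sqrt(0.25 / n)


def z_sig_90(z: float) -> bool:
    """Significant at 90%: strict inequality, no absolute value."""
    return z > Z_90


def z_sig_95(z: float) -> bool:
    """Significant at 95%: strict inequality, no absolute value."""
    return z > Z_95


z = prop_test_z(57, 100)   # 57 agrees out of 100 votes, z is about 1.4
print(z_sig_90(z))         # passes the one-tailed 90% threshold
print(z > 1.645)           # would have failed the old two-tailed value
print(z_sig_90(-2.0))      # a large negative z is not significant (no abs)
print(z_sig_90(Z_90))      # a value exactly on the threshold does not pass
```

Because the test is directional, a strongly negative z-score counts as evidence against the hypothesis being tested rather than as a significant result, which is what dropping the absolute value expresses.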

commit-id:0194003d


Stack:


⚠️ Part of a stack created by spr. Do not merge manually using the UI - doing so may have unexpected results.

@jucor changed the title from "Fix D9: z-score thresholds from two-tailed to one-tailed" to "[Stack 11/17] Fix D9: z-score thresholds from two-tailed to one-tailed" on Mar 30, 2026
@jucor force-pushed the spr/edge/0194003d branch 2 times, most recently from 1452e78 to 24de40d on March 30, 2026 22:47
@jucor force-pushed the spr/edge/0194003d branch from 24de40d to add1343 on March 31, 2026 00:35
@github-actions

Delphi Coverage Report

| File | Stmts | Miss | Cover |
|---|---:|---:|---:|
| `__init__.py` | 2 | 0 | 100% |
| `benchmarks/bench_pca.py` | 76 | 76 | 0% |
| `benchmarks/bench_repness.py` | 81 | 81 | 0% |
| `benchmarks/bench_update_votes.py` | 38 | 38 | 0% |
| `benchmarks/benchmark_utils.py` | 34 | 34 | 0% |
| `components/__init__.py` | 1 | 0 | 100% |
| `components/config.py` | 165 | 133 | 19% |
| `conversation/__init__.py` | 2 | 0 | 100% |
| `conversation/conversation.py` | 1107 | 320 | 71% |
| `conversation/manager.py` | 131 | 42 | 68% |
| `database/__init__.py` | 1 | 0 | 100% |
| `database/dynamodb.py` | 387 | 234 | 40% |
| `database/postgres.py` | 305 | 205 | 33% |
| `pca_kmeans_rep/__init__.py` | 5 | 0 | 100% |
| `pca_kmeans_rep/clusters.py` | 257 | 22 | 91% |
| `pca_kmeans_rep/corr.py` | 98 | 17 | 83% |
| `pca_kmeans_rep/pca.py` | 52 | 16 | 69% |
| `pca_kmeans_rep/repness.py` | 297 | 43 | 86% |
| `regression/__init__.py` | 4 | 0 | 100% |
| `regression/clojure_comparer.py` | 188 | 17 | 91% |
| `regression/comparer.py` | 887 | 720 | 19% |
| `regression/datasets.py` | 135 | 27 | 80% |
| `regression/recorder.py` | 36 | 27 | 25% |
| `regression/utils.py` | 138 | 94 | 32% |
| `run_math_pipeline.py` | 260 | 114 | 56% |
| `umap_narrative/500_generate_embedding_umap_cluster.py` | 210 | 109 | 48% |
| `umap_narrative/501_calculate_comment_extremity.py` | 112 | 53 | 53% |
| `umap_narrative/502_calculate_priorities.py` | 135 | 135 | 0% |
| `umap_narrative/700_datamapplot_for_layer.py` | 502 | 502 | 0% |
| `umap_narrative/701_static_datamapplot_for_layer.py` | 310 | 310 | 0% |
| `umap_narrative/702_consensus_divisive_datamapplot.py` | 432 | 432 | 0% |
| `umap_narrative/801_narrative_report_batch.py` | 785 | 785 | 0% |
| `umap_narrative/802_process_batch_results.py` | 265 | 265 | 0% |
| `umap_narrative/803_check_batch_status.py` | 175 | 175 | 0% |
| `umap_narrative/llm_factory_constructor/__init__.py` | 2 | 2 | 0% |
| `umap_narrative/llm_factory_constructor/model_provider.py` | 157 | 157 | 0% |
| `umap_narrative/polismath_commentgraph/__init__.py` | 1 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/cli.py` | 270 | 270 | 0% |
| `umap_narrative/polismath_commentgraph/core/__init__.py` | 3 | 3 | 0% |
| `umap_narrative/polismath_commentgraph/core/clustering.py` | 108 | 108 | 0% |
| `umap_narrative/polismath_commentgraph/core/embedding.py` | 104 | 104 | 0% |
| `umap_narrative/polismath_commentgraph/lambda_handler.py` | 219 | 219 | 0% |
| `umap_narrative/polismath_commentgraph/schemas/__init__.py` | 2 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/schemas/dynamo_models.py` | 160 | 9 | 94% |
| `umap_narrative/polismath_commentgraph/tests/conftest.py` | 17 | 17 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_clustering.py` | 74 | 74 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_embedding.py` | 55 | 55 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_storage.py` | 87 | 87 | 0% |
| `umap_narrative/polismath_commentgraph/utils/__init__.py` | 3 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/utils/converter.py` | 283 | 237 | 16% |
| `umap_narrative/polismath_commentgraph/utils/group_data.py` | 354 | 336 | 5% |
| `umap_narrative/polismath_commentgraph/utils/storage.py` | 584 | 518 | 11% |
| `umap_narrative/reset_conversation.py` | 159 | 50 | 69% |
| `umap_narrative/run_pipeline.py` | 453 | 312 | 31% |
| `utils/general.py` | 62 | 41 | 34% |
| Total | 10770 | 7625 | 29% |
