[Stack 12/17] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount) #2519
Open
jucor wants to merge 1 commit into spr/edge/0194003d
This was referenced Mar 30, 2026
cd39374 to a387b9e
…eudocount)

## Summary

Replace Python's standard one-proportion z-test `prop_test(p, n, p0)` with Clojure's Wilson-score-like formula `prop_test(succ, n)` from `stats.clj:10-15`:

```
2 * sqrt(n+1) * ((succ+1)/(n+1) - 0.5)
```

The Clojure formula has a built-in +1 pseudocount (Laplace smoothing / Beta(1,1) prior) that regularizes extreme values for small Polis groups. This is separate from the `PSEUDO_COUNT=2.0` used for `pa`/`pd` estimation (Beta(2,2) prior):

- `pa = (na + 1) / (ns + 2)`: Beta(2,2) prior for probability estimation
- `pat = 2 * sqrt(ns+1) * ((na+1)/(ns+1) - 0.5)`: Beta(1,1) prior for significance testing

**What changed in the output**: the `pat` and `pdt` values (proportion test z-scores) and the downstream `agree_metric` / `disagree_metric` values. The z-scores differ slightly because the new formula scales by `sqrt(n+1)` instead of `sqrt(n)` and uses the proportion `(succ+1)/(n+1)` instead of `(na+1)/(n+2)`.

## Changes

- `repness.py`: `prop_test(p, n, p0)` → `prop_test(succ, n)` with Clojure formula
- `repness.py`: `prop_test_vectorized(p, n, p0)` → `prop_test_vectorized(succ, n)`
- `repness.py`: Callers updated to pass raw counts `(na, ns)` instead of `(pa, ns, 0.5)`
- `test_discrepancy_fixes.py`: Removed xfail from D5 formula test, added 8 test cases + edge case
- `test_repness_unit.py`, `test_old_format_repness.py`: Updated for new signature
- Golden snapshots re-recorded for all datasets

## Test plan

- [x] D5 formula tests pass (8 input pairs + edge cases)
- [x] D5 Clojure blob consistency check passes (all datasets)
- [x] Full test suite passes (public + private, 19/19 regression tests)
- [x] Only pre-existing failure: pakistan-incremental D2 (unrelated)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Squashed commits

- RED: add D5 blob injection test (prop_test vs Clojure p-test values)
- Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)
- Update plan and journal: mark D5 as done
- Plan: add D5 PR number and stack position to cross-reference

commit-id:48b77ba3
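For reviewers who want to see the size of the shift, here is a minimal sketch of the two formulas side by side. It is not the code in `repness.py`: `prop_test` follows the formula quoted in the summary, while `prop_test_old` and `smoothed_proportion` are hypothetical names, and the old z-test body is an assumed reconstruction of a textbook one-proportion z-test rather than the removed implementation.

```python
import math


def prop_test(succ: float, n: float) -> float:
    """Formula quoted in the summary: 2 * sqrt(n+1) * ((succ+1)/(n+1) - 0.5)."""
    return 2.0 * math.sqrt(n + 1.0) * ((succ + 1.0) / (n + 1.0) - 0.5)


def smoothed_proportion(succ: float, n: float) -> float:
    """Hypothetical helper for pa = (na + 1) / (ns + 2) from the summary."""
    return (succ + 1.0) / (n + 2.0)


def prop_test_old(p: float, n: float, p0: float = 0.5) -> float:
    """Assumed shape of the previous test: a textbook one-proportion z-score."""
    if n <= 0:
        return 0.0  # assumption: guard against division by zero for empty groups
    return (p - p0) / math.sqrt(p0 * (1.0 - p0) / n)


if __name__ == "__main__":
    na, ns = 8, 10  # illustrative counts: 8 agrees out of 10 votes
    new_z = prop_test(na, ns)
    old_z = prop_test_old(smoothed_proportion(na, ns), ns)
    print(f"new pat = {new_z:.4f}, old-style pat = {old_z:.4f}")
```

Note that the new formula takes raw counts, so callers pass `(na, ns)` directly, and an empty group (`succ = 0`, `n = 0`) evaluates to `1.0` rather than raising.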
24de40d to add1343
a387b9e to 956e3a8