[Stack 20/27] Fix D15: match Clojure moderation handling (zero out columns, don't remove) #2452
Pull request overview
Aligns Python’s moderation handling with Clojure’s behavior by preserving the vote-matrix column structure when comments are moderated out (zeroing moderated columns rather than removing them), ensuring downstream math (PCA/repness) sees consistent dimensions/column indices.
Changes:
- Update `_apply_moderation()` to drop moderated-out participants (rows) but zero out moderated-out comment columns (instead of removing them).
- Update/add tests to expect preserved columns and validate moderated columns are zeroed.
- Mark D15 as complete in discrepancy tracking docs/journal.
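
A minimal sketch of the behavior described above, assuming the rating matrix is a pandas DataFrame indexed by participant id with comment tids as columns. The function name `apply_moderation_sketch`, its parameters, and the sample ids are hypothetical; this is not the actual `_apply_moderation()` implementation:

```python
import numpy as np
import pandas as pd


def apply_moderation_sketch(raw, mod_out_tids, mod_out_ptpts):
    """Drop moderated-out participant rows; zero moderated-out comment columns."""
    excluded = set(mod_out_ptpts)
    # Rows: moderated-out participants are removed entirely.
    kept_rows = [p for p in raw.index if p not in excluded]
    mat = raw.loc[kept_rows].copy()
    # Columns: only touch tids that actually exist in the matrix.
    mod_cols = [c for c in mod_out_tids if c in mat.columns]
    if mod_cols:
        # Overwrites every cell, including NaN (unseen) entries, with 0.
        mat.loc[:, mod_cols] = 0.0
    return mat


raw = pd.DataFrame(
    [[1.0, -1.0, np.nan], [0.0, 1.0, 1.0], [-1.0, np.nan, 1.0]],
    index=[10, 11, 12],   # hypothetical participant ids
    columns=[0, 1, 2],    # hypothetical comment tids
)
moderated = apply_moderation_sketch(raw, mod_out_tids=[1, 99], mod_out_ptpts=[12])
```

In this sketch the unknown tid `99` is ignored, the row for participant `12` disappears, and column `1` stays in place with every value set to zero.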
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| delphi/polismath/conversation/conversation.py | Change moderation application to zero moderated-out columns while preserving matrix shape. |
| delphi/tests/test_discrepancy_fixes.py | Update D2c expectation for column counts; add/extend D15 parity + synthetic moderation tests. |
| delphi/tests/test_conversation.py | Update moderation-related assertions to expect kept-but-zeroed comment columns. |
| delphi/docs/PLAN_DISCREPANCY_FIXES.md | Mark D15 as DONE and adjust next-step notes. |
| delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md | Add journal entry documenting D15 work and update “What’s Next”. |

    # Zero out moderated-out comments (keep columns, set values to 0)
    # Clojure: (matrix/set-column m' i 0) — zeroes the column
    mod_cols = [c for c in self.mod_out_tids if c in self.rating_mat.columns]

    # Moderated columns should be all zeros (not NaN, not original values)
    for tid in mod_out:
        if tid in mod_conv.rating_mat.columns:
            col_values = mod_conv.rating_mat[tid].values
            check.is_true(
                np.all(col_values == 0.0),
                f"Moderated tid {tid} should be all zeros, "
                f"got non-zero values: {col_values[col_values != 0.0][:5]}"
            )

    def test_tids_include_moderated(self, conv, clojure_blob, dataset_name):
        """The tids output should include moderated-out comments (matching Clojure)."""
        mod_out = clojure_blob.get('mod-out') or []
        if not mod_out:
            pytest.skip(f"[{dataset_name}] No moderated comments in this dataset")

    mod_conv = conv.update_moderation(
        {'mod_out_tids': mod_out},
        recompute=False,
    )

    # rating_mat.columns (used for tids output) should include moderated tids
    for tid in mod_out:
        if tid in mod_conv.raw_rating_mat.columns:
            check.is_in(
                tid, set(mod_conv.rating_mat.columns),

    n_cols_raw = len(mod_conv.raw_rating_mat.columns)
    n_tids_clojure = len(clojure_blob.get('tids', []))

    print(f"[{dataset_name}] Moderated comments: {len(mod_out)}")
    print(f"[{dataset_name}] Python matrix columns: {n_cols_python}, Clojure tids: {n_tids_clojure}")
    print(f"[{dataset_name}] mod-out: {len(mod_out)}")
    print(f"[{dataset_name}] Python rating_mat cols: {n_cols_python}, raw cols: {n_cols_raw}")
    print(f"[{dataset_name}] Clojure tids: {n_tids_clojure}")

Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
…emove)

Python's _apply_moderation() removed moderated-out comment columns entirely from rating_mat. Clojure's zero-out-columns (named_matrix.clj:214-230) sets all values in moderated columns to 0, preserving matrix structure.

Change _apply_moderation() to zero out moderated columns instead of removing them, so that:
- rating_mat retains the same column count as raw_rating_mat
- tids output includes moderated-out tids (matching Clojure)
- Matrix dimensions are preserved through the pipeline

Moderated-out participants (rows) are still removed (unchanged).

Zeroed columns have no signal (na=0, nd=0), so they fail all significance tests and are effectively excluded from repness/consensus/PCA, but their presence preserves index stability for downstream consumers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
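
The index-stability point can be illustrated with a small pandas sketch. The tids and vote values here are made up, and `dropped`/`zeroed` are hypothetical names standing in for the old and new behavior; the real pipeline may track positions differently:

```python
import pandas as pd

raw = pd.DataFrame(
    [[1.0, -1.0, 1.0], [-1.0, 1.0, 0.0]],
    columns=[101, 102, 103],  # hypothetical comment tids
)
mod_out = [102]

# Old Python behavior: the moderated column is removed.
dropped = raw.drop(columns=mod_out)

# Clojure-style behavior: the column is kept and its values are zeroed.
zeroed = raw.copy()
zeroed.loc[:, mod_out] = 0.0

# Positional index of an unmoderated tid under each approach.
pos_raw = raw.columns.get_loc(103)
pos_dropped = dropped.columns.get_loc(103)
pos_zeroed = zeroed.columns.get_loc(103)
```

With removal, tid `103` shifts to an earlier position than it has in the raw matrix; with zeroing, its position is unchanged, so any consumer that addresses columns by position keeps pointing at the same comment.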
Superseded by spr-managed PR stack. See the new stack starting at #2508.
Summary
Python's `_apply_moderation()` removed moderated-out comment columns entirely from `rating_mat`. Clojure's `zero-out-columns` (named_matrix.clj:214-230) sets all values in moderated columns to 0, preserving matrix structure.
This fix changes Python to match:
- `rating_mat` retains the same column count as `raw_rating_mat`

Why zeroing matters
- `rating-mat` has the same shape as `raw-rating-mat`. Downstream code (PCA, repness) processes the same-shaped matrix.
- Zeroed columns have no signal (na=0, nd=0), so they fail all significance tests and are excluded from repness/consensus. PCA sees zero variance in those columns.
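
As a rough illustration of why a zeroed column carries no signal, the sketch below counts agrees and disagrees and computes the variance for one moderated column. The vote encoding (`+1` agree, `-1` disagree, `0` pass, NaN unseen) and the tids are assumptions for the example, and the counting is a simplification of whatever the repness code actually does:

```python
import numpy as np
import pandas as pd

AGREE, DISAGREE = 1.0, -1.0  # assumed vote encoding

mat = pd.DataFrame({
    7: [1.0, -1.0, 1.0, np.nan],  # hypothetical tid, left untouched
    8: [1.0, 1.0, -1.0, 0.0],     # hypothetical tid, moderated out below
})
mat.loc[:, [8]] = 0.0  # zero the moderated column

# Agree/disagree counts for the moderated column.
na = int((mat[8] == AGREE).sum())
nd = int((mat[8] == DISAGREE).sum())

# Population variance of the moderated column.
variance = float(mat[8].var(ddof=0))

# The unmoderated column keeps its counts.
na_kept = int((mat[7] == AGREE).sum())
nd_kept = int((mat[7] == DISAGREE).sum())
```

Here `na`, `nd`, and `variance` are all zero for tid `8`, while tid `7` keeps its original agree and disagree counts.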
Changes
- `conversation.py`: `_apply_moderation()` now zeroes out columns instead of removing them
- `test_discrepancy_fixes.py`: 5 new synthetic tests + 2 enhanced real-data tests
- `test_conversation.py`: Updated to expect zeroed columns

Test plan
🤖 Generated with Claude Code