feat: Reproduce Boag et al. 2018 medical mistrust pipeline in PyHealth #964
Open
vtewari2 wants to merge 1 commit into sunlabuiuc:master
Complete implementation of the interpersonal-feature mistrust classifiers
from "Racial Disparities and Mistrust in End-of-Life Care" (MLHC 2018,
arXiv:1808.03827) using PyHealth. Consolidates all changes from:
  pr/uiuccs598dlh/logistic-regression/l1-regularization
  pr/uiuccs598dlh/mistrust-tasks/interpersonal-features-mimic3
  pr/uiuccs598dlh/paper-pipeline/eol-mistrust-boag-2018

pyhealth/models/logistic_regression.py
  - Add l1_lambda (float, default 0.0) to LogisticRegression
  - In forward(): loss += l1_lambda * ||fc.weight||_1 when l1_lambda > 0
  - Equivalent to sklearn LogisticRegression(penalty='l1', C=C) with
    l1_lambda = 1 / (C * n_train)

pyhealth/tasks/mistrust_mimic3.py [new]
  - build_interpersonal_itemids(): reads D_ITEMS.csv.gz, returns
    {itemid: label} for ~168 interpersonal CHARTEVENTS items
  - MistrustNoncomplianceMIMIC3: sequence task predicting noncompliance
    label from NOTEEVENTS; base rate 0.88% in MIMIC-III v1.4
  - MistrustAutopsyMIMIC3: sequence task predicting autopsy consent;
    ambiguous admissions excluded; Black consent rate 39% vs White 26%
  - Full feature normalisation from trust.ipynb cell 7

pyhealth/tasks/__init__.py
  - Export MistrustNoncomplianceMIMIC3, MistrustAutopsyMIMIC3,
    build_interpersonal_itemids

examples/mistrust_prediction/mistrust_mimic3_logistic_regression.py [new]
  - End-to-end pipeline: MIMIC3Dataset -> set_task -> LogisticRegression
    with L1 -> Trainer -> AUC-ROC evaluation
  - --synthetic flag for smoke-test without PhysioNet access
  - Expected AUC: noncompliance 0.667, autopsy 0.531

Co-Authored-By: Varun Tewari <vtewari2@illinois.edu>
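
The l1_lambda penalty and its stated sklearn equivalence from the commit message can be illustrated with a small dependency-free sketch. This is not the code in pyhealth/models/logistic_regression.py; the function names below are hypothetical, and the sketch assumes the BCE term is averaged over samples. Under that assumption the conversion follows from sklearn's L1 objective, ||w||_1 + C * sum_i loss_i: dividing by C * n_train gives mean loss + (1 / (C * n_train)) * ||w||_1. With mini-batch training the match is only approximate.

```python
import math
from typing import Sequence


def l1_lambda_from_C(C: float, n_train: int) -> float:
    """Convert sklearn's inverse regularisation strength C to l1_lambda.

    Hypothetical helper: assumes the data loss is a mean over n_train samples.
    """
    if C <= 0 or n_train <= 0:
        raise ValueError("C and n_train must be positive")
    return 1.0 / (C * n_train)


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy for a single logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def l1_penalised_loss(
    logits: Sequence[float],
    targets: Sequence[float],
    weights: Sequence[Sequence[float]],
    l1_lambda: float = 0.0,
) -> float:
    """Mean BCE over the batch plus l1_lambda * ||weights||_1."""
    if not logits or len(logits) != len(targets):
        raise ValueError("logits and targets must be non-empty and equal length")
    loss = sum(bce_with_logits(z, y) for z, y in zip(logits, targets)) / len(logits)
    if l1_lambda > 0:
        # The penalty is applied only when l1_lambda is positive, so the
        # default of 0.0 leaves the plain BCE loss unchanged.
        loss += l1_lambda * sum(abs(w) for row in weights for w in row)
    return loss
```

For example, C = 0.1 with 1,000 training samples corresponds to l1_lambda = 0.01 under the averaging assumption above.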
Description:
Summary
This PR is a consolidated reference branch containing the complete
implementation of the computational medical mistrust pipeline from
Boag et al. 2018, "Racial Disparities and Mistrust in End-of-Life Care"
(MLHC 2018, arXiv:1808.03827).
It consolidates all changes from three focused PRs into a single reviewable
branch for reference and integration testing:
- pr/uiuccs598dlh/logistic-regression/l1-regularization
- pr/uiuccs598dlh/mistrust-tasks/interpersonal-features-mimic3
- pr/uiuccs598dlh/paper-pipeline/eol-mistrust-boag-2018
Background
Boag et al. 2018 demonstrates that racial disparities in aggressive ICU
end-of-life care — Black patients receiving ~879 min more mechanical
ventilation than White patients (p=0.009) — are better explained by
medical mistrust than by race alone. Mistrust stratification amplifies
the ventilation gap to ~2,319 min (3×). The paper trains L1-regularised
logistic regression on structured interpersonal interaction features
extracted from CHARTEVENTS to produce continuous mistrust proxy scores.
This PR brings that methodology natively into the PyHealth framework.
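
As a rough illustration of how a trained logistic regression yields a continuous mistrust proxy score, the sketch below assumes the score is simply the classifier's predicted probability over a set of binary feature indicators. The function name, the treatment of features as present or absent, and the weights used in the test are all hypothetical; neither the paper's exact scoring nor PyHealth's internals are reproduced here.

```python
import math
from typing import Dict, Iterable


def mistrust_score(
    feature_keys: Iterable[str],
    weights: Dict[str, float],
    bias: float = 0.0,
) -> float:
    """Return sigmoid(bias + sum of weights for the distinct keys present).

    Assumption: each feature counts once per patient (binary presence),
    and keys missing from `weights` contribute nothing, as they would
    after L1 regularisation drives a coefficient to zero.
    """
    z = bias + sum(weights.get(key, 0.0) for key in set(feature_keys))
    # Stable sigmoid: avoid overflow in exp() for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

A patient with no weighted features and zero bias scores 0.5; features with positive coefficients push the score toward 1.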
Changes
1. pyhealth/models/logistic_regression.py
   Add optional l1_lambda: float = 0.0 parameter (fully backward-compatible).
   When non-zero, appends a sparsity-inducing L1 penalty to the training loss:
       loss = BCE(logits, y_true) + l1_lambda * ‖fc.weight‖₁
   Equivalent to sklearn LogisticRegression(penalty='l1', C=C) with l1_lambda = 1 / (C × n_train).
2. pyhealth/tasks/mistrust_mimic3.py (new)
   build_interpersonal_itemids(d_items_path)
   - Reads D_ITEMS.csv.gz and returns {itemid: label} for ~168 CHARTEVENTS items matched via interpersonal keyword list from the paper's trust.ipynb.
   MistrustNoncomplianceMIMIC3
   - input_schema = {"interpersonal_features": "sequence"}
   - output_schema = {"noncompliance": "binary"}
   - 1 if any NOTEEVENTS note contains "noncompliant", else 0
   MistrustAutopsyMIMIC3
   - input_schema = {"interpersonal_features": "sequence"}
   - output_schema = {"autopsy_consent": "binary"}
   - 1 for autopsy consent (mistrust), 0 for decline (trust)
   Both tasks apply full feature normalisation from trust.ipynb cell 7 (restraint coarsening, bath categories, skip rules for pain subtypes).
   Feature keys take the form "category||normalised_value" and are tokenised automatically by PyHealth during set_task().
3. pyhealth/tasks/__init__.py
   Exports MistrustNoncomplianceMIMIC3, MistrustAutopsyMIMIC3, build_interpersonal_itemids.
4. examples/mistrust_prediction/mistrust_mimic3_logistic_regression.py (new)
   End-to-end pipeline reproducing the paper's classifier experiments:
MIMIC3Dataset(CHARTEVENTS + NOTEEVENTS)
→ build_interpersonal_itemids()
→ MistrustNoncomplianceMIMIC3 / MistrustAutopsyMIMIC3
→ LogisticRegression(l1_lambda=...)
→ Trainer → AUC-ROC
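
The task stage of the pipeline above turns raw rows into feature keys and a binary label. The sketch below illustrates only the two rules stated in this description: the "category||normalised_value" key format and the "noncompliant" note check. Both helper names are hypothetical. The lowercasing and whitespace stripping are placeholder assumptions standing in for the real normalisation in trust.ipynb cell 7 (restraint coarsening, bath categories, pain-subtype skip rules), which is not reproduced, and case-insensitive matching of the keyword is likewise an assumption.

```python
from typing import Iterable


def feature_key(category: str, value: str) -> str:
    """Join a category and a value as 'category||normalised_value'.

    Placeholder normalisation: strip and lowercase both parts. The actual
    task applies richer rules that this sketch does not attempt to mirror.
    """
    return f"{category.strip().lower()}||{value.strip().lower()}"


def noncompliance_label(notes: Iterable[str]) -> int:
    """Return 1 if any note text contains 'noncompliant', else 0.

    Assumption: the match ignores case.
    """
    return int(any("noncompliant" in note.lower() for note in notes))
```

An admission with no notes gets label 0 under this rule, which is consistent with the low base rate quoted for the noncompliance task.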
The example script includes a --synthetic flag for smoke-testing without PhysioNet access.
Expected results (MIMIC-III v1.4)
Usage