multimodal-precursor-detection

Early-warning multimodal AI for behavioral escalation in patients with intellectual disabilities. This reproducible research prototype learns subtle audio precursors of violent events from longitudinal multimodal data, discovers novel behavioral signatures without labels, and validates the precursor → event link with a marginal structural model fitted by inverse probability weighting (MSM-IPW) that adjusts for time-varying medication and therapy confounders.

Why this matters

Violent or self-injurious episodes in patients with intellectual disabilities (ID) cause serious harm to patients and caregivers, yet clinicians today rely largely on retrospective chart review to anticipate them. A growing body of behavioral evidence suggests that subtle audio precursors — quiet mumbling, atypical vocal bursts, shifts in prosody — often precede an overt event by seconds to minutes. If those precursors could be detected reliably and causally linked to subsequent events (not just correlated with them), a passive bedside system could give caregivers a meaningful lead time to intervene non-coercively.

This repository is a fully reproducible research prototype of such a system. Because the real clinical data behind this line of work is private and IRB-protected, every byte of data here is synthetic — generated by a parameterized statistical model that mimics the temporal, multimodal, and confounding structure of the real setting closely enough to be a meaningful methodological testbed. The pipeline, models, evaluation, and causal analysis are the same ones that would run on the real data.
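
As a rough illustration of what a generator of this kind might look like, the sketch below samples a class sequence from a Markov chain over the four audio classes reported in the results, and draws events at a rate that depends on both the current class and a time-varying dose confounder. The transition matrix, event rates, dose mechanism, and the name simulate_patient are all invented for illustration; none of them are taken from data/generate_synthetic.py.

```python
import random

CLASSES = ["normal_speech", "mumbling", "shouting", "non_verbal"]

# Hypothetical row-stochastic transition matrix (each row sums to 1).
TRANSITIONS = {
    "normal_speech": [0.90, 0.05, 0.02, 0.03],
    "mumbling":      [0.30, 0.55, 0.10, 0.05],
    "shouting":      [0.40, 0.10, 0.45, 0.05],
    "non_verbal":    [0.50, 0.10, 0.05, 0.35],
}

# Hypothetical per-step event probability for each class.
BASE_EVENT_RATE = {
    "normal_speech": 0.001,
    "mumbling": 0.02,
    "shouting": 0.05,
    "non_verbal": 0.01,
}


def simulate_patient(n_steps, seed=0):
    """Simulate one patient's timeline as a list of per-step dicts."""
    rng = random.Random(seed)
    state, low_dose = "normal_speech", False
    rows = []
    for t in range(n_steps):
        # Time-varying confounder: the dose level flips occasionally.
        if rng.random() < 0.01:
            low_dose = not low_dose
        weights = list(TRANSITIONS[state])
        if low_dose:
            weights[1] *= 2.0  # low dose makes mumbling more likely ...
        state = rng.choices(CLASSES, weights=weights, k=1)[0]
        # ... and also raises the event rate, which confounds the link.
        p_event = BASE_EVENT_RATE[state] * (3.0 if low_dose else 1.0)
        rows.append({
            "t": t,
            "audio_class": state,
            "low_dose": low_dose,
            "event": rng.random() < p_event,
        })
    return rows


if __name__ == "__main__":
    timeline = simulate_patient(1000, seed=42)
    print(sum(r["event"] for r in timeline), "events in", len(timeline), "steps")
```

Because the dose flag influences both the class sequence and the event rate, a naive comparison of mumbling against events on data like this overstates the effect, which is the situation the causal analysis is meant to correct.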

Pipeline at a glance

flowchart LR
    A[Synthetic generator<br/>10 patients &times; 6 months] --> B[Audio<br/>log-mel spec]
    A --> C[Video<br/>motion features]
    A --> D[Text<br/>token IDs]
    A --> E[Confounders<br/>med dose, therapy]
    A --> F[Events<br/>violent episodes]

    B --> G[Audio encoder<br/>1D CNN + Transformer]
    C --> H[Video encoder<br/>Temporal Conv]
    D --> I[Text encoder<br/>Small Transformer]

    G --> J[Multimodal fusion<br/>late / cross-attention]
    H --> J
    I --> J

    J --> K[Supervised heads<br/>class + onset]
    G --> L[Unsupervised discovery<br/>UMAP + HDBSCAN]

    K --> M[Temporal-split eval<br/>PR, ROC, lead-time, alerts/hr]
    L --> N[Novel cluster<br/>&harr; event correlation]
    F --> O[Causal validation<br/>MSM-IPW + E-value]
    E --> O
    L --> O

    M --> P[Results]
    N --> P
    O --> P

Quickstart

# 1. Install
pip install -r requirements.txt

# 2. Generate synthetic data (small / CPU smoke-test config)
python data/generate_synthetic.py +experiment=small

# 3. Train + evaluate the supervised multimodal model
python src/train.py +experiment=small

The small config completes end-to-end in under 5 minutes on a laptop CPU. The full config targets a single GPU and reproduces the headline numbers below; swap +experiment=small for +experiment=full to use it.

For unsupervised discovery, the causal analysis, and the full evaluation report:

python src/discover.py +experiment=small
python src/causal_analysis.py +experiment=small
python src/evaluate.py +experiment=small

Results

Numbers below are from a +experiment=full run on synthetic data, reported on the held-out temporal split. Artifacts are written to outputs/ in a fixed format, so the output of a rerun replaces them directly.
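
One way to read "held-out temporal split" is a per-patient chronological cut: each patient's earlier records go to training and the later ones to evaluation, so nothing from the future leaks into training. The sketch below assumes records are dicts with hypothetical patient_id and t fields and an assumed 80/20 cut; the split logic actually used in the repository may differ.

```python
from collections import defaultdict


def temporal_split(records, train_frac=0.8):
    """Split each patient's records chronologically into (train, test)."""
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    train, test = [], []
    for recs in by_patient.values():
        recs.sort(key=lambda r: r["t"])       # oldest first
        cut = int(len(recs) * train_frac)     # index of the first held-out record
        train.extend(recs[:cut])
        test.extend(recs[cut:])
    return train, test


if __name__ == "__main__":
    demo = [{"patient_id": p, "t": t} for p in ("a", "b") for t in (4, 2, 0, 3, 1)]
    train, test = temporal_split(demo)
    print(len(train), "train records,", len(test), "test records")
```

Splitting within each patient rather than across the pooled timeline keeps every patient represented in both partitions, which suits a small cohort.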

Supervised audio classification (4 classes)

Class	Precision	Recall	F1	AUC
normal_speech	0.92	0.94	0.93	0.97
mumbling	0.81	0.78	0.79	0.91
shouting	0.88	0.90	0.89	0.95
non_verbal	0.83	0.80	0.81	0.92
macro avg	0.86	0.86	0.86	0.94

Onset prediction (violent event within lead window)

Metric	Value
AUROC	0.89
AUPRC	0.71
Recall @ 1 false alert / hour	0.68
Median lead time (sec)	42
Lead-time IQR (sec)	18–73
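
To make the alert-level metrics concrete, the sketch below shows one plausible way to compute them from alert and event timestamps. It assumes that an event counts as detected if at least one alert fired within a lead window before it, that lead time is measured from the earliest such alert, and that any alert not followed by an event within the window is a false alert. These definitions and the name alert_metrics are assumptions, not the repository's evaluation code.

```python
from statistics import median


def alert_metrics(alert_times, event_times, window_s, monitored_s):
    """Return (recall, median lead time in seconds, false alerts per hour)."""
    lead_times = []
    for ev in event_times:
        # Alerts that fired at most `window_s` seconds before this event.
        hits = [a for a in alert_times if 0 < ev - a <= window_s]
        if hits:
            lead_times.append(ev - min(hits))  # earliest warning counts
    # Alerts with no event inside their lead window are false alerts.
    false_alerts = [
        a for a in alert_times
        if not any(0 < ev - a <= window_s for ev in event_times)
    ]
    recall = len(lead_times) / len(event_times) if event_times else 0.0
    med_lead = median(lead_times) if lead_times else None
    return recall, med_lead, len(false_alerts) / (monitored_s / 3600.0)


if __name__ == "__main__":
    # One hour of monitoring, timestamps in seconds (invented values).
    print(alert_metrics([100, 130, 900, 2000], [150, 1000, 3000], 120, 3600))
```

In the invented example, two of three events are preceded by an alert (lead times of 50 s and 100 s) and one alert is unmatched, giving a recall of 2/3, a median lead time of 75 s, and 1 false alert per hour. Sweeping the alert threshold and reading off recall where the false-alert rate reaches 1 per hour would give a figure of the kind reported in the table.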

Unsupervised discovery

Quantity	Value
HDBSCAN clusters discovered	11
Clusters significantly associated with events (FDR<0.05)	4
Best novel cluster lift over base rate	3.6x
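
The two quantities in this table can be sketched as follows: lift is the event rate inside a cluster divided by the overall base rate, and the FDR threshold is applied with the Benjamini-Hochberg step-up procedure. The sketch assumes per-cluster p-values are already available from some association test; which test the repository uses is not stated here, and the function names are illustrative.

```python
def cluster_lift(in_cluster, has_event):
    """Event rate inside the cluster divided by the overall event rate."""
    base_rate = sum(has_event) / len(has_event)
    members = [e for c, e in zip(in_cluster, has_event) if c]
    return (sum(members) / len(members)) / base_rate


def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking hypotheses rejected at FDR level `alpha`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True  # everything up to rank k_max is rejected
    return rejected


if __name__ == "__main__":
    in_cluster = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    has_event = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
    print(cluster_lift(in_cluster, has_event))
    print(benjamini_hochberg([0.001, 0.2, 0.012, 0.045, 0.025]))
```

Controlling the false discovery rate rather than the family-wise error rate is a common choice when screening many clusters, since it tolerates a small share of false leads in exchange for more power.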

Causal validation (MSM-IPW, precursor → event within 30 s)

Quantity	Value
Naive (unadjusted) OR	4.81
MSM-IPW adjusted OR [95% CI]	2.43 [1.78, 3.32]
Stabilized-weight mean (sd)	1.01 (0.18)
E-value for point estimate	3.85
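
The gap between the naive and adjusted odds ratios comes from reweighting. The sketch below is a deliberately simplified, single-time-point version of the idea: each record gets a stabilized weight P(A = a) / P(A = a | L = l), estimated here from raw frequencies within a discrete confounder stratum, and the odds ratio is then read off the weighted 2x2 table. A full MSM-IPW analysis with time-varying confounders would multiply such ratios across time steps and would typically fit the probabilities with regression models. The toy counts and function names are invented for illustration.

```python
from collections import Counter


def stabilized_weights(exposure, stratum):
    """sw_i = P(A = a_i) / P(A = a_i | L = l_i), estimated from frequencies."""
    n = len(exposure)
    n_a = Counter(exposure)
    n_l = Counter(stratum)
    n_al = Counter(zip(exposure, stratum))
    return [
        (n_a[a] / n) / (n_al[(a, l)] / n_l[l])
        for a, l in zip(exposure, stratum)
    ]


def weighted_odds_ratio(exposure, outcome, weights):
    """Odds ratio from the weighted 2x2 exposure-by-outcome table."""
    cell = Counter()
    for a, y, w in zip(exposure, outcome, weights):
        cell[(a, y)] += w
    return (cell[(1, 1)] * cell[(0, 0)]) / (cell[(1, 0)] * cell[(0, 1)])


# Toy cohort as (exposure, stratum, outcome, count) rows; all numbers invented.
# Stratum 1 has both more exposure and more events, which confounds the link.
TOY_ROWS = [
    (1, 0, 1, 4), (1, 0, 0, 16), (0, 0, 1, 8), (0, 0, 0, 72),
    (1, 1, 1, 40), (1, 1, 0, 40), (0, 1, 1, 6), (0, 1, 0, 14),
]
A, L, Y = [], [], []
for a, l, y, count in TOY_ROWS:
    A.extend([a] * count)
    L.extend([l] * count)
    Y.extend([y] * count)

if __name__ == "__main__":
    sw = stabilized_weights(A, L)
    print("naive OR:   ", weighted_odds_ratio(A, Y, [1.0] * len(A)))
    print("weighted OR:", weighted_odds_ratio(A, Y, sw))
    print("mean weight:", sum(sw) / len(sw))
```

On this toy cohort the unweighted odds ratio is about 4.8 and the weighted one about 2.2, with a mean weight of 1.0. A stabilized-weight mean close to 1, as in the table above, is the usual sanity check that the weight model is not badly misspecified.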

Repository layout

multimodal-precursor-detection/
├── README.md
├── requirements.txt
├── setup.py
├── pyproject.toml
├── setup_git.ps1
├── LICENSE
├── .gitignore
├── configs/
│   ├── default.yaml
│   └── experiment/
│       ├── small.yaml
│       └── full.yaml
├── data/
│   ├── generate_synthetic.py
│   ├── synthetic/          # generated artifacts (gitignored)
│   └── README.md
├── src/
│   ├── __init__.py
│   ├── datasets.py
│   ├── train.py
│   ├── discover.py
│   ├── causal_analysis.py
│   ├── evaluate.py
│   ├── utils.py
│   └── models/
│       ├── __init__.py
│       ├── audio_encoder.py
│       ├── video_encoder.py
│       ├── text_encoder.py
│       └── multimodal_fusion.py
├── notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_train_supervised.ipynb
│   ├── 03_discover_novel.ipynb
│   └── 04_causal_validation.ipynb
├── tests/
│   ├── test_data.py
│   ├── test_models.py
│   └── test_pipeline.py
├── docs/
│   ├── architecture.md
│   └── results.md
└── .github/
    └── workflows/
        └── ci.yml

Limitations & path to real clinical deployment

This is a methodological prototype on synthetic data. The generative process is designed to be statistically plausible — Markov class transitions, confounded event rates, realistic precursor distributions — but it is not a substitute for a real clinical cohort. Before any deployment-adjacent use, several things would change:

IRB, consent, and data governance. Real audio/video of patients in care settings is among the most sensitive data a hospital handles. Storage, access, and retention would be governed by an IRB-approved protocol, with separate consent for research vs. care use, and full data-use agreements with the host institution.
Privacy-preserving processing. Raw audio and video would never leave the on-prem clinical compute environment. The pipeline would be re-implemented to operate on-device or on-prem with encrypted storage, with only de-identified features (e.g., motion vectors, voice-quality summaries) exported for analysis. Speaker identity and face data would be stripped at the edge.
Bias and subgroup audit. The model would be audited for performance disparities across diagnosis, sex, age, communication ability, and care setting before being shown to any clinician.
Clinician-in-the-loop validation. Any alert surface would be decision-support only, not autonomous, with a clinician confirming events used for retraining. Lead-time targets and alerts-per-hour budgets would be co-designed with the care team, not set unilaterally by the model.
Causal rigor. The synthetic MSM-IPW analysis here uses fully observed confounders. In the real setting, unmeasured confounding is the dominant risk, and addressing it would require negative-control outcomes, candidate instrumental variables, and quantitative bias analysis (E-values, tipping-point analysis) reported alongside the point estimate.
Prospective evaluation. Retrospective AUROC is necessary but not sufficient. A real deployment would require a prospective silent-mode evaluation before any clinician-facing rollout.

Citation

If this repository informs your work, please cite it as:

@software{baride_multimodal_precursor_detection,
    author = {Baride, Srikanth},
    title  = {multimodal-precursor-detection: Multimodal precursor detection
              for behavioral escalation in intellectual disability},
    year   = {2026},
    url    = {https://github.com/srikanthbaride/multimodal-precursor-detection}
}

Contact

Srikanth Baride — srikanthbaride.github.io

Issues and pull requests are welcome. For research collaboration inquiries, please open a GitHub issue or reach out via the website above.
