Kansas Legislature roll call vote scraper and Bayesian analysis platform. Scrapes kslegislature.gov into structured CSV files, then runs a 27-phase statistical pipeline covering IRT ideal points, network analysis, clustering, time series, and more.
Coverage: 2011-2026 (84th-91st Legislatures)
```bash
# Install uv (if you don't have it)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and set up
git clone https://github.com/codechizel/tallgrass.git
cd tallgrass
uv sync

# Scrape the current session
uv run tallgrass 2025

# Run the full analysis pipeline
just pipeline 2025-26
```

- Python 3.14+: install via `uv python install 3.14`
- uv: package manager
- Just: command runner (optional but recommended)
- R: only required for Phase 16 (W-NOMINATE/OC) and Phase 19 (TSA enrichment). Install `wnominate`, `pscl`, `oc`, `changepoint`, and `strucchange` from CRAN.

```bash
uv run tallgrass 2025                  # current session (2025-26)
uv run tallgrass 2023                  # historical session (2023-24)
uv run tallgrass 2024 --special        # special session
uv run tallgrass --merge-special 2020  # merge 2020 special into parent biennium
uv run tallgrass --merge-special all   # merge all 5 specials
uv run tallgrass --list-sessions       # show all available sessions
uv run tallgrass 2025 --clear-cache    # re-fetch everything
```

Five CSV files per session in `data/kansas/{legislature}_{start}-{end}/`:

| File | Contents |
|---|---|
| `*_votes.csv` | One row per legislator per roll call |
| `*_rollcalls.csv` | One row per roll call (bill, motion, result, tallies) |
| `*_legislators.csv` | One row per legislator (name, party, district, chamber) |
| `*_bill_actions.csv` | One row per bill lifecycle action (89th+ only) |
| `*_bill_texts.csv` | One row per bill document (via tallgrass-text) |

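
Because the votes file is in long format (one row per legislator per roll call), most downstream methods first pivot it into a legislator-by-roll-call matrix. The following is a minimal sketch of that pivot using only the standard library. The column names (`legislator_slug`, `vote_id`, `vote`) and the vote labels are assumptions for illustration, not the project's documented schema, so check the header of a real `*_votes.csv` before adapting it.

```python
import csv
import io

# Hypothetical sample in long format: one row per legislator per roll call.
# Column names and vote labels are assumptions, not the real schema.
SAMPLE_VOTES_CSV = """legislator_slug,vote_id,vote
rep_a,rc_001,Yea
rep_a,rc_002,Nay
rep_b,rc_001,Nay
rep_b,rc_002,Absent
"""

# Assumed encoding: Yea -> 1, Nay -> 0, anything else -> missing (None).
VOTE_CODES = {"Yea": 1, "Nay": 0}


def build_vote_matrix(rows):
    """Pivot long-format rows into {legislator: {roll_call: 1 | 0 | None}}."""
    matrix = {}
    for row in rows:
        code = VOTE_CODES.get(row["vote"])  # None for absences and abstentions
        matrix.setdefault(row["legislator_slug"], {})[row["vote_id"]] = code
    return matrix


vote_matrix = build_vote_matrix(csv.DictReader(io.StringIO(SAMPLE_VOTES_CSV)))
print(vote_matrix["rep_a"])  # {'rc_001': 1, 'rc_002': 0}
```

To run this against scraped data, replace the `io.StringIO` sample with an open file handle for the session's votes CSV.
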
27 phases covering descriptive statistics, dimensionality reduction, Bayesian modeling, network analysis, prediction, bill text NLP, cross-session validation, and cross-temporal ideal point alignment.

```bash
just pipeline 2025-26        # run all 25 single-biennium phases
just eda                     # single phase
just irt --n-samples 4000    # with custom arguments
```

| # | Phase | Method |
|---|---|---|
| 01 | EDA | Descriptive statistics, vote matrix, missingness |
| 02 | PCA | Principal component analysis |
| 03 | MCA | Multiple correspondence analysis |
| 04 | UMAP | Nonlinear dimensionality reduction |
| 05 | IRT | 1D Bayesian ideal points (PyMC + nutpie) |
| 06 | 2D IRT | 2D Bayesian IRT with PLT identification |
| 07 | Hierarchical | Hierarchical IRT with partial pooling |
| 08 | PPC | Posterior predictive checks + LOO-CV model comparison |
| 09 | Clustering | Hierarchical, k-means, GMM |
| 10 | LCA | Latent class analysis (StepMix) |
| 11 | Network | Co-voting network + community detection |
| 12 | Bipartite | Bill-legislator bipartite network |
| 13 | Indices | Rice, party unity, ENP, maverick scores |
| 14 | Beta-Binomial | Bayesian party loyalty shrinkage |
| 15 | Prediction | Vote prediction (logistic + XGBoost + SHAP) |
| 16 | W-NOMINATE | W-NOMINATE + Optimal Classification (R) |
| 17 | External Validation | Shor-McCarty score correlation |
| 18 | DIME | DIME/CFscore campaign-finance validation |
| 19 | TSA | Time series analysis + changepoint detection |
| 20 | Bill Text | BERTopic topic modeling + NLP analysis |
| 21 | TBIP | Text-based ideal points |
| 22 | Issue IRT | Issue-specific ideal points (topic-stratified) |
| 23 | Model Legislation | ALEC + cross-state bill matching |
| 24 | Synthesis | Narrative report joining all phases |
| 25 | Profiles | Per-legislator deep-dive reports |
| 26 | Cross-Session | Cross-biennium legislator matching + shift |
| 27 | Dynamic IRT | Martin-Quinn state-space IRT across bienniums |
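
To make two of the Phase 13 indices concrete, here is a small sketch of the Rice cohesion index and the effective number of parties (ENP). It uses the standard textbook definitions (Rice: |yeas - nays| / (yeas + nays); Laakso-Taagepera ENP: 1 / sum of squared shares). Whether the pipeline uses exactly these variants is an assumption, and the function names are illustrative rather than taken from the codebase.

```python
def rice_index(yeas: int, nays: int) -> float:
    """Rice cohesion for one party on one roll call: |Y - N| / (Y + N).

    1.0 means the party voted unanimously; 0.0 means an even split.
    """
    total = yeas + nays
    if total == 0:
        raise ValueError("no yea or nay votes cast")
    return abs(yeas - nays) / total


def effective_number_of_parties(shares: list[float]) -> float:
    """Laakso-Taagepera ENP: 1 / sum of squared shares (shares sum to 1)."""
    return 1.0 / sum(share**2 for share in shares)


print(rice_index(30, 10))                       # 0.5
print(effective_number_of_parties([0.5, 0.5]))  # 2.0
```

A party that splits 30 to 10 scores 0.5, and two equal-sized blocs give an ENP of exactly 2.
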

Each phase produces an HTML report with tables, figures, and plain-English interpretation. Reports are written to `results/kansas/{session}/{run_id}/{phase}/`.
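
Given that directory layout, the reports for a session can be collected with a glob. This is a rough sketch: `find_reports` is a hypothetical helper, not part of the project, and it assumes that reports carry an `.html` extension and that the `{session}` segment is a label such as `2025-26`.

```python
from pathlib import Path


def find_reports(results_root, session):
    """Return every HTML file under results_root/kansas/{session}/{run_id}/{phase}/."""
    # One wildcard each for {run_id} and {phase}, then any .html file inside.
    pattern = f"kansas/{session}/*/*/*.html"
    return sorted(Path(results_root).glob(pattern))


for report in find_reports("results", "2025-26"):
    print(report)
```

If the directory does not exist yet, the glob matches nothing and the loop prints nothing.
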

```bash
just check       # lint + typecheck + tests (quality gate)
just test        # run all ~2664 tests
just test-fast   # skip slow/integration tests
just lint        # ruff check --fix + ruff format
just typecheck   # ty check src/ + ty check analysis/
```

```
src/tallgrass/   # Scraper package (config, session, models, scraper, output, CLI)
analysis/        # 27 numbered phase subdirectories + shared infrastructure
tests/           # ~2664 pytest tests (scraper + all analysis phases)
docs/            # Deep dives, ADRs, field surveys, primers
data/            # Scraped CSV output + external validation data (gitignored)
results/         # HTML reports + parquet intermediates (gitignored)
```

- Analysis primer — plain-English guide for general audiences
- How IRT works — general-audience explanation of Bayesian ideal points
- Architecture decisions — 96 ADRs documenting design choices
- Design docs — per-phase methodology and implementation
- Roadmap — completed phases, backlog, rejected methods
- Shor-McCarty scores — auto-downloaded from Harvard Dataverse on first use
- DIME/CFscores: must be manually downloaded from the Stanford DIME project (144 MB, ODC-BY license). Place at `data/external/dime_recipients_1979_2024.csv`.