⭐ If THEMIS sparked ideas about fine-tuning LLMs on domain-specific law — a star helps other researchers find it. Takes 2 seconds.
████████╗██╗ ██╗███████╗███╗ ███╗██╗███████╗
╚══██╔══╝██║ ██║██╔════╝████╗ ████║██║██╔════╝
██║ ███████║█████╗ ██╔████╔██║██║███████╗
██║ ██╔══██║██╔══╝ ██║╚██╔╝██║██║╚════██║
██║ ██║ ██║███████╗██║ ╚═╝ ██║██║███████║
╚═╝ ╚═╝ ╚═╝╚══════╝╚═╝ ╚═╝╚═╝╚══════╝
The Parametric Legal Intelligence Engine for Indian Law
"Not retrieval. Not lookup. Pure legal reasoning, baked into weights."
THEMIS is a domain-specific large language model fine-tuned on Indian statutory law. It is not a retrieval system, a search engine, or a chatbot wrapper around an existing API. It is a parametric knowledge model — one where legal understanding of the Bharatiya Nyaya Sanhita (BNS), the Indian Penal Code (IPC), the Bharatiya Nagarik Suraksha Sanhita (BNSS), and allied statutes is baked directly into the model weights through supervised fine-tuning.
Where HECTOR retrieves — THEMIS reasons.
v2 post-mortem: Scaled to 20,909 training pairs, but 3 epochs caused overfitting. Training loss dropped to 0.06–0.08 (memorization territory), and the model regurgitated training artifacts instead of reasoning. Fixed in v3 by reducing to 2 epochs.
What v2 achieved:
- ✅ Domain grounding fixed — no more "Bangladesh National Standards" hallucination
- ✅ Correct section identification (e.g., Section 303 for theft)
- ✅ 10x data scale from v1 (1,939 → 20,909 pairs)
What v2 broke:
- ❌ Overfitting — loss 0.06-0.08 indicates memorization, not learning
- ❌ Regurgitation — model recited definitions verbatim instead of answering the question
- ❌ Repetition loops — disclaimer text repeated 2x, cut off at token limit
- ❌ No checkpoint saving — intermediate checkpoints lost when Kaggle session ended
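Repetition loops like the duplicated disclaimer can be flagged automatically before anyone reads the output. A minimal sketch, assuming a duplicated word n-gram heuristic; the function names, the 8-gram window, and the 0.2 threshold are illustrative choices, not part of the THEMIS harness:

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 8) -> float:
    """Fraction of word n-grams in `text` that belong to a repeated n-gram.

    A response that restates the same disclaimer twice produces many
    duplicated 8-grams, so this ratio jumps well above zero.
    """
    words = text.split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


def has_repetition_loop(text: str, n: int = 8, threshold: float = 0.2) -> bool:
    # The threshold is an illustrative assumption, not a tuned value.
    return repeated_ngram_ratio(text, n) >= threshold
```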
Root cause: 3 epochs on 20k examples is too many. The model memorized surface patterns (statute text blocks, disclaimer boilerplate) rather than learning to reason about what's being asked.
v3 fix: Reduced epochs from 3 to 2. See notebooks/THEMIS_v3_Training.ipynb.
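The post-mortem reports training loss only. Assuming a held-out eval loss is also logged at each checkpoint, a rough memorization check can combine both signals. The `LossLog` type, the 0.1 floor, and the patience value below are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class LossLog:
    step: int
    train_loss: float
    eval_loss: float


def memorization_warnings(logs, train_floor=0.1, patience=2):
    """Return human-readable warnings for a sequence of LossLog entries.

    Two illustrative heuristics:
      1. eval loss rising for `patience` consecutive evaluations while
         training loss keeps falling (the classic overfitting signature);
      2. training loss below `train_floor` (v2 sat at 0.06-0.08).
    """
    warnings = []
    rising = 0
    for prev, cur in zip(logs, logs[1:]):
        if cur.eval_loss > prev.eval_loss and cur.train_loss < prev.train_loss:
            rising += 1
        else:
            rising = 0
        if rising >= patience:
            warnings.append(f"step {cur.step}: eval loss rising while train loss falls")
    for log in logs:
        if log.train_loss < train_floor:
            warnings.append(
                f"step {log.step}: train loss {log.train_loss:.2f} below {train_floor}"
            )
    return warnings
```

Neither signal proves memorization on its own; together they are a cheap prompt to stop early or inspect samples.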
THEMIS v3 is designed to match the data depth of production medical RAG systems — comparable to the 90,000+ clinical records in Ella.
Target: 50,000–90,000 training pairs covering:
| Legal Domain | Target Pairs | Sources |
|---|---|---|
| BNS 2023 — Criminal Law | 15,000 | India Code full text, section-by-section Q&A |
| IPC 1860 — Legacy Criminal Law | 10,000 | India Code, comparative IPC↔BNS mapping |
| BNSS 2023 — Criminal Procedure | 8,000 | India Code full text |
| BSA 2023 — Evidence Act | 5,000 | India Code full text |
| Consumer Protection Act 2019 | 6,000 | India Code + NCDRC judgment summaries |
| RTI Act 2005 | 3,000 | India Code + CIC decisions |
| Indian Contract Act 1872 | 5,000 | India Code full text |
| Transfer of Property Act 1882 | 4,000 | India Code full text |
| Supreme Court landmark judgments | 10,000 | Indian Kanoon — top 500 judgments parsed |
| IPC → BNS transition mapping | 8,000 | Section-level comparison pairs |
| Total | 74,000 | |
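Converting the table rows to shares makes the planned data mixture explicit. A quick arithmetic check using only the counts above:

```python
# Target pair counts copied from the table above.
TARGET_PAIRS = {
    "BNS 2023": 15_000,
    "IPC 1860": 10_000,
    "BNSS 2023": 8_000,
    "BSA 2023": 5_000,
    "Consumer Protection Act 2019": 6_000,
    "RTI Act 2005": 3_000,
    "Indian Contract Act 1872": 5_000,
    "Transfer of Property Act 1882": 4_000,
    "Supreme Court landmark judgments": 10_000,
    "IPC -> BNS transition mapping": 8_000,
}


def domain_shares(counts):
    """Return the total and each domain's fraction of it."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


total, shares = domain_shares(TARGET_PAIRS)
print(f"total pairs: {total:,}")
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<34} {share:6.1%}")
```

Criminal law, procedure, evidence, and the IPC→BNS mapping together account for 46,000 of the 74,000 pairs (about 62%), so criminal-law material dominates the mix.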
At this scale, THEMIS becomes a model that has genuinely read Indian law — not a model that learned to sound like a lawyer.
What changed:
- ✅ Reduced epochs from 3 to 2 (fixes memorization)
- ✅ Added checkpoint saving every 500 steps (keep last 3)
- ✅ Added conversational test questions to eval set (15 new rephrased queries)
- ✅ Detailed training notebook with step-by-step instructions
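To see which checkpoints survive with `save_steps=500` and `save_total_limit=3`, here is a small simulation. It assumes the oldest-first deletion rule that Hugging Face `Trainer` applies to step checkpoints, ignores any extra save at the very end of training, and approximates the step count as one optimizer step per `batch_size × gradient_accumulation` examples. Both helper names are hypothetical:

```python
import math


def optimizer_steps(num_pairs, batch_size=1, grad_accum=8, epochs=2):
    # Approximate: one optimizer step per batch_size * grad_accum examples,
    # rounding the final partial accumulation up.
    return math.ceil(num_pairs / (batch_size * grad_accum)) * epochs


def retained_checkpoints(total_steps, save_steps=500, save_total_limit=3):
    """Simulate which checkpoint steps are still on disk after training."""
    kept = []
    for step in range(save_steps, total_steps + 1, save_steps):
        kept.append(step)
        if len(kept) > save_total_limit:
            kept.pop(0)  # delete the oldest checkpoint
    return kept
```

With the v3 settings (20,909 pairs, batch size 1, gradient accumulation 8, 2 epochs) this gives roughly 5,228 optimizer steps, leaving checkpoints 4000, 4500, and 5000 on disk, so a Kaggle session that dies mid-run still leaves recent weights to resume from.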
v3 notebook: notebooks/THEMIS_v3_Training.ipynb
Target: 50,000–90,000 pairs | LoRA rank 32 | max sequence length 2,048 | A100 (Colab Pro or RunPod)
- Full India Code corpus ingestion (all central acts)
- Indian Kanoon top 1,000 judgment summaries
- IPC → BNS complete transition mapping (all 511 sections)
- Hindi language support (bilingual fine-tune)
- RAGAS-style evaluation harness with citation F1 scoring
- Systematic hallucination rate measurement
- Publish v3 adapter to HuggingFace with full model card
Success criteria: Citation accuracy >85% on held-out eval set. Hallucination rate <10% on factual section number queries.
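Both success criteria reduce to counting section numbers in model answers. A minimal sketch, assuming each eval item carries its set of gold section numbers and that any cited number outside the act's valid range counts as fabricated. `EvalItem`, the regex, and the scoring rules are illustrative; the repo's `eval/metrics.py` may define them differently:

```python
import re
from dataclasses import dataclass

SECTION_RE = re.compile(r"\bsection\s+(\d+)", re.IGNORECASE)
BNS_VALID = set(range(1, 359))  # BNS 2023 has 358 sections


@dataclass
class EvalItem:
    response: str
    gold_sections: set[int]  # section numbers a correct answer should cite


def cited_sections(text: str) -> set[int]:
    return {int(m) for m in SECTION_RE.findall(text)}


def citation_accuracy(items) -> float:
    # Hit if the response cites at least one gold section.
    hits = sum(1 for it in items if cited_sections(it.response) & it.gold_sections)
    return hits / len(items)


def hallucination_rate(items, valid_sections) -> float:
    # Flag a response if it cites any section number that does not exist.
    bad = sum(1 for it in items if cited_sections(it.response) - valid_sections)
    return bad / len(items)


def meets_targets(accuracy: float, hallucination: float) -> bool:
    return accuracy > 0.85 and hallucination < 0.10
```

In a toy set, an answer citing "Section 420" for cheating uses the IPC number; read as a BNS section it falls outside 1–358 and counts as a hallucination, which is the IPC/BNS confusion this metric is meant to surface.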
The long-term architecture unifies THEMIS (parametric reasoning) with HECTOR (retrieval grounding):
```
             User Query
                  │
                  ▼
   ┌─────────────────────────────────────┐
   │          Query Classifier           │
   │     "Parametric or retrieval?"      │
   └──────────────┬──────────────────────┘
                  │
          ┌───────┴───────┐
          ▼               ▼
     ┌─────────┐     ┌─────────┐
     │ THEMIS  │     │ HECTOR  │
     │ (reason)│     │(retrieve│
     │         │     │+ verify)│
     └────┬────┘     └────┬────┘
          └───────┬───────┘
                  ▼
       Unified Legal Response
     with citations + reasoning
```
THEMIS handles citizen-level Q&A with parametric reasoning. HECTOR handles deep legal research requiring source-level PDF citations. A unified router dispatches based on query complexity.
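A router of this kind can start very simple. The sketch below is hypothetical throughout: the cue words and the 40-word cutoff stand in for the "query complexity" signal, and a production router would more likely use a trained classifier than a keyword list:

```python
import re
from enum import Enum


class Route(Enum):
    THEMIS = "parametric"  # citizen-level Q&A answered from model weights
    HECTOR = "retrieval"   # research queries needing source-grounded citations


# Hypothetical cues suggesting the user wants case law or verifiable sources.
RESEARCH_CUES = re.compile(
    r"\b(judgments?|judgements?|precedents?|case law|citations?|cite|compare)\b",
    re.IGNORECASE,
)


def route(query: str, max_simple_words: int = 40) -> Route:
    if RESEARCH_CUES.search(query):
        return Route.HECTOR
    if len(query.split()) > max_simple_words:
        return Route.HECTOR  # long, multi-part questions treated as complex
    return Route.THEMIS
```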
```
themis/
├── cli.py                  # Rich-powered CLI entry point
├── infer.py                # Model loading and inference engine
├── config.py               # Model path, generation params, device config
├── eval/
│   ├── run_eval.py         # Evaluation harness
│   ├── metrics.py          # Citation accuracy, refusal rate, ROUGE-L
│   └── eval_set.json       # Ground truth evaluation dataset
├── data/
│   ├── scraper/
│   │   ├── kanoon.py       # Indian Kanoon judgment scraper
│   │   └── indiacode.py    # India Code Bare Acts parser
│   ├── synthetic/
│   │   └── generate.py     # Claude-assisted Q&A pair generation
│   ├── preprocess.py       # Cleaning, deduplication, formatting
│   └── dataset.json        # Training dataset (v1: 1,939 pairs)
├── training/
│   ├── finetune.py         # Unsloth + LoRA training script
│   ├── config.yaml         # LoRA hyperparameters
│   └── push_to_hub.py      # HuggingFace Hub upload
└── model/                  # Local model weights (gitignored)
```
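`eval/metrics.py` lists ROUGE-L among its metrics. ROUGE-L is an F-score over the longest common subsequence (LCS) of candidate and reference tokens. A dependency-free sketch with lowercase whitespace tokenization; the repo may instead use a library such as `rouge-score`:

```python
def lcs_length(a, b) -> int:
    # Classic O(len(a) * len(b)) dynamic programme, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Because LCS rewards in-order overlap without requiring contiguity, ROUGE-L tolerates paraphrase better than exact match, but it still cannot tell whether a cited section number is correct; that is what the citation metrics are for.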
| Layer | Technology | Purpose |
|---|---|---|
| Base Model | Mistral 7B Instruct v0.3 | Foundation — strong instruction following |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) | Parameter-efficient training |
| Training Framework | Unsloth | 2x faster LoRA, VRAM optimized |
| Training Platform | Kaggle free T4 (v1) → RunPod A100 (v3) | Compute |
| Dataset Format | Alpaca instruction tuning | Standard SFT format |
| Data Sources | India Code + Indian Kanoon + Synthetic | Scraping + generation |
| Synthetic Generation | Claude API | Q&A pair generation from Bare Acts |
| CLI | Typer + Rich | Terminal interface |
| Inference | HuggingFace Transformers + PEFT | LoRA adapter loading |
| Evaluation | Custom harness + citation F1 | Quality measurement |
| Model Hosting | HuggingFace Hub | Public model access |
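The stack uses the Alpaca instruction-tuning format. A sketch of how one record is typically rendered into a training string, using the standard Stanford Alpaca prompt templates. Whether THEMIS uses exactly this wording is an assumption, and in practice the EOS token should come from the tokenizer (`tokenizer.eos_token`) rather than the hard-coded `</s>` used here:

```python
ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{output}"
)
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{output}"
)


def to_prompt(record: dict, eos_token: str = "</s>") -> str:
    # Records with an empty "input" (like the THEMIS example) use the shorter
    # template; str.format ignores the unused "input" key in that case.
    template = ALPACA_WITH_INPUT if record.get("input") else ALPACA_NO_INPUT
    return template.format(**record) + eos_token
```

Appending the EOS token teaches the model where an answer ends, which is one common guard against runaway generation.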
The v1 dataset was generated from India Code Bare Acts, using the Claude API for synthetic Q&A pair generation. Record format:

```json
{
  "instruction": "What does Section 303 of the Bharatiya Nyaya Sanhita say about theft?",
  "input": "",
  "output": "Section 303 of the Bharatiya Nyaya Sanhita (BNS) 2023 defines theft as..."
}
```

v2 expanded the data roughly 10x (1,939 → 20,909 pairs), covering the BNS, IPC, BNSS, BSA, CPA, and RTI Act. It includes:
- 15 template question categories
- IPC → BNS section mappings (200+)
- Abbreviation disambiguation pairs (21)
- Conversational rephrased questions (added for v3 eval)
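Mapping pairs like these are naturally generated from a correspondence table plus question templates. A sketch with two illustrative entries (IPC 302 → BNS 103 for murder, IPC 378 → BNS 303 for theft) and two hypothetical templates; the dataset's actual 15 template categories and 200+ mappings are not reproduced here, and any mapping table should be checked against the official IPC–BNS correspondence:

```python
from itertools import product

# Illustrative entries only: IPC section -> (BNS section, topic).
IPC_TO_BNS = {
    "302": ("103", "punishment for murder"),
    "378": ("303", "theft"),
}

# Hypothetical question templates.
TEMPLATES = [
    "What is the BNS equivalent of IPC Section {ipc}?",
    "IPC Section {ipc} dealt with {topic}. Which BNS section replaces it?",
]


def mapping_pairs(mapping, templates):
    """Build Alpaca-style records for every (mapping entry, template) combination."""
    pairs = []
    for (ipc, (bns, topic)), template in product(mapping.items(), templates):
        pairs.append({
            "instruction": template.format(ipc=ipc, topic=topic),
            "input": "",
            "output": (
                f"IPC Section {ipc} ({topic}) corresponds to Section {bns} "
                f"of the Bharatiya Nyaya Sanhita (BNS) 2023."
            ),
        })
    return pairs
```

Templated pairs scale cheaply but share identical phrasing, which is one reason the rephrased conversational questions matter for evaluation.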
Full India Code corpus + Indian Kanoon judgment summaries + complete IPC→BNS transition mapping. At this scale, the dataset size matches the clinical corpus depth of production medical AI systems.
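The LoRA configurations that follow differ mainly in rank and in which projection matrices receive adapters. Each adapted weight of shape d_out × d_in adds r·(d_in + d_out) trainable parameters, so the effect of those choices can be computed directly. This sketch assumes the public Mistral 7B v0.3 dimensions (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128):

```python
# Assumed Mistral 7B v0.3 shapes; k/v projections output 8 heads * 128 = 1024.
HIDDEN, MLP, LAYERS, KV_DIM = 4096, 14336, 32, 1024

# (in_features, out_features) for each linear layer LoRA can target.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(rank, modules):
    # Each adapted W (d_out x d_in) gains A (r x d_in) and B (d_out x r).
    per_layer = sum(rank * (SHAPES[m][0] + SHAPES[m][1]) for m in modules)
    return per_layer * LAYERS


v1 = lora_params(8, ["q_proj", "v_proj"])
v3 = lora_params(16, ["q_proj", "k_proj", "v_proj", "o_proj"])
target = lora_params(32, list(SHAPES))
print(f"v1: {v1:,}  v3: {v3:,}  target: {target:,}")
```

Under those assumptions, v1 trains about 3.4M adapter parameters, v3 about 13.6M (4x), and the rank-32 all-projection target about 83.9M, roughly 25x v1.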
```yaml
# v1
base_model: unsloth/mistral-7b-instruct-v0.3-bnb-4bit
lora_r: 8
lora_alpha: 16
target_modules: [q_proj, v_proj]
lora_dropout: 0
epochs: 3
batch_size: 1
gradient_accumulation: 8
learning_rate: 2e-4
max_seq_length: 512
platform: Kaggle T4 (free)
training_pairs: 1,939
```

```yaml
# v2 used 3 epochs → overfitting (loss 0.06-0.08)
# v3 fixed by reducing to 2 epochs
lora_r: 16
lora_alpha: 32
target_modules: [q_proj, k_proj, v_proj, o_proj]
lora_dropout: 0.05
epochs: 2               # KEY CHANGE: 2 instead of 3
batch_size: 1
gradient_accumulation: 8
learning_rate: 2e-4
max_seq_length: 1024
save_steps: 500         # Checkpoint every 500 steps
save_total_limit: 3     # Keep last 3 checkpoints
platform: Kaggle T4 (free)
training_pairs: 20,909
```

```yaml
# Target configuration (partial)
lora_r: 32
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
max_seq_length: 2048
training_pairs: 74,000
platform: RunPod A100 (40GB)
```

THEMIS uses a 3-tier evaluation system:
- Tier 1 (Citation Accuracy): Does the response cite the correct section number? Target: >85% on criminal law queries by v3.
- Tier 2 (Hallucination Rate): Does the model fabricate section numbers or act names? Target: <10% hallucination rate by v3. Current v1 rate: ~60% on BNS-specific queries (abbreviation confusion).
- Tier 3 (Refusal Rate): Does the model correctly decline out-of-scope queries? Target: >95% correct refusal on state-specific law queries.
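Tier 3 needs a way to decide whether a response is a refusal. A rough keyword-based sketch; the marker phrases are hypothetical, and because generic disclaimers such as "consult a lawyer" also match, a real harness would need a stricter check or a classifier:

```python
import re

# Hypothetical refusal markers for out-of-scope (e.g. state-specific) queries.
REFUSAL_RE = re.compile(
    r"\b(outside (?:my|the) scope|cannot advise|not covered|state-specific"
    r"|consult a (?:local )?lawyer)\b",
    re.IGNORECASE,
)


def is_refusal(response: str) -> bool:
    return bool(REFUSAL_RE.search(response))


def refusal_rate(responses_to_out_of_scope) -> float:
    """Share of out-of-scope queries the model correctly declined."""
    if not responses_to_out_of_scope:
        return 0.0
    declined = sum(is_refusal(r) for r in responses_to_out_of_scope)
    return declined / len(responses_to_out_of_scope)
```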
- BNS 2023 abbreviation confusion: fixed with 20k training pairs
- Section number hallucination: model now identifies correct sections
- Overfitting risk still exists — monitor loss during training
- No case law knowledge — statutes only
- English only
- State-specific laws not covered
- Best used as orientation, not as authoritative legal reference
India has 1.4 billion people. Fewer than 2 million are lawyers. The gap between legal literacy and legal need is enormous. THEMIS is a step toward making statutory law accessible to anyone — not as a replacement for lawyers, but as a first layer of orientation that helps people understand what laws exist, what they say, and what options they have.
At 90,000 training pairs, a model can genuinely know Indian law. That is the goal.
| | THEMIS | HECTOR |
|---|---|---|
| Architecture | Parametric fine-tune (LoRA) | RAG (Qdrant + Chain-of-Verification) |
| Knowledge | Model weights | External vector database |
| Runtime documents | Not needed | Required |
| Best for | Citizen Q&A | Deep legal research |
| Citations | Parametric (may hallucinate) | Source-grounded (verified) |
| Status | v1 trained, v3 in progress | Production-ready |
MIT License
```bibtex
@misc{themis2026,
  author    = {Daniel Deshmukh},
  title     = {THEMIS: Parametric Legal Intelligence Engine for Indian Law},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Daniel2503/themis-mistral-7b-lora}
}
```

THEMIS: the Greek goddess of law, justice, and order. Because justice should not require a law degree to understand.