A language model built from biological neurons — not transformers.
Real Sam uses Leaky Integrate-and-Fire (LIF) spiking neurons, curriculum learning, and environment-driven plasticity to learn language the way a brain does: through binary spikes, temporal dynamics, and sparse computation.
| Phase | Seq Length | Val Loss | Perplexity |
|---|---|---|---|
| Words | 8 | 2.64 | 14 |
| Phrases | 16 | 2.55 | 13 |
| Sentences | 32 | 2.48 | 12 |
| Stories | 64 | 2.40 | 11 |
6M parameters. ~10% firing rate. Trained on a single GTX 1050 Ti.
```
Token → STE Spike Encoder → Linear Projection (256 → 512)
  → 6x Environment Spiking Blocks (LIF + diversity + residual)
  → Weight-Tied Readout → Next Token
```
Each token becomes a binary spike vector via the Straight-Through Estimator. Six layers of LIF neurons with learnable decay process the sequence recurrently: membrane potential carries temporal context, and spikes are binary {0, 1}. The readout layer reuses the transposed embedding matrix, saving more than 60% of the model's parameters.
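The encoder step can be sketched as follows. This is a minimal illustration, not the repo's actual API: `STESpike` and `encode` are hypothetical names, and the 0.5 threshold is an assumption.

```python
import torch

class STESpike(torch.autograd.Function):
    """Binarize to {0, 1} in the forward pass; pass gradients
    straight through (identity) in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return (x > 0.5).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # the "straight-through" part

def encode(token_ids, embedding):
    # sigmoid squashes the embedding into (0, 1); thresholding binarizes it
    return STESpike.apply(torch.sigmoid(embedding(token_ids)))

emb = torch.nn.Embedding(4096, 256)   # vocab 4096, embedding dim 256
spikes = encode(torch.tensor([5, 17, 42]), emb)
```

The custom `backward` is what lets gradients reach the embedding table despite the non-differentiable threshold.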
Generation is O(1) per token. No attention. No KV cache. Just spiking state.
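A generation loop under this scheme could look like the sketch below. The `model(token, state) -> (logits, state)` interface is an assumption for illustration; the point is that only a fixed-size spiking state is carried between steps, so each token costs O(1).

```python
import torch

def generate(model, prompt_ids, n_new, state=None):
    """Greedy autoregressive generation with no attention and no KV
    cache: only the recurrent spiking `state` is carried forward."""
    ids = list(prompt_ids)
    for t in ids:  # warm up the spiking state on the prompt
        logits, state = model(torch.tensor([t]), state)
    for _ in range(n_new):
        next_id = int(torch.argmax(logits, dim=-1))
        ids.append(next_id)
        logits, state = model(torch.tensor([next_id]), state)
    return ids
```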
Curriculum Learning — Data complexity increases in phases (words, phrases, sentences, stories, conversations), mimicking infant language development. Phase transitions are automatic, triggered by loss convergence.
Shared Environment — One global stress signal modulates all neurons simultaneously, like cortisol in the bloodstream. High loss = stressed environment = neurons explore more. Inspired by Cortical Labs' DishBrain and the Free Energy Principle.
Neuron Diversity — Each neuron has a fixed "personality" sampled at initialization (LogNormal diversity factor), simulating biological receptor density. Same environment, different responses. Sensitive explorers and resilient anchors.
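Together, the shared environment and per-neuron diversity might be sketched like this. The LogNormal parameters, the baseline, and the exact modulation formula are all assumptions for illustration; only the overall shape (one global stress scalar, fixed per-neuron factors) follows the description above.

```python
import torch

n_neurons = 512
# Fixed per-neuron "personality", sampled once at initialization
diversity = torch.distributions.LogNormal(0.0, 0.5).sample((n_neurons,))

def neuron_gain(loss_value, baseline=2.5):
    # One global stress scalar shared by every neuron, like cortisol:
    # high training loss -> stressed environment -> larger gain
    stress = torch.sigmoid(torch.tensor(loss_value - baseline))
    # Same environment, but each neuron responds per its diversity factor
    return 1.0 + stress * diversity

stressed = neuron_gain(4.0)  # high loss: gains spread out, more exploration
calm = neuron_gain(1.0)      # low loss: gains stay near 1
```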
Firing Rate Regularization — Neurons maintain ~10% sparse firing through a loss penalty, not threshold manipulation. Backprop naturally discovers efficient sparse codes, just like biological cortex.
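A firing-rate penalty of this kind can be written as a squared deviation of the mean spike rate from the target. This is an illustrative form; the repo's exact penalty and weighting may differ.

```python
import torch

def firing_rate_penalty(spikes, target_rate=0.10, weight=1.0):
    """Squared deviation of the mean firing rate from the target.

    spikes: binary tensor, e.g. shape (batch, time, neurons).
    """
    rate = spikes.float().mean()  # fraction of neurons that fired
    return weight * (rate - target_rate) ** 2

dense = torch.ones(8, 32, 512)                    # every neuron fires
sparse = (torch.rand(8, 32, 512) < 0.10).float()  # ~10% firing
```

Because the penalty is differentiable in the spike probabilities (via the STE), backprop can push the network toward sparse codes without touching thresholds directly.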
```
git clone https://github.com/nakaiwilliams/real-sam.git
cd real-sam
pip install -r requirements.txt
```

Download and prepare the training data:

```
python -m src.large_data --data-dir data --vocab 4096
```

This downloads TinyStories, Alpaca, Dolly, and OpenAssistant data, trains a BPE tokenizer, and caches everything locally.
```
python src/train.py --mode v4 --epochs 80 --batch-size 32 --grad-accum 4
```

Training runs on CUDA, Apple MPS, or CPU. A GTX 1050 Ti (4 GB VRAM) handles `batch_size=32` comfortably.
Resume an interrupted run:

```
python src/train.py --mode v4 --resume --epochs 80
```

Chat with the trained model:

```
python -m src.chat --checkpoint checkpoints/real-sam-v4.pt
```

Generate text from a prompt:

```
python src/generate.py --checkpoint checkpoints/real-sam-v4.pt --prompt "Once upon a time"
```

Repository layout:

```
src/
  neurons.py            LIF neuron implementations (V1-V4)
  encoder.py            STE spike encoder (tokens → binary spikes)
  network.py            Full model architectures (RealSam V1-V4)
  train.py              Training loop with curriculum learning
  data.py               BPE tokenizer and dataset utilities
  curriculum_data.py    Multi-phase curriculum data pipeline
  chat.py               Interactive chat interface
  generate.py           Text generation
  spiking_ner.py        Spiking NER model (for PII detection)
  train_spiking_ner.py  NER training pipeline
docs/
  index.html            Project landing page
checkpoints/            Model checkpoints (not in git; train or download)
data/                   Training data (not in git; download via script)
```
| Version | Params | Key Feature | Notes |
|---|---|---|---|
| V1 | ~1M | Basic LIF + recurrence | Character-level Shakespeare |
| V2 | ~3M | Residual blocks + LayerNorm | BPE tokenizer, conversation data |
| V3 | ~6M | Homeostatic thresholds | Per-neuron adaptive thresholds (deprecated) |
| V4 | ~6M | Environment + diversity | Shared stress signal, neuron personalities |
- Python 3.9+
- PyTorch 2.0+
- snnTorch 0.7+
- tokenizers
- datasets (HuggingFace)
- tqdm, numpy, matplotlib
Real Sam processes language through spiking dynamics:

1. Encoding: Each BPE token is embedded and passed through a sigmoid + threshold to produce a binary spike vector. The Straight-Through Estimator provides gradients for backpropagation.

2. Processing: Six stacked spiking blocks process the sequence one token at a time. Each block has:
   - A feedforward path (`fc_in`)
   - A recurrent path from previous spikes (`fc_rec`)
   - A LIF neuron with learnable decay (`beta`)
   - A gated residual connection

3. Environment: A shared stress signal, computed from the training loss, modulates all neurons' gain. Each neuron's response is scaled by its fixed diversity factor; some neurons are sensitive explorers, others are resilient anchors.

4. Readout: The output projects back to embedding space and multiplies by the transposed embedding matrix (weight tying). This produces next-token logits without a separate vocabulary projection.

5. Curriculum: Training data complexity increases automatically through phases. The model learns words before phrases, phrases before sentences, and so on, just like a child.
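The per-block update in the processing step can be sketched as below. The `fc_in`/`fc_rec`/`beta` names follow the description above, but the reset rule, gating, and initialization are illustrative assumptions, not the repo's implementation.

```python
import torch
import torch.nn as nn

class SpikingBlock(nn.Module):
    """One spiking block: feedforward + recurrent input paths, a LIF
    neuron with learnable decay, and a gated residual connection."""
    def __init__(self, dim, threshold=1.0):
        super().__init__()
        self.fc_in = nn.Linear(dim, dim)     # feedforward path
        self.fc_rec = nn.Linear(dim, dim)    # recurrent path from previous spikes
        self.beta = nn.Parameter(torch.zeros(dim))  # learnable decay (via sigmoid)
        self.gate = nn.Parameter(torch.zeros(dim))  # residual gate
        self.threshold = threshold

    def forward(self, x, mem, prev_spikes):
        # Membrane potential decays, then integrates both input paths
        mem = torch.sigmoid(self.beta) * mem + self.fc_in(x) + self.fc_rec(prev_spikes)
        spikes = (mem > self.threshold).float()  # binary {0, 1}; an STE would pass gradients
        mem = mem - spikes * self.threshold      # soft reset where a spike fired
        out = x + torch.sigmoid(self.gate) * spikes  # gated residual connection
        return out, mem, spikes

block = SpikingBlock(512)
x = torch.randn(4, 512)
mem = torch.zeros(4, 512)
prev = torch.zeros(4, 512)
out, mem, spk = block(x, mem, prev)
```

Because `mem` persists across tokens, the block carries temporal context without attention; generation just threads `(mem, spikes)` forward one token at a time.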
Transformers are brilliant. But they're not how brains work.
Biological neurons communicate through binary spikes — discrete events in time. Information is encoded in when neurons fire, not in continuous activation values. This is fundamentally more efficient: most neurons are silent most of the time.
Real Sam explores whether this principle can work for language. It's not trying to beat GPT-4. It's asking: what if we built language models the way evolution built brains?
The answer, so far: 6 million spiking neurons can learn grammar, narrative structure, and basic conversation. Not perfectly. But they do it with ~10% of neurons active at any time, O(1) generation per token, and no attention mechanism at all.
MIT. See LICENSE.
Built by Nakai Williams. Powered by spikes, not attention.