Active-inference Training with Learned Adaptive Stigmergy
"Don't train on what humans wrote about the world.
Train on what you actually discover about the world.
Validate what you claim. Own what you build."
An open source contribution from OpenHub Research (Thailand)
Website: atlasagi.org · Author: Robin Dey · Institution: https://openhubresearch.org/
ATLAS is a next-generation LLM training framework built in pure Rust with zero external crate dependencies — the SQLite principle applied to AI infrastructure.
It fuses four architectural innovations:
| Component | Role | Key property |
|---|---|---|
| ASTRA-dev | Live discovery engine | ~10s/cycle, NASA/WHO/World Bank APIs, causal inference |
| GraphPalace | Stigmergic memory | Pheromone-guided curriculum, O(1/√T) convergence |
| TRM-CausalValidator | Recursive validator | 7M params, 0.1% compute, Quality Gate 6 |
| ZK Schnorr proofs | Provenance chain | LLM output → live API, cryptographically verifiable |
v4.0.0 — Champagnat n-morphic Framework + OLMo-3-7B Fix:
- 🧬 InvasionFitnessScorer — morphic fitness f(y) = success − cost − Σcos_sim·n̄ (fixes pheromone saturation)
- 🌊 CanonicalPheromoneUpdate — principled decay Δρ ∝ μ·σ²·n̄·∂₁s (Champagnat-Méléard 2011)
- ⚖️ BarBovier2017Constraints — stability gate: explore_ratio × batch_size > 10, temp > 1/√batch
- 🔀 CognitiveBranching — n-morphic OODA bifurcation on plateau detection
- 🔆 HJConcentrationPrior — Hopf-Cole sharpening T_eff(s) = T₀/(1+γs) in TRM recursion
- 🔧 Issue #7 fix — OLMo-3-7B SWA (24/32 sliding layers, window=4096) + YaRN RoPE + config.json auto-patch
v4.0.2 — BF16 GPU Inference Path (Issue #9):
- ⚡ BF16 W16A32 — weights in BF16 (14 GB) vs f32 (28 GB);
GpuBufBf16,GpuBufKind,upload_bf16()in atlas-tensor - 🔥 GEMV kernels —
sgemv_bf16_kernel+sgemv_f32_kernel: one-warp-per-row for N=1 decode; fixes 32× tiled-GEMM inefficiency - 🚀 OLMo-3-7B-Think: 4.1 → 19.9 tok/s (4.8× speedup, A100-SXM4-40GB, W16A32)
v4.0.3 — Math Integrity Fixes (Issue #11):
- 🧮
CanonicalPheromoneUpdateλ decay — replaced linear formulabase_rate × (1 − canonical_term)(went negative when term > 1, dead gradient at clamp boundary) withbase_rate × exp(−canonical_term): always positive, smooth, zero-gradient fidelity, hardware-safe for v6 ASIC spec - 🏆
InvasionFitnessScorercompetition kernel — fixed negative Lotka-Volterra coefficients: rawcosine_sim ∈ [−1, 1]was giving fitness bonuses to anti-correlated strategies (mutualism, not competition); replaced withα_ij = ReLU(cos_sim − 0.2)— threshold at 4σ above noise floor in d=384 embedding space;competition_thresholdadded toInvasionFitnessConfig - ✅ 532/532 tests (+4 new regression tests); GPU validated: 47/47 A100 model tests, OLMo-3-7B-Think still 19.9 tok/s
The result: a self-improving scientific intelligence that trains on what it actually discovers about the world — real causal relationships from live data, validated by recursive architecture, guided by stigmergic memory.
Nobody has built this before. See CHARTER.md for the full architecture.
Every other LLM is trained on:
- What humans wrote on the internet (web scrapes, Wikipedia)
- Synthetic data generated by another LLM (GPT-4 distillation)
- Human-curated datasets (expensive, frozen at curation time)
atlas-7b is trained on:
- What an autonomous science engine actually discovers about the world
- Real causal relationships extracted from live NASA, WHO, World Bank APIs
- Validated findings with Bayesian confidence scores and PC/FCI causal inference
- A corpus that grows every 10 seconds and never contains stale or duplicated information
This is not a better fine-tuning recipe. This is a different paradigm for what training data can be.
The SQLite principle applied to AI infrastructure.
atlas/
├── Cargo.toml # workspace root — [dependencies] is empty by design
├── kernels/
│ ├── matmul.cu # raw CUDA kernel (no cudarc crate)
│ ├── attention.cu # flash attention from scratch
│ └── quant.cu # INT4/INT8 quantization
└── crates/
├── atlas-core/ # error types, traits, config
├── atlas-tensor/ # Tensor + CUDA FFI (the seed of everything)
├── atlas-grad/ # autograd tape, backward pass
├── atlas-optim/ # AdamW, cosine LR scheduler
├── atlas-quant/ # INT4/INT8 quantization, QLoRA
├── atlas-model/ # transformer: MultiHeadAttn, FFN, RMSNorm, RoPE
├── atlas-tokenize/ # BPE tokenizer (sentencepiece port)
├── atlas-palace/ # GraphPalace stigmergic memory: A* search, 5-type pheromones, Active Inference
├── atlas-mcp/ # MCP server: 28 palace tools via JSON-RPC 2.0 stdio + connection pool
├── atlas-api/ # OpenAI-compatible HTTP endpoint: /v1/chat/completions, SSE streaming
├── atlas-trm/ # TRM-CausalValidator (7M params, arXiv:2510.04871)
├── atlas-causal/ # PC/FCI causal inference (py-causal port)
├── atlas-bayes/ # Bayesian confidence scoring
├── atlas-astra/ # ASTRA OODA engine (~8K LOC, full port)
├── atlas-corpus/ # LiveDiscoveryCorpus + DeepSupervisionTrainer + quality gates
├── atlas-zk/ # ZK Schnorr proofs (asi-build port)
├── atlas-http/ # HTTP client via raw libc syscalls
├── atlas-json/ # JSON parser from source
├── atlas-safety/ # Horn-clause safety constitution, 5-state FSM, CircuitBreaker
├── atlas-bridge/ # ZK-attested Rings↔ETH interface (Sepolia-compatible)
└── atlas-cli/ # CLI: train / discover / eval / prove / mcp / api / bench
21 crates. One coherent system. Zero external Rust dependencies.
CUDA is called via raw extern "C" FFI from build.rs + .cu kernel files — no cudarc, no tch, no candle. The same approach that makes SQLite trustworthy, applied to GPU compute.
// atlas-tensor/src/lib.rs — the first line of ATLAS
pub struct Tensor {
data: Vec<f32>,
shape: Vec<usize>,
}Every billion-parameter transformer starts here.
- GraphPalace Memory — pheromone-weighted persistent knowledge;
search_by_embedding(),hot_paths(),deposit_pheromones() - Morphic Warm-Start — O(1/√T) cross-run convergence (proven in BUTTERS, R²=0.982, p<10⁻³⁰)
- Stigmergic RLVR —
r_total = α·r_verifiable + β·r_pheromone; pheromone decay prevents reward hacking - Active Inference Data Gen — palace cold spots direct ASTRA to fill knowledge gaps
- ZK Knowledge Claims — Schnorr proof chain from LLM output to raw API data; hallucinations have broken proof trails
- LiveDiscoveryCorpus — ASTRA's output as a living training dataset; ~86K quality examples/month
- TRM-CausalValidator — 7M-param recursive validator;
z = net(x,y,z) × 6 recursions; Quality Gate 6; generates Type 5 training traces
ATLAS v4.0.0 delivers a fully GPU-resident forward pass — hidden states stay in VRAM between tokens, with pre-pinned weight upload at model load time.
| Model | Params | GPU tok/s | VRAM | Notes |
|---|---|---|---|---|
| SmolLM2-135M | 135M | 37.7 | 507 MiB | f32, sm_80 |
| SmolLM2-360M | 360M | 25.4 | ~1.4 GB | f32 |
| SmolLM2-1.7B | 1.7B | 12.6 | ~6.5 GB | f32, 2.4× over CPU |
| TinyLlama-1.1B | 1.1B | 20.9 | ~8.4 GB | f32 |
| OLMo-3-7B-Think | 7B | 19.9 | ~14 GB | BF16 W16A32 (v4.0.2+); was 4.1 tok/s CPU |
| Kernel | What it does |
|---|---|
rmsnorm_forward |
RMSNorm in CUDA — replaces per-token CPU loop |
rope_forward |
RoPE rotation — parallel over heads |
silu_mul_forward |
SwiGLU gate fused — single CUDA pass |
atlas_adamw_step |
AdamW optimizer step entirely on GPU |
sgemm_vec |
Zero-copy matrix×vector; GpuVec activation buffer |
CUDA portability: all kernels use rsqrtf() (not __rsqrtf()) for cross-platform compatibility.
ATLAS v4.0.0 adds atlas-api — an OpenAI-compatible HTTP inference server. Drop-in replacement for any OpenAI API client.
# Start the server
./target/release/atlas api serve --model /home/user/models/smollm2-135m --port 8080| Endpoint | Method | Description |
|---|---|---|
/v1/chat/completions |
POST | Chat completions with SSE streaming |
/v1/completions |
POST | Text completions |
/v1/models |
GET | List available models |
# Chat completion (streaming)
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "atlas",
"messages": [{"role": "user", "content": "What is morphic resonance?"}],
"stream": true
}'
# Non-streaming
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "atlas",
"messages": [{"role": "user", "content": "Explain stigmergic memory"}],
"stream": false
}'
# List models
curl http://localhost:8080/v1/modelsFeatures: SSE streaming, CORS headers, echo mode for testing, 40 tests, 0 external dependencies.
The DeepSupervisionTrainer in atlas-corpus implements multi-pass deep supervision — each training batch runs N_sup=4..16 forward passes, summing loss across all supervision points with pheromone-driven latent carry between passes.
// atlas-corpus/src/deep_supervision.rs
pub struct DeepSupervisionTrainer {
pub n_sup: usize, // 4..16 forward passes per batch
pub latent_carry: bool, // carry hidden state between passes
pub pheromone_weight: f32, // pheromone × mutation-selection coupling
pub loss_trace: Vec<f32>, // per-pass loss telemetry
}Theory (TRM arXiv:2510.04871 validated): deep supervision accounts for >75% of TRM's improvement over baseline transformers. The DeepSupervisionTrainer implements this in the training loop — each N_sup pass is one phenotypic morph; latent carry approximates Lotka-Volterra equilibrium n̄ᵢ; pheromone_weight sets the mutation-selection coupling μ.
Convergence prediction: doubling N_sup → √2 speedup in O(1/√T) morphic convergence. Testable via the loss_trace telemetry.
atlas-safety v4.0.0 adds a tractable safety constitution implemented as Horn clauses, alongside the existing 5-state FSM and CircuitBreaker.
8 safety principles across 4 non-overlapping domains:
┌─────────────────┬──────────────────────────────────┐
│ capability │ scope_limits, capability_bounds │
│ data │ provenance_required, dedup_gate │
│ deployment │ audit_trail, circuit_breaker │
│ reasoning │ causal_grounding, zk_verifiable │
└─────────────────┴──────────────────────────────────┘
Why Horn clauses? Young (2026, arXiv:2501.15446) proves NP-hardness of general safety constitution verification. Horn-clause restriction (≤12 principles, 4 non-overlapping domains) ensures polynomial tractability — the safety checker can verify any system state in O(n·m) where n = principles, m = state predicates. No exponential blowup.
atlas-palace v4.0.0 extracts a PalaceBackend trait, enabling pluggable storage backends without API changes:
pub trait PalaceBackend: Send + Sync {
fn search(&self, query: &str, limit: usize) -> Vec<DrawerMatch>;
fn deposit_pheromones(&mut self, path: &[RoomId], ptype: PheromoneType, intensity: f32);
fn navigate(&self, from: RoomId, to: RoomId) -> Vec<RoomId>;
fn hot_paths(&self, limit: usize) -> Vec<Path>;
// ... 32 additional methods
}
// Palace implements PalaceBackend — fully trait-object safe
pub struct Palace { /* existing implementation */ }
impl PalaceBackend for Palace { /* ... */ }
// Swap backends without changing caller code
let palace: Box<dyn PalaceBackend> = Box::new(Palace::new(config));This is the prerequisite for LadybugDB migration (Q3 2026) — a drop-in Grafeo/LadybugDB backend can replace the default implementation with zero API changes.
| Stage | Weeks | Crates | Milestone |
|---|---|---|---|
| 1 | 1–4 | atlas-core → tensor → grad → optim → quant | f32 matmul CPU+GPU, backward pass through 2-layer MLP |
| 2 | 5–7 | atlas-model → tokenize | OLMo 3 7B forward pass in pure Rust, token generation |
| 3 | 8–9 | atlas-palace + atlas-mcp | GraphPalace 36-method engine native, MCP server |
| 4 | 10–11 | atlas-trm | TRM-CausalValidator, <10ms causal graph pass/fail |
| 5 | 12–16 | http → json → bayes → causal → zk → astra | Full ASTRA OODA in Rust, ZK provenance |
| 6 | 17–20 | atlas-corpus + atlas-api | QLoRA SFT, DeepSupervisionTrainer, OpenAI API |
| 7 | 21–22 | atlas-zk (ext) → cli | End-to-end proof chain, atlas-7b release binary |
Eight publication-quality figures are in docs/dashboard/diagrams/.
The interactive dashboard (project overview, roadmap, papers, component status) is at docs/dashboard/index.html.
| Figure | Description |
|---|---|
| Fig. 1 | Full System Architecture (v3.0, TRM cluster) |
| Fig. 2 | Discovery Flywheel — the self-improving loop |
| Fig. 3 | ASTRA OODA + GraphPalace integration |
| Fig. 4 | Morphic Warm-Start cross-run convergence |
| Fig. 5 | Stigmergic RLVR pheromone reward function |
| Fig. 6 | ZK Provenance Chain |
| Fig. 7 | Training Pipeline phase roadmap |
| Fig. 8 | Hybrid Generative-Recursive Architecture (TRM v3.0) |
| Paper | Venue | Contribution |
|---|---|---|
| Paper 1 | EMNLP 2026 | ATLAS architecture + LiveDiscoveryCorpus |
| Paper 2 | NeurIPS 2026 | Discovery Flywheel — closed-loop scientific intelligence |
| Paper 3 | ICML 2027 | Stigmergic RLVR — pheromone reward prevents policy collapse |
| Paper 4 | ICLR 2027 | O(1/√T) morphic convergence for LLMs (co-author Robin Dey) |
| Paper 5 | IEEE S&P 2027 | End-to-end ZK provenance for LLM outputs |
| Paper 6 | ICLR/NeurIPS 2027 | Hybrid generative-recursive architecture (TRM integration) |
git clone https://github.com/web3guru888/ATLAS.git
cd ATLAS
# Run all tests (excludes CUDA-requiring tensor tests on CPU-only machines)
cargo test --workspace --exclude atlas-tensor
# Build the atlas binary
cargo build --release -p atlas-cli
# Full OODA discovery loop
./target/release/atlas discover --cycles 5 --output corpus.json
# Train on discoveries
./target/release/atlas train --corpus corpus.json --epochs 3
# Start OpenAI-compatible API server
./target/release/atlas api serve --model /path/to/model --port 8080
# ZK-prove a claim
./target/release/atlas prove --claim "Pheromone trails compound information gain" \
--secret $(openssl rand -hex 16)
# Inspect palace memory
./target/release/atlas palace --stats --hot
# MCP server (connect to Claude Desktop / Cursor)
./target/release/atlas mcp serve --palace my-palace.jsonPrerequisites:
- Rust 1.75+ (
rustup update stable) - CUDA 12.x + nvcc (optional; falls back to CPU if absent)
- GPU with sm_75+ (Tesla T4 / A100+) for CUDA training path
532/532 tests passing · 21 crates · Zero external crate dependencies · CUDA sm_80 on A100-SXM4-40GB · 19.9 tok/s OLMo-3-7B-Think (BF16)
🏔 v4.0.3 is the current release. Math integrity fixes:
CanonicalPheromoneUpdateexp decay (λ never negative) +InvasionFitnessScorerReLU competition threshold (no mutualism). GPU-validated: 532/532 workspace + 47/47 A100 model tests. OLMo-3-7B-Think 19.9 tok/s confirmed.
- ✅ Discovery is real —
atlas discover --cycles 3hits NASA POWER, WHO GHO, World Bank, ArXiv live APIs; causal inference via PC algorithm; Bayesian quality gates - ✅ Memory is real — 5-type pheromone system (exploitation/exploration/success/traversal/recency), MMAS ceiling, A* semantic pathfinding (α·C_sem + β·C_phe + γ·C_str), Active Inference agents;
atlas palace --hotshows pheromone trails - ✅ Training is real — SFT with GradTape + AdamW + LoRA (rank=8) + gradient accumulation + safetensors checkpoint; DeepSupervisionTrainer (N_sup=4..16, loss trace, latent carry)
- ✅ GPU inference is real — SmolLM2-135M at 37.7 tok/s on A100-SXM4-40GB; OLMo-3-7B-Think at 19.9 tok/s (BF16 GPU, W16A32, 14 GB VRAM — Issue #9 fixed); SWA + YaRN RoPE (Issue #7 fixed)
- ✅ API is real —
atlas api serveexposes/v1/chat/completions+/v1/completions+/v1/models; SSE streaming; CORS; 40 tests - ✅ Provenance is real — Schnorr proofs + Groth16 stub (HMAC-SHA256, BLS12-381-compatible interface) + ProvenanceChain;
atlas provegenerates verifiable proofs - ✅ Safety is real — Horn-clause constitution (8 principles, 4 domains, Young 2026 NP-hardness validated); 5-state FSM (
BOOT→NOMINAL→DEGRADED→SAFE_MODE→EMERGENCY_STOP); CircuitBreaker; append-only audit log - ✅ Bridge is real —
AtlasBridgewith ZK-attested deposit/withdraw, Sepolia chain_id=11155111, Groth16 proof per transaction - ✅ MCP is real —
atlas mcp serveexposes 28 tools via JSON-RPC 2.0; McpConnectionPool (max 5, 5-min idle eviction); connects to Claude Desktop / Cursor
| Version | Theme | Tests |
|---|---|---|
| v0.1.0 | Infrastructure: f32 matmul, backward pass, GPU (7 stages) | 186 |
| v0.2.0 | Real Memory Palace + MCP (28 tools, JSON-RPC 2.0) | 236 |
| v0.3.0 + v0.4.0 | Real Discovery Engine + Validated Model Loading | 260 |
| v0.5.0 | Real Training Loop (LoRA, grad-accum, safetensors checkpoint) | 353 |
| v0.6.0 | Safety FSM + Groth16 stub + ZK Bridge | 383 |
| v0.7.0 | Benchmarks, CI, CHANGELOG, REPRODUCIBILITY | 383 |
| v1.0.0 | Production Release — all milestones complete | 383 |
| v2.0.0 | CAS Decay + OODA Feedback + Stigmergic Sampler + GPU dispatch (37.7 tok/s on A100) | 400 |
| v3.0.0-α.1 | atlas-api + PalaceBackend + GPU-resident forward pass + DeepSupervisionTrainer + Horn-clause safety | 426 |
| v4.0.0 | Champagnat n-morphic framework + Issue #7 fix (SWA + YaRN RoPE + config.json auto-patch for OLMo-3-7B) | 528 |
| v4.0.1 | Docs + test cleanup for v4.0.0 / Issue #7 | 528 |
| v4.0.2 | BF16 GPU inference path (Issue #9): OLMo-3-7B-Think 4.1 → 19.9 tok/s (4.8×), W16A32, GEMV kernels | 528 |
| v4.0.3 | Math integrity (Issue #11): λ exp decay + ReLU competition threshold. 47/47 GPU model tests. | 532 |
| Crate | Stage | Tests | Status |
|---|---|---|---|
| atlas-core | 1 | 2 | ✅ Error types, Result, traits |
| atlas-tensor | 1 | 6 | ✅ CPU+GPU matmul, INT8/INT4, sm_80 kernels (A100); GPU AdamW kernel; sgemm_vec zero-copy; BF16 GEMV (GpuBufBf16, sgemv_bf16_kernel, W16A32 inference path) |
| atlas-grad | 1 | 9 | ✅ GradTape, matmul/relu/add backward |
| atlas-optim | 1 | 6 | ✅ AdamW + CosineScheduler, warmup |
| atlas-quant | 1 | 7 | ✅ INT8, INT4, symmetric scaling |
| CUDA kernels | 1 | — | ✅ tiled GEMM, rmsnorm, rope, silu_mul, AdamW, INT8/INT4 — compiled on A100-SXM4-40GB (sm_80) |
| atlas-json | 2 | 12 | ✅ Recursive descent parser, surrogate pairs |
| atlas-tokenize | 2 | 6 | ✅ GPT-2 byte-level BPE, tokenizer.json |
| atlas-model | 2 | 27 | ✅ OLMo 3 / Llama 3, RoPE, GQA, SwiGLU, SWA, YaRN RoPE, config.json auto-patch; GPU-resident forward pass |
| atlas-palace | 3 | 79 | ✅ A* search, 5-type pheromones, Active Inference, MMAS, PalaceBackend trait, session_id, PalaceConfig; v4.0.3: CanonicalPheromoneUpdate uses exp(−x) decay (always positive, smooth, hardware-safe) |
| atlas-mcp | 3 | 32 | ✅ 28 MCP tools, JSON-RPC 2.0, live palace dispatch; McpConnectionPool (max 5, 5-min idle eviction) |
| atlas-api | 3 | 40 | ✅ OpenAI-compatible HTTP: /v1/chat/completions, /v1/completions, /v1/models; SSE streaming; CORS |
| atlas-trm | 4 | 12 | ✅ TRM-CausalValidator depth-6 RNN, Bayesian combining |
| atlas-http | 5 | 11 | ✅ HTTP/1.1 TcpStream, chunked decoding, curl HTTPS |
| atlas-bayes | 5 | 13 | ✅ BetaPrior, BayesNetwork, QualityGate, Jaccard novelty |
| atlas-causal | 5 | 10 | ✅ PC algorithm, Fisher-Z, standard normal CDF, Meek rules |
| atlas-zk | 5 | 19 | ✅ Schnorr + Groth16 stub (HMAC-SHA256, BLS12-381 interface) |
| atlas-astra | 5 | 15 | ✅ OODA: NASA POWER / WHO GHO / World Bank / ArXiv; OodaFeedback adaptive explore_ratio |
| atlas-corpus | 6 | 79 | ✅ SftTrainer, LoRA (rank=8), grad-accum, safetensors checkpoint; DeepSupervisionTrainer (N_sup 4–16, loss_trace); v4.0.3: InvasionFitnessScorer uses ReLU(cos_sim − 0.2) competition (α_ij ≥ 0, no mutualism) |
| atlas-safety | 6 | 30 | ✅ Horn-clause constitution (8 principles, 4 domains); 5-state FSM; CircuitBreaker; append-only audit log |
| atlas-bridge | 6 | 8 | ✅ ZK-attested Rings↔ETH interface, Sepolia chain_id=11155111 |
| atlas-cli | 7 | 30 | ✅ discover / corpus / train / eval / prove / palace / mcp / api / bench / status |
| TOTAL | 532 | ✅ All passing — v4.0.3 |
git clone https://github.com/web3guru888/ATLAS.git
cd ATLAS
cargo build --release -p atlas-cli
# Full OODA discovery + training loop
./target/release/atlas discover --cycles 3 --output my-corpus.json
./target/release/atlas train --corpus my-corpus.json --epochs 2
./target/release/atlas prove --claim "CO2 drives warming" --secret deadbeef01020304
./target/release/atlas palace --stats --hot
# OpenAI-compatible API server
./target/release/atlas api serve --model /path/to/model --port 8080
# MCP server (connect to Claude Desktop / Cursor)
./target/release/atlas mcp serve --palace my-palace.json
# Run benchmarks
./target/release/atlas bench --allATLAS exposes its memory palace as 28 MCP tools via stdio JSON-RPC 2.0, ready for Claude Desktop, Cursor, or any MCP client. v4.0.0 adds McpConnectionPool — lazy pool (max 5 connections, 5-min idle eviction) preventing connection leaks across concurrent MCP clients.
# Add to your Claude Desktop config (~/.config/claude/claude_desktop_config.json)
{
"mcpServers": {
"atlas-palace": {
"command": "./target/release/atlas",
"args": ["mcp", "--palace", "my-palace.json"]
}
}
}Tool categories:
| Category | Tools | Examples |
|---|---|---|
| Navigation | 8 | palace_search, palace_navigate, palace_find_similar |
| Operations | 5 | palace_add_wing, palace_add_room, palace_add_drawer |
| Knowledge Graph | 7 | palace_kg_add, palace_kg_query, palace_kg_contradictions |
| Stigmergy | 5 | palace_deposit_pheromones, palace_hot_paths, palace_cold_spots |
| Agent Diary | 3 | palace_create_agent, palace_diary_write, palace_diary_read |
Every tool call modifies live palace state. Pheromone trails compound across sessions. Knowledge graphs grow with every interaction.
ATLAS includes a zero-dependency benchmark suite using atlas_core::bench::Bench. Run with:
cargo test --workspace --exclude atlas-tensor -- --ignored --nocaptureRepresentative results (Ubuntu, Rust 1.95, A100-SXM4-40GB, CUDA 12.9):
| Benchmark | Metric | Description |
|---|---|---|
gpu_inference_smollm2 |
37.7 tok/s | SmolLM2-135M GPU inference (f32), A100-SXM4-40GB |
gpu_benchmark_olmo3_7b_think_bf16 |
19.9 tok/s | OLMo-3-7B-Think BF16 GPU inference (W16A32), A100-SXM4-40GB |
palace_search_1000 |
~50–200 µs/op | TF-IDF semantic search across 1000 drawers |
astar_100_nodes |
~20–100 µs/op | Pheromone-guided A* pathfinding (100-node KG) |
pheromone_deposit_decay_1000 |
~5–20 µs/op | 10 deposits + full decay cycle per iteration |
kg_query_100_edges |
~0.5–2 µs/op | KG edge lookup from a source node |
rmsnorm_2048 |
~1–5 µs/op | RMSNorm on 2048-dim vector |
rope_128dim_apply |
~50–200 ns/op | RoPE rotation on a single attention head |
schnorr_prove_verify |
~200–500 ns/op | Schnorr ZK proof generation + verification |
json_parse_1kb |
~5–20 µs/op | Parse a 1KB JSON document (zero-dep parser) |
Note: Numbers vary by hardware. Run benchmarks on your own machine for accurate results.
- 37.7 tok/s — GPU inference throughput (SmolLM2-135M on A100-SXM4-40GB, v4.0.0)
- 19.9 tok/s — GPU inference throughput (OLMo-3-7B-Think, BF16 W16A32, A100-SXM4-40GB, v4.0.2; was 4.1 tok/s CPU = 4.8× speedup)
- 2.4× — GPU speedup over CPU inference (SmolLM2-1.7B: 12.6 vs 5.2 tok/s)
- 507 MiB — VRAM for pre-pinned SmolLM2-135M weights
- d = 10.6 — Cohen's d for palace-memory vs. no-memory (ASTRA experiments)
- 34.4× — more discoveries with memory than without
- R² = 0.982 — O(1/√T) convergence fit (BUTTERS morphic warm-start)
- 1.83× — cross-domain novelty acceleration (DC-24 experiment)
- 7M params — TRM-CausalValidator size vs. 7B base model (1000× smaller)
- 45% — TRM accuracy on ARC-AGI-1 (Samsung SAIL Montreal, arXiv:2510.04871)
- <10ms — target TRM validation latency per causal graph
- ~86K — quality-gated training examples per month from ASTRA
- 8 principles / 4 domains — Horn-clause safety constitution (Young 2026, arXiv:2501.15446)
ATLAS v4.0 implements the Champagnat n-Morphic Framework (Issue #6), grounded in Champagnat-Méléard 2011 (PTRF) and Baar-Bovier-Champagnat 2017 (AAP). All Tier 1 (Sprint 1+2) proposals are live as of v4.0.0:
| Module | Crate | Key idea |
|---|---|---|
InvasionFitnessScorer |
atlas-corpus | Replaces raw pheromone softmax; α_ij = ReLU(cos_sim − 0.2) — Lotka-Volterra valid (v4.0.3) |
CognitiveBranching |
atlas-astra | Detects explore_ratio plateau → bifurcates OODA |
CanonicalPheromoneUpdate |
atlas-palace | Principled decay λ = base_rate × exp(−canonical_term) — always positive, smooth (v4.0.3) |
HJConcentrationPrior |
atlas-trm | Hopf-Cole sharpening across TRM recursion steps |
PolymorphicTrainer |
atlas-corpus | k=2,3 morphs (fast/slow/creative) with competition matrix |
Mathematical foundation: DeepSupervisionTrainer IS a k-Morphic Trait Substitution System (exact, not analogy). Each N_sup pass = one phenotypic morph. Champagnat Theorem 3.1 derivably explains TRM's >75% gain from deep supervision. Full theory: see research reports.
ATLAS models are published to Hugging Face under the openhubresearch organization.
First release: openhubresearch/ATLAS-OLMo-3-7B-Think-v4 — OLMo-3-7B-Think run through the ATLAS v4.0.3 n-morphic framework with BF16 inference (19.9 tok/s A100-SXM4-40GB, W16A32, 532/532 tests, 47/47 GPU model tests).
---
language: en
license: apache-2.0
library_name: atlas
tags:
- atlas
- stigmergic-memory
- active-inference
- causal-inference
- pure-rust
- zero-dependencies
- champagnat-morphic
- bf16-inference
base_model: allenai/OLMo-3-0125-7B
---Models run through ATLAS carry the full n-morphic framework: InvasionFitnessScorer (Lotka-Volterra valid competition), CanonicalPheromoneUpdate (principled adaptive decay), BarBovier2017Constraints (stability gates), CognitiveBranching (OODA bifurcation), and HJConcentrationPrior (Hopf-Cole sharpening). See atlasagi.org for model releases and the LiveDiscoveryCorpus dataset.
- Code (
crates/,kernels/,scripts/): Apache 2.0 - Documentation, paper, figures, datasets: CC BY 4.0
© 2026 Robin Dey, OpenHub Research (Thailand)
See NOTICE for attribution to incorporated components.
@software{atlas2026,
title = {ATLAS: Active-inference Training with Learned Adaptive Stigmergy},
author = {Robin Dey},
year = {2026},
institution = {OpenHub Research, Thailand},
url = {https://github.com/web3guru888/ATLAS},
note = {Pure Rust LLM training framework. Zero external dependencies.
v4.0.3: 21 crates, 532 tests, Champagnat n-morphic framework,
BF16 GPU inference — OLMo-3-7B-Think 19.9 tok/s on A100-SXM4-40GB (W16A32).
Math-validated: exp decay (Issue #11) + Lotka-Volterra competition fix.}
}