TokenJam

The Optimization Layer for AI Agents

Cut LLM spend without changing what your agent does. Four analyzers. One CLI. Runs entirely on your machine.

pip install tokenjam

No cloud · No signup · No vendor lock-in

The four products

TokenJam reads the same telemetry your agent already emits and surfaces four kinds of savings. Every finding is structural, honest, and reviewable — no opaque "AI says so" recommendations.

🪶 Downsize

Flag sessions where a cheaper model in the same family is worth a look. Never claims quality equivalence — surfaces examples so you can spot-check.

tj optimize downsize

Details →

💾 Cache

Show your current caching ratio per (provider, model) and suggest Anthropic prompt-cache breakpoints from stable prefixes in your real usage.

tj optimize cache

Details →
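As a rough illustration of what a per-(provider, model) caching ratio means, here is a minimal sketch. It is not TokenJam's implementation: the record fields (`provider`, `model`, `input_tokens`, `cached_input_tokens`) are hypothetical names, and it assumes `input_tokens` already includes the cached portion.

```python
from collections import defaultdict


def cache_ratios(calls):
    """Return {(provider, model): cached input tokens / total input tokens}.

    Assumption: each call record carries hypothetical fields `input_tokens`
    (cached portion included) and `cached_input_tokens`.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [cached, total]
    for call in calls:
        key = (call["provider"], call["model"])
        totals[key][0] += call["cached_input_tokens"]
        totals[key][1] += call["input_tokens"]
    # Guard against division by zero for keys with no input tokens.
    return {
        key: (cached / total if total else 0.0)
        for key, (cached, total) in totals.items()
    }


# Made-up sample records, for illustration only.
calls = [
    {"provider": "anthropic", "model": "model-a",
     "input_tokens": 1000, "cached_input_tokens": 800},
    {"provider": "anthropic", "model": "model-a",
     "input_tokens": 1000, "cached_input_tokens": 0},
    {"provider": "openai", "model": "model-b",
     "input_tokens": 500, "cached_input_tokens": 0},
]
ratios = cache_ratios(calls)
```

A low ratio on a model that repeatedly receives the same long prefix is the kind of signal that would make a cache breakpoint worth considering.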

📜 Script

Find clusters of deterministic (tool_name, arg_shape) sequences that match the shape of work a plain script could replace.

tj optimize script

Details →
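To make the idea of a (tool_name, arg_shape) sequence concrete, the sketch below reduces each tool call to its name plus the names and types of its arguments, then counts repeated runs across sessions. This is one plausible reading of "arg shape", not the analyzer's actual algorithm; the session layout and the `length` and `min_count` parameters are assumptions for illustration.

```python
from collections import Counter


def arg_shape(args):
    """Reduce a tool call's arguments to sorted (name, type-name) pairs."""
    return tuple(sorted((name, type(value).__name__) for name, value in args.items()))


def repeated_sequences(sessions, length=2, min_count=2):
    """Count runs of `length` consecutive (tool_name, arg_shape) steps.

    Returns only the runs seen at least `min_count` times across all sessions.
    """
    counts = Counter()
    for session in sessions:
        steps = [(call["tool"], arg_shape(call["args"])) for call in session]
        for i in range(len(steps) - length + 1):
            counts[tuple(steps[i:i + length])] += 1
    return {seq: n for seq, n in counts.items() if n >= min_count}


# Hypothetical sessions: two share the same read-then-write shape.
sessions = [
    [{"tool": "read_file", "args": {"path": "a.txt"}},
     {"tool": "write_file", "args": {"path": "b.txt", "content": "x"}}],
    [{"tool": "read_file", "args": {"path": "c.txt"}},
     {"tool": "write_file", "args": {"path": "d.txt", "content": "y"}}],
    [{"tool": "search", "args": {"query": "z"}}],
]
candidates = repeated_sequences(sessions)
```

Here the read-then-write pair appears twice with identical shapes, so it is the only candidate returned. Whether such a sequence is truly deterministic still needs human review.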

✂️ Trim

Uses an LLMLingua-2 token-significance classifier to predict which regions of your prompts the model gives little weight to, and surfaces them as candidates to cut.

tj optimize trim

Details →

Run all four with tj optimize. Run several with tj optimize downsize cache trim.

30-second quickstart

For Claude Code users — zero code, auto-backfills your last 30 days:

pip install "tokenjam[mcp]"
tj onboard --claude-code
tj optimize          # cost-saving candidates from your actual usage

For any Python agent:

from tokenjam.sdk import watch
from tokenjam.sdk.integrations.anthropic import patch_anthropic

patch_anthropic()

@watch(agent_id="my-agent")
def run(task: str) -> str:
    ...

→ Python SDK · TypeScript SDK · Codex · OTel-compatible agents

Why local-first matters

Your spans contain prompts, completions, tool inputs, and customer data. Shipping that to a SaaS vendor for "observability" is a data-egress decision most teams aren't ready to make.

| Feature | TokenJam | LangSmith | Langfuse | Datadog LLM Obs |
| --- | --- | --- | --- | --- |
| Signup required | ❌ | ✅ | ✅ | ✅ |
| Data leaves your machine | ❌ | ✅ | cloud only | ✅ |
| Cost-optimization analyzers (Downsize, Cache, Script, Trim) | ✅ | ❌ | ❌ | ❌ |
| Real-time sensitive-action alerts | ✅ | ❌ | ❌ | ❌ |
| Behavioral drift detection | ✅ | ❌ | ❌ | ❌ |
| OTel GenAI SemConv native | ✅ | partial | partial | partial |
| Works with any agent / framework | ✅ | LangChain-first | partial | ❌ |
| Free, MIT licensed | ✅ | freemium | freemium | paid |

Web UI

tj serve runs a local dashboard at http://127.0.0.1:7391/ with status, traces, cost breakdown, alerts, budget, and drift.

Beyond optimization

TokenJam is also a full observability stack. The four analyzers ride on top.

Real-time cost tracking — every LLM call priced as it happens
Safety alerts — 13 alert types, 6 channels (ntfy, Discord, Telegram, webhook, file, stdout)
Behavioral drift detection — Z-score baselines, no LLM required
Schema validation — declare or infer JSON Schema for tool outputs
OTel-native — point any OTLP exporter at tj serve and you're done
MCP server — 14 tools letting Claude Code query its own telemetry mid-session
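The Z-score baseline mentioned under behavioral drift detection can be illustrated with a small sketch. It shows the general statistic only; the metric (tool calls per session), the sample values, and the threshold of 3.0 are assumptions, not TokenJam defaults.

```python
import math
from statistics import mean, stdev


def z_score(baseline, observed):
    """How many sample standard deviations `observed` lies from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs >= 2 points
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is infinitely unusual.
        return 0.0 if observed == mu else math.copysign(math.inf, observed - mu)
    return (observed - mu) / sigma


# Hypothetical baseline: tool calls per session over five earlier sessions.
baseline_tool_calls = [10, 12, 11, 9, 13]
score = z_score(baseline_tool_calls, 25)

# Assumed threshold: flag anything more than 3 standard deviations out.
drifted = abs(score) > 3.0
```

Because the check is plain arithmetic over recorded metrics, it needs no LLM call to decide whether a session looks unusual.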

CLI

tj optimize            # all four cost-optimization analyzers
tj optimize downsize   # one analyzer
tj status              # current cost, tokens, active alerts
tj cost --since 7d     # spend by agent / model / day / tool
tj alerts              # everything that fired while you were away
tj drift               # behavioral drift Z-scores
tj backfill claude-code # ingest historical ~/.claude/projects/ sessions
tj serve               # start the web UI + REST API

Full CLI reference →

Documentation

| Topic | Where |
| --- | --- |
| 🪶 Downsize / Cache / Script / Trim deep-dives | docs/optimize/ |
| Claude Code & Codex integration | docs/claude-code-integration.md |
| Python SDK reference | docs/python-sdk.md |
| TypeScript SDK reference | docs/typescript-sdk.md |
| Framework support (LangChain / CrewAI / etc.) | docs/framework-support.md |
| Alert channels & rule reference | docs/alerts.md |
| Backfill from Langfuse / Helicone / OTLP | docs/backfill/ |
| Configuration | docs/configuration.md |
| Architecture deep-dive | docs/architecture.md |
| Installation extras (Trim, framework patches) | docs/installation.md |
| Export to Grafana / Datadog / NDJSON | docs/export.md |
| NemoClaw sandbox observer | docs/nemoclaw-integration.md |

Roadmap

Shipped in 0.3.x: Downsize · Cache · Script · Trim · Claude Code + Codex onboarding · MCP server · Web UI · Backfill adapters (Langfuse, Helicone, OTLP) · Period comparison · Routing-config export · Read-only policy preview

Up next:

tj policy add | edit | apply — unified rule surface
tj replay — replay captured sessions against new model versions
TypeScript framework patches (LangChain JS, OpenAI Agents SDK)
Vercel AI SDK & Mastra integrations
Docker image
GitHub Actions for CI drift/cost checks

tokenjam.dev · PyPI · npm · Issues

MIT License · Built by Metabuilder Labs
