Cut LLM spend without changing what your agent does. Four analyzers. One CLI. Runs entirely on your machine.
pip install tokenjam
No cloud · No signup · No vendor lock-in
TokenJam reads the same telemetry your agent already emits and surfaces four kinds of savings. Every finding is structural, honest, and reviewable: no opaque "AI says so" recommendations.
|
Flag sessions where a cheaper model in the same family is worth a look. Never claims quality equivalence; surfaces examples so you can spot-check. tj optimize downsize |
Show your current caching ratio per (provider, model) and suggest Anthropic prompt-cache breakpoints from stable prefixes in your real usage. tj optimize cache |
|
Find clusters of deterministic tj optimize script |
LLMLingua-2 token-significance classifier: predicts which regions of your prompts the model gives little weight to. Surfaces what's safe to cut. tj optimize trim |
Run all four with tj optimize. Run several with tj optimize downsize cache trim.
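
To make the caching ratio concrete, here is a minimal sketch of how a per-(provider, model) ratio could be computed from call records. The record fields (`input_tokens`, `cache_read_tokens`), the sample values, and the definition of the ratio as cached input tokens over all input tokens are assumptions for illustration, not TokenJam's actual schema or formula.

```python
from collections import defaultdict

# Hypothetical call records; field names are illustrative, not TokenJam's schema.
calls = [
    {"provider": "anthropic", "model": "model-a", "input_tokens": 200, "cache_read_tokens": 800},
    {"provider": "anthropic", "model": "model-a", "input_tokens": 1000, "cache_read_tokens": 0},
    {"provider": "openai", "model": "model-b", "input_tokens": 500, "cache_read_tokens": 0},
]


def cache_ratios(records):
    """Return the cached share of input tokens per (provider, model)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [cached, uncached]
    for r in records:
        key = (r["provider"], r["model"])
        totals[key][0] += r["cache_read_tokens"]
        totals[key][1] += r["input_tokens"]
    return {
        key: (cached / (cached + uncached) if cached + uncached else 0.0)
        for key, (cached, uncached) in totals.items()
    }


ratios = cache_ratios(calls)
for (provider, model), ratio in sorted(ratios.items()):
    print(f"{provider}/{model}: {ratio:.0%} of input tokens served from cache")
```

A low ratio on a model that sees long, repeated prefixes is the kind of case where a cache breakpoint is worth reviewing.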
For Claude Code users (zero code, auto-backfills your last 30 days):
pip install "tokenjam[mcp]"
tj onboard --claude-code
tj optimize # cost-saving candidates from your actual usage

For any Python agent:
from tokenjam.sdk import watch
from tokenjam.sdk.integrations.anthropic import patch_anthropic
patch_anthropic()
@watch(agent_id="my-agent")
def run(task: str) -> str:
    ...

→ Python SDK · TypeScript SDK · Codex · OTel-compatible agents
Your spans contain prompts, completions, tool inputs, and customer data. Shipping that to a SaaS vendor for "observability" is a data-egress decision most teams aren't ready to make.
| | TokenJam | LangSmith | Langfuse | Datadog LLM Obs |
|---|---|---|---|---|
| Signup required | β | β | β | β |
| Data leaves your machine | β | β | cloud only | β |
| Cost-optimization analyzers (Downsize, Cache, Script, Trim) | β | β | β | β |
| Real-time sensitive-action alerts | β | β | β | β |
| Behavioral drift detection | β | β | β | β |
| OTel GenAI SemConv native | β | partial | partial | partial |
| Works with any agent / framework | β | LangChain-first | partial | β |
| Free, MIT licensed | β | freemium | freemium | paid |
tj serve runs a local dashboard at http://127.0.0.1:7391/ with status, traces, cost breakdown, alerts, budget, and drift.
Dashboard screenshots: Status, Traces, Cost, Drift.
TokenJam is also a full observability stack. The four analyzers ride on top.
- Real-time cost tracking: every LLM call priced as it happens
- Safety alerts: 13 alert types, 6 channels (ntfy, Discord, Telegram, webhook, file, stdout)
- Behavioral drift detection: Z-score baselines, no LLM required
- Schema validation: declare or infer JSON Schema for tool outputs
- OTel-native: point any OTLP exporter at tj serve and you're done
- MCP server: 14 tools letting Claude Code query its own telemetry mid-session
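
The Z-score baseline behind drift detection can be illustrated with a short sketch. It assumes a baseline of per-session values for a single metric (here, tool calls per session) and flags a new value whose Z-score exceeds a threshold. The metric, the threshold of 3.0, and the function names are illustrative assumptions, not TokenJam's implementation.

```python
import statistics


def z_score(baseline, value):
    """Z-score of `value` against the baseline's mean and sample standard deviation."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        # A flat baseline: any deviation at all is treated as infinitely far away.
        return 0.0 if value == mean else float("inf")
    return (value - mean) / stdev


def has_drifted(baseline, value, threshold=3.0):
    """Flag `value` when it sits more than `threshold` deviations from the baseline mean."""
    return abs(z_score(baseline, value)) > threshold


# Hypothetical baseline: tool calls per session over eight earlier sessions.
baseline_tool_calls = [4, 5, 6, 5, 4, 6, 5, 5]

print(has_drifted(baseline_tool_calls, 5))   # False: matches the baseline mean
print(has_drifted(baseline_tool_calls, 12))  # True: far above the baseline
```

Because the check is plain arithmetic over recorded values, it needs no model call, which is consistent with the "no LLM required" point above.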
tj optimize # all four cost-optimization analyzers
tj optimize downsize # one analyzer
tj status # current cost, tokens, active alerts
tj cost --since 7d # spend by agent / model / day / tool
tj alerts # everything that fired while you were away
tj drift # behavioral drift Z-scores
tj backfill claude-code # ingest historical ~/.claude/projects/ sessions
tj serve # start the web UI + REST API

| Topic | Where |
|---|---|
| Downsize / Cache / Script / Trim deep-dives | docs/optimize/ |
| Claude Code & Codex integration | docs/claude-code-integration.md |
| Python SDK reference | docs/python-sdk.md |
| TypeScript SDK reference | docs/typescript-sdk.md |
| Framework support (LangChain / CrewAI / etc.) | docs/framework-support.md |
| Alert channels & rule reference | docs/alerts.md |
| Backfill from Langfuse / Helicone / OTLP | docs/backfill/ |
| Configuration | docs/configuration.md |
| Architecture deep-dive | docs/architecture.md |
| Installation extras (Trim, framework patches) | docs/installation.md |
| Export to Grafana / Datadog / NDJSON | docs/export.md |
| NemoClaw sandbox observer | docs/nemoclaw-integration.md |
Shipped in 0.3.x: Downsize · Cache · Script · Trim · Claude Code + Codex onboarding · MCP server · Web UI · Backfill adapters (Langfuse, Helicone, OTLP) · Period comparison · Routing-config export · Read-only policy preview
Up next:
- tj policy add | edit | apply: unified rule surface
- tj replay: replay captured sessions against new model versions
- TypeScript framework patches (LangChain JS, OpenAI Agents SDK)
- Vercel AI SDK & Mastra integrations
- Docker image
- GitHub Actions for CI drift/cost checks
tokenjam.dev · PyPI · npm · Issues
MIT License · Built by Metabuilder Labs



