TokenJam

The Optimization Layer for AI Agents

Cut LLM spend without changing what your agent does. Four analyzers. One CLI. Runs entirely on your machine.

pip install tokenjam

No cloud · No signup · No vendor lock-in


The four products

TokenJam reads the same telemetry your agent already emits and surfaces four kinds of savings. Every finding is structural, honest, and reviewable: no opaque "AI says so" recommendations.

🪶 Downsize

Flag sessions where a cheaper model in the same family is worth a look. It never claims quality equivalence; it surfaces examples so you can spot-check.

tj optimize downsize

Details →

💾 Cache

Show your current caching ratio per (provider, model) and suggest Anthropic prompt-cache breakpoints from stable prefixes in your real usage.

tj optimize cache

Details →
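As a rough illustration of what a per-(provider, model) caching ratio could look like, here is a minimal sketch. The record fields (`input_tokens`, `cache_read_tokens`) and the ratio definition (cached input tokens divided by all input tokens) are assumptions made for this example, not TokenJam's actual schema or formula.

```python
from collections import defaultdict

def cache_ratios(calls):
    """Return {(provider, model): cached_input_tokens / total_input_tokens}."""
    totals = defaultdict(lambda: [0, 0])  # key -> [cached, total]
    for call in calls:
        key = (call["provider"], call["model"])
        cached = call.get("cache_read_tokens", 0)
        # Assumed convention: input_tokens counts uncached input only.
        totals[key][0] += cached
        totals[key][1] += cached + call["input_tokens"]
    return {k: (c / t if t else 0.0) for k, (c, t) in totals.items()}

# Hypothetical call records; model names are placeholders.
calls = [
    {"provider": "anthropic", "model": "model-a", "input_tokens": 200, "cache_read_tokens": 800},
    {"provider": "anthropic", "model": "model-a", "input_tokens": 1000},
    {"provider": "openai", "model": "model-b", "input_tokens": 500},
]
ratios = cache_ratios(calls)
```

A low ratio on a model that repeatedly receives the same long prefix is the kind of signal that would make a cache breakpoint worth reviewing.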

📜 Script

Find clusters of deterministic (tool_name, arg_shape) sequences that match the shape of work a plain script could replace.

tj optimize script

Details →
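To make the idea of a (tool_name, arg_shape) sequence concrete, the sketch below reduces each tool call to its name plus the names and types of its arguments, then counts repeated windows of consecutive calls across sessions. The session layout, the `arg_shape` definition, and the window length are all assumptions for illustration; the analyzer's real clustering may differ.

```python
from collections import Counter

def arg_shape(args):
    """Reduce tool arguments to a hashable shape: sorted (name, type) pairs."""
    return tuple(sorted((k, type(v).__name__) for k, v in args.items()))

def repeated_sequences(sessions, length=2, min_count=2):
    """Count windows of consecutive (tool_name, arg_shape) steps across sessions."""
    counts = Counter()
    for session in sessions:
        steps = [(c["tool"], arg_shape(c["args"])) for c in session]
        for i in range(len(steps) - length + 1):
            counts[tuple(steps[i:i + length])] += 1
    # Keep only sequences seen at least min_count times.
    return {seq: n for seq, n in counts.items() if n >= min_count}

# Hypothetical sessions: same tools and argument shapes, different values.
sessions = [
    [{"tool": "read_file", "args": {"path": "a.txt"}},
     {"tool": "grep", "args": {"pattern": "TODO", "path": "a.txt"}}],
    [{"tool": "read_file", "args": {"path": "b.txt"}},
     {"tool": "grep", "args": {"pattern": "FIXME", "path": "b.txt"}}],
]
clusters = repeated_sequences(sessions)
```

Both sessions collapse to the same read_file then grep shape even though the argument values differ, which is what makes the pattern a candidate for a plain script.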

βœ‚οΈ Trim

LLMLingua-2 token-significance classifier β€” predicts which regions of your prompts the model gives little weight to. Surfaces what's safe to cut.

tj optimize trim

Details →

Run all four with tj optimize. Run several with tj optimize downsize cache trim.


30-second quickstart

For Claude Code users (zero code, auto-backfills your last 30 days):

pip install "tokenjam[mcp]"
tj onboard --claude-code
tj optimize          # cost-saving candidates from your actual usage

For any Python agent:

from tokenjam.sdk import watch
from tokenjam.sdk.integrations.anthropic import patch_anthropic

patch_anthropic()

@watch(agent_id="my-agent")
def run(task: str) -> str:
    ...

→ Python SDK · TypeScript SDK · Codex · OTel-compatible agents


Why local-first matters

Your spans contain prompts, completions, tool inputs, and customer data. Shipping that to a SaaS vendor for "observability" is a data-egress decision most teams aren't ready to make.

| | TokenJam | LangSmith | Langfuse | Datadog LLM Obs |
| --- | --- | --- | --- | --- |
| Signup required | ❌ | ✅ | ✅ | ✅ |
| Data leaves your machine | ❌ | ✅ | cloud only | ✅ |
| Cost-optimization analyzers (Downsize, Cache, Script, Trim) | ✅ | ❌ | ❌ | ❌ |
| Real-time sensitive-action alerts | ✅ | ❌ | ❌ | ❌ |
| Behavioral drift detection | ✅ | ❌ | ❌ | ❌ |
| OTel GenAI SemConv native | ✅ | partial | partial | partial |
| Works with any agent / framework | ✅ | LangChain-first | partial | ❌ |
| Free, MIT licensed | ✅ | freemium | freemium | paid |

Web UI

tj serve runs a local dashboard at http://127.0.0.1:7391/ with status, traces, cost breakdown, alerts, budget, and drift.

Screenshots: tj status, tj cost, tj traces, and tj alerts pages.

Beyond optimization

TokenJam is also a full observability stack. The four analyzers ride on top.

  • Real-time cost tracking: every LLM call priced as it happens
  • Safety alerts: 13 alert types, 6 channels (ntfy, Discord, Telegram, webhook, file, stdout)
  • Behavioral drift detection: Z-score baselines, no LLM required
  • Schema validation: declare or infer JSON Schema for tool outputs
  • OTel-native: point any OTLP exporter at tj serve and you're done
  • MCP server: 14 tools letting Claude Code query its own telemetry mid-session
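For readers unfamiliar with Z-score baselines, the following is a minimal sketch of the general technique: compare a new observation against the mean and standard deviation of a historical window. The metric (tokens per session), the sample values, and the threshold of 3.0 are assumptions for illustration, not TokenJam's actual defaults.

```python
import math
from statistics import mean, stdev

def z_score(baseline, value):
    """Standard score of value relative to a baseline sample."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation
    if sigma == 0:
        # A flat baseline: any deviation at all is infinitely surprising.
        return 0.0 if value == mu else math.copysign(math.inf, value - mu)
    return (value - mu) / sigma

# Hypothetical per-session token counts forming the baseline.
baseline_tokens = [100, 110, 90, 105, 95]
z = z_score(baseline_tokens, 160)

# Assumed threshold: flag anything more than 3 standard deviations out.
drifted = abs(z) > 3.0
```

Because the check is plain arithmetic over recorded metrics, it needs no model call, which fits the "no LLM required" claim above.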

CLI

tj optimize            # all four cost-optimization analyzers
tj optimize downsize   # one analyzer
tj status              # current cost, tokens, active alerts
tj cost --since 7d     # spend by agent / model / day / tool
tj alerts              # everything that fired while you were away
tj drift               # behavioral drift Z-scores
tj backfill claude-code # ingest historical ~/.claude/projects/ sessions
tj serve               # start the web UI + REST API

Full CLI reference →


Documentation

| Topic | Where |
| --- | --- |
| 🪶 Downsize / Cache / Script / Trim deep-dives | docs/optimize/ |
| Claude Code & Codex integration | docs/claude-code-integration.md |
| Python SDK reference | docs/python-sdk.md |
| TypeScript SDK reference | docs/typescript-sdk.md |
| Framework support (LangChain / CrewAI / etc.) | docs/framework-support.md |
| Alert channels & rule reference | docs/alerts.md |
| Backfill from Langfuse / Helicone / OTLP | docs/backfill/ |
| Configuration | docs/configuration.md |
| Architecture deep-dive | docs/architecture.md |
| Installation extras (Trim, framework patches) | docs/installation.md |
| Export to Grafana / Datadog / NDJSON | docs/export.md |
| NemoClaw sandbox observer | docs/nemoclaw-integration.md |

Roadmap

Shipped in 0.3.x: Downsize · Cache · Script · Trim · Claude Code + Codex onboarding · MCP server · Web UI · Backfill adapters (Langfuse, Helicone, OTLP) · Period comparison · Routing-config export · Read-only policy preview

Up next:

  • tj policy add | edit | apply: unified rule surface
  • tj replay: replay captured sessions against new model versions
  • TypeScript framework patches (LangChain JS, OpenAI Agents SDK)
  • Vercel AI SDK & Mastra integrations
  • Docker image
  • GitHub Actions for CI drift/cost checks

tokenjam.dev · PyPI · npm · Issues

MIT License · Built by Metabuilder Labs