Source: ai-research/chopratejas-headroom-readme-2026-06-13.md (the chopratejas/headroom README, fetched 2026-06-13) — discovered via Matthew Berman’s OSS-projects video (raw/You_NEED_to_try_these_open-source_AI_projects_RIGHT_NOW.md). Benchmarks below are creator-reported (confidence medium); the repo ships a reproduction harness.
Headroom (chopratejas/headroom, ~25.9K stars, Apache-2.0) is a local-first context-compression layer that sits between an AI agent and the LLM. It compresses everything the agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the model, claiming 60–95% token reduction “with the same answers.” It is the most-starred entrant in the crowded token-optimizer field this wiki tracks, and differentiates on three axes: it runs locally (data stays on your machine), it is reversible (originals are cached for on-demand retrieval), and it offers cross-agent shared memory.
Key Takeaways
- Five ways to adopt it, from zero-code to inline. A proxy (
headroom proxy --port 8787, any language, no code changes); an agent wrap (headroom wrap claude|codex|cursor|aider|copilot— starts the proxy and points the tool at it); a library (compress(messages)in Python or TypeScript); an MCP server (headroom_compress/headroom_retrieve/headroom_stats); and SDK middleware for Anthropic/OpenAI, Vercel AI SDK, LiteLLM, LangChain, Agno, and Strands. - Content-aware compression, not blunt truncation. A
ContentRouterdetects content type and routes to the right compressor:SmartCrusherfor JSON,CodeCompressor(AST-aware) for Python/JS/Go/Rust/Java/C++, andKompress-base(a HuggingFace model trained on agentic traces) for prose. ACacheAlignerstabilizes prompt prefixes so Anthropic/OpenAI KV caches still hit after compression — important, since naive compression breaks prompt caching. - Reversible by design (CCR). Compressed-out originals are cached locally; if the model needs the full text it calls
headroom_retrieve. This is the safety valve that separates it from lossy context trimming — the agent can recover detail on demand within a configured TTL. - Cross-agent memory + failure mining.
headroom wrap claude --memorygives a shared, project-scoped, user-isolated memory store across Claude/Codex/Gemini with auto-dedup.headroom learnmines failed sessions and writes corrections back intoCLAUDE.md/AGENTS.md— a self-improvement loop adjacent to the AIOS “fold learnings back into the system” dimension. - Creator-reported numbers (verify before quoting). Workload savings: code search 17,765 → 1,408 tokens (92%), SRE incident debugging 65,694 → 5,118 (92%), GitHub issue triage 54,174 → 14,761 (73%), codebase exploration 78,502 → 41,254 (47%). Accuracy held on standard benchmarks (N=100): GSM8K ±0.000, TruthfulQA +0.030, SQuAD v2 97% @ 19% compression, BFCL tools 97% @ 32% compression. The repo has CI + codecov + a
python -m headroom.evals suite --tier 1reproduction path.^[ambiguous] - When to skip (per the README). If you only use a single provider’s native compaction and don’t need cross-agent memory, or you run in a sandbox where local processes can’t start.
Implementation
Tool/Service: Headroom (chopratejas/headroom), Apache-2.0. PyPI headroom-ai, npm headroom-ai, HuggingFace model chopratejas/kompress-v2-base.
Setup: pip install "headroom-ai[all]" (Python 3.10+) or npm install headroom-ai, then headroom wrap claude (or headroom proxy --port 8787). headroom perf reports the savings; granular extras include [proxy] [mcp] [ml] [code] [memory] [evals].
Cost: Free/OSS; runs locally, so the “cost” is the token savings (its purpose) plus local CPU/RAM for the embedder (Apple-GPU offload available via HEADROOM_EMBEDDER_RUNTIME=pytorch_mps).
Integration notes: headroom wrap claude supports --memory and --code-graph; Codex shares the memory store with Claude; OpenClaw installs it as a ContextEngine plugin. Any OpenAI-compatible client works through the proxy.
Try It
- Wrap Claude Code for one session and measure.
headroom wrap claudethenheadroom perf— compare token consumption against an unwrapped session on the same task. The honest test is whether answer quality holds, not just the token delta. - Stress the reversibility. Run a task where the agent genuinely needs a detail that got compressed away, and confirm
headroom_retrieverecovers it. If it can’t, the compression is lossier than advertised for your workload. - Compare against native compaction. Headroom is one of 12+ token optimizers catalogued in Claude Code Token Optimization; benchmark it against Claude Code’s own auto-compaction before adding a moving part. The Fable-5 field-test debate (“token-hungry is contested — it one-shots more often”) in the Fable 5 article is the backdrop: compression matters most for long agentic loops, less for one-shots.
Related
- Claude Code Token Optimization — the 12-optimizer landscape Headroom now tops by stars; read for the competitive field and native alternatives.
- Token Optimizer — a sibling source-available optimizer (ghost tokens, smart compaction); different design, same goal.
- Context Management in Claude Code — the native context-window mechanics Headroom sits on top of.
- agent-skills (Addy Osmani) — the sibling tool from the same Matthew Berman OSS-projects video.
- Five OSS Tools That Fix Claude Code’s Blind Spots — adjacent operator-workflow toolkit (Headroom pairs with the context-loss blind spot).
- The 2026 Claude Code AIOS Pattern —
headroom learninstantiates the “system improves itself” dimension.