Source: raw/gh-star-alexgreensh-token-optimizer.md (gh-stars puller) + ai-research/alexgreensh-token-optimizer-readme-2026-05-31.md (README extraction, github.com/alexgreensh/token-optimizer, 1,166★ / 114 releases / 257 tests, v5.8.7 on 2026-05-31).
Token Optimizer is a source-available, local-only plugin that audits a Claude Code (or OpenCode / OpenClaw / Codex) session for wasted context — what the author calls “ghost tokens” — then compresses re-reads, checkpoints state across compaction, and grades session efficiency. It targets a gap the wiki’s existing context articles describe but don’t tool: the slow context-quality decay of long agent sessions. All benchmarks below are creator-reported (single-maintainer repo, not independently reproduced).
Key Takeaways
- “Ghost tokens” = tokens consumed but never seen by the model. The author’s framing: bloated configs, skills that are loaded but never invoked, MEMORY.md content past line 200 (claimed silently truncated), and the “60-70% lost on each compaction.” The tool’s job is to surface and cut these.
- Three waste categories. Structural (configs, unused skills, duplicate system prompts, stale memory; cited at ~5K/session light, ~10-20K typical-heavy, ~35K high-waste), runtime (verbose command output, oversized MCP results, redundant file reads), and behavioral (cache expiration, late compaction, looping, model misrouting).
- Smart Compaction is the headline mechanism. It checkpoints decisions before auto-compact fires and restores them afterward, and archives large tool outputs with inline hints so the model can retrieve them post-compaction — directly addressing the lossy auto-compaction that long Claude Code sessions hit.
- Active Compression (v5): Delta Mode (diffs instead of full content on file re-reads), Structure Map (AST summaries instead of re-reading large code files), Bash Compression (16 CLI handlers for git/pytest/lint/logs).
- Quality Scoring (v6): dual S–F letter grades for Resource Health and Session Efficiency, plus a live dashboard at localhost:24842, giving objective degradation metrics rather than a raw context-percentage bar.
- License is PolyForm Noncommercial 1.0.0, not the NOASSERTION value that GitHub auto-detected. Free for personal/research/education use and for small teams (under 5 people OR under $20K/month revenue get an automatic commercial license); larger enterprises must negotiate. Same source-available family as GitNexus (vs MIT tools like Graphify).
- Cache-safe by design. Claims it “never modifies content already in context,” preserving the prompt-cache prefix, which is relevant to the economics in Prompt Caching for Agencies.
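The Smart Compaction takeaway above describes two moves: archiving large tool outputs behind an inline hint, and checkpointing decisions so they can be restored after compaction. The sketch below illustrates that general pattern only; it is not the plugin's code. Every function name, the hint format, and the 400-character threshold are hypothetical choices made for illustration.

```python
import hashlib
import json
from pathlib import Path


def archive_output(text, archive_dir, max_inline_chars=400):
    """Store a large tool output on disk and return a short inline hint.

    Small outputs are returned unchanged. The threshold and hint format
    are illustrative assumptions, not the plugin's actual behavior.
    """
    if len(text) <= max_inline_chars:
        return text
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    (archive_dir / f"{key}.txt").write_text(text, encoding="utf-8")
    preview = text[:80].replace("\n", " ")
    return f"[archived output {key}: {len(text)} chars, starts '{preview}']"


def retrieve_output(key, archive_dir):
    """Load an archived output back by the key embedded in its hint."""
    return (Path(archive_dir) / f"{key}.txt").read_text(encoding="utf-8")


def save_checkpoint(decisions, path):
    """Persist a list of decision strings before compaction."""
    payload = {"decisions": list(decisions)}
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_checkpoint(path):
    """Read the decision list back after compaction."""
    return json.loads(Path(path).read_text(encoding="utf-8"))["decisions"]
```

The hint stays in context while the full output lives on disk, so whether this helps in practice depends on the model actually following the hint back to the archive. That is the same retrievability question raised under Try It.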
How it works
The model never sees the audit happen — it runs around the session and reshapes what reaches the context window:
- Read-Cache / dedup. Unchanged file re-reads return a structural summary instead of full source. Delta Mode goes further: a re-read of a 2,000-token file becomes a ~50-token diff (author: “97% savings on that specific read”).
- Structure Map. Large code-file re-reads are replaced with AST summaries — author cites “95-99% compression” on that path.
- Bash Compression. 16 handlers truncate noisy command output (e.g., a “564-token pytest output” → 115 tokens; a 60-file ls -la → 50 lines + a marker).
- Compaction survival. Checkpoints before the auto-compact event, archives big tool outputs with retrieval hints, and restores prior decisions afterward. This is the mechanism that distinguishes it from a one-shot context trimmer.
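The re-read and output-trimming ideas above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the plugin's implementation: `delta_read`, `truncate_output`, `savings`, the in-memory `cache` dict, and the marker text are all hypothetical names and formats, and the diff comes from Python's standard `difflib`.

```python
import difflib


def delta_read(path, new_text, cache):
    """Return full text on the first read and a unified diff on later reads.

    `cache` maps path -> text seen on the previous read (hypothetical store).
    """
    old_text = cache.get(path)
    cache[path] = new_text
    if old_text is None:
        return new_text  # first read: nothing to diff against
    if old_text == new_text:
        return f"[unchanged since last read: {path}]"
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"{path} (previous read)",
        tofile=f"{path} (current)",
        lineterm="",
        n=1,  # one line of context around each change
    )
    return "\n".join(diff)


def truncate_output(text, max_lines=50):
    """Keep the first `max_lines` lines and append a marker for the rest."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    omitted = len(lines) - max_lines
    marker = f"[... {omitted} more lines omitted ...]"
    return "\n".join(lines[:max_lines] + [marker])


def savings(before_tokens, after_tokens):
    """Fractional reduction in tokens, e.g. 0.975 for 2000 -> 50."""
    return 1 - after_tokens / before_tokens
```

Applying `savings` to the README's own figures gives a quick sanity check: 2,000 → 50 tokens is a 97.5% reduction, in line with the quoted “97%”, and 564 → 115 tokens is roughly 80%. Both are per-read figures, so neither says anything about whole-session savings.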
Implementation
- Tool/Service: alexgreensh/token-optimizer (Python plugin; Claude Code / OpenCode / OpenClaw / Codex). Zero dependencies, local-only.
- Setup (Claude Code): /plugin marketplace add alexgreensh/token-optimizer then /plugin install token-optimizer@alexgreensh-token-optimizer; invoke /token-optimizer or /token-coach. Manual path: git clone … ~/.claude/token-optimizer && bash install.sh.
- Cost: Free under PolyForm Noncommercial for individuals/small teams; commercial license required above the team-size/revenue thresholds.
- Integration notes: Dashboard at http://localhost:24842/token-optimizer (auto-refreshes post-session); models cost across 4 pricing tiers. Designed not to break prompt-cache prefix stability.
Try It
- Install via the plugin marketplace and run /token-coach at the end of a long session to see the S–F Resource Health / Session Efficiency grades.
- Open the localhost:24842 dashboard after a heavy multi-agent run and read the per-turn token breakdown; compare against a session you ran without it.
- Stress-test Smart Compaction: trigger a near-auto-compact session and confirm whether archived tool outputs are actually retrievable afterward (the load-bearing claim).
- If you maintain a large MEMORY.md, check the “past line 200 truncation” claim against your own setup before trusting it.
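For the MEMORY.md check in the last item, a small helper can show how much of a file sits past line 200. This is a sketch only: `lines_past_limit` and `report` are hypothetical helpers, and where your MEMORY.md lives depends on your setup, so the path is a parameter. It measures the file and says nothing about what Claude Code actually loads.

```python
from pathlib import Path


def lines_past_limit(text, limit=200):
    """Return the lines that sit after line `limit` (1-indexed)."""
    return text.splitlines()[limit:]


def report(path, limit=200):
    """Summarize how many lines of a file fall past the claimed cutoff."""
    text = Path(path).read_text(encoding="utf-8")
    total = len(text.splitlines())
    extra = lines_past_limit(text, limit)
    return f"{path}: {total} lines, {len(extra)} past line {limit}"
```

If the report shows content past the cutoff, one way to test the claim is to place a distinctive marker line there and ask the model in a fresh session whether it can see it.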
Open Questions
- No independent benchmark verification. Every savings figure traces to the README; the 60-70%-lost-per-compaction and MEMORY.md-line-200-truncation claims in particular need confirmation against current Claude Code (v2.1.x) behavior. ^[inferred]
- Overlap with native compaction. Claude Code already preserves sensitive instructions across compaction (v2.1.139); how much incremental value Smart Compaction adds on top is untested here. ^[inferred]
- Single maintainer. 114 releases and 257 tests are strong signals, but the project is one author — durability/maintenance risk applies as with any single-owner tool.
Related
- 18 Claude Code Token-Optimization Techniques — the manual techniques playbook this tool automates.
- Context Management in Claude Code — the practices this tool automates.
- Prompt Caching for Agencies — the cache-prefix economics it claims to preserve.
- Claude Code Memory Architecture Comparison — where session/memory state lives that ghost tokens accrue against.
- agentmemory — adjacent persistent-memory approach to the same long-session decay problem.
- GitNexus — PolyForm Noncommercial source-available sibling (license precedent).
- Claude Code Best Practices — context-discipline guidance this complements.