Source: raw/gh-star-alexgreensh-token-optimizer.md (gh-stars puller) + ai-research/alexgreensh-token-optimizer-readme-2026-05-31.md (README extraction, github.com/alexgreensh/token-optimizer, 1,166★ / 114 releases / 257 tests, v5.8.7 on 2026-05-31).

Token Optimizer is a source-available, local-only plugin that audits a Claude Code (or OpenCode / OpenClaw / Codex) session for wasted context — what the author calls “ghost tokens” — then compresses re-reads, checkpoints state across compaction, and grades session efficiency. It targets a gap the wiki’s existing context articles describe but don’t tool: the slow context-quality decay of long agent sessions. All benchmarks below are creator-reported (single-maintainer repo, not independently reproduced).

Key Takeaways

  • “Ghost tokens” = tokens consumed but never seen by the model. The author’s framing covers bloated configs, skills that are loaded but never invoked, MEMORY.md content past line 200 (claimed to be silently truncated), and a claimed “60-70% lost on each compaction.” The tool’s job is to surface and cut these.
  • Three waste categories. Structural (configs, unused skills, duplicate system prompts, stale memory — cited ~5K/session light, ~10-20K typical-heavy, ~35K high-waste), runtime (verbose command output, oversized MCP results, redundant file reads), and behavioral (cache expiration, late compaction, looping, model misrouting).
  • Smart Compaction is the headline mechanism. It checkpoints decisions before auto-compact fires and restores them afterward, and archives large tool outputs with inline hints so the model can retrieve them post-compaction — directly addressing the lossy auto-compaction that long Claude Code sessions hit.
  • Active Compression (v5): Delta Mode (diffs instead of full content on file re-reads), Structure Map (AST summaries instead of re-reading large code files), Bash Compression (16 CLI handlers for git/pytest/lint/logs).
  • Quality Scoring (v6): dual S–F letter grades for Resource Health and Session Efficiency, plus a live dashboard at localhost:24842 — objective degradation metrics rather than a raw context-percentage bar.
  • License is PolyForm Noncommercial 1.0.0, not the NOASSERTION value that GitHub auto-detects. Free for personal/research/education use and for small teams (under 5 people OR under $20K/month revenue get an automatic commercial license); larger enterprises must negotiate. Same source-available family as GitNexus (vs MIT tools like Graphify).
  • Cache-safe by design. Claims it “never modifies content already in context,” preserving the prompt-cache prefix — relevant to the economics in Prompt Caching for Agencies.

How it works

The model never sees the audit happen: the tool sits around the session rather than inside it, and reshapes what reaches the context window:

  • Read-Cache / dedup. Unchanged file re-reads return a structural summary instead of full source. Delta Mode goes further: a re-read of a 2,000-token file becomes a ~50-token diff (author: “97% savings on that specific read”).
  • Structure Map. Large code-file re-reads are replaced with AST summaries — author cites “95-99% compression” on that path.
  • Bash Compression. 16 handlers truncate noisy command output (e.g., a “564-token pytest output” → 115 tokens, roughly an 80% reduction; a 60-file ls -la → 50 lines + a marker).
  • Compaction survival. Checkpoints before the auto-compact event, archives big tool outputs with retrieval hints, restores prior decisions after — the mechanism that distinguishes it from a one-shot context trimmer.
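
The file re-read and output-truncation paths above can be illustrated with a minimal Python sketch. It is not the plugin's code: the function names (delta_reread, structure_map, truncate_output), the marker text, and the choice of difflib and ast from the standard library are all assumptions made for illustration, and the real handlers may work quite differently.

```python
import ast
import difflib


def delta_reread(cached: str, current: str, path: str = "file") -> str:
    """Hypothetical Delta Mode: return a diff instead of the full file on re-read."""
    if cached == current:
        return f"[{path}: unchanged since last read]"
    diff = difflib.unified_diff(
        cached.splitlines(),
        current.splitlines(),
        fromfile=f"{path} (cached)",
        tofile=f"{path} (current)",
        lineterm="",
        n=1,  # one line of context keeps the diff small
    )
    return "\n".join(diff)


def structure_map(source: str) -> list[str]:
    """Hypothetical Structure Map: top-level signatures only, no bodies."""
    summary = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            summary.append(f"def {node.name}({args})  # line {node.lineno}")
        elif isinstance(node, ast.ClassDef):
            methods = [
                n.name
                for n in node.body
                if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
            summary.append(
                f"class {node.name}: {', '.join(methods)}  # line {node.lineno}"
            )
    return summary


def truncate_output(text: str, keep: int = 50) -> str:
    """Hypothetical output handler: keep the first `keep` lines plus a marker."""
    lines = text.splitlines()
    if len(lines) <= keep:
        return text
    omitted = len(lines) - keep
    return "\n".join(lines[:keep] + [f"[... {omitted} more lines omitted ...]"])


old = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
new = old.replace("a - b", "b - a")

print(delta_reread(old, new, "calc.py"))  # a short diff rather than the whole file
print(structure_map(new))                 # signatures with line numbers
listing = "\n".join(f"file{i}" for i in range(60))
print(truncate_output(listing, keep=50).splitlines()[-1])  # the omission marker
```

The sketch only shows why these paths can be cheap: a one-line change yields a diff of a few lines regardless of file size, and a signature summary grows with the number of definitions rather than the length of their bodies. It says nothing about whether the README's specific savings figures hold.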

Implementation

  • Tool/Service: alexgreensh/token-optimizer (Python plugin; Claude Code / OpenCode / OpenClaw / Codex). Zero dependencies, local-only.
  • Setup (Claude Code): /plugin marketplace add alexgreensh/token-optimizer then /plugin install token-optimizer@alexgreensh-token-optimizer; invoke /token-optimizer or /token-coach. Manual path: git clone … ~/.claude/token-optimizer && bash install.sh.
  • Cost: Free under PolyForm Noncommercial for individuals/small teams; commercial license required above the team-size/revenue thresholds.
  • Integration notes: Dashboard at http://localhost:24842/token-optimizer (auto-refreshes post-session); cost is modeled across 4 pricing tiers. Designed not to break prompt-cache prefix stability.

Try It

  1. Install via the plugin marketplace and run /token-coach at the end of a long session to see the S–F Resource Health / Session Efficiency grades.
  2. Open the localhost:24842 dashboard after a heavy multi-agent run and read the per-turn token breakdown — compare against a session you ran without it.
  3. Stress-test Smart Compaction: trigger a near-auto-compact session and confirm whether archived tool outputs are actually retrievable afterward (the load-bearing claim).
  4. If you maintain a large MEMORY.md, check the “past line 200 truncation” claim against your own setup before trusting it.
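
For step 4, a small script can at least tell you whether the claim is relevant to your setup before you test the truncation itself. This is a sketch under stated assumptions: the 200-line cutoff is the README's unverified figure, the helper names are hypothetical, and the path to MEMORY.md is passed on the command line because its location depends on your configuration.

```python
import sys
from pathlib import Path


def lines_past_cutoff(text: str, cutoff: int = 200) -> tuple[int, int]:
    """Return (total_lines, lines_beyond_cutoff) for the given text."""
    total = len(text.splitlines())
    return total, max(0, total - cutoff)


def report(path: Path, cutoff: int = 200) -> str:
    """Describe how much of a file sits beyond the claimed cutoff line."""
    if not path.exists():
        return f"{path}: not found"
    text = path.read_text(encoding="utf-8", errors="replace")
    total, extra = lines_past_cutoff(text, cutoff)
    if extra == 0:
        return f"{path}: {total} lines, nothing past line {cutoff}"
    return f"{path}: {total} lines, {extra} past line {cutoff}"


if __name__ == "__main__":
    # Usage: python check_memory.py <path-to-MEMORY.md> [more paths ...]
    for arg in sys.argv[1:]:
        print(report(Path(arg)))
```

If the script reports lines past the cutoff, a practical follow-up is to place a distinctive marker sentence near the end of the file and ask the agent about it in a fresh session; the script counts lines but cannot confirm or refute the truncation.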

Open Questions

  • No independent benchmark verification. Every savings figure traces to the README; the 60-70%-lost-per-compaction and MEMORY.md-line-200-truncation claims in particular need confirmation against current Claude Code (v2.1.x) behavior. ^[inferred]
  • Overlap with native compaction. Claude Code already preserves sensitive instructions across compaction (v2.1.139); how much incremental value Smart Compaction adds on top is untested here. ^[inferred]
  • Single maintainer. 114 releases and 257 tests are strong signals, but the project has a single author, so the usual durability/maintenance risk of a single-owner tool applies.