Source: 18 Claude Code Token Hacks In 18 Minutes (YouTube, https://youtu.be/O2k_qwZA8HU)
A field-collected playlist of 18 token-management techniques for Claude Code users who keep hitting their session caps — even on the $200 Max plan. The framing is mechanical: every message in a session re-reads the entire conversation, the CLAUDE.md, MCP definitions, system prompt, and skills, so cost compounds rather than adds. One developer cited in the source tracked a 100+ message chat where 98.5% of tokens were spent re-reading prior history. The hacks are organized into three tiers — easy habits first, then surgical context discipline, then model and time-of-day strategy.
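The compounding can be sketched with back-of-envelope arithmetic. The ~51K fixed per-turn overhead is the figure from the source; the 500-token average per message is an assumed number chosen for illustration, so the exact ratio will differ in practice.

```shell
# Back-of-envelope sketch of compounding re-reads (bash arithmetic).
# base:    fixed per-turn overhead (system prompt, tools, CLAUDE.md), ~51K per the source
# per_msg: assumed average tokens each message adds to history (illustrative)
base=51000
per_msg=500
total=0
for ((n = 1; n <= 100; n++)); do
  total=$((total + base + n * per_msg))   # turn n re-reads all n prior messages
done
fresh=$((base + per_msg))                 # the same question asked in a fresh chat
echo "100-message chat, total input tokens: $total"
echo "one turn in a fresh chat:             $fresh"
```

With these assumed numbers the long chat consumes over a hundred times the input tokens of a single fresh-chat turn, and almost all of it is re-reading rather than new content.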
The 18 Techniques
Tier 1 — Habits (9)
- Start fresh conversations with `/clear`. Each message in a long chat costs far more than the same message in a fresh chat, because the full history is re-read every turn. Don't carry context about topic A into a conversation about topic B.
- Disconnect unused MCP servers. Each connected MCP server loads its full tool definitions into context on every message; a single server can be ~18,000 tokens per turn. Run `/mcp` at session start, drop the ones you don't need, and prefer CLIs over MCP wrappers when both exist (e.g., Google Workspace CLI vs the MCP).
- Batch prompts into one message. Three separate messages cost roughly 3x a single combined message because of history re-reads. If Claude gets something slightly wrong, edit the original message and regenerate rather than sending a follow-up correction: edits replace the bad exchange, follow-ups stack permanently.
- Use plan mode before any real task. Plan mode forces Claude to map the approach and ask clarifying questions, preventing the single biggest token sink: going down the wrong path and scrapping the work. Add to `CLAUDE.md`: "Do not make changes until you have 95% confidence. Ask follow-up questions until you reach that level."
- Run `/context` and `/cost`. `/context` shows exactly what is eating tokens right now: conversation history, MCP overhead, loaded files. `/cost` shows actual token usage and estimated spend for the session. Even a fresh session can already be ~51K tokens deep from system prompt, tools, custom agents, skills, and memory files.
- Set up a status line. Run `/statusline` in the terminal and ask Claude to wire up a display showing the model, a visual usage bar, and `tokens / 1M context window`. Visibility prevents surprise cap-outs.
- Keep your Claude usage dashboard open. Same idea: peek every 20-40 minutes, or wire an automation to ping you at 30-minute intervals when usage is climbing.
- Be surgical about pasting. Before dropping in a document, ask whether Claude needs the whole thing. If the bug is in one function, paste that function. Precision in is precision out.
- Watch Claude work. Don't fire-and-forget. In a bad loop, ~80% of tokens produce zero value: re-reading the same files, repeating mistakes. Stop the run the moment it goes off course.
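A status line can be as simple as a bar rendered from two numbers. The helper below is a hypothetical pure-bash sketch, not the actual `/statusline` wiring: Claude Code passes session data to a configured command, and the payload's field names vary, so here the used/cap figures are passed in directly.

```shell
# Hypothetical helper: render token usage as a 10-segment bar.
# bar USED CAP -> prints "[#####     ] USED / CAP tokens"
bar() {
  local used=$1 cap=$2 filled marks
  filled=$((used * 10 / cap))                   # integer count of filled segments
  marks=$(printf '%*s' "$filled" '' | tr ' ' '#')
  printf '[%-10s] %s / %s tokens\n' "$marks" "$used" "$cap"
}
bar 510000 1000000
```

Point a `/statusline`-configured command at something like this and the "am I about to cap out?" question answers itself at a glance.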
Tier 2 — Context discipline (5)
- Keep `CLAUDE.md` lean: project root, under 200 lines. Tech stack, conventions, build commands, the 95%-confidence rule. Treat it as an index that points at where bigger context lives rather than embedding it. Every message in every chat re-reads this file, so 1,000 lines of it gets re-read every turn, even on a "hi."
- Be surgical with file references. Don't say "here's the repo, find the bug." Say "check `verifyUser` in `auth.js`," or use `@filename` to point Claude at exactly the file you mean.
- Manually `/compact` at ~60% context. Auto-compact only fires near 95%, by which point output quality has already degraded. Compact early, with explicit instructions on what to preserve. After 3-4 compacts the quality decays; at that point grab a session summary, `/clear`, paste the summary back, and continue.
- Mind the 5-minute prompt-cache window. Claude Code caches unchanged context to skip reprocessing, but the cache expires after 5 minutes. Stepping away and coming back triggers a full reprocess at full cost, which is the cause of the "random spike" people feel after a break. Run `/compact` or `/clear` before walking away.
- Watch command output bloat. Shell commands dump their full output into context: a `git log` returning 200 commits sends every line as tokens, and the UI hides this behind a one-liner. Deny noisy commands at the project permissions level when you know they aren't needed.
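The deny list lives in project-level settings. A minimal sketch, assuming the standard `.claude/settings.json` permissions format; the specific rule strings are illustrative, and in a real project you would merge these keys into your existing settings file rather than overwrite it.

```shell
# Sketch: deny noisy read-only commands at the project level so their full
# output never lands in context. Rule strings here are illustrative examples;
# merge into an existing .claude/settings.json instead of overwriting one.
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "permissions": {
    "deny": [
      "Bash(git log:*)",
      "Bash(find:*)"
    ]
  }
}
EOF
```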
Tier 3 — Model and time-of-day strategy (4 + bonus)
- Pick the right model. Sonnet for default coding; Haiku for sub-agents, formatting, and simple tasks; Opus for deep architectural planning, and only when Sonnet wasn't enough (keep Opus under ~20% of usage). For huge-codebase reviews, consider bringing in Codex via the official plugin so Claude tokens aren't burned on review passes.
- Account for sub-agent cost. Agent workflows use ~7-10x more tokens than a single-agent session because each sub-agent wakes with its own full context and reloads system tools and files. Use them deliberately: pair one-off delegation with Haiku so 80% of your tokens land on the cheap model. Multi-agent teams produce nicer output but burn fast; reserve them for high-value runs.
- Respect peak vs off-peak hours. Anthropic drains the 5-hour session window faster during peak hours (8am-2pm Eastern, weekdays). Schedule big refactors, multi-agent runs, and long projects for afternoons, evenings, and weekends. Bonus 3.5: if you're near a reset with budget left, go heavy and get your money's worth; if you're near the cap with hours to go, step away rather than burning the last 5%.
- Use `CLAUDE.md` as your system constitution. Store stable decisions, architecture rules, and progress summaries: "save decisions, not conversations." Every architectural call captured there is a paragraph you never type again. The author also experiments with a self-evolving "Applied learning" section where Claude appends a one-line bullet (under 15 words, no explanations) when something fails repeatedly or a workaround is found, though this needs frequent pruning to stop bloat.
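The self-evolving section can be seeded as plain markdown in `CLAUDE.md`. A sketch under the source's stated constraints (one bullet, under 15 words, no explanations); the instruction comment and the sample bullet are hypothetical wording, not the author's exact text.

```shell
# Append a self-evolving "Applied learning" section to the project CLAUDE.md.
# The comment and the sample bullet below are hypothetical illustrations.
cat >> CLAUDE.md <<'EOF'

## Applied learning
<!-- Claude: when something fails repeatedly or a workaround is found,
     append exactly one bullet here: under 15 words, no explanations.
     Prune this list regularly to stop bloat. -->
- Build before test: jest reads dist/, not src/.
EOF
```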
Recent Signals
[Reddit signal — r/ClaudeCode 2026-05-09]: r/ClaudeCode post 1t8461y (“Which token optimizer would you recommend?”, score 43) catalogs 12 token-optimizer projects competing for the same workload — rtk-ai/rtk (CLI proxy claiming 60-90%), JuliusBrussee/caveman (skill cutting 65% via verbose-output suppression), yamadashy/repomix (repo-as-single-file bundler), mksglu/context-mode (sandboxes tool output, 98% claim), DeusData/codebase-memory-mcp (155-language graph index, 99% claim), jgravelle/jcodemunch-mcp (tree-sitter AST), chopratejas/headroom, yvgude/lean-ctx (60-95% / up to 99% on cached reads), colbymchenry/codegraph (local pre-indexed knowledge graph), samuelfaj/distill (CLI-output distillation), manojmallick/sigmap (97% claim, 21 langs), mpecan/tokf (config-driven CLI compressor). The thread’s load-bearing insight: u/zoomaaron measured RTK at 3-5% saving in their actual workflow vs vendor-claimed 60-90% — always-measure-in-your-own-workflow is the rule, not the vendor headline. u/stellarton: “I would not stack all of these at first. You’ll save tokens and lose debuggability. Pick the one that matches the bottleneck you actually have.”
[Reddit signal — r/ClaudeCode 2026-05-09]: r/ClaudeCode post 1t7ycsv (link to dzarlax's Substack post-mortem) reports a runaway session burning ~$0.37 per request from cache reads alone (~$120-140/hr). Three load-bearing failure modes: (1) prompt caching is not free: the 90% discount on 250k cached tokens still adds up over hundreds of requests (at Opus cache-read rates, ~250k cached tokens per request comes to ~$0.37); (2) Anthropic's web-console Hard Limit was bypassed: Anthropic support confirmed a bug where API billing limits were ignored for Claude Code requests on Claude for Work / Team subscriptions; (3) the wrapper hid the runaway: the author was using AgentDeck (PTY-managed Claude Code), and the burn would have continued even with the terminal completely closed. Recovery: `/logout` revoked the API access token. Prevention recommendations: monitor `/subagents` for "silent" agents in the background; treat `--auto` mode on a massive monorepo as a budget hazard. Adds the cost-blast-radius failure mode to this article's already-strong cost-discipline list; pairs with #9 ("Watch Claude work") and #13 ("Mind the 5-minute prompt-cache window").
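The post-mortem's cache economics are easy to reproduce. A sketch using the per-request figure from the signal; the request rate is an assumption picked to land inside the reported $120-140/hr band, not a number from the post.

```shell
# Sketch: why 90%-discounted cache reads still burn. Per-request cost is the
# post-mortem's figure; req_per_hr is an assumed runaway-loop request rate.
awk 'BEGIN {
  per_request = 0.37     # cache-read cost per request (from the post)
  req_per_hr  = 350      # assumed rate, chosen to land in the reported band
  printf "hourly burn: $%.2f\n", per_request * req_per_hr
}'
```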
Try It
Start with these four:
- Run `/context` on your next session and screenshot the breakdown. Most users have no idea where their tokens go before they look.
- Disconnect every MCP server you didn't use today, then start the session.
- Add the 95%-confidence rule to your `CLAUDE.md`.
- Set a manual `/compact` reminder at 60%; don't wait for auto-compact at 95%.
Related
- Opus 4.7 Best Practices — model-routing context for tier 3 hack 15
- Claude Code CLI Reference — full surface for `/context`, `/cost`, `/compact`, `/clear`, and the status line
- Agent Skills Overview — progressive disclosure is the same context-discipline principle applied to skills
- Claude Code Subagents — context for tier 3 hack 16 (the 7–10× cost factor)
- Claude Code Plugins and Marketplaces — where the Codex plugin and MCP installs come from
- Week 15 Release Notes — `/cost` per-model and cache-hit breakdown landed here