Source: 18 Claude Code Token Hacks In 18 Minutes (YouTube, https://youtu.be/O2k_qwZA8HU)
A field-collected playlist of 18 token-management techniques for Claude Code users who keep hitting their session caps — even on the $200 Max plan. The framing is mechanical: every message in a session re-reads the entire conversation, the CLAUDE.md, MCP definitions, system prompt, and skills, so cost compounds rather than adds. One developer cited in the source tracked a 100+ message chat where 98.5% of tokens were spent re-reading prior history. The hacks are organized into three tiers — easy habits first, then surgical context discipline, then model and time-of-day strategy.
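The compounding can be sketched with back-of-envelope arithmetic. The ~51K fixed per-turn overhead is the figure from the source; the 500-token average per message is an assumed number chosen for illustration, so the exact ratio will differ in practice.

```shell
# Back-of-envelope sketch of compounding re-reads (bash arithmetic).
# base:    fixed per-turn overhead (system prompt, tools, CLAUDE.md), ~51K per the source
# per_msg: assumed average tokens each message adds to history (illustrative)
base=51000
per_msg=500
total=0
for ((n = 1; n <= 100; n++)); do
  total=$((total + base + n * per_msg))   # turn n re-reads all n prior messages
done
fresh=$((base + per_msg))                 # the same question asked in a fresh chat
echo "100-message chat, total input tokens: $total"
echo "one turn in a fresh chat:             $fresh"
```

With these assumed numbers the long chat consumes over a hundred times the input tokens of a single fresh-chat turn, and almost all of it is re-reading rather than new content.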
The 18 Techniques
Tier 1 — Habits (9)
- Start fresh conversations with `/clear`. Each message in a long chat costs far more than the same message in a fresh chat, because the full history is re-read every turn. Don't carry context about topic A into a conversation about topic B.
- Disconnect unused MCP servers. Each connected MCP server loads its full tool definitions into context on every message; a single server can be ~18,000 tokens per turn. Run `/mcp` at session start, drop the ones you don't need, and prefer CLIs over MCP wrappers when both exist (e.g., Google Workspace CLI vs the MCP).
- Batch prompts into one message. Three separate messages cost roughly 3x a single combined message because of history re-reads. If Claude gets something slightly wrong, edit the original message and regenerate rather than sending a follow-up correction: edits replace the bad exchange, follow-ups stack permanently.
- Use plan mode before any real task. Plan mode forces Claude to map the approach and ask clarifying questions, preventing the single biggest token sink: going down the wrong path and scrapping the work. Add to `CLAUDE.md`: "Do not make changes until you have 95% confidence. Ask follow-up questions until you reach that level."
- Run `/context` and `/cost`. `/context` shows exactly what is eating tokens right now: conversation history, MCP overhead, loaded files. `/cost` shows actual token usage and estimated spend for the session. Even a fresh session can already be ~51K tokens deep from system prompt, tools, custom agents, skills, and memory files.
- Set up a status line. Run `/statusline` in the terminal and ask Claude to wire up a display showing the model, a visual usage bar, and `tokens / 1M context window`. Visibility prevents surprise cap-outs.
- Keep your Claude usage dashboard open. Same idea: peek every 20-40 minutes, or wire an automation to ping you at 30-minute intervals when usage is climbing.
- Be surgical about pasting. Before dropping in a document, ask whether Claude needs the whole thing. If the bug is in one function, paste that function. Precision in is precision out.
- Watch Claude work. Don't fire-and-forget. In a bad loop, ~80% of tokens produce zero value: re-reading the same files, repeating mistakes. Stop the run the moment it goes off course.
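A status line can be as simple as a bar rendered from two numbers. The helper below is a hypothetical pure-bash sketch, not the actual `/statusline` wiring: Claude Code passes session data to a configured command, and the payload's field names vary, so here the used/cap figures are passed in directly.

```shell
# Hypothetical helper: render token usage as a 10-segment bar.
# bar USED CAP -> prints "[#####     ] USED / CAP tokens"
bar() {
  local used=$1 cap=$2 filled marks
  filled=$((used * 10 / cap))                   # integer count of filled segments
  marks=$(printf '%*s' "$filled" '' | tr ' ' '#')
  printf '[%-10s] %s / %s tokens\n' "$marks" "$used" "$cap"
}
bar 510000 1000000
```

Point a `/statusline`-configured command at something like this and the "am I about to cap out?" question answers itself at a glance.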
Tier 2 — Context discipline (5)
- Keep `CLAUDE.md` lean: project root, under 200 lines. Tech stack, conventions, build commands, the 95%-confidence rule. Treat it as an index that points at where bigger context lives rather than embedding it. Every message in every chat re-reads this file, so 1,000 lines of it gets re-read every turn, even on a "hi."
- Be surgical with file references. Don't say "here's the repo, find the bug." Say "check `verifyUser` in `auth.js`," or use `@filename` to point Claude at exactly the file you mean.
- Manually `/compact` at ~60% context. Auto-compact only fires near 95%, by which point output quality has already degraded. Compact early, with explicit instructions on what to preserve. After 3-4 compacts the quality decays; at that point grab a session summary, `/clear`, paste the summary back, and continue.
- Mind the 5-minute prompt-cache window. Claude Code caches unchanged context to skip reprocessing, but the cache expires after 5 minutes. Stepping away and coming back triggers a full reprocess at full cost, which is the cause of the "random spike" people feel after a break. Run `/compact` or `/clear` before walking away.
- Watch command output bloat. Shell commands dump their full output into context: a `git log` returning 200 commits sends every line as tokens, and the UI hides this behind a one-liner. Deny noisy commands at the project permissions level when you know they aren't needed.
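The deny list lives in project-level settings. A minimal sketch, assuming the standard `.claude/settings.json` permissions format; the specific rule strings are illustrative, and in a real project you would merge these keys into your existing settings file rather than overwrite it.

```shell
# Sketch: deny noisy read-only commands at the project level so their full
# output never lands in context. Rule strings here are illustrative examples;
# merge into an existing .claude/settings.json instead of overwriting one.
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "permissions": {
    "deny": [
      "Bash(git log:*)",
      "Bash(find:*)"
    ]
  }
}
EOF
```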
Tier 3 — Model and time-of-day strategy (4 + bonus)
- Pick the right model. Sonnet for default coding; Haiku for sub-agents, formatting, and simple tasks; Opus for deep architectural planning, and only when Sonnet wasn't enough (keep Opus under ~20% of usage). For huge-codebase reviews, consider bringing in Codex via the official plugin so Claude tokens aren't burned on review passes.
- Account for sub-agent cost. Agent workflows use ~7-10x more tokens than a single-agent session because each sub-agent wakes with its own full context and reloads system tools and files. Use them deliberately: pair one-off delegation with Haiku so 80% of your tokens land on the cheap model. Multi-agent teams produce nicer output but burn fast; reserve them for high-value runs.
- Respect peak vs off-peak hours. Anthropic drains the 5-hour session window faster during peak hours (8am-2pm Eastern, weekdays). Schedule big refactors, multi-agent runs, and long projects for afternoons, evenings, and weekends. Bonus 3.5: if you're near a reset with budget left, go heavy and get your money's worth; if you're near the cap with hours to go, step away rather than burning the last 5%.
- Use `CLAUDE.md` as your system constitution. Store stable decisions, architecture rules, and progress summaries: "save decisions, not conversations." Every architectural call captured there is a paragraph you never type again. The author also experiments with a self-evolving "Applied learning" section where Claude appends a one-line bullet (under 15 words, no explanations) when something fails repeatedly or a workaround is found, though this needs frequent pruning to stop bloat.
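The self-evolving section can be seeded as plain markdown in `CLAUDE.md`. A sketch under the source's stated constraints (one bullet, under 15 words, no explanations); the instruction comment and the sample bullet are hypothetical wording, not the author's exact text.

```shell
# Append a self-evolving "Applied learning" section to the project CLAUDE.md.
# The comment and the sample bullet below are hypothetical illustrations.
cat >> CLAUDE.md <<'EOF'

## Applied learning
<!-- Claude: when something fails repeatedly or a workaround is found,
     append exactly one bullet here: under 15 words, no explanations.
     Prune this list regularly to stop bloat. -->
- Build before test: jest reads dist/, not src/.
EOF
```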
Recent Signals
[Reddit signal — r/ClaudeCode 2026-05-09]: r/ClaudeCode post 1t8461y (“Which token optimizer would you recommend?”, score 43) catalogs 12 token-optimizer projects competing for the same workload — rtk-ai/rtk (CLI proxy claiming 60-90%), JuliusBrussee/caveman (skill cutting 65% via verbose-output suppression), yamadashy/repomix (repo-as-single-file bundler), mksglu/context-mode (sandboxes tool output, 98% claim), DeusData/codebase-memory-mcp (155-language graph index, 99% claim), jgravelle/jcodemunch-mcp (tree-sitter AST), chopratejas/headroom, yvgude/lean-ctx (60-95% / up to 99% on cached reads), colbymchenry/codegraph (local pre-indexed knowledge graph), samuelfaj/distill (CLI-output distillation), manojmallick/sigmap (97% claim, 21 langs), mpecan/tokf (config-driven CLI compressor). The thread’s load-bearing insight: u/zoomaaron measured RTK at 3-5% saving in their actual workflow vs vendor-claimed 60-90% — always-measure-in-your-own-workflow is the rule, not the vendor headline. u/stellarton: “I would not stack all of these at first. You’ll save tokens and lose debuggability. Pick the one that matches the bottleneck you actually have.”
[Reddit signal — r/ClaudeCode 2026-05-09]: r/ClaudeCode post 1t7ycsv (link to dzarlax's Substack post-mortem) reports a runaway session burning ~$0.37 per request from cache reads alone (~$120-140/hr). Three load-bearing failure modes: (1) prompt caching is not free: the 90% discount on 250k cached tokens still adds up over hundreds of requests (at Opus cache-read rates, ~250k cached tokens per request comes to ~$0.37); (2) Anthropic's web-console Hard Limit was bypassed: Anthropic support confirmed a bug where API billing limits were ignored for Claude Code requests on Claude for Work / Team subscriptions; (3) the wrapper hid the runaway: the author was using AgentDeck (PTY-managed Claude Code), and the burn would have continued even with the terminal completely closed. Recovery: `/logout` revoked the API access token. Prevention recommendations: monitor `/subagents` for "silent" agents in the background; treat `--auto` mode on a massive monorepo as a budget hazard. Adds the cost-blast-radius failure mode to this article's already-strong cost-discipline list; pairs with #9 ("Watch Claude work") and #13 ("Mind the 5-minute prompt-cache window").
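The post-mortem's cache economics are easy to reproduce. A sketch using the per-request figure from the signal; the request rate is an assumption picked to land inside the reported $120-140/hr band, not a number from the post.

```shell
# Sketch: why 90%-discounted cache reads still burn. Per-request cost is the
# post-mortem's figure; req_per_hr is an assumed runaway-loop request rate.
awk 'BEGIN {
  per_request = 0.37     # cache-read cost per request (from the post)
  req_per_hr  = 350      # assumed rate, chosen to land in the reported band
  printf "hourly burn: $%.2f\n", per_request * req_per_hr
}'
```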
Try It
Start with these four:
- Run `/context` on your next session and screenshot the breakdown. Most users have no idea where their tokens go before they look.
- Disconnect every MCP server you didn't use today, then start the session.
- Add the 95%-confidence rule to your `CLAUDE.md`.
- Set a manual `/compact` reminder at 60%; don't wait for auto-compact at 95%.
Related
- Opus 4.7 Best Practices — model-routing context for tier 3 hack 15
- Claude Code CLI Reference — full surface for `/context`, `/cost`, `/compact`, `/clear`, and the status line
- Agent Skills Overview — progressive disclosure is the same context-discipline principle applied to skills
- Claude Code Subagents — context for tier 3 hack 16 (the 7–10× cost factor)
- Claude Code Plugins and Marketplaces — where the Codex plugin and MCP installs come from
- Week 15 Release Notes — `/cost` per-model and cache-hit breakdown landed here