Source: raw/Context_Management_in_Claude_Code.md — Anthropic-produced YouTube tutorial eW3oTyfeWZ0 (~2 min).

Short Anthropic-authored explainer covering the primitives of context management in Claude Code: how the context window fills up, how /compact / /clear / /context differ, where CLAUDE.md fits, and the four levers (specificity, MCP discipline, skills, subagents) that reduce context pressure. Companion to the longer Anthropic’s Official Best Practices for Claude Code doc — same primitives, distilled to a 2-minute walkthrough.

Key Takeaways

  • Context window = Claude’s working memory. Every prompt, file read, tool call, and tool result is appended to the window. Finite space → optimization is “extremely important.”
  • Automatic compaction kicks in near the limit. Summarizes important details and removes unnecessary tool-call results. Caveat: “could potentially lose details in your previous conversation.”
  • Three commands, three intents.
    • /compact — manual compaction; preserves a memory of prior work. Use when continuing the same feature but running low on space.
    • /clear — wipes everything, starts from scratch. Use when switching to a new feature so prior conversation doesn’t bias new work.
    • /context — inspect current context size + category breakdown + graphic. Diagnostic, no state change.
  • CLAUDE.md is the cross-session memory layer. Anything Claude should remember in other sessions belongs there. Avoids re-discovering things from scratch on every new session.
  • Specificity paradox. The irony of writing a shorter prompt: it actually takes up more context long-run, because Claude has to look around the codebase and do its own thinking to compensate. A clear sentence or two upfront is cheaper than ambiguity.
  • MCP server discipline. MCP servers load all of their tools into context by default. If a server is unrelated to the current project, turn it off. Reduces baseline context load.
  • Skills vs MCP for context. Skills work similarly to MCP servers but don’t put the entire thing into context — explicit endorsement of skills as a lower-context-cost alternative to MCP when both could deliver the same functionality.
  • Subagents run in parallel with a separate context window. For tasks where you only need the answer not the journey (e.g., “where are the authentication endpoints?”), a subagent does the work and returns just the summary to the main agent. Main context is preserved.

Summary heuristic (from the closing line)

“Use /compact to summarize long sessions and /clear to start fresh. To use your context window effectively, be specific with what you want.”

Why It Matters

  • Primitive-level Anthropic guidance. Short videos like this are the canonical specification for what each command does — Reddit chatter (“the 4 context tools”) is community paraphrasing of these primitives. Cite this video when teaching the commands.
  • The MCP-vs-Skills cost framing is novel. Most context comparisons focus on MCP-vs-MCP. The “skills are lower-context than MCP for the same task” framing is the operational call-to-action — prefer skills when a skill can do the job.
  • Sub-agent context isolation is the leverage primitive. For exploration questions (“where is X?”), spawning a subagent and returning a summary is the cheapest way to preserve main-thread context. Pattern works across Code, Cowork, and any agent surface with subagent support.

Try It

  1. Run /context mid-session on your most active Claude Code project. Note which category dominates — usually file reads or MCP tool definitions.
  2. If MCP definitions dominate, audit your MCP servers and turn off the ones unrelated to the current project. Compare /context before/after.
  3. Add a one-line note in CLAUDE.md next time you re-explain something Claude should have remembered. Over a week, the file becomes self-documenting cross-session memory.
  4. Next time you’d /compact after a feature is done, try /clear instead and see if the next feature lands cleaner.

Community reproduction — 161-turn refactor → 52 turns via batching plugin (2026-05-18)

[Reddit signal — r/ClaudeCode 2026-05-18] reddit-1th04si (ChampionshipNo2815, 44 score / 75 comments): A Max-plan user traced a single rename refactor across a few files. Claude Code ran 161 turns to finish it — every read, every grep, every edit is its own API call, and each one re-ingests everything before it as input tokens. By turn 60, you’re paying context cost on the entire session history. Same task re-run with a Claude Code plugin that collapses search + read into one call and stacks all edits into one round-trip finished in 52 turns. No model change, no plan change — just fewer round-trips per task. Reproduction of the “specificity paradox” framing above: tool-call-amplifies-context isn’t an abstract concern, it dominates real-world session-limit consumption. Concrete falsifiable claim (161 → 52, ~68% reduction) on a small refactor; the plugin name itself isn’t called out in the post but the pattern (search+read fusion + edit batching) is the operational lever.

Cross-tool note — building a clean context window on the filesystem (Nate B Jones, 2026-05-30)

[YouTube — "How I AI / My Weekly Codex Experiments," rqVzTX8w_w0] Nate B Jones’s weekly-experiments take adds a filesystem-as-context-assembly technique he reports works in Codex but not Claude Code or Cowork (he tried the same workflow in both): tell the agent to scan your file system and copy the relevant files — described in natural language (“this is what it’s about, roughly when I made it”), not by filename — into a clean working folder, then open a fresh chat pointed only at that folder plus the task (detailed instructions dropped in as a file). The clean, scoped working set is what unlocks 30-50K-word document work, plus complex spreadsheet/coding work, across a folder structure. He roots it in Codex’s repo/sandbox heritage — to Codex, text files and code files are the same thing in a folder.

Paired prompting shift (applies to both Claude and Codex): prompting in 2026 has moved from prompt-engineering → “point at files + tell it what good looks like (an eval, or a few standard-defining questions)” → now define the shape of the task collaboratively first (a messy human-and-agent back-and-forth) and then execute agentically — he reports 5.5 doesn’t lose the thread when you flip from “let’s define this” to “now go do it.” A useful contrast to the Claude-Code-native levers above: same goal (a clean, deliberately-assembled context window) reached by having the agent stage files rather than by /compact / /clear discipline. ^[inferred — single-operator anecdote; the “doesn’t work in CC/Cowork” claim is his test, not a benchmarked result]

The “dumb zone” — where context degrades before the limit (Cole Medin, 2026-06)

[YouTube — "How to Build Effective Claude Code Agents in 2026," Nate Herk × Cole Medin] Operators report Claude Code’s attention degrades well before the 1M-token limit — a “dumb zone” where the needle-in-a-haystack / lost-in-the-middle effect amplifies and the model “acts dumb.” Cole Medin’s subjective per-model thresholds: Opus 4.8 ~250k tokens, Opus 4.7 ~200k, Sonnet 4.6 ~100–125k. Past those points results visibly degrade — which is also why running ~20 always-on MCP servers (each stuffing ~20k tokens of definitions) hurts: you enter the dumb zone before doing any work. The takeaway reinforces the levers above (/compact, /clear, MCP discipline, subagents) but adds a where does it start heuristic the article previously lacked: keep working context comfortably under the per-model threshold, not merely under the hard limit. ^[inferred — subjective operator estimates; two operators in the source converge near ~250k for Opus, but no benchmark backs the specific numbers]

Open Questions

  • The video doesn’t quantify the “MCP-vs-Skills” context cost differential — would be useful to have Anthropic publish a token-cost table comparing a 10-tool MCP server vs an equivalent skill bundle.
  • How does /compact interact with custom instructions in CLAUDE.md — does compaction preserve or summarize them?
  • Which Claude Code plugin collapses search+read+edit into batched round-trips? The reddit-1th04si post doesn’t name it. A strong candidate is Token Optimizer (alexgreensh), whose Read-Cache dedup + Delta Mode + AST Structure Maps compress exactly these re-read round-trips — though it is not confirmed to be the post’s referent. ^[inferred]