Source: Beyond_the_basics_with_Claude_Code.md — YouTube https://www.youtube.com/watch?v=tuY2ChJIx48, Daisy Holman (engineer on the Claude Code team at Anthropic, ex-chair of the C++ committee), Code with Claude London 2026, ~45 min, uploaded 2026-05. Whisper-fallback transcript — direct quotes are paraphrased rather than verbatim.

Daisy Holman’s London 2026 talk is pitched one rung above the introductory Claude Code material — it is explicitly about agentic software engineering in monorepos with thousands of engineers and tens of thousands of skills, not single-developer programming tasks. The thesis: Claude Code out-of-the-box is fine for zero-to-one work, but at professional scale you have to engineer the context window the way you would engineer code on an Arduino — every byte is contested, and most of the plugin primitives that look free are not. The talk walks through the three things Claude needs (access, knowledge, tooling), the hard ceiling of the context window plus its KV-cache constraint, and then ranks the four plugin primitives (MCP, skills, hooks, subagents) by how well they actually scale. The closing third covers async + parallel workflow primitives Anthropic uses internally on the Claude Code team itself.

Key Takeaways

  • Three customization categories Claude needs at scale: access, knowledge, tooling. Access = team chat, CI/CD, dashboards, internal docs (so Claude sees the why of a task, not just the source code). Knowledge = your code base’s conventions and institutional memory, delivered via in-context learning because fine-tuning underperforms (specialized fine-tunes hallucinate more per late-2025 research, and frontier models churn faster than a fine-tune cycle). Tooling = the agentic equivalents of syntax highlighting, LSPs, code completion, and red squigglies.
  • The context window is a fixed resource. Frontier models have been roughly 1M tokens for about a year and the model gains over that period came from elsewhere, not from bigger context. Treat it as a hard constraint, not something that will be papered over by future scale-up.
  • The KV cache makes naive LRU eviction unusable. Evicting an early-prompt token invalidates the entire downstream cache, and uncached tokens cost ~10× more. Practical rule: put stable shared content (system prompt, tool definitions, persistent skills) at the front of the context, and volatile per-task content at the end where it can be dropped cheaply. Early agent harnesses borrowed L1-cache LRU thinking; at 1M tokens with KV caching it is the wrong model.
  • Plugin primitives ranked worst-to-best at monorepo scale: MCP → Skills → Subagents → Hooks.
    • MCP is worst at scale: every tool’s name + description + schema sits in the system prompt unconditionally. Twenty servers × 15 tools each and most of your context is tool definitions. Designed in the chatbot era where the agent had no shell. Still appropriate for public integrations (auth, transport, non-technical end users) but a CLI + skill is almost always the right call for in-monorepo developer experience.
    • Skills are better but not zero-overhead: the body is lazy-loaded, but the description sits in the system prompt always (a paragraph, ~300-400 tokens for reliable triggering). No defined hierarchy / lazy sub-skills yet — Daisy hints that’s coming in the next couple of weeks.
    • Subagents push the system prompt into a child context, so only the one-line description and the tool-call result land in the parent. You still pay that one-line tax per subagent, which adds up at 100k subagents.
    • Hooks are the only true zero-overhead abstraction. They run outside the context window, so with 100,000 hooks installed, the 99,995 that don’t match a given event cost zero tokens; only a hook that fires can return text. Your only constraint is your machine. A constrained resource has been blown out into an unconstrained one.
  • Tool search lazy-loads MCP definitions but isn’t a free lunch. Names go in the system prompt, full schema loads on demand. The catch: Claude won’t search for a tool unless something specific in the user’s intent triggers it. Generic tools (edit, bash) still need their full schema in the prompt. The more description you trim, the less reliably Claude searches.
  • Distinguish “tools that scale with intelligence” from “tools that compensate for a lack of intelligence.” Red-squiggly nudges (over-ridable reminders via post-tool-use hooks) scale with intelligence — smarter models use them better. Hard blocks that forbid Claude from ever doing X compensate for current limitations and don’t compound as models improve. Default to the nudge.
  • Why Anthropic refused to let plugins ship their own CLAUDE.md. It was the most common feature request and it looked free, but it would have been the most expensive abstraction in the catalog: every plugin would have shipped one unconditionally. The official escape hatch is a SessionStart hook that returns text into context: same effect, but the cost is visible to the user instead of hidden.
  • Memory is for low-quality, short-lived information. Plugins are context-engineering primitives — evaluated, iterated, deliberate. Memory is fire-and-forget. Don’t confuse the two.
  • Context-switching workflow at scale = worktrees + /color + /rename + persistent named agents. Daisy runs long-lived worktrees that track upstream main, each with a persistent named Claude Code agent that owns its directory. /color is “syntax highlighting for the agentic era” — color triggers human memory fast enough to cut context-switch latency. /rename does the same for the color-blind.
  • /loop is the cron tool exposed as a slash command. Runs a prompt every N minutes; Claude can self-disable when the prompt no longer applies. Daisy’s headline use case: babysitting PRs through CI overnight. Even CI runs that take two hours can be left for a day and a half — Claude fixes failures as they appear, pipelining the engineer’s day.
  • Auto-permissions mode is required infrastructure for loop / agent teams / overnight. It’s not --dangerously-skip-permissions; under the hood it’s a classifier agent plus an adversarial-check sub-pass that vets each tool call. Cost premium roughly 30-40% more tokens (and decreasing). Without it, /loop, Claude Agents, and overnight work all stall on permission prompts.
  • Other newly-mentioned surfaces: Claude Agents view (one place to see every running agent with state classification, peek-in, prompt-from-dashboard), /remote-control (phone + Claude Code desktop status check, the “30-second after-dinner check-in”), and a send-message tool whose cross-instance restrictions are being relaxed so Claude instances on the same account can talk to each other directly.
  • Three take-homes Daisy closes on: give Claude access to everything you alt-tab to, mind the context window, pick plugin abstractions that scale to 100k.

Details

Three categories of customization

Daisy’s framing is that out-of-the-box Claude Code sees a repo in a shell, which is enough for greenfield projects with no conventions, no debt, no external stakeholders. Anything beyond that needs three things:

  • Access — team chat (where decisions are actually made), CI/CD (Daisy: stop fixing CI failures yourself, agents are very good at it), production dashboards, internal design docs and runbooks, and recorded/transcribed meetings. Her concrete habit: feed meeting transcripts into Claude immediately after the meeting, ask for low-hanging fruit, get two or three PRs per meeting. The diagnostic she recommends: spend one day trying to never leave the Claude Code surface, write down every alt-tab on paper, then wire each of those tools up — the gap is much larger than you notice until you close it.
  • Knowledge — code base conventions, institutional memory, internal vocabulary, internal APIs. Fine-tuning gets asked about constantly and underperforms: late-2025 research shows specialized fine-tuning increases hallucination, and at frontier-model churn rates the economics don’t work even for very large companies. Everything has to go through in-context learning (ICL). Daisy’s aside: ICL is “a fancy word for text files,” but ICL still has real optimization headroom past “drop a text file in.”
  • Tooling — the agentic equivalents of what professional human developers have used for years: syntax highlighting, LSP, code completion, and especially red squigglies. Post-tool-use hooks are where red squigglies live for agents: run linters, run type-checkers, run LSP feedback, run “this is a generated file, please don’t commit it” reminders. The fastest way to make Claude better on your codebase is not a smarter model — it’s a tighter feedback loop, and you usually already have all the scripts because you needed them for humans.
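A minimal sketch of the red-squiggly idea as a post-tool-use check, in Python. The event shape (`tool_name`, `tool_input.file_path`), the tool names, the `@generated` marker, and the function name are assumptions made for illustration; the talk does not spell out a payload schema, so check the hook documentation before relying on any of them. What it tries to show is that the check returns an overridable reminder and otherwise stays silent.

```python
from typing import Optional

# Hypothetical marker; the talk does not say how generated files are identified.
GENERATED_MARKER = "@generated"

# Assumed names of the file-editing tools.
EDIT_TOOLS = {"Edit", "Write"}


def nudge_for(event: dict, file_head: str) -> Optional[str]:
    """Return reminder text for the agent, or None when nothing applies.

    `event` is an assumed payload:
        {"tool_name": ..., "tool_input": {"file_path": ...}}
    `file_head` is the first few lines of the file that was just touched.
    """
    if event.get("tool_name") not in EDIT_TOOLS:
        return None  # not an edit: stay silent, add nothing to context
    if GENERATED_MARKER not in file_head:
        return None  # ordinary source file: stay silent
    path = event.get("tool_input", {}).get("file_path", "<unknown>")
    # A nudge, not a block: the agent may still decide the edit is correct.
    return f"Reminder: {path} is a generated file; please don't commit it."


edit_event = {"tool_name": "Edit", "tool_input": {"file_path": "api/client_gen.py"}}
print(nudge_for(edit_event, "# @generated by codegen\n"))  # reminder text
print(nudge_for(edit_event, "import os\n"))                # None
```

Because the function only returns text, a smarter model can weigh the reminder and override it, which is the property the talk attributes to tools that scale with intelligence.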

Why the context window is a hard ceiling

Frontier-model context windows have been ~1M tokens for around a year. 200k-token models are now rarer; 1M is the modal frontier. Models from a year ago vs. today differ in many ways, but context window size is not one of them. So context engineering is happening against a fixed target. The constraint is real, not a stop-gap.

Daisy’s metaphor: it’s like running npm install on an Arduino. There’s no margin for shipping packages willy-nilly. The C++-derived principle she invokes is “don’t pay for what you don’t use” — zero-overhead abstraction — and that principle is the lens she uses to rank the four plugin primitives.

The KV cache complicates the analogy. Her first instinct on seeing the problem was that it looked like an L1 cache (LRU evict the least-recently-used tools, swap in the new ones). It doesn’t work. Changing anything early in the prompt invalidates the KV cache for everything after it. Uncached tokens cost roughly 10× cached ones. So an LRU-style tools block — evict an unused tool, slot in a new one — has to recompute the entire downstream context every time, which is more expensive than just keeping the unused tools. LRU made sense in the 32k-token era when KV caching was less effective; at 1M tokens with cheap-cached-vs-expensive-uncached economics, the right architecture is stable content at the front, volatile content at the end.
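The arithmetic behind that argument can be sketched with a toy cost model. The 10× ratio is the talk's rough figure; the 500,000-token context, the token positions, and the function name are hypothetical. The model assumes that everything before the earliest changed token is served from cache and everything from that point on is recomputed.

```python
from typing import Optional

CACHED_COST = 1.0     # relative cost per cached token (arbitrary unit)
UNCACHED_COST = 10.0  # the talk's rough figure: uncached is ~10x cached


def turn_cost(total_tokens: int, first_changed: Optional[int]) -> float:
    """Relative cost of one turn.

    `first_changed` is the position of the earliest token that differs from
    the previous turn, or None if the prompt is unchanged. Tokens before it
    hit the cache; every token from it onward is recomputed.
    """
    if first_changed is None:
        return total_tokens * CACHED_COST
    cached = first_changed
    uncached = total_tokens - first_changed
    return cached * CACHED_COST + uncached * UNCACHED_COST


CONTEXT = 500_000  # hypothetical context size in tokens

unchanged = turn_cost(CONTEXT, None)         # 500000.0
swap_early_tool = turn_cost(CONTEXT, 5_000)  # 4955000.0: LRU swap in the tools block
edit_tail = turn_cost(CONTEXT, 498_000)      # 518000.0: only the last 2,000 tokens change

print(unchanged, swap_early_tool, edit_tail)
```

Under these assumed numbers, swapping one tool near the front costs roughly ten times an unchanged turn, while changing only the tail costs under 4% more, which is why stable content goes first and volatile content goes last.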

Plugin primitives, ranked by scalability

The talk’s central drill-down. Each primitive is examined against the “what if you had 100k of these in your monorepo?” stress test.

MCP. Designed in the chatbot era — agents were simpler, the LLM had no shell, no filesystem, no CLI. MCP injected tools into context and handled transport, auth, and process lifecycle. Strengths: transport-agnostic, auth handled, the right surface for an external company shipping a public Claude integration. At monorepo scale, however, every tool’s name + description + schema sits in the system prompt; twenty servers of fifteen tools each and most of your context is tool definitions. Tool search mitigates this by lazy-loading the schema, but Claude only searches when user intent points at a tool clearly (Slack, etc.), so generic tools (edit, bash) still need their full schema up front. For in-monorepo developer experience where the user already has a CLI and access, a skill that explains how to use the CLI almost always beats wrapping the CLI in MCP.
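To make the "twenty servers of fifteen tools" figure concrete, here is a back-of-the-envelope calculation. The per-tool token counts are guesses, not measurements from the talk; real schemas vary widely, so substitute numbers measured from your own servers. The 1M window is the talk's figure for current frontier models.

```python
def fixed_overhead(servers: int, tools_per_server: int, tokens_per_tool: int) -> int:
    """Tokens spent on tool definitions before any task content is added."""
    return servers * tools_per_server * tokens_per_tool


WINDOW = 1_000_000  # context window size cited in the talk

# Assumed sizes for name + description + schema of one tool.
for tokens_per_tool in (200, 500, 1_000):
    used = fixed_overhead(servers=20, tools_per_server=15, tokens_per_tool=tokens_per_tool)
    print(f"{tokens_per_tool:>5} tokens/tool -> {used:>7} tokens ({used / WINDOW:.0%} of window)")
```

Whatever the per-tool size turns out to be, the cost is paid unconditionally on every session, whether or not any of the 300 tools is ever called.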

Skills. A “lazy system prompt”: a CLAUDE.md-like file with a one-line description in front matter. The description always loads; Claude has a tool to read the full skill body on demand. The body is lazy, the description is not, and reliably triggering a skill from a sparse description often requires a paragraph (300-400 tokens). No hierarchy / lazy sub-skills yet, though Daisy hints that’s shipping in the “next couple of weeks.” So skills are better than MCP at scale but still not zero-overhead; they were designed before the “monorepo with 100,000 skills” use case became real.
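The lazy-body, eager-description split can be modelled in a few lines. This is a toy registry, not how Claude Code stores skills; the class names, the `release-notes` skill, and its text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Skill:
    name: str
    description: str              # always present in the prompt
    load_body: Callable[[], str]  # invoked only when the skill is read


class SkillRegistry:
    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}
        self.bodies_loaded: List[str] = []  # records which bodies were fetched

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def prompt_section(self) -> str:
        """Text that is paid for on every turn: one description per skill."""
        return "\n".join(f"{s.name}: {s.description}" for s in self._skills.values())

    def read(self, name: str) -> str:
        """Fetch a body on demand; only now does the body cost tokens."""
        self.bodies_loaded.append(name)
        return self._skills[name].load_body()


registry = SkillRegistry()
registry.register(Skill(
    name="release-notes",  # hypothetical skill
    description="Use when drafting release notes from merged PRs.",
    load_body=lambda: "Step 1: list merged PRs since the last tag...",
))

prompt = registry.prompt_section()
print(prompt)                  # description only
print(registry.bodies_loaded)  # nothing loaded yet
body = registry.read("release-notes")
print(registry.bodies_loaded)  # body fetched on demand
```

The per-skill cost that remains is `prompt_section()`, which grows linearly with the number of registered skills regardless of how many are ever read.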

Subagents. Subagent description (one-liner) sits in the parent prompt; the subagent’s system prompt lives in a separate context. The main loop only pays for the tool call and the result coming back. A subagent can read 50 files so the main loop doesn’t have to. You still pay the one-liner-per-subagent tax at 100k subagents, and Anthropic is experimenting with ways to do better there too.
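A toy model of that isolation: the child reads all 50 files, and the parent's context grows only by the call and the short result. The word-count "tokenizer", the file contents, and the TODO-search task are stand-ins invented for the example.

```python
from typing import Dict, List, Tuple


def rough_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: whitespace-separated words."""
    return len(text.split())


def run_subagent(task: str, files: Dict[str, str]) -> Tuple[str, int]:
    """Read every file in a private child context; return (short result, child size)."""
    child_context: List[str] = [task] + list(files.values())
    hits = sorted(name for name, text in files.items() if "TODO" in text)
    result = f"{len(hits)} of {len(files)} files contain TODO: {', '.join(hits)}"
    return result, sum(rough_tokens(t) for t in child_context)


# Hypothetical repo slice: 50 files of about 200 words each, two with a TODO.
files = {f"mod_{i}.py": "pass " * 200 for i in range(50)}
files["mod_7.py"] += "TODO"
files["mod_31.py"] += "TODO"

task = "Find files containing TODO"
result, child_tokens = run_subagent(task, files)

# The parent sees only the call and the result, never the file contents.
parent_context = [f"call subagent: {task}", result]
parent_tokens = sum(rough_tokens(t) for t in parent_context)

print(result)
print(f"child context: {child_tokens} words, parent growth: {parent_tokens} words")
```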

Hooks. The only true zero-overhead abstraction in the catalog. JSON-in / JSON-out events called by the harness on the user’s machine. With 100,000 hooks installed, the 99,995 that don’t match return no text into the context window and cost zero tokens; the only constraint is your computer. A previously constrained resource (context tokens) has been blown out into an unconstrained one (compute on the user’s box). Not perfect: hooks deal in regex and word-parsing on the way in, which isn’t AGI-pilled, though you can call subagents from inside a hook to make smarter decisions at a token-cost premium. But on the scaling axis it dominates everything else.
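A small sketch of why non-matching hooks are free in context terms: matching runs as ordinary code on the machine, and only hooks that fire contribute text. The dispatcher, the patterns, and the messages are hypothetical, and the hook count is scaled down from the talk's 100,000 to 1,000 so the demo runs quickly.

```python
import re
from typing import List, Tuple

Hook = Tuple[re.Pattern, str]  # (matcher on the touched path, text to return)


def dispatch(hooks: List[Hook], path: str) -> List[str]:
    """Run every matcher outside the model; return text only from hooks that fire."""
    return [text for matcher, text in hooks if matcher.search(path)]


# 995 hooks that will not match the path below.
hooks: List[Hook] = [
    (re.compile(rf"\.ext{i}$"), f"note from hook {i}") for i in range(995)
]
# 5 hooks that will match.
hooks += [
    (re.compile(r"\.proto$"), "Regenerate stubs after editing .proto files."),
    (re.compile(r"^proto/"), "Files under proto/ are shared across teams."),
    (re.compile(r"/billing/"), "Billing changes need a second reviewer."),
    (re.compile(r"invoice"), "Invoice schemas are versioned; bump the version."),
    (re.compile(r"billing.*\.proto$"), "Run the billing contract tests."),
]

fired = dispatch(hooks, "proto/billing/invoice.proto")
print(f"{len(hooks)} hooks installed, {len(fired)} fired")
for text in fired:
    print(" ", text)
```

Adding more non-matching hooks lengthens the loop on the user's machine but leaves `fired`, the only part that would reach the context window, unchanged.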

Why “let plugins ship a CLAUDE.md” was rejected

The single most common request Daisy gets on plugins. It was rejected because it looked free and wasn’t: every plugin would have shipped one, every plugin would have appended unconditional text to every user’s context, and the cost would have been hidden from the user. The official escape hatch is a SessionStart hook that returns text. Functionally identical; the difference is the user can see they’re paying for it.
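As a rough sketch of the escape hatch, the script below is the kind of thing a plugin could register as a SessionStart hook. It assumes the harness adds the hook's standard output to the context; the registration config is omitted because the talk does not show it. The guidance text, the `buildctl` CLI name, the function names, and the four-characters-per-token estimate are all hypothetical, and printing the estimate to stderr is an illustrative choice, not something the talk prescribes.

```python
import sys
from typing import List

# Hypothetical plugin guidance that would otherwise have lived in a CLAUDE.md.
PLUGIN_NOTES = [
    "Use the `buildctl` CLI for all builds.",
    "Never edit files under gen/ by hand.",
]


def session_start_text(notes: List[str]) -> str:
    """Format the guidance as the text the hook returns."""
    return "\n".join(f"- {note}" for note in notes)


def rough_token_estimate(text: str) -> int:
    """Very rough heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


text = session_start_text(PLUGIN_NOTES)
print(text)  # assumed to be what lands in the context window
# Report the approximate price separately so it stays visible to the user.
print(f"[plugin] injected ~{rough_token_estimate(text)} tokens", file=sys.stderr)
```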

This is also why memory is treated as a different animal. Memory is fire-and-forget low-quality short-lived info. Plugins are iterated, evaluated context-engineering primitives. Don’t confuse the two.

Async + parallel = context-switching workflow

The closing third is about how Anthropic actually uses Claude Code internally on the Claude Code team. Two themes: async (walk away, come back) and parallelism (multiple things at once). The combination forces you to be good at context switching, which most engineers (including Daisy, who used to give talks on C++ template metaprogramming and loved 8-hour flow states) initially hate.

Worktrees are the entry point: separate checkouts of the same repo on the same machine, one Claude Code instance per worktree. Different branches that all track upstream main. Daisy’s setup is two long-lived worktrees for the two Anthropic monorepos (since most companies have two monorepos at most) plus the Claude Code repo, all with persistent agents that own their directories.
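One way to script that setup is to generate a `git worktree add` command per long-lived worktree. The worktree names, the `wt/` branch prefix, the sibling-directory layout, and `origin/main` as the upstream are assumptions; the talk does not describe Daisy's exact commands. The commands are built but not executed so they can be reviewed first.

```python
from typing import List


def worktree_commands(
    names: List[str],
    base_dir: str = "..",
    upstream: str = "origin/main",
) -> List[List[str]]:
    """Build one `git worktree add` command per worktree.

    Each worktree gets its own new branch (`-b`) that tracks `upstream`
    (`--track`), checked out in a sibling directory of the main clone.
    """
    return [
        ["git", "worktree", "add", "--track", "-b", f"wt/{name}",
         f"{base_dir}/{name}", upstream]
        for name in names
    ]


commands = worktree_commands(["ci-fixes", "refactor"])  # hypothetical names
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from inside the main clone, after which one Claude Code instance is started in each new directory.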

/color and /rename are explicitly framed as “syntax highlighting for the agentic era” — visual + nominal hooks that snap the human brain back into context for a given session faster than reading the scrollback.

/loop (internal name: cron tool) runs a prompt at fixed intervals; Claude can self-disable when the prompt no longer applies. Daisy’s headline use case is pipelining PRs through CI overnight: even two-hour CI runs stop being a bottleneck, because Claude fixes failures as they appear while the engineer is away.
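The run-every-N-minutes, stop-when-done behaviour can be sketched as a plain polling loop. This models the described behaviour only and is not how /loop is implemented; the function names, the canned CI states, and the ten-minute interval are made up for the example.

```python
import time
from typing import Callable


def run_loop(
    check: Callable[[], bool],
    interval_s: float,
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `check` every `interval_s` seconds until it says it no longer applies.

    `check` returns True to keep going and False to self-disable.
    Returns the number of times `check` ran.
    """
    runs = 0
    while runs < max_runs:
        runs += 1
        if not check():
            break  # the prompt no longer applies: stop scheduling
        if runs < max_runs:
            sleep(interval_s)
    return runs


# Hypothetical CI states seen on successive polls of one PR.
ci_states = iter(["failed", "running", "passed"])


def babysit_pr() -> bool:
    state = next(ci_states)
    print(f"CI is {state}")
    return state != "passed"  # keep looping until CI is green


# A no-op sleep keeps the demo instant; a real run would wait 10 minutes per poll.
runs = run_loop(babysit_pr, interval_s=600, max_runs=10, sleep=lambda s: None)
print(f"stopped after {runs} runs")
```

The `max_runs` cap is a safety choice in this sketch so that a check which never reports completion cannot loop forever.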

Auto-permissions mode (classifier agent + adversarial sub-pass checking each tool call) is what makes /loop, the Claude Agents view, and overnight work usable — without it, every async workflow stalls on permission prompts. Token cost ~30-40% extra and dropping.

Claude Agents (one dashboard for every running agent) and /remote-control (phone + desktop status check) round out the surface. Send-message-tool is being relaxed so Claude instances on the same account can talk to each other directly — the parallel agent’s conversation becomes another place Claude needs access to.

Try It

  • Run Daisy’s alt-tab audit for one workday: don’t leave the Claude Code terminal voluntarily; every time you reach for another tool, write the tool name on paper. At end of day, wire each one up (MCP, skill, hook, or SessionStart injection). The gap is bigger than you expect.
  • Audit one MCP server you wrote and decide whether it could be a skill + CLI instead. If the user is a developer in your monorepo with shell access, the skill version is almost always cheaper at scale.
  • If you maintain a plugin that “wants” to ship a CLAUDE.md, move that text into a SessionStart hook return so the cost is visible to the user.
  • Turn on auto-permissions mode and try /loop as an overnight CI-fixing babysitter on one real PR. Watch what happens to your morning queue.
  • Set up persistent named Claude Code agents per worktree on at least one monorepo. Use /color + /rename per session and see whether per-switch latency drops.
  • For any new “should this be a hook or a skill?” decision: if it can be a hook, default to hook. Hooks are the only primitive that costs zero context unless they fire.