Source: raw/CLI_vs_MCP_-_How_AI_Agents_Choose_the_Right_Tool_for_the_Job.md — YouTube primer (https://www.youtube.com/watch?v=g9JIUM0MHgQ, fetched 2026-05-20).
A reproducible-experiment framing of when AI agents should reach for raw CLI commands (bash, cat, grep, git, curl) versus MCP server tool calls. The video walks through three worked exercises that surface the tradeoffs in token cost, context-window pressure, and capability-gap coverage. It complements the Skills vs MCP vs Plugins decision framework: that one covers extensibility-layer choice at setup time, this one covers tool selection at runtime.
Key Takeaways
- CLI vs MCP is a runtime tool-selection choice, and the answer is “use both” — CLI when terminal commands map directly to the job, MCP when the abstraction or governance justifies the schema cost.
- MCP loads every tool’s schema into context upfront, even tools the agent never calls. Filesystem MCP ≈ 2,000 tokens for 13 tools; GitHub MCP ≈ 55,000 tokens for 80 tools — paid at session start, eating context and (on API pricing) money.
- CLI wins for file ops, Git, and text processing: the model knows grep, git, and curl cold from training data, so no schema is needed, and CLI composes via pipes (grep | sort | uniq -c) where independent MCP tool calls cannot.
- MCP wins on three triggers: a capability gap raw tooling can’t bridge (fetching a JS-rendered Next.js page), server-managed authentication (OAuth for Slack/Notion/databases), and org-level controls (per-user access, audit trails).
- The diagnostic anti-pattern: if the agent starts reverse-engineering a JavaScript framework just to read a web page, it picked the wrong surface.
- Claude Code’s Tool Search mitigates the upfront schema tax by loading MCP tool schemas on demand instead of at session start — keep servers enabled without paying the full context cost.
The two surfaces in one line
- CLI: the agent runs regular terminal commands (ls, cat, grep, curl, git), the same commands a developer would type.
- MCP: dedicated servers expose structured tools, each with a name, a description in English, and a JSON schema defining inputs and outputs.
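To make the MCP side concrete, the sketch below shows roughly what one advertised tool definition looks like and estimates its size. The name / description / inputSchema layout follows the MCP tool-listing format; the description text, the path parameter, and the four-characters-per-token heuristic are assumptions for illustration, not values taken from the video or from any particular server.

```python
import json

# Illustrative tool definition: a name, an English description, and a
# JSON Schema for the inputs. Description and parameter are assumed.
read_file_tool = {
    "name": "read_file",
    "description": "Read the complete contents of a file from disk.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path of the file to read."}
        },
        "required": ["path"],
    },
}

# The CLI equivalent needs no definition in context at all.
cli_equivalent = "cat notes.md"


def rough_token_estimate(obj) -> int:
    """Crude size estimate: about 4 characters per token (an assumption)."""
    return len(json.dumps(obj)) // 4


print(json.dumps(read_file_tool, indent=2))
print("rough schema tokens:", rough_token_estimate(read_file_tool))
print("CLI equivalent:", cli_equivalent)
```

Real tokenizers will give different counts; the point is only that every advertised tool carries a definition of this shape, whether or not the agent ever calls it.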
Why the tradeoff is non-trivial
The argument many developers are making: MCP is paying a steep tax for knowledge the model already has.
- AI models were trained on millions of CLI examples from Stack Overflow and man pages, so they already know grep -n, git log --oneline -10, curl -s, etc. No schema needed.
- Every MCP tool schema gets loaded into the model’s context window at the start of the conversation. Each tool can cost hundreds of tokens.
- Filesystem MCP advertises 13 tools; even using only 2 of them, the agent loads the schemas for all 13 (a couple of thousand tokens of tool definitions).
- GitHub MCP server advertises 80 tools — ~55,000 tokens of tool definitions injected up front, even if you only use one or two of them.
- On API pricing, those tokens are actual money and eat directly into the context window space available for the actual work.
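The figures above can be turned into a quick back-of-the-envelope calculation. The tool counts and token totals are the approximate numbers quoted in the video; the 200,000-token context window, the $3 per million input tokens price, and the even spread of tokens across tools are assumptions chosen only to make the arithmetic concrete.

```python
# Approximate figures quoted in the video.
SERVERS = {
    "filesystem": {"tools": 13, "schema_tokens": 2_000},
    "github": {"tools": 80, "schema_tokens": 55_000},
}

# Assumptions for illustration only; substitute your model's real values.
CONTEXT_WINDOW_TOKENS = 200_000
PRICE_PER_MILLION_INPUT_TOKENS_USD = 3.00


def schema_tax(schema_tokens: int, tools: int, tools_used: int) -> dict:
    """Estimate what the upfront tool definitions cost for one load."""
    per_tool = schema_tokens / tools  # assumes tokens are spread evenly
    return {
        "tokens_per_tool": round(per_tool),
        "unused_tokens": round(per_tool * (tools - tools_used)),
        "context_share_pct": round(100 * schema_tokens / CONTEXT_WINDOW_TOKENS, 1),
        "cost_usd_per_load": round(
            schema_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS_USD, 4
        ),
    }


for name, server in SERVERS.items():
    print(name, schema_tax(server["schema_tokens"], server["tools"], tools_used=2))
```

Under these assumed values the GitHub definitions alone occupy 27.5% of the window before any work starts, while the filesystem definitions occupy 1%.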
The three worked exercises
Exercise 1 — Simple file operations (CLI wins, MCP works)
Task: read notes.md, then search both markdown files in a folder for the word “agent.”
- CLI path: 2 bash calls. cat notes.md dumps the file contents, then grep -n agent *.md scans both files. Compact, no schema needed.
- MCP path: 2 calls to the filesystem server’s read_file and search_files. It worked, but the agent loaded the schemas for all 13 filesystem-server tools (11 of which it didn’t touch), a couple of thousand tokens of upfront waste.
- Verdict: Either approach succeeds, but CLI is more compact at the schema layer.
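For readers who want to see exactly what the two CLI calls return, here is a minimal Python emulation of cat and grep -n over two files. The file names other than notes.md and all file contents are hypothetical stand-ins for the folder used in the video, and the matching is a plain substring test rather than grep's full regular-expression behavior.

```python
import tempfile
from pathlib import Path


def cat(path: Path) -> str:
    """Equivalent of `cat FILE`: return the file's full text."""
    return path.read_text(encoding="utf-8")


def grep_n(word: str, paths: list[Path]) -> list[str]:
    """Rough equivalent of `grep -n WORD FILES...` for a literal word."""
    hits = []
    for path in paths:
        for number, line in enumerate(cat(path).splitlines(), start=1):
            if word in line:
                hits.append(f"{path.name}:{number}:{line}")
    return hits


# Hypothetical sample files standing in for the folder used in the video.
folder = Path(tempfile.mkdtemp())
(folder / "notes.md").write_text("# Notes\nThe agent reads files.\n", encoding="utf-8")
(folder / "todo.md").write_text("- ship it\n- test the agent\n", encoding="utf-8")

print(cat(folder / "notes.md"))
for hit in grep_n("agent", sorted(folder.glob("*.md"))):
    print(hit)
```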
Exercise 2 — Git operations (CLI wins by a wide margin)
Task: show last 10 commits and check working-tree status.
- CLI path: git log --oneline -10 + git status. The model knows Git cold from training data: flags, format strings, all of it.
- MCP path: GitHub MCP ships 80 tools, ~55K tokens of definitions injected upfront even though the agent only needs one or two.
- Verdict: For local developer tools, MCP is paying a steep tax for knowledge the model already has.
Exercise 3 — Fetch a Next.js-rendered web page (MCP wins decisively)
Task: fetch modelcontextprotocol.io and summarize the main heading + first paragraphs.
- MCP path: a single call to a Fetcher MCP server (headless-browser-backed). One fetch_url tool call, ~250 tokens, completed in seconds.
- CLI path: curl -s URL | head -200 returned almost entirely JavaScript bundle code because the Model Context Protocol site is a Next.js application. The server doesn’t send a finished HTML page; it sends a JS app that builds the page in the browser, and curl doesn’t run JavaScript. The agent then improvised:
  - Chained HTML-strip / JS-filter text processors. Didn’t work.
  - Tried to find page content embedded as JSON inside source code. Found fragments, not the full page.
  - Wrote a Python script to reverse-engineer Next.js’s internal data-streaming format. Took several minutes and over 2,000 tokens plus local processing.
- Verdict: When raw tooling can’t bridge to what you need, MCP wins by closing the capability gap. The video’s punchline: “If the agent ever starts reverse-engineering a JavaScript framework just to read a web page, that’s a good sign it picked the wrong one.”
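One way to catch this situation early is to check whether a fetched page is mostly script before trying to parse it. The heuristic below is a sketch, not something shown in the video: it compares visible text to inline script content using Python's standard html.parser, and the 0.2 threshold, both sample pages, and the names inside the sample script are assumptions for illustration.

```python
from html.parser import HTMLParser


class TextVsScript(HTMLParser):
    """Tally characters of visible text versus inline script content."""

    def __init__(self):
        super().__init__()
        self._inside = None  # "script" or "style" while inside one
        self.text_chars = 0
        self.script_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._inside = tag

    def handle_endtag(self, tag):
        if tag == self._inside:
            self._inside = None

    def handle_data(self, data):
        size = len(data.strip())
        if self._inside == "script":
            self.script_chars += size
        elif self._inside is None:
            self.text_chars += size


def looks_js_rendered(html: str, threshold: float = 0.2) -> bool:
    """True when visible text is a small share of text plus script."""
    parser = TextVsScript()
    parser.feed(html)
    parser.close()
    total = parser.text_chars + parser.script_chars
    return total > 0 and parser.text_chars / total < threshold


# Hypothetical samples: a server-rendered page and a client-rendered shell.
static_page = (
    "<html><body><h1>Welcome</h1>"
    "<p>This paragraph arrives fully rendered from the server.</p>"
    '<script>console.log("analytics");</script></body></html>'
)
js_shell = (
    '<html><body><div id="root"></div>'
    '<script>window.__APP_DATA__ = {"heading": "Hello"}; '
    "renderApp(window.__APP_DATA__);</script></body></html>"
)

print("static page looks JS-rendered:", looks_js_rendered(static_page))
print("JS shell looks JS-rendered:", looks_js_rendered(js_shell))
```

A check like this on the curl output would be a cue to switch to a browser-backed fetch tool instead of chaining more text processors.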
The decision pattern surfaced
CLI wins when:
- Commands map directly to jobs — file ops, Git, text processing, running scripts.
- The model’s training data already encodes the tool deeply (decades of Stack Overflow + man pages).
- You want composition via pipes: chain grep | sort | uniq -c in one line. MCP can’t compose like this because each tool call is independent.
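The composition point can be illustrated by modeling grep | sort | uniq -c as three chained stages in Python. This is only an analogy for what the shell pipeline does; the log lines are hypothetical and the stages use substring matching rather than real grep semantics.

```python
from itertools import groupby
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Stage 1, like `grep PATTERN`: keep lines containing the pattern."""
    return (line for line in lines if pattern in line)


def sort_lines(lines: Iterable[str]) -> list[str]:
    """Stage 2, like `sort`."""
    return sorted(lines)


def uniq_c(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Stage 3, like `uniq -c`: count runs of adjacent identical lines."""
    return [(len(list(group)), line) for line, group in groupby(lines)]


# Hypothetical log lines.
log = [
    "ERROR disk full",
    "INFO started",
    "ERROR timeout",
    "ERROR disk full",
    "INFO stopped",
]

# Each stage feeds the next directly; only the final result is returned.
result = uniq_c(sort_lines(grep("ERROR", log)))
for count, line in result:
    print(count, line)
```

In the piped version the intermediate output never leaves the pipeline. With independent tool calls, each intermediate result would presumably come back to the agent before the next call could be issued.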
MCP wins when:
- There is a gap between what the raw tool gives you and what you actually need (Next.js fetch is the canonical example).
- Authentication is involved — OAuth tokens for Slack, Notion, databases. With CLI, the agent has to manage OAuth tokens, look up channel IDs, handle token refresh — all manual even when the AI is doing it. With MCP, the server handles all of that. Server-managed rather than agent-managed.
- Organization-level controls matter: per-user access control, no shared credentials, audit trails for what was done. Hard to bolt onto CLI after the fact; built into MCP.
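The pattern above reduces to a small rule: default to CLI, and switch to MCP when any of the three triggers applies. The sketch below encodes that rule; the Task class, its field names, and the example tasks are hypothetical names introduced for illustration, not part of any harness.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    has_capability_gap: bool = False  # raw tooling cannot produce the output
    needs_managed_auth: bool = False  # OAuth or similar, handled by a server
    needs_org_controls: bool = False  # per-user access, audit trails


def choose_surface(task: Task) -> tuple[str, list[str]]:
    """Return the suggested surface and the triggers that justify it."""
    reasons = []
    if task.has_capability_gap:
        reasons.append("capability gap")
    if task.needs_managed_auth:
        reasons.append("server-managed authentication")
    if task.needs_org_controls:
        reasons.append("org-level controls")
    return ("MCP", reasons) if reasons else ("CLI", reasons)


tasks = [
    Task("show the last 10 commits"),
    Task("summarize a JS-rendered page", has_capability_gap=True),
    Task("post to a Slack channel", needs_managed_auth=True, needs_org_controls=True),
]
for task in tasks:
    surface, reasons = choose_surface(task)
    print(f"{task.description}: {surface} {reasons}")
```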
The framework’s verdict
“The answer is to use both. The AI agent I tested uses both, CLI and MCP side by side for differing tasks — CLI when the commands map to the job, MCP when the abstraction or governance justifies it.”
The choice is up to the agent and the person prompting it. The “reverse-engineering a JavaScript framework to read a webpage” anti-pattern is the diagnostic: if that happens, the agent picked the wrong surface.
Implementation
- Tool/Service: Claude Code (or any agent harness with both Bash access and connected MCP servers).
- Setup: No special setup required. Both surfaces are available simultaneously in most agentic harnesses.
- Cost:
  - Filesystem MCP server schemas: ~2,000 tokens injected at session start (for 13 tools).
  - GitHub MCP server schemas: ~55,000 tokens injected at session start (for 80 tools).
  - CLI commands: zero schema cost; the knowledge is baked into the model weights.
- Integration notes:
  - In Claude Code, the Tool Search indirection layer (see What’s New in Claude Code, the Dixon talk) loads MCP tool schemas on demand rather than upfront, reducing the upfront token tax. This is the harness-level mitigation for the schema-cost problem the video calls out.
  - The Railway team explicitly designed their MCP server around this tradeoff (see Railway Remote MCP). They shipped 7 tools and are reducing the count, routing every multi-step operation through one railway-agent delegation tool. “Context is expensive on both sides.”
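The on-demand idea can be sketched as a registry that advertises one-line summaries upfront and hands over a full definition only when a tool is selected. This is a hypothetical illustration of lazy schema loading in general; the source does not describe how Claude Code's Tool Search is implemented, and the class, method, and tool names below are invented for the example.

```python
class LazyToolRegistry:
    """Hypothetical registry: summaries upfront, full schemas on demand."""

    def __init__(self, tools: dict[str, dict]):
        self._tools = tools
        self.loaded: dict[str, dict] = {}  # definitions placed in context so far

    def summaries(self) -> list[str]:
        """Cheap upfront listing: one line per tool, no JSON schema."""
        return [f"{name}: {tool['description']}" for name, tool in self._tools.items()]

    def search(self, query: str) -> list[str]:
        """Find tool names whose name or description mentions the query."""
        q = query.lower()
        return [
            name
            for name, tool in self._tools.items()
            if q in name.lower() or q in tool["description"].lower()
        ]

    def load(self, name: str) -> dict:
        """Bring one full definition into context only when it is needed."""
        self.loaded[name] = self._tools[name]
        return self.loaded[name]


# Hypothetical tool definitions.
registry = LazyToolRegistry({
    "create_issue": {"description": "Open a new issue.", "inputSchema": {"type": "object"}},
    "list_commits": {"description": "List recent commits.", "inputSchema": {"type": "object"}},
    "merge_pull_request": {"description": "Merge a pull request.", "inputSchema": {"type": "object"}},
})

matches = registry.search("commit")
for name in matches:
    registry.load(name)
print("matches:", matches)
print("schemas loaded:", len(registry.loaded), "of", len(registry.summaries()))
```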
Try It
- Reproduce Exercise 1 — drop two markdown files in a folder. Ask Claude Code to read one and search both for a keyword, first using only Bash, then using only the filesystem MCP server. Compare turn count and observable token consumption.
- Run /context after a fresh session with the GitHub MCP server enabled. Note the percentage of context consumed by tool definitions before you’ve done any work. Then disable the GitHub MCP and run again to see the delta.
- Apply the decision rule to your current MCP server stack: for each connected server, ask “does the model already know how to do this from CLI knowledge it has from training?” If yes, the server may be net-negative on context. Authentication-gated services (Slack, Notion, OAuth-protected databases) almost always justify their schema cost; local-filesystem and local-git operations often don’t.
- Use Tool Search in Claude Code to keep MCP servers enabled without paying the upfront schema tax — schemas load on demand, not at session start.
Related
- Skills vs MCP vs Plugins — When to Use Which — the extensibility-layer decision framework (kitchen analogy: MCP = kitchen, skills = recipes, plugins = meal kits). Complements this article’s runtime tool-selection angle.
- Essential MCP Servers for 2026 — which MCP servers are worth their context cost; this article gives the framework for evaluating that.
- [[claude-ai/railway-remote-mcp|Railway Remote MCP and railway agent CLI]]: Railway’s deliberate design choice to minimize tool count and route through delegation. Real-world case study of the “context is expensive on both sides” principle the video formalizes.
- What’s New in Claude Code (Dixon talk): Tool Search indirection layer that mitigates the upfront-schema-cost problem.
- Context Management in Claude Code — broader context-budget discipline.
- 18 Claude Code Token-Optimization Techniques: includes the MCP per-tool result-size override (anthropic/maxResultSizeChars), the same theme of MCP context hygiene.
- Context7 — Up-to-Date Library Docs MCP: example of an MCP server that justifies its schema cost (delivers what raw CLI cannot: current versioned library docs).
- Agent Skills Overview — agent skills are the layer above raw tool selection.
Open Questions
- The video says GitHub MCP ships 80 tools / ~55K tokens of definitions. With Claude Code’s Tool Search indirection (W14+), what is the actual upfront cost after the harness mitigates the load? The W14 release digest mentions a 95% context reduction; verification at scale would be useful.
- For CLI-knowledge degradation: do model upgrades (Sonnet 4.6 → 4.7 → Mythos) measurably improve baseline CLI command knowledge, or has it plateaued for common tools (grep, git, curl)?
- The “MCP wins when authentication is involved” rule: is there a published benchmark on the failure rate of agent-managed OAuth flows vs server-managed OAuth flows in long-horizon agent runs?