Source: raw/Don_t_build_more_AI_agents_until_you_watch_this.md — YouTube, Nate B Jones (“AI News & Strategy Daily”, video id BOXK2XFLA-E)

Nate B Jones argues the next phase of agent work is maintenance, not construction: the durable, ownable layer around an agent is its harness (a.k.a. workbench), and the discipline that matters in 2026 is pruning and auditing that harness rather than piling on more tools. The anchor claim — Vercel made its sales-inbox agent better by deleting ~80% of its tools — reframes “add a capability” as the beginner instinct and “ask what to remove” as the maintenance instinct. This complements (and partly contradicts) the wiki’s Canva case study, which concludes the harness is disposable and evals are the durable asset.

Key Takeaways

  • The harness is the maintainable layer. Agent = the worker; harness = the workbench: what it reads, what it remembers, which tools it can touch, what it’s allowed to change, what proof it must bring back, and what stops it when work gets risky. The model moves underneath you; the harness is what you actually own and maintain.
  • Tool-pruning over tool-piling. “The beginner instinct is to add; the maintenance instinct is to ask what should be removed.” Concrete anchor: Vercel deleted ~80% of its sales-inbox agent’s tools and it improved — built by studying one top rep’s observed workflow, not the documented one. ^[extracted]
  • Agents break in two directions — a falsifiable framing not previously in the wiki: (1) world-drift — sources, workflows, and definitions around the agent change; and (2) model-improvement — a tool that helped a weaker model can confuse a stronger one, and a guardrail that protected you from a clumsy model can trap a better one.
  • Agents inherit the crud of the systems around them. A stale wiki page, a renamed CRM field, or a drifted dashboard definition is merely annoying to a human but dangerous to a proactive agent that produces convincing output from a bad source.
  • Frontier labs are betting better models ship the harness faster. Codex and Claude Code are framed as carefully-maintained harnesses (terminal, browser, computer-use, plugins, memory, approvals, sandboxing, logs) — a compounding flywheel where the model improves the workbench that runs the model.
  • The more custom the harness, the more you own the upkeep. Light harness = instructions + memory + source folders + repeatable methods + proof/escalation rules. Deep harness = data feed, review screen, permission levels, logs, model choice, escalation paths, approval rules — plus a plan for what happens when the model changes.

The 5-point agent-maintenance audit

Run this on any agent you already operate (not a build checklist — a maintenance checklist):

  1. What’s it eating? Are its sources still current? Did the workflow move? Did an old source quietly become misleading?
  2. Test its reach. Is it read-only, or can it draft, create tickets, post to Slack, update records, spend money, publish? Re-verify the blast radius.
  3. Check its job. Don’t let a summary agent silently become a planning agent. Change the job on purpose, not by drift.
  4. Check the proof. Demand a linkable trail a human can inspect — linked tickets, quoted customer language, an explicit list of sources it couldn’t access — not the agent’s own say-so.
  5. Check the value. Does anyone read its output? Does it save time after review? Should it be rebuilt or retired?

Try It

  1. Run the 5-point audit on one agent you already operate. Write the answers down. The “check its reach” and “check the proof” steps usually surface the most surprising drift.
  2. Do a tool-pruning pass. List every tool your agent can call, then ask of each: would removing it make the agent worse? Delete the ones where the answer is no, and re-measure.
  3. Write the “when the model changes” plan. For your deepest custom harness, note which guardrails exist only because the old model was clumsy — those are the first candidates to relax when you upgrade.

Open Questions

  • The ~80% tool-deletion figure for Vercel’s agent is stated in the talk without a primary source link — worth corroborating against a Vercel writeup before treating the exact number as established. ^[ambiguous]
  • Jones frames the harness as the durable asset; Canva frames it as disposable. The reconciliation (which parts of a harness are durable vs throwaway) is an open synthesis candidate.