Source: raw/Finally._Agent_Loops_Clearly_Explained..md — Nate Herk (“AI Automation”, @nateherk), “Finally. Agent Loops Clearly Explained.” (YouTube, https://www.youtube.com/watch?v=EuzYhzB0vbI, 14:33, 2026-06-19).
Nate Herk’s grounded explainer on agent loops is valuable less for the definition (which matches the rest of this topic) than for its deliberate counterweight to the hype: most tasks don’t need loops, fleets of 24/7 agents are overkill for most people, and whether any of this applies depends on your role. Coming from a knowledge-worker (non-coding) perspective, Herk reframes loops around the one piece that actually pays off for everyone, verification, and demonstrates it with three worked /goal runs, two of them pulled from Matthew Berman’s Loop Library. Treat the mental model (Reason-Act-Observe + the quality-vs-attempts curve) and the per-artifact done-criteria breakdown as the reusable parts; the “here’s how I personally use them” framing is opinion.
Key Takeaways
- The grounding message (the distinctive bit): “The majority of tasks don’t need loops.” You are not falling behind if you aren’t running five agents orchestrating five sub-agents 24/7. Cargo-culting Peter Steinberger’s always-on fleets “doesn’t mean it applies to you and your use case” — he’s a hardcore coder at OpenAI; a knowledge worker’s needs differ. Scaling a workflow you don’t understand just “scales problems.”
- Loop = trigger + action + stop condition — the same definition the rest of this topic uses (see Write Loops, Not Prompts). Herk quotes the canonical thesis verbatim: “Loop engineering is replacing yourself as the person who prompts the agent” (see Loop Engineering — Addy Osmani).
- The core mental model is Reason-Act-Observe (ReAct): the agent reasons on what to do, acts, observes the result, and repeats until the done-criteria is met — “a smart intern you don’t micromanage.”
- Why loops win — the quality-vs-attempts curve. AI never one-shots to 100%; quality climbs with each feedback-and-iteration pass. A loop outsources that feedback loop to the agent instead of the human, so you reach 90-95% in 3-4 autonomous passes instead of babysitting each one. Loops “are not supposed to give you 100% perfect output — they’re supposed to help you get much closer on the first try.”
- Verification is the part that pays off for everyone. Herk now wraps most tasks in a loop — not for fleet-scale autonomy but purely for the verification/iteration. “A loop is only as good as its done-check.”
- Solo loops beat fleets for most people. Three topologies: solo loop (one agent reason-act-observe — his most common), maker-checker (one does, one grades), manager + helpers (one orchestrator). Most work needs only a solo loop + a good prompt in one terminal session — no “massive agent architecture.”
- Two questions before you build any loop: (1) What does “done” mean? — get the metric as objective as possible (“keep iterating until X = Y”); (2) How will it check? — give the agent the right tools to verify that specific artifact type.
- Match the run length to the payoff. Herk has run loops 12+ hours and found them “not super useful”; his sweet spot is ~35 min to a couple hours, or a “chunky loop before bed” for an experimental overnight run he then iterates on by hand. He doesn’t need 3-4-day runs.
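
The Reason-Act-Observe cycle and the trigger + action + stop-condition definition above can be sketched as a small solo loop. This is a minimal illustration under stated assumptions, not Herk's workflow or Claude Code's implementation: `run_solo_loop`, `toy_act`, `toy_observe`, and `max_passes` are hypothetical names, and the toy "agent" simply closes half of the remaining quality gap on each pass.

```python
from typing import Callable, Tuple


def run_solo_loop(
    act: Callable[[int, str], int],
    observe: Callable[[int], Tuple[bool, str]],
    initial: int,
    max_passes: int = 8,
) -> Tuple[int, int, bool]:
    """Repeat act -> observe until the done-check passes or the cap is hit.

    Returns (final_artifact, passes_used, done).
    """
    artifact = initial
    feedback = "first attempt"
    for passes in range(1, max_passes + 1):
        artifact = act(artifact, feedback)   # reason + act on the last feedback
        done, feedback = observe(artifact)   # observe: run the done-check
        if done:
            return artifact, passes, True
    return artifact, max_passes, False       # hard stop: cap reached


# Toy stand-ins (assumptions): the "artifact" is a quality score out of 100,
# and each pass closes half of the remaining gap to 100.
def toy_act(quality: int, feedback: str) -> int:
    return quality + (100 - quality) // 2


def toy_observe(quality: int) -> Tuple[bool, str]:
    return quality >= 90, f"quality is {quality}, target is 90"


result = run_solo_loop(toy_act, toy_observe, initial=20)
print(result)  # (final quality, passes used, whether the done-check passed)
```

In a real loop the `act` step would be the agent's own work and `observe` would call whatever verification tool fits the artifact; the cap is what keeps an unreachable goal from running indefinitely.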
The two pillars: an objective goal + a way to check
Herk reduces a good loop to two pillars, both of which the human owns:
- Goal (definition of done). Humans are good at defining the end state. Make it objective where possible — the cake-fork test (“comes out clean = done”). The best loops say “keep iterating until X metric equals Y result.” When the goal is unavoidably subjective (“until you’re satisfied / 100% confident”), the loop gets more brittle — the fix is to push the criteria toward something measurable, e.g. spin up a dedicated scorer sub-agent and validate its scoring against evaluations so you can trust it.
- Verification (how it checks). After acting, the agent must observe — visual verification, a code test, whatever proves the artifact. Verification looks different per artifact type, and it’s your job to give the agent the right tools:
  - A game: check visually, check functionally, and actually play the levels to see if anything breaks.
  - A script: no visual check needed; check flow, and check it matches your tone of voice.
  - A video (his own most-common loop, editing with HyperFrames in Claude Code via /goal): get the transcript, cut mistakes/pauses, make and sync the beats, render, then verify every beat is in-bounds and lines up with the transcript. “That’s how people say ‘I did this in one shot’, because it was a loop, with verification and iteration.”
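
One way to picture "verification looks different per artifact type" is a small registry of checks keyed by artifact type. The sketch below is hypothetical throughout: `CHECKS`, `verify`, and the dictionary fields are invented for illustration, and the lambdas are crude stand-ins for real tools (a browser, a play-through, a transcript aligner) that the agent would actually call.

```python
from typing import Callable, Dict, List

Check = Callable[[dict], bool]

# Hypothetical checks per artifact type, mirroring the three examples above.
CHECKS: Dict[str, Dict[str, Check]] = {
    "game": {
        "renders": lambda a: bool(a.get("renders")),
        "levels_playable": lambda a: a.get("levels_total", 0) > 0
        and a.get("levels_completed") == a.get("levels_total"),
    },
    "script": {
        "flow_ok": lambda a: bool(a.get("flow_ok")),
        "matches_tone": lambda a: bool(a.get("matches_tone")),
    },
    "video": {
        # Every beat must start and end inside the rendered duration.
        "beats_in_bounds": lambda a: all(
            0 <= start < end <= a["duration"] for start, end in a["beats"]
        ),
        # Crude stand-in for "lines up with the transcript": one beat per segment.
        "beats_match_transcript": lambda a: len(a["beats"])
        == len(a["transcript_segments"]),
    },
}


def verify(artifact_type: str, artifact: dict) -> List[str]:
    """Return the names of failed checks; an empty list means 'done'."""
    if artifact_type not in CHECKS:
        raise ValueError(f"no checks registered for {artifact_type!r}")
    return [name for name, check in CHECKS[artifact_type].items() if not check(artifact)]


video = {
    "duration": 60.0,
    "beats": [(0.0, 12.5), (12.5, 61.0)],  # second beat overruns the render
    "transcript_segments": ["intro", "demo"],
}
print(verify("video", video))
```

The list of failed check names doubles as the feedback for the next pass, which is the point of giving the agent a concrete way to check each artifact type.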
Worked /goal examples (from Berman’s Loop Library)
Herk runs three [[claude-ai/claude-code-goal-command-walkthrough|/goal]] loops live, two lifted straight from the Loop Library — useful as concrete demonstrations of the verification cadence:
- Thumbnail concepts (27 min, Claude Code). “Make 10 thumbnail concepts, score each against MrBeast thumbnails on a rubric (clarity at small size, curiosity, emotional pull, contrast), pick the top 3, find the weakest part of each, improve, rescore, and keep iterating on the strongest until satisfied.” It produced 10, narrowed to 8, made V2s, then a winning #8 V3. Weakness: “until satisfied” is subjective — the improvement would be an objective scorer sub-agent.
- Three.js spinning plane (37 min). Build, then verify by opening the browser, spinning it, checking it renders, and iterating. Not perfect, “but so much better than if I’d just said ‘build me a 3D plane.’”
- Abbey Road in pure CSS/HTML, no image-gen (stopped at V7). Done-criteria: “if the average score is ≥ 9, stop” plus a hard cap of 8 passes. Each pass rendered the HTML in a browser, screenshotted it, and compared against the reference — visibly improving V1→V7. The output still looked nothing like the photo (image-gen would’ve been closer), but it demonstrates the screenshot-verification loop and why a hard stop matters when the done-criteria may never be reachable.
The throughline: a loop is only as good as its done-check, and a hard cap is what saves you when the goal is too hard to ever satisfy.
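
The Abbey Road stop rule ("average score ≥ 9, or 8 passes") can be written as a small decision function. This is a sketch under assumptions: `should_stop` and the rubric keys in the example are hypothetical, since the source does not list the criteria used for that run.

```python
from statistics import mean
from typing import Dict, Tuple


def should_stop(
    scores: Dict[str, float],
    passes_done: int,
    target_avg: float = 9.0,
    max_passes: int = 8,
) -> Tuple[bool, str]:
    """Decide whether the loop ends, and why.

    The goal check runs first, so a pass that reaches the target on the
    final allowed attempt is reported as a success rather than a cap hit.
    """
    if not scores:
        raise ValueError("need at least one rubric score")
    if mean(list(scores.values())) >= target_avg:
        return True, "goal met"
    if passes_done >= max_passes:
        return True, "hard cap reached"
    return False, "keep iterating"


# Hypothetical rubric scores for one pass (criteria names are invented).
pass_scores = {"composition": 7, "color": 6, "likeness": 5}
print(should_stop(pass_scores, passes_done=7))
print(should_stop(pass_scores, passes_done=8))
```

Returning the reason alongside the decision makes it obvious in the logs whether a run ended because the work was good enough or because it ran out of attempts.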
What makes a loop actually work
Herk’s checklist (a compact restatement of the topic’s primitives — see Loop Engineering for the full catalog):
- a checkable goal · a hard stop · good tools · memory · a separate checker · planning first · logging · and cost that makes sense (loops with a difficult goal and a strict done-criteria can run forever, so bound them).
Related
- Agent Loops (topic index) — the learning path this explainer slots into as a grounded, practitioner counterweight.
- Write Loops, Not Prompts — the same trigger/action/stop definition and beginner starter loops.
- Verifier-First Loops — the verification discipline (maker/checker, proof outside the agent) Herk centers his whole take on.
- LOOPS — Everything You Need to Know (Matthew Berman) — the trigger × goal framework + the Loop Library the worked examples come from.
- Loop Library (Forward Future) — the catalog Herk pulls the thumbnail and three.js loops from.
- Should You Build a Loop? — the economics layer behind “most tasks don’t need loops” and “match run length to payoff.”
- The Loop Is the Unit of Work — the maker/checker-as-shippability synthesis Herk’s solo/maker-checker/manager topologies map onto.
- Nate Herk — Claude Code Operating Systems Course — the same creator’s broader Claude Code system.
Try It
- Default to a solo loop, not a fleet. For most tasks, wrap one Claude Code session in a /goal with a clear done-criteria, and skip the multi-agent architecture unless you genuinely need scale.
- Answer the two questions first: write down what “done” means (as an objective metric if you can) and exactly how the agent will check it for this artifact type.
- Always set a hard cap (e.g. “stop at avg ≥ 9 or after 8 passes”) so a too-hard goal can’t run for days.
- Add verification tools to match the artifact: browser + screenshots for UI, a test runner for code, transcript/beat checks for video.
- When a score is subjective, make a scorer. Spin up a dedicated scoring sub-agent and validate it against examples before trusting its grades.
- Use overnight runs experimentally: fire a chunky loop before bed, then iterate on the output by hand — don’t assume a 4-day autonomous run is the goal.
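
Validating a scorer "against examples" can be as simple as comparing its grades with grades you assigned by hand. The sketch below assumes a mean-absolute-error threshold, which is one reasonable choice rather than anything the source prescribes; `scorer_is_trustworthy`, `toy_scorer`, and the sample data are hypothetical, and in practice the scorer would be the scoring sub-agent rather than a lookup table.

```python
from typing import Callable, List, Tuple


def scorer_is_trustworthy(
    scorer: Callable[[str], float],
    labelled: List[Tuple[str, float]],
    max_mean_error: float = 1.0,
) -> Tuple[bool, float]:
    """Compare scorer output with human grades on a small labelled set.

    Returns (trusted, mean_absolute_error).
    """
    if not labelled:
        raise ValueError("need at least one human-graded example")
    errors = [abs(scorer(item) - human_score) for item, human_score in labelled]
    mae = sum(errors) / len(errors)
    return mae <= max_mean_error, mae


# Stand-in for a scoring sub-agent: a fixed lookup of grades (assumption).
_toy_grades = {"thumb_a": 8.0, "thumb_b": 6.0, "thumb_c": 3.0}


def toy_scorer(item: str) -> float:
    return _toy_grades[item]


# Human grades for the same three items.
human_graded = [("thumb_a", 9.0), ("thumb_b", 6.0), ("thumb_c", 4.0)]

trusted, mae = scorer_is_trustworthy(toy_scorer, human_graded)
print(trusted, round(mae, 2))
```

If the scorer fails this check, tightening its rubric or prompt and re-running the comparison is cheaper than discovering mid-loop that its grades were never meaningful.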
Open Questions
- Herk works in knowledge work, not large-codebase refactors; how far the “solo loop + verification” emphasis transfers to team software engineering (where always-on fleets may pay off) is left open by his own admission.^[inferred]
- The slide deck and full audit he references are gated behind his Skool community and were not ingested here.