The Best Way to Vibe Code (Matthew Berman) — Automations, Loops & the Loop Library

Source: raw/I_figured_out_the_best_way_to_vibe_code.md (YouTube — Matthew Berman, “I figured out the best way to vibe code”, https://www.youtube.com/watch?v=wwfJlSF34n8) + raw/reddit-1uomrr8.md (Reddit r/ClaudeAI, u/MDawg74, “Retired Disabled Army combat vet, no coding background… shipped tank game,” 2026-07-06)

A practitioner walkthrough of how Matthew Berman — a creator who uses every major agentic coding tool professionally — actually structures his AI coding workflow. Berman’s thesis: beginners prompt, wait, review, and prompt again; experts automate the whole loop. Most of the conceptual spine here (a loop is a trigger + repeated action + goal; worktrees; multi-model; agents.md/skills) is already covered in depth by the rest of this topic, so this article focuses on Berman’s distinctive, concrete contributions: a free Loop Library, three named loops he runs nightly, the test/docs/logging flywheel, the automation race-condition trick, and his honest “still unsolved” call on parallel merge-and-deploy. As a creator video, the macro framing (“there are levels to AI coding”) is opinion; the loops and setups below are the reproducible parts.

Key Takeaways

The expert move is automating the prompt-review-reprompt cycle, not prompting faster. Berman’s primary harnesses are Cursor and Codex; he rates Claude Code, Devin, and Factory highly too (he uses Claude Code less only because he burns quota fast).
New named resource: the Loop Library — a free, community-submittable catalog of reusable agent loops at signals.forwardfuture.ai/loop-library, announced for the first time in this video. It is the Berman-curated counterpart to Cobus Greyling’s pattern registry.
A loop = three things: (1) a trigger to start it, (2) an action it repeats, (3) a goal that stops it. (Same definition the rest of this topic uses — see Write Loops, Not Prompts.)
Automations and loops pair up: an automation fires an agent prompt on a trigger (schedule or GitHub event); a loop is the repeated-until-goal action it kicks off. Cursor and Codex both ship automations as a first-class feature.
The flywheel: keep 100% test coverage, always-fresh docs, and exhaustive logging in every codebase. Each is maintainable by a standing automation — and full logs are what let an agent autonomously diagnose and fix production errors.
Parallel merge + deploy at agent scale is genuinely unsolved (Berman’s words, after talking to the OpenAI and Cursor teams). A dozen agents racing to main serialize on CI/deploy and force each other to rebase and restart. Partial workaround: batch commits.
Go cloud when running many agents in parallel; stay local for speed, control, and newest features. Berman is migrating his whole workflow to cloud agents because 12-20 local agents crush his machine.
Encode the multi-model routing as a skill (his example: plan with Fable, write code with Composer, review with GPT-5.5) — driven by speed and cost, not capability worship.
A no-code counter-example shows the discipline matters more than the tooling. A non-technical builder with no IDE, no Claude Code, and no automation loops shipped a real game (Vibe Tanks, a single 130KB HTML file) using pure manual discipline — spec every feature, test every build on real devices, reject what failed, keep a byte-identical rollback before every change. See the case study below.

The Loop Library (the new thing here)

Berman built and is hosting a free Loop Library at signals.forwardfuture.ai/loop-library — loops he uses, loops others use, and a submission path for your own generalized loops. It is meant to be a concrete antidote to “handwavy theoretical” loop talk. Treat it as a companion to the wiki’s existing loop references rather than a replacement: where loop-engineering gives a taxonomy + readiness ladder + failure catalog, the Loop Library is a copy-and-run recipe collection.

Three loops Berman actually runs

Each is a standing automation (scheduled) wrapping a goal-bounded loop. These are the concrete recipes worth stealing:

Overnight docs sweep. Trigger: nightly (e.g. 1:00 AM automation). Action: review the full codebase, compare yesterday’s changes against the docs, update any gaps (public README + internal docs). Goal: docs reflect the latest code; open a PR with the changes.
Sub-50ms page-load loop. Trigger: manual/on-demand. Action: load every page, modal, and sidebar; for anything over 50 ms, optimize queries and the site. Goal: every surface loads in under 50 ms — “continue until everything loads in under 50 milliseconds.” Berman let this run for hours and reports the app ended up “lightning fast.” (This is the most novel recipe — a performance loop with a hard numeric stop condition.)
Production error sweep. Trigger: nightly. Action: read production logs, find errors, analyze the cause, write a fix. Goal: a PR waiting for every error by morning. Depends on full log coverage (see the flywheel below).
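The sub-50 ms recipe is the clearest case of a goal-bounded loop, so a minimal Python sketch of its control flow may help. Every name here (run_until_goal, fake_measure, fake_optimize) is hypothetical and does not come from Berman’s video or the Loop Library; in his setup the agent does the measuring and optimizing from a plain-English prompt. The max_rounds cap is an added assumption so the sketch cannot run forever.

```python
from typing import Callable, Dict, Tuple


def run_until_goal(
    measure: Callable[[], Dict[str, float]],
    optimize: Callable[[str], None],
    budget_ms: float = 50.0,
    max_rounds: int = 20,
) -> Tuple[bool, int]:
    """Repeat the action until every surface is under budget or rounds run out.

    Returns (goal_reached, optimization_rounds_used).
    """
    rounds = 0
    while True:
        timings = measure()
        slow = [name for name, ms in timings.items() if ms >= budget_ms]
        if not slow:
            return True, rounds      # goal: the stop condition is met
        if rounds >= max_rounds:
            return False, rounds     # safety cap (an assumption, not in the source)
        for name in slow:
            optimize(name)           # action: one pass per slow surface
        rounds += 1


# Stand-in "app": hypothetical load times in milliseconds.
fake_timings = {"dashboard": 180.0, "settings_modal": 40.0, "sidebar": 95.0}


def fake_measure() -> Dict[str, float]:
    return dict(fake_timings)


def fake_optimize(name: str) -> None:
    fake_timings[name] *= 0.5        # pretend each pass halves the load time


reached, rounds_used = run_until_goal(fake_measure, fake_optimize)
print(reached, rounds_used, fake_timings)
```

The point of the shape is that the schedule or manual kick-off only starts the loop; the numeric check is what ends it.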

Automations: trigger -> prompt -> tools (and the race-condition trick)

In both Cursor and Codex, an automation needs a trigger, a prompt (instructions), and optionally memories / tools / MCP servers. Berman’s worked example automates code-review follow-through:

Trigger: GitHub “pull request opened.”
Problem: the PR opens before the code-review bot finishes reviewing.
The trick: write the dependency in plain English — “wait until you see [the reviewer’s] comments on the PR” — and the agent will literally wait. Then: “go through each comment, address them, push the new code back to the PR.”
Cursor auto-detects tools the automation needs (here, the GitHub “comment on pull request” tool) and prompts you to enable them.

Codex is equivalent: build the automation via natural-language chat or manually (title, prompt, repo, schedule, memories, tools). The reusable insight is the “wait until you see X” pattern for sequencing an automation behind an async dependency without polling logic.
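For contrast, here is roughly the sequencing logic that the plain-English “wait until you see X” instruction stands in for. This is a sketch only: wait_for_review and fetch_comments are hypothetical names, the comment shape (a dict with "author" and "body") is an assumption, and nothing here calls a real GitHub, Cursor, or Codex API.

```python
import time
from typing import Callable, Dict, List

Comment = Dict[str, str]


def wait_for_review(
    fetch_comments: Callable[[], List[Comment]],
    reviewer: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[Comment]:
    """Block until `reviewer` has commented, then return that reviewer's comments."""
    deadline = clock() + timeout_seconds
    while True:
        # fetch_comments is a caller-supplied stand-in for "list the PR's comments".
        found = [c for c in fetch_comments() if c["author"] == reviewer]
        if found:
            return found
        if clock() >= deadline:
            raise TimeoutError(f"no comments from {reviewer} after {timeout_seconds}s")
        sleep(poll_seconds)


# Tiny demo with a fake source that has the review ready on the second poll.
_polls = {"count": 0}


def _fake_fetch() -> List[Comment]:
    _polls["count"] += 1
    if _polls["count"] < 2:
        return []
    return [{"author": "review-bot", "body": "handle the empty-list case"}]


review = wait_for_review(_fake_fetch, "review-bot", sleep=lambda seconds: None)
print(len(review), "comment(s) to address")
```

Writing the dependency into the prompt hands this polling, filtering, and timeout handling to the agent, which is why the pattern is cheap to set up.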

Sponsor disclosure

The code-review tool in Berman’s example is Greptile, the video’s paid sponsor: connect it to a repo and it auto-reviews every PR with a summary, a 0-5 merge-confidence score, a change flowchart, and copy-paste fix prompts. The reusable part is the review -> auto-fix -> resubmit automation pattern, which is vendor-independent (it works with any PR-review bot, including the autofix-pr cloud CI loop). Treat the specific tool endorsement as sponsored.

The test + docs + logging flywheel

Berman’s “no reason to have suboptimal code anymore” argument rests on three standing automations:

Tests: an automation checks coverage; if it is not 100%, it writes tests until it is.
Docs: the overnight docs sweep keeps every change documented.
Logging: log everything (a 7- or 30-day window is cheap) — because exhaustive logs are the raw material the production-error-sweep loop consumes. No logs, no autonomous error fixing.

The three reinforce each other: perfect tests + perfect docs + perfect logs make the codebase legible enough for agents to maintain it unattended.^[inferred — Berman states the three practices and calls them a flywheel; the “legibility enables unattended maintenance” framing connects them to this topic’s verification thesis]
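The tests leg of the flywheel reduces to a goal check plus a worklist. The sketch below assumes a coverage report already parsed into a dict; the shape used (totals.percent_covered, and per-file summary.percent_covered and missing_lines) follows my understanding of the JSON report from coverage.py and should be treated as an assumption to verify against your own tool. The function names are hypothetical, and the step where an agent writes the missing tests is deliberately left out.

```python
from typing import Any, Dict, List, Tuple


def coverage_goal_met(report: Dict[str, Any], target: float = 100.0) -> bool:
    """Goal check: is overall coverage at or above the target?"""
    return report["totals"]["percent_covered"] >= target


def files_needing_tests(
    report: Dict[str, Any], target: float = 100.0
) -> List[Tuple[str, List[int]]]:
    """Worklist: (path, missing_lines) for every file below the target."""
    gaps = []
    for path, data in sorted(report.get("files", {}).items()):
        if data["summary"]["percent_covered"] < target:
            gaps.append((path, list(data.get("missing_lines", []))))
    return gaps


# Hypothetical report for a two-file project.
sample_report = {
    "totals": {"percent_covered": 92.0},
    "files": {
        "app/cart.py": {"summary": {"percent_covered": 84.0}, "missing_lines": [14, 15]},
        "app/user.py": {"summary": {"percent_covered": 100.0}, "missing_lines": []},
    },
}

if not coverage_goal_met(sample_report):
    for path, lines in files_needing_tests(sample_report):
        print(f"write tests for {path}, uncovered lines {lines}")
```

A standing automation would rerun the report after each batch of new tests and stop once the goal check passes.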

The unsolved problem: parallel merge & deploy

Berman is candid that merging and deploying many parallel agents is not solved — a rare honest negative result for a creator video, and one the rest of this topic only touches obliquely:

A dozen agents all trying to land on main near-simultaneously serialize on CI/deploy. Each merge invalidates the others, which must rebase, re-run tests, and merge again — then re-trigger CI/deploy. They “stumble over each other” and lock the commit/deploy process.
Partial workaround — batch commits: open many PRs, then let a single agent review all the changes, combine them, and merge + deploy once. “Far from perfect.”
Berman claims Cursor announced (the day of recording) it is building its own Git alternative purpose-built for agent-scale deployment.^[Berman’s claim, as stated in the video — not independently verified here]
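The serialization cost can be put in rough numbers with a toy model. The assumptions are mine, not Berman’s: in the racing case every merge to main forces every still-pending PR to rebase and rerun CI, and in the batched case each PR runs CI once and the combined change runs it once more. Real pipelines will differ, so read the output as an illustration of the growth rate, not a measurement.

```python
def ci_runs_racing(n_prs: int) -> int:
    """CI runs when PRs land one at a time and each merge invalidates the rest."""
    runs = 0
    pending = n_prs
    while pending > 0:
        runs += pending   # every pending PR (re)runs CI against the current main
        pending -= 1      # one PR lands, which invalidates the others
    return runs


def ci_runs_batched(n_prs: int) -> int:
    """CI runs when one agent combines all PRs and merges once."""
    if n_prs <= 0:
        return 0
    return n_prs + 1      # one run per PR, plus one for the combined change


for n in (3, 12, 20):
    print(n, ci_runs_racing(n), ci_runs_batched(n))
```

Under this model a dozen racing agents cost 78 CI runs against 13 when batched, which is the quadratic-versus-linear gap behind the “stumble over each other” description.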

Cloud vs local agents (his decision criteria)

Criterion	Cloud agents	Local agents
Parallelism	Effectively unlimited (datacenter, not your CPU/RAM)	Bound by your machine; 12-20 agents “slows to a crawl”
Isolation	Each agent gets a fully isolated environment — no cross-agent file conflicts	Even per-agent worktrees hit “weird edge cases”
Access	From anywhere, incl. mobile apps	Tied to the machine (remote control possible, but bandwidth-bound)
Speed	Pays spin-up latency per agent	Faster — environment is always ready
Control / visibility	Less direct	More — you see files change locally
Feature freshness	Newest features arrive later	Newest features ship here first

Cloud-specific perk he calls out: Cursor cloud agents auto-record a video + screenshots of the changes so you can verify visually instead of trusting a text summary. Setup caveat: give a cloud agent the full environment (.env.local, client secrets, env vars) you’d give a local one — treat it as a real environment. Berman’s net lean: moving his entire workflow to cloud.

Case study: manual verification discipline with zero coding background (Vibe Tanks)

Added 2026-07-06, from a Reddit post (r/ClaudeAI) by a retired, disabled Army combat veteran with no coding background.

Berman’s playbook above assumes a professional creator running Cursor, Codex, worktrees, and standing automations. This case study is close to the opposite persona: a solo, non-technical builder working entirely inside Claude’s web chat interface (no IDE, no Claude Code, no automation loops), who took a 17KB prototype through roughly 250 iterations with Claude to ship Vibe Tanks — a free, playable browser game packed into a single 130KB HTML file, covering a 60Hz deterministic simulation, procedural graphics, a synthesized soundtrack that speeds up as matches heat up, a humanized AI opponent, stats/trophies, PWA install, and a Cloudflare Worker feedback backend.

The builder’s self-described discipline, standing in for any automated flywheel:

Spec every feature before asking Claude to build it.
Test every build on real devices — not just in a browser dev tool.
Reject what failed rather than accepting a build that mostly works.
Keep a byte-identical rollback before every change, so any regression can be undone exactly.

This is a manual, human-run version of the discipline the wiki elsewhere calls verifier-first (see Verifier-First Loops): gate every change on a real check and don’t trust the model’s self-report — the same idea Berman’s flywheel automates with standing test/docs/logging jobs, here run entirely by hand by someone with no coding background at all.^[inferred: the framing against verifier-first loops and Berman’s automated flywheel is this article’s synthesis; the source describes its own discipline but does not reference either] The builder reports Claude shipped silent bugs more than once — including a self-recursing audio compressor that killed performance for days until the frame loop was instrumented to hunt it — and credits the reject-and-rollback discipline, not any automated safety net, with keeping the project shippable. Free to play at hammeranvilbrew.com/vibetanks, no ads or accounts.
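The builder did this by hand, but the “byte-identical rollback” habit is easy to state precisely in code. The sketch below is illustrative only and is not the builder’s process: it copies a file aside before a change, proves the copy matches with a SHA-256 digest, and restores it on demand. The function names and the game.html file are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(path: Path, backup_dir: Path) -> Path:
    """Copy `path` into `backup_dir` and verify the copy is byte-identical."""
    digest = sha256_of(path)
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / f"{digest[:12]}-{path.name}"
    shutil.copyfile(path, backup)
    if sha256_of(backup) != digest:
        raise OSError(f"backup of {path} is not byte-identical")
    return backup


def rollback(backup: Path, path: Path) -> None:
    """Restore `path` from `backup` and verify the result matches the backup."""
    shutil.copyfile(backup, path)
    if sha256_of(path) != sha256_of(backup):
        raise OSError(f"restore of {path} does not match {backup}")


with tempfile.TemporaryDirectory() as tmp:
    game = Path(tmp) / "game.html"
    game.write_bytes(b"<html>build 1</html>")
    known_good = snapshot(game, Path(tmp) / "backups")   # before the change
    game.write_bytes(b"<html>build 2, broken</html>")    # the change regresses
    rollback(known_good, game)                           # undo it exactly
    restored = game.read_bytes()
```

Comparing digests rather than trusting the copy is the small-scale version of not trusting a self-report.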

Already covered elsewhere (pointers, not repetition)

Berman also walks through fundamentals this topic and claude-ai/ already document — read those for depth:

What a loop is -> Write Loops, Not Prompts and The Loop Is the Unit of Work.
Worktrees for parallel agents (“a second working folder… resolve conflicts at merge”) -> Worktrees; also primitive #2 in loop-engineering.
agents.md / CLAUDE.md / Cursor rules (commit style, message format, model personality, coding prefs; Cursor “rules” write to agents.md and auto-capture learned preferences) -> CLAUDE.md primer.
Skills (“anything you do more than once, make it a skill”; off-the-shelf skills install by pasting a URL; agents auto-discover which skill to use at runtime; use cases: repeated prompts, domain rules, tool instructions, quality gates) -> agent-skills (Addy Osmani), whose spec->PRD->implement->test->QA->deploy lifecycle matches the ~61k-star “agent skills” repo Berman recommends.^[inferred — Berman names an off-the-shelf “agent skills” repo at ~61k stars with that exact lifecycle; the wiki’s documented match is Osmani’s agent-skills]
Multi-model routing as a skill (plan/execute/review across model families for speed + cost) -> Verifier-First Loops (split plan/execute/evaluate across models) and modes are tool subsets.

Related

Agent Loops (topic index) — the learning path this article slots into as a practitioner case study.
Write Loops, Not Prompts — the one-sentence loop definition + three beginner starter loops; the conceptual prerequisite to Berman’s recipes.
Loop Engineering (Cobus Greyling) — the taxonomy/readiness/failure-catalog reference; Berman’s Loop Library is the recipe-collection counterpart.
Loop Library (Forward Future) — the dedicated article on the Loop Library resource itself (catalog + the architecture-satisfaction loop example).
LOOPS — Everything You Need to Know (Matthew Berman) — Berman’s dedicated loops explainer: the trigger × goal framework, the full 7-loop catalog with verbatim prompts, and the two caveats (goal-design is hard; loops are expensive).
Should You Build a Loop? — the cost/security decision layer behind running nightly loops like Berman’s.
Verifier-First Loops — the verification discipline that hardens the docs/error/test loops above.
Worktrees — the parallel-isolation mechanism Berman relies on locally.
Reflecting on a Year of Claude Code — Boris Cherny’s first-party “my job is to write loops” + /babysit PR loop, the same automation thesis from inside Anthropic.

Try It

Bookmark the Loop Library (signals.forwardfuture.ai/loop-library) and lift one recipe — the production-error-sweep or overnight-docs-sweep is the lowest-risk place to start.
Stand up the flywheel prerequisites first: turn on exhaustive logging (7-30 day retention) and a coverage check, so an error-sweep loop has logs to read and a test bar to hold.
Build one automation in Cursor or Codex: trigger = “pull request opened”, prompt = “wait until you see the review comments on the PR, then address each one and push the fix back,” and enable the GitHub comment tool when prompted.
Bound every loop with a goal, not just a schedule — e.g. “continue until every page loads under 50 ms” — so it stops on a verifiable condition (pair with the cost controls in Write Loops, Not Prompts).
Decide cloud vs local by parallelism: if you routinely run 10+ agents, move them to cloud (and provision their .env/secrets); keep local for fast, high-control single-agent work.
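For the first two steps, it helps to see what “logs to read” means in practice for an error-sweep loop. This is a minimal sketch under stated assumptions: the log format (a timestamp, a level, then a message) and the error_signatures helper are hypothetical, and in Berman’s setup the agent reads the logs directly instead of running a script like this. It groups error lines into signatures so a sweep could open one PR per distinct error.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed line shape: "<timestamp> <LEVEL> <message>".
ERROR_LINE = re.compile(r"^\S+ ERROR (?P<message>.+)$")


def error_signatures(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count distinct errors, collapsing numeric IDs so repeats group together."""
    counts: Counter = Counter()
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if match is None:
            continue                      # skip non-error lines
        signature = re.sub(r"\d+", "<n>", match.group("message"))
        counts[signature] += 1
    return counts.most_common()           # most frequent first


sample_log = [
    "2026-07-06T01:00:00Z ERROR timeout fetching user 41",
    "2026-07-06T01:00:05Z INFO request ok",
    "2026-07-06T01:02:00Z ERROR timeout fetching user 97",
    "2026-07-06T01:03:00Z ERROR null cart for order 7",
]

for signature, count in error_signatures(sample_log):
    print(count, signature)
```

If errors never reach the logs, there is nothing to group, which is why the logging prerequisite comes before the loop.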

Open Questions

Parallel merge-and-deploy has no clean solution yet; whether Cursor’s claimed agent-scale Git alternative (or batch-commit conventions) actually fixes it is unverified.
The Loop Library is new and thinly populated at launch; the durability and breadth of its recipes are unproven.

Jonathon's AI Wiki
