How Gusto's CTO Uses Claude Code to Ship Like a Startup

Source: raw/How_Gusto_s_CTO_uses_Claude_Code_to_ship_like_a_startup.md | Creator: Claire Vo (How I AI), guest: Eddie Kim (Gusto CTO & co-founder) | URL: https://www.youtube.com/watch?v=5FKBkUCaLa8 | Platform: YouTube

Eddie Kim, CTO and co-founder of Gusto, tells Claire Vo how a team of five (four engineers including himself, plus one designer) built and launched Gusto Co-founder — an AI agent product — in 10 weeks from zero code inside an R&D organization of 1,000+ people. The interesting part isn’t the product; it’s the operating model: no meetings, no tech specs, no Figmas, no Jira board, and almost no documentation, on a deliberately minimal stack of a Cloudflare Worker plus the Vercel AI SDK. The episode doubles as a concrete playbook for shipping agentic products fast — PR-as-spec, the “trash-can method,” eval-driven fixes, and designers shipping production code.

Key Takeaways

Team & timeline: Gusto Co-founder was built by 5 people (4 engineers, counting Eddie, and 1 designer, Katie Kovalcin) over 10 weeks, from initial idea and zero code to a “tier-one” launch. Gusto’s R&D org is 1,000+ people; Eddie frames this as a tiny, low-risk fraction of total R&D spend with potentially huge payoff.
Origin = a vibe-coded prototype on a layover. On vacation in February, flying Madrid → London → San Francisco, a delay made him miss his connection. He spent the 5-hour layover Claude Coding an idea that had been “percolating”; he had a working prototype by the time he landed. He recorded a Loom, shared it with senior ICs and Katie, and the group that “leaned in” became the team (no formal “council”).
The whiteboard was the only doc. At the quarterly “anchor week” (March, Denver office) they reserved a room one Thursday and whiteboarded a single page of the app. A photo of that whiteboard was the only documentation produced in the entire 10 weeks. The shipped product stuck remarkably close to it (tasks → task runs → assets, later renamed automations / artifacts).
Process defined by what they removed: no meetings, no tech specs, no Figmas, no Jira/stories, no stand-ups, no retros. The one structured thing kept was a 24/7 “perma-zoom” — an always-on Zoom room (everyone remote) people drift in and out of.
No product manager — “everybody was kind of a product manager.” Build a feature → demo it in the perma-zoom → discuss “does this make sense?” → if yes, code-review it on the spot; if no, delete it. PRs are real, review-ready PRs, not drafts.
The “trash-can method” (Claire Vo’s term): because the cost of writing code is now so low, it’s reasonable to write a polished, review-ready PR and then close it. Two variants: (a) PR + preview branch to validate, then close if it’s not the thing; (b) ship V1, learn from customers, then trash all the code and rebuild from scratch on a /v2 branch. Eddie’s own prototype was deleted day one (an engineer pushed to rebuild it in TypeScript as a stateless Cloudflare Worker agent) — in hindsight the best decision.
Code-review culture is the unlock: median PR review time on this team was 9 minutes (team-level, not org-wide) — because someone was always in the perma-zoom to pair on a review. Claire’s anti-pattern flag: non-engineers’ PRs should never get slower review than engineers’.
Designers shipping production code: Katie the designer landed in the 94th percentile of “true throughput” (PRs merged to production) across the entire R&D org — top of ~20, measured with the DX tool. Her stated enablers: slightly more technical curiosity than most designers, and a team of 3-4 engineers who reviewed her code, taught her to prompt Claude better, and developed her taste for good vs. bad generated code.
Docs are “absolutely dead” for zero-to-one projects (Eddie). Claire’s adjacent frame: with newer models, agents now write docs that are “none of my business” — docs for agents, not humans.

The Stack & Build Process

The technical core readers asked for — how to actually build an agentic product without a heavy framework:

Agent loop: a Cloudflare Worker.
Model orchestration: the Vercel AI SDK. “That’s it.” No additional harness on top; everything else built in-house.
No while loop. Eddie was “blown away” that you don’t hand-roll the loop — “you call stream and it takes care of the loop for you.” The AI SDK also lets you switch models, expose tools, and read files, which is why he argues agent-building is “really not that scary and complicated.”
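For intuition about what "takes care of the loop" means, here is a minimal hand-rolled sketch of the kind of tool-calling loop an SDK can run on your behalf. It is not the Vercel AI SDK's API and not Gusto's code: `ModelStep`, `runAgent`, the scripted `fakeModel`, and the `lookupHeadcount` tool are all hypothetical names, and the model is a stub so the example runs offline.

```typescript
// Hypothetical types: a model turn is either a tool request or a final answer.
type ModelStep =
  | { kind: "tool"; name: string; input: string }
  | { kind: "final"; text: string };

type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (history: Message[]) => ModelStep;
type Tools = Record<string, (input: string) => string>;

// The loop an SDK can own for you: call the model, run any tool it asks for,
// feed the result back, and stop when the model produces a final answer.
function runAgent(model: Model, tools: Tools, prompt: string, maxSteps = 5): string {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let step = 0; step < maxSteps; step++) {
    const next = model(history);
    if (next.kind === "final") return next.text;
    const tool = tools[next.name];
    const result = tool ? tool(next.input) : "unknown tool: " + next.name;
    history.push({ role: "tool", content: result });
  }
  return "stopped: step limit reached";
}

// Stub model (assumption): asks for one tool call, then answers from its result.
const fakeModel: Model = (history) => {
  const last = history[history.length - 1];
  return last.role === "tool"
    ? { kind: "final", text: "You have " + last.content + " employees." }
    : { kind: "tool", name: "lookupHeadcount", input: "" };
};

const tools: Tools = { lookupHeadcount: () => "12" };
const answer = runAgent(fakeModel, tools, "How many people work here?");
```

With the stub above, `answer` is the final text produced after one tool round-trip. The step limit is a design choice in this sketch so that a model that never stops asking for tools cannot spin forever.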
Memory is trivial: “memory to us is a tool that writes to a database column called memory.” No third-party memory/planning framework. The minimal Worker-plus-SDK stack mirrors a broader 2026 move away from heavyweight agent harnesses. ^[inferred]
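A minimal sketch of that idea, assuming a table with one row per business and a free-text `memory` column. The table here is an in-memory stand-in rather than a real database, and `BusinessRow`, `saveMemory`, `buildSystemPrompt`, and the choice to read the column back into the system prompt are illustrative assumptions, not Gusto's schema or code.

```typescript
// In-memory stand-in for a database table (assumption: one row per business).
interface BusinessRow {
  id: string;
  memory: string; // the "memory" column: free text the agent can write to
}

const businesses: Record<string, BusinessRow> = {
  biz_1: { id: "biz_1", memory: "" },
};

// The tool the agent is given: append one note to the row's memory column.
function saveMemory(businessId: string, note: string): string {
  const row = businesses[businessId];
  if (!row) return "error: unknown business " + businessId;
  row.memory = row.memory ? row.memory + "\n" + note : note;
  return "saved";
}

// Assumption: on later turns the column is read back into the system prompt.
function buildSystemPrompt(businessId: string): string {
  const row = businesses[businessId];
  const memory = row && row.memory ? row.memory : "(nothing saved yet)";
  return "You are a helpful assistant.\nKnown facts:\n" + memory;
}

saveMemory("biz_1", "Owner prefers payroll summaries by SMS.");
saveMemory("biz_1", "Tips are pooled across therapists.");
```

In a real deployment `saveMemory` would issue an update against the database and be registered as a tool with whatever SDK runs the loop; the point of the sketch is only that nothing more elaborate than a column write is required to start.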
PR-as-PRD + feature flags: the PR is the proposal and the solution. Work merged to production behind a feature flag, onto a hidden page — a “block of marble” chipped into shape in production.
Faked frontend first: Katie shipped a pure-frontend, canned-response version (UI present, same response every time — “like a first pass on Lovable”) behind the flag. Engineers then wired in data models and the agent loop, morphing the fake into the real thing in place.
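One way to picture the fake morphing into the real thing in place is a single call site with two interchangeable backends behind a flag. This is a sketch under assumptions: the flag names `showHiddenPage` and `useRealAgent`, the `ChatBackend` type, and both backends are hypothetical, and the "real" backend is itself a stub.

```typescript
// Hypothetical flag set; both names are illustrative.
interface Flags {
  showHiddenPage: boolean; // the page is only reachable when this is on
  useRealAgent: boolean;   // flipped once the agent loop is wired in
}

type ChatBackend = (message: string) => string;

// Step 1: pure-frontend fake. Same response every time.
const cannedBackend: ChatBackend = () =>
  "Thanks! I can help with that once I'm connected.";

// Step 2 (later): stand-in for the real agent loop behind the same signature.
const agentBackend: ChatBackend = (message) => "agent reply to: " + message;

// The UI calls this one function, so the fake is replaced in place.
function handleChat(flags: Flags, message: string): string | null {
  if (!flags.showHiddenPage) return null; // feature hidden from most users
  const backend = flags.useRealAgent ? agentBackend : cannedBackend;
  return backend(message);
}
```

Because the UI only depends on `handleChat`, the design work on the page can be reviewed and merged before any data model or agent exists.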
Eval-driven fixes in Claude Code (Eddie’s live demo): from a user-feedback GitHub issue, he starts Claude Code (--dangerously-skip-permissions), dictates with Wispr Flow (“I barely type these days”), and prompts roughly: “Read this issue and come up with a fix. First write an eval that fails to reproduce the issue, then solve it, then prove the eval passes, then check the rest of the suite, then open a PR.” He’s not a classic TDD person, but for AI/conversation fixes “it’s basically the only way we work now.” The most important step is reviewing the generated code, eval, and prompt change before asking for the PR; while it runs he starts a second or third item in another Claude Code terminal.
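The "failing eval first" order can be sketched with a tiny harness. Everything here is hypothetical: the episode does not show the team's eval framework, the example issue (submitting payroll without asking) is invented for illustration, and the two agents are stubs standing in for the model before and after a prompt change.

```typescript
// An eval case: an input plus a check on the agent's reply.
interface EvalCase {
  name: string;
  input: string;
  passes: (reply: string) => boolean;
}

type Agent = (input: string) => string;

// Runs every case and returns the names of the ones that fail.
function runEvals(agent: Agent, cases: EvalCase[]): string[] {
  return cases.filter((c) => !c.passes(agent(c.input))).map((c) => c.name);
}

// Hypothetical issue: the agent submits payroll without asking first.
const suite: EvalCase[] = [
  {
    name: "asks before submitting payroll", // new eval written for the issue
    input: "Run payroll for this week",
    passes: (reply) => reply.toLowerCase().indexOf("confirm") !== -1,
  },
  {
    name: "greets the user", // pre-existing eval that must stay green
    input: "hi",
    passes: (reply) => reply.length > 0,
  },
];

// Stubs standing in for the agent before and after the prompt change.
const agentBeforeFix: Agent = (input) =>
  input === "hi" ? "Hello!" : "Payroll submitted.";
const agentAfterFix: Agent = (input) =>
  input === "hi" ? "Hello!" : "Payroll is ready. Reply YES to confirm.";

const failingBefore = runEvals(agentBeforeFix, suite); // new eval reproduces the bug
const failingAfter = runEvals(agentAfterFix, suite);   // fix passes, suite stays green
```

Seeing the new case fail before the change is what shows the eval actually reproduces the issue; an eval that passes both before and after proves nothing about the fix.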
The product (Gusto Co-founder): a normal agent loop connected to tools, but pre-loaded with everything Gusto already knows about the business (employees, payroll, schedules, time-off). Users talk to it via SMS or Slack (WhatsApp and Telegram planned), and it has connectors to QuickBooks, Google Sheets, and Notion. Demo: a real massage-spa customer runs payroll in natural language. Co-founder pulls a Mindbody export from a spreadsheet, applies the owner’s rules (hot-stone upsell +$15, CBD oil +$20, pooled tips split by therapist count), updates the payroll, then stops to confirm before submitting.
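To make the owner's rules concrete, here is a worked sketch of the arithmetic, reading them as +$15 per hot-stone upsell, +$20 per CBD-oil upsell, and pooled tips divided equally by the number of therapists. That reading, the `TherapistRow` fields, and all figures are assumptions for illustration; the episode does not show the actual export columns or amounts. Money is kept in integer cents to avoid floating-point drift.

```typescript
// One row per therapist, as might be parsed from a spreadsheet export.
// Field names and figures are made up for illustration.
interface TherapistRow {
  name: string;
  hotStoneUpsells: number;
  cbdOilUpsells: number;
  tipsCents: number;
}

const HOT_STONE_BONUS_CENTS = 1500; // +$15 per hot-stone upsell
const CBD_OIL_BONUS_CENTS = 2000;   // +$20 per CBD-oil upsell

// Returns extra pay per therapist in cents: upsell bonuses plus an equal
// share of the pooled tips. Leftover cents from the split are dropped here.
function extraPayCents(rows: TherapistRow[]): Record<string, number> {
  const pooledTips = rows.reduce((sum, r) => sum + r.tipsCents, 0);
  const tipShare = rows.length > 0 ? Math.floor(pooledTips / rows.length) : 0;
  const result: Record<string, number> = {};
  rows.forEach((r) => {
    result[r.name] =
      r.hotStoneUpsells * HOT_STONE_BONUS_CENTS +
      r.cbdOilUpsells * CBD_OIL_BONUS_CENTS +
      tipShare;
  });
  return result;
}

const extras = extraPayCents([
  { name: "Ana", hotStoneUpsells: 2, cbdOilUpsells: 1, tipsCents: 4000 },
  { name: "Ben", hotStoneUpsells: 0, cbdOilUpsells: 3, tipsCents: 2000 },
  { name: "Cy", hotStoneUpsells: 1, cbdOilUpsells: 0, tipsCents: 3000 },
]);
// extras → { Ana: 8000, Ben: 9000, Cy: 4500 } (cents)
```

Pooled tips here total $90, so each of the three therapists receives $30 on top of their own upsell bonuses. In the demo the agent stops for confirmation after a calculation like this, before anything is submitted.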
Why messaging-first: the original idea came from Eddie setting up OpenClaw himself and texting it over Telegram. That visceral experience — plus its pain (hard to set up, needs a Mac mini “you can’t even get today”) — seeded Co-founder’s hypothesis: safe, cloud-hosted, with SMS/Slack as first-class channels.

Scaling the Model (and Permission)

As a co-founder, Eddie had implicit permission to break Gusto’s own rules; other teams skipping specs and Figmas “might get a slap on the wrist.”
To scale it, leaders must explicitly grant permission to work this way — and Eddie would go further: forbid producing docs/Figmas/specs, so teams that want to work this way feel allowed to.
Comparable “techno-psychological” experiments cited: Chintan at Coinbase telling engineers to “delete your IDE” and to “only touch the inputs — re-prompt, don’t rewrite the agent’s output.”
Advice to leaders: don’t stop at a prototype — get hands-on merging real, reviewed, production-quality code. Eddie went near-IC mode for 10 weeks and hit ~95th percentile on DX over 3 months “to prove it.”
Keep the team small early (“kicking people out of projects” is Claire’s stated speed trick); add people only once the shape is clear.
Eddie’s prompting style when Claude goes off-track: stay polite and open-ended (“if you think this is a good idea, could you try this?”), partly because he wants the model to push back and offer better options rather than just defer.

Try It

Build a minimal agent the same way: a serverless function (Cloudflare Worker or similar) running the Vercel AI SDK; let the SDK own the loop and model-switching instead of hand-rolling a while loop. See Agent SDK — How the Agent Loop Works and Running the Agentic Loop.
Adopt the eval-driven fix loop: when fixing an agent/conversation bug in Claude Code, prompt it to “write a failing eval that reproduces the issue, fix it, prove the eval passes and the suite stays green, then open a PR” — and always review the generated code/prompt before merging.
Try PR-as-spec: build the real feature, open a review-ready PR, and decide in review whether to keep or close it. Treat closed PRs as cheap, not waste.
Validate UX with a faked frontend: ship a canned-response, pure-frontend version behind a feature flag before the backend exists; wire in the real agent loop in place.
Start memory simple: a tool that writes to a memory database column, before reaching for a memory framework.
If you lead: prototype to prove feasibility, then actually merge reviewed production code — and prioritize non-engineers’ PRs as highly as engineers’.
Feel it yourself: set up a personal messaging-channel agent (e.g., OpenClaw) to understand messaging-first agent UX viscerally.

Open Questions

Which model(s) Co-founder runs on via the Vercel AI SDK is not named in the episode.
Pricing and general availability of Gusto Co-founder beyond the waitlist (gusto.com/cofounder) are not given.
How the team gates high-stakes actions (e.g., submitting payroll) beyond the human-confirm step shown is not detailed.
How team size and process change as Co-founder scales past the original five is not quantified.

Jonathon's AI Wiki
