Source: raw/How_Gusto_s_CTO_uses_Claude_Code_to_ship_like_a_startup.md | Creator: Claire Vo (How I AI), guest: Eddie Kim (Gusto CTO & co-founder) | URL: https://www.youtube.com/watch?v=5FKBkUCaLa8 | Platform: YouTube
Eddie Kim, CTO and co-founder of Gusto, tells Claire Vo how a team of five (four engineers including himself, plus one designer) built and launched Gusto Co-founder — an AI agent product — in 10 weeks from zero code inside an R&D organization of 1,000+ people. The interesting part isn’t the product; it’s the operating model: no meetings, no tech specs, no Figmas, no Jira board, and almost no documentation, on a deliberately minimal stack of a Cloudflare Worker plus the Vercel AI SDK. The episode doubles as a concrete playbook for shipping agentic products fast — PR-as-spec, the “trash-can method,” eval-driven fixes, and designers shipping production code.
Key Takeaways
- Team & timeline: Gusto Co-founder was built by 5 people — Eddie plus 3 engineers (he counts himself, so 4 engineers) and 1 designer, Katie Kovalcin — over 10 weeks, from initial idea and zero code to a “tier-one” launch. Gusto’s R&D org is 1,000+ people; Eddie frames this as a tiny, low-risk fraction of total R&D spend with potentially huge payoff.
- Origin = a vibe-coded prototype on a layover. On vacation in February, flying Madrid → London → San Francisco, a delay made him miss his connection. He spent the 5-hour layover Claude Coding an idea that had been “percolating”; he had a working prototype by the time he landed. He recorded a Loom, shared it with senior ICs and Katie, and the group that “leaned in” became the team (no formal “council”).
- The whiteboard was the only doc. At the quarterly “anchor week” (March, Denver office) they reserved a room one Thursday and whiteboarded a single page of the app. A photo of that whiteboard was the only documentation produced in the entire 10 weeks. The shipped product stuck remarkably close to it (tasks → task runs → assets, later renamed automations / artifacts).
- Process defined by what they removed: no meetings, no tech specs, no Figmas, no Jira/stories, no stand-ups, no retros. The one structured thing kept was a 24/7 “perma-zoom” — an always-on Zoom room (everyone remote) people drift in and out of.
- No product manager — “everybody was kind of a product manager.” Build a feature → demo it in the perma-zoom → discuss “does this make sense?” → if yes, code-review it on the spot; if no, delete it. PRs are real, review-ready PRs, not drafts.
- The “trash-can method” (Claire Vo’s term): because the cost of writing code is now so low, it’s reasonable to write a polished, review-ready PR and then close it. Two variants: (a) PR + preview branch to validate, then close if it’s not the thing; (b) ship V1, learn from customers, then trash all the code and rebuild from scratch on a `/v2` branch. Eddie’s own prototype was deleted day one (an engineer pushed to rebuild it in TypeScript as a stateless Cloudflare Worker agent); in hindsight, the best decision.
- Code-review culture is the unlock: median PR review time on this team was 9 minutes (team-level, not org-wide), because someone was always in the perma-zoom to pair on a review. Claire’s anti-pattern flag: non-engineers’ PRs should never get slower review than engineers’.
- Designers shipping production code: Katie the designer landed in the 94th percentile of “true throughput” (PRs merged to production) across the entire R&D org — top of ~20, measured with the DX tool. Her stated enablers: slightly more technical curiosity than most designers, and a team of 3-4 engineers who reviewed her code, taught her to prompt Claude better, and developed her taste for good vs. bad generated code.
- Docs are “absolutely dead” for zero-to-one projects (Eddie). Claire’s adjacent frame: with newer models, agents now write docs that are “none of my business” — docs for agents, not humans.
The Stack & Build Process
The technical core readers asked for — how to actually build an agentic product without a heavy framework:
- Agent loop: a Cloudflare Worker.
- Model orchestration: the Vercel AI SDK. “That’s it.” No additional harness on top; everything else built in-house.
- No while loop. Eddie was “blown away” that you don’t hand-roll the loop: “you call `stream` and it takes care of the loop for you.” The AI SDK also lets you switch models, expose tools, and read files, which is why he argues agent-building is “really not that scary and complicated.”
- Memory is trivial: “memory to us is a tool that writes to a database column called `memory`.” No third-party memory/planning framework. The minimal Worker-plus-SDK stack mirrors a broader 2026 move away from heavyweight agent harnesses. ^[inferred]
- PR-as-PRD + feature flags: the PR is the proposal and the solution. Work merged to production behind a feature flag, onto a hidden page: a “block of marble” chipped into shape in production.
- Faked frontend first: Katie shipped a pure-frontend, canned-response version (UI present, same response every time — “like a first pass on Lovable”) behind the flag. Engineers then wired in data models and the agent loop, morphing the fake into the real thing in place.
- Eval-driven fixes in Claude Code (Eddie’s live demo): from a user-feedback GitHub issue, he starts Claude Code with the `--dangerously-skip-permissions` flag, dictates with Wispr Flow (“I barely type these days”), and prompts roughly: “Read this issue and come up with a fix. First write a failing eval that reproduces the issue, then solve it, then prove the eval passes, then check the rest of the suite, then open a PR.” He’s not a classic TDD person, but for AI/conversation fixes “it’s basically the only way we work now.” The most important step is reviewing the generated code, eval, and prompt change before asking for the PR; while it runs he starts a second or third item in another Claude Code terminal.
- The product (Gusto Co-founder): a normal agent loop connected to tools, but pre-loaded with everything Gusto already knows about the business (employees, payroll, schedules, time-off). Users talk to it via SMS or Slack (WhatsApp and Telegram planned), and it has connectors to QuickBooks, Google Sheets, and Notion. Demo: a real massage-spa customer runs payroll in natural language. Co-founder pulls a Mindbody export from a spreadsheet, applies the owner’s rules (hot-stone upsell +20, pooled tips split by therapist count), updates the payroll, then stops to confirm before submitting.
- Why messaging-first: the original idea came from Eddie setting up OpenClaw himself and texting it over Telegram. That visceral experience — plus its pain (hard to set up, needs a Mac mini “you can’t even get today”) — seeded Co-founder’s hypothesis: safe, cloud-hosted, with SMS/Slack as first-class channels.
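The “memory is a tool that writes to a database column” idea from the list above can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not Gusto's code: the `Tool` interface, `makeMemoryTool`, `loadMemory`, and the in-memory `Map` standing in for a database table are all hypothetical, and the sketch deliberately avoids any real SDK so that it runs on its own. In a real Worker, `execute` would be registered through whatever tool interface the SDK provides, and the `Map` write would be a database update.

```typescript
// Hypothetical stand-in for a database table with a `memory` column.
// A real deployment would use an actual database; a Map keeps this runnable.
type BusinessRow = { id: string; memory: string };

const businesses = new Map<string, BusinessRow>();

// Assumed shape of a tool: a description the model reads, plus a function
// the runtime executes when the model decides to call the tool.
interface Tool<Input, Output> {
  description: string;
  execute(input: Input): Output;
}

// Build a memory tool scoped to one business, so the model cannot
// write into another customer's row.
function makeMemoryTool(
  businessId: string,
): Tool<{ note: string }, { saved: boolean }> {
  return {
    description:
      "Save a durable fact about this business for future conversations.",
    execute({ note }) {
      const row = businesses.get(businessId) ?? { id: businessId, memory: "" };
      // Append rather than overwrite so earlier notes survive.
      row.memory = row.memory ? `${row.memory}\n${note}` : note;
      businesses.set(businessId, row);
      return { saved: true };
    },
  };
}

// Read the column back, e.g. when assembling the next conversation's prompt.
function loadMemory(businessId: string): string {
  return businesses.get(businessId)?.memory ?? "";
}

// Usage: the agent runtime would call `execute` on the model's behalf.
const demoTool = makeMemoryTool("demo-business");
demoTool.execute({ note: "Owner wants a confirmation step before payroll is submitted." });
console.log(loadMemory("demo-business"));
```

Keeping memory as a single text column means there is nothing to configure or migrate beyond one field, which is the appeal of starting here before adopting a dedicated memory framework.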
Scaling the Model (and Permission)
- As a co-founder, Eddie had implicit permission to break Gusto’s own rules; other teams skipping specs and Figmas “might get a slap on the wrist.”
- To scale it, leaders must explicitly grant permission to work this way — and Eddie would go further: forbid producing docs/Figmas/specs, so teams that want to work this way feel allowed to.
- Comparable “techno-psychological” experiments cited: Chintan at Coinbase telling engineers to “delete your IDE” and to “only touch the inputs — re-prompt, don’t rewrite the agent’s output.”
- Advice to leaders: don’t stop at a prototype — get hands-on merging real, reviewed, production-quality code. Eddie went near-IC mode for 10 weeks and hit ~95th percentile on DX over 3 months “to prove it.”
- Keep the team small early (“kicking people out of projects” is Claire’s stated speed trick); add people only once the shape is clear.
- Eddie’s prompting style when Claude goes off-track: stay polite and open-ended (“if you think this is a good idea, could you try this?”), partly because he wants the model to push back and offer better options rather than just defer.
Try It
- Build a minimal agent the same way: a serverless function (Cloudflare Worker or similar) running the Vercel AI SDK; let the SDK own the loop and model-switching instead of hand-rolling a `while` loop. See Agent SDK — How the Agent Loop Works and Running the Agentic Loop.
- Adopt the eval-driven fix loop: when fixing an agent/conversation bug in Claude Code, prompt it to “write a failing eval that reproduces the issue, fix it, prove the eval passes and the suite stays green, then open a PR”, and always review the generated code/prompt before merging.
- Try PR-as-spec: build the real feature, open a review-ready PR, and decide in review whether to keep or close it. Treat closed PRs as cheap, not waste.
- Validate UX with a faked frontend: ship a canned-response, pure-frontend version behind a feature flag before the backend exists; wire in the real agent loop in place.
- Start memory simple: a tool that writes to a `memory` database column, before reaching for a memory framework.
- If you lead: prototype to prove feasibility, then actually merge reviewed production code, and prioritize non-engineers’ PRs as highly as engineers’.
- Feel it yourself: set up a personal messaging-channel agent (e.g., OpenClaw) to understand messaging-first agent UX viscerally.
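The eval-driven fix loop from the list above has a simple shape, sketched here without calling a real model. Everything in it is a hypothetical stand-in: `Agent`, `EvalCase`, `runSuite`, and the two stub agents are invented for illustration, and a real eval would call the deployed agent and likely grade replies with something less crude than a regular expression. The sketch only shows the order of operations: the new eval fails against the old behavior, passes after the fix, and the rest of the suite stays green.

```typescript
// An agent is reduced to "message in, reply out" for the sake of the sketch.
type Agent = (message: string) => string;

// One eval: an input plus a check on the agent's reply.
interface EvalCase {
  name: string;
  input: string;
  passes(reply: string): boolean;
}

// Run every eval against an agent and report pass/fail per case.
function runSuite(agent: Agent, suite: EvalCase[]): { name: string; ok: boolean }[] {
  return suite.map((c) => ({ name: c.name, ok: c.passes(agent(c.input)) }));
}

// Stub for the buggy behavior described in a (hypothetical) user issue:
// payroll is submitted without asking the owner first.
const agentBeforeFix: Agent = (message) =>
  /payroll/i.test(message) ? "Payroll submitted." : "How can I help?";

// Stub for the behavior after the prompt or code change.
const agentAfterFix: Agent = (message) =>
  /payroll/i.test(message)
    ? "Payroll is updated. Reply YES to confirm before I submit."
    : "How can I help?";

const suite: EvalCase[] = [
  // Existing eval that must stay green.
  {
    name: "replies to an unrelated message",
    input: "hello",
    passes: (reply) => reply.length > 0,
  },
  // New eval written first, to reproduce the issue.
  {
    name: "asks for confirmation before submitting payroll",
    input: "run payroll",
    passes: (reply) => /confirm/i.test(reply),
  },
];

// Step 1: the new eval should fail against the old behavior.
console.log(runSuite(agentBeforeFix, suite));
// Steps 2-3: after the fix, the new eval and the rest of the suite pass.
console.log(runSuite(agentAfterFix, suite));
```

Seeing the eval fail first is what shows that it actually reproduces the issue; an eval that passes before any fix is applied proves nothing about the change.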
Related
- The New Shape of Product Work (Ambrosino)
- Agent SDK — How the Agent Loop Works
- Running the Agentic Loop
- 2026 AI-Work Restructuring
- From Vibe Coding to Agentic Engineering
- OpenClaw Concepts
- Reflecting on a Year of Claude Code
Open Questions
- Which model(s) Co-founder runs on via the Vercel AI SDK is not named in the episode.
- Pricing and general availability of Gusto Co-founder beyond the waitlist (gusto.com/cofounder) are not given.
- How the team gates high-stakes actions (e.g., submitting payroll) beyond the human-confirm step shown is not detailed.
- How team size and process change as Co-founder scales past the original five is not quantified.