Karpathy-Style AutoResearch for Cold Outbound — An Experiment Loop Optimizing Reply Rate

Source: raw/Claude_Code_+_Karpathy_Destroys_Every_Lead_Gen_Agency.md — unnamed B2B lead-gen agency founder, YouTube qw0xdTtzK1w

A B2B lead-gen agency applies the idea of Andrej Karpathy’s AutoResearch (an experiment loop that self-improves toward a target metric) to cold email, optimizing positive reply rate. Notably, when the founder set it up in Claude Code, Claude discarded Karpathy’s actual repository as unnecessary: “it’s just simply the idea of what we’re trying to achieve.” The agency reports an enterprise client’s reply rate nearly doubling (20.71 vs 10.71 replies per 1,000, sustained week-over-week) without humans writing emails or manually uploading leads.

Key Takeaways

The metric to optimize is positive reply rate. The system reads past campaigns (copy, leads, target companies), labels who responded positively vs negatively and which campaigns won, then does more of what works and less of what doesn’t — no human ideating each round. (Karpathy’s AutoResearch originally ran an experiment every ~5 min to self-improve a small local model; here the same loop shape optimizes outbound, not model training.)
Context engineering is the foundation. Enter the company website → auto-onboarding draft → fill gaps from a walk-and-talk voice-memo transcript → the agent writes ICP.md, case-study.md, value-prop.md, and problem-statement.md. “Without really good context, none of this is going to matter.”
Pre-build the ENTIRE TAM (total addressable market) up front — no game-time decisions. The key early failure was letting Claude pick experiment targets on the fly, which produced thin samples. Fix: enrich every company with every conceivable data point in advance, so the agent only selects from complete data. Signals captured include sales motion (demo / contact-sales / meet-us CTA buttons), pricing (lowest/highest listed, enterprise, free trial), org ratios (sales leaders vs ICs; CRO / VP / director of sales; CMO / VP marketing / GTM engineer), ad presence (Meta/Google), and listed case studies.
“The list is the message.” Campaign experiments are generated from the enriched data — each surfaces a data-driven angle (a pain point or value prop), plus a suggested email and 3–5 verified contacts per experiment. The founder deliberately under-invested in the email copy itself (“you’ll have your own opinion on what a good cold email looks like”).
Guardrails are hard-coded — keep them: the CTA is locked (so a generated campaign can’t accidentally give the product away free), Million Verifier must approve an address before it enters SmartLead, human approval is hard-coded, and the TAM lives in Supabase (or a local CSV in the give-away version).
The loop: approved experiments load into SmartLead + Instantly; a weekly Codex automation / Claude Code routine reviews prior reply rates and proposes the next round for human approval. The system tracks who’s been contacted so experiments don’t repeat.
Stack: Claude Code or Codex as the brain; Clay (+ derived data points), Rapid API, Apify (+ the Apify MCP for ad data), Prospeo, Blitz API, the OSS HTML-to-text library for site crawling, and the Batch API + GPT-5 Nano for bulk AI processing.
Results (self-reported via the SmartLead API): automatic campaigns hit 20.71 replies / 1,000 vs 10.71 / 1,000 for non-automatic (Apr 6–17), holding week-over-week; agency-wide, 50+ customers generating 200–300 positive responses/day. ^[inferred — single-source, self-reported metrics, not independently audited]
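The weekly review step in the loop described above can be sketched as a ranking over past experiments. This is a minimal Python sketch, not the agency's code: the `Experiment` record, the `min_sent` sample-size floor, and the `keep_top` cutoff are hypothetical, and the send and reply counts are plain inputs standing in for a SmartLead API pull. It scores each experiment by positive replies per 1,000 sends, drops thin samples, and proposes next-round contacts that reuse only the winning angles while skipping anyone already emailed. Nothing is sent; the output is a proposal for human approval.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    # Hypothetical record; in the source, these counts come from a SmartLead API pull.
    angle: str             # data-driven angle, e.g. a pain point or value prop
    sent: int              # emails sent in this experiment
    positive_replies: int  # replies labeled positive


def replies_per_thousand(exp: Experiment) -> float:
    """Positive replies per 1,000 emails sent: the metric the loop optimizes."""
    if exp.sent == 0:
        return 0.0
    return 1000 * exp.positive_replies / exp.sent


def rank_experiments(experiments, min_sent=200):
    """Rank by positive reply rate, best first, ignoring thin samples.

    min_sent is a hypothetical floor: experiments with fewer sends are
    too noisy to learn from, so they are excluded from the ranking.
    """
    eligible = [e for e in experiments if e.sent >= min_sent]
    return sorted(eligible, key=replies_per_thousand, reverse=True)


def propose_next_round(experiments, candidates, contacted, keep_top=2, min_sent=200):
    """Propose (email, angle) pairs for the next round, pending human approval.

    candidates: (email, angle) pairs drawn from the pre-enriched TAM.
    contacted: addresses already emailed, so no one is contacted twice.
    Only angles among the top `keep_top` ranked experiments are reused.
    """
    winners = {e.angle for e in rank_experiments(experiments, min_sent)[:keep_top]}
    return [
        (email, angle)
        for email, angle in candidates
        if angle in winners and email not in contacted
    ]


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    history = [
        Experiment("hiring-sales-leaders", sent=1000, positive_replies=21),
        Experiment("runs-meta-ads", sent=1000, positive_replies=11),
        Experiment("free-trial-pricing", sent=50, positive_replies=3),  # thin sample
    ]
    for exp in rank_experiments(history):
        print(exp.angle, replies_per_thousand(exp))  # "hiring-sales-leaders 21.0", then "runs-meta-ads 11.0"
    candidates = [("a@example.com", "hiring-sales-leaders"),
                  ("b@example.com", "free-trial-pricing"),
                  ("c@example.com", "hiring-sales-leaders")]
    print(propose_next_round(history, candidates, contacted={"c@example.com"}, keep_top=1))
```

On this per-1,000 metric, the reported 20.71 vs 10.71 is a ratio of about 1.93x. A fixed `min_sent` floor is the simplest guard against the thin-sample failure described above; a significance test on the two rates would be a stricter alternative.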

Try It

Build context files first: feed your site + a voice-memo brain-dump transcript to Claude Code/Codex; have it draft ICP.md, case-study.md, value-prop.md, problem-statement.md.
Enrich your full TAM up front (Clay + Apify / Prospeo / Rapid API) with sales-motion, pricing, and org-ratio signals — don’t let the agent pick targets from thin data.
Generate experiments from the data (“the list is the message”) — each with suggested copy + 3–5 verified contacts.
Lock the CTA, gate sends behind Million Verifier + your approval, then load approved experiments into SmartLead / Instantly.
Schedule a weekly routine (Claude Code routine / Codex automation) to read reply rates and propose the next round.
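The ordering of checks in these steps can be sketched in Python. Everything here is hypothetical: the signal keys, the `LOCKED_CTA` text, the `verified` flag (standing in for a Million Verifier result), and the `approved` flag (set by a person). No Clay, Million Verifier, SmartLead, or Instantly API is called. The sketch only illustrates the guardrails: targets come from fully enriched records that match the experiment's angle, each experiment gets 3 to 5 verified contacts and the fixed CTA, and nothing passes the load gate until a human approves it.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical signal keys mirroring the enrichment categories in the text.
REQUIRED_SIGNALS = ("sales_motion", "pricing", "org_ratio", "runs_ads", "case_studies")
# Hypothetical locked CTA: fixed in code, never written by the model.
LOCKED_CTA = "Worth a quick call next week?"


@dataclass
class Contact:
    email: str
    verified: bool  # stand-in for an email-verification result (e.g. Million Verifier)


@dataclass
class Company:
    name: str
    signals: dict
    contacts: list


@dataclass
class Campaign:
    angle: str
    cta: str
    contacts: list = field(default_factory=list)
    approved: bool = False  # flipped only by a human reviewer


def is_complete(company: Company) -> bool:
    """True only if every required signal was captured during up-front enrichment.

    A captured False (e.g. runs_ads=False) counts as data; only a missing value does not.
    """
    return all(company.signals.get(key) is not None for key in REQUIRED_SIGNALS)


def build_campaign(angle: str, companies: list, matches: Callable[[dict], bool],
                   min_contacts: int = 3, max_contacts: int = 5) -> Optional[Campaign]:
    """Build one experiment from complete, matching companies with 3 to 5 verified contacts."""
    picked = []
    for company in companies:
        # Completeness is checked first, so `matches` only ever sees fully enriched signals.
        if not is_complete(company) or not matches(company.signals):
            continue  # never select targets from thin or non-matching records
        picked.extend(c for c in company.contacts if c.verified)
    picked = picked[:max_contacts]
    if len(picked) < min_contacts:
        return None  # too few verified contacts for a meaningful experiment
    return Campaign(angle=angle, cta=LOCKED_CTA, contacts=picked)


def ready_to_load(campaign: Campaign) -> bool:
    """Hard-coded gate checked before a campaign is loaded into a sending tool."""
    return (
        campaign.approved
        and campaign.cta == LOCKED_CTA
        and all(c.verified for c in campaign.contacts)
    )
```

For example, `build_campaign("runs-ads", tam, lambda s: s["runs_ads"])`, with `tam` a hypothetical list of `Company` records, would target only fully enriched companies flagged as running ads, and the result stays unloadable until a reviewer sets `approved = True`. Re-checking verification and the CTA inside `ready_to_load` means a campaign edited after generation still cannot slip past the gate.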

Related

AutoResearch (Thu Vu walkthrough) — the ratchet-loop this generalizes from LLM training to outbound; the walkthrough only hypothesized the marketing application this video deploys.
Karpathy Pattern — community adoption of Karpathy’s open-source ideas.
AI Automation for Client Acquisition — adjacent outbound/lead-gen automation.
Five AI Automations Businesses Pay For — where reply-rate-optimizing outbound fits the demand map.
Agents & Agentic Systems — the experiment-loop agent pattern.
Clay + Claude Code — Natural-Language Lead Generation — the lighter one-prompt Clay-MCP flow; this article’s TAM/SmartLead system is the heavier reply-rate-optimizing counterpart.

Open Questions

The presenter / agency name is not stated in the transcript — attribution is low-confidence.
The repository is promised (“I’m going to give away the repository”), but its release could not be verified at the time of recording; the install claims remain unverified until the repo is available.
Reply-rate metrics are self-reported from the agency’s own SmartLead pull, not independently audited.

Jonathon's AI Wiki

Explorer

Karpathy-Style AutoResearch for Cold Outbound — An Experiment Loop Optimizing Reply Rate

Key Takeaways

Try It

Open Questions

Graph View

Table of Contents

Backlinks

Jonathon's AI Wiki

Explorer

Karpathy-Style AutoResearch for Cold Outbound — An Experiment Loop Optimizing Reply Rate

Key Takeaways

Try It

Related

Open Questions

Graph View

Table of Contents

Backlinks