Teaching Agents to Learn From Your Team

Source: Teaching agents to learn from your team (YouTube uGroRwlC9y4), Petra (Head of Developer Experience, Warp), Code with Claude London 2026 (uploaded 2026-05-22). Transcript via local Whisper fallback (no YouTube captions). Surname not stated in the transcript opening.

Warp ships a terminal designed for running and managing agents; Petra leads developer experience there, and her team runs the developer-community surface. Her talk frames the central failure mode of agent projects as the “80% gap”: agents that “kind of sort of work” but never reach daily-production status. She argues the fix isn’t a better initial prompt but a deliberately designed feedback loop that lets the agent learn from what the team is already doing. She illustrates this with the story of Buzz, Warp’s social-response triage agent (~15 skill files, near-zero hand-written code, built in a few days), which monitors Warp’s social mentions, drafts replies, decides what to skip, and improves itself daily via Slack emoji reactions and PRs against its own skill repo. The takeaway she asks the audience to remember: design the feedback loop, not the prompt.

Key Takeaways

The 80% gap is where most agents die. Petra polled the room: most attendees had built an agent; many had one running daily; very few were happy with the daily output. The gap between “kind of sort of works” and “ship it and let it go” is where teams burn time tweaking prompts and the agent ends up worse-than-no-agent.
Buzz at a glance. Warp’s social-mention triage agent. ~15 skill files. Zero hand-written code: purely Claude skill files plus the agentic primitives. Connects to the X API, Slack, and other services. Built in a few days. Handles a few thousand mentions per month, of which 50% are skipped. A few thousand cloud-agent runs per month.
Buzz makes one of three decisions per mention: reply (and draft the message), like (engagement-only, no reply), or skip (not actually about Warp / talking to someone else / no value in inserting Warp). The team still sends every reply manually — the agent removes the 90% of effort that isn’t the actual reply.
External-check agent loops don’t work for taste tasks. Petra explicitly names the Ralph Loop as the canonical pattern that works for coding (unit tests + browser/computer-use give the agent a binary “am I there yet” signal). Social replies have no such signal — you’d have to send replies live and measure brand perception. The loop has to come from somewhere else.
Switch from rules to principles. Buzz started as a “checklist” prompt — long lists of if X then Y. The output sounded like a robot and broke on every new situation. Rewriting the skill file as principles (“don’t get defensive when users complain”, “come across as a product builder, not a support agent processing tickets”) cut the skill to roughly one-fifth the original length and produced better output because the agent could reason flexibly about new situations.
The agent needs to learn how to learn. When Petra gave Buzz human feedback on its drafts and asked it to improve, Buzz reverted to writing brittle rules (“if a person is venting about a product issue, never mention pricing in the first line”). She had to encode the learning behavior itself as a separate skill — “look at what you did, look at my feedback, look at your current instructions, what would your instructions need to be to produce the ideal output?” — so the agent generalises feedback into principles rather than memorising cases.
Feedback has to ride on existing team behaviour. The decisive move was making the daily feedback loop cost the team essentially nothing extra. Buzz posts each mention to a Slack channel with its suggested action, reasoning, and drafted reply. The team uses emoji reactions (which they were already going to use to avoid stepping on each other) to record what they actually did. Notes in the Slack thread give richer feedback when needed.
Buzz files its own PRs. Daily, Buzz reads the emoji-reaction deltas between its suggestions and the team’s actions, plus the thread notes, and opens a pull request against its own skills repo with surgical edits to the relevant instructions — not random rules appended to a list. A morning Slack ping links to the PR. ~60-second review because the diff is a handful of English lines.
Make it feel like a teammate. Giving Buzz a name, a little personality, and a Slack presence increases how much meaningful feedback humans give it. People interact more with agents that feel like teammates than with anonymous bots.
Three pieces, all required. Principles tell the agent what to do. The learning skill lets it improve. The daily feedback loop gives it the input to improve from. Drop any one and the system fails.
Cloud-agent infrastructure. Buzz runs as a cloud agent on Oz^[inferred], Warp’s orchestration platform for cloud agents. Triggered by schedules, API calls, webhooks, and cron: the same primitive class as Claude Code’s Routines, which Petra explicitly cross-references (“you all have seen routines on the cloud code”; “cloud code” here is presumably Whisper’s rendering of “Claude Code”).
Numbers. A few thousand mentions/month • 50% skipped • 15 skills (growing) • a few thousand cloud-agent runs/month • daily PR review measured in seconds. Daily DM from Buzz with health-metric graphs (action distribution, who is replying, how much).
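
The three-way decision and the action-distribution health metric can be sketched as a small data model. This is a minimal illustration only: the talk describes Buzz as skill files rather than code, so `Action`, `Triage`, and `action_distribution` are hypothetical names, not anything Warp has published.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    """The three decisions described in the notes."""
    REPLY = "reply"
    LIKE = "like"
    SKIP = "skip"


@dataclass
class Triage:
    """One triage decision for one mention (hypothetical shape)."""
    mention_id: str
    action: Action
    reasoning: str
    draft: Optional[str] = None  # only meaningful when action is REPLY

    def __post_init__(self) -> None:
        # A reply decision without a drafted message is incomplete.
        if self.action is Action.REPLY and not self.draft:
            raise ValueError("a reply decision needs a drafted message")


def action_distribution(triages: list[Triage]) -> dict[str, float]:
    """Share of each action across a batch, e.g. for a daily health report."""
    counts = Counter(t.action.value for t in triages)
    total = sum(counts.values())
    return {a.value: (counts[a.value] / total if total else 0.0) for a in Action}
```

A batch in which half the decisions are `SKIP` yields a distribution with `"skip": 0.5`, which matches the skip rate reported in the talk.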

The 80% gap — why most agents die

It is easy to get an agent almost working. The trap is the long tail of judgment and taste that the initial prompt can’t capture. Teams keep tweaking the prompt because they can feel the agent is close, but every fix is brittle and every new situation re-opens the gap. Petra argues this is where most agents become net-negative compared with having no agent at all.

The Ralph Loop and other agent-loop patterns work when there’s an external check the agent can run on its own — unit tests, browser-use confirming a button works, a curl that returns 200. For taste tasks (social replies, code-review comments, customer responses, Slack messages), the only signal is downstream human reaction, and that’s too slow and noisy to feed back into an agent loop in the conventional way.
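
The general shape of an external-check loop can be sketched as follows. This is an illustration of the pattern under stated assumptions, not the Ralph Loop’s actual implementation; `run_until_check_passes`, `attempt`, and `check` are hypothetical names.

```python
from typing import Callable, Optional


def run_until_check_passes(
    attempt: Callable[[Optional[str]], str],
    check: Callable[[str], Optional[str]],
    max_iterations: int = 5,
) -> Optional[str]:
    """Retry until check(output) returns None.

    attempt receives the previous failure message (None on the first try).
    check returns None on success or a failure message otherwise.
    Returns the passing output, or None if the budget runs out.
    """
    failure: Optional[str] = None
    for _ in range(max_iterations):
        output = attempt(failure)
        failure = check(output)
        if failure is None:
            return output
    return None
```

For a coding task, `check` could run the unit tests and return the first failure. For a social reply there is nothing the agent can pass as `check` and run on its own, which is why the loop has to be built from human feedback instead.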

Rules → principles — the first unlock

The first version of Buzz had a long if-then rule list. Symptoms: replies sounded robotic, the agent broke on novel situations that the rule set didn’t cover, and the skill file kept growing.

Petra’s reframe: treat the agent like a new team member. You don’t onboard a human by handing them a rule table — you explain how to think, what the purpose is, what good and bad look like, what principles they should hold (“don’t get defensive”, “be impact-baked”^[inferred — Whisper transcript unclear, likely “impact-focused” or “impartial”], “come across as a product builder not a support agent processing tickets”). Rewriting the skill as principles cut the file to ~20% of its original length and improved output, because principles are flexible enough for the agent to reason about new situations.

Teaching the agent to learn — the second unlock

Even with principles, Buzz wasn’t great. Petra ran the standard engineering loop — collect outputs, give feedback, ask the agent to improve. The agent’s instinct was to add brittle rules back: case-specific instructions that solved exactly the one example and generalised to nothing.

Her fix was to encode the learning behaviour itself as a separate skill file. The skill tells the agent: take your current output, take the human feedback, take your current instructions, and figure out what gap in your instructions would explain the delta. Edit the instructions to close that gap as a principle, not a case. The agent now generalises feedback rather than memorising it.
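
The inputs to the learning skill can be sketched as a prompt builder. The talk does not show the actual skill file, so the wording below is a paraphrase of these notes, and `build_learning_prompt` and the section titles are hypothetical.

```python
def build_learning_prompt(instructions: str, output: str, feedback: str) -> str:
    """Assemble the three inputs the learning skill reasons over.

    The task wording is an assumption paraphrased from the talk notes,
    not Buzz's real skill file.
    """
    sections = [
        ("Current instructions", instructions),
        ("What you produced", output),
        ("Human feedback", feedback),
    ]
    body = "\n\n".join(f"## {title}\n{text.strip()}" for title, text in sections)
    task = (
        "What gap in the current instructions would explain the difference "
        "between what you produced and what the feedback asks for? "
        "Propose an edit that closes the gap as a general principle, "
        "not as a rule for this one case."
    )
    return f"{body}\n\n## Task\n{task}"
```

The key design choice is that the current instructions are part of the input: the agent is asked to explain the delta in terms of its instructions, which steers it toward editing a principle rather than appending a case.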

At this point Buzz had two components: principles (what to do) and a learning skill (how to absorb feedback). The remaining question was who keeps feeding it.

The feedback loop — the third unlock

Petra didn’t want another team meeting, a feedback rotation, or any net-new ritual. The constraint was: the smallest possible behaviour change on the team’s side that still gives the agent a rich learning signal.

The loop she designed runs entirely on tools the team already uses:

Buzz posts to Slack. Each mention lands in a dedicated channel with: the mention, the suggested action (reply / like / skip), Buzz’s reasoning, and — if it’s a reply — a drafted message. Heavy use of Slack’s structured formatting to make each post scannable.
The team reacts with emojis indicating the action they actually took. A check mark means “I sent the reply.” Different emojis cover the other actions. The team was already going to use emoji reactions to avoid stepping on each other’s mentions — Buzz reuses that breadcrumb trail.
The team optionally adds a note in the Slack thread when an action diverges from Buzz’s suggestion (“we shouldn’t correct the user here — they said something nice and we just answered their question”) or when there’s nuance worth preserving.
Daily, Buzz reads the channel. It compares its suggestions against the team’s emoji reactions, ingests the thread notes, and draws takeaways about where its instructions were wrong.
Buzz opens a PR. Skills live in a Git repo so they’re handled as code. The PR makes surgical edits to the relevant instructions — not random appended rules — and a morning Slack message links to it with a brief explanation.
The team reviews in ~60 seconds. English-line diffs, clear reasoning, normal GitHub review flow. Quick-edit feature for tweaking phrasing before merging.
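
The daily comparison step in the loop above can be sketched as a pure function over already-fetched channel data. This is a minimal sketch under stated assumptions: the talk only says a check mark means “I sent the reply”, so the other reaction names are placeholders, and `Post`, `Signal`, and `learning_signals` are hypothetical names. Fetching from Slack and opening the PR are left out.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed mapping from reaction names to actions. Only the check mark is
# from the talk; "heart" and "no_entry_sign" are illustrative placeholders.
REACTION_TO_ACTION = {
    "white_check_mark": "reply",
    "heart": "like",
    "no_entry_sign": "skip",
}


@dataclass
class Post:
    mention_id: str
    suggested: str        # "reply", "like", or "skip"
    reactions: list[str]  # reaction names left by the team
    notes: list[str]      # free-text notes from the thread


@dataclass
class Signal:
    mention_id: str
    suggested: str
    actual: Optional[str]  # None if no recognised reaction yet
    notes: list[str]

    @property
    def diverged(self) -> bool:
        return self.actual is not None and self.actual != self.suggested


def actual_action(reactions: list[str]) -> Optional[str]:
    """First recognised reaction wins; unrelated emoji are ignored."""
    for name in reactions:
        if name in REACTION_TO_ACTION:
            return REACTION_TO_ACTION[name]
    return None


def learning_signals(posts: list[Post]) -> list[Signal]:
    """Keep posts where the team diverged from the suggestion or left a note."""
    signals = []
    for post in posts:
        signal = Signal(post.mention_id, post.suggested,
                        actual_action(post.reactions), post.notes)
        if signal.diverged or signal.notes:
            signals.append(signal)
    return signals
```

Posts where the team simply followed the suggestion and left no note produce no signal, so the material handed to the learning skill stays small.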

Petra calls out the keep-it-simple insight explicitly: the trickiest part of any agent feedback loop is the humans. If the loop takes too much time, sits outside the normal workflow, or feels like extra work, people just won’t do it. Buzz’s loop deliberately rides on actions the team was already taking.

Why the agent has a personality

Buzz has a name and “a little bit of whimsy”. Petra observed that humans interact more meaningfully with, and give richer feedback to, agents that feel like teammates. You can talk to a teammate in Slack, leave them a note, react to their message. Anonymous bots get ignored. The personality is functional, not decorative.

Control — humans stay in the merge loop

Buzz proposes; humans dispose. Two control surfaces:

PR review. Every instruction change is a PR. The team can decline, edit, or merge.
Quick instruction edits. Petra explicitly likes being able to tweak the phrasing of an instruction directly so the team controls exactly how things are worded — “we don’t want the agent to just change its instructions willy-nilly… we don’t want it doubling down on some weird direction.”

This is the structural answer to instruction drift: the agent never edits itself silently; every change ships through a code-review surface the team already understands.

Generalising past Buzz

Petra spends the closing minutes generalising:

The same loop applies to customer replies, code-review comments, Slack drafts, and most “fuzzy” judgment tasks that don’t have an external unit-test-like check.
The three pieces (principles, learning skill, daily feedback loop) all need each other — none alone is sufficient.
The closing prompt to the audience: if you remember one thing, focus on designing the feedback loop, not nailing the initial prompt. The initial prompt only has to be good. The loop is what makes the agent better over time as the problem evolves.

Try It

Audit your agent for the rules-vs-principles split. Re-read your skill file. Count if … then … clauses. If the file reads like a checklist, rewrite it as principles describing how to think about the task and what good looks like. Expect a ~5× length reduction with equal-or-better output.
Add a learning skill. Write a separate skill file that takes (current output, human feedback, current instructions) → (proposed instruction edit that generalises the feedback into a principle). Without this, your agent will tack on brittle case-specific rules every time it’s corrected.
Ride existing team behaviour. Don’t invent a new feedback ritual. Find an action the team is already taking (Slack reactions, GitHub review comments, ticket tags) and have the agent read those instead. The first failure mode of feedback loops is the humans.
Make the agent file PRs against its own instructions. Skills in Git. Daily run. Surgical edits, not appended rules. ~60-second review. The team stays in the merge loop, which prevents instruction drift and gives a natural place to push back.
Cross-reference with Claude Code Routines. If you’re on Claude Code rather than Warp, the cloud-agent triggering primitive is Routines — Petra explicitly cross-referenced this in the talk (“you all have seen routines on the cloud code”). Same shape: schedule / webhook / cron triggers an agent to run unattended in the background.
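
The audit in the first step above can be approximated with a short script. This is a crude heuristic, not a method from the talk: it only counts “if … then” clauses that sit within a single sentence on one line, and `count_if_then` and `rule_density` are hypothetical names.

```python
import re

# Matches "if ... then" within one sentence on one line, case-insensitively.
IF_THEN = re.compile(r"\bif\b[^.\n]*?\bthen\b", re.IGNORECASE)


def count_if_then(text: str) -> int:
    """Number of if-then clauses found in a skill file's text."""
    return len(IF_THEN.findall(text))


def rule_density(text: str) -> float:
    """If-then clauses per non-empty line; higher reads more like a checklist."""
    lines = [line for line in text.splitlines() if line.strip()]
    return count_if_then(text) / len(lines) if lines else 0.0
```

A file with a density near 1.0 is mostly rules; a principle-style file should score close to zero. The number is only a prompt to re-read the file, since rules phrased without “if” and “then” are not counted.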

Open Questions

Petra’s surname — not stated in the transcript opening. Worth resolving from the public Code with Claude London 2026 speaker page if a refresh pass lands.
“Oz” as the orchestration platform name — Whisper transcription, plausible but unverified. Could be a Warp-internal codename or a mishear of another product name.
“Impact-baked” — Whisper-garbled adjective in the don’t-get-defensive principle. Likely “impact-focused” or “impartial” but not recoverable from this transcript alone.
Buzz’s learning-skill source. Petra describes the meta-skill but doesn’t show the prompt verbatim. A community implementation would benefit from seeing the actual skill file.

Related

Code with Claude London 2026 — Opening Keynote — the conference this talk sits inside
Managed Agents Production — Jess Ann + Lance Martin — same conference, complementary view of long-running autonomous agents with explicit feedback primitives
Fiona Fung — Running an AI-native engineering org — same conference, organisational pattern for teams running agents at scale
Spotify — Coding is no longer the constraint — same conference, complementary “agents that improve themselves and the team” thesis
Auto Memory — Claude’s first-party answer to “let the agent persist what it learns”, parallel direction to Buzz’s PR-against-own-instructions pattern
Skills — the primitive Buzz is built on (~15 skill files, zero code)
Claude Code Routines — the cloud-trigger primitive Petra explicitly cross-referenced for the audience
Reflexio — external self-improvement harness that takes a similar shape (extract playbooks from completed runs); useful contrast — Reflexio is a generic harness, Buzz is a domain-specific hand-built loop
Claude Skills + WEO Governance — adjacent pattern of skills as the human-in-the-loop control surface for agent behaviour
