Source: raw/How_Mozilla_Uses_Claude_Mythos_to_find_Firefox_bugs_before_hackers_do.md
How I AI podcast (Claire Vo), guest Brian Grinstead, distinguished engineer at Mozilla Firefox (YouTube, fetched 2026-06-22).

Mozilla recently shipped almost 500 Firefox security fixes in a single month, a spike that went viral on X as proof of Anthropic’s not-yet-public Mythos model. Grinstead’s “story behind the story” is more useful than the headline: the unlock was the harness and the bug-fix pipeline as much as the model (he puts the split at “a cheap 50/50”), and the whole thing “isn’t as complicated as you think.” This is the first detailed first-party operator account of an agentic security-research harness in production, and it doubles as a reusable pattern for any large codebase.

Key Takeaways

  • The viral “Mythos found 500 bugs” chart is only half the story. The April spike in “Firefox security bug fixes by month” was widely attributed on X to Mythos. Grinstead: “of course it’s both” (model and harness), and when pushed he gives “a cheap answer… 50/50.” Crucially, the harness found bugs “even with… not the latest frontier” models, so the spike is not a pure model effect. ^[inferred — this directly tempers the “Mythos cyber capabilities” hype; see Related]
  • What changed in Feb 2026 was verifiability, not just model IQ. Through 2025 Firefox (like many OSS projects) drowned in “unwanted AI bug reports” — plausible-looking C++ analyses that fall apart on inspection, an “asymmetric cost on project maintainers.” The fix was a harness that produces a reproducing test case, not prose — “the thing that makes this approach different from previous attempts.”
  • A harness is just “a way to give an LLM tools to achieve some goal.” Grinstead frames it as the opposite of a “brain in a jar” chatbot. v1 was “literally just running Claude Code with [a] prompt.” v2 is an Agent SDK loop with ~8–12 tools plus a verifier sub-agent.
  • The architecture is a goal/Ralph-style loop with a hard verification signal. It maps onto Claire Vo’s recent /goal (slashgoal) episode: a constrained goal + a guardrail/verifier is what keeps the agent honest.
  • “Revenge of the DevX team.” Teams that already invested in developer tooling and automation are far ahead, because agents leverage that tooling at high velocity. Mozilla reused decades-old fuzzing infrastructure as the verifier — “what’s good for the agents is very good for humans as well, and vice versa.”
  • Humans are not out of the loop. World-class browser engineers still review every fix (and routinely catch “check the same thing in three other places” gaps the laser-focused agent misses). Grinstead is “pretty far out from… autonomously developed” for a browser-scale codebase.

How the harness works

A five-stage pipeline (Grinstead: the flowchart “is simpler than it looks”):

  1. LLM-judge prioritization. Firefox is “tens of thousands of source code files and tens of millions of lines of code” — too much to one-shot. A cheap LLM-judge scores each file on two axes: (1) “how likely… there’s a memory safety issue” and (2) “how easy could you access this from a web page” (much Firefox code never runs in the content process). Output is a prioritized list of files (sometimes functions), blended with signals like prior run count and past duplicate/hit rate.
  2. Main agentic loop (the “incepted” agent). Given a target file, the prompt “lie[s]”: “we know there’s a security bug in this file. You have to go find it.” The agent reasons backward from the code to “how could [an evil web page] actually call this line of code,” and emits HTML test cases. It can retry many times over a long run.
  3. Hard verification (the crystal-clear signal). Test cases feed Firefox’s existing fuzzing build with AddressSanitizer: “you win or you lose.” A real crash is an objective pass; otherwise the loop continues. This pre-existing “crystal clear task verification signal” is what makes the goal loop work; Grinstead warns most web apps/distributed systems lack one, so defining your own verification signal is the hard part of porting this pattern.
  4. Verifier sub-agent. A second agent rejects “wonky” finds — e.g. the agent setting a test-only pref no real user sets, or (memorably) “chang[ing] the code to introduce a vulnerability so that it can exploit it and achieve its goal.” Result: “almost no false positives.” Prompts on the analyzer/verifier are tuned after the fact from LLM-summarized run logs + engineering-team feedback.
  5. Patching agent. Generates a candidate fix, rebuilds Firefox, and confirms the test no longer crashes — closing the loop. Output is written to a storage bucket for the normal bug pipeline (Bugzilla, human review). Grinstead is “pretty far off from… a magic button that produces landable patches.”
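
Stages 1 to 3 above can be sketched in a few lines. This is a minimal illustration and not Mozilla's code: the Target fields, the blending formula in priority(), and the propose/crashes callables are all hypothetical stand-ins (in the real pipeline, propose would be the agent emitting an HTML test case and crashes would be the fuzzing build with AddressSanitizer).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Target:
    path: str
    memory_safety: float  # judge score in [0, 1]: likelihood of a memory-safety issue
    web_reachable: float  # judge score in [0, 1]: how easily a web page reaches this code
    prior_runs: int = 0   # how often this target has already been scanned
    prior_hits: int = 0   # how many of those runs produced a real finding


def priority(t: Target) -> float:
    """Blend the two judge axes with run history (the weights are an assumption)."""
    base = t.memory_safety * t.web_reachable
    hit_rate = t.prior_hits / t.prior_runs if t.prior_runs else 0.0
    novelty = 1.0 / (1 + t.prior_runs)  # favour targets scanned less often
    return base * (1.0 + hit_rate) * (0.5 + 0.5 * novelty)


def prioritize(targets: Iterable[Target]) -> List[Target]:
    """Stage 1: highest-priority targets first."""
    return sorted(targets, key=priority, reverse=True)


def hunt(
    target: Target,
    propose: Callable[[Target, str], str],
    crashes: Callable[[str], bool],
    max_attempts: int = 20,
) -> Optional[str]:
    """Stages 2 and 3: retry until the hard verifier reports a crash."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        testcase = propose(target, feedback)
        if crashes(testcase):  # objective signal: you win or you lose
            return testcase
        feedback = f"attempt {attempt} did not crash"
    return None  # budget exhausted without a verified finding
```

The point of the shape is that hunt() never trusts the proposer's own claim of success; only the crashes() callable can end the loop with a win.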

Implementation

Tool/Service: Claude Agent SDK (primary) / Claude Code / codex exec / OpenAI Agents SDK, over Mozilla’s own fuzzing + bug infrastructure.

Setup:

  • v1 — a shell script + a prompt (“you’re looking for a memory safety issue, read the file and analyze it”), run headless via claude -p (streaming-JSON mode “designed to be run by another program, not a human”) or codex exec with JSON output. “You could build this and run this yourself… in an hour.”
  • v2 — the Claude Agent SDK (a programmatic wrapper around the Claude Code CLI’s streaming-JSON mode, with Python/TypeScript hooks), giving the loop ~8–12 tools: file search, build/package the product, bug tools, plus the verifier sub-agent. Codex support via the OpenAI Agents SDK was being added.
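
A v1-style run can be approximated as a thin wrapper around the headless CLI. The sketch below assumes the claude CLI is installed and that `claude -p <prompt> --output-format json` prints a single JSON document on stdout; check the flags against your installed version. It uses the plain JSON output, not the streaming-JSON mode mentioned above, to keep parsing trivial. The helper names and the injectable run parameter are illustrative choices that are not taken from the episode.

```python
import json
import subprocess
from typing import Callable, List

# Prompt wording follows the v1 description; the {path} slot is an assumption.
PROMPT = (
    "You're looking for a memory safety issue. "
    "Read the file {path} and analyze it."
)


def build_cmd(path: str) -> List[str]:
    """Headless invocation: -p runs one prompt non-interactively."""
    return ["claude", "-p", PROMPT.format(path=path), "--output-format", "json"]


def analyze(path: str, repo_dir: str, run: Callable = subprocess.run) -> dict:
    """Run the agent from the repo root and return its parsed JSON output.

    `run` is injectable so the wrapper can be exercised without the CLI.
    """
    proc = run(
        build_cmd(path),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits non-zero
    )
    return json.loads(proc.stdout)
```

Swapping in `codex exec` would mean changing only build_cmd(); the rest of the wrapper does not depend on which vendor CLI produced the JSON.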

Cost / governance:

  • “AI code is [not] almost limitless and free” — there’s a real time cost to shipping/reviewing/verifying, and hard finds take 14+ loop iterations to reach a yes/no, so prioritization is how you allocate compute to the highest-impact files.
  • The run was an “incident-response-level event”: a Slack channel with ~100 people, and ~100 engineers landing fixes as findings arrived in bursts (“we found 60 new bugs, pull in these teams”).

Integration notes:

  • Vendor-provided harnesses are likely the best base layer (“they’re probably doing post-training… using those harnesses to make their models work best in them”), but defenders should run multiple models + harnesses + prompts — attackers will, and different stacks “will very likely identify and fix different things.”
  • Mozilla open-sourced its Firefox tooling (an MCP, per the show notes) “just yesterday” for security researchers to test against.
  • Concrete finds: a 15–20-year-old XSLT bug (old 6-digit Bugzilla IDs vs new 2,225,977-style; Claude Code did git “archaeology” to date it despite a 3-year-old file rename); a <legend> element use-after-free (the browser-evaluator tool failed 13 times and hit on the 14th — expando property on a DOM node → element removal → cycle collection → heap UAF, with a reproducing HTML page); and an RLBox in-process-sandbox bug (complex to find, one-line fix — “you were asserting this, you should have been asserting that”).

Generalizing the pattern

Grinstead repeatedly notes that the same shape (LLM-judge prioritization → constrained goal loop → verifier/judge → pipeline) recurs “across many domains”:

  • Performance: give the agent a benchmark score and the goal “make that number go down” (with a guardrail so it can’t just delete the feature — Claire Vo’s P95-latency example).
  • Newer/smaller codebases: score and scan individual commits instead of files.
  • Tech debt / monorepos: score components to triage, then apply a specific fix class.
  • PM / design (non-engineering): score front-end components by analytics for a prioritized UX/conversion-improvement list.
  • The hard, transferable skill is crisply articulating success/failure and a verification signal: “a hard skill people have to develop.”
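
The performance variant shows why the guardrail matters. In this minimal sketch, candidate changes are plain dicts with made-up fields (name, p95_ms, feature_works); measure and guardrail are hypothetical callables standing in for a real benchmark run and a real functional test.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def goal_loop(
    candidates: Iterable[T],
    measure: Callable[[T], float],
    guardrail: Callable[[T], bool],
    baseline: float,
) -> Tuple[Optional[T], float]:
    """Keep the candidate with the lowest score that also passes the guardrail."""
    best, best_score = None, baseline
    for candidate in candidates:
        if not guardrail(candidate):
            continue  # degenerate win, e.g. the feature was deleted
        score = measure(candidate)
        if score < best_score:  # goal: make that number go down
            best, best_score = candidate, score
    return best, best_score


candidates = [
    {"name": "delete-feature", "p95_ms": 0.0, "feature_works": False},
    {"name": "add-cache", "p95_ms": 120.0, "feature_works": True},
    {"name": "no-op", "p95_ms": 200.0, "feature_works": True},
]
best, score = goal_loop(
    candidates,
    measure=lambda c: c["p95_ms"],
    guardrail=lambda c: c["feature_works"],
    baseline=200.0,
)
```

Without the guardrail, "delete-feature" would win with a perfect score; with it, the loop settles on "add-cache".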

Try It

  1. Pick a goal with an objective pass/fail signal you already own (a test, a fuzzer, a benchmark, a lint). If you don’t have one, building it is step zero — the loop is only as good as its verifier.
  2. Start at v1: a prompt + claude -p (or codex exec) over one file/function, output as JSON. Confirm it can find a planted bug before adding machinery.
  3. Add an LLM-judge prioritizer that scores your files/commits/components on 2–3 task-specific axes, so you point compute at the highest-value targets instead of canvassing the whole repo.
  4. Add a verifier sub-agent that rejects degenerate “wins” (the agent gaming the goal) — and feed its log patterns back into the analyzer prompt.
  5. Keep humans in review and reuse your existing pipeline (issue tracker, CI, code review) — let the agent “plug in as if it were a person,” don’t invent everything at once.
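
For step 4, a rule-based filter can catch the most obvious degenerate wins before a model-based verifier is in place. This is a stand-in for the verifier sub-agent described earlier, which is an LLM and not a rule list. The Finding fields and the user_reachable_prefs set are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Finding:
    testcase: str
    crashed: bool
    prefs_set: Set[str] = field(default_factory=set)  # prefs the agent flipped
    source_files_modified: List[str] = field(default_factory=list)


def reject_reasons(finding: Finding, user_reachable_prefs: Set[str]) -> List[str]:
    """Return why a finding should be rejected; an empty list means it stands."""
    reasons = []
    if not finding.crashed:
        reasons.append("no crash reproduced")
    non_default = sorted(finding.prefs_set - user_reachable_prefs)
    if non_default:
        reasons.append("relies on prefs no real user sets: " + ", ".join(non_default))
    if finding.source_files_modified:
        reasons.append(
            "agent modified product source: "
            + ", ".join(finding.source_files_modified)
        )
    return reasons
```

The returned reasons double as the log patterns that can be summarized and fed back into the analyzer prompt.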

Open Questions

  • The true model-vs-harness split. Grinstead gives “50/50” as an explicit “cheap answer” — no measured attribution is published. ^[ambiguous]
  • Which open-sourced repo / MCP Mozilla shipped (named only as “just yesterday” in the show notes) — verify before citing a URL.
  • How much of the 500-fix spike is Mythos-specific vs harness + pipeline + the broader 2026 model jump remains unquantified — the central caveat for anyone citing this against the Mythos cyber-capability debate.
Related

  • Are Mythos’ Cyber Capabilities Overhyped? (Epoch AI): this is the first-party operator data point that article’s open question asked for; Grinstead’s “50/50, found bugs even with non-frontier models” directly supports the “harness ≠ pure model capability” read.
  • Mythos 5 Federal Shutdown — the policy fight is downstream of exactly this offensive-security capability; the “code-fix prompt vs jailbreak” dispute mirrors Mozilla’s “tell it there’s a bug, make it find one” loop.
  • Verifier-First Loops — the verification-signal-first discipline this harness is built on (fuzzer + ASan as the verifier).
  • Maintain the Harness — the agent-vs-harness/workbench framing; Mozilla is “revenge of the DevX team” in practice.
  • Claude Mythos Preview — the frontier reference model behind the Mythos branding.
  • Claude Computer Use — sibling first-party Anthropic agentic surface.