Source: raw/How_Mozilla_Uses_Claude_Mythos_to_find_Firefox_bugs_before_hackers_do.md — How I AI podcast (Claire Vo), guest Brian Grinstead, distinguished engineer at Mozilla Firefox (YouTube, fetched 2026-06-22).
Mozilla recently shipped almost 500 Firefox security fixes in a single month, a spike that went viral on X as proof of Anthropic’s not-yet-public Mythos model. Grinstead’s “story behind the story” is more useful than the headline: the unlock was the harness and the bug-fix pipeline as much as the model (pressed for a split, he offers a “cheap answer” of 50/50), and the whole thing “isn’t as complicated as you think.” This is the first detailed first-party operator account of an agentic security-research harness in production, and it doubles as a reusable pattern for any large codebase.
Key Takeaways
- The viral “Mythos found 500 bugs” chart is only half the story. The April spike in “Firefox security bug fixes by month” was widely attributed on X to Mythos. Grinstead says “of course it’s both” model and harness, and when pushed gives “a cheap answer… 50/50.” Crucially, the harness found bugs “even with… not the latest frontier” models, so the spike is not a pure model effect. ^[inferred: this directly tempers the “Mythos cyber capabilities” hype; see Related]
- What changed in Feb 2026 was verifiability, not just model IQ. Through 2025 Firefox (like many OSS projects) drowned in “unwanted AI bug reports” — plausible-looking C++ analyses that fall apart on inspection, an “asymmetric cost on project maintainers.” The fix was a harness that produces a reproducing test case, not prose — “the thing that makes this approach different from previous attempts.”
- A harness is just “a way to give an LLM tools to achieve some goal.” Grinstead frames it as the opposite of a “brain in a jar” chatbot. v1 was “literally just running Claude Code with [a] prompt.” v2 is an Agent SDK loop with ~8–12 tools plus a verifier sub-agent.
- The architecture is a goal/Ralph-style loop with a hard verification signal. It maps onto Claire Vo’s recent `/goal` (slash-goal) episode: a constrained goal plus a guardrail/verifier is what keeps the agent honest.
- “Revenge of the DevX team.” Teams that already invested in developer tooling and automation are far ahead, because agents leverage that tooling at high velocity. Mozilla reused decades-old fuzzing infrastructure as the verifier: “what’s good for the agents is very good for humans as well, and vice versa.”
- Humans are not out of the loop. World-class browser engineers still review every fix (and routinely catch “check the same thing in three other places” gaps the laser-focused agent misses). Grinstead says a browser-scale codebase is still “pretty far out from… autonomously developed.”
How the harness works
A five-stage pipeline (Grinstead: the flowchart “is simpler than it looks”):
- LLM-judge prioritization. Firefox is “tens of thousands of source code files and tens of millions of lines of code” — too much to one-shot. A cheap LLM-judge scores each file on two axes: (1) “how likely… there’s a memory safety issue” and (2) “how easy could you access this from a web page” (much Firefox code never runs in the content process). Output is a prioritized list of files (sometimes functions), blended with signals like prior run count and past duplicate/hit rate.
- Main agentic loop (the “incepted” agent). Given a target file, the prompt “lie[s]” — “we know there’s a security bug in this file. You have to go find it.” The agent reasons backward from the code to “how could [an evil web page] actually call this line of code,” and emits HTML test cases. It can retry many times over a long run.
- Hard verification (the crystal-clear signal). Test cases feed Firefox’s existing fuzzing build with AddressSanitizer — “you win or you lose.” A real crash is an objective pass; otherwise the loop continues. This pre-existing “crystal clear task verification signal” is what makes the goal loop work; Grinstead warns most web apps/distributed systems lack one, so defining your own verification signal is the hard part of porting this pattern.
- Verifier sub-agent. A second agent rejects “wonky” finds — e.g. the agent setting a test-only pref no real user sets, or (memorably) “chang[ing] the code to introduce a vulnerability so that it can exploit it and achieve its goal.” Result: “almost no false positives.” Prompts on the analyzer/verifier are tuned after the fact from LLM-summarized run logs + engineering-team feedback.
- Patching agent. Generates a candidate fix, rebuilds Firefox, and confirms the test no longer crashes — closing the loop. Output is written to a storage bucket for the normal bug pipeline (Bugzilla, human review). Grinstead is “pretty far off from… a magic button that produces landable patches.”
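Stages 2 to 4 above can be read as one bounded loop. The following is a minimal Python sketch under stated assumptions, not Mozilla's code: `propose`, `crashes`, and `looks_legit` are hypothetical callables standing in for the agent, the fuzzing build with AddressSanitizer, and the verifier sub-agent, and `max_iters` is an arbitrary budget.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Attempt:
    iteration: int
    test_case: str
    crashed: bool   # hard signal from the sanitizer build (hypothetical stand-in)
    accepted: bool  # a crash that also passed the verifier


def run_goal_loop(
    propose: Callable[[List[Attempt]], str],
    crashes: Callable[[str], bool],
    looks_legit: Callable[[str], bool],
    max_iters: int = 20,
) -> Optional[Attempt]:
    """Retry until a test case both crashes and survives the verifier."""
    history: List[Attempt] = []
    for i in range(1, max_iters + 1):
        # The agent sees earlier failures so it can try a different angle.
        test_case = propose(history)
        crashed = crashes(test_case)
        # The verifier only runs on real crashes; it rejects gamed "wins".
        accepted = crashed and looks_legit(test_case)
        attempt = Attempt(i, test_case, crashed, accepted)
        history.append(attempt)
        if accepted:
            return attempt
    return None  # budget exhausted without a verified find
```

With a stub `crashes` that only fires on the 14th proposal, the loop returns an `Attempt` whose `iteration` is 14, mirroring the "failed 13 times and hit on the 14th" anecdote. A verifier that always rejects makes the loop return `None`, which is the intended behavior when every crash is a degenerate one.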
Implementation
Tool/Service: Claude Agent SDK (primary) / Claude Code / codex exec / OpenAI Agents SDK, over Mozilla’s own fuzzing + bug infrastructure.
Setup:
- v1: a shell script plus a prompt (“you’re looking for a memory safety issue, read the file and analyze it”), run headless via `claude -p` (streaming-JSON mode “designed to be run by another program, not a human”) or `codex exec` with JSON output. “You could build this and run this yourself… in an hour.”
- v2: the Claude Agent SDK (a programmatic wrapper around the Claude Code CLI’s streaming-JSON mode, with Python/TypeScript hooks), giving the loop ~8–12 tools: file search, build/package the product, bug tools, plus the verifier sub-agent. Codex support via the OpenAI Agents SDK was being added.
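A v1-style wrapper might look roughly like the sketch below. It is an illustration, not the script Mozilla ran: the prompt paraphrases the one quoted above, `build_command`, `run_cli`, and `analyze` are hypothetical helper names, and the `--output-format json` flag is an assumption that should be checked against `claude --help` on your install. The shape of the returned JSON is deliberately not assumed.

```python
import json
import subprocess
from typing import Any, Callable, List

# Paraphrase of the v1 prompt quoted in the source.
PROMPT = (
    "You're looking for a memory safety issue. "
    "Read the file {path} and analyze it."
)


def build_command(path: str) -> List[str]:
    # Assumed flags: `-p` for a non-interactive run and `--output-format json`
    # for machine-readable output. Verify both before relying on them.
    return ["claude", "-p", PROMPT.format(path=path), "--output-format", "json"]


def run_cli(cmd: List[str]) -> str:
    # Raises CalledProcessError if the CLI exits non-zero.
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def analyze(path: str, runner: Callable[[List[str]], str] = run_cli) -> Any:
    # `runner` is injectable so the wrapper can be tested without the CLI.
    # No particular response schema is assumed; callers inspect the result.
    return json.loads(runner(build_command(path)))
```

Injecting `runner` keeps the script testable offline and makes it easy to swap in a different headless CLI later.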
Cost / governance:
- “AI code is [not] almost limitless and free” — there’s a real time cost to shipping/reviewing/verifying, and hard finds take 14+ loop iterations to reach a yes/no, so prioritization is how you allocate compute to the highest-impact files.
- The run was an “incident-response-level event”: a Slack channel with ~100 people, and ~100 engineers landing fixes in bursts (“we found 60 new bugs, pull in these teams”).
Integration notes:
- Vendor-provided harnesses are likely the best base layer (“they’re probably doing post-training… using those harnesses to make their models work best in them”), but defenders should run multiple models + harnesses + prompts — attackers will, and different stacks “will very likely identify and fix different things.”
- Mozilla open-sourced its Firefox tooling (an MCP, per the show notes) “just yesterday” for security researchers to test against.
- Concrete finds: a 15–20-year-old XSLT bug (old 6-digit Bugzilla IDs vs new 2,225,977-style; Claude Code did git “archaeology” to date it despite a 3-year-old file rename); a `<legend>` element use-after-free (the browser-evaluator tool failed 13 times and hit on the 14th: expando property on a DOM node → element removal → cycle collection → heap UAF, with a reproducing HTML page); and an RLBox in-process-sandbox bug (complex to find, one-line fix: “you were asserting this, you should have been asserting that”).
Generalizing the pattern
Grinstead repeatedly notes the same shape recurs “across many domains” — LLM-judge prioritization → constrained goal loop → verifier/judge → pipeline:
- Performance: give the agent a benchmark score and the goal “make that number go down” (with a guardrail so it can’t just delete the feature — Claire Vo’s P95-latency example).
- Newer/smaller codebases: score and scan individual commits instead of files.
- Tech debt / monorepos: score components to triage, then apply a specific fix class.
- PM / design (non-engineering): score front-end components by analytics for a prioritized UX/conversion-improvement list.
- The hard, transferable skill is crisply articulating success/failure and a verification signal — “a hard skill people have to develop.”
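For the performance case, the guardrail can be expressed as an acceptance check that runs before the benchmark number is even considered. This is a hedged Python sketch: the `Measurement` fields, the use of test counts as the guardrail, and the 5% `min_gain` threshold are all illustrative assumptions rather than anything stated in the episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    p95_ms: float      # benchmark score the agent is told to push down
    tests_passed: int  # guardrail input (assumed): passing feature tests
    tests_total: int   # guardrail input (assumed): tests that exist at all


def accept_change(before: Measurement, after: Measurement,
                  min_gain: float = 0.05) -> bool:
    """Accept only a real speedup that keeps the feature intact."""
    # Guardrail 1: the agent may not delete tests to shrink the workload.
    if after.tests_total < before.tests_total:
        return False
    # Guardrail 2: nothing that passed before may now fail.
    if after.tests_passed < before.tests_passed:
        return False
    # Goal: P95 must drop by at least `min_gain` (hypothetical threshold).
    return after.p95_ms <= before.p95_ms * (1 - min_gain)
```

Checking the guardrails first means a change that "wins" by removing the feature is rejected no matter how good its latency looks.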
Try It
- Pick a goal with an objective pass/fail signal you already own (a test, a fuzzer, a benchmark, a lint). If you don’t have one, building it is step zero — the loop is only as good as its verifier.
- Start at v1: a prompt plus `claude -p` (or `codex exec`) over one file/function, output as JSON. Confirm it can find a planted bug before adding machinery.
- Add an LLM-judge prioritizer that scores your files/commits/components on 2–3 task-specific axes, so you point compute at the highest-value targets instead of canvassing the whole repo.
- Add a verifier sub-agent that rejects degenerate “wins” (the agent gaming the goal) — and feed its log patterns back into the analyzer prompt.
- Keep humans in review and reuse your existing pipeline (issue tracker, CI, code review) — let the agent “plug in as if it were a person,” don’t invent everything at once.
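The prioritizer step can start as a small ranking function over judge scores. The sketch below assumes the two axes described earlier (bug likelihood and reachability from a web page) arrive as numbers between 0 and 1 from some LLM-judge call that is not shown. The `Target` fields and the blending weights are hypothetical, chosen only to show how prior run count and hit rate could be folded in.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Target:
    path: str
    bug_likelihood: float  # judge score in [0, 1]
    reachability: float    # judge score in [0, 1]
    prior_runs: int = 0    # how often this target was already scanned
    past_hits: int = 0     # verified finds from those runs


def priority(t: Target) -> float:
    # Multiply the axes: an unreachable file scores low however buggy it looks.
    base = t.bug_likelihood * t.reachability
    hit_rate = t.past_hits / t.prior_runs if t.prior_runs else 0.0
    novelty = 1.0 / (1 + t.prior_runs)  # favor rarely scanned targets
    # Weights are illustrative assumptions, not tuned values.
    return base * (0.5 + 0.25 * novelty + 0.25 * hit_rate)


def prioritize(targets: Iterable[Target],
               top_n: Optional[int] = None) -> List[Target]:
    ranked = sorted(targets, key=priority, reverse=True)
    return ranked if top_n is None else ranked[:top_n]
```

Multiplying the two axes rather than adding them is a design choice: it keeps code that no web page can reach near the bottom of the list even when the judge thinks it looks risky.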
Open Questions
- The true model-vs-harness split. Grinstead gives “50/50” as an explicit “cheap answer” — no measured attribution is published. ^[ambiguous]
- Which open-sourced repo / MCP Mozilla shipped (the release is dated only as “just yesterday”; the MCP detail comes from the show notes). Verify before citing a URL.
- How much of the 500-fix spike is Mythos-specific vs harness + pipeline + the broader 2026 model jump remains unquantified — the central caveat for anyone citing this against the Mythos cyber-capability debate.
Related
- Are Mythos’ Cyber Capabilities Overhyped? (Epoch AI) — this is the first-party operator data point that article’s open question asked for; Grinstead’s “50/50, found bugs even with non-frontier models” directly supports the “harness ≠ pure model capability” read.
- Mythos 5 Federal Shutdown — the policy fight is downstream of exactly this offensive-security capability; the “code-fix prompt vs jailbreak” dispute mirrors Mozilla’s “tell it there’s a bug, make it find one” loop.
- Verifier-First Loops — the verification-signal-first discipline this harness is built on (fuzzer + ASan as the verifier).
- Maintain the Harness — the agent-vs-harness/workbench framing; Mozilla is “revenge of the DevX team” in practice.
- Claude Mythos Preview — the frontier reference model behind the Mythos branding.
- Claude Computer Use — sibling first-party Anthropic agentic surface.