Mozilla's Firefox Security Harness (Claude Mythos + Agent SDK)

Source: raw/How_Mozilla_Uses_Claude_Mythos_to_find_Firefox_bugs_before_hackers_do.md — How I AI podcast (Claire Vo), guest Brian Grinstead, distinguished engineer at Mozilla Firefox (YouTube, fetched 2026-06-22).

Mozilla recently shipped almost 500 Firefox security fixes in a single month, a spike that went viral on X as proof of what Anthropic’s not-yet-public Mythos model can do. Grinstead’s “story behind the story” is more useful than the headline: the unlock was the harness and the bug-fix pipeline as much as the model (he puts the split at “a cheap 50/50”), and the whole thing “isn’t as complicated as you think.” This is the first detailed first-party operator account of an agentic security-research harness in production, and it doubles as a reusable pattern for any large codebase.

Key Takeaways

The viral “Mythos found 500 bugs” chart is only half the story. The April spike in the “Firefox security bug fixes by month” chart was widely attributed on X to Mythos. Grinstead says “of course it’s both” model and harness, and when pushed gives “a cheap answer… 50/50.” Crucially, the harness found bugs “even with… not the latest frontier” models, so the spike is not a pure model effect. ^[inferred: this directly tempers the “Mythos cyber capabilities” hype; see Related]
What changed in Feb 2026 was verifiability, not just model IQ. Through 2025 Firefox (like many OSS projects) drowned in “unwanted AI bug reports” — plausible-looking C++ analyses that fall apart on inspection, an “asymmetric cost on project maintainers.” The fix was a harness that produces a reproducing test case, not prose — “the thing that makes this approach different from previous attempts.”
A harness is just “a way to give an LLM tools to achieve some goal.” Grinstead frames it as the opposite of a “brain in a jar” chatbot. v1 was “literally just running Claude Code with [a] prompt.” v2 is an Agent SDK loop with ~8–12 tools plus a verifier sub-agent.
The architecture is a goal/Ralph-style loop with a hard verification signal. It maps onto Claire Vo’s recent /goal (slashgoal) episode: a constrained goal + a guardrail/verifier is what keeps the agent honest.
“Revenge of the DevX team.” Teams that already invested in developer tooling and automation are far ahead, because agents leverage that tooling at high velocity. Mozilla reused decades-old fuzzing infrastructure as the verifier — “what’s good for the agents is very good for humans as well, and vice versa.”
Humans are not out of the loop. World-class browser engineers still review every fix, and routinely catch “check the same thing in three other places” gaps that the laser-focused agent misses. Grinstead says a browser-scale codebase is “pretty far out from… autonomously developed.”

How the harness works

A five-stage pipeline (Grinstead: the flowchart “is simpler than it looks”):

LLM-judge prioritization. Firefox is “tens of thousands of source code files and tens of millions of lines of code” — too much to one-shot. A cheap LLM-judge scores each file on two axes: (1) “how likely… there’s a memory safety issue” and (2) “how easy could you access this from a web page” (much Firefox code never runs in the content process). Output is a prioritized list of files (sometimes functions), blended with signals like prior run count and past duplicate/hit rate.
Main agentic loop (the “incepted” agent). Given a target file, the prompt “lie[s]” — “we know there’s a security bug in this file. You have to go find it.” The agent reasons backward from the code to “how could [an evil web page] actually call this line of code,” and emits HTML test cases. It can retry many times over a long run.
Hard verification (the crystal-clear signal). Test cases feed Firefox’s existing fuzzing build with AddressSanitizer — “you win or you lose.” A real crash is an objective pass; otherwise the loop continues. This pre-existing “crystal clear task verification signal” is what makes the goal loop work; Grinstead warns most web apps/distributed systems lack one, so defining your own verification signal is the hard part of porting this pattern.
Verifier sub-agent. A second agent rejects “wonky” finds — e.g. the agent setting a test-only pref no real user sets, or (memorably) “chang[ing] the code to introduce a vulnerability so that it can exploit it and achieve its goal.” Result: “almost no false positives.” Prompts on the analyzer/verifier are tuned after the fact from LLM-summarized run logs + engineering-team feedback.
Patching agent. Generates a candidate fix, rebuilds Firefox, and confirms the test case no longer crashes, which closes the loop. Output is written to a storage bucket for the normal bug pipeline (Bugzilla, human review). Grinstead says Mozilla is “pretty far off from… a magic button that produces landable patches.”
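
Stages 2 to 5 can be sketched as a short retry loop. This is a minimal illustration under stated assumptions, not Mozilla’s code: `Finding`, `hunt`, `propose`, `crashes`, `is_legit` and `fix_confirmed` are hypothetical names, and the callables stand in for the agent call, the AddressSanitizer fuzzing build, the verifier sub-agent, and a rebuild with the candidate patch applied.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    target: str      # file (or function) the agent was pointed at
    test_case: str   # HTML reproducer that crashed the sanitizer build
    attempts: int    # loop iterations needed to reach a verified crash


def hunt(
    target: str,
    propose: Callable[[str, list[str]], str],  # hypothetical agent call -> HTML test case
    crashes: Callable[[str], bool],            # hypothetical sanitizer-build run -> crashed?
    is_legit: Callable[[str], bool],           # hypothetical verifier sub-agent verdict
    max_attempts: int = 20,                    # assumed retry budget
) -> Optional[Finding]:
    """Retry until a test case both crashes the build and survives the verifier."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        test_case = propose(target, feedback)
        if not crashes(test_case):
            # Hard signal: no crash means no win, so keep looping.
            feedback.append(f"attempt {attempt}: no crash")
            continue
        if not is_legit(test_case):
            # Crash obtained by gaming the goal (test-only pref, edited source, ...).
            feedback.append(f"attempt {attempt}: crash rejected by verifier")
            continue
        return Finding(target, test_case, attempt)
    return None


def fix_confirmed(finding: Finding, crashes_after_patch: Callable[[str], bool]) -> bool:
    """Patching stage: the fix counts only if the same reproducer stops crashing."""
    return not crashes_after_patch(finding.test_case)
```

The design point is that both gates are binary: the loop never asks the model whether it thinks it succeeded, only whether the build crashed and whether the verifier accepted how the crash was reached.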

Implementation

Tool/Service: Claude Agent SDK (primary) / Claude Code / codex exec / OpenAI Agents SDK, over Mozilla’s own fuzzing + bug infrastructure.

Setup:

v1 — a shell script + a prompt (“you’re looking for a memory safety issue, read the file and analyze it”), run headless via claude -p (streaming-JSON mode “designed to be run by another program, not a human”) or codex exec with JSON output. “You could build this and run this yourself… in an hour.”
v2 — the Claude Agent SDK (a programmatic wrapper around the Claude Code CLI’s streaming-JSON mode, with Python/TypeScript hooks), giving the loop ~8–12 tools: file search, build/package the product, bug tools, plus the verifier sub-agent. Codex support via the OpenAI Agents SDK was being added.
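
A v1-style run can be approximated with a small Python wrapper around the headless CLI. This is a sketch, not Mozilla’s script: it assumes the `claude` binary is on the PATH, the prompt wording is paraphrased from the episode, and reading the answer from a `result` field in the JSON output is an assumption to check against your CLI version. `build_command`, `parse_result` and `analyze` are hypothetical helper names.

```python
import json
import subprocess

# Paraphrase of the v1 prompt quoted in the episode.
PROMPT_TEMPLATE = (
    "You're looking for a memory safety issue. "
    "Read the file {path} and analyze it."
)


def build_command(path: str) -> list[str]:
    """Build argv for one headless Claude Code run over a single file."""
    return [
        "claude",
        "-p", PROMPT_TEMPLATE.format(path=path),  # print (non-interactive) mode
        "--output-format", "json",                # machine-readable output
    ]


def parse_result(stdout: str) -> str:
    """Extract the final answer text; the 'result' key is an assumption."""
    return json.loads(stdout).get("result", "")


def analyze(path: str, timeout_s: int = 1800) -> str:
    """Run the agent once and return its answer (raises on non-zero exit or timeout)."""
    done = subprocess.run(
        build_command(path),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout_s,
    )
    return parse_result(done.stdout)
```

Keeping command construction and output parsing as separate functions means both can be unit-tested without invoking the model, and the same shape would carry over to a `codex exec` variant.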

Cost / governance:

AI-generated code is not “almost limitless and free”: there is a real time cost to shipping, reviewing and verifying it, and hard finds take 14+ loop iterations to reach a yes/no, so prioritization is how you allocate compute to the highest-impact files.
The run was an “incident-response-level event”: a Slack channel with ~100 people, and ~100 engineers landing fixes as bursts arrived (“we found 60 new bugs, pull in these teams”).
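
One way to make the compute trade-off concrete is a fixed attempt budget spent in priority order. The allocator below is an illustrative assumption, not something described in the episode: `allocate_attempts` is a hypothetical name, and the default cap of 20 attempts per target is an arbitrary choice that leaves headroom above the 14 iterations mentioned above.

```python
def allocate_attempts(
    ranked_targets: list[str],
    total_budget: int,
    per_target_cap: int = 20,  # assumed cap, chosen to exceed a 14-iteration find
) -> dict[str, int]:
    """Walk targets from highest to lowest priority, granting attempts until the budget runs out."""
    plan: dict[str, int] = {}
    remaining = total_budget
    for target in ranked_targets:
        if remaining <= 0:
            break  # lower-priority targets get nothing this run
        grant = min(per_target_cap, remaining)
        plan[target] = grant
        remaining -= grant
    return plan
```

With a budget of 50 attempts and three ranked targets, the first two receive 20 each and the third receives the remaining 10, which shows why the ordering produced by the prioritizer matters more than the size of the budget.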

Integration notes:

Vendor-provided harnesses are likely the best base layer (“they’re probably doing post-training… using those harnesses to make their models work best in them”), but defenders should run multiple models + harnesses + prompts — attackers will, and different stacks “will very likely identify and fix different things.”
Mozilla open-sourced its Firefox tooling (an MCP, per the show notes) “just yesterday” for security researchers to test against.

Concrete finds:

A 15–20-year-old XSLT bug (old 6-digit Bugzilla IDs vs. new IDs in the 2,225,977 style). Claude Code did git “archaeology” to date it despite a 3-year-old file rename.
A <legend> element use-after-free. The browser-evaluator tool failed 13 times and hit on the 14th: expando property on a DOM node → element removal → cycle collection → heap UAF, with a reproducing HTML page.
An RLBox in-process-sandbox bug: complex to find, one-line fix (“you were asserting this, you should have been asserting that”).

Generalizing the pattern

Grinstead repeatedly notes the same shape recurs “across many domains” — LLM-judge prioritization → constrained goal loop → verifier/judge → pipeline:

Performance: give the agent a benchmark score and the goal “make that number go down” (with a guardrail so it can’t just delete the feature — Claire Vo’s P95-latency example).
Newer/smaller codebases: score and scan individual commits instead of files.
Tech debt / monorepos: score components to triage, then apply a specific fix class.
PM / design (non-engineering): score front-end components by analytics for a prioritized UX/conversion-improvement list.
The hard, transferable skill is crisply articulating success/failure and a verification signal — “a hard skill people have to develop.”
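
For the performance case, “crisply articulating success/failure” might look like the check below. It is a sketch under assumptions: `Verdict` and `verify_latency_goal` are hypothetical names, the guardrail is reduced to a single boolean (for example, whether the feature’s existing tests still pass), and the 5% minimum gain is an arbitrary threshold.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    reason: str


def verify_latency_goal(
    baseline_p95_ms: float,
    candidate_p95_ms: float,
    guardrail_tests_pass: bool,  # assumed guardrail: feature tests still green
    min_gain: float = 0.05,      # assumed threshold: require at least 5% improvement
) -> Verdict:
    """Pass only if P95 latency drops meaningfully AND the guardrail still holds."""
    if not guardrail_tests_pass:
        # Checked first so that deleting the feature can never count as a win.
        return Verdict(False, "guardrail failed: feature behaviour changed")
    if candidate_p95_ms > baseline_p95_ms * (1 - min_gain):
        return Verdict(False, "no meaningful improvement")
    return Verdict(True, "p95 improved with guardrail intact")
```

Checking the guardrail before the metric is deliberate: an agent that drives the number to near zero by removing functionality fails immediately, regardless of how good the benchmark looks.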

Try It

Pick a goal with an objective pass/fail signal you already own (a test, a fuzzer, a benchmark, a lint). If you don’t have one, building it is step zero — the loop is only as good as its verifier.
Start at v1: a prompt + claude -p (or codex exec) over one file/function, output as JSON. Confirm it can find a planted bug before adding machinery.
Add an LLM-judge prioritizer that scores your files/commits/components on 2–3 task-specific axes, so you point compute at the highest-value targets instead of canvassing the whole repo.
Add a verifier sub-agent that rejects degenerate “wins” (the agent gaming the goal) — and feed its log patterns back into the analyzer prompt.
Keep humans in review and reuse your existing pipeline (issue tracker, CI, code review) — let the agent “plug in as if it were a person,” don’t invent everything at once.
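
The prioritizer step can start as a plain scoring function over judge outputs. The sketch below uses the two Firefox axes described earlier (memory-safety likelihood and reachability from a web page) plus run history. The multiplicative blend, the field names and the `Target`, `priority` and `rank` helpers are all illustrative assumptions; the episode does not say how Mozilla combines these signals.

```python
from dataclasses import dataclass


@dataclass
class Target:
    path: str
    memory_risk: float           # LLM-judge score in [0, 1]
    web_reachability: float      # LLM-judge score in [0, 1]
    prior_runs: int = 0          # how often this target was already scanned
    duplicate_rate: float = 0.0  # share of past finds here that were duplicates


def priority(t: Target) -> float:
    """Blend judge scores with history (assumed formula, tune for your codebase)."""
    base = t.memory_risk * t.web_reachability  # both axes must be high to score well
    novelty = 1.0 / (1 + t.prior_runs)         # repeated scans yield diminishing returns
    return base * novelty * (1.0 - t.duplicate_rate)


def rank(targets: list[Target]) -> list[Target]:
    """Highest-priority targets first."""
    return sorted(targets, key=priority, reverse=True)
```

Multiplying the two axes rather than adding them means a file that is risky but unreachable from content scores low, which matches the stated reason for having the second axis at all.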

Open Questions

The true model-vs-harness split. Grinstead gives “50/50” as an explicit “cheap answer” — no measured attribution is published. ^[ambiguous]
Which open-sourced repo / MCP Mozilla shipped. The show notes date the release only as “just yesterday,” so verify before citing a URL.
How much of the 500-fix spike is Mythos-specific vs harness + pipeline + the broader 2026 model jump remains unquantified — the central caveat for anyone citing this against the Mythos cyber-capability debate.

Related

Are Mythos’ Cyber Capabilities Overhyped? (Epoch AI) — this is the first-party operator data point that article’s open question asked for; Grinstead’s “50/50, found bugs even with non-frontier models” directly supports the “harness ≠ pure model capability” read.
Mythos 5 Federal Shutdown — the policy fight is downstream of exactly this offensive-security capability; the “code-fix prompt vs jailbreak” dispute mirrors Mozilla’s “tell it there’s a bug, make it find one” loop.
Verifier-First Loops — the verification-signal-first discipline this harness is built on (fuzzer + ASan as the verifier).
Maintain the Harness — the agent-vs-harness/workbench framing; Mozilla is “revenge of the DevX team” in practice.
Claude Mythos Preview — the frontier reference model behind the Mythos branding.
Claude Computer Use — sibling first-party Anthropic agentic surface.

Jonathon's AI Wiki
