Source: What legal agents inherit from coding agents — Lessons from Legora (YouTube nho1YAEPuwA), Jacob Emmeline (Staff Software Engineer, Legora), Code with Claude London 2026 (uploaded 2026-05-21). Transcript via local Whisper fallback (no YouTube captions).
Jacob Emmeline, a Staff Software Engineer at Legora (a Stockholm-founded company that grew from $1M to $100M ARR), argues that the patterns, harness design, and tool-call shapes that had put coding agents ahead of every other vertical by the time Legora rebuilt its agents, six months before the talk, are directly transferable to legal work. The thesis is structured around a three-bucket framework: reuse one-to-one, translate to your domain, invent for what’s genuinely domain-specific. The most distinctive claim is that mirroring a coding-agent harness in a legal-agent context lets the model “feel like it’s inside a coding agent harness” and inherit the gains from coding-focused RL and fine-tuning: Legora got .docx translation working end to end on Haiku once the read/edit/verify loop was structured like a coding agent’s.
Key Takeaways
- Six-months-ago realization. Legora rebuilt how they construct agents after noticing that coding agents had pulled dramatically ahead of every other vertical — chatbots → agents → background agents → beyond — while legal lagged. Other knowledge-work verticals had the same gap.
- Why coding and legal rhyme. Both rely heavily on prior work, both center on text-based documents, both have strict organizational conventions, and both have strong review culture (engineer reviews PR; partner reviews associate’s draft before client signoff). This isn’t unique to legal — it generalizes to most knowledge work.
- Three-bucket framework. (1) Reuse one-to-one — to-dos, planning mode, sub-agents, sandboxes, human-in-the-loop. “Exactly what you get for free with Anthropic Agent SDK or Managed Agents.” (2) Translate — patterns where the legal sub-problem rhymes with a coding sub-problem but needs a different tool implementation. (3) Invent — genuinely domain-specific things like citation-grounded answers and large-document-set due diligence.
- Reuse: planning mode. Same UX as Claude Code’s plan mode — lawyer drafts a plan, iterates, approves, then the agent executes. Solves exploration + context-gathering + upfront-decisions all at once. One-to-one translation of the human/agent collaboration UX.
- Reuse: approval of dangerous actions. Same as a coding agent asking before running an unsandboxed shell command — Legora’s agent asks before doing things like deleting client documents. Took the established UX as-is and skipped the iteration loop entirely.
- Translate: .docx editing as the read/edit/verify loop. This is the load-bearing case. .docx files are zip files containing XML with metadata noise, so editing them is not as simple as editing markdown. Legora’s original design used a top-level agent handing off to a reasoning model that produced edit markers, then individual models filling in the full edits. That created handoff problems (e.g., the editing model didn’t have the tools the top-level agent had been told to use). The new design copies what coding agents converged on: read tool → edit tool → verify step in a loop, with reading happening against an intermediate flat-text representation of the .docx and the editing tools operating on that representation. The agent sees its own edits, keeps looping, and converges.
- The Haiku-translates-a-10-page-doc moment. First POC test: a colleague who built Legora’s original Word editor handed Jacob a 10-page document and asked him to translate it from English to Swedish, a task that had historically broken prior setups with exhaustiveness failures. The new harness running on Haiku (not a frontier model) translated paragraph by paragraph, occasionally re-read the whole document, caught forgotten paragraphs, edited them, and finished after ~10 minutes with every paragraph translated. This was the moment the team realized that mirroring coding-agent harness design might work generically.
- Jacob’s mental model. “You want to have the model almost feel like it’s inside a coding agent harness, and it just does a legal task.” When your harness looks similar in tool design, the model produces similar trajectories and tool-calling patterns — so it inherits gains from RL and fine-tuning done on coding-agent harnesses essentially for free.
- Translate: linting for legal documents. ESLint-equivalent for .docx files. Static checks like cross-reference integrity: if the agent deletes a paragraph referenced earlier in the contract, the linter catches it and feeds the agent a “you should probably update the section at the bottom” signal. Can be extended with LLM-based checks layered into the same lint feedback loop.
- Invent: due diligence (Tabular Review). Domain-specific surface: when company A buys company B, a lawyer has to review thousands of B’s contracts and binding documents. Legora’s existing Tabular Review tool gives a spreadsheet-like grid: each row is a document, each column is a structured-data extraction (party, document category, red flags). The agent now uses Tabular Review as a tool, the same way a human lawyer would, to process folders of documents, generate extracted cell values, then filter the grid for relevance. Equivalent surfaces exist in other verticals: accountants need reconciliation, doctors have domain-specific verification tasks. “The last 20% that makes your agent really good for a specific domain.”
- Invent: citation grounding. Every answer in Legora must be grounded in citations so a lawyer can verify which document a claim came from — a verification UX coding agents don’t need.
- Demo: vacation-clause amendment. Employee Agreements project — agent given “add an extra week of Christmas vacation to every employee.” Agent searched the project (employment agreements + HR policy), wrote a plan (review all agreements, add Christmas shutdown clause, update policy manual, optionally draft announcement memo), and on approval executed the read/edit/verify loop — copying documents to a staging space, streaming edits, surfacing red-line diffs per document. One agreement already had the clause; the agent unified its dates with the others without being told.
- Demo: 100-doc due-diligence triage. Agent given a 100-file project with random documents (insurance policies, workers comp, contracts) and asked to identify contract categories, parties, and red flags, then sort employment agreements into a dedicated folder. Agent wrote a plan, created a Tabular Review for structured extraction, filtered down to employment agreements, and moved them — every citation traceable to the source document via highlights.
- Why coding is ahead. Two hypotheses: (1) engineers are more willing to try new tools, (2) coding gets focus because solving coding unlocks growth in every other niche of software engineering. “But if you’re building any other vertical, you don’t really care why it’s ahead — you can just keep looking at what coding agents ship and steal what’s usable.”
- The framework as ongoing strategy. Every time coding agents ship something new, vertical-agent builders should evaluate: reuse, translate, or invent. The model keeps applying.
Legora — what they build
Collaborative AI workspace for lawyers, built around end-to-end legal tasks rather than chatbot Q&A. Founded in Stockholm. Over 1,000 customers, including some of the largest law firms in the world. Grew from $1M to $100M ARR in record time. Jacob is a Staff Software Engineer there; his role in the talk is to share the technical realization that drove their agent rebuild six months prior.
The platform centers on projects containing documents (employment agreements, HR policies, contracts), a chat box that addresses a project-scoped agent, Tabular Review for structured extraction across many documents, red-line review for diff-style document edits, and citation surfaces linking every claim back to the source paragraph.
What legal agents have in common with coding agents
Jacob’s parallels:
- Both rely heavily on prior work. Engineers reuse code, libraries, and patterns. Lawyers reuse precedent, prior contracts, and templates.
- Text-based documents are the medium. Source files for engineers; agreements, briefs, and policies for lawyers.
- Strict organizational conventions. Each firm/team has its own style, structure, and naming.
- Strong review culture. PR review before production. Partner sign-off before client delivery. The review surface is structurally the same shape.
These parallels are not unique to the engineering ↔ legal pair. They generalize across knowledge work — which is what makes the inheritance framework portable.
The three-bucket inheritance framework
Bucket 1 — Reuse one-to-one
Patterns coding agents converged on over years of iteration that turn out to be universal for agents in general. Free to inherit, no translation needed. Examples:
- To-dos and planning mode. Same lawyer-iterates-on-plan, agent-executes-when-approved UX.
- Sub-agents. Same orchestration shape.
- Sandboxes. Same isolation primitive.
- Human-in-the-loop approval for dangerous actions. “Should I delete this?” — the prompt-and-confirm pattern.
Jacob explicitly maps this bucket to Anthropic Agent SDK and Managed Agents — “the things you get for free.”
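The prompt-and-confirm pattern in the last bullet can be sketched as a thin wrapper around tool execution. This is a minimal illustration, not Legora’s implementation: `ToolCall`, `DANGEROUS_ACTIONS`, `run_with_approval`, and the action names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action names that require confirmation (an assumption, not from the talk).
DANGEROUS_ACTIONS = {"delete_document", "overwrite_document"}


@dataclass
class ToolCall:
    name: str
    args: dict


def run_with_approval(
    call: ToolCall,
    execute: Callable[[ToolCall], str],
    ask_human: Callable[[str], bool],
) -> str:
    """Run a tool call, pausing for human confirmation when it is dangerous."""
    if call.name in DANGEROUS_ACTIONS:
        question = f"Agent wants to run {call.name} with {call.args}. Allow?"
        if not ask_human(question):
            # Hand the refusal back as a tool result so the agent can re-plan.
            return f"DENIED: human rejected {call.name}"
    return execute(call)
```

Returning the denial as an ordinary tool result, rather than raising, keeps the agent inside its loop so it can choose a different action.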
Bucket 2 — Translate
Patterns where your domain has a sub-problem that looks like a coding sub-problem but needs a different implementation. The structure transfers; the tool details don’t. Two worked examples:
Document editing → read/edit/verify loop
- Coding pattern. Read line-based text → edit via string replace / patches / line-based tools → verify (model reasons, or type-check / lint runs) → loop.
- Legora’s original (pre-realization) design. Top-level agent → handoff to a reasoning model with current-document context → edit markers only (“insert here on page 1, page 3, page 5”), with no full edits at this stage so the model would not get lazy and skip places → second pass of individual models writing the full edit for each marker with style preservation. Worked for exhaustiveness but created brittleness: every new top-level-agent tool created handoff bugs where the editing model lacked tools the top-level agent had referenced.
- Legora’s translated design. Read tool + edit tool + verify step, operating on a flat intermediate representation of the .docx. Agent reads, edits, sees its own edits, loops. Worked on Haiku on a 10-page translation task that prior frontier-model setups had struggled with.
- The inheritance claim. Tool-design mirroring → similar trajectories → similar tool-calling → “free” benefits from coding-agent RL and fine-tuning.
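The loop shape described above can be sketched in a few lines. This is an illustration under stated assumptions, not Legora’s code: `FlatDoc` is a hypothetical flat representation (one string per paragraph), `propose_edit` stands in for the model’s tool call, and `is_done` stands in for whatever task-specific check the verify step runs.

```python
from dataclasses import dataclass


@dataclass
class FlatDoc:
    """Hypothetical flat-text stand-in for a .docx: one string per paragraph."""
    paragraphs: list[str]

    def read(self) -> str:
        # Number paragraphs so edits can be addressed, like line numbers in code.
        return "\n".join(f"[{i}] {p}" for i, p in enumerate(self.paragraphs))

    def edit(self, index: int, old: str, new: str) -> str:
        # Exact-match replacement, in the spirit of coding-agent edit tools.
        para = self.paragraphs[index]
        matches = para.count(old)
        if matches != 1:
            return f"error: expected exactly one match in [{index}], found {matches}"
        self.paragraphs[index] = para.replace(old, new)
        return f"ok: [{index}] is now: {self.paragraphs[index]}"


def verify(doc: FlatDoc, is_done) -> list[int]:
    """Return indexes of paragraphs that still fail the task-specific check."""
    return [i for i, p in enumerate(doc.paragraphs) if not is_done(p)]


def run_loop(doc: FlatDoc, propose_edit, is_done, max_steps: int = 20) -> int:
    """Read, edit, verify; repeat until verify is clean. Returns the edit count."""
    remaining = verify(doc, is_done)
    steps = 0
    while remaining:
        if steps == max_steps:
            raise RuntimeError("loop did not converge within the step budget")
        snapshot = doc.read()                       # read
        index, old, new = propose_edit(snapshot, remaining)
        doc.edit(index, old, new)                   # edit
        remaining = verify(doc, is_done)            # verify, then loop
        steps += 1
    return steps
```

Because verify runs after every edit and its findings are passed back into the next proposal, a forgotten paragraph shows up as an outstanding item instead of being silently skipped, which is the exhaustiveness property the translation test exercised.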
Static type checking → legal-document linting
- Coding pattern. ESLint / type-checker provides mechanical feedback the agent can act on inside the loop.
- Legora’s translated design. Cross-reference integrity check (paragraph deletion breaks an earlier reference → linter raises it), formatting checks, plus LLM-based checks layered into the same lint surface. Same feedback-loop shape.
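A cross-reference integrity check of the kind described here can be written as a purely static pass. The sketch below assumes a flat list of paragraphs, numbered headings such as "3. Termination", and references written as "Section 3" or "Clause 2.1"; those conventions and the function name `lint_cross_references` are assumptions for illustration, not Legora’s linter.

```python
import re

# Assumed heading style: "3. Termination" or "2.1 Notice period".
SECTION_HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s")
# Assumed reference style: "Section 3" or "Clause 2.1".
SECTION_REF = re.compile(r"\b(?:Section|Clause)\s+(\d+(?:\.\d+)*)")


def lint_cross_references(paragraphs: list[str]) -> list[str]:
    """Report references that point at section numbers with no heading."""
    defined = set()
    for paragraph in paragraphs:
        match = SECTION_HEADING.match(paragraph)
        if match:
            defined.add(match.group(1))

    findings = []
    for index, paragraph in enumerate(paragraphs):
        for ref in SECTION_REF.findall(paragraph):
            if ref not in defined:
                findings.append(
                    f"paragraph {index}: reference to Section {ref} has no matching heading"
                )
    return findings
```

The findings are plain strings so they can be fed back to the agent as the result of its verify step, in the same way lint output is fed back to a coding agent.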
Bucket 3 — Invent
Genuinely domain-specific problems with no coding analog. Two examples from Legora:
- Citation grounding. Every answer must trace back to the source document and the specific paragraph. Lawyers verify; coding agents typically don’t need this.
- Mass document review (due diligence). Thousands of contracts to triage. Legora’s Tabular Review surface — grid of documents × structured-extraction columns, fully reviewable with per-cell citations and human verification tracking — has no coding-agent equivalent. The agent calls Tabular Review as a tool the same way a human lawyer would.
Jacob frames this as “the last 20% that makes your agent really good for a specific domain.” Other verticals will have parallel inventions: reconciliation for accountants, domain-specific verification tasks for doctors.
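One way to picture citation grounding is as a data shape plus a mechanical check that each citation resolves to real source text. The sketch below is an assumption-laden illustration: `Citation`, `GroundedClaim`, and `ungrounded` are hypothetical names, and the verbatim-quote check is a simplification of whatever Legora actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    document_id: str
    paragraph: int   # index into the document's flat paragraph list
    quote: str       # text the claim relies on


@dataclass(frozen=True)
class GroundedClaim:
    text: str
    citations: tuple[Citation, ...]


def resolves(citation: Citation, corpus: dict[str, list[str]]) -> bool:
    """True if the quoted text appears verbatim in the cited paragraph."""
    paragraphs = corpus.get(citation.document_id, [])
    in_range = 0 <= citation.paragraph < len(paragraphs)
    return bool(citation.quote) and in_range and citation.quote in paragraphs[citation.paragraph]


def ungrounded(claims: list[GroundedClaim], corpus: dict[str, list[str]]) -> list[str]:
    """Return claims with no citations, or with any citation that fails to resolve."""
    return [
        claim.text
        for claim in claims
        if not claim.citations or not all(resolves(c, corpus) for c in claim.citations)
    ]
```

A check like this only proves that the quoted span exists; whether the span supports the claim is still the lawyer’s call, which is why the verification UX matters.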
The Haiku moment — why harness mirroring matters
The most distinctive technical claim in the talk. After rebuilding .docx editing as a read/edit/verify loop on a flat intermediate representation, the team’s first test was a translation task historically known to break their prior architecture. They ran it on Haiku — not Sonnet, not Opus — to see whether the harness alone was carrying the gains.
It worked. The agent translated paragraph-by-paragraph, periodically re-read the document, caught forgotten paragraphs, edited them, and finished after about 10 minutes with the full document translated. The mental model Jacob landed on:
“You want to have the model almost feel like it’s inside a coding agent harness, and it just does a legal task. And then suddenly you get these benefits from all their reinforcement learning and fine-tuning that is done on the coding agent harnesses, because your harness looks very similar in tool design and leads to very similar trajectories and tool calling. And a lot of stuff you just get for free.”
This reframes harness design as a primary lever, one that can matter more than model choice for some classes of legal task. It is compatible with the broader Anthropic-internal framing from Picking the Right Model (build a private eval and optimize for the cheapest successful outcome, not the cheapest per-token price): Haiku becomes the right model once the harness carries enough structure.
Two demos walked through in the talk
Demo 1 — Employment agreements: amend every contract
- Project setup. Folder of fictional employee employment agreements + an HR policy file.
- Prompt. “I want to give every employee an extra week of vacation during Christmas — plan out the work.”
- Agent behavior. Searched the project for “employment agreements,” “vacation,” “time off” → wrote a streaming plan with three steps (review all agreements, amend with Christmas shutdown clause, update HR policy manual) + optional 4th (draft announcement memo).
- On approval. Read/edit/verify loop fires. Documents copied to a staging space for review before being written back. Per-document red-line diffs streamed in. One agreement already contained the clause — the agent unified its dates with the other agreements without instruction.
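The staging-then-red-line flow in this demo can be approximated with the standard library. This is a sketch under assumptions: documents are modeled as lists of paragraph strings, `stage_and_diff` and `redline` are hypothetical helpers, and a unified text diff stands in for a real Word red-line.

```python
import copy
import difflib


def redline(original: list[str], staged: list[str], name: str) -> str:
    """Render a unified diff between the original and staged paragraphs."""
    diff = difflib.unified_diff(
        original,
        staged,
        fromfile=f"{name} (original)",
        tofile=f"{name} (staged)",
        lineterm="",
    )
    return "\n".join(diff)


def stage_and_diff(project: dict[str, list[str]], apply_edits):
    """Apply edits to a staged copy and return it with per-document diffs.

    The original project is never mutated; writing the staged copy back
    is left to the caller, after a human has reviewed the diffs.
    """
    staged = copy.deepcopy(project)
    apply_edits(staged)
    diffs = {
        name: redline(project[name], staged[name], name)
        for name in project
        if project[name] != staged[name]
    }
    return staged, diffs
```

Only documents that changed get a diff, so an agreement that already contains the right clause produces no red-line noise.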
Demo 2 — 100-document due-diligence triage
- Project setup. ~100 random documents (insurance, workers comp, employment agreements, etc.).
- Prompt. Structured review — identify document categories, parties, red flags; move all employment agreements into a dedicated folder.
- Agent behavior. Wrote a plan, created a Tabular Review with the right extraction columns (document category with extraction prompt, parties, red flags), waited several minutes for the LLM to process each file, then filtered to employment agreements and moved them.
- Verification surface. Tabular Review opens any document with extracted data points on one side and the source document on the other. Each cell links back to the highlighted source span. Lawyers mark cells verified; progress tracked for collaboration between humans and AI on the same surface.
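The grid described above (documents × extraction columns, per-cell source links, verified marks) maps onto a small data structure. The sketch below is a hypothetical model for illustration only; `Cell`, `Grid`, and their methods are not Legora’s Tabular Review API.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    value: str
    source_span: str        # text highlighted in the source document
    verified: bool = False  # set by a human reviewer


@dataclass
class Grid:
    columns: list[str]
    rows: dict[str, dict[str, Cell]] = field(default_factory=dict)

    def set_cell(self, document: str, column: str, cell: Cell) -> None:
        if column not in self.columns:
            raise KeyError(f"unknown column: {column}")
        self.rows.setdefault(document, {})[column] = cell

    def filter(self, column: str, value: str) -> list[str]:
        """Documents whose extracted value in `column` equals `value`."""
        return [
            doc
            for doc, cells in self.rows.items()
            if column in cells and cells[column].value == value
        ]

    def progress(self) -> tuple[int, int]:
        """(verified cells, total cells) for tracking human review."""
        cells = [c for row in self.rows.values() for c in row.values()]
        return sum(c.verified for c in cells), len(cells)
```

In the triage demo, the agent’s "filter to employment agreements and move them" step corresponds to a call like `grid.filter("category", "employment agreement")` followed by a move on each returned document.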
Where coding and legal diverge (what doesn’t translate)
- Format complexity. .docx is a zip of XML with metadata noise; markdown is markdown. Direct line-based reading doesn’t work, so Legora translates to a flat intermediate representation before letting the model touch it.
- Verification surfaces. Coding agents rely on tests, type checkers, and runtime behavior. Legal agents rely on lawyer verification with citation traceability; Tabular Review’s verified-checkmark UX has no clean coding analog.
- Domain expertise weight. Due diligence requires legal judgment about red flags; coding rarely needs that level of domain-specific human-in-the-loop semantic review.
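The format-complexity point is easy to see in code: the body text of a .docx lives in `word/document.xml` inside the zip, wrapped in WordprocessingML paragraph (`w:p`) and text (`w:t`) elements. The sketch below flattens that to one string per paragraph. It is a deliberately naive illustration, not Legora’s intermediate representation: it ignores styles, numbering, tracked changes, headers, and footnotes, and `tiny_docx` builds only a stripped-down archive for the example rather than a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML main namespace used by word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def flatten_docx(data: bytes) -> list[str]:
    """Return one plain-text string per w:p paragraph in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W}}}p"):
        # A paragraph's text is usually split across several runs; join them.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W}}}t")))
    return paragraphs


def tiny_docx(paragraphs: list[str]) -> bytes:
    """Build a minimal docx-like zip for demonstration (not a complete .docx)."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(p)}</w:t></w:r></w:p>" for p in paragraphs
    )
    xml = f'<w:document xmlns:w="{W}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()
```

Reading is the easy half. Writing edits back means mapping changes in the flat text onto the right runs without destroying formatting, which is where a purpose-built intermediate representation earns its keep.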
Why coding pulled ahead — Jacob’s hypotheses
Two candidate explanations he flags but says vertical builders don’t need to resolve:
- Engineer adoption. Engineers are unusually willing to try new tools and adopt new tech in daily work.
- Compounding focus. Solving coding unlocks growth in every other niche of software engineering, so investment compounds there faster than in any single non-coding vertical.
The closing strategic claim: “If you’re building any other vertical, you don’t really care why it’s ahead. You can just keep looking at what coding agents ship, reuse what’s usable, translate what’s similar, and invent the rest.”
Open problems Jacob touches on
- Generalizing the read/edit/verify pattern to formats beyond .docx (slide decks, spreadsheets, PDFs with structure).
- Scaling the lint surface: how much of legal correctness can be checked statically before LLM-based checks have to take over.
- The verification UX for mass-document tasks — Tabular Review works for due diligence, but other legal workflows may need different surfaces.
- How harness mirroring degrades when the domain diverges further from coding’s text-token-based structure.
Try It
- Run the three-bucket exercise on your vertical. Take your domain’s top three workflows. For each, list which sub-problems (a) reuse coding-agent UX one-to-one, (b) translate with different tool implementation, (c) need fresh invention. Most builders find more in bucket 1 than expected.
- Audit your existing agent harness for read/edit/verify shape. If your agent does multi-call handoffs between independent reasoning models (different context, different tools), test whether collapsing to a single read/edit/verify loop on a flat intermediate representation improves exhaustiveness. Jacob’s Haiku-translates-10-pages moment is the calibration target.
- Build a domain-specific linter. Pick the mechanical-correctness checks that don’t need a model — cross-references, formatting rules, structural integrity. Wire them into the agent’s verification step the way ESLint wires into coding agents. Add LLM-based checks on top of the same surface.
- Make the model “feel like” it’s in a coding-agent harness. Mirror tool-name conventions (read, edit, verify), shape, and ordering. Test whether trajectories and tool-calling patterns shift toward coding-style behavior, and whether you can drop a model tier (Sonnet → Haiku) without losing quality.
- Identify your invent-bucket surfaces early. What’s the equivalent of due diligence in your vertical? Accountants → reconciliation. Doctors → differential diagnosis review. Build that surface as a first-class tool the agent calls, not as an inline LLM call, so humans can verify and the agent can iterate.
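As a starting point for the harness-mirroring exercise, the tool surface can be declared with the same names and shapes a coding agent would see. Everything below is an assumption for illustration: the parameter names, descriptions, and the `tool` helper are hypothetical, and the name / description / JSON Schema layout is a common convention for tool definitions, not Legora’s actual schema.

```python
def tool(name: str, description: str, properties: dict, required: list[str]) -> dict:
    """Build one tool definition with a JSON Schema for its inputs."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


# Hypothetical legal-document tools named after their coding-agent counterparts.
LEGAL_TOOLS = [
    tool(
        "read",
        "Return the flat-text view of a document with numbered paragraphs.",
        {"document_id": {"type": "string"}},
        ["document_id"],
    ),
    tool(
        "edit",
        "Replace an exact, unique string inside one paragraph.",
        {
            "document_id": {"type": "string"},
            "paragraph": {"type": "integer"},
            "old_text": {"type": "string"},
            "new_text": {"type": "string"},
        },
        ["document_id", "paragraph", "old_text", "new_text"],
    ),
    tool(
        "verify",
        "Run lint checks on a document and return any findings.",
        {"document_id": {"type": "string"}},
        ["document_id"],
    ),
]
```

Keeping the definitions in one list makes it straightforward to run the same task against two model tiers with an identical tool surface and compare trajectories.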