How We Claude Code — Ara's Applied AI Workshop (Code with Claude London 2026)

Source: raw/How_we_Claude_Code.md — YouTube transcript (video ID IlqJqcl8ONE, fetched 2026-05-25 via inbox-refresh wiki-inbox playlist). Workshop title: How We Claude Code. Speaker: Ara, member of Anthropic’s Applied AI team (architect). Venue: Code with Claude London 2026. Companion repo: CWC workshops / cloud-with-code workshops / how-we-claude-code (organization name garbled by auto-captions; actual GitHub org likely code-with-claude-workshops or similar — see Open Questions). The talk is presented as the London adaptation of Thariq’s San Francisco talk “The Unreasonable Effectiveness of HTML Files” (~10 days earlier). See HTML Effectiveness (Thariq) for the SF source.

How the Claude Code team at Anthropic actually uses Claude Code in their daily work, distilled into three levels of practice: prompting, planning, and verification. The thesis: as models get more capable, let the model do more of the upfront work of extracting requirements from you, switch from markdown specs to HTML specs for review, and embed verification into the artifact itself as a contract so an agent can drive it end-to-end. The workshop is hands-on and built around a three-phase repo: (1) a bill-splitting app specified through an ask_user_question interview, (2) four HTML design directions generated and compared visually, and (3) a to-do app with an agent-native verification matrix (Storybook fixtures + DOM-published state + Playwright MCP) that runs from three surfaces: a human dashboard, an agent-driven browser, or headless CI.

Key Takeaways

Three levels of practice. (1) Prompting: let Claude interview you. (2) Planning: HTML specs over markdown. (3) Verification: agent-native artifacts with DOM-published state. Each level builds on the previous.
Resist constraining the model. Riffs on Richard Sutton’s bitter lesson: pouring more data and compute at a problem beats hand-engineered constraints over time. Practical implication: the model is probably better at extracting requirements from you than you are at specifying them. Don’t over-specify. Specify areas of interest, not outcomes.
Use the ask_user_question tool explicitly in your prompts. Triggers Claude to interview you turn-by-turn instead of guessing from a thin brief. The workshop’s bill-splitting app demo opens with a prompt that names the ask_user_question tool; Claude then walks through a chain of focused yes/no + multiple-choice questions before writing the spec.
HTML over markdown for plans + design specs. Markdown files over ~200 lines won’t get read — by you or by your colleagues. HTML files are more information-dense, more ergonomic for human review, and let you take screenshots to feed back into Claude. Better than markdown across the entire spec-iteration loop.
Generate multiple design directions, compare visually. The workshop demos four HTML design candidates (brutalist, Tokyo fintech, etc.) for the same bill-splitting app. Reviewing four rendered HTML files side-by-side is dramatically faster than reading four markdown bullet-list specs.
Take screenshots as feedback to Claude. Especially in front-end work, “the alignment is slightly off here” is harder to articulate than it is to show. Opus 4.7 has a measurably better vision model than 4.6 — feed it screenshots aggressively when iterating on UI.
Embed verification into the artifact, not into a separate test suite. Components publish their state to the DOM via data-verify-* attributes. The same component matrix can be verified by a human dashboard, a Claude-Code-driven agent in a browser, or headless from CLI — three surfaces, one contract.
DOM-published state separates contract from internals. The component’s verification surface (data-attributes on visible DOM elements) is independent of the component’s React state internals. You can break the visible component and the verification contract catches it; you can also break the contract without breaking the visible component (which is what the workshop’s deliberately planted failure demonstrates). Both are signals you want.
The Claude Code team records every front-end change like this. The workshop calls this out as internal Anthropic practice: Playwright sessions are recorded and stored (S3 / shared with colleagues), and the recordings become the verification evidence, at the same cadence as the team’s shipping rhythm.
Recommended stack for this workflow. Opus 4.7 (better vision than 4.6), xhigh effort or max effort, auto-mode on, fast mode for spec iteration (costs more but iterates faster — “more tokens to generate the spec but fewer iterations overall”). Audience polled live: almost everyone uses auto-mode; fewer use fast mode; most set the effort parameter.
Token efficiency caveat on HTML specs. Q: aren’t HTML specs more token-inefficient than markdown? A: No, in the long run. A rich HTML spec results in fewer iteration cycles — the upfront token cost is amortized across fewer correction rounds.

Level 1 — Prompting: Let Claude interview you

The framing. Your users know what they want when they see it, but they’re often not very good at articulating it. Likewise, you probably know what you want when you see it — but Claude is likely better at extracting what you need from you than you are at specifying it. This is the practical handle on Sutton’s bitter lesson: don’t hand-engineer the spec; let the more-capable thing pull the spec out of you.

Bad prompting vs. good prompting. Bad: “Make it better.” Bad: “Build me a bill-splitting app.” These predefine the outcome at a level that’s both too vague and too narrow — Claude has nothing to anchor on.

Good prompting:

Focus on the audience. “I’m building a bill-splitting app for friends going out to dinner.” — names the user, not the feature.
Suggest areas to interrogate, not outcomes to deliver. “Ask me about how I want to handle uneven splits, how I want to handle currency, what to do when someone leaves the group.”
Explicitly name the ask_user_question tool. The workshop prompt says something like “Before writing the spec, use the ask_user_question tool to interview me about my requirements.” This is the trigger that unlocks the interview loop.
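The three moves above can be captured in a reusable prompt template. A minimal TypeScript sketch, assuming you keep the audience line and the areas to interrogate as data; `buildInterviewPrompt` and `InterviewBrief` are hypothetical names, and only the tool name ask_user_question comes from the workshop. The final sentence paraphrases the workshop prompt rather than quoting it.

```typescript
// Hypothetical helper: assembles an interview-first prompt from an audience
// line and the areas Claude should ask about.
interface InterviewBrief {
  audience: string; // who the app is for, not what it should do
  areas: string[]; // topics to interrogate, not outcomes to deliver
}

function buildInterviewPrompt(brief: InterviewBrief): string {
  const areaList = brief.areas.map((area) => `- ${area}`).join("\n");
  return [
    brief.audience,
    `Ask me about:\n${areaList}`,
    // Naming the tool explicitly is the trigger the workshop recommends.
    "Before writing the spec, use the ask_user_question tool to interview me about my requirements.",
  ].join("\n\n");
}

const prompt = buildInterviewPrompt({
  audience: "I'm building a bill-splitting app for friends going out to dinner.",
  areas: [
    "how I want to handle uneven splits",
    "how I want to handle currency",
    "what to do when someone leaves the group",
  ],
});
console.log(prompt);
```

Keeping the areas as a list makes it cheap to reuse the same interview trigger across projects while changing only what Claude should probe.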

The interview surface in action. When ask_user_question is in play, Claude doesn’t just emit one open-ended question — it sends a structured form with focused options. The operator tabs through, picks options, and submits. Claude reads the answers, asks more, and converges on a spec.

Audience polls during the demo: who has used auto mode? (Many hands.) Use auto mode; it suppresses the per-tool permission prompts so the interview converges without you approving every step. Who sets the effort parameter? (Many hands.) Use xhigh or max for this workflow; the upfront thinking cost amortizes across the rest of the session.

Level 2 — Planning: HTML files over markdown

The problem with markdown specs. Once a markdown plan goes past ~200 lines, nobody reads it. Not you, not your colleagues, not the reviewer. The format makes detail expensive — every additional point steals more attention from every other point.

Why HTML wins. Information-density per square inch of screen. You can render four design directions side-by-side and see the differences instantly. You can include real components, real interactions, real layouts. You can take a screenshot, annotate it, paste it back into Claude with “this part is off” — and Opus 4.7’s vision model resolves your annotation against the rendered HTML faster than a markdown change-request roundtrip would.

The workshop demo: four design directions in parallel. For the bill-splitting app, Ara prompts Opus 4.7 to generate four HTML design directions in different aesthetic registers. The audience clicks through each rendered HTML file in a browser:

Brutalist — heavy weight, monospace, all-caps headers.
Tokyo fintech — clean, dense data display, financial-product styling.
(Two more directions in the source, not extracted in detail from the transcript.)

Each HTML file is the full mock-up — clickable, interactive, with realistic content. Choosing one becomes a 30-second decision instead of a 30-minute reading session.
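To get the side-by-side view without juggling tabs, one option is a small comparison page that loads each generated mock-up in an iframe. This is a sketch, not part of the workshop repo: `buildComparisonPage`, the `DesignDirection` shape, and the file names are hypothetical, and only the two directions named in the transcript are filled in. The two-column grid fits four directions as a 2 by 2 layout.

```typescript
// Hypothetical description of one generated design direction.
interface DesignDirection {
  name: string; // label shown above the mock-up
  file: string; // relative path to the generated HTML file (assumed)
}

// Escapes text before interpolating it into HTML attributes or content.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds one page that shows every direction in its own iframe, in a grid.
function buildComparisonPage(directions: DesignDirection[]): string {
  const panels = directions
    .map(
      (d) => `  <figure>
    <figcaption>${escapeHtml(d.name)}</figcaption>
    <iframe src="${escapeHtml(d.file)}" title="${escapeHtml(d.name)}"></iframe>
  </figure>`
    )
    .join("\n");
  return `<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>Design directions</title>
<style>
  body { display: grid; grid-template-columns: repeat(2, 1fr); gap: 1rem; margin: 1rem; }
  iframe { width: 100%; height: 80vh; border: 1px solid #ccc; }
</style>
</head>
<body>
${panels}
</body>
</html>`;
}

const page = buildComparisonPage([
  { name: "Brutalist", file: "brutalist.html" },
  { name: "Tokyo fintech", file: "tokyo-fintech.html" },
]);
```

Write the returned string to a file next to the generated mock-ups and serve the folder with any static file server; each iframe then loads its mock-up by relative path.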

Then take screenshots to give richer feedback. Audience question: who regularly screenshots back to Claude? (many hands). You should be doing this. Especially for “the spacing is slightly off”-class feedback — easier to highlight a region in a screenshot than to describe a misalignment in prose. Opus 4.7’s vision model is materially better than 4.6’s at understanding annotated screenshots.

Level 3 — Verification: Agent-native artifacts

This is the level the workshop spends the most time on, and the part Ara flags as what the Claude Code team actually does internally on every front-end change.

The thesis. Don’t bolt verification onto the artifact after the fact. Make verification part of the artifact’s contract. The component knows what state it’s in; let it publish that state somewhere a verifier can read independently of the component’s internals.

The DOM contract pattern

The workshop’s to-do list app is a small React app: add items, mark them done, drag to reorder, clear completed. Lots of internal state. Standard React state management.

The novel piece: every visible component carries data-verify-* attributes that publish the relevant state to the DOM:

data-verify-unit="total-stats" — names the contract this component participates in.
data-total="7", data-done="3", data-active="4" — the published numbers.

These values are written by the React component as it renders. They’re independent of the component’s internal useState / useReducer — if the React state changes, the data-attributes change to match. The state is announced to the DOM as part of the contract, not just used internally.
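The key property is that the attributes are derived from the same state the component renders rather than stored separately. A minimal sketch of that derivation, assuming a simple `Todo` shape; `totalStatsAttributes` is a hypothetical name, and the attribute names are the ones described above. In a React component the returned object could be spread onto the stats element, for example `<div {...totalStatsAttributes(todos)}>`; that wiring is an assumption, not code from the repo.

```typescript
// Hypothetical to-do shape; the workshop repo's actual types may differ.
interface Todo {
  id: string;
  title: string;
  done: boolean;
}

// Derives the total-stats contract from the same list the component renders,
// so the published numbers only drift from the state if this derivation is wrong.
function totalStatsAttributes(todos: Todo[]): Record<string, string> {
  const done = todos.filter((todo) => todo.done).length;
  return {
    "data-verify-unit": "total-stats",
    "data-total": String(todos.length),
    "data-done": String(done),
    "data-active": String(todos.length - done),
  };
}

// Seven items, the first three done: total 7, done 3, active 4.
const todos: Todo[] = Array.from({ length: 7 }, (_, i) => ({
  id: String(i),
  title: `Item ${i + 1}`,
  done: i < 3,
}));
const attrs = totalStatsAttributes(todos);
```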

Why split contract from internals? Two failure surfaces become independently detectable:

Visible component breaks: the UI shows the wrong number and the data-attributes show the wrong number too, so both verification surfaces fail.
Contract breaks but UI looks fine: the UI shows the right number while the data-attributes show a stale one, so the contract fails verification even though the user-facing render still looks correct. This is the case the workshop’s planted failure demonstrates: the published data-total does not equal the sum of the done and active counts the user sees.

Both are bugs worth catching. The split lets a verifier catch the second class — which is the kind of bug that ships to production undetected when you only test the human-visible surface.
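The split can be made concrete by comparing each surface against the state a fixture says should be shown. A sketch under stated assumptions: the `Stats` and `Observation` shapes, the `classify` helper, and the idea of reading the visible numbers back out of the rendered text are all hypothetical; the workshop’s matrix may detect the UI-side failure differently.

```typescript
interface Stats {
  total: number;
  done: number;
  active: number;
}

// One observation of the total-stats component: the numbers a user reads in
// the rendered text, and the numbers published as data-attributes
// (null when the data-verify-unit element is missing entirely).
interface Observation {
  visible: Stats;
  published: Stats | null;
}

function sameStats(a: Stats, b: Stats): boolean {
  return a.total === b.total && a.done === b.done && a.active === b.active;
}

// Reports which surface disagrees with the state the fixture expects.
function classify(observed: Observation, expected: Stats): string[] {
  const failures: string[] = [];
  if (!sameStats(observed.visible, expected)) failures.push("visible UI wrong");
  if (observed.published === null) failures.push("contract missing");
  else if (!sameStats(observed.published, expected)) failures.push("contract wrong");
  return failures;
}

const expected: Stats = { total: 7, done: 3, active: 4 };

// Visible component breaks: both surfaces are wrong.
const uiBug = classify(
  { visible: { total: 6, done: 3, active: 3 }, published: { total: 6, done: 3, active: 3 } },
  expected
); // ["visible UI wrong", "contract wrong"]

// Contract breaks but the UI looks fine: only the contract fails.
const contractBug = classify(
  { visible: expected, published: { total: 10, done: 3, active: 4 } },
  expected
); // ["contract wrong"]
```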

Schemas, fixtures, invariants, probes

Each component in the workshop’s verification matrix carries four artifacts:

Schemas — the shape of the data-attributes the component publishes. (E.g., { data-total: number, data-done: number, data-active: number }.)
Fixtures — Storybook-style known-good input states the component can be tested against.
Invariants — properties that must always hold across all fixtures (e.g., data-total === data-done + data-active). This is where the planted failure trips — 3 + 4 ≠ 10.
Probes — additional test cases that push the component off the happy path (empty list, very long list, list with all items done, etc.).
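One way to wire the four artifacts together, sketched in TypeScript. Every name here (`VerificationUnit`, `check`, `runUnit`, and the `render` function that stands in for mounting the component and reading its DOM) is a hypothetical illustration rather than the workshop repo’s format, and the schema check is simplified to “present and numeric”.

```typescript
type Attrs = Record<string, string>;

// Hypothetical shape of one verification unit.
interface VerificationUnit<Input> {
  unit: string; // value of data-verify-unit
  schema: string[]; // data-attributes that must be present and numeric
  render: (input: Input) => Attrs; // stand-in for mounting the component and reading its DOM
  fixtures: Record<string, Input>; // known-good input states
  probes: Record<string, Input>; // off-happy-path input states
  invariants: Record<string, (a: Attrs) => boolean>; // must hold for every case
}

// Checks one published state against the schema, then every invariant.
function check<Input>(unit: VerificationUnit<Input>, attrs: Attrs): string[] {
  const errors: string[] = [];
  for (const key of unit.schema) {
    const value = attrs[key];
    if (value === undefined || !Number.isFinite(Number(value))) {
      errors.push(`schema: ${key} missing or not numeric`);
    }
  }
  if (errors.length > 0) return errors; // invariants assume a valid shape
  for (const [name, holds] of Object.entries(unit.invariants)) {
    if (!holds(attrs)) errors.push(`invariant ${name} failed`);
  }
  return errors;
}

// Runs fixtures and probes through the same path, keyed by case name.
function runUnit<Input>(unit: VerificationUnit<Input>): Record<string, string[]> {
  const cases: Record<string, Input> = { ...unit.fixtures, ...unit.probes };
  const results: Record<string, string[]> = {};
  for (const [name, input] of Object.entries(cases)) {
    results[name] = check(unit, unit.render(input));
  }
  return results;
}

const num = (a: Attrs, key: string): number => Number(a[key]);

// Input is a list of done flags, one per to-do item.
const totalStats: VerificationUnit<boolean[]> = {
  unit: "total-stats",
  schema: ["data-total", "data-done", "data-active"],
  render: (items) => {
    const done = items.filter(Boolean).length;
    return {
      "data-verify-unit": "total-stats",
      "data-total": String(items.length),
      "data-done": String(done),
      "data-active": String(items.length - done),
    };
  },
  fixtures: { typical: [true, true, true, false, false, false, false] },
  probes: {
    empty: [],
    allDone: [true, true, true],
    veryLong: Array.from({ length: 1000 }, (_, i) => i % 2 === 0),
  },
  invariants: {
    totalIsSum: (a) => num(a, "data-total") === num(a, "data-done") + num(a, "data-active"),
  },
};

const results = runUnit(totalStats);
```

Because fixtures and probes share one path, adding a probe is one more entry rather than a new test, and each invariant is written once per unit.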

This is testing-library-shaped but agent-native. A human can read the dashboard; an agent can read the same matrix from the DOM via Playwright MCP; CI can run it headlessly. One contract, three execution surfaces.
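Reading the matrix from the DOM mostly comes down to selecting every element that declares a unit and collecting its data-* attributes. A minimal sketch: `collectUnits`, `ElementLike`, and `RootLike` are hypothetical names, and the structural types let it run without a browser. In a browser, `document` should satisfy `RootLike` when the DOM iterable typings are enabled. If the same unit is rendered twice, this version keeps only the last element.

```typescript
// Minimal structural types; a real Document and Element should satisfy them.
interface ElementLike {
  getAttributeNames(): string[];
  getAttribute(name: string): string | null;
}
interface RootLike {
  querySelectorAll(selector: string): Iterable<ElementLike>;
}

// Collects the data-* attributes of every element that declares a
// data-verify-unit, keyed by the unit name.
function collectUnits(root: RootLike): Map<string, Record<string, string>> {
  const units = new Map<string, Record<string, string>>();
  for (const el of root.querySelectorAll("[data-verify-unit]")) {
    const unit = el.getAttribute("data-verify-unit");
    if (unit === null) continue;
    const attrs: Record<string, string> = {};
    for (const name of el.getAttributeNames()) {
      if (name.startsWith("data-")) attrs[name] = el.getAttribute(name) ?? "";
    }
    units.set(unit, attrs); // a duplicate unit overwrites the earlier one
  }
  return units;
}

// Usage without a browser: a fake root that returns one element.
function fakeElement(attrs: Record<string, string>): ElementLike {
  return {
    getAttributeNames: () => Object.keys(attrs),
    getAttribute: (name) => (name in attrs ? attrs[name] : null),
  };
}

const units = collectUnits({
  querySelectorAll: () => [
    fakeElement({
      class: "stats",
      "data-verify-unit": "total-stats",
      "data-total": "7",
      "data-done": "3",
      "data-active": "4",
    }),
  ],
});
```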

Three execution surfaces

Surface	What it is	When to use
Human dashboard	A separate page in the app that renders the matrix of components + their published state + the schemas / fixtures / invariants. Human eyeballs it and clicks Run All.	Manual review of a new feature; design QA.
Agent-driven browser	Claude Code (Opus 4.7) connects to Playwright MCP, reads the verification matrix from the DOM, runs each invariant, and reports pass/fail. The operator types one prompt; Claude does the rest.	Mid-session verification during active development.
Headless CI	`run verify` from the CLI. Runs the same matrix without a browser UI. Output is the same pass/fail list.	CI checks; pre-commit hooks; scheduled regressions.

Same matrix; three operating modes. Verification doesn’t get rewritten when you move from one surface to the next — it’s the same DOM contract everywhere.
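The table only says headless output is the same pass/fail list; the workshop’s run verify implementation is not shown in the transcript. A sketch of the reporting end, assuming per-case results are already computed: `report` and `CaseResult` are hypothetical, and the exit-code rule (0 only when every case passes) is the usual CI convention, consistent with the non-zero exit the workshop reports on failure.

```typescript
// Hypothetical per-case result from running the matrix without a browser UI.
interface CaseResult {
  unit: string;
  caseName: string;
  errors: string[]; // empty when the case passed
}

// Turns results into pass/fail lines plus a summary, and picks an exit code:
// 0 when every case passed, 1 otherwise.
function report(results: CaseResult[]): { lines: string[]; exitCode: number } {
  const lines = results.map((r) =>
    r.errors.length === 0
      ? `PASS ${r.unit}/${r.caseName}`
      : `FAIL ${r.unit}/${r.caseName}: ${r.errors.join("; ")}`
  );
  const failed = results.filter((r) => r.errors.length > 0).length;
  lines.push(`${results.length - failed} passed, ${failed} failed`);
  return { lines, exitCode: failed === 0 ? 0 : 1 };
}

const outcome = report([
  { unit: "total-stats", caseName: "typical", errors: [] },
  { unit: "total-stats", caseName: "planted", errors: ["3 + 4 ≠ 10"] },
]);
for (const line of outcome.lines) console.log(line);
```

In a Node entry point you would print `outcome.lines` and set `process.exitCode = outcome.exitCode`, so the CI job fails whenever any case is red.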

The recorded-verification artifact

The workshop’s most striking internal-practice signal: the Claude Code team records every front-end change like this. Playwright runs each verification step → each verification produces a video clip → clips are stored (S3 / shared with colleagues) → the recording itself becomes the evidence the change works. This is verbatim from the talk:

“The [Claude Code] team uh records basically all the code changes that they do like this … all the front-end changes at least … especially at the pace of shipping we have at the moment.”

The recording bundle is the verification artifact. Not a green check in CI; a video evidence pack reviewers can replay.
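The talk does not say how the recordings are produced. If your verification runs through the @playwright/test runner, its built-in video and trace options are one way to get a replayable record of every run. This is an assumption about tooling, not the Claude Code team’s actual setup, and uploading to S3 or sharing with colleagues is left out.

```typescript
// playwright.config.ts: keep a video and a trace for every test run.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  outputDir: "test-results", // per-test videos and traces are written here
  use: {
    video: "on", // record every test, not only failing ones
    trace: "on", // step-by-step trace, viewable with `npx playwright show-trace`
  },
});
```

Setting `video: "retain-on-failure"` instead would keep recordings only for red runs, which is cheaper but loses the evidence that a passing change actually works.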

Demo: catching the planted failure

The workshop seeds a deliberate failure: the total-stats component’s published data-total is hardcoded to 10, but data-done + data-active evaluates to 7 (3 + 4). The visible UI still renders something plausible — the user wouldn’t notice. But the invariant data-total === data-done + data-active fires red.
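A minimal reproduction of the planted bug as described: done and active stay derived from the list while data-total is hardcoded. `totalStatsAttrs`, `plantedTotalStatsAttrs`, and `sumInvariant` are hypothetical names, and the failure message format mirrors the “3 + 4 ≠ 10” the dashboard surfaces.

```typescript
interface Todo {
  done: boolean;
}

// Correct derivation: every published number comes from the same list.
function totalStatsAttrs(todos: Todo[]): Record<string, string> {
  const done = todos.filter((todo) => todo.done).length;
  return {
    "data-verify-unit": "total-stats",
    "data-total": String(todos.length),
    "data-done": String(done),
    "data-active": String(todos.length - done),
  };
}

// The planted bug: done and active stay derived, but data-total is hardcoded.
function plantedTotalStatsAttrs(todos: Todo[]): Record<string, string> {
  return { ...totalStatsAttrs(todos), "data-total": "10" };
}

// The sum invariant; returns null when it holds, otherwise a readable failure.
function sumInvariant(attrs: Record<string, string>): string | null {
  const total = Number(attrs["data-total"]);
  const done = Number(attrs["data-done"]);
  const active = Number(attrs["data-active"]);
  return total === done + active ? null : `${done} + ${active} ≠ ${total}`;
}

const todos: Todo[] = [true, true, true, false, false, false, false].map((done) => ({ done }));
console.log(sumInvariant(totalStatsAttrs(todos))); // null
console.log(sumInvariant(plantedTotalStatsAttrs(todos))); // 3 + 4 ≠ 10
```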

Catching this is then demonstrated in three ways:

Human dashboard: the operator clicks Run All; the invariant fails; the dashboard surfaces “3 + 4 ≠ 10” with the offending component highlighted.
Agent-driven browser: Claude Code (Opus 4.7, Playwright MCP connected) is asked to “verify the to-do app.” Claude reads the matrix, runs each invariant, and returns the same failure with a diagnostic of which fixture triggered it.
Headless CI: run verify runs the same matrix and prints the failure to stdout, exiting with a non-zero code.

All three surfaces catch the same bug. None of them required hand-written tests for this specific component — the invariant comes from the verification matrix.

The orthogonal failure: contract change without UI change

Workshop then demonstrates the other direction. Operator deletes the data-verify-unit="total-stats" attribute from the component’s render output. The visible UI is unchanged — user-facing behavior identical. But the contract is now gone.

Run verification — every test that referenced total-stats fails because the unit is missing. The agent + dashboard + CLI all report the same failure: “contract total-stats not found in DOM.” Closes the gap between “app looks right” and “app is right.”
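Detecting a missing contract needs no invariant at all, only a presence check against the units the matrix expects. A sketch, assuming the matrix is a list of cases that each name the unit they read; `MatrixCase`, `verifyPresence`, and the `todo-list` unit are hypothetical, and the error text copies the message quoted above.

```typescript
interface MatrixCase {
  unit: string; // data-verify-unit value the case reads
  name: string;
}

interface PresenceResult {
  name: string;
  error: string | null; // null when the unit was found in the DOM
}

// Fails every case whose unit is absent from the units collected from the DOM.
function verifyPresence(cases: MatrixCase[], foundUnits: Set<string>): PresenceResult[] {
  return cases.map((c) => ({
    name: c.name,
    error: foundUnits.has(c.unit) ? null : `contract ${c.unit} not found in DOM`,
  }));
}

// After deleting data-verify-unit="total-stats", only the hypothetical
// todo-list unit is still found, so every total-stats case fails.
const presence = verifyPresence(
  [
    { unit: "total-stats", name: "typical" },
    { unit: "total-stats", name: "empty" },
    { unit: "todo-list", name: "reorder" },
  ],
  new Set(["todo-list"])
);
```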

Workshop repo + reproduction

The repo (referenced as CWC workshops / cloud-with-code workshops / how-we-claude-code in the auto-captioned transcript — actual org name needs verification, see Open Questions) contains three phases:

Phase 1: Interview-driven spec. The bill-splitting app prompt that explicitly invokes ask_user_question. Attendees can clone the repo, paste the prompt, and watch the interview unfold against their own answers.
Phase 2: Four HTML design directions. A prompt template for generating multiple design candidates as renderable HTML files. Run it, click through, pick one.
Phase 3: To-do app verification matrix. The real load-bearing demo: Storybook fixtures + DOM-published state + Playwright MCP + three execution surfaces, including the deliberately planted invariant failure. Comes with a detailed README plus a verification-specific README.

Workshop attendees clone the repo, follow the README, and run each phase. The repo is positioned as the workshop’s homework; as Ara puts it: “I encourage you to spend more time on the repo.”

Why this matters operationally

The interview-driven pattern is the upgrade path from chat-as-driver to model-as-extractor. Stop trying to write better specs. Write better triggers for the model to interview you. ask_user_question is the named tool that unlocks this on Claude Code.
HTML specs change the iteration economy. Markdown specs cost cycles every time you revise. HTML specs cost more tokens upfront and fewer cycles downstream. The math favors HTML at any non-trivial scale.
Agent-native verification is the only verification model that scales with longer-running agents. If you can’t verify autonomously, the agent has to stop and ask. Workshop’s split-contract-from-internals + three-execution-surfaces pattern lets an agent verify the same matrix a human can — without rewriting tests.
The internal-practice signal is the headline. The Claude Code team at Anthropic records every front-end change this way. That’s not a future-state recommendation; it’s the current working pattern for the team building Claude Code itself. Worth treating as ground truth on workflow, not a speculative architecture sketch.
Pairs with the /workflows and /goal story. The verification matrix is the test set that long-running /goal sessions and chained /workflows need to verify their work against. Without an agent-readable verification contract, you can’t autonomously check the work, and chained agents either over-trust each other or grind to a halt asking. See the /workflows walkthrough and the /goal walkthrough for the orchestration side; this article is the verification side that completes the loop.

Try It

Run the workshop repo locally. Find the actual GitHub org (auto-captioned as “CWC workshops” — see Open Questions; likely code-with-claude-workshops/how-we-claude-code or similar). Clone, follow the README. Phase 1 is fast; Phase 3 is the meaningful exercise.
Add ask_user_question to your next prompt. Before describing what you want, name the tool: “Before writing anything, use the ask_user_question tool to interview me about the requirements you don’t have enough context on yet.” Pair with auto-mode on so the interview converges without permission prompts. See auto-mode for the safety rules to set first.
Replace your next markdown spec with three HTML candidates. Same brief, three rendered design directions. Open all three in browser tabs. Pick by reaction. Note how much faster the decision feels than reading three markdown specs.
Pick one component in your app and add a data-verify-* contract. It doesn’t have to be the whole app. One component, one published-state set of data-attributes, one invariant. Run the invariant from Playwright MCP (or just the devtools console). Feel the difference between “the UI looks right” and “the contract holds.”
Set the recommended stack. Opus 4.7 + xhigh (or max) effort + auto-mode + fast mode if you have the budget. The workshop’s claim is that fast mode pays off on spec iteration despite costing more per token — fewer rounds beats cheaper rounds.
Record a Playwright verification run when you have a working verification matrix. Even if you don’t share it externally, the recording is the artifact that proves the change works, the same way the Claude Code team uses it internally.

Open Questions

GitHub org / repo URL. Auto-captioning rendered the workshop repo location as “CWC workshops / cloud-with-code workshops / how-we-claude-code” which is almost certainly garbled. Likely candidates: code-with-claude-workshops org on GitHub, repo how-we-claude-code. Worth a direct fetch on next ingest to surface the canonical URL.
“Tar” name / handle. The speaker references “Tar” as the SF talk’s author and the source of the “unreasonable effectiveness of HTML files” blog post. Almost certainly Thariq (see HTML Effectiveness (Thariq)), but the auto-captions render the name as “Tar”. Confirm spelling on first refresh.
Workshop slide deck / blog post URL. Ara references Anthropic engineering blog posts on harnesses, long-running agents, etc. Worth pulling the specific blog index referenced.
data-verify-* attribute schema. The workshop describes a generic pattern (data-verify-unit, data-total, etc.) but doesn’t formalize a schema for what attributes are mandatory vs. optional, or how nested components publish state. Worth a code-side pass when the repo is fetched.
Reusability of the verification matrix across non-React stacks. Workshop is React + Storybook. The DOM contract pattern is framework-agnostic in principle, but the fixtures + invariants tooling shown is Storybook-specific. Open: what does this look like on Vue / Svelte / SwiftUI / vanilla HTML?
The drag-and-drop interaction state. To-do app supports drag-to-reorder. How is transient state (mid-drag) verified — is it represented in the DOM contract, or only the post-drag-settled state? Source doesn’t clarify.

Related

HTML Effectiveness (Thariq) — the SF talk that motivates Level 2 (HTML over markdown). This London workshop is the London adaptation.
Code with Claude London 2026 — Opening Keynote — the conference this workshop runs at.
Managed Agents Self-Hosted Sandboxes + MCP Tunnels — paired London 2026 launch surface.
/workflows Walkthrough — the deterministic orchestration primitive this verification pattern feeds. Workflows verify against agent-native contracts.
/goal Walkthrough — long-running agent loops that need autonomous verification; this workshop’s matrix is the verification half.
CLI Reference — auto-mode, fast-mode, effort parameter all referenced by Ara as part of the recommended stack.
Extended Thinking — xhigh / max effort referenced as the workshop’s recommended setting.
Picking the Right Model — Build a Private Eval — eval-driven discipline; verification matrix is the same idea at component level.
Essential MCP Servers — playwright MCP is the agent-side driver of Level 3.
Troubleshooting Claude — context-rot mechanism that motivates moving from “long markdown specs” to “dense HTML specs” + verification contracts.

Jonathon's AI Wiki
