Source: raw/How_we_Claude_Code.md — YouTube transcript (video ID IlqJqcl8ONE, fetched 2026-05-25 via inbox-refresh wiki-inbox playlist). Workshop title: How We Claude Code. Speaker: Ara, member of Anthropic’s Applied AI team (architect). Venue: Code with Claude London 2026. Companion repo: CWC workshops / cloud-with-code workshops / how-we-claude-code (organization name garbled by auto-captions; actual GitHub org likely code-with-claude-workshops or similar — see Open Questions). The talk is presented as the London adaptation of Thariq’s San Francisco talk “The Unreasonable Effectiveness of HTML Files” (~10 days earlier). See HTML Effectiveness (Thariq) for the SF source.
How the Claude Code team at Anthropic actually uses Claude Code in their daily work — distilled to three levels of practice across prompting, planning, and verification. The thesis: as models get more capable, you should let the model do more of the upfront work of extracting requirements from you, switch from markdown specs to HTML specs for review, and embed verification as a contract into the artifact itself so an agent can drive it end-to-end. Workshop is hands-on with a three-phase repo: (1) bill-splitting app via ask_user_question interview, (2) four HTML design directions generated and compared visually, (3) to-do app with agent-native verification matrix (storybook fixtures + DOM-published state + playwright MCP) runnable from three surfaces — human dashboard, agent-driven browser, or headless CI.
Key Takeaways
- Three levels of practice. (1) Prompting: let Claude interview you. (2) Planning: HTML specs over markdown. (3) Verification: agent-native artifacts with DOM-published state. Each level builds on the previous.
- Resist constraining the model. Riffs on Richard Sutton’s bitter lesson — pouring more data and compute at a problem beats hand-engineered constraints over time. Practical implication: the model is probably better at extracting requirements from you than you are at specifying them. Don’t oversp-specify. Specify areas of interest, not outcomes.
- Use the
ask_user_questiontool explicitly in your prompts. Triggers Claude to interview you turn-by-turn instead of guessing from a thin brief. The workshop’s bill-splitting app demo opens with a prompt that names theask_user_questiontool; Claude then walks through a chain of focused yes/no + multiple-choice questions before writing the spec. - HTML over markdown for plans + design specs. Markdown files over ~200 lines won’t get read — by you or by your colleagues. HTML files are more information-dense, more ergonomic for human review, and let you take screenshots to feed back into Claude. Better than markdown across the entire spec-iteration loop.
- Generate multiple design directions, compare visually. The workshop demos four HTML design candidates (brutalist, Tokyo fintech, etc.) for the same bill-splitting app. Reviewing four rendered HTML files side-by-side is dramatically faster than reading four markdown bullet-list specs.
- Take screenshots as feedback to Claude. Especially in front-end work, “the alignment is slightly off here” is harder to articulate than it is to show. Opus 4.7 has a measurably better vision model than 4.6 — feed it screenshots aggressively when iterating on UI.
- Embed verification into the artifact, not into a separate test suite. Components publish their state to the DOM via
data-verify-*attributes. The same component matrix can be verified by a human dashboard, a Claude-Code-driven agent in a browser, or headless from CLI — three surfaces, one contract. - DOM-published state separates contract from internals. The component’s verification surface (data-attributes on visible DOM elements) is independent of the component’s React state internals. You can break the visible component and the verification contract catches it; you can also break the contract without breaking the visible component (which is what the workshop’s deliberately-planted failure demonstrates). Both are signals you want.
- The cloud code team records every front-end change like this. Workshop calls this out as internal Anthropic practice — playwright sessions are recorded and stored (S3 / shared with colleagues), and the recordings become the verification evidence. Same cadence as the team’s shipping rhythm.
- Recommended stack for this workflow. Opus 4.7 (better vision than 4.6),
xhigheffort ormaxeffort, auto-mode on, fast mode for spec iteration (costs more but iterates faster — “more tokens to generate the spec but fewer iterations overall”). Audience polled live: almost everyone uses auto-mode; fewer use fast mode; most set the effort parameter. - Token efficiency caveat on HTML specs. Q: aren’t HTML specs more token-inefficient than markdown? A: No, in the long run. A rich HTML spec results in fewer iteration cycles — the upfront token cost is amortized across fewer correction rounds.
Level 1 — Prompting: Let Claude interview you
The framing. Your users know what they want when they see it, but they’re often not very good at articulating it. Likewise, you probably know what you want when you see it — but Claude is likely better at extracting what you need from you than you are at specifying it. This is the practical handle on Sutton’s bitter lesson: don’t hand-engineer the spec; let the more-capable thing pull the spec out of you.
Bad prompting vs. good prompting. Bad: “Make it better.” Bad: “Build me a bill-splitting app.” These predefine the outcome at a level that’s both too vague and too narrow — Claude has nothing to anchor on.
Good prompting:
- Focus on the audience. “I’m building a bill-splitting app for friends going out to dinner.” — names the user, not the feature.
- Suggest areas to interrogate, not outcomes to deliver. “Ask me about how I want to handle uneven splits, how I want to handle currency, what to do when someone leaves the group.”
- Explicitly name the
ask_user_questiontool. The workshop prompt says something like “Before writing the spec, use theask_user_questiontool to interview me about my requirements.” This is the trigger that unlocks the interview loop.
The interview surface in action. When ask_user_question is in play, Claude doesn’t just emit one open-ended question — it sends a structured form with focused options. The operator tabs through, picks options, and submits. Claude reads the answers, asks more, and converges on a spec.
The demo audience question: who has used auto mode? (many hands). Use auto mode — it suppresses the per-tool permission prompts so the interview converges without you approving every step. Who sets the effort parameter? (many hands). Use xhigh or max for this workflow — the upfront thinking cost amortizes across the rest of the session.
Level 2 — Planning: HTML files over markdown
The problem with markdown specs. Once a markdown plan goes past ~200 lines, nobody reads it. Not you, not your colleagues, not the reviewer. The format makes detail expensive — every additional point steals more attention from every other point.
Why HTML wins. Information-density per square inch of screen. You can render four design directions side-by-side and see the differences instantly. You can include real components, real interactions, real layouts. You can take a screenshot, annotate it, paste it back into Claude with “this part is off” — and Opus 4.7’s vision model resolves your annotation against the rendered HTML faster than a markdown change-request roundtrip would.
The workshop demo: four design directions in parallel. For the bill-splitting app, Ara prompts Opus 4.7 to generate four HTML design directions in different aesthetic registers. The audience clicks through each rendered HTML file in a browser:
- Brutalist — heavy weight, monospace, all-caps headers.
- Tokyo fintech — clean, dense data display, financial-product styling.
- (Two more directions in the source, not extracted in detail from the transcript.)
Each HTML file is the full mock-up — clickable, interactive, with realistic content. Choosing one becomes a 30-second decision instead of a 30-minute reading session.
Then take screenshots to give richer feedback. Audience question: who regularly screenshots back to Claude? (many hands). You should be doing this. Especially for “the spacing is slightly off”-class feedback — easier to highlight a region in a screenshot than to describe a misalignment in prose. Opus 4.7’s vision model is materially better than 4.6’s at understanding annotated screenshots.
Level 3 — Verification: Agent-native artifacts
This is the level the workshop spends the most time on, and the part Ara flags as what the cloud code team actually does internally on every front-end change.
The thesis. Don’t bolt verification onto the artifact after the fact. Make verification part of the artifact’s contract. The component knows what state it’s in; let it publish that state somewhere a verifier can read independently of the component’s internals.
The DOM contract pattern
The workshop’s to-do list app is a small React app: add items, mark them done, drag to reorder, clear completed. Lots of internal state. Standard React state management.
The novel piece: every visible component carries data-verify-* attributes that publish the relevant state to the DOM:
data-verify-unit="total-stats"— names the contract this component participates in.data-total="7",data-done="3",data-active="4"— the published numbers.
These values are written by the React component as it renders. They’re independent of the component’s internal useState / useReducer — if the React state changes, the data-attributes change to match. The state is announced to the DOM as part of the contract, not just used internally.
Why split contract from internals? Two failure surfaces become independently detectable:
- Visible component breaks — UI shows wrong number; data-attributes also show wrong number; both verification surfaces fail.
- Contract breaks but UI looks fine — UI shows right number; data-attributes show stale number; the contract fails the verification but the user-facing render still looks correct. This is the case the workshop’s planted failure demonstrates — sums don’t match between the published
data-totaland what the user would see add up.
Both are bugs worth catching. The split lets a verifier catch the second class — which is the kind of bug that ships to production undetected when you only test the human-visible surface.
Schemas, fixtures, invariants, probes
Each component in the workshop’s verification matrix carries four artifacts:
- Schemas — the shape of the data-attributes the component publishes. (E.g.,
{ data-total: number, data-done: number, data-active: number }.) - Fixtures — Storybook-style known-good input states the component can be tested against.
- Invariants — properties that must always hold across all fixtures (e.g.,
data-total === data-done + data-active). This is where the planted failure trips — 3 + 4 ≠ 10. - Probes — additional test cases that push the component off the happy path (empty list, very long list, list with all items done, etc.).
This is testing-library-shaped but agent-native. A human can read the dashboard; an agent can read the same matrix from the DOM via playwright MCP; CI can run it headlessly. One contract, three execution surfaces.
Three execution surfaces
| Surface | What it is | When to use |
|---|---|---|
| Human dashboard | A separate page in the app that renders the matrix of components + their published state + the schemas / fixtures / invariants. Human eyeballs it and clicks Run All. | Manual review of a new feature; design QA. |
| Agent-driven browser | Claude Code (Opus 4.7) connects to playwright MCP, reads the verification matrix from the DOM, runs each invariant, reports pass/fail. Operator types one prompt; Claude does the rest. | Mid-session verification during active development. |
| Headless CI | run verify from the CLI. Runs the same matrix without a browser UI. Output is the same pass/fail list. | CI checks; pre-commit hooks; scheduled regressions. |
Same matrix; three operating modes. Verification doesn’t get rewritten when you move from one surface to the next — it’s the same DOM contract everywhere.
The recorded-verification artifact
The workshop’s most striking internal-practice signal: the Claude Code team records every front-end change like this. Playwright runs each verification step → each verification produces a video clip → clips are stored (S3 / shared with colleagues) → the recording itself becomes the evidence the change works. This is verbatim from the talk:
“The cloud code team uh records basically all the code changes that they do like this … all the front-end changes at least … especially at the pace of shipping we have at the moment.”
The recording bundle is the verification artifact. Not a green check in CI; a video evidence pack reviewers can replay.
Demo: catching the planted failure
The workshop seeds a deliberate failure: the total-stats component’s published data-total is hardcoded to 10, but data-done + data-active evaluates to 7 (3 + 4). The visible UI still renders something plausible — the user wouldn’t notice. But the invariant data-total === data-done + data-active fires red.
Catching this is then demonstrated in three ways:
- Human dashboard — operator clicks Run All; the invariant fails; the dashboard surfaces “3 + 4 ≠ 10” with the offending component highlighted.
- Agent-driven browser — Claude Code (Opus 4.7, playwright MCP connected) is asked “verify the to-do app.” Claude reads the matrix, runs each invariant, returns the same failure with a diagnostic of which fixture triggered it.
- Headless CI —
run verifyruns the same matrix and outputs the failure to stdout. Exit code non-zero.
All three surfaces catch the same bug. None of them required hand-written tests for this specific component — the invariant comes from the verification matrix.
The orthogonal failure: contract change without UI change
Workshop then demonstrates the other direction. Operator deletes the data-verify-unit="total-stats" attribute from the component’s render output. The visible UI is unchanged — user-facing behavior identical. But the contract is now gone.
Run verification — every test that referenced total-stats fails because the unit is missing. The agent + dashboard + CLI all report the same failure: “contract total-stats not found in DOM.” Closes the gap between “app looks right” and “app is right.”
Workshop repo + reproduction
The repo (referenced as CWC workshops / cloud-with-code workshops / how-we-claude-code in the auto-captioned transcript — actual org name needs verification, see Open Questions) contains three phases:
- Phase 1 — Interview-driven spec. The bill-splitting app prompt that explicitly invokes
ask_user_question. Audience can clone, paste the prompt, watch the interview unfold against their own answers. - Phase 2 — Four HTML design directions. Prompt template for generating multiple design candidates as renderable HTML files. Run, click through, pick one.
- Phase 3 — To-do app verification matrix. The real load-bearing demo. Storybook fixtures + DOM-published state + playwright MCP + three execution surfaces. Includes the deliberately-planted invariant failure. Detailed README + verification-specific README inside.
Workshop attendees clone the repo, follow the README, run each phase. The repo is positioned as the workshop’s homework — Ara explicitly: “I encourage you to spend more time on the repo.”
Why this matters operationally
- The interview-driven pattern is the upgrade path from chat-as-driver to model-as-extractor. Stop trying to write better specs. Write better triggers for the model to interview you.
ask_user_questionis the named tool that unlocks this on Claude Code. - HTML specs change the iteration economy. Markdown specs cost cycles every time you revise. HTML specs cost more tokens upfront and fewer cycles downstream. The math favors HTML at any non-trivial scale.
- Agent-native verification is the only verification model that scales with longer-running agents. If you can’t verify autonomously, the agent has to stop and ask. Workshop’s split-contract-from-internals + three-execution-surfaces pattern lets an agent verify the same matrix a human can — without rewriting tests.
- The internal-practice signal is the headline. The Claude Code team at Anthropic records every front-end change this way. That’s not a future-state recommendation; it’s the current working pattern for the team building Claude Code itself. Worth treating as ground truth on workflow, not a speculative architecture sketch.
- Pairs with the
/workflowsand/goalstory. The verification matrix is the test set that long-running/goalsessions and chained/workflowsneed to verify their work against. Without an agent-readable verification contract, you can’t autonomously check the work — and chained agents either over-trust each other or grind to a halt asking. See [[claude-ai/claude-code-workflows-tool-walkthrough|/workflowswalkthrough]] and [[claude-ai/claude-code-goal-command-walkthrough|/goalwalkthrough]] for the orchestration side; this article is the verification side that completes the loop.
Try It
- Run the workshop repo locally. Find the actual GitHub org (auto-captioned as “CWC workshops” — see Open Questions; likely
code-with-claude-workshops/how-we-claude-codeor similar). Clone, follow the README. Phase 1 is fast; Phase 3 is the meaningful exercise. - Add
ask_user_questionto your next prompt. Before describing what you want, name the tool: “Before writing anything, use theask_user_questiontool to interview me about the requirements you don’t have enough context on yet.” Pair with auto-mode on so the interview converges without permission prompts. See auto-mode for the safety rules to set first. - Replace your next markdown spec with three HTML candidates. Same brief, three rendered design directions. Open all three in browser tabs. Pick by reaction. Note how much faster the decision feels than reading three markdown specs.
- Pick one component in your app and add a
data-verify-*contract. Doesn’t have to be the whole app. One component, one published-state set of data-attributes, one invariant. Run the invariant from playwright MCP (or just devtools console). Feel the difference between “the UI looks right” and “the contract holds.” - Set the recommended stack. Opus 4.7 +
xhigh(ormax) effort + auto-mode + fast mode if you have the budget. The workshop’s claim is that fast mode pays off on spec iteration despite costing more per token — fewer rounds beats cheaper rounds. - Record a playwright verification run when you have a working verification matrix. Even if you don’t share it externally, the recording is the artifact that proves the change works — same way the Claude Code team uses it internally.
Open Questions
- GitHub org / repo URL. Auto-captioning rendered the workshop repo location as “CWC workshops / cloud-with-code workshops / how-we-claude-code” which is almost certainly garbled. Likely candidates:
code-with-claude-workshopsorg on GitHub, repohow-we-claude-code. Worth a direct fetch on next ingest to surface the canonical URL. - Tar’s name / handle. Speaker references “Tar” as the SF talk’s author and source of “the unreasonable effectiveness of HTML files” blog post. Almost certainly Thariq — see html-effectiveness Thariq — but the transcript renders the name without the q. Confirm spelling on first refresh.
- Workshop slide deck / blog post URL. Ara references Anthropic engineering blog posts on harnesses, long-running agents, etc. Worth pulling the specific blog index referenced.
data-verify-*attribute schema. The workshop describes a generic pattern (data-verify-unit,data-total, etc.) but doesn’t formalize a schema for what attributes are mandatory vs. optional, or how nested components publish state. Worth a code-side pass when the repo is fetched.- Reusability of the verification matrix across non-React stacks. Workshop is React + Storybook. The DOM contract pattern is framework-agnostic in principle, but the fixtures + invariants tooling shown is Storybook-specific. Open: what does this look like on Vue / Svelte / SwiftUI / vanilla HTML?
- The drag-and-drop interaction state. To-do app supports drag-to-reorder. How is transient state (mid-drag) verified — is it represented in the DOM contract, or only the post-drag-settled state? Source doesn’t clarify.
Related
- HTML Effectiveness (Thariq) — the SF talk that motivates Level 2 (HTML over markdown). This London workshop is the London adaptation.
- Code with Claude London 2026 — Opening Keynote — the conference this workshop runs at.
- Managed Agents Self-Hosted Sandboxes + MCP Tunnels — paired London 2026 launch surface.
- [[claude-ai/claude-code-workflows-tool-walkthrough|Claude Code
/workflowsWalkthrough]] — the deterministic orchestration primitive this verification pattern feeds. Workflows verify against agent-native contracts. - [[claude-ai/claude-code-goal-command-walkthrough|Claude Code
/goalWalkthrough]] — long-running agent loops that need autonomous verification; this workshop’s matrix is the verification half. - CLI Reference —
auto-mode,fast-mode,effortparameter all referenced by Ara as part of the recommended stack. - Extended Thinking —
xhigh/maxeffort referenced as the workshop’s recommended setting. - Picking the Right Model — Build a Private Eval — eval-driven discipline; verification matrix is the same idea at component level.
- Essential MCP Servers — playwright MCP is the agent-side driver of Level 3.
- Troubleshooting Claude — context-rot mechanism that motivates moving from “long markdown specs” to “dense HTML specs” + verification contracts.