“Enterprise LLM Architecture: Design Patterns, Anti-Patterns, and System Workflows for Production Deployments.” A richly illustrated reference guide from Anthropic that organizes production patterns across four architectural domains: Structured Data Extraction, Customer Support Orchestration, Developer Productivity, and Multi-Agent Systems. This is the visual/conceptual companion to the Official Exam Guide — where the exam guide lists task statements, the Playbook shows you how each pattern works with diagrams.

The Four Domains of AI Architecture

The Playbook organizes patterns across four production domains, each with distinct constraints:

  • Structured Data Extraction — high volume, strict schemas, batch pipelines
  • Customer Support Orchestration — stateful, human-in-the-loop, policy constraints
  • Developer Productivity — dynamic tasks, iterative context, advanced tool use
  • Multi-Agent Systems — parallel processing, shared memory, cross-agent synthesis

The Hierarchy of Constraints

Every production LLM system faces four competing constraints. The Playbook defines how to mitigate each:

| Constraint | Mitigation Strategy |
| --- | --- |
| Latency | Parallelization and caching |
| Accuracy | Structured intermediates and few-shot prompts |
| Cost | Batch APIs and context pruning |
| Compliance | Application-layer intercepts (NOT prompts) |

The critical insight: compliance is never solved by prompts. Even emphatic system prompts (“CRITICAL POLICY: NEVER process > $500”) still yield a 3% failure rate. Application-layer hooks that intercept tool calls are the only reliable enforcement mechanism.

Patterns and Anti-Patterns

Routing for Cost and SLA

Rule: Never default to real-time for asynchronous needs.

| Workload Type | Approach |
| --- | --- |
| Urgent exceptions | Real-time Messages API (high cost, minimal latency) |
| Standard workflows | Message Batches API (50% cost savings) |
| Continuous arrival (30h SLA) | Submit batches every 6 hours containing documents from that window |
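
A minimal sketch of this routing rule, assuming the anthropic Python SDK; the model name and queue handling are illustrative.

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # illustrative model choice

def route(doc_id: str, prompt: str, urgent: bool, batch_queue: list) -> None:
    if urgent:
        # Urgent exceptions: real-time Messages API (high cost, minimal latency).
        client.messages.create(
            model=MODEL,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
    else:
        # Standard workflows: queue for the Message Batches API (50% cost savings).
        batch_queue.append({
            "custom_id": doc_id,
            "params": {
                "model": MODEL,
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}],
            },
        })

def flush_batch(batch_queue: list) -> None:
    # Called every 6 hours; worst-case turnaround stays well inside the 30h SLA.
    client.messages.batches.create(requests=batch_queue)
    batch_queue.clear()
```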

Designing Resilient Schemas

  • Anti-pattern: Fragile enum expansion — continuously adding new enum values as edge cases arise. Eventually breaks validation for previously unseen inputs
  • Pattern: Resilient catch-all — add "other" to the enum paired with a detail string field. Captures unexpected values without validation failure
  • Data Evolution Rule: For amended documents, redesign schemas so amended fields capture multiple values with source location and effective date, rather than overwriting originals
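
A sketch of both rules as a JSON Schema fragment; the field names are hypothetical.

```python
# Hypothetical document schema illustrating the catch-all and the Data
# Evolution Rule: unseen categories land in "other" plus a detail string
# instead of failing validation, and amended fields accumulate values
# rather than overwriting them.
document_schema = {
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["invoice", "receipt", "contract", "other"],
        },
        "category_detail": {
            "type": ["string", "null"],
            "description": "Free-text label when category is 'other'.",
        },
        "contract_value": {  # amended field: a history, not a single slot
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "value": {"type": "number"},
                    "source_location": {"type": "string"},
                    "effective_date": {"type": "string", "format": "date"},
                },
            },
        },
    },
}
```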

Enforcing Mathematical Consistency

18% of invoice extractions show line items that do not match the grand total due to OCR or extraction errors.

Pattern: Schema Redundancy — extract both calculated_total (model sums items) and stated_total (extracted directly from document). Flag for human review ONLY when calculated_total != stated_total.
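
A minimal sketch of the check, assuming the extraction already carries line_items and stated_total fields:

```python
def needs_human_review(extraction: dict, tolerance: float = 0.01) -> bool:
    # Redundant totals: one computed from line items, one read off the page.
    calculated_total = sum(item["amount"] for item in extraction["line_items"])
    stated_total = extraction["stated_total"]
    # Flag ONLY on disagreement; matching totals pass straight through.
    return abs(calculated_total - stated_total) > tolerance
```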

Normalization and Null Handling

  • Problem: When fields are nullable, models may invent plausible data (e.g., attendee_count: "500") if not explicitly instructed to return null
  • Pattern: Add explicit null instructions: “If attendee count is not mentioned in the text, return null”
  • Problem: Inconsistent formats (“cotton blend” vs “Cotton/Polyester Mix”)
  • Solution: Few-shot standardization — provide 2-3 complete input-output pairs showing standardized formats. Do not rely on temperature 0 alone
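
A hedged example of the prompt shape, with illustrative fields: the explicit null instruction plus two complete input-output pairs that pin the standardized format.

```python
EXTRACTION_PROMPT = """\
Extract the material and attendee_count fields as JSON.
If attendee count is not mentioned in the text, return null. Do not guess.

Example 1
Input: "Made from a soft cotton blend."
Output: {"material": "Cotton/Polyester Mix", "attendee_count": null}

Example 2
Input: "100% merino wool caps; the expo drew 2,000 visitors."
Output: {"material": "Wool", "attendee_count": 2000}

Input: {document}
Output:"""
```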

The Limits of Automated Retry

  • Effective: Formatting errors (nested objects vs flat arrays, locale-formatted strings). Appending specific validation errors to the prompt resolves most failures in 2-3 attempts
  • Ineffective: Missing information (e.g., source says “et al.” and points to an unprovided external document). No amount of retrying will produce information that does not exist in the source
  • Rule: Recognize when to fail fast. Retries for missing information waste tokens
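
A sketch of the bounded retry loop; extract and validate are hypothetical callables from your pipeline, and validation errors are assumed to carry a category.

```python
def extract_with_retry(prompt, extract, validate, max_attempts=3):
    for _ in range(max_attempts):
        result = extract(prompt)
        errors = validate(result)
        if not errors:
            return result
        if any(e.category == "missing_information" for e in errors):
            return None  # fail fast: retries cannot invent absent data
        # Formatting errors: append the specific validation errors and retry.
        prompt += "\n\nYour last output failed validation:\n" + "\n".join(
            e.message for e in errors
        )
    return None
```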

Calibrating Human-in-the-Loop

  • Have the model output field-level confidence scores
  • Automate extractions with confidence above 90%
  • Critical validation step: Analyze accuracy by document type AND field to verify high-confidence extractions perform consistently across all segments, not just in aggregate
  • Aggregate metrics mask field-level issues — a 95% overall accuracy can hide a 60% accuracy on a specific field type
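
A sketch of the segment check; the eval-record fields are assumptions about your logging format.

```python
from collections import defaultdict

def accuracy_by_segment(records: list[dict]) -> dict:
    # records like {"doc_type": "invoice", "field": "due_date", "correct": True}
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        key = (r["doc_type"], r["field"])
        totals[key] += 1
        hits[key] += int(r["correct"])
    # Per-segment accuracy exposes the 60% field a 95% aggregate would hide.
    return {key: hits[key] / totals[key] for key in totals}
```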

Zero-Tolerance Compliance

  • The trap: Relying on emphatic system prompts (“CRITICAL POLICY: NEVER process > $500”) still yields a ~3% failure rate
  • The standard: Implement an application-layer hook to intercept tool calls. When the process amount exceeds the threshold, block it server-side and invoke escalation
  • Model discretion is removed. The enforcement is deterministic
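
A minimal sketch of such a hook, assuming the agent loop surfaces every tool call before execution; the tool and handler names are hypothetical.

```python
THRESHOLD = 500  # the Playbook's example limit

def intercept_tool_call(name, args, execute_tool, escalate_to_human):
    if name == "process_payment" and args.get("amount", 0) > THRESHOLD:
        escalate_to_human(args)  # hand off to a person, server-side
        # The model never gets discretion: the block is deterministic.
        return {"blocked": True,
                "reason": f"Amounts over ${THRESHOLD} require human approval."}
    return execute_tool(name, args)
```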

Resuming Asynchronous Sessions

  • Problem: Resuming a session hours later leads to the model confidently stating outdated status from previous tool calls
  • Pattern: Resume with full conversation history but programmatically filter out previous tool_result messages. Keep only human/assistant turns so the agent is forced to re-fetch current data
  • This ensures returning customers always receive fresh, current information
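
A sketch of the resume filter for Messages-API-style history. It also drops the paired tool_use blocks so no orphaned references remain; that detail is an assumption beyond the text.

```python
def strip_stale_tool_results(history: list[dict]) -> list[dict]:
    cleaned = []
    for msg in history:
        content = msg["content"]
        if isinstance(content, list):
            content = [b for b in content
                       if b.get("type") not in ("tool_use", "tool_result")]
            if not content:
                continue  # the turn held only stale tool traffic
        cleaned.append({**msg, "content": content})
    return cleaned  # human/assistant text survives; the agent must re-fetch
```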

Tool Context Pruning

  • The bloat: Repeatedly calling lookup_order fills the context window with verbose shipping and payment data when only the return status is needed
  • Pattern: Application-side filtering — extract only relevant fields (items, purchase data, return window, status) from each API response before it enters the conversation
  • This prevents context accumulation from verbose tool responses
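
A sketch of application-side filtering; the relevant keys are illustrative.

```python
RELEVANT_FIELDS = ("items", "purchase_date", "return_window", "status")

def prune_order_response(raw_response: dict) -> dict:
    # Verbose shipping and payment sub-objects never enter the context window.
    return {k: raw_response[k] for k in RELEVANT_FIELDS if k in raw_response}
```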

Graceful Tool Failure

  • Anti-pattern: Throwing application exceptions that crash the agent, or returning empty strings
  • Correct pattern: Return the error message in the tool result content with the isError flag set to true. Include errorCategory (transient/validation/permission) and isRetryable boolean
  • Agent receives structured error information and can communicate appropriately to the user
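
A sketch of the structured error result, following the isError convention the text describes; the message text is illustrative.

```python
def tool_error(message: str, category: str, retryable: bool) -> dict:
    return {
        "content": [{"type": "text", "text": message}],
        "isError": True,
        "errorCategory": category,  # "transient" | "validation" | "permission"
        "isRetryable": retryable,
    }

# A timeout the agent may retry and can explain to the user, not a crash:
result = tool_error("Order lookup timed out after 5s.", "transient", True)
```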

The Escalation Handoff

Two escalation paths based on trigger type:

  • “I want a human NOW”: escalate immediately; honor the request without asking for more clarification
  • Complex policy issue — context gathering first. Ensure account context tools (get_customer) are called before escalating
  • The payload: Do not dump raw transcripts. Pass a structured summary: Customer ID, Root Cause, Amount, Recommended Action
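
A sketch of the handoff payload named above; the exact types are assumptions.

```python
from dataclasses import dataclass

@dataclass
class EscalationPayload:
    customer_id: str
    root_cause: str          # one-sentence diagnosis, not a raw transcript
    amount: float            # disputed or requested amount, if any
    recommended_action: str  # what the agent suggests the human do next
```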

Compressing Long Sessions

  • The challenge: A single session covers a refund inquiry, a subscription question, and a payment update across 48 turns. Context limits approach
  • The strategy: Summarize earlier, resolved turns into a narrative description. Preserve full verbatim message history only for the active, unresolved issue
  • This dramatically reduces context usage while maintaining accuracy for the current problem
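
A sketch of the strategy; summarize is a hypothetical call back into the model.

```python
def compress_session(resolved_turns, active_turns, summarize):
    # Resolved topics collapse into one narrative line; the live issue
    # keeps its verbatim turns.
    summary = summarize(resolved_turns)  # e.g. "Refund issued; plan upgraded."
    return [
        {"role": "user", "content": f"[Earlier this session: {summary}]"},
        *active_turns,
    ]
```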

MCP Tool Specificity

  • The trap: Providing a broad custom tool (analyze_dependencies) alongside built-in tools like Grep. The agent defaults to Grep
  • The fix: Split broad tools into highly granular, single-purpose tools (list_imports, resolve_transitive_deps, detect_circular_deps). Enhance descriptions to explicitly detail capabilities and when to prefer them over text manipulation
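
A sketch of the split; the descriptions are hypothetical but show the level of detail that steers the agent away from text search.

```python
DEPENDENCY_TOOLS = [
    {
        "name": "list_imports",
        "description": (
            "Return every module imported by a file, resolved from the AST. "
            "Prefer this over Grep: it catches aliased and conditional "
            "imports that text search misses."
        ),
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "name": "detect_circular_deps",
        "description": (
            "Walk the full import graph and report cycles. Always use this "
            "for circular-dependency questions; Grep cannot follow a graph."
        ),
        "input_schema": {"type": "object", "properties": {}},
    },
]
```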

Directed Codebase Exploration

  • Anti-pattern: Reading all 15 files sequentially. Overloads context with unrelated data
  • Pattern: Dynamic investigation — (1) Analyze imports and base interfaces, (2) trace specific implementations, (3) dynamically generate subtasks based on findings
  • For new engineers on 800+ file codebases: read CLAUDE.md/README first, then ask the human for priority files
  • For intermittent bugs: have the agent dynamically generate investigation subtasks, adapting as errors emerge

Branching Reality (fork_session)

  • Problem: Exploring two distinct refactoring approaches in a single thread confuses the agent and mixes context
  • Pattern: Use fork_session to create two separate branches from a foundational analysis. Each branch explores independently without context contamination
  • This enables A/B comparison of approaches (e.g., microservice extraction vs in-place refactor)

The Scratchpad Pattern

  • The decay: In extended sessions (30+ minutes), accumulated token bloat causes inconsistent answers about early discoveries
  • Pattern: Agent maintains a scratchpad.md recording key findings, architectural maps, and decisions. References this structured file for subsequent questions
  • Prevents context decay in long exploration sessions
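
A minimal sketch of the discipline; the file name follows the text, the section format is an assumption.

```python
from pathlib import Path

SCRATCHPAD = Path("scratchpad.md")

def record_finding(section: str, finding: str) -> None:
    # Append findings as they emerge; re-read this file instead of
    # trusting a 30-minute-old context window.
    with SCRATCHPAD.open("a") as f:
        f.write(f"\n## {section}\n- {finding}\n")

record_finding("Architecture map", "AuthService owns all token refresh logic.")
```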

Resumption in Dynamic Environments

  • Scenario: Engineer resumes exploration, but 3 of 12 files have been altered by a teammate’s PR overnight
  • Pattern: Resume from previous transcript, but explicitly inform the agent which specific files or functions changed for targeted re-analysis. Do not force a complete re-read
  • Command: resume_session --update_context={files:['File C', 'File D', 'File E'], changes:'renamed utility functions'}

Shared Memory Architecture

  • Anti-pattern: Daisy-chaining conversation logs between subagents. Each agent sees all previous conversation, leading to context bloat
  • Pattern: Shared vector store. Subagents index their findings; subsequent agents retrieve via semantic search. Only relevant findings are surfaced, not entire conversation histories
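
A toy sketch of the shared store; embed is a stand-in for a real embedding model.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy deterministic embedder; production systems call a real model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

class SharedFindings:
    def __init__(self):
        self.vectors, self.texts = [], []

    def index(self, agent: str, finding: str) -> None:
        # Each subagent writes findings, never its whole transcript.
        self.vectors.append(embed(finding))
        self.texts.append(f"[{agent}] {finding}")

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Later agents pull only the k most relevant findings.
        scores = np.array(self.vectors) @ embed(query)
        return [self.texts[i] for i in np.argsort(scores)[-k:][::-1]]
```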

Forcing Execution Order

  • Problem: Relying on prompt instructions (“call extract_metadata first”) is mere “prompt begging”; the model may still ignore the requested order
  • Pattern: Use tool_choice forced selection for the first API call to guarantee pipeline order. Process subsequent steps in follow-up turns
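
A sketch of forced selection with the Messages API tool_choice parameter; the tool itself is illustrative.

```python
import anthropic

client = anthropic.Anthropic()

extract_metadata_tool = {
    "name": "extract_metadata",
    "description": "Pull title, date, and author from a raw document.",
    "input_schema": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
}

first_turn = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model choice
    max_tokens=1024,
    tools=[extract_metadata_tool],
    # Guaranteed pipeline order: the model MUST call extract_metadata now.
    tool_choice={"type": "tool", "name": "extract_metadata"},
    messages=[{"role": "user", "content": "<raw document text>"}],
)
# Subsequent steps run in follow-up turns with tool_choice={"type": "auto"}.
```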

Structured Intermediate Representations

  • When subagents produce heterogeneous outputs (financial JSON, news prose, patent lists), a format conversion layer standardizes everything into a common format
  • Common format: {claim, evidence, source, confidence}
  • This enables the synthesis agent to process findings uniformly regardless of source type
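
A sketch of the common format plus one hypothetical converter for the financial-JSON case.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    claim: str         # one assertion the synthesis agent can weigh
    evidence: str      # supporting quote or figure
    source: str        # filing, article, or patent identifier
    confidence: float  # subagent's self-assessed 0-1 score

def from_financial_json(record: dict) -> Finding:
    # Field names on the input record are assumptions.
    return Finding(
        claim=f"{record['metric']} was {record['value']}",
        evidence=record["excerpt"],
        source=record["filing_id"],
        confidence=record.get("confidence", 0.9),
    )
```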

Parallelization and Caching

  • Parallel subagents for independent data retrieval: e.g., 12 legal precedents in ~30 seconds vs 3+ minutes sequential
  • Apply prompt caching on the synthesis subagent for 80K+ token accumulated findings
  • Combine parallelization (speed) with caching (cost reduction) for maximum efficiency
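
A sketch combining both levers, assuming the async anthropic SDK; run_subagent is a hypothetical coroutine.

```python
import asyncio
import anthropic

client = anthropic.AsyncAnthropic()

async def research_all(queries, run_subagent):
    # Independent retrievals fan out in parallel instead of running serially.
    return await asyncio.gather(*(run_subagent(q) for q in queries))

async def synthesize(findings_text: str, question: str):
    return await client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model choice
        max_tokens=2048,
        system=[{
            "type": "text",
            "text": findings_text,  # the 80K+ token accumulated findings
            "cache_control": {"type": "ephemeral"},  # reused across turns
        }],
        messages=[{"role": "user", "content": question}],
    )
```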

Goal-Oriented Delegation

  • Anti-pattern: Procedural micromanagement — telling subagents exactly which steps to take
  • Pattern: Goal-oriented — specify research goals + quality criteria, let the subagent determine its own strategy
  • Goal-oriented delegation produces better results because the subagent can adapt to what it finds
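
A hedged illustration of the two prompt styles for the same subagent task:

```python
PROCEDURAL = """Search the SEC filing index for the 10-K, open section 1A,
copy the first three risk factors, then search news from the last 30 days,
then ..."""  # anti-pattern: the subagent cannot adapt to what it finds

GOAL_ORIENTED = """Goal: assess ACME Corp's top regulatory risks.
Quality bar: every claim cited to a primary source; flag anything you
could not verify. Choose your own search strategy."""
```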

The Architect’s Reference Matrix

Maps the recommended solution pattern to each domain and constraint combination. Green cells in the original Playbook indicate the primary solution for that intersection:

| Challenge | Data Extraction | Customer Support | Dev Productivity | Multi-Agent |
| --- | --- | --- | --- | --- |
| Token Bloat | | Filter Stale Results | Scratchpad File | Shared Vector Store |
| Latency | Batch Routing | | | Parallelization & Caching |
| Compliance/Control | | App-Layer Intercepts | | tool_choice Enforcement |
| Accuracy | Schema Redundancy | | Granular MCP Tools | Structured Intermediates |

Use this matrix as a study aid: for each green cell, understand the pattern and why it is the primary solution for that domain-constraint pair. Empty cells indicate the constraint is not the primary concern for that domain.

Production Architecture Blueprint

The Playbook concludes with a layered production architecture showing how all patterns fit together:

  1. Layer 1: Ingestion & Routing — Pattern router classifies incoming work as real-time or batch. Intelligence is at the edges — the router applies strict typing in the middle
  2. Execution Layer — Granular tools (Tool A, B, C…) paired with application-layer intercepts (validation guardrails, policy enforcement, schema checks). The intercepts guard the core — every tool call passes through them
  3. State Management — Pruning logic and shared vector store sit beneath the execution layer, sustaining context window management across the lifecycle
  4. Synthesis — Result aggregation, formatting, and delivery. The final output layer

The key insight: application intercepts sit between the tools and the output, not as a separate layer but integrated into the execution path. This makes compliance enforcement unavoidable rather than optional.

Key Takeaways

  • The Hierarchy of Constraints (latency, accuracy, cost, compliance) is the foundational framework — every architectural decision should be evaluated against it
  • Compliance is NEVER enforced via prompts. Application-layer intercepts are the only reliable mechanism. Emphatic prompts still fail ~3% of the time
  • Schema redundancy (calculated_total vs stated_total) is the standard pattern for catching extraction errors
  • Retries are effective for formatting errors but useless for missing information — know when to fail fast
  • Tool descriptions are more important than tool implementations for agent reliability — agents cannot use tools they cannot find
  • The Scratchpad Pattern prevents context decay in long sessions and is a practical must-have for developer productivity agents
  • Shared vector stores beat daisy-chained conversation logs for multi-agent memory
  • Goal-oriented delegation outperforms procedural micromanagement for subagent prompts
  • The Reference Matrix is an excellent exam study aid — it maps every solution pattern to its domain

Try It

  1. Map each Playbook pattern to its exam task statement in the Official Exam Guide. For example, “Zero-Tolerance Compliance” maps to Task 1.5 (hooks for tool call interception) and Task 1.4 (enforcement patterns)
  2. Build a small customer support agent that implements at least 5 patterns from this Playbook: escalation handoff, tool context pruning, graceful tool failure, session compression, and compliance intercepts
  3. Draw the Production Architecture Blueprint from memory — this tests whether you understand the layered architecture
  4. For each anti-pattern, write down a scenario where the anti-pattern would actually be the right choice. If you cannot, that confirms the pattern is robust
  5. Create flashcards for the Reference Matrix — quiz yourself on which solution applies to which domain and challenge