Source: raw/The Architect’s Playbook.pdf
“Enterprise LLM Architecture: Design Patterns, Anti-Patterns, and System Workflows for Production Deployments.” A richly illustrated reference guide from Anthropic that organizes production patterns across four architectural domains: Structured Data Extraction, Customer Support Orchestration, Developer Productivity, and Multi-Agent Systems. This is the visual/conceptual companion to the Official Exam Guide — where the exam guide lists task statements, the Playbook shows you how each pattern works with diagrams.
The Four Domains of AI Architecture
The Playbook organizes patterns across four production domains, each with distinct constraints:
- Structured Data Extraction — high volume, strict schemas, batch pipelines
- Customer Support Orchestration — stateful, human-in-the-loop, policy constraints
- Developer Productivity — dynamic tasks, iterative context, advanced tool use
- Multi-Agent Systems — parallel processing, shared memory, cross-agent synthesis
The Hierarchy of Constraints
Every production LLM system faces four competing constraints. The Playbook defines how to mitigate each:
| Constraint | Mitigation Strategy |
|---|---|
| Latency | Parallelization and caching |
| Accuracy | Structured intermediates and few-shot prompts |
| Cost | Batch APIs and context pruning |
| Compliance | Application-layer intercepts (NOT prompts) |
The critical insight: compliance is never solved by prompts. Even emphatic system prompts (“CRITICAL POLICY: NEVER process > $500”) still yield a 3% failure rate. Application-layer hooks that intercept tool calls are the only reliable enforcement mechanism.
Patterns and Anti-Patterns
Routing for Cost and SLA
Rule: Never default to real-time for asynchronous needs.
| Workload Type | Approach |
|---|---|
| Urgent exceptions | Real-time Messages API (high cost, low latency) |
| Standard workflows | Message Batches API (50% cost savings) |
| Continuous arrival (30h SLA) | Submit batches every 6 hours containing documents from that window |
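The 6-hour batching cadence above can be sketched as a small router. The class and field names are illustrative, and the flush step stands in for a real Message Batches API submission:

```python
from dataclasses import dataclass, field

@dataclass
class BatchRouter:
    """Routing sketch: urgent work goes real-time, everything else is
    buffered and flushed as a batch on a fixed cadence."""
    flush_interval_hours: float = 6.0
    _buffer: list = field(default_factory=list)
    _last_flush: float = 0.0

    def route(self, doc: dict, now_hours: float) -> str:
        if doc.get("urgent"):
            return "realtime"  # Messages API: high cost, low latency
        self._buffer.append(doc)
        if now_hours - self._last_flush >= self.flush_interval_hours:
            self.flush(now_hours)
            return "batched-flushed"
        return "buffered"

    def flush(self, now_hours: float) -> list:
        # In production this window would be submitted via the
        # Message Batches API (50% cost savings, processed within the SLA)
        batch, self._buffer = self._buffer, []
        self._last_flush = now_hours
        return batch
```

With a 30-hour SLA, four flushes per day leave ample headroom even if a batch takes the full processing window.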
Designing Resilient Schemas
- Anti-pattern: Fragile enum expansion — continuously adding new enum values as edge cases arise. Eventually breaks validation for previously unseen inputs
- Pattern: Resilient catch-all — add "other" to the enum paired with a detail string field. Captures unexpected values without validation failure
- Data Evolution Rule: For amended documents, redesign schemas so amended fields capture multiple values with source location and effective date, rather than overwriting originals
Enforcing Mathematical Consistency
18% of invoice extractions show line items that do not match the grand total due to OCR or extraction errors.
Pattern: Schema Redundancy — extract both calculated_total (model sums items) and stated_total (extracted directly from document). Flag for human review ONLY when calculated_total != stated_total.
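A minimal sketch of the redundancy check, assuming line items carry an amount field (field names are illustrative):

```python
def check_totals(extraction: dict, tolerance: float = 0.01) -> dict:
    """Schema-redundancy check: compare the model-summed total against
    the total extracted verbatim from the document."""
    calculated = round(sum(item["amount"] for item in extraction["line_items"]), 2)
    stated = extraction["stated_total"]
    return {
        "calculated_total": calculated,
        "stated_total": stated,
        # Flag for human review ONLY when the two totals disagree
        "needs_review": abs(calculated - stated) > tolerance,
    }
```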
Normalization and Null Handling
- Problem: When fields are nullable, models may invent plausible data (e.g., attendee_count: "500") if not explicitly instructed to return null
- Pattern: Add explicit null instructions: “If attendee count is not mentioned in the text, return null”
- Problem: Inconsistent formats (“cotton blend” vs “Cotton/Polyester Mix”)
- Solution: Few-shot standardization — provide 2-3 complete input-output pairs showing standardized formats. Do not rely on temperature 0 alone
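Both fixes can live in the prompt builder. A sketch, with an illustrative material field and invented few-shot pairs:

```python
def build_extraction_prompt(text: str) -> str:
    """Prompt sketch combining an explicit null instruction with
    few-shot standardization pairs (examples are illustrative)."""
    examples = [
        ("Made from a cotton blend.", '{"material": "Cotton/Polyester Mix"}'),
        ("100% merino wool sweater.", '{"material": "Wool"}'),
    ]
    shots = "\n\n".join(f"Input: {i}\nOutput: {o}" for i, o in examples)
    return (
        "Extract the material as JSON.\n"
        "If the material is not mentioned in the text, return null "
        "for that field. Do not guess.\n\n"
        f"{shots}\n\nInput: {text}\nOutput:"
    )
```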
The Limits of Automated Retry
- Effective: Formatting errors (nested objects vs flat arrays, locale-formatted strings). Appending specific validation errors to the prompt resolves most failures in 2-3 attempts
- Ineffective: Missing information (e.g., source says “et al.” and points to an unprovided external document). No amount of retrying will produce information that does not exist in the source
- Rule: Recognize when to fail fast. Retries for missing information waste tokens
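A sketch of the retry policy, assuming the validator reports an error_type that distinguishes formatting problems from missing information (the result shape is an assumption):

```python
def retry_extraction(call, prompt: str, max_attempts: int = 3):
    """Retry sketch: formatting errors get the validation message
    appended and retried; missing-information errors fail fast."""
    for _ in range(max_attempts):
        result = call(prompt)
        if result["ok"]:
            return result["data"]
        if result["error_type"] == "missing_information":
            # No retry can invent data absent from the source document
            raise ValueError(f"fail fast: {result['message']}")
        # Formatting error: feed the validator's complaint back to the model
        prompt += f"\n\nYour last output failed validation: {result['message']}"
    raise ValueError("exhausted retries")
```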
Calibrating Human-in-the-Loop
- Have the model output field-level confidence scores
- Automate extractions with confidence above 90%
- Critical validation step: Analyze accuracy by document type AND field to verify high-confidence extractions perform consistently across all segments, not just in aggregate
- Aggregate metrics mask field-level issues — a 95% overall accuracy can hide a 60% accuracy on a specific field type
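The segment-level check can be sketched as a small aggregation, assuming each validation record carries its document type and field (the record shape is illustrative):

```python
from collections import defaultdict

def accuracy_by_segment(records: list) -> dict:
    """Compute accuracy per (document_type, field) segment so an
    aggregate metric cannot hide one weak field."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [correct, total]
    for r in records:
        key = (r["doc_type"], r["field"])
        totals[key][0] += int(r["correct"])
        totals[key][1] += 1
    return {k: correct / n for k, (correct, n) in totals.items()}
```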
Zero-Tolerance Compliance
- The trap: Relying on emphatic system prompts (“CRITICAL POLICY: NEVER process > $500”) still yields a ~3% failure rate
- The standard: Implement an application-layer hook to intercept tool calls. When the requested amount exceeds the threshold, block it server-side and invoke escalation
- Model discretion is removed. The enforcement is deterministic
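A sketch of such an intercept, assuming a hypothetical process_refund tool and a dict-shaped tool call:

```python
def amount_guard(tool_call: dict, threshold: float = 500.0) -> dict:
    """Application-layer intercept sketch: runs server-side on every
    proposed tool call, before execution. Names are illustrative."""
    if tool_call["name"] == "process_refund" and tool_call["input"]["amount"] > threshold:
        # Deterministic block: the model gets no discretion here
        return {"allowed": False, "action": "escalate_to_human",
                "reason": f"amount exceeds ${threshold:.0f} threshold"}
    return {"allowed": True}
```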
Resuming Asynchronous Sessions
- Problem: Resuming a session hours later leads to the model confidently stating outdated status from previous tool calls
- Pattern: Resume with full conversation history but programmatically filter out previous tool_result messages. Keep only human/assistant turns so the agent is forced to re-fetch current data
- This ensures returning customers always receive fresh, current information
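A sketch of the history filter, assuming Anthropic-style messages where tool traffic appears as tool_use and tool_result content blocks:

```python
def strip_stale_tool_results(history: list) -> list:
    """Resume sketch: drop prior tool_use/tool_result blocks so the
    agent must re-fetch current data instead of trusting stale state."""
    fresh = []
    for msg in history:
        content = msg["content"]
        if isinstance(content, list):
            content = [b for b in content
                       if b.get("type") not in ("tool_use", "tool_result")]
            if not content:
                continue  # message held only stale tool traffic
        fresh.append({**msg, "content": content})
    return fresh
```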
Tool Context Pruning
- The bloat: Repeatedly calling lookup_order fills the context window with verbose shipping and payment data when only the return status is needed
- Pattern: Application-side filtering — extract only relevant fields (items, purchase data, return window, status) from each API response before it enters the conversation
- This prevents context accumulation from verbose tool responses
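A sketch of the application-side filter. The field names mirror the list above, but the exact response shape is an assumption:

```python
def prune_order_response(raw: dict) -> dict:
    """Keep only the fields the return flow needs before the tool
    response enters the context window."""
    keep = ("items", "purchase_date", "return_window", "status")
    return {k: raw[k] for k in keep if k in raw}
```

Verbose shipping and payment sub-objects never reach the conversation, so repeated lookups stop inflating the context.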
Graceful Tool Failure
- Anti-pattern: Throwing application exceptions that crash the agent, or returning empty strings
- Correct pattern: Return the error message in the tool result content with the isError flag set to true. Include errorCategory (transient/validation/permission) and an isRetryable boolean
- Agent receives structured error information and can communicate appropriately to the user
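A sketch of the structured error result. The field names follow the convention above; mapping Python exception types to the three categories is an assumption:

```python
def tool_error_result(tool_use_id: str, exc: Exception) -> dict:
    """Build a structured error tool result: the failure travels back
    to the agent as content instead of crashing the loop."""
    if isinstance(exc, PermissionError):
        category, retryable = "permission", False
    elif isinstance(exc, (TimeoutError, ConnectionError)):
        category, retryable = "transient", True
    else:
        category, retryable = "validation", False
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "is_error": True,
        "content": str(exc),
        # Extra metadata the agent can reason over when replying
        "metadata": {"errorCategory": category, "isRetryable": retryable},
    }
```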
The Escalation Handoff
Two escalation paths based on trigger type:
- “I want a human NOW” — immediate escalation. Honor the request at once; do not ask for further clarification
- Complex policy issue — context gathering first. Ensure account context tools (get_customer) are called before escalating
- The payload: Do not dump raw transcripts. Pass a structured summary: Customer ID, Root Cause, Amount, Recommended Action
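Both paths and the structured payload can be sketched as below; the trigger labels and field names are illustrative:

```python
def route_escalation(trigger: str, context_loaded: bool) -> str:
    """Escalation routing sketch for the two trigger types."""
    if trigger == "explicit_human_request":
        return "escalate_now"       # honor immediately, no clarification
    if not context_loaded:
        return "gather_context"     # call get_customer before escalating
    return "escalate_with_summary"

def build_handoff(customer_id: str, diagnosis: dict) -> dict:
    """Structured summary payload: no raw transcript dump."""
    return {
        "customer_id": customer_id,
        "root_cause": diagnosis["root_cause"],
        "amount": diagnosis.get("amount"),
        "recommended_action": diagnosis["recommended_action"],
    }
```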
Compressing Long Sessions
- The challenge: A single session covers a refund inquiry, a subscription question, and a payment update across 48 turns. Context limits approach
- The strategy: Summarize earlier, resolved turns into a narrative description. Preserve full verbatim message history only for the active, unresolved issue
- This dramatically reduces context usage while maintaining accuracy for the current problem
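A sketch of the compression step, assuming each turn is tagged with its issue status and a summarize callable is supplied (the turn shape is illustrative):

```python
def compress_history(turns: list, summarize) -> list:
    """Compression sketch: resolved issues collapse into one narrative
    summary turn; only the active issue keeps verbatim messages."""
    resolved = [t for t in turns if t["issue_status"] == "resolved"]
    active = [t for t in turns if t["issue_status"] != "resolved"]
    summary = {"role": "user",
               "content": f"[Summary of resolved issues] {summarize(resolved)}"}
    return [summary] + [{"role": t["role"], "content": t["content"]} for t in active]
```

In production, summarize would itself be a model call; here any callable that turns resolved turns into a short narrative works.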
MCP Tool Specificity
- The trap: Providing a broad custom tool (analyze_dependencies) alongside built-in tools like Grep. The agent defaults to Grep
- The fix: Split broad tools into highly granular, single-purpose tools (list_imports, resolve_transitive_deps, detect_circular_deps). Enhance descriptions to explicitly detail capabilities and when to prefer them over text manipulation
Directed Codebase Exploration
- Anti-pattern: Reading all 15 files sequentially. Overloads context with unrelated data
- Pattern: Dynamic investigation — (1) Analyze imports and base interfaces, (2) trace specific implementations, (3) dynamically generate subtasks based on findings
- For new engineers on 800+ file codebases: read CLAUDE.md/README first, then ask the human for priority files
- For intermittent bugs: have the agent dynamically generate investigation subtasks, adapting as errors emerge
Branching Reality (fork_session)
- Problem: Exploring two distinct refactoring approaches in a single thread confuses the agent and mixes context
- Pattern: Use fork_session to create two separate branches from a foundational analysis. Each branch explores independently without context contamination
- This enables A/B comparison of approaches (e.g., microservice extraction vs in-place refactor)
The Scratchpad Pattern
- The decay: In extended sessions (30+ minutes), accumulated token bloat causes inconsistent answers about early discoveries
- Pattern: Agent maintains a scratchpad.md recording key findings, architectural maps, and decisions. References this structured file for subsequent questions
- Prevents context decay in long exploration sessions
Resumption in Dynamic Environments
- Scenario: Engineer resumes exploration, but 3 of 12 files have been altered by a teammate’s PR overnight
- Pattern: Resume from previous transcript, but explicitly inform the agent which specific files or functions changed for targeted re-analysis. Do not force a complete re-read
- Command: resume_session --update_context={files:['File C', 'File D', 'File E'], changes:'renamed utility functions'}
Shared Memory Architecture
- Anti-pattern: Daisy-chaining conversation logs between subagents. Each agent sees all previous conversation, leading to context bloat
- Pattern: Shared vector store. Subagents index their findings; subsequent agents retrieve via semantic search. Only relevant findings are surfaced, not entire conversation histories
Forcing Execution Order
- Problem: Relying on prompt instructions (“call extract_metadata first”) — “prompt begging”
- Pattern: Use tool_choice forced selection for the first API call to guarantee pipeline order. Process subsequent steps in follow-up turns
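A sketch of the forced first call as an Anthropic Messages API request body; the model name and tool schema are illustrative:

```python
def first_turn_request(document_text: str) -> dict:
    """Build a request whose tool_choice forces extract_metadata,
    guaranteeing pipeline order without relying on prompt wording."""
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 1024,
        "tools": [{
            "name": "extract_metadata",
            "description": "Extract document metadata. Must run first.",
            "input_schema": {"type": "object", "properties": {}},
        }],
        # Forced selection: the model MUST call this tool on this turn
        "tool_choice": {"type": "tool", "name": "extract_metadata"},
        "messages": [{"role": "user", "content": document_text}],
    }
```

Subsequent turns drop the forced tool_choice so the model can proceed through the rest of the pipeline normally.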
Structured Intermediate Representations
- When subagents produce heterogeneous outputs (financial JSON, news prose, patent lists), a format conversion layer standardizes everything into a common format
- Common format: {claim, evidence, source, confidence}
- This enables the synthesis agent to process findings uniformly regardless of source type
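A sketch of the conversion layer for two source types; the per-source input shapes and confidence values are assumptions:

```python
def to_common_format(source: str, payload) -> list:
    """Normalize heterogeneous subagent outputs into
    {claim, evidence, source, confidence} records."""
    records = []
    if source == "financial":  # structured JSON metrics
        for metric, value in payload.items():
            records.append({"claim": f"{metric} = {value}", "evidence": value,
                            "source": source, "confidence": 0.9})
    else:  # prose: one claim per sentence (deliberately crude split)
        for sentence in filter(None, (s.strip() for s in payload.split("."))):
            records.append({"claim": sentence, "evidence": sentence,
                            "source": source, "confidence": 0.6})
    return records
```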
Parallelization and Caching
- Parallel subagents for independent data retrieval: e.g., 12 legal precedents in ~30 seconds vs 3+ minutes sequential
- Apply prompt caching on the synthesis subagent for 80K+ token accumulated findings
- Combine parallelization (speed) with caching (cost reduction) for maximum efficiency
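The parallel retrieval step can be sketched with asyncio.gather; fetch_precedent stands in for a real subagent call, with a sleep simulating I/O latency:

```python
import asyncio

async def fetch_precedent(case_id: str) -> dict:
    # Stand-in for a real subagent call; sleep simulates network latency
    await asyncio.sleep(0.01)
    return {"case_id": case_id, "summary": f"findings for {case_id}"}

async def gather_precedents(case_ids: list) -> list:
    """Run independent retrievals concurrently instead of sequentially;
    gather preserves input order in its results."""
    return await asyncio.gather(*(fetch_precedent(c) for c in case_ids))
```

Twelve such calls complete in roughly the time of the slowest one rather than the sum of all twelve, which is where the ~30 seconds vs 3+ minutes gap comes from.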
Goal-Oriented Delegation
- Anti-pattern: Procedural micromanagement — telling subagents exactly which steps to take
- Pattern: Goal-oriented — specify research goals + quality criteria, let the subagent determine its own strategy
- Goal-oriented delegation produces better results because the subagent can adapt to what it finds
The Architect’s Reference Matrix
Maps the recommended solution pattern to each domain and constraint combination. Green cells in the original Playbook indicate the primary solution for that intersection:
| Challenge | Data Extraction | Customer Support | Dev Productivity | Multi-Agent |
|---|---|---|---|---|
| Token Bloat | — | Filter Stale Results | Scratchpad File | Shared Vector Store |
| Latency | Batch Routing | — | — | Parallelization & Caching |
| Compliance/Control | — | App-Layer Intercepts | — | tool_choice Enforcement |
| Accuracy | Schema Redundancy | — | Granular MCP Tools | Structured Intermediates |
Use this matrix as a study aid: for each green cell, understand the pattern and why it is the primary solution for that domain-constraint pair. Empty cells indicate the constraint is not the primary concern for that domain.
Production Architecture Blueprint
The Playbook concludes with a layered production architecture showing how all patterns fit together:
- Layer 1: Ingestion & Routing — a pattern router classifies incoming work as real-time or batch. Intelligence sits at the edges; the router applies strict typing in the middle
- Execution Layer — Granular tools (Tool A, B, C…) paired with application-layer intercepts (validation guardrails, policy enforcement, schema checks). The intercepts guard the core — every tool call passes through them
- State Management — Pruning logic and shared vector store sit beneath the execution layer, sustaining context window management across the lifecycle
- Synthesis — Result aggregation, formatting, and delivery. The final output layer
The key insight: application intercepts sit between the tools and the output, not as a separate layer but integrated into the execution path. This makes compliance enforcement unavoidable rather than optional.
Key Takeaways
- The Hierarchy of Constraints (latency, accuracy, cost, compliance) is the foundational framework — every architectural decision should be evaluated against it
- Compliance is NEVER enforced via prompts. Application-layer intercepts are the only reliable mechanism. Emphatic prompts still fail ~3% of the time
- Schema redundancy (calculated_total vs stated_total) is the standard pattern for catching extraction errors
- Retries are effective for formatting errors but useless for missing information — know when to fail fast
- Tool descriptions are more important than tool implementations for agent reliability — agents cannot use tools they cannot find
- The Scratchpad Pattern prevents context decay in long sessions and is a practical must-have for developer productivity agents
- Shared vector stores beat daisy-chained conversation logs for multi-agent memory
- Goal-oriented delegation outperforms procedural micromanagement for subagent prompts
- The Reference Matrix is an excellent exam study aid — it maps every solution pattern to its domain
Related
- CCA-F Official Exam Guide
- CCA-F Technical Reference
- CCA-F Practice Exam (60 Questions)
- CCA-F Study Guide
- CCA-F Practice Questions by Domain
- Anthropic Claude Cookbooks
- Claude Code Subagents
- Essential MCP Servers for 2026
- Skill Design Patterns
- Claude Agent Hierarchy
Try It
- Map each Playbook pattern to its exam task statement in the Official Exam Guide. For example, “Zero-Tolerance Compliance” maps to Task 1.5 (hooks for tool call interception) and Task 1.4 (enforcement patterns)
- Build a small customer support agent that implements at least 5 patterns from this Playbook: escalation handoff, tool context pruning, graceful tool failure, session compression, and compliance intercepts
- Draw the Production Architecture Blueprint from memory — this tests whether you understand the layered architecture
- For each anti-pattern, write down a scenario where the anti-pattern would actually be the right choice. If you cannot, that confirms the pattern is robust
- Create flashcards for the Reference Matrix — quiz yourself on which solution applies to which domain and challenge