“Enterprise LLM Architecture: Design Patterns, Anti-Patterns, and System Workflows for Production Deployments.” A richly illustrated reference guide from Anthropic that organizes production patterns across four architectural domains: Structured Data Extraction, Customer Support Orchestration, Developer Productivity, and Multi-Agent Systems. This is the visual/conceptual companion to the Official Exam Guide — where the exam guide lists task statements, the Playbook shows you how each pattern works with diagrams.

The Four Domains of AI Architecture

The Playbook organizes patterns across four production domains, each with distinct constraints:

  • Structured Data Extraction — high volume, strict schemas, batch pipelines
  • Customer Support Orchestration — stateful, human-in-the-loop, policy constraints
  • Developer Productivity — dynamic tasks, iterative context, advanced tool use
  • Multi-Agent Systems — parallel processing, shared memory, cross-agent synthesis

The Hierarchy of Constraints

Every production LLM system faces four competing constraints. The Playbook defines how to mitigate each:

| Constraint | Mitigation Strategy |
| --- | --- |
| Latency | Parallelization and caching |
| Accuracy | Structured intermediates and few-shot prompts |
| Cost | Batch APIs and context pruning |
| Compliance | Application-layer intercepts (NOT prompts) |

The critical insight: compliance is never solved by prompts. Even emphatic system prompts (“CRITICAL POLICY: NEVER process > $500”) still yield a 3% failure rate. Application-layer hooks that intercept tool calls are the only reliable enforcement mechanism.

Patterns and Anti-Patterns

Routing for Cost and SLA

Rule: Never default to real-time for asynchronous needs.

| Workload Type | Approach |
| --- | --- |
| Urgent exceptions | Real-time Messages API (high cost, minimal latency) |
| Standard workflows | Message Batches API (50% cost savings) |
| Continuous arrival (30h SLA) | Submit batches every 6 hours containing documents from that window |
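
A minimal sketch of this routing rule, assuming the anthropic Python SDK; the model name and queue handling are illustrative.

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # illustrative model choice

def route(doc_id: str, prompt: str, urgent: bool, batch_queue: list) -> None:
    if urgent:
        # Urgent exceptions: real-time Messages API (high cost, minimal latency).
        client.messages.create(
            model=MODEL,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
    else:
        # Standard workflows: queue for the Message Batches API (50% cost savings).
        batch_queue.append({
            "custom_id": doc_id,
            "params": {
                "model": MODEL,
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}],
            },
        })

def flush_batch(batch_queue: list) -> None:
    # Called every 6 hours; worst-case turnaround stays well inside the 30h SLA.
    client.messages.batches.create(requests=batch_queue)
    batch_queue.clear()
```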

Designing Resilient Schemas

  • Anti-pattern: Fragile enum expansion — continuously adding new enum values as edge cases arise. Eventually breaks validation for previously unseen inputs
  • Pattern: Resilient catch-all — add "other" to the enum paired with a detail string field. Captures unexpected values without validation failure
  • Data Evolution Rule: For amended documents, redesign schemas so amended fields capture multiple values with source location and effective date, rather than overwriting originals
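
A sketch of both rules as a JSON Schema fragment; the field names are hypothetical.

```python
# Hypothetical document schema illustrating the catch-all and the Data
# Evolution Rule: unseen categories land in "other" plus a detail string
# instead of failing validation, and amended fields accumulate values
# rather than overwriting them.
document_schema = {
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["invoice", "receipt", "contract", "other"],
        },
        "category_detail": {
            "type": ["string", "null"],
            "description": "Free-text label when category is 'other'.",
        },
        "contract_value": {  # amended field: a history, not a single slot
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "value": {"type": "number"},
                    "source_location": {"type": "string"},
                    "effective_date": {"type": "string", "format": "date"},
                },
            },
        },
    },
}
```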

Enforcing Mathematical Consistency

18% of invoice extractions show line items that do not match the grand total due to OCR or extraction errors.

Pattern: Schema Redundancy — extract both calculated_total (model sums items) and stated_total (extracted directly from document). Flag for human review ONLY when calculated_total != stated_total.
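
A minimal sketch of the check, assuming the extraction already carries line_items and stated_total fields:

```python
def needs_human_review(extraction: dict, tolerance: float = 0.01) -> bool:
    # Redundant totals: one computed from line items, one read off the page.
    calculated_total = sum(item["amount"] for item in extraction["line_items"])
    stated_total = extraction["stated_total"]
    # Flag ONLY on disagreement; matching totals pass straight through.
    return abs(calculated_total - stated_total) > tolerance
```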

Normalization and Null Handling

  • Problem: When fields are nullable, models may invent plausible data (e.g., attendee_count: "500") if not explicitly instructed to return null
  • Pattern: Add explicit null instructions: “If attendee count is not mentioned in the text, return null”
  • Problem: Inconsistent formats (“cotton blend” vs “Cotton/Polyester Mix”)
  • Solution: Few-shot standardization — provide 2-3 complete input-output pairs showing standardized formats. Do not rely on temperature 0 alone
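
A hedged example of the prompt shape, with illustrative fields: the explicit null instruction plus two complete input-output pairs that pin the standardized format.

```python
EXTRACTION_PROMPT = """\
Extract the material and attendee_count fields as JSON.
If attendee count is not mentioned in the text, return null. Do not guess.

Example 1
Input: "Made from a soft cotton blend."
Output: {"material": "Cotton/Polyester Mix", "attendee_count": null}

Example 2
Input: "100% merino wool caps; the expo drew 2,000 visitors."
Output: {"material": "Wool", "attendee_count": 2000}

Input: {document}
Output:"""
```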

The Limits of Automated Retry

  • Effective: Formatting errors (nested objects vs flat arrays, locale-formatted strings). Appending specific validation errors to the prompt resolves most failures in 2-3 attempts
  • Ineffective: Missing information (e.g., source says “et al.” and points to an unprovided external document). No amount of retrying will produce information that does not exist in the source
  • Rule: Recognize when to fail fast. Retries for missing information waste tokens
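
A sketch of the bounded retry loop; extract and validate are hypothetical callables from your pipeline, and validation errors are assumed to carry a category.

```python
def extract_with_retry(prompt, extract, validate, max_attempts=3):
    for _ in range(max_attempts):
        result = extract(prompt)
        errors = validate(result)
        if not errors:
            return result
        if any(e.category == "missing_information" for e in errors):
            return None  # fail fast: retries cannot invent absent data
        # Formatting errors: append the specific validation errors and retry.
        prompt += "\n\nYour last output failed validation:\n" + "\n".join(
            e.message for e in errors
        )
    return None
```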

Calibrating Human-in-the-Loop

  • Have the model output field-level confidence scores
  • Automate extractions with confidence above 90%
  • Critical validation step: Analyze accuracy by document type AND field to verify high-confidence extractions perform consistently across all segments, not just in aggregate
  • Aggregate metrics mask field-level issues — a 95% overall accuracy can hide a 60% accuracy on a specific field type
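
A sketch of the segment check; the eval-record fields are assumptions about your logging format.

```python
from collections import defaultdict

def accuracy_by_segment(records: list[dict]) -> dict:
    # records like {"doc_type": "invoice", "field": "due_date", "correct": True}
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        key = (r["doc_type"], r["field"])
        totals[key] += 1
        hits[key] += int(r["correct"])
    # Per-segment accuracy exposes the 60% field a 95% aggregate would hide.
    return {key: hits[key] / totals[key] for key in totals}
```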

Zero-Tolerance Compliance

  • The trap: Relying on emphatic system prompts (“CRITICAL POLICY: NEVER process > $500”) still yields a ~3% failure rate
  • The standard: Implement an application-layer hook to intercept tool calls. When the process amount exceeds the threshold, block it server-side and invoke escalation
  • Model discretion is removed. The enforcement is deterministic
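
A minimal sketch of such a hook, assuming the agent loop surfaces every tool call before execution; the tool and handler names are hypothetical.

```python
THRESHOLD = 500  # the Playbook's example limit

def intercept_tool_call(name, args, execute_tool, escalate_to_human):
    if name == "process_payment" and args.get("amount", 0) > THRESHOLD:
        escalate_to_human(args)  # hand off to a person, server-side
        # The model never gets discretion: the block is deterministic.
        return {"blocked": True,
                "reason": f"Amounts over ${THRESHOLD} require human approval."}
    return execute_tool(name, args)
```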

Resuming Asynchronous Sessions

  • Problem: Resuming a session hours later leads to the model confidently stating outdated status from previous tool calls
  • Pattern: Resume with full conversation history but programmatically filter out previous tool_result messages. Keep only human/assistant turns so the agent is forced to re-fetch current data
  • This ensures returning customers always receive fresh, current information
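
A sketch of the resume filter for Messages-API-style history. It also drops the paired tool_use blocks so no orphaned references remain; that detail is an assumption beyond the text.

```python
def strip_stale_tool_results(history: list[dict]) -> list[dict]:
    cleaned = []
    for msg in history:
        content = msg["content"]
        if isinstance(content, list):
            content = [b for b in content
                       if b.get("type") not in ("tool_use", "tool_result")]
            if not content:
                continue  # the turn held only stale tool traffic
        cleaned.append({**msg, "content": content})
    return cleaned  # human/assistant text survives; the agent must re-fetch
```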

Tool Context Pruning

  • The bloat: Repeatedly calling lookup_order fills the context window with verbose shipping and payment data when only the return status is needed
  • Pattern: Application-side filtering — extract only relevant fields (items, purchase data, return window, status) from each API response before it enters the conversation
  • This prevents context accumulation from verbose tool responses
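
A sketch of application-side filtering; the relevant keys are illustrative.

```python
RELEVANT_FIELDS = ("items", "purchase_date", "return_window", "status")

def prune_order_response(raw_response: dict) -> dict:
    # Verbose shipping and payment sub-objects never enter the context window.
    return {k: raw_response[k] for k in RELEVANT_FIELDS if k in raw_response}
```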

Graceful Tool Failure

  • Anti-pattern: Throwing application exceptions that crash the agent, or returning empty strings
  • Correct pattern: Return the error message in the tool result content with the isError flag set to true. Include errorCategory (transient/validation/permission) and isRetryable boolean
  • Agent receives structured error information and can communicate appropriately to the user
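
A sketch of the structured error result, following the isError convention the text describes; the message text is illustrative.

```python
def tool_error(message: str, category: str, retryable: bool) -> dict:
    return {
        "content": [{"type": "text", "text": message}],
        "isError": True,
        "errorCategory": category,  # "transient" | "validation" | "permission"
        "isRetryable": retryable,
    }

# A timeout the agent may retry and can explain to the user, not a crash:
result = tool_error("Order lookup timed out after 5s.", "transient", True)
```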

The Escalation Handoff

Two escalation paths based on trigger type:

  • “I want a human NOW”: escalate immediately; honor the request without asking for more clarification
  • Complex policy issue — context gathering first. Ensure account context tools (get_customer) are called before escalating
  • The payload: Do not dump raw transcripts. Pass a structured summary: Customer ID, Root Cause, Amount, Recommended Action
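
A sketch of the handoff payload named above; the exact types are assumptions.

```python
from dataclasses import dataclass

@dataclass
class EscalationPayload:
    customer_id: str
    root_cause: str          # one-sentence diagnosis, not a raw transcript
    amount: float            # disputed or requested amount, if any
    recommended_action: str  # what the agent suggests the human do next
```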

Compressing Long Sessions

  • The challenge: A single session covers a refund inquiry, a subscription question, and a payment update across 48 turns. Context limits approach
  • The strategy: Summarize earlier, resolved turns into a narrative description. Preserve full verbatim message history only for the active, unresolved issue
  • This dramatically reduces context usage while maintaining accuracy for the current problem
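
A sketch of the strategy; summarize is a hypothetical call back into the model.

```python
def compress_session(resolved_turns, active_turns, summarize):
    # Resolved topics collapse into one narrative line; the live issue
    # keeps its verbatim turns.
    summary = summarize(resolved_turns)  # e.g. "Refund issued; plan upgraded."
    return [
        {"role": "user", "content": f"[Earlier this session: {summary}]"},
        *active_turns,
    ]
```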

MCP Tool Specificity

  • The trap: Providing a broad custom tool (analyze_dependencies) alongside built-in tools like Grep. The agent defaults to Grep
  • The fix: Split broad tools into highly granular, single-purpose tools (list_imports, resolve_transitive_deps, detect_circular_deps). Enhance descriptions to explicitly detail capabilities and when to prefer them over text manipulation
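
A sketch of the split; the descriptions are hypothetical but show the level of detail that steers the agent away from text search.

```python
DEPENDENCY_TOOLS = [
    {
        "name": "list_imports",
        "description": (
            "Return every module imported by a file, resolved from the AST. "
            "Prefer this over Grep: it catches aliased and conditional "
            "imports that text search misses."
        ),
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "name": "detect_circular_deps",
        "description": (
            "Walk the full import graph and report cycles. Always use this "
            "for circular-dependency questions; Grep cannot follow a graph."
        ),
        "input_schema": {"type": "object", "properties": {}},
    },
]
```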

Directed Codebase Exploration

  • Anti-pattern: Reading all 15 files sequentially. Overloads context with unrelated data
  • Pattern: Dynamic investigation — (1) Analyze imports and base interfaces, (2) trace specific implementations, (3) dynamically generate subtasks based on findings
  • For new engineers on 800+ file codebases: read CLAUDE.md/README first, then ask the human for priority files
  • For intermittent bugs: have the agent dynamically generate investigation subtasks, adapting as errors emerge

Branching Reality (fork_session)

  • Problem: Exploring two distinct refactoring approaches in a single thread confuses the agent and mixes context
  • Pattern: Use fork_session to create two separate branches from a foundational analysis. Each branch explores independently without context contamination
  • This enables A/B comparison of approaches (e.g., microservice extraction vs in-place refactor)

The Scratchpad Pattern

  • The decay: In extended sessions (30+ minutes), accumulated token bloat causes inconsistent answers about early discoveries
  • Pattern: Agent maintains a scratchpad.md recording key findings, architectural maps, and decisions. References this structured file for subsequent questions
  • Prevents context decay in long exploration sessions
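
A minimal sketch of the discipline; the file name follows the text, the section format is an assumption.

```python
from pathlib import Path

SCRATCHPAD = Path("scratchpad.md")

def record_finding(section: str, finding: str) -> None:
    # Append findings as they emerge; re-read this file instead of
    # trusting a 30-minute-old context window.
    with SCRATCHPAD.open("a") as f:
        f.write(f"\n## {section}\n- {finding}\n")

record_finding("Architecture map", "AuthService owns all token refresh logic.")
```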

Resumption in Dynamic Environments

  • Scenario: Engineer resumes exploration, but 3 of 12 files have been altered by a teammate’s PR overnight
  • Pattern: Resume from previous transcript, but explicitly inform the agent which specific files or functions changed for targeted re-analysis. Do not force a complete re-read
  • Command: resume_session --update_context={files:['File C', 'File D', 'File E'], changes:'renamed utility functions'}

Shared Memory Architecture

  • Anti-pattern: Daisy-chaining conversation logs between subagents. Each agent sees all previous conversation, leading to context bloat
  • Pattern: Shared vector store. Subagents index their findings; subsequent agents retrieve via semantic search. Only relevant findings are surfaced, not entire conversation histories
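
A toy sketch of the shared store; embed is a stand-in for a real embedding model.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy deterministic embedder; production systems call a real model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

class SharedFindings:
    def __init__(self):
        self.vectors, self.texts = [], []

    def index(self, agent: str, finding: str) -> None:
        # Each subagent writes findings, never its whole transcript.
        self.vectors.append(embed(finding))
        self.texts.append(f"[{agent}] {finding}")

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Later agents pull only the k most relevant findings.
        scores = np.array(self.vectors) @ embed(query)
        return [self.texts[i] for i in np.argsort(scores)[-k:][::-1]]
```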

Forcing Execution Order

  • Problem: Relying on prompt instructions (“call extract_metadata first”) is mere “prompt begging”; the model may still ignore the requested order
  • Pattern: Use tool_choice forced selection for the first API call to guarantee pipeline order. Process subsequent steps in follow-up turns
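
A sketch of forced selection with the Messages API tool_choice parameter; the tool itself is illustrative.

```python
import anthropic

client = anthropic.Anthropic()

extract_metadata_tool = {
    "name": "extract_metadata",
    "description": "Pull title, date, and author from a raw document.",
    "input_schema": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
}

first_turn = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model choice
    max_tokens=1024,
    tools=[extract_metadata_tool],
    # Guaranteed pipeline order: the model MUST call extract_metadata now.
    tool_choice={"type": "tool", "name": "extract_metadata"},
    messages=[{"role": "user", "content": "<raw document text>"}],
)
# Subsequent steps run in follow-up turns with tool_choice={"type": "auto"}.
```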

Structured Intermediate Representations

  • When subagents produce heterogeneous outputs (financial JSON, news prose, patent lists), a format conversion layer standardizes everything into a common format
  • Common format: {claim, evidence, source, confidence}
  • This enables the synthesis agent to process findings uniformly regardless of source type
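
A sketch of the common format plus one hypothetical converter for the financial-JSON case.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    claim: str         # one assertion the synthesis agent can weigh
    evidence: str      # supporting quote or figure
    source: str        # filing, article, or patent identifier
    confidence: float  # subagent's self-assessed 0-1 score

def from_financial_json(record: dict) -> Finding:
    # Field names on the input record are assumptions.
    return Finding(
        claim=f"{record['metric']} was {record['value']}",
        evidence=record["excerpt"],
        source=record["filing_id"],
        confidence=record.get("confidence", 0.9),
    )
```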

Parallelization and Caching

  • Parallel subagents for independent data retrieval: e.g., 12 legal precedents in ~30 seconds vs 3+ minutes sequential
  • Apply prompt caching on the synthesis subagent for 80K+ token accumulated findings
  • Combine parallelization (speed) with caching (cost reduction) for maximum efficiency
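
A sketch combining both levers, assuming the async anthropic SDK; run_subagent is a hypothetical coroutine.

```python
import asyncio
import anthropic

client = anthropic.AsyncAnthropic()

async def research_all(queries, run_subagent):
    # Independent retrievals fan out in parallel instead of running serially.
    return await asyncio.gather(*(run_subagent(q) for q in queries))

async def synthesize(findings_text: str, question: str):
    return await client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model choice
        max_tokens=2048,
        system=[{
            "type": "text",
            "text": findings_text,  # the 80K+ token accumulated findings
            "cache_control": {"type": "ephemeral"},  # reused across turns
        }],
        messages=[{"role": "user", "content": question}],
    )
```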

Goal-Oriented Delegation

  • Anti-pattern: Procedural micromanagement — telling subagents exactly which steps to take
  • Pattern: Goal-oriented — specify research goals + quality criteria, let the subagent determine its own strategy
  • Goal-oriented delegation produces better results because the subagent can adapt to what it finds
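
A hedged illustration of the two prompt styles for the same subagent task:

```python
PROCEDURAL = """Search the SEC filing index for the 10-K, open section 1A,
copy the first three risk factors, then search news from the last 30 days,
then ..."""  # anti-pattern: the subagent cannot adapt to what it finds

GOAL_ORIENTED = """Goal: assess ACME Corp's top regulatory risks.
Quality bar: every claim cited to a primary source; flag anything you
could not verify. Choose your own search strategy."""
```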

The Architect’s Reference Matrix

Maps the recommended solution pattern to each domain and constraint combination. Green cells in the original Playbook indicate the primary solution for that intersection:

| Challenge | Data Extraction | Customer Support | Dev Productivity | Multi-Agent |
| --- | --- | --- | --- | --- |
| Token Bloat | | Filter Stale Results | Scratchpad File | Shared Vector Store |
| Latency | Batch Routing | | | Parallelization & Caching |
| Compliance/Control | | App-Layer Intercepts | | tool_choice Enforcement |
| Accuracy | Schema Redundancy | | Granular MCP Tools | Structured Intermediates |

Use this matrix as a study aid: for each green cell, understand the pattern and why it is the primary solution for that domain-constraint pair. Empty cells indicate the constraint is not the primary concern for that domain.

Production Architecture Blueprint

The Playbook concludes with a layered production architecture showing how all patterns fit together:

  1. Layer 1: Ingestion & Routing — Pattern router classifies incoming work as real-time or batch. Intelligence is at the edges — the router applies strict typing in the middle
  2. Execution Layer — Granular tools (Tool A, B, C…) paired with application-layer intercepts (validation guardrails, policy enforcement, schema checks). The intercepts guard the core — every tool call passes through them
  3. State Management — Pruning logic and shared vector store sit beneath the execution layer, sustaining context window management across the lifecycle
  4. Synthesis — Result aggregation, formatting, and delivery. The final output layer

The key insight: application intercepts sit between the tools and the output, not as a separate layer but integrated into the execution path. This makes compliance enforcement unavoidable rather than optional.

Key Takeaways

  • The Hierarchy of Constraints (latency, accuracy, cost, compliance) is the foundational framework — every architectural decision should be evaluated against it
  • Compliance is NEVER enforced via prompts. Application-layer intercepts are the only reliable mechanism. Emphatic prompts still fail ~3% of the time
  • Schema redundancy (calculated_total vs stated_total) is the standard pattern for catching extraction errors
  • Retries are effective for formatting errors but useless for missing information — know when to fail fast
  • Tool descriptions are more important than tool implementations for agent reliability — agents cannot use tools they cannot find
  • The Scratchpad Pattern prevents context decay in long sessions and is a practical must-have for developer productivity agents
  • Shared vector stores beat daisy-chained conversation logs for multi-agent memory
  • Goal-oriented delegation outperforms procedural micromanagement for subagent prompts
  • The Reference Matrix is an excellent exam study aid — it maps every solution pattern to its domain

Try It

  1. Map each Playbook pattern to its exam task statement in the Official Exam Guide. For example, “Zero-Tolerance Compliance” maps to Task 1.5 (hooks for tool call interception) and Task 1.4 (enforcement patterns)
  2. Build a small customer support agent that implements at least 5 patterns from this Playbook: escalation handoff, tool context pruning, graceful tool failure, session compression, and compliance intercepts
  3. Draw the Production Architecture Blueprint from memory — this tests whether you understand the layered architecture
  4. For each anti-pattern, write down a scenario where the anti-pattern would actually be the right choice. If you cannot, that confirms the pattern is robust
  5. Create flashcards for the Reference Matrix — quiz yourself on which solution applies to which domain and challenge