Source: Claude Docs Extended Thinking 2026 04 17 (Anthropic platform docs — https://platform.claude.com/docs/en/build-with-claude/extended-thinking.md)

Anthropic’s authoritative reference for the thinking API parameter. Covers the per-model compatibility matrix (crucial — Opus 4.7 rejects manual thinking budgets with a 400), how thinking blocks are returned, summarized vs omitted display modes, streaming behavior, interactions with tool use, and how thinking changes prompt-cache invalidation.

Key Takeaways

  • Opus 4.7+ rejects manual thinking budgets. thinking: {type: "enabled", budget_tokens: N} returns a 400 error on Opus 4.7 and later. Use thinking: {type: "adaptive"} with the effort parameter instead. Any harness passing budget_tokens to Opus 4.7 will break.
  • Per-model thinking support matrix:
    • Opus 4.7+ — adaptive only
    • Mythos Preview — adaptive default; manual type: "enabled" also accepted; display defaults to "omitted"
    • Opus 4.6 — adaptive recommended; manual mode deprecated but functional
    • Sonnet 4.6 — adaptive recommended; manual + interleaved deprecated but functional
    • Sonnet 3.7 and Claude 4 — manual extended thinking still supported
  • Response contains thinking blocks, then text blocks. The content array holds {type: "thinking", thinking: "...", signature: "..."} followed by {type: "text", text: "..."}. The signature holds the encrypted thinking and is retained even when display is "omitted"; it is required for multi-turn continuity.
  • Summarized thinking is the default on Claude 4. You’re billed for full thinking tokens but see a summary. Billed output token count will NOT match visible response tokens. Claude Mythos Preview summarizes from the first token (no verbose preamble).
  • display: "omitted" returns empty thinking field but keeps signature. Default on Opus 4.7/Mythos. Faster time-to-first-text-token during streaming; still billed for full thinking tokens (cuts latency, not cost). Not in any SDK type definitions yet — Python forwards unrecognized dict keys; TypeScript needs type assertion; other SDKs need direct HTTP.
  • You cannot toggle thinking mid-turn. The entire assistant turn (including all tool-use loops) must operate in a single thinking mode. An attempt to toggle mid-turn does not raise an error; it degrades silently by disabling thinking.
  • Tool-use constraints: only tool_choice: {"type": "auto"} (the default) or tool_choice: {"type": "none"} is supported. tool_choice: {"type": "any"} or a named tool choice causes an error. When passing tool results back, thinking blocks MUST be passed unmodified to preserve reasoning continuity.
  • Interleaved thinking: enables reasoning between tool calls. Automatic on Mythos and Opus 4.7 (with adaptive). On Opus 4.6 / Sonnet 4.6: automatic with adaptive (beta header deprecated). Other Claude 4 models: add interleaved-thinking-2025-05-14 beta header.
  • budget_tokens can exceed max_tokens when using interleaved thinking — represents total thinking across all blocks in the assistant turn.
  • Prompt-cache invalidation: changing thinking parameters (enabled/disabled or budget) invalidates message cache breakpoints. System prompts and tools remain cached. Interleaved thinking amplifies invalidation because thinking blocks can appear between multiple tool calls.
  • Output limits: Mythos / Opus 4.7 / Opus 4.6 → 128k output tokens. Sonnet 4.6 / Haiku 4.5 → 64k. Batch API with output-300k-2026-03-24 beta header → 300k for Opus 4.7/4.6 and Sonnet 4.6.
  • ZDR eligible. Extended Thinking is covered by Zero Data Retention arrangements (unlike Agent Skills, which is not ZDR eligible).
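
The support matrix above can be encoded as a small pre-flight check before a request is sent. The following is a minimal sketch, not an official client-side validator: the family labels ("opus-4.7", "sonnet-4.6", and so on) and the helper name check_thinking_config are hypothetical, and the rules only mirror what these notes state.

```python
# Sketch only: the family labels and helper name are hypothetical, and the
# rules mirror the matrix in these notes rather than any official validator.
ADAPTIVE_ONLY = {"opus-4.7"}                    # manual budgets return a 400
MANUAL_DEPRECATED = {"opus-4.6", "sonnet-4.6"}  # manual still works, but is deprecated


def check_thinking_config(family: str, thinking: dict) -> list[str]:
    """Return human-readable problems for a thinking config on a model family."""
    problems = []
    if thinking.get("type") == "enabled":
        if family in ADAPTIVE_ONLY:
            problems.append("manual budget is rejected with a 400; use type 'adaptive' plus effort")
        elif family in MANUAL_DEPRECATED:
            problems.append("manual mode is deprecated; prefer type 'adaptive'")
    return problems


# Example: the same manual config against three families.
manual = {"type": "enabled", "budget_tokens": 10000}
for family in ("opus-4.7", "sonnet-4.6", "sonnet-3.7"):
    print(family, check_thinking_config(family, manual))
```

Returning a list of problems instead of raising lets a harness log deprecations while still blocking only the configurations that would fail outright.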

Minimal usage

import anthropic
client = anthropic.Anthropic()
 
# Claude 4.x with a manual budget (deprecated but functional on Opus 4.6 / Sonnet 4.6)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Prove there are infinitely many primes p with p mod 4 == 3."}],
)
 
# Opus 4.7 (adaptive ONLY — manual rejected)
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # or xhigh, max, medium, low
    messages=[{"role": "user", "content": "..."}],
)
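
Once a response comes back, its content array mixes thinking and text blocks. The sketch below separates them. It assumes the blocks have already been converted to plain dicts shaped like the ones described under Key Takeaways; split_blocks is a hypothetical helper name, not part of the SDK.

```python
def split_blocks(content: list[dict]) -> tuple[str, str, list[str]]:
    """Split a response content array into (thinking, text, signatures).

    Assumes plain-dict blocks; the helper name is illustrative only.
    """
    thinking_parts, text_parts, signatures = [], [], []
    for block in content:
        if block.get("type") == "thinking":
            # The thinking field may be empty when display is "omitted".
            thinking_parts.append(block.get("thinking", ""))
            signatures.append(block.get("signature", ""))
        elif block.get("type") == "text":
            text_parts.append(block.get("text", ""))
    return "".join(thinking_parts), "".join(text_parts), signatures


# Example with a hand-written content array (placeholder values).
sample = [
    {"type": "thinking", "thinking": "", "signature": "sig-placeholder"},
    {"type": "text", "text": "Final answer."},
]
thinking, text, signatures = split_blocks(sample)
print(repr(thinking), repr(text), signatures)
```

Keeping the signatures alongside the text makes it harder to drop them by accident when the turn is later replayed.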

Display modes at a glance

| Mode | Behavior | Default for | Use when |
|---|---|---|---|
| "summarized" | Thinking blocks contain summarized thinking text | Claude 4 models | You want to show thinking to users |
| "omitted" | Empty thinking field, signature preserved | Opus 4.7, Mythos Preview | You don’t need visible thinking and want to optimize streaming latency |

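Because the Python SDK forwards unrecognized dict keys, a display mode can be attached to the thinking parameter as an ordinary key. This is a minimal sketch under the assumption that display sits inside the thinking dict next to type; thinking_with_display is a hypothetical helper, and the set of valid modes is taken only from the two modes listed in these notes.

```python
# Assumption: "display" is a key inside the thinking dict, next to "type".
VALID_DISPLAY = {"summarized", "omitted"}


def thinking_with_display(thinking: dict, display: str) -> dict:
    """Return a copy of a thinking config with an explicit display mode."""
    if display not in VALID_DISPLAY:
        raise ValueError(f"unknown display mode: {display!r}")
    # Copy instead of mutating, so a shared default config stays untouched.
    return {**thinking, "display": display}


print(thinking_with_display({"type": "adaptive"}, "omitted"))
```

Setting the mode explicitly, instead of relying on the per-model default, keeps behavior stable when the same harness targets models with different defaults.
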
Tool-use checklist

When extended thinking is on and the turn includes tools:

  1. tool_choice must be auto or none — not any, not a named tool
  2. Pass thinking blocks back to the API unmodified in the continuation request
  3. Do not toggle thinking mid-turn; plan the strategy at turn start
  4. On Opus 4.6 / Sonnet 4.6 with adaptive, interleaved thinking is automatic, so no beta header is needed
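
The first two checklist items can be enforced where the continuation request is assembled. The sketch below is one possible shape, assuming messages and content blocks are plain dicts; build_continuation is a hypothetical helper, not an SDK function.

```python
import copy

ALLOWED_TOOL_CHOICE = {"auto", "none"}


def build_continuation(messages, assistant_content, tool_results, tool_choice=None):
    """Append the assistant turn (thinking blocks intact) and the tool results.

    Hypothetical helper: it never filters or edits assistant_content, so
    thinking blocks and their signatures are passed back unmodified.
    """
    if tool_choice is not None and tool_choice.get("type") not in ALLOWED_TOOL_CHOICE:
        raise ValueError("with thinking on, tool_choice must be auto or none")
    return messages + [
        {"role": "assistant", "content": copy.deepcopy(assistant_content)},
        {"role": "user", "content": tool_results},
    ]


# Example with placeholder blocks.
history = [{"role": "user", "content": "What is the weather in Paris?"}]
assistant = [
    {"type": "thinking", "thinking": "Need the weather tool.", "signature": "sig-placeholder"},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Paris"}},
]
results = [{"type": "tool_result", "tool_use_id": "toolu_01", "content": "18 C, cloudy"}]
next_messages = build_continuation(history, assistant, results)
print(len(next_messages))
```

The deep copy is a defensive choice: later mutation of the caller's objects cannot alter what has been queued for the next request.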

Prompt-cache interaction rules

  • Same thinking params → same cache behavior
  • Different thinking.type, budget_tokens, or display → message cache breakpoints invalidated
  • Thinking blocks from previous turns get stripped from context on non-tool-result user turns (Opus 4.5+ keeps blocks by default)
  • System prompts and tools keep their cache regardless of thinking changes
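
These rules can be turned into a quick predicate for spotting config changes that would cost a message-cache hit. This is a minimal sketch that compares only the three fields named above; treating a missing thinking config as disabled is an assumption, and both function names are hypothetical.

```python
def thinking_cache_key(thinking):
    """Fields whose change invalidates message cache breakpoints, per the rules above.

    Assumption: a missing config (None) is treated the same as disabled thinking.
    """
    thinking = thinking or {}
    return (
        thinking.get("type", "disabled"),
        thinking.get("budget_tokens"),
        thinking.get("display"),
    )


def invalidates_message_cache(previous, current) -> bool:
    """True when two thinking configs differ in a cache-relevant field."""
    return thinking_cache_key(previous) != thinking_cache_key(current)


before = {"type": "enabled", "budget_tokens": 10000}
after = {"type": "enabled", "budget_tokens": 16000}
print(invalidates_message_cache(before, after))
```

A check like this fits naturally in a deployment review step, where it can flag thinking-config changes before they reach production traffic.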

Opus 4.7 migration checklist

  1. Grep harnesses and SDK wrappers for budget_tokens — remove when calling Opus 4.7
  2. Replace with thinking={"type": "adaptive"} and output_config={"effort": "high"} (or tune effort per Opus 4.7 best practices)
  3. If a streaming UI relied on thinking_delta events, verify its behavior under the Opus 4.7 default display ("omitted"), which emits no thinking_delta events
  4. Re-measure cache hit rates — thinking-parameter changes will invalidate message breakpoints
  5. For long tool-use loops, confirm thinking blocks are being round-tripped unmodified in your continuation calls
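
Steps 1 and 2 can be applied mechanically to existing request kwargs. The following is a sketch of one way to do that, assuming the kwargs are a plain dict of the arguments passed to client.messages.create; migrate_to_adaptive is a hypothetical helper, and the default effort of "high" simply follows the example in these notes.

```python
def migrate_to_adaptive(kwargs: dict, effort: str = "high") -> dict:
    """Return a copy of request kwargs with manual thinking rewritten to adaptive.

    Hypothetical helper: drops budget_tokens, keeps other thinking keys,
    and sets output_config.effort only if the caller has not set one.
    """
    migrated = dict(kwargs)
    thinking = migrated.get("thinking")
    if thinking and thinking.get("type") == "enabled":
        new_thinking = {k: v for k, v in thinking.items() if k != "budget_tokens"}
        new_thinking["type"] = "adaptive"
        migrated["thinking"] = new_thinking
        output_config = dict(migrated.get("output_config") or {})
        output_config.setdefault("effort", effort)
        migrated["output_config"] = output_config
    return migrated


old = {
    "model": "claude-opus-4-7",
    "max_tokens": 64000,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
}
print(migrate_to_adaptive(old))
```

Requests that already use adaptive thinking, or no thinking at all, pass through unchanged, so the helper is safe to apply across a whole harness.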

Open Questions

  • Mythos Preview migration path. Is Mythos destined to replace Opus 4.7, or is it a parallel research branch? Docs don’t clarify.
  • Full thinking access on Claude 4. “For full thinking on Claude 4 models, contact sales.” Is there a public path, or is it Enterprise-only?
  • display SDK support timeline. No SDK currently types it. When does native support land in Python/TypeScript/Go/Java/Ruby?
  • budget_tokens > max_tokens with interleaved thinking. Is there any hard cap, or is it bounded by the model’s context window?
  • Batch API + Extended Thinking. The 300k output beta is noted but interaction with thinking tokens inside batch isn’t spelled out.

Try It

  1. Port one extended-thinking workflow to adaptive. Pick the simplest one in your codebase, remove budget_tokens, add thinking={"type": "adaptive"} and an explicit effort. Measure tokens and wall-clock.
  2. A/B the display modes. For a streaming UI, switch between "summarized" and "omitted" and measure time-to-first-text-token. Chat UIs that do not show thinking to users should land on "omitted".
  3. Harden your tool-use round-trip. Add a test that asserts thinking blocks are passed back unmodified. This prevents the subtle bug where a serializer strips them and Claude loses reasoning continuity.
  4. Cache-hit measurement. Toggle budget_tokens between two calls with an otherwise identical prompt — confirm a cache miss on the messages breakpoint. Do this before rolling out any thinking-config change that touches production.
  5. Read Opus 4.7 best practices for the effort-level guidance that replaces budget_tokens on Opus 4.7.
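
For the display-mode A/B in step 2, time-to-first-text-token can be computed from a recorded stream. The sketch below assumes the caller has already captured each streaming event as a plain dict paired with the seconds elapsed since the request started; the recording format and the helper name time_to_first_text are assumptions, while content_block_delta, thinking_delta, and text_delta are the streaming event and delta types.

```python
def time_to_first_text(events):
    """Return seconds until the first text delta, or None if no text arrived.

    events: iterable of (elapsed_seconds, event_dict) pairs recorded by the
    caller; this recording format is an assumption of the sketch.
    """
    for elapsed, event in events:
        is_delta = event.get("type") == "content_block_delta"
        if is_delta and event.get("delta", {}).get("type") == "text_delta":
            return elapsed
    return None


# Hand-written recording: thinking deltas arrive before the first text delta.
recorded = [
    (0.4, {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": "..."}}),
    (1.9, {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": "..."}}),
    (2.3, {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello"}}),
]
print(time_to_first_text(recorded))
```

Running the same prompt under both display modes and comparing this number gives a direct view of the latency difference, independent of how the UI renders thinking.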