Source: Claude Docs Extended Thinking 2026 04 17 (Anthropic platform docs — https://platform.claude.com/docs/en/build-with-claude/extended-thinking.md)
Anthropic’s authoritative reference for the thinking API parameter. Covers the per-model compatibility matrix (crucial — Opus 4.7 rejects manual thinking budgets with a 400), how thinking blocks are returned, summarized vs omitted display modes, streaming behavior, interactions with tool use, and how thinking changes prompt-cache invalidation.
Key Takeaways
- Opus 4.7+ rejects manual thinking budgets. `thinking: {type: "enabled", budget_tokens: N}` returns a 400 error on Opus 4.7 and later. Use `thinking: {type: "adaptive"}` with the effort parameter instead. Any harness passing `budget_tokens` to Opus 4.7 will break.
- Per-model thinking support matrix:
  - Opus 4.7+: adaptive only
  - Mythos Preview: adaptive by default; manual `type: "enabled"` also accepted; `display` defaults to `"omitted"`
  - Opus 4.6: adaptive recommended; manual mode deprecated but functional
  - Sonnet 4.6: adaptive recommended; manual + interleaved deprecated but functional
  - Sonnet 3.7 and Claude 4: manual extended thinking still supported
- Response contains thinking then text blocks. The `content` array holds `{type: "thinking", thinking: "...", signature: "..."}` followed by `{type: "text", text: "..."}`. The `signature` carries the encrypted thinking and is retained even when display is omitted; it is required for multi-turn continuity.
- Summarized thinking is the default on Claude 4. You are billed for the full thinking tokens but see only a summary, so the billed output token count will NOT match the visible response tokens. Claude Mythos Preview summarizes from the first token (no verbose preamble).
- `display: "omitted"` returns an empty `thinking` field but keeps `signature`. Default on Opus 4.7/Mythos. Faster time-to-first-text-token during streaming, but still billed for full thinking tokens (cuts latency, not cost). Not in any SDK type definitions yet: Python forwards unrecognized dict keys; TypeScript needs a type assertion; other SDKs need direct HTTP.
- You cannot toggle thinking mid-turn. The entire assistant turn (including all tool-use loops) must operate in a single thinking mode. A mid-turn toggle does not raise an error; it is silently degraded by disabling thinking.
- Tool-use constraints: only `tool_choice: {"type": "auto"}` (default) or `tool_choice: {"type": "none"}`. `any` or a named tool choice causes errors. When passing tool results back, thinking blocks MUST be passed unmodified to preserve reasoning continuity.
- Interleaved thinking: enables reasoning between tool calls. Automatic on Mythos and Opus 4.7 (with adaptive). On Opus 4.6 / Sonnet 4.6: automatic with adaptive (beta header deprecated). Other Claude 4 models: add the `interleaved-thinking-2025-05-14` beta header.
- `budget_tokens` can exceed `max_tokens` when using interleaved thinking: it represents total thinking across all blocks in the assistant turn.
- Prompt-cache invalidation: changing thinking parameters (enabled/disabled or budget) invalidates message cache breakpoints. System prompts and tools remain cached. Interleaved thinking amplifies invalidation.
- Output limits: Mythos / Opus 4.7 / Opus 4.6 → 128k output tokens. Sonnet 4.6 / Haiku 4.5 → 64k. Batch API with the `output-300k-2026-03-24` beta header → 300k for Opus 4.7/4.6 and Sonnet 4.6.
- ZDR eligible. Extended Thinking is covered by Zero Data Retention arrangements (unlike Agent Skills, which is not ZDR eligible).
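The response shape described in the takeaways above (a thinking block followed by a text block, with an empty `thinking` field when display is omitted) can be handled with a small helper. This is a minimal sketch, assuming the content blocks have already been converted to plain dicts shaped like the examples in this article; `split_content` and the `sig-placeholder` value are hypothetical and not part of the Anthropic SDK.

```python
def split_content(content):
    """Separate thinking blocks from visible text in a response content list.

    Assumption: each block is a plain dict shaped like the examples in this
    article; SDK response objects would need converting to dicts first.
    """
    text_parts = []
    thinking_blocks = []
    for block in content:
        if block.get("type") == "thinking":
            thinking_blocks.append(block)
        elif block.get("type") == "text":
            text_parts.append(block.get("text", ""))
    # With display "omitted" the thinking text is empty but the signature remains.
    omitted = bool(thinking_blocks) and all(
        not b.get("thinking") and b.get("signature") for b in thinking_blocks
    )
    return {
        "text": "".join(text_parts),
        "thinking_blocks": thinking_blocks,  # keep these intact for later turns
        "omitted": omitted,
    }


# Hypothetical sample: an omitted-display response with one text block.
sample = [
    {"type": "thinking", "thinking": "", "signature": "sig-placeholder"},
    {"type": "text", "text": "Final answer."},
]
result = split_content(sample)
```

The helper returns the thinking blocks untouched rather than discarding them, since the notes above say they are needed for multi-turn continuity even when nothing is displayed.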
Minimal usage

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Claude 4.x with a manual budget (deprecated but functional on Opus 4.6 / Sonnet 4.6)
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 10000},
        messages=[{"role": "user", "content": "Prove there are infinitely many primes p with p mod 4 == 3."}],
    )

    # Opus 4.7 (adaptive ONLY; a manual budget is rejected)
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=64000,
        thinking={"type": "adaptive"},
        output_config={"effort": "high"},  # or xhigh, max, medium, low
        messages=[{"role": "user", "content": "..."}],
    )

Display modes at a glance
| Mode | Behavior | Default for | Use when |
|---|---|---|---|
| `"summarized"` | Thinking blocks contain summarized thinking text | Claude 4 | You want to show thinking to users |
| `"omitted"` | Empty `thinking` field, `signature` preserved | Opus 4.7, Mythos Preview | You do not need visible thinking and want to optimize streaming latency |
Tool-use checklist
When extended thinking is on and the turn includes tools:
- `tool_choice` must be `auto` or `none`: not `any`, not a named tool
- Pass thinking blocks back to the API unmodified in the continuation request
- Do not toggle thinking mid-turn; plan the strategy at turn start
- On Opus 4.6 / Sonnet 4.6 with adaptive, interleaved thinking is automatic; no beta header is needed
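The first two checklist items can be sketched as a pair of helpers: one that rejects a disallowed `tool_choice`, and one that builds the continuation messages while echoing the previous assistant content back untouched. This is an illustration under stated assumptions: content blocks are plain dicts, and `validate_tool_choice`, `build_continuation`, the `get_weather` tool, and the result mapping are all hypothetical names rather than SDK features.

```python
import copy

ALLOWED_TOOL_CHOICE_TYPES = {"auto", "none"}


def validate_tool_choice(tool_choice):
    """Raise ValueError for tool_choice values the checklist rules out."""
    # Omitting tool_choice is fine: the default is auto.
    if tool_choice is not None and tool_choice.get("type") not in ALLOWED_TOOL_CHOICE_TYPES:
        raise ValueError(f"tool_choice {tool_choice!r} is not allowed with thinking on")


def build_continuation(messages, assistant_content, tool_results):
    """Return a new message list for the request that follows a tool call.

    assistant_content: content blocks from the previous response, as dicts.
    tool_results: maps tool_use id -> result string (hypothetical shape).
    """
    # Deep-copy so later mutation by the caller cannot alter the thinking
    # blocks; nothing is filtered, reordered, or edited.
    echoed = copy.deepcopy(assistant_content)
    results = [
        {"type": "tool_result", "tool_use_id": block["id"], "content": tool_results[block["id"]]}
        for block in assistant_content
        if block.get("type") == "tool_use"
    ]
    return list(messages) + [
        {"role": "assistant", "content": echoed},
        {"role": "user", "content": results},
    ]


# Hypothetical previous turn: one thinking block, then one tool call.
previous = [
    {"type": "thinking", "thinking": "Need the weather first.", "signature": "sig-placeholder"},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Paris"}},
]
history = [{"role": "user", "content": "What should I wear in Paris today?"}]
next_messages = build_continuation(history, previous, {"toolu_01": "14 C, light rain"})
```

Comparing the echoed assistant content against the original with `==` in a unit test is a cheap way to catch a serializer that strips or rewrites thinking blocks.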
Prompt-cache interaction rules
- Same thinking params → same cache behavior
- Different `thinking.type`, `budget_tokens`, or `display` → message cache breakpoints invalidated
- Thinking blocks from previous turns get stripped from context on non-tool-result user turns (Opus 4.5+ keeps blocks by default)
- System prompts and tools keep their cache regardless of thinking changes
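The first two rules above amount to a field-by-field comparison of the `thinking` parameter between consecutive requests. A minimal sketch follows, assuming only the three fields named in the rules matter; `invalidates_message_cache` is a hypothetical helper for reasoning about your own requests, not something the API exposes.

```python
CACHE_RELEVANT_KEYS = ("type", "budget_tokens", "display")


def invalidates_message_cache(previous_thinking, new_thinking):
    """Predict whether message cache breakpoints are invalidated between two requests.

    Compares only the fields named in the rules above. None means the
    thinking parameter was not sent at all.
    """
    prev = previous_thinking or {}
    new = new_thinking or {}
    return any(prev.get(key) != new.get(key) for key in CACHE_RELEVANT_KEYS)


same = invalidates_message_cache({"type": "adaptive"}, {"type": "adaptive"})
changed_budget = invalidates_message_cache(
    {"type": "enabled", "budget_tokens": 10000},
    {"type": "enabled", "budget_tokens": 8000},
)
```

The comparison is deliberately conservative: it treats an explicit value and an unset field as different, even where the unset field would resolve to the same model default, so it may predict an invalidation that does not happen.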
Opus 4.7 migration checklist
- Grep harnesses and SDK wrappers for `budget_tokens`; remove it when calling Opus 4.7
- Replace it with `thinking={"type": "adaptive"}` and `output_config={"effort": "high"}` (or tune effort per Opus 4.7 best practices)
- If a streaming UI relied on `thinking_delta` events, verify behavior with the `display` default (omitted on Opus 4.7 → no `thinking_delta`)
- Re-measure cache hit rates: thinking-parameter changes will invalidate message breakpoints
- For long tool-use loops, confirm thinking blocks are being round-tripped unmodified in your continuation calls
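The first two checklist steps can be applied mechanically to request keyword arguments before they reach the client. The sketch below assumes requests are assembled as plain dicts and that effort is passed through `output_config` as in the minimal usage example; `migrate_to_adaptive` is a hypothetical helper, and the default effort of `"high"` is only a starting point.

```python
def migrate_to_adaptive(request_kwargs, effort="high"):
    """Return a copy of request kwargs with manual thinking replaced by adaptive.

    Hypothetical helper: requests without manual thinking come back unchanged
    (as a shallow copy), and an existing effort setting is left alone.
    """
    migrated = dict(request_kwargs)
    thinking = migrated.get("thinking") or {}
    if thinking.get("type") != "enabled" and "budget_tokens" not in thinking:
        return migrated
    # Drop the manual budget and switch the mode; keep any other keys.
    new_thinking = {k: v for k, v in thinking.items() if k != "budget_tokens"}
    new_thinking["type"] = "adaptive"
    migrated["thinking"] = new_thinking
    output_config = dict(migrated.get("output_config") or {})
    output_config.setdefault("effort", effort)
    migrated["output_config"] = output_config
    return migrated


legacy = {
    "model": "claude-opus-4-7",
    "max_tokens": 64000,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
    "messages": [{"role": "user", "content": "..."}],
}
migrated = migrate_to_adaptive(legacy)
```

Returning a copy instead of mutating in place keeps the legacy request available for side-by-side token and latency comparisons during the migration.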
Related
- Cross-Topic Connections — cross-topic synthesis: Adaptive Thinking + Effort + Advisor as three composable levers
- Opus 4.7 Best Practices for Claude Code — effort-level guidance and adaptive thinking context
- Claude Prompting Best Practices — the general guide; this article is the thinking-specific deep-dive
- The Advisor Strategy (advisor_20260301) — another mechanism for on-demand deep reasoning; advisor consults don’t use extended thinking config directly
- Claude Managed Agents — hosted runtime where thinking configuration matters for cost/latency budgeting
- CCA-F Technical Reference — covers extended thinking patterns in the certification context
Open Questions
- Mythos Preview migration path. Is Mythos destined to replace Opus 4.7, or is it a parallel research branch? Docs don’t clarify.
- Full thinking access on Claude 4. “For full thinking on Claude 4 models, contact sales.” Is there a public path, or is it Enterprise-only?
- `display` SDK support timeline. No SDK currently types it. When does native support land in Python/TypeScript/Go/Java/Ruby?
- `budget_tokens` > `max_tokens` with interleaved thinking. Is there any hard cap, or is it bounded by the model’s context window?
- Batch API + Extended Thinking. The 300k output beta is noted, but its interaction with thinking tokens inside a batch is not spelled out.
Try It
- Port one extended-thinking workflow to adaptive. Pick the simplest one in your codebase, remove `budget_tokens`, add `thinking={"type": "adaptive"}` and an explicit `effort`. Measure tokens and wall-clock time.
- A/B the display modes. For a streaming UI, switch between `"summarized"` and `"omitted"` and measure time-to-first-text-token. Most user-facing chat should land on `"omitted"`.
- Harden your tool-use round-trip. Add a test that asserts thinking blocks are passed back unmodified. This prevents the subtle bug where a serializer strips them and Claude loses reasoning continuity.
- Cache-hit measurement. Toggle `budget_tokens` between two calls with an otherwise identical prompt and confirm a cache miss on the messages breakpoint. Do this before rolling out any thinking-config change that touches production.
- Read Opus 4.7 best practices for the effort-level guidance that replaces `budget_tokens` on Opus 4.7.
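For the display-mode A/B above, a consistent timer matters more than the exact harness. The sketch below measures time to the first text delta over any iterable of stream events. It assumes events have been converted to dicts shaped like the raw streaming payloads (`content_block_delta` events carrying a nested `delta` of type `thinking_delta` or `text_delta`); `time_to_first_text` and the injectable `clock` argument are hypothetical conveniences, not SDK features.

```python
import time


def time_to_first_text(events, clock=time.monotonic):
    """Return seconds from the start of iteration to the first text delta, or None.

    Assumption: each event is a dict shaped like the raw streaming payloads;
    SDK event objects would need converting to dicts first.
    """
    start = clock()
    for event in events:
        if event.get("type") != "content_block_delta":
            continue
        # Thinking and signature deltas are skipped; only visible text counts.
        if event.get("delta", {}).get("type") == "text_delta":
            return clock() - start
    return None
```

Passing a fake `clock` makes the function deterministic in tests, while the default `time.monotonic` avoids skew from wall-clock adjustments when measuring a live stream. Run it over the same prompt in both display modes and compare the two numbers.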