Extended Thinking (API Reference)

Source: Claude Docs Extended Thinking 2026 04 17 (Anthropic platform docs — https://platform.claude.com/docs/en/build-with-claude/extended-thinking.md)

Anthropic’s authoritative reference for the thinking API parameter. Covers the per-model compatibility matrix (crucial — Opus 4.7 rejects manual thinking budgets with a 400), how thinking blocks are returned, summarized vs omitted display modes, streaming behavior, interactions with tool use, and how thinking changes prompt-cache invalidation.

Key Takeaways

Opus 4.7+ rejects manual thinking budgets. thinking: {type: "enabled", budget_tokens: N} returns a 400 error on Opus 4.7 and later. Use thinking: {type: "adaptive"} with the effort parameter instead. Any harness passing budget_tokens to Opus 4.7 will break.
Per-model thinking support matrix:
- Opus 4.7+ — adaptive only
- Mythos Preview — adaptive default; manual type: "enabled" also accepted; display defaults to "omitted"
- Opus 4.6 — adaptive recommended; manual mode deprecated but functional
- Sonnet 4.6 — adaptive recommended; manual + interleaved deprecated but functional
- Sonnet 3.7 and Claude 4 — manual extended thinking still supported
Response contains thinking blocks, then text blocks. The content array holds {type: "thinking", thinking: "...", signature: "..."} followed by {type: "text", text: "..."}. The signature field carries the encrypted thinking and is retained even when display is "omitted"; it is required for multi-turn continuity.
Summarized thinking is the default on Claude 4. You’re billed for full thinking tokens but see a summary. Billed output token count will NOT match visible response tokens. Claude Mythos Preview summarizes from the first token (no verbose preamble).
display: "omitted" returns empty thinking field but keeps signature. Default on Opus 4.7/Mythos. Faster time-to-first-text-token during streaming; still billed for full thinking tokens (cuts latency, not cost). Not in any SDK type definitions yet — Python forwards unrecognized dict keys; TypeScript needs type assertion; other SDKs need direct HTTP.
You cannot toggle thinking mid-turn. The entire assistant turn (including all tool-use loops) must operate in a single thinking mode. A mid-turn toggle does not raise an error; the request is silently degraded and thinking is disabled.
Tool-use constraints: only tool_choice: {"type": "auto"} (default) or tool_choice: {"type": "none"} is supported. {"type": "any"} or a named tool choice returns an error. When passing tool results back, thinking blocks MUST be passed unmodified to preserve reasoning continuity.
Interleaved thinking: enables reasoning between tool calls. Automatic on Mythos and Opus 4.7 (with adaptive). On Opus 4.6 / Sonnet 4.6: automatic with adaptive (beta header deprecated). Other Claude 4 models: add interleaved-thinking-2025-05-14 beta header.
budget_tokens can exceed max_tokens when using interleaved thinking. In that case it represents the total thinking budget across all thinking blocks in the assistant turn.
Prompt-cache invalidation: changing thinking parameters (enabled/disabled or budget) invalidates message cache breakpoints. System prompts and tools remain cached. Interleaved thinking amplifies invalidation.
Output limits: Mythos / Opus 4.7 / Opus 4.6 → 128k output tokens. Sonnet 4.6 / Haiku 4.5 → 64k. Batch API with output-300k-2026-03-24 beta header → 300k for Opus 4.7/4.6 and Sonnet 4.6.
ZDR eligible. Extended Thinking is covered by Zero Data Retention arrangements (unlike Agent Skills, which is not ZDR eligible).
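
The per-model matrix above lends itself to a small pre-flight check before a request is sent. The sketch below is a hypothetical helper (check_thinking is not part of any SDK). The model-ID prefixes are assumptions, apart from claude-opus-4-7 and claude-sonnet-4-6, which appear in this note's own examples, and models released after Opus 4.7 would need to be added by hand.

```python
# Hypothetical pre-flight check; not part of the Anthropic SDK.
# Prefixes are assumed model-ID spellings, except "claude-opus-4-7" and
# "claude-sonnet-4-6", which are taken from this note's examples.
ADAPTIVE_ONLY = ("claude-opus-4-7",)
MANUAL_DEPRECATED = ("claude-opus-4-6", "claude-sonnet-4-6")


def check_thinking(model: str, thinking: dict) -> str:
    """Classify a thinking config as 'ok', 'deprecated', or 'rejected'."""
    manual = thinking.get("type") == "enabled"
    if manual and model.startswith(ADAPTIVE_ONLY):
        return "rejected"    # per the matrix, the API answers with a 400
    if manual and model.startswith(MANUAL_DEPRECATED):
        return "deprecated"  # still functional, adaptive is recommended
    return "ok"
```

A harness could call this once per request and log or fail fast on "rejected" instead of waiting for the 400.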

Minimal usage

import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

# Opus 4.6 / Sonnet 4.6: manual budget is deprecated but still functional
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Prove there are infinitely many primes p with p mod 4 == 3."}],
)

# Opus 4.7: adaptive ONLY, a manual budget is rejected with a 400
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # or xhigh, max, medium, low
    messages=[{"role": "user", "content": "..."}],
)
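
Once a response comes back, the thinking and text blocks usually need to be separated. A minimal sketch, assuming each item in response.content exposes a .type attribute plus .thinking or .text as described in Key Takeaways; split_content is a hypothetical helper name.

```python
def split_content(content):
    """Separate thinking text from answer text in a response content list.

    Expects objects with a .type attribute, such as the items of
    response.content. Other block types (e.g. tool_use) are ignored.
    """
    thinking_parts, text_parts = [], []
    for block in content:
        if block.type == "thinking":
            # May be an empty string when display is "omitted".
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)
    return "".join(thinking_parts), "".join(text_parts)
```

Typical use would be thinking, answer = split_content(response.content). Keep the original blocks around as well if the turn continues with tool results, since the signature must be sent back untouched.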

Display modes at a glance

| Mode | Behavior | Default for | Use when |
|---|---|---|---|
| `"summarized"` | Thinking blocks contain summarized thinking text | Claude 4 (default) | You want to show thinking to users |
| `"omitted"` | Empty thinking field, signature preserved | Opus 4.7, Mythos Preview | You don’t need visible thinking; optimize streaming latency |
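
Since the Python SDK forwards unrecognized dict keys, a display mode can be set by building the thinking dict by hand. The sketch below assumes display is a key inside the thinking object (the cache rules in this note refer to it alongside thinking.type and budget_tokens); thinking_params is a hypothetical helper, not an SDK function.

```python
def thinking_params(display=None, effort="high"):
    """Build keyword arguments for an adaptive-thinking request.

    Assumption: display lives inside the thinking dict. Passing None
    leaves the model's default display mode in place.
    """
    thinking = {"type": "adaptive"}
    if display is not None:
        if display not in ("summarized", "omitted"):
            raise ValueError(f"unknown display mode: {display!r}")
        thinking["display"] = display
    return {"thinking": thinking, "output_config": {"effort": effort}}
```

The result can be splatted into a call, for example client.messages.create(model=..., max_tokens=..., messages=..., **thinking_params(display="summarized")).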

Tool-use checklist

When extended thinking is on and the turn includes tools:

tool_choice must be auto or none — not any, not a named tool
Pass thinking blocks back to the API unmodified in the continuation request
Do not toggle thinking mid-turn; plan the strategy at turn start
On Opus 4.6 / Sonnet 4.6 with adaptive, interleaved thinking is automatic; no beta header is needed
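
The first two checklist items can be enforced where the continuation request is assembled. This is a sketch under stated assumptions: messages and content blocks are plain dicts in the Messages API shape, and continuation_messages is a hypothetical helper name.

```python
import copy

ALLOWED_TOOL_CHOICES = {"auto", "none"}


def continuation_messages(messages, assistant_content, tool_results, tool_choice=None):
    """Return the message list for the request that follows a tool call.

    assistant_content: content blocks of the assistant reply as plain dicts,
    including every thinking block exactly as it was returned.
    tool_results: tool_result blocks to send back in the next user message.
    """
    if tool_choice is not None and tool_choice.get("type") not in ALLOWED_TOOL_CHOICES:
        raise ValueError("with thinking enabled, tool_choice must be auto or none")
    return [
        *messages,
        # Deep-copy so later mutation elsewhere cannot alter thinking blocks.
        {"role": "assistant", "content": copy.deepcopy(assistant_content)},
        {"role": "user", "content": tool_results},
    ]
```

The helper deliberately does not filter or reorder assistant_content: the thinking block and its signature go back in the same position they arrived in.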

Prompt-cache interaction rules

Same thinking params across requests → message cache breakpoints stay valid
Different thinking.type, budget_tokens, or display → message cache breakpoints invalidated
Thinking blocks from previous turns get stripped from context on non-tool-result user turns (Opus 4.5+ keeps blocks by default)
System prompts and tools keep their cache regardless of thinking changes
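
To spot config changes that are likely to cost a cache miss, a client can fingerprint the thinking fields named above. This is a conservative client-side heuristic, not how the API computes cache keys; both function names are hypothetical.

```python
import hashlib
import json


def thinking_fingerprint(thinking):
    """Stable hash of the thinking fields that affect message cache breakpoints."""
    thinking = thinking or {"type": "disabled"}
    relevant = {key: thinking.get(key) for key in ("type", "budget_tokens", "display")}
    encoded = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def invalidates_message_cache(previous, current):
    """True if moving from previous to current should be expected to miss."""
    return thinking_fingerprint(previous) != thinking_fingerprint(current)
```

The check errs on the side of reporting a change: spelling out a display value that equals the model default is flagged even though the effective behavior may be identical.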

Opus 4.7 migration checklist

Grep harnesses and SDK wrappers for budget_tokens — remove when calling Opus 4.7
Replace with thinking={"type": "adaptive"} and output_config={"effort": "high"} (or tune effort per Opus 4.7 best practices)
If a streaming UI relies on thinking_delta events, verify its behavior under the new display default: Opus 4.7 defaults to "omitted", so no thinking_delta events arrive
Re-measure cache hit rates — thinking-parameter changes will invalidate message breakpoints
For long tool-use loops, confirm thinking blocks are being round-tripped unmodified in your continuation calls
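
The first two checklist steps can be applied mechanically to existing request kwargs. A minimal sketch, assuming requests are built as plain dicts before being passed to client.messages.create; migrate_to_adaptive is a hypothetical helper, and "high" is only a placeholder default for effort.

```python
def migrate_to_adaptive(kwargs, effort="high"):
    """Return a copy of request kwargs with manual thinking swapped for adaptive.

    Only rewrites requests whose thinking type is "enabled"; anything else
    is returned unchanged (as a shallow copy).
    """
    migrated = dict(kwargs)
    thinking = migrated.get("thinking") or {}
    if thinking.get("type") != "enabled":
        return migrated
    # Drop budget_tokens and keep any other keys, e.g. a display setting.
    kept = {k: v for k, v in thinking.items() if k not in ("type", "budget_tokens")}
    migrated["thinking"] = {"type": "adaptive", **kept}
    # Respect an effort level the caller already chose.
    output_config = dict(migrated.get("output_config") or {})
    output_config.setdefault("effort", effort)
    migrated["output_config"] = output_config
    return migrated
```

The input dict is never mutated, so the old and new kwargs can be logged side by side during rollout.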

Related

Cross-Topic Connections — cross-topic synthesis: Adaptive Thinking + Effort + Advisor as three composable levers
Opus 4.7 Best Practices for Claude Code — effort-level guidance and adaptive thinking context
Claude Prompting Best Practices — the general guide; this article is the thinking-specific deep-dive
The Advisor Strategy (advisor_20260301) — another mechanism for on-demand deep reasoning; advisor consults don’t use extended thinking config directly
Claude Managed Agents — hosted runtime where thinking configuration matters for cost/latency budgeting
CCA-F Technical Reference — covers extended thinking patterns in the certification context

Open Questions

Mythos Preview migration path. Is Mythos destined to replace Opus 4.7, or is it a parallel research branch? Docs don’t clarify.
Claude 4 full thinking access. The docs say “For full thinking on Claude 4 models, contact sales.” Is there a public path, or is it Enterprise-only?
display SDK support timeline. No SDK currently types it. When does native support land in Python/TypeScript/Go/Java/Ruby?
budget_tokens > max_tokens with interleaved thinking. Is there any hard cap, or is it bounded by the model’s context window?
Batch API + Extended Thinking. The 300k output beta is noted but interaction with thinking tokens inside batch isn’t spelled out.

Try It

Port one extended-thinking workflow to adaptive. Pick the simplest one in your codebase, remove budget_tokens, add thinking={"type": "adaptive"} and an explicit effort. Measure tokens and wall-clock.
A/B the display modes. For a streaming UI, switch between "summarized" and "omitted" and measure time-to-first-text-token. User-facing chat that does not surface thinking will likely land on "omitted".
Harden your tool-use round-trip. Add a test that asserts thinking blocks are passed back unmodified. This prevents the subtle bug where a serializer strips them and Claude loses reasoning continuity.
Cache-hit measurement. Toggle budget_tokens between two calls with an otherwise identical prompt — confirm a cache miss on the messages breakpoint. Do this before rolling out any thinking-config change that touches production.
Read Opus 4.7 best practices for the effort-level guidance that replaces budget_tokens on Opus 4.7.
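
For the display-mode A/B, time-to-first-text-token can be measured from the stream events themselves. A minimal sketch, assuming events expose .type and that content_block_delta events carry a .delta whose .type is "text_delta" or "thinking_delta"; time_to_first_text is a hypothetical helper and the injectable clock exists only to make it testable.

```python
import time


def time_to_first_text(events, clock=time.perf_counter):
    """Seconds from the start of iteration to the first text_delta event.

    events: an iterable of stream events exposing .type and, for
    content_block_delta events, a .delta with its own .type.
    Returns None if the stream ends without any text.
    """
    start = clock()
    for event in events:
        if event.type != "content_block_delta":
            continue
        if event.delta.type == "text_delta":
            return clock() - start
    return None
```

The timer starts when iteration begins, so connection setup is not included; for end-to-end numbers, start a separate clock before opening the stream. Run each display mode several times and compare medians rather than single samples.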

Jonathon's AI Wiki
