# Anthropic Adaptive Thinking

Adaptive thinking is Anthropic's recommended reasoning mode for Claude 4 models: a shift from manually budgeting thinking tokens to letting Claude dynamically allocate reasoning based on request complexity. It replaces `thinking.type: "enabled"` + `budget_tokens` with `thinking.type: "adaptive"` + the `effort` parameter.
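The migration can be sketched as two request payloads. This is a sketch: parameter names and shapes follow this doc's description of the API and should be checked against the live API reference before use.

```python
# Old style: manual thinking with an explicit token budget.
manual_request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 8192,
    "thinking": {"type": "enabled", "budget_tokens": 4096},
    "messages": [{"role": "user", "content": "Plan a database migration."}],
}

# New style: adaptive thinking plus an effort level.
adaptive_request = {
    "model": "claude-opus-4-7",
    "max_tokens": 8192,
    "thinking": {"type": "adaptive"},
    "output_config": {"effort": "high"},
    "messages": [{"role": "user", "content": "Plan a database migration."}],
}
```

Note that `budget_tokens` disappears entirely in the adaptive payload; depth is steered by `effort` instead.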
## The Mode Landscape

| Mode | Config | Model support | When to use |
|---|---|---|---|
| Adaptive | `thinking: {type: "adaptive"}` | Mythos Preview (default), Opus 4.7 (only), Opus 4.6, Sonnet 4.6 | Default choice for Claude 4 models |
| Manual | `thinking: {type: "enabled", budget_tokens: N}` | All models except Opus 4.7 (rejected); deprecated on Opus 4.6, Sonnet 4.6 | Precise cost/latency control on older models |
| Disabled | Omit `thinking` or `{type: "disabled"}` | All except Claude Mythos Preview | Lowest latency, no reasoning trace |
Manual mode on Claude Opus 4.7 returns a 400 error — it has been removed, not deprecated. On Opus 4.6 and Sonnet 4.6, manual mode still works but is deprecated and will be removed in a future release.
## The Effort Parameter

`effort` replaces `budget_tokens` as the thinking-depth control. Pass it as `output_config: {effort: "<level>"}` alongside `thinking: {type: "adaptive"}`.
| Level | Thinking behavior | Availability |
|---|---|---|
| `max` | Always thinks, no depth constraint | Mythos Preview, Opus 4.7, Opus 4.6, Sonnet 4.6 |
| `xhigh` | Extended, for long-horizon work | Opus 4.7 only |
| `high` | Always thinks deeply (API default) | All supported models |
| `medium` | Moderate thinking; may skip simple queries | All supported models |
| `low` | Minimizes thinking; fastest, lowest cost | All supported models |
`effort` affects all token spend (text, tool calls, and thinking), not just thinking depth. At `low`, Claude also makes fewer tool calls and proceeds more directly to action.

Opus 4.7 guidance: Start at `xhigh` for coding and agentic work. Use `high` as the minimum for intelligence-sensitive tasks. Step down to `medium` for cost-sensitive workflows. Reserve `max` for genuinely frontier problems.

Sonnet 4.6 guidance: Default to `medium` for most applications; use `high` only when maximum intelligence is required from Sonnet; use `low` for high-volume or latency-sensitive workloads.
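The per-model guidance above can be condensed into a default-effort lookup. The task labels and model-name checks are illustrative assumptions for the sketch.

```python
def default_effort(model: str, task: str) -> str:
    """Pick an effort level following the Opus 4.7 / Sonnet 4.6 guidance
    above. Task labels are made-up categories for illustration."""
    if "opus-4-7" in model:
        if task in ("coding", "agentic"):
            return "xhigh"       # recommended starting point
        if task == "cost_sensitive":
            return "medium"
        if task == "frontier":
            return "max"         # reserve for genuinely frontier problems
        return "high"            # minimum for intelligence-sensitive work
    if "sonnet-4-6" in model:
        if task == "high_volume":
            return "low"
        if task == "max_intelligence":
            return "high"
        return "medium"          # default for most applications
    return "high"                # API default elsewhere
```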
## Interleaved Thinking
In adaptive mode, interleaved thinking is automatically enabled — Claude reasons between tool calls, not just before the first one. This is the key capability advantage over manual mode for agentic workflows:
- On Opus 4.7 and Mythos Preview, inter-tool reasoning always moves into thinking blocks.
- On Opus 4.6 and Sonnet 4.6, this is automatic with adaptive mode (the deprecated `interleaved-thinking-2025-05-14` beta header is no longer needed or supported on Opus 4.6).
## Thinking Display Modes

The `display` field controls what appears in API responses; it does not affect cost.

- `"summarized"` (default on Opus 4.6, Sonnet 4.6): Thinking blocks contain a summary of Claude's reasoning. You're charged for the full thinking tokens; only the summary is visible.
- `"omitted"` (default on Opus 4.7, Mythos Preview): Thinking blocks have an empty `thinking` field. Reduces time-to-first-text-token when streaming because thinking tokens are not streamed. Same cost as summarized.

The `signature` field is always present and identical regardless of `display`. It carries encrypted full thinking content for multi-turn continuity.
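A payload selecting a display mode might look like the following. The placement of `display` inside the `thinking` object is an assumption of this sketch; this doc does not specify where the field lives, so verify against the API reference.

```python
# Hypothetical request choosing the "omitted" display mode.
# The "display" field's location is assumed, not confirmed by this doc.
display_request = {
    "model": "claude-opus-4-6",
    "max_tokens": 4096,
    "thinking": {"type": "adaptive", "display": "omitted"},
    "output_config": {"effort": "medium"},
    "messages": [{"role": "user", "content": "Summarize this RFC."}],
}
```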
## Thinking Encryption and Round-Tripping

Every thinking block carries an opaque `signature` field. The API uses it to verify that thinking blocks are authentic when they are passed back in multi-turn conversations. Rules:
- Thinking blocks must be passed back unchanged when using tools with thinking (reasoning continuity).
- Passing thinking blocks back in non-tool turns is optional on Opus 4.6+ and Sonnet 4.6+, which preserve them by default.
- Never modify the `thinking` field or the `signature`; any modification invalidates the block.
- Signatures are cross-platform (Claude API, Bedrock, Vertex AI).
## Prompt Caching Interaction
- Within a mode: Consecutive adaptive-mode requests preserve cache breakpoints.
- Across modes: Switching between `adaptive`, `enabled`, and `disabled` breaks message cache breakpoints.
- System prompts and tool definitions remain cached regardless of mode changes.
- For sessions involving extended thinking (which can run > 5 minutes), use the 1-hour cache TTL — see anthropic-prompt-caching.
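A long-session request with a 1-hour cache breakpoint on the system prompt might be sketched as follows, assuming the `cache_control` shape with a `"ttl"` field described in the prompt-caching docs; confirm model availability and any required beta header before relying on it.

```python
# Sketch: 1-hour cache TTL on a long system prompt for an extended
# thinking session. The "ttl": "1h" shape is an assumption to verify.
cached_request = {
    "model": "claude-opus-4-7",
    "max_tokens": 8192,
    "thinking": {"type": "adaptive"},
    "output_config": {"effort": "xhigh"},
    "system": [
        {
            "type": "text",
            "text": "You are a code-review agent. (long standing instructions)",
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    "messages": [{"role": "user", "content": "Review the attached diff."}],
}
```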
## Cost Model

You are charged for the full thinking tokens generated, not the summary tokens visible in the response. With `display: "omitted"`, the billed token count will not match the visible token count. Use `max_tokens` as a hard ceiling on total output (thinking + text). At `high` or `max` effort, monitor for `stop_reason: "max_tokens"` and increase the ceiling accordingly.
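The monitor-and-increase loop above can be sketched as a small helper; the doubling step and the cap are arbitrary heuristics, not recommendations from this doc.

```python
def next_max_tokens(response: dict, current: int, cap: int = 64_000) -> int:
    """If the response was truncated at the max_tokens ceiling,
    double the ceiling for a retry, up to an arbitrary cap."""
    if response.get("stop_reason") == "max_tokens":
        return min(current * 2, cap)
    return current
```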
## Tuning Thinking Behavior via Prompt

Adaptive thinking's triggering behavior is promptable. If Claude thinks more often than needed:

```
Extended thinking adds latency and should only be used when it
will meaningfully improve answer quality. When in doubt, respond directly.
```
Measure actual impact on your workload before deploying prompt-based tuning. Consider testing lower effort levels first.