Anthropic Streaming Patterns
Anthropic streaming is an SSE-first response mode for the Messages API. The client should parse ordered events and content blocks rather than assume a single completed JSON payload.
Baseline event flow
message_startcontent_block_start- one or more
content_block_delta content_block_stopmessage_deltamessage_stop
ping events may appear between content events and should be treated as keep-alives, not output.
Tool-streaming behavior
- Tool-use streams can interleave normal text output with tool blocks.
- Tool arguments may arrive incrementally via
input_json_delta. - A streaming client may need to buffer partial JSON until the corresponding content block stops before attempting execution.
Error-handling implications
- An Anthropic stream can return HTTP
200and still fail later in the SSE stream. - Completion should only be treated as successful after the stream reaches a valid terminal state.
- Long-running requests are better candidates for streaming than non-streaming synchronous waits, especially when networks may drop idle connections.
Thinking Block Streaming
When extended thinking or adaptive thinking is enabled, thinking blocks appear before text blocks. Additional delta types:
thinking_delta— incremental thinking content (analogous totext_delta)signature_delta— arrives just beforecontent_block_stopfor a thinking block; carries encrypted thinking for multi-turn continuity
Standard streaming sequence with thinking:
content_block_start(type:thinking)- one or more
content_block_deltawiththinking_delta - one
content_block_deltawithsignature_delta content_block_stopcontent_block_start(type:text)- one or more
content_block_deltawithtext_delta content_block_stopmessage_delta,message_stop
Omitted thinking display
With thinking.display: "omitted", no thinking_delta events are emitted. The thinking block still opens and closes, but carries only a signature_delta:
content_block_start (type: thinking, thinking: "")
content_block_delta (signature_delta: "EosnCkY...")
content_block_stop
content_block_start (type: text)
This reduces time-to-first-text-token at no cost reduction — the full thinking process runs and is billed; only streaming is suppressed. Use omitted in pipelines that do not surface thinking content to users.
Redacted thinking blocks
Rarely, the API may return redacted_thinking blocks instead of thinking blocks. These have an opaque data field (not a thinking field). If filtering content blocks when round-tripping tool use responses, also pass through redacted_thinking blocks — filtering on block.type == "thinking" alone silently drops them.
Design Guidance
- Keep streaming parsers block-aware rather than token-only.
- Separate transport events from semantic state transitions in client code.
- If the same client also supports tool use, it should treat streamed tool arguments and final
stop_reasonhandling as part of a single state machine. - When thinking is enabled, handle
thinking_deltaandsignature_deltaevent types in addition totext_delta. Collect the full signature before thecontent_block_stopevent. - At
max_tokens> 21,333, the SDK requires streaming to avoid HTTP timeouts on thinking-heavy requests. Use.stream()with.get_final_message()to get a completeMessagewithout handling intermediate events.