Literature: Anthropic Messages API
This literature note captures the first bounded Anthropic documentation batch for the direct Claude API. The corpus is intentionally narrow: authentication and request model, message structure, streaming, tool use, error handling, rate limits, and prompt caching.
Core request model
- The direct Claude API is a REST interface at https://api.anthropic.com.
- Direct requests require the `x-api-key`, `anthropic-version`, and `content-type: application/json` headers.
- The primary interaction surface is the Messages API, which expects a `messages` array plus request controls such as `model` and `max_tokens`.
- Responses include a provider request identifier, usage accounting, and a `stop_reason` that governs the next step in the client loop.
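The request model above can be sketched as a header-and-body pair. This is a minimal illustration built with the standard library only, with no network round-trip; the API key and model id are placeholders, not real values.

```python
import json

# Sketch of a direct Messages API request: headers plus JSON body.
# No network call is made; the key and model id below are placeholders.
API_URL = "https://api.anthropic.com/v1/messages"

headers = {
    "x-api-key": "ANTHROPIC_API_KEY",   # placeholder, not a real key
    "anthropic-version": "2023-06-01",  # pinned API-version header
    "content-type": "application/json",
}

body = {
    "model": "claude-example-model",    # placeholder model id
    "max_tokens": 256,                  # required request control
    "messages": [
        {"role": "user", "content": "Hello, Claude."}
    ],
}

payload = json.dumps(body)
print(payload)
```

Sending `payload` to `API_URL` with those headers is the whole request contract; everything else in this note rides on the shape of `messages` and the returned `stop_reason`.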
Message semantics
- Anthropic's Messages API is stateless: callers resend the full conversation history on each request.
- Input turns can include synthetic `assistant` messages, which makes the API implementation-facing rather than chat-session-preserving by default.
- Content is block-based rather than plain-text-only, which matters for multimodal inputs and tool use.
- Prefill remains possible through a final `assistant` input turn, but support is model-conditional and documented as unavailable for some newer models.
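The statelessness and prefill points can be made concrete with a transcript sketch. The turn contents here are illustrative; only the pattern (client-owned history, full resend, optional trailing `assistant` turn) comes from the documentation, and prefill support varies by model.

```python
# Statelessness sketch: the client owns the transcript and resends all of it.
history = [
    {"role": "user", "content": "Name one planet."},
]

# Pretend the API returned this assistant turn; append it locally.
history.append({"role": "assistant", "content": "Mars."})

# The next request resends the full history plus the new user turn ...
history.append({"role": "user", "content": "Name another, as JSON."})

# ... and may optionally end with a partial assistant turn to prefill
# the start of the model's reply (model-conditional, per the docs).
request_messages = history + [{"role": "assistant", "content": '{"planet":'}]

print(len(request_messages))  # the full transcript travels on every call
```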
Streaming and tool use
- Streaming uses SSE with an event sequence centered on `message_start`, `content_block_*`, `message_delta`, and `message_stop`.
- Tool use is integrated into the same message/block structure rather than a separate tool-role channel.
- For client-side tools, the assistant emits `tool_use` blocks with `stop_reason: "tool_use"`, the caller executes the tool, and the caller returns `tool_result` blocks in the next `user` message.
- Anthropic distinguishes client tools from server tools explicitly, which is a real provider-specific operational boundary.
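The client-tool loop above can be sketched over a simulated response. The block shapes follow the documented `tool_use` / `tool_result` structure, but the response object and the `run_tool` dispatcher are stand-ins, not live API objects.

```python
# Client-tool loop sketch over a simulated (not live) API response.

def run_tool(name, tool_input):
    # Hypothetical local tool dispatcher.
    if name == "get_time":
        return "12:00"
    raise ValueError(f"unknown tool: {name}")

response = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "Let me check."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_time", "input": {}},
    ],
}

follow_up = None
if response["stop_reason"] == "tool_use":
    results = []
    for block in response["content"]:
        if block["type"] == "tool_use":
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],  # pairs the result to its request
                "content": run_tool(block["name"], block["input"]),
            })
    # Results go back to the model as the next user turn.
    follow_up = {"role": "user", "content": results}

print(follow_up)
```

The design point worth preserving is that tool results re-enter through the `user` role in the same block-based channel, not through a dedicated tool role.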
Operational constraints
- Request-size limits, acceleration limits, and organization-tier rate limits are part of the documented contract, not incidental implementation details.
- Anthropic explicitly documents that SSE streams can fail after an HTTP 200, so transport-level success is not sufficient for completion success.
- Prompt caching is a first-class context-management feature with explicit `cache_control` markers, short default TTLs, and limited cache-breakpoint slots.
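The caching bullet can be sketched as a request fragment: a `cache_control` breakpoint placed on the last block of a large, stable prefix. The system text and model id are placeholders; only the marker shape (`"type": "ephemeral"`) and the limited-slot constraint come from the documentation.

```python
# Prompt-caching sketch: one cache breakpoint at the end of a stable prefix.
system_blocks = [
    {"type": "text", "text": "You are a support agent."},
    {
        "type": "text",
        "text": "LONG_REFERENCE_DOCUMENT",       # placeholder for a big prefix
        "cache_control": {"type": "ephemeral"},  # cache everything up to here
    },
]

request = {
    "model": "claude-example-model",  # placeholder model id
    "max_tokens": 256,
    "system": system_blocks,
    "messages": [{"role": "user", "content": "Summarize the document."}],
}

# Breakpoint slots are limited, so count markers before sending.
n_breakpoints = sum(1 for b in system_blocks if "cache_control" in b)
print(n_breakpoints)
```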
Caveats
- Model support and some feature surfaces are conditional by model family and can change over time.
- Rate-limit tables and prefill support are especially subject to change; permanent notes should preserve the pattern, not freeze transient per-model numbers unless operationally necessary.
- Partner platforms such as Bedrock, Vertex AI, and Azure can differ from the direct Anthropic API in feature timing and payload limits.