Anthropic

The Anthropic provider connects locus directly to Anthropic's API (api.anthropic.com). Use it when you want the Claude family — Opus for the hardest problems, Sonnet as the everyday workhorse, Haiku for high-volume cheap calls — and want to talk to Anthropic without going through an intermediary.

Two things make this provider distinct: prompt caching (long system prompts and tool blocks pay 1/10th the input cost on repeat turns) and extended thinking (Claude 4 surfaces its reasoning as a stream of typed events your UI can render).

When to pick Anthropic

| You want… | This is the right provider? |
| --- | --- |
| Claude Opus / Sonnet / Haiku | ✓ — from Anthropic directly |
| Long system prompts amortised across many turns | ✓ — built-in prompt caching |
| Extended-thinking models with visible reasoning | ✓ — ThinkEvent stream |
| Claude on Oracle infrastructure (no separate API key) | Use OCI: oci:anthropic.claude-sonnet |
| GPT or Llama | Use OpenAI or OCI instead |

Getting started

1. Set your API key

export ANTHROPIC_API_KEY=sk-ant-...

2. Pick a Claude model

from locus import Agent

agent = Agent(
    model="anthropic:claude-sonnet-4-20250514",
    system_prompt="You are a helpful assistant.",
)

The string "anthropic:claude-sonnet-4-20250514" tells locus the provider (anthropic:) and the exact model id. Any model id Anthropic accepts, locus accepts — including the dated revision suffixes (-20250514).
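
If you'll need provider options later (prompt caching, for example), the same model can also be passed as an explicit AnthropicModel object. A sketch, assuming the string shorthand and the object form resolve to the same provider:

from locus import Agent
from locus.models.native.anthropic import AnthropicModel

# Assumed equivalent: the string shorthand parses into provider + model id,
# which is exactly what the explicit object spells out.
agent = Agent(model="anthropic:claude-sonnet-4-20250514")
agent = Agent(model=AnthropicModel(model="claude-sonnet-4-20250514"))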

3. Run it

result = agent.run_sync("Summarise the design doc in three bullets.")
print(result.message)

That's the full setup. Streaming, tool calling, prompt caching, and extended thinking work without extra configuration.

What you get out of the box

The whole Claude family

Whatever Anthropic ships, you can address by name:

| Model | When to pick it |
| --- | --- |
| claude-opus-4-… | Hardest problems — code archaeology, deep research, multi-step reasoning |
| claude-sonnet-4-… | Everyday workhorse — fast enough, smart enough, cheap enough |
| claude-haiku-4-… | High-volume cheap calls — classification, routing, simple summaries |

Real SSE streaming

Token-level streaming. The model emits content deltas; locus converts them to ModelChunkEvents; your async for loop reads them as they arrive.

async for event in agent.run("Write a haiku about latency."):
    if isinstance(event, ModelChunkEvent) and event.content:
        print(event.content, end="", flush=True)

Tool calling — the Anthropic tool-use protocol

@tool functions are translated into Anthropic's tools schema; the model's structured tool_use blocks are parsed back into locus ToolCalls. Parallel tool calls are supported (the model can request multiple tools per turn; locus runs them concurrently via the ConcurrentExecutor).
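
A minimal sketch of the round trip. The import path for the tool decorator is an assumption; only the Agent and AnthropicModel imports appear elsewhere on this page:

from locus import Agent, tool  # import path for tool is assumed

@tool
def get_weather(city: str) -> str:
    """Current weather for a city."""
    return f"Sunny in {city}"

@tool
def get_time(city: str) -> str:
    """Local time in a city."""
    return "14:05"

agent = Agent(
    model="anthropic:claude-sonnet-4-20250514",
    tools=[get_weather, get_time],
)

# If the model emits two tool_use blocks in one turn, locus parses them
# into ToolCalls and runs both concurrently via the ConcurrentExecutor.
result = agent.run_sync("What's the weather and local time in Oslo?")
print(result.message)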

Structured output — tool-as-schema

Anthropic doesn't expose a response_format field, so locus uses the standard "single-tool" trick: define the schema as a tool, force the model to call it. From your side, the API is identical to the other providers:

from pydantic import BaseModel

class Triage(BaseModel):
    severity: str
    needs_human: bool

agent = Agent(
    model="anthropic:claude-sonnet-4-20250514",
    output_schema=Triage,
)
result = agent.run_sync("This page is broken!")
print(result.parsed)        # Triage(severity='high', needs_human=True)

Prompt caching — opt in for long prompts

This is the biggest cost saver if your system prompt or tool block is long (skills, playbooks, RAG context). Anthropic's prompt-caching mechanism marks a span of the request as cacheable; subsequent turns within the cache window pay 1/10th the input cost on the cached span.

Opt in with prompt_cache=True on AnthropicModel. Locus then sends the system prompt as a block list with cache_control: ephemeral and tags the last entry of the tool catalog the same way (Anthropic walks markers in order — the last tag anchors the cache point).

from locus import Agent
from locus.models.native.anthropic import AnthropicModel

agent = Agent(
    model=AnthropicModel(
        model="claude-sonnet-4-20250514",
        prompt_cache=True,
    ),
    tools=[...],
    system_prompt="<a long system prompt — skills, playbooks, RAG context>",
)

result = agent.run_sync("...")
print(f"cache writes: {result.metrics.cache_creation_input_tokens}")
print(f"cache reads:  {result.metrics.cache_read_input_tokens}")
# → cache writes: 4092      (turn 1, written once)
# → cache reads:  4092       (turn 2 — same prefix, ~10× cheaper input)
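
On the wire, that flag produces Anthropic's documented cache_control markers, roughly this shape (illustrative and trimmed, not locus source):

# Illustrative request body: the cache_control format is Anthropic's
# documented one; the field trimming here is ours.
request = {
    "model": "claude-sonnet-4-20250514",
    "system": [
        {
            "type": "text",
            "text": "<a long system prompt>",
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "tools": [
        # ...earlier tool entries carry no marker...
        {
            "name": "last_tool",
            "input_schema": {"type": "object", "properties": {}},
            "cache_control": {"type": "ephemeral"},  # last tag anchors the cache point
        },
    ],
}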

When it kicks in:

  • A 5-minute "ephemeral" cache (rolling window).
  • Subsequent turns reusing the same prefix pay 0.1× input rate on the cached portion.
  • Most effective when system prompts ≥ ~1024 tokens, or you've loaded a big skill / playbook / RAG block.

cache_creation_input_tokens and cache_read_input_tokens surface on AgentResult.metrics so observability hooks can chart cache hits and the cost saved.
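
Those counters make the saving easy to estimate. A rough sketch, using the 0.1× cached-input rate quoted above:

m = result.metrics
# Cached input bills at ~0.1× the normal input rate, so roughly 90% of
# the input cost on the cache-read span is avoided each turn.
saved = m.cache_read_input_tokens * 0.9
print(f"~{saved:.0f} input-token-equivalents of cost avoided this turn")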

Extended thinking — visible reasoning

Claude 4 models with thinking_enabled think before answering, the way the OpenAI o-series does. Anthropic surfaces those thinking blocks in the response; locus emits a ThinkEvent for each one so your UI can show what the model is working on:

async for event in agent.run("..."):
    match event:
        case ThinkEvent(reasoning=r) if r:
            print(f"💭 {r}")
        case ModelChunkEvent(content=c) if c:
            print(c, end="", flush=True)
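
Turning thinking on is a model-level switch. The exact config shape below is an assumption, inferred from the gotchas table at the end of this page (thinking_enabled in model_config):

from locus import Agent
from locus.models.native.anthropic import AnthropicModel

agent = Agent(
    model=AnthropicModel(
        model="claude-opus-4-20250514",
        # Assumed kwarg shape: the gotchas table says ThinkEvents require
        # thinking_enabled to be set in model_config.
        model_config={"thinking_enabled": True},
    ),
)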

Claude on OCI — same model, different provider

Don't have an Anthropic API key? Want Claude billed through your Oracle account on Oracle infrastructure? Switch the prefix:

agent = Agent(model="oci:anthropic.claude-sonnet")

That routes through OCIOpenAIModel against the OCI Generative AI endpoint (uses OCI_PROFILE for auth, no ANTHROPIC_API_KEY needed). Same model behind it; different billing surface.

Common gotchas

| Symptom | Likely cause |
| --- | --- |
| 401 authentication_error | ANTHROPIC_API_KEY not set, or set to a key without console access |
| 404 not_found_error on the model id | Dated revision suffix is wrong; check https://docs.anthropic.com/en/docs/about-claude/models/all-models |
| 429 overloaded_error | Anthropic capacity; the ModelRetryHook retries with backoff if installed |
| Prompt caching not visible in usage stats | Cache window expired (5-minute ephemeral TTL) or prompt below the minimum token threshold |
| ThinkEvents never fire | Model not in the extended-thinking subset, or thinking_enabled not set in model_config |

Source

AnthropicModel in src/locus/models/native/anthropic.py

See also