Anthropic compatibility


Ollama provides compatibility with the Anthropic Messages API so that existing applications and tools, such as Claude Code, can connect to Ollama.

Usage

Environment variables

To use Ollama with tools that expect the Anthropic API (like Claude Code), set these environment variables:

```shell
export ANTHROPIC_AUTH_TOKEN=ollama  # required but ignored
export ANTHROPIC_BASE_URL=http://localhost:11434
```

Simple /v1/messages example

<CodeGroup dropdown>
```python
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',  # required but ignored
)

message = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    messages=[
        {'role': 'user', 'content': 'Hello, how are you?'}
    ]
)
print(message.content[0].text)
```

```javascript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  baseURL: "http://localhost:11434",
  apiKey: "ollama", // required but ignored
});

const message = await anthropic.messages.create({
  model: "qwen3-coder",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello, how are you?" }],
});

console.log(message.content[0].text);
```

```shell
curl -X POST http://localhost:11434/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: ollama" \
-H "anthropic-version: 2023-06-01" \
-d '{
  "model": "qwen3-coder",
  "max_tokens": 1024,
  "messages": [{ "role": "user", "content": "Hello, how are you?" }]
}'
```
</CodeGroup>

Streaming example

<CodeGroup dropdown>
```python
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

with client.messages.stream(
    model='qwen3-coder',
    max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Count from 1 to 10'}]
) as stream:
    for text in stream.text_stream:
        print(text, end='', flush=True)
```

```javascript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  baseURL: "http://localhost:11434",
  apiKey: "ollama",
});

const stream = await anthropic.messages.stream({
  model: "qwen3-coder",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Count from 1 to 10" }],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}
```

```shell
curl -X POST http://localhost:11434/v1/messages \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen3-coder",
  "max_tokens": 1024,
  "stream": true,
  "messages": [{ "role": "user", "content": "Count from 1 to 10" }]
}'
```
</CodeGroup>

Tool calling example

<CodeGroup dropdown>
```python
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

message = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    tools=[
        {
            'name': 'get_weather',
            'description': 'Get the current weather in a location',
            'input_schema': {
                'type': 'object',
                'properties': {
                    'location': {
                        'type': 'string',
                        'description': 'The city and state, e.g. San Francisco, CA'
                    }
                },
                'required': ['location']
            }
        }
    ],
    messages=[{'role': 'user', 'content': "What's the weather in San Francisco?"}]
)

for block in message.content:
    if block.type == 'tool_use':
        print(f'Tool: {block.name}')
        print(f'Input: {block.input}')
```

```javascript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  baseURL: "http://localhost:11434",
  apiKey: "ollama",
});

const message = await anthropic.messages.create({
  model: "qwen3-coder",
  max_tokens: 1024,
  tools: [
    {
      name: "get_weather",
      description: "Get the current weather in a location",
      input_schema: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "The city and state, e.g. San Francisco, CA",
          },
        },
        required: ["location"],
      },
    },
  ],
  messages: [{ role: "user", content: "What's the weather in San Francisco?" }],
});

for (const block of message.content) {
  if (block.type === "tool_use") {
    console.log("Tool:", block.name);
    console.log("Input:", block.input);
  }
}
```

```shell
curl -X POST http://localhost:11434/v1/messages \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen3-coder",
  "max_tokens": 1024,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the current weather in a location",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state"
          }
        },
        "required": ["location"]
      }
    }
  ],
  "messages": [{ "role": "user", "content": "What is the weather in San Francisco?" }]
}'
```
</CodeGroup>
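
The examples above stop after printing the tool call. Because tool_result blocks are also supported (see the supported features list below), the loop can be closed by executing the tool and sending its result back for a final answer. A sketch in Python: `get_weather` is a hypothetical local stand-in for a real lookup, and `tool_result_message` and `run` are illustrative helper names, not part of any SDK.

```python
def get_weather(location):
    # Hypothetical stand-in for a real weather lookup.
    return f'Sunny and 18°C in {location}'

def tool_result_message(tool_use_id, result):
    # A user turn carrying a tool_result block, as the Messages API expects.
    return {
        'role': 'user',
        'content': [{
            'type': 'tool_result',
            'tool_use_id': tool_use_id,
            'content': result,
        }],
    }

def run():
    import anthropic  # imported here to keep the helpers above dependency-free
    client = anthropic.Anthropic(
        base_url='http://localhost:11434',
        api_key='ollama',  # required but ignored
    )
    tools = [{
        'name': 'get_weather',
        'description': 'Get the current weather in a location',
        'input_schema': {
            'type': 'object',
            'properties': {'location': {'type': 'string'}},
            'required': ['location'],
        },
    }]
    messages = [{'role': 'user', 'content': "What's the weather in San Francisco?"}]
    response = client.messages.create(
        model='qwen3-coder', max_tokens=1024, tools=tools, messages=messages)
    # Echo the assistant turn back, then answer each tool call with a result.
    messages.append({'role': 'assistant', 'content': response.content})
    for block in response.content:
        if block.type == 'tool_use':
            messages.append(tool_result_message(block.id, get_weather(**block.input)))
    final = client.messages.create(
        model='qwen3-coder', max_tokens=1024, tools=tools, messages=messages)
    print(final.content[0].text)

# With an Ollama server running locally:
# run()
```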

Using with Claude Code

Claude Code can be configured to use Ollama as its backend.

For coding use cases, models like glm-4.7, minimax-m2.1, and qwen3-coder are recommended.

Download a model before use:

```shell
ollama pull qwen3-coder
```

Note: qwen3-coder is a 30B-parameter model that requires at least 24 GB of VRAM to run smoothly; longer context lengths require more.

Alternatively, use a cloud model:

```shell
ollama pull glm-4.7:cloud
```

Quick setup

```shell
ollama launch claude
```

This will prompt you to select a model, configure Claude Code automatically, and launch it. To configure without launching:

```shell
ollama launch claude --config
```

Manual setup

Set the environment variables and run Claude Code:

```shell
ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_BASE_URL=http://localhost:11434 claude --model qwen3-coder
```

Or set the environment variables in your shell profile:

```shell
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
```

Then run Claude Code with any Ollama model:

```shell
claude --model qwen3-coder
```

Endpoints

/v1/messages

Supported features

  • Messages
  • Streaming
  • System prompts
  • Multi-turn conversations
  • Vision (images)
  • Tools (function calling)
  • Tool results
  • Thinking/extended thinking
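
Vision input from the list above has no example elsewhere on this page. A minimal sketch of the base64 image block shape: `image_block` and `describe_image` are illustrative helper names, `photo.jpg` is a placeholder path, and `gemma3` stands in for whichever vision-capable model you have pulled. Per the partial-support notes below, only base64 images are accepted, not URLs.

```python
import base64

def image_block(image_bytes, media_type='image/jpeg'):
    # Base64 image content block as the Messages API defines it.
    return {
        'type': 'image',
        'source': {
            'type': 'base64',
            'media_type': media_type,
            'data': base64.standard_b64encode(image_bytes).decode('ascii'),
        },
    }

def describe_image(path, prompt="What's in this image?"):
    import anthropic  # imported here to keep the helper above dependency-free
    client = anthropic.Anthropic(
        base_url='http://localhost:11434',
        api_key='ollama',  # required but ignored
    )
    with open(path, 'rb') as f:
        data = f.read()
    message = client.messages.create(
        model='gemma3',  # assumption: any vision-capable model
        max_tokens=1024,
        messages=[{
            'role': 'user',
            'content': [image_block(data), {'type': 'text', 'text': prompt}],
        }],
    )
    return message.content[0].text

# With an Ollama server running and a vision model pulled:
# print(describe_image('photo.jpg'))
```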

Supported request fields

  • model
  • max_tokens
  • messages
    • Text content
    • Image content (base64)
    • Array of content blocks
    • tool_use blocks
    • tool_result blocks
    • thinking blocks
  • system (string or array)
  • stream
  • temperature
  • top_p
  • top_k
  • stop_sequences
  • tools
  • thinking
  • tool_choice
  • metadata

Supported response fields

  • id
  • type
  • role
  • model
  • content (text, tool_use, thinking blocks)
  • stop_reason (end_turn, max_tokens, tool_use)
  • usage (input_tokens, output_tokens)

Streaming events

  • message_start
  • content_block_start
  • content_block_delta (text_delta, input_json_delta, thinking_delta)
  • content_block_stop
  • message_delta
  • message_stop
  • ping
  • error
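
The event sequence above can be consumed with a small dispatcher. A sketch that prints text deltas and flags the end of the message: `handle_event` is an illustrative name, and the events are assumed to expose `type`, `delta`, and `error` attributes as the SDK's event objects do.

```python
def handle_event(event, write):
    # Dispatch one /v1/messages streaming event.
    # Calls write() with each text fragment; returns True at message_stop.
    if event.type == 'content_block_delta' and event.delta.type == 'text_delta':
        write(event.delta.text)
    elif event.type == 'error':
        raise RuntimeError(event.error)
    return event.type == 'message_stop'
```

With the Python SDK, iterating the stream from the streaming example above yields these events one by one, so the loop body reduces to a single `handle_event(event, ...)` call.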

Models

Ollama supports both local and cloud models.

Local models

Pull a local model before use:

```shell
ollama pull qwen3-coder
```

Recommended local models:

  • qwen3-coder - Excellent for coding tasks
  • gpt-oss:20b - Strong general-purpose model

Cloud models

Cloud models are available immediately without pulling:

  • glm-4.7:cloud - High-performance cloud model
  • minimax-m2.1:cloud - Fast cloud model

Default model names

For tooling that relies on default Anthropic model names such as claude-3-5-sonnet, use ollama cp to make an existing model available under that name:

```shell
ollama cp qwen3-coder claude-3-5-sonnet
```

Afterwards, this new model name can be specified in the model field:

```shell
curl http://localhost:11434/v1/messages \
    -H "Content-Type: application/json" \
    -d '{
        "model": "claude-3-5-sonnet",
        "max_tokens": 1024,
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
```

Differences from the Anthropic API

Behavior differences

  • API key is accepted but not validated
  • anthropic-version header is accepted but not used
  • Token counts are approximations based on the underlying model's tokenizer

Not supported

The following Anthropic API features are not currently supported:

| Feature | Description |
| --- | --- |
| `/v1/messages/count_tokens` | Token counting endpoint |
| `tool_choice` | Forcing specific tool use or disabling tools |
| `metadata` | Request metadata (`user_id`) |
| Prompt caching | `cache_control` blocks for caching prefixes |
| Batches API | `/v1/messages/batches` for async batch processing |
| Citations | `citations` content blocks |
| PDF support | `document` content blocks with PDF files |
| Server-sent errors | `error` events during streaming (errors return HTTP status) |

Partial support

| Feature | Status |
| --- | --- |
| Image content | Base64 images supported; URL images not supported |
| Extended thinking | Basic support; `budget_tokens` accepted but not enforced |
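
Given the partial support above, a request can still opt into extended thinking with the standard Anthropic `thinking` parameter. A sketch of the request body: `thinking_payload` is an illustrative helper, and `budget_tokens` is passed through but, as noted, not enforced by Ollama.

```python
def thinking_payload(prompt, budget_tokens=1024, model='qwen3-coder'):
    # /v1/messages request body that enables extended thinking.
    # The Anthropic API requires max_tokens to exceed the thinking budget.
    return {
        'model': model,
        'max_tokens': budget_tokens + 1024,
        'thinking': {'type': 'enabled', 'budget_tokens': budget_tokens},
        'messages': [{'role': 'user', 'content': prompt}],
    }
```

Responses then carry `thinking` blocks in `content` alongside text blocks, matching the supported response fields listed above.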