Thinking - Ollama

Thinking-capable models emit a thinking field that separates their reasoning trace from the final answer.

Use this capability to audit the model's reasoning steps, display its thinking live in a UI, or hide the trace entirely when you only need the final response.

Supported models

Qwen 3
GPT-OSS (accepts think levels low, medium, and high; the trace cannot be fully disabled)
DeepSeek-v3.1
DeepSeek R1
Browse the latest additions under thinking models

Enable thinking in API calls

Set the think field on chat or generate requests. Most models accept booleans (true/false).

GPT-OSS instead expects one of low, medium, or high to tune the trace length.
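Because the accepted `think` value depends on the model, a client may want to choose it in one place. The sketch below assumes GPT-OSS models can be recognized by a `gpt-oss` name prefix; `think_param` and `chat_body` are hypothetical helper names, not part of any Ollama library. It only builds the JSON body for `/api/chat` and does not send it.

```python
import json

# Levels GPT-OSS accepts for `think`.
GPT_OSS_LEVELS = {"low", "medium", "high"}


def think_param(model: str, enabled: bool = True, level: str = "medium"):
    """Return a `think` value suited to `model` (hypothetical helper)."""
    # Assumption: GPT-OSS models are identified by a "gpt-oss" name prefix.
    if model.startswith("gpt-oss"):
        if level not in GPT_OSS_LEVELS:
            raise ValueError(f"GPT-OSS expects one of {sorted(GPT_OSS_LEVELS)}, got {level!r}")
        # Booleans are ignored for GPT-OSS, so always send a level.
        return level
    # Other thinking models take a plain boolean.
    return enabled


def chat_body(model: str, prompt: str, **think_kwargs) -> str:
    """Serialize a non-streaming /api/chat request body with `think` set."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think_param(model, **think_kwargs),
        "stream": False,
    })
```

Note that `enabled=False` has no effect for GPT-OSS here, mirroring the fact that its trace cannot be fully disabled.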

In the response, the reasoning trace is in message.thinking (chat endpoint) or thinking (generate endpoint), and the final answer is in message.content (chat) or response (generate).
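If you call the HTTP API directly and parse the JSON yourself, the field locations above differ by endpoint. A minimal sketch, using a hypothetical `split_trace` helper that works on already-parsed response dicts (the sample dicts are hand-written, not real model output):

```python
def split_trace(resp: dict) -> tuple[str, str]:
    """Return (thinking, answer) from a parsed /api/chat or /api/generate response."""
    if "message" in resp:
        # Chat endpoint: both fields are nested under `message`.
        msg = resp["message"]
        return msg.get("thinking") or "", msg.get("content") or ""
    # Generate endpoint: `thinking` and `response` sit at the top level.
    return resp.get("thinking") or "", resp.get("response") or ""


chat_resp = {"message": {"role": "assistant", "thinking": "Count each r.", "content": "3"}}
gen_resp = {"thinking": "Count each r.", "response": "3"}
```

Missing fields (for example when thinking is disabled) come back as empty strings rather than raising.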

<Tabs>
<Tab title="cURL">

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [{
    "role": "user",
    "content": "How many letter r are in strawberry?"
  }],
  "think": true,
  "stream": false
}'
```

</Tab>
<Tab title="Python">

```python
from ollama import chat

response = chat(
  model='qwen3',
  messages=[{'role': 'user', 'content': 'How many letter r are in strawberry?'}],
  think=True,
  stream=False,
)

print('Thinking:\n', response.message.thinking)
print('Answer:\n', response.message.content)
```

</Tab>
<Tab title="JavaScript">

```javascript
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'deepseek-r1',
  messages: [{ role: 'user', content: 'How many letter r are in strawberry?' }],
  think: true,
  stream: false,
})

console.log('Thinking:\n', response.message.thinking)
console.log('Answer:\n', response.message.content)
```

</Tab>
</Tabs>

<Note>
GPT-OSS requires `think` to be set to `"low"`, `"medium"`, or `"high"`. Passing `true`/`false` is ignored for that model.
</Note>
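The tabs above only exercise the chat endpoint. As a sketch of the generate endpoint with only the Python standard library, assuming the default local address from the cURL example, the hypothetical `generate_request` builds the request and `ask` sends it (the latter needs a running Ollama server):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default local address, as in the cURL example


def generate_request(model: str, prompt: str, think=True) -> urllib.request.Request:
    """Build (but do not send) a non-streaming /api/generate request with `think` set."""
    body = {"model": model, "prompt": prompt, "think": think, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, think=True) -> tuple[str, str]:
    """Send the request and return (thinking, answer); requires a running server."""
    with urllib.request.urlopen(generate_request(model, prompt, think)) as resp:
        data = json.loads(resp.read())
    # The generate endpoint returns the trace in `thinking` and the answer in `response`.
    return data.get("thinking") or "", data.get("response", "")
```

For GPT-OSS, pass a level instead of a boolean, for example `ask("gpt-oss", "Draft a headline", think="low")`.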

Stream the reasoning trace

When streaming, thinking models emit reasoning chunks (in message.thinking) before answer chunks (in message.content). Detect the first thinking chunk to render a "thinking" section, then switch to the final reply once message.content arrives.

<Tabs>
<Tab title="Python">

```python
from ollama import chat

stream = chat(
  model='qwen3',
  messages=[{'role': 'user', 'content': 'What is 17 × 23?'}],
  think=True,
  stream=True,
)

in_thinking = False

for chunk in stream:
  if chunk.message.thinking and not in_thinking:
    in_thinking = True
    print('Thinking:\n', end='')

  if chunk.message.thinking:
    print(chunk.message.thinking, end='')
  elif chunk.message.content:
    if in_thinking:
      print('\n\nAnswer:\n', end='')
      in_thinking = False
    print(chunk.message.content, end='')

```

</Tab>
<Tab title="JavaScript">

```javascript
import ollama from 'ollama'

async function main() {
  const stream = await ollama.chat({
    model: 'qwen3',
    messages: [{ role: 'user', content: 'What is 17 × 23?' }],
    think: true,
    stream: true,
  })

  let inThinking = false

  for await (const chunk of stream) {
    if (chunk.message.thinking && !inThinking) {
      inThinking = true
      process.stdout.write('Thinking:\n')
    }

    if (chunk.message.thinking) {
      process.stdout.write(chunk.message.thinking)
    } else if (chunk.message.content) {
      if (inThinking) {
        process.stdout.write('\n\nAnswer:\n')
        inThinking = false
      }
      process.stdout.write(chunk.message.content)
    }
  }
}

main()
```

</Tab>
</Tabs>
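Without a client library, a streaming `/api/chat` response arrives as newline-delimited JSON objects. The sketch below, with a hypothetical `collect_stream` helper, accumulates the full trace and answer instead of printing them live, which suits logging or auditing. The sample lines are hand-written to mimic the chunk shape, not captured model output.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Accumulate (thinking, answer) from NDJSON lines of a streaming /api/chat response."""
    thinking_parts, answer_parts = [], []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines between chunks
        chunk = json.loads(line)
        message = chunk.get("message", {})
        if message.get("thinking"):
            thinking_parts.append(message["thinking"])
        if message.get("content"):
            answer_parts.append(message["content"])
        if chunk.get("done"):
            break  # the final chunk carries done: true
    return "".join(thinking_parts), "".join(answer_parts)


# Hand-written chunks shaped like a streamed reply to "What is 17 x 23?".
sample = [
    '{"message": {"role": "assistant", "content": "", "thinking": "17 x 23 = 17 x 20 + 17 x 3"}, "done": false}',
    '{"message": {"role": "assistant", "content": "", "thinking": " = 340 + 51"}, "done": false}',
    '{"message": {"role": "assistant", "content": "391"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
thinking, answer = collect_stream(sample)
```

Unlike the tab examples, this checks both fields in every chunk, so it does not assume a chunk carries only one of them.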

CLI quick reference

Enable thinking for a single run: `ollama run deepseek-r1 --think "Where should I visit in Lisbon?"`
Disable thinking: `ollama run deepseek-r1 --think=false "Summarize this article"`
Hide the trace while still using a thinking model: `ollama run deepseek-r1 --hidethinking "Is 9.9 bigger or 9.11?"`
Inside interactive sessions, toggle with `/set think` or `/set nothink`.
GPT-OSS only accepts levels: `ollama run gpt-oss --think=low "Draft a headline"` (replace `low` with `medium` or `high` as needed).

<Note>Thinking is enabled by default in the CLI and API for supported models.</Note>
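When scripting the CLI, the flags above can be assembled programmatically. This is a sketch with a hypothetical `ollama_run_args` helper that uses only the flags listed in this section; leaving `think=None` adds no flag, so the default behavior described in the note applies.

```python
import shlex
from typing import List, Optional, Union

LEVELS = {"low", "medium", "high"}  # GPT-OSS think levels


def ollama_run_args(model: str, prompt: str,
                    think: Optional[Union[bool, str]] = None,
                    hide_thinking: bool = False) -> List[str]:
    """Build an `ollama run` argument list from the flags above (hypothetical helper)."""
    args = ["ollama", "run", model]
    if think is True:
        args.append("--think")
    elif think is False:
        args.append("--think=false")
    elif isinstance(think, str):
        if think not in LEVELS:
            raise ValueError(f"expected one of {sorted(LEVELS)}, got {think!r}")
        args.append(f"--think={think}")  # level form, as used for GPT-OSS
    # think=None: no flag, so the CLI default applies.
    if hide_thinking:
        args.append("--hidethinking")
    args.append(prompt)
    return args


cmd = ollama_run_args("deepseek-r1", "Is 9.9 bigger or 9.11?", hide_thinking=True)
print(shlex.join(cmd))  # shell-quoted form, for display or logging
```

To actually run the command, pass the list to `subprocess.run(cmd, check=True)`; a list avoids shell quoting issues with the prompt.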