docs/capabilities/thinking.mdx
Thinking-capable models emit a thinking field that separates their reasoning trace from the final answer.
Use this capability to audit model steps, animate the model thinking in a UI, or hide the trace entirely when you only need the final response.
think levels: low, medium, high — the trace cannot be fully disabled)Set the think field on chat or generate requests. Most models accept booleans (true/false).
GPT-OSS instead expects one of low, medium, or high to tune the trace length.
The message.thinking (chat endpoint) or thinking (generate endpoint) field contains the reasoning trace while message.content / response holds the final answer.
response = chat(
model='qwen3',
messages=[{'role': 'user', 'content': 'How many letter r are in strawberry?'}],
think=True,
stream=False,
)
print('Thinking:\n', response.message.thinking)
print('Answer:\n', response.message.content)
```
const response = await ollama.chat({
model: 'deepseek-r1',
messages: [{ role: 'user', content: 'How many letter r are in strawberry?' }],
think: true,
stream: false,
})
console.log('Thinking:\n', response.message.thinking)
console.log('Answer:\n', response.message.content)
```
Thinking streams interleave reasoning tokens before answer tokens. Detect the first thinking chunk to render a "thinking" section, then switch to the final reply once message.content arrives.
stream = chat(
model='qwen3',
messages=[{'role': 'user', 'content': 'What is 17 × 23?'}],
think=True,
stream=True,
)
in_thinking = False
for chunk in stream:
if chunk.message.thinking and not in_thinking:
in_thinking = True
print('Thinking:\n', end='')
if chunk.message.thinking:
print(chunk.message.thinking, end='')
elif chunk.message.content:
if in_thinking:
print('\n\nAnswer:\n', end='')
in_thinking = False
print(chunk.message.content, end='')
```
async function main() {
const stream = await ollama.chat({
model: 'qwen3',
messages: [{ role: 'user', content: 'What is 17 × 23?' }],
think: true,
stream: true,
})
let inThinking = false
for await (const chunk of stream) {
if (chunk.message.thinking && !inThinking) {
inThinking = true
process.stdout.write('Thinking:\n')
}
if (chunk.message.thinking) {
process.stdout.write(chunk.message.thinking)
} else if (chunk.message.content) {
if (inThinking) {
process.stdout.write('\n\nAnswer:\n')
inThinking = false
}
process.stdout.write(chunk.message.content)
}
}
}
main()
```
ollama run deepseek-r1 --think "Where should I visit in Lisbon?"ollama run deepseek-r1 --think=false "Summarize this article"ollama run deepseek-r1 --hidethinking "Is 9.9 bigger or 9.11?"/set think or /set nothink.ollama run gpt-oss --think=low "Draft a headline" (replace low with medium or high as needed).<Note>Thinking is enabled by default in the CLI and API for supported models.</Note>