What is Extended Thinking?

Extended Thinking gives supported models more time and a larger computation budget to deliberate before answering. It can improve results on multi-step tasks like coding, math, and logic, at the expense of latency and cost. Some providers can also return a thinking trace or signature alongside the final answer.

Controlling Extended Thinking

Use with_thinking to control models that support thinking. Some models think by default, so with_thinking is for tuning (or disabling) rather than turning it on.

ruby

chat = RubyLLM.chat(model: 'claude-opus-4.5')
  .with_thinking(effort: :high, budget: 8000)

response = chat.ask("What is 15 * 23?")

response.thinking&.text
response.thinking&.signature
response.content

with_thinking requires at least one of effort or budget:

ruby

chat.with_thinking(effort: :low)
chat.with_thinking(budget: 10_000)
chat.with_thinking(effort: :none)

Effort and Budget

Use effort to pick a qualitative depth (:low, :medium, :high) and budget for models that accept a token cap.

RubyLLM sends effort and budget exactly as provided. Check your provider's docs for supported values.

Streaming with Thinking

Thinking content is delivered alongside normal content in streaming chunks:

ruby

chat = RubyLLM.chat(model: 'claude-opus-4.5')
  .with_thinking(effort: :medium)

chat.ask("Solve this step by step: What is 127 * 43?") do |chunk|
  print chunk.thinking&.text
  print chunk.content
end

Some providers only expose thinking in the final response. In those cases, response.thinking is populated after the stream completes, and chunk.thinking stays empty.

ActiveRecord Integration

When using acts_as_chat and acts_as_message, thinking output is persisted to the message table:

ruby

# Migration (generated automatically with new installs)
# t.text :thinking_text
# t.text :thinking_signature
# t.integer :thinking_tokens

response = chat_record.ask("Explain quantum entanglement")
response.thinking&.text
response.thinking_tokens

Upgrading Existing Installations

For 1.10 upgrades, consider using the [upgrade guide]({% link _advanced/upgrading.md %}#upgrade-to-1-10) to run the generator. If you prefer manual migrations, add the columns to your message and tool calls tables:

ruby

class AddThinkingToMessages < ActiveRecord::Migration[7.1]
  def change
    add_column :messages, :thinking_text, :text
    add_column :messages, :thinking_signature, :text
    add_column :messages, :thinking_tokens, :integer
    add_column :tool_calls, :thought_signature, :string
  end
end

Provider Notes

Claude uses a thinking budget and can return both text and signature.
Anthropic requires a thinking budget.
Bedrock thinking params are model-dependent; models may accept budget, effort, or provider-specific fields.
Gemini 2.5 uses a token budget; Gemini 3 uses effort levels.
OpenAI reasoning models accept effort but may not return thinking text or signatures.
Perplexity sonar reasoning models stream <think> blocks inside content; RubyLLM extracts them after the response completes.
Mistral Magistral models always think and ignore with_thinking params. Non-magistral models warn if you pass them.
Ollama's Qwen3 models think by default and only accept effort: :none to disable thinking.
Anthropic and Ollama integrations currently do not report thinking token counts.

Next Steps

[Streaming Responses]({% link _core_features/streaming.md %})
[Rails Integration]({% link _advanced/rails.md %})
[Error Handling]({% link _advanced/error-handling.md %})

Extended Thinking

{{ page.title }}

Table of contents

What is Extended Thinking?

Controlling Extended Thinking

Effort and Budget

Streaming with Thinking

ActiveRecord Integration

Upgrading Existing Installations

Provider Notes

Next Steps