docs/_core_features/image-generation.md
{: .no_toc }
{{ page.description }}
{: .fs-6 .fw-300 }
{: .no_toc .text-delta }
After reading this guide, you will know:
The simplest way to generate an image is to call the global RubyLLM.paint method:
# Generate an image using the default image model
image = RubyLLM.paint("A photorealistic image of a red panda coding Ruby on a laptop")
# For models returning a URL:
if image.url
puts "Image URL: #{image.url}"
# => "https://oaidalleapiprodscus.blob.core.windows.net/..."
end
# For models returning Base64 data (like Imagen):
if image.base64?
puts "MIME Type: #{image.mime_type}" # => "image/png" or similar
puts "Base64 length: #{image.data.length} characters" # The decoded image is roughly 3/4 of this size in bytes
end
# Some models revise the prompt for better results
if image.revised_prompt
puts "Revised Prompt: #{image.revised_prompt}"
# => "A photorealistic depiction of a red panda intently coding Ruby..."
end
puts "Model Used: #{image.model_id}"
The paint method abstracts the differences between provider APIs.
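Because an image can arrive either as a URL or as Base64 data, calling code often needs to branch once on the delivery type. The following is a minimal sketch of such a helper. `describe_image` and the `FakeImage` struct are hypothetical names introduced here so the example runs without an API call; the sketch assumes only the `url`, `data`, `mime_type`, and `base64?` readers shown above.

```ruby
# Hypothetical stand-in exposing the same readers as the image object above,
# so the sketch runs without contacting a provider.
FakeImage = Struct.new(:url, :data, :mime_type, keyword_init: true) do
  def base64?
    !data.nil?
  end
end

# Summarize how an image was delivered, whichever form the provider used.
def describe_image(image)
  if image.base64?
    "inline #{image.mime_type} (#{image.data.length} Base64 characters)"
  elsif image.url
    "hosted at #{image.url}"
  else
    "no image payload"
  end
end

puts describe_image(FakeImage.new(url: "https://example.com/panda.png"))
puts describe_image(FakeImage.new(data: "aGVsbG8=", mime_type: "image/png"))
```

In an application you would pass the object returned by `RubyLLM.paint` instead of the stand-in.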
{: .d-inline-block }
v1.15+
{: .label .label-green }
When providers return image token usage, images expose the same cost shape as chats and messages:
image = RubyLLM.paint("A small watercolor robot", model: "gpt-image-1")
image.tokens.input
image.tokens.output
image.cost.input
image.cost.output
image.cost.total
Image costs use provider usage data plus pricing from the model registry. For models that report separate text and image input token details, RubyLLM applies the right pricing bucket to each part and returns the combined value as image.cost.input.
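To make the "separate pricing buckets" idea concrete, here is a small worked calculation. It is only an illustration of the arithmetic, not RubyLLM's implementation: the helper name `combined_input_cost` and both prices are made up for this example.

```ruby
# Prices are in USD per one million tokens. Both figures are invented for
# illustration; real values come from the model registry.
TEXT_INPUT_PRICE_PER_MILLION  = 5.0
IMAGE_INPUT_PRICE_PER_MILLION = 10.0

# Price each kind of input token from its own bucket, then sum the parts.
def combined_input_cost(text_tokens:, image_tokens:)
  text_cost  = text_tokens  * TEXT_INPUT_PRICE_PER_MILLION  / 1_000_000
  image_cost = image_tokens * IMAGE_INPUT_PRICE_PER_MILLION / 1_000_000
  text_cost + image_cost
end

cost = combined_input_cost(text_tokens: 1_000, image_tokens: 2_000)
puts format("Input cost: $%.4f", cost) # => Input cost: $0.0250
```

With these assumed prices, 1,000 text tokens cost $0.005 and 2,000 image tokens cost $0.02, so the combined input cost is $0.025.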
{: .d-inline-block }
v1.15+
{: .label .label-green }
Some models, such as OpenAI's GPT Image models, can edit an existing image instead of generating from scratch. Use with: to pass one or more source images, and mask: when you want to constrain which parts of the image may change.
image = RubyLLM.paint(
"Turn the logo green and keep the background transparent",
model: "gpt-image-1",
with: "logo.png"
)
with: accepts the same kinds of sources RubyLLM already supports elsewhere for attachments: local files, URLs, IO-like objects, and Active Storage attachments.
image = RubyLLM.paint(
"Combine these references into a postcard illustration",
model: "gpt-image-1",
with: ["person.png", "style-reference.png"]
)
image = RubyLLM.paint(
"Replace only the background with a sunset sky",
model: "gpt-image-1",
with: "portrait.png",
mask: "portrait-mask.png",
params: { size: "1024x1024" }
)
By default, RubyLLM uses the model specified in config.default_image_model, but you can specify a different one.
# Explicitly use GPT-Image-1
image_openai = RubyLLM.paint(
"Impressionist painting of a Parisian cafe",
model: "{{ site.models.image_openai }}"
)
# Use Google's Imagen 3
image_imagen = RubyLLM.paint(
"Cyberpunk city street at night, raining, neon signs",
model: "{{ site.models.image_google }}"
)
# Use a model not in the registry (useful for custom endpoints)
image_custom = RubyLLM.paint(
"A sunset over mountains",
model: "my-custom-image-model",
provider: :openai,
assume_model_exists: true
)
You can configure the default model globally:
RubyLLM.configure do |config|
config.default_image_model = "{{ site.models.default_image }}" # Or another available image model ID
end
Refer to the [Working with Models Guide]({% link _advanced/models.md %}) and the [Available Models Guide]({% link _reference/available-models.md %}) to find image models.
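Since the model is just a keyword argument, an application can try several models in order and keep the first result that succeeds. The sketch below shows one way to do that. `paint_with_fallback`, the injected `painter` callable, and the model names are all hypothetical; the painter is injected so the example runs offline, and in real code it could wrap `RubyLLM.paint` and rescue the library's error classes rather than `StandardError`.

```ruby
# Try each model in order and return the first successful result.
# `painter` is any callable taking (prompt, model:).
def paint_with_fallback(prompt, models:, painter:)
  last_error = nil
  models.each do |model|
    return painter.call(prompt, model: model)
  rescue StandardError => e
    last_error = e # remember the failure and move on to the next model
  end
  raise last_error || ArgumentError.new("no models given")
end

# Stub painter: the first (hypothetical) model always fails.
stub_painter = lambda do |prompt, model:|
  raise "model unavailable" if model == "primary-image-model"

  "#{model} painted: #{prompt}"
end

puts paint_with_fallback(
  "A lighthouse at dusk",
  models: ["primary-image-model", "backup-image-model"],
  painter: stub_painter
)
```

If every model fails, the last error is re-raised so the caller still sees why generation did not succeed.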
Some models, like DALL-E 3, allow you to specify the desired image dimensions via the size: argument.
# These sizes are specific to DALL-E 3, so the model is named explicitly
# Standard square (1024x1024 - default for DALL-E 3)
image_square = RubyLLM.paint("a fluffy white cat", model: "dall-e-3", size: "1024x1024")
# Wide landscape (1792x1024 for DALL-E 3)
image_landscape = RubyLLM.paint(
"a panoramic mountain landscape at dawn",
model: "dall-e-3",
size: "1792x1024"
)
# Tall portrait (1024x1792 for DALL-E 3)
image_portrait = RubyLLM.paint(
"a knight standing before a castle gate",
model: "dall-e-3",
size: "1024x1792"
)
Not all models support size customization. If a size is specified for a model that doesn't support it (like Google Imagen), RubyLLM may log a debug message indicating the size parameter is ignored. Check the provider's documentation or the [Available Models Guide]({% link _reference/available-models.md %}) for supported sizes.
{: .note }
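If your application thinks in terms of orientation rather than pixel dimensions, a small lookup keeps the size strings in one place. This is a sketch only: `DALLE3_SIZES` and `size_for` are hypothetical names, and the table contains just the three DALL-E 3 sizes mentioned above.

```ruby
# Sizes listed above for DALL-E 3; other models may accept different values.
DALLE3_SIZES = {
  square:    "1024x1024",
  landscape: "1792x1024",
  portrait:  "1024x1792"
}.freeze

# Translate an orientation into a size string, failing loudly on typos.
def size_for(orientation)
  DALLE3_SIZES.fetch(orientation) do
    raise ArgumentError, "unknown orientation: #{orientation.inspect}"
  end
end

puts size_for(:landscape) # => 1792x1024
```

The result can then be passed as the size: argument, for example `size: size_for(:portrait)`.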
The RubyLLM::Image object provides access to the generated image data and metadata.
* image.url: Returns the URL for providers like OpenAI. nil otherwise.
* image.data: Returns the Base64-encoded image data string for providers like Google (Imagen). nil otherwise.
* image.mime_type: Returns the MIME type (e.g., "image/png", "image/jpeg").
* image.base64?: Returns true if the image data is Base64-encoded, false otherwise.

The save method works regardless of whether the image was delivered via URL or Base64. It fetches the data if necessary and writes it to the specified file path.
# Generate an image
image = RubyLLM.paint("A steampunk mechanical owl")
# Save the image to a local file
begin
saved_path = image.save("steampunk_owl.png")
puts "Image saved to #{saved_path}"
rescue => e
puts "Failed to save image: #{e.message}"
end
The to_blob method returns the raw binary image data (decoded from Base64 or downloaded from URL). This is useful for integration with other libraries or frameworks.
image = RubyLLM.paint("Abstract geometric patterns in pastel colors")
image_blob = image.to_blob
# Now you can use image_blob, e.g., upload to S3, process with MiniMagick, etc.
puts "Image blob size: #{image_blob.bytesize} bytes"
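For the Base64 case, it can help to see how the encoded string relates to the binary data you get back. The sketch below illustrates Base64 decoding in plain Ruby; it is not RubyLLM's implementation, and `decode_base64_image` and the sample bytes are hypothetical.

```ruby
# Decode a Base64 string into raw bytes. unpack1("m0") is strict and raises
# ArgumentError on malformed input; Base64.strict_decode64 from the standard
# library is equivalent.
def decode_base64_image(data)
  data.unpack1("m0")
end

# A tiny payload standing in for real image bytes (the 8-byte PNG signature).
raw_bytes = "\x89PNG\r\n\x1A\n".b

# Encode as a provider might. pack("m0") is Base64 without line breaks.
encoded = [raw_bytes].pack("m0")
decoded = decode_base64_image(encoded)

puts "Base64 characters: #{encoded.length}"
puts "Decoded bytes:     #{decoded.bytesize}"
puts "Round trip intact: #{decoded == raw_bytes}"
```

The encoded string is about a third longer than the binary it represents, so bytesize on the blob is the figure to use when checking upload limits.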
Use to_blob to easily attach generated images to Active Storage attributes.
# In a Rails model or job
class Product < ApplicationRecord
has_one_attached :generated_image
end
def generate_and_attach_image(product, prompt)
puts "Generating image for Product #{product.id}..."
image = RubyLLM.paint(prompt) # Or another model
filename = "#{product.slug}-#{Time.current.to_i}.png"
# Use StringIO to provide an IO object to Active Storage
image_io = StringIO.new(image.to_blob)
product.generated_image.attach(
io: image_io,
filename: filename,
content_type: image.mime_type || 'image/png' # Use detected MIME type or default
)
puts "Image attached successfully."
# Optionally save metadata
product.update(
image_prompt: prompt,
image_revised_prompt: image.revised_prompt,
image_model: image.model_id
)
rescue RubyLLM::Error => e
puts "Image generation failed: #{e.message}"
# Handle error appropriately
rescue => e
puts "Failed to attach image: #{e.message}"
# Handle attachment error
end
# Usage:
# product = Product.find(1)
# generate_and_attach_image(product, "A sleek, modern logo for 'RubyLLM'")
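The example above always names the file with a .png extension while taking the content type from image.mime_type, so the two could disagree if a provider returns a different format. One possible way to keep them aligned is sketched below; `EXTENSIONS_BY_MIME_TYPE` and `filename_for` are hypothetical helpers, and the mapping covers only a few common types.

```ruby
# Common image MIME types and a matching file extension.
EXTENSIONS_BY_MIME_TYPE = {
  "image/png"  => "png",
  "image/jpeg" => "jpg",
  "image/webp" => "webp"
}.freeze

# Build a filename whose extension agrees with the reported MIME type,
# falling back to PNG when the type is missing or unrecognized.
def filename_for(basename, mime_type)
  extension = EXTENSIONS_BY_MIME_TYPE.fetch(mime_type, "png")
  "#{basename}.#{extension}"
end

puts filename_for("red-panda", "image/jpeg") # => red-panda.jpg
puts filename_for("red-panda", nil)          # => red-panda.png
```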
Crafting effective prompts is key to getting the desired image. Be descriptive!
# Simple prompt - often yields generic results
image1 = RubyLLM.paint("dog")
# Detailed prompt - better results
image2 = RubyLLM.paint(
"A photorealistic image of a golden retriever puppy playing fetch " \
"in a sunny park, shallow depth of field, captured with a DSLR camera."
)
# Specify style
image3 = RubyLLM.paint(
"A majestic mountain range, oil painting in the style of Bob Ross"
)
Tips for Better Prompts:
Image generation can fail due to content policy violations, rate limits, or API issues:
begin
image = RubyLLM.paint("Your prompt here")
puts "Image URL: #{image.url}"
rescue RubyLLM::BadRequestError => e
# Often indicates a content policy violation
puts "Generation failed: #{e.message}"
rescue RubyLLM::Error => e
puts "Error: #{e.message}"
end
See the [Error Handling Guide]({% link _advanced/error-handling.md %}) for comprehensive error handling strategies.
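Rate limits and transient API issues are often worth retrying, whereas a content policy rejection will fail the same way every time. The following is a minimal sketch of retrying with exponential backoff. `TransientImageError` and `with_retries` are hypothetical names; the error class is a stand-in so the example runs on its own, and in an application you would rescue the library's own error classes instead.

```ruby
# Stand-in for a transient provider error such as a rate limit.
class TransientImageError < StandardError; end

# Call the block, retrying transient failures with exponential backoff.
# `sleeper` is injectable so tests can skip the real delay.
def with_retries(max_attempts: 3, base_delay: 1.0, sleeper: Kernel.method(:sleep))
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TransientImageError
    raise if attempt >= max_attempts

    sleeper.call(base_delay * (2**(attempt - 1))) # 1s, 2s, 4s, ...
    retry
  end
end

# Simulate two rate-limited attempts followed by a success.
result = with_retries(sleeper: ->(_seconds) {}) do |attempt|
  raise TransientImageError, "rate limited" if attempt < 3

  "succeeded on attempt #{attempt}"
end
puts result # => succeeded on attempt 3
```

Errors that are not transient, such as a BadRequestError, are deliberately not rescued here and propagate on the first attempt.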
AI image generation services have content safety filters. Prompts requesting harmful, explicit, or otherwise prohibited content will usually result in a BadRequestError. Avoid generating:
Image generation can take several seconds (typically 5-20 seconds depending on the model and load).
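When several images are needed, waiting for each one in turn multiplies that latency. The sketch below shows one way to overlap the waits using plain Ruby threads. `paint_concurrently` and the injected `painter` callable are hypothetical, and the stub painter only simulates a slow call; in a Rails application a background job is usually the more natural fit.

```ruby
# Run one generation per prompt in parallel threads and return the results
# in the same order as the prompts. `painter` is any callable taking a
# prompt; in an application it could wrap RubyLLM.paint.
def paint_concurrently(prompts, painter:)
  threads = prompts.map do |prompt|
    Thread.new { painter.call(prompt) }
  end
  threads.map(&:value) # value joins the thread and re-raises its exception
end

# Stub painter that simulates a slow provider call.
slow_painter = lambda do |prompt|
  sleep 0.1
  "image for #{prompt}"
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
images = paint_concurrently(["a cat", "a dog", "a fox"], painter: slow_painter)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts images
puts format("Finished in about %.1f seconds", elapsed)
```

Because the threads spend their time waiting, the total should be close to the slowest single call rather than the sum of all calls. Provider rate limits still apply, so keep the number of concurrent requests modest.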