docs/examples/multi_modal/openai_multi_modal.ipynb
[Open in Colab](https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/multi_modal/openai_multi_modal.ipynb)
In this notebook, we show how to use the OpenAI LLM abstraction with GPT-4V for image understanding/reasoning.
We also show several functions that are currently supported in the OpenAI LLM class when working with GPT-4V:
- `complete` (both sync and async): for a single prompt and a list of images
- `chat` (both sync and async): for multiple chat messages
- `stream_complete` (both sync and async): for streaming output of `complete`
- `stream_chat` (both sync and async): for streaming output of `chat`

%pip install llama-index-llms-openai matplotlib
import os
OPENAI_API_KEY = "sk-..." # Your OpenAI API token here
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
Initialize the OpenAI LLM and Load Images from URLs

from llama_index.llms.openai import OpenAI
image_urls = [
"https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg",
"https://www.visualcapitalist.com/wp-content/uploads/2023/10/US_Mortgage_Rate_Surge-Sept-11-1.jpg",
"https://i2-prod.mirror.co.uk/incoming/article7160664.ece/ALTERNATES/s1200d/FIFA-Ballon-dOr-Gala-2015.jpg",
]
openai_llm = OpenAI(model="gpt-4o", max_tokens=300)  # cap each response at 300 tokens
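Before sending any images, we can sanity-check the client with a plain-text call to the synchronous `complete` function listed above. This is a minimal sketch; the prompt string is just an illustration.

# Text-only sanity check using the synchronous `complete` API.
complete_response = openai_llm.complete("Briefly describe what GPT-4V can do.")
print(complete_response.text)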
from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt
img_response = requests.get(image_urls[0])
print(image_urls[0])
img = Image.open(BytesIO(img_response.content))
plt.imshow(img)
from llama_index.core.llms import (
ChatMessage,
ImageBlock,
TextBlock,
MessageRole,
)
msg = ChatMessage(
role=MessageRole.USER,
blocks=[
TextBlock(text="Describe the images as an alternative text"),
ImageBlock(url=image_urls[0]),
ImageBlock(url=image_urls[1]),
],
)
response = openai_llm.chat(messages=[msg])
print(response)
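The same message can also be streamed synchronously with `stream_chat`, which yields the response incrementally. A minimal sketch reusing the `msg` defined above:

# Stream the response synchronously; each chunk exposes the newly generated text via `.delta`.
stream_resp = openai_llm.stream_chat(messages=[msg])
for r in stream_resp:
    print(r.delta, end="")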
We can also stream the model response asynchronously.
async_resp = await openai_llm.astream_chat(messages=[msg])
async for delta in async_resp:
print(delta.delta, end="")
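For an async call without streaming, `achat` awaits the full response in one step. A short sketch with the same message:

# Await the complete chat response asynchronously (no incremental streaming).
async_response = await openai_llm.achat(messages=[msg])
print(async_response)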
We can also pass images from local files. First, download one of the images above to disk:

%pip install llama-index-readers-file
from pathlib import Path
import requests
img_path = Path().resolve() / "image.jpg"
response = requests.get(image_urls[-1])
with open(img_path, "wb") as file:
file.write(response.content)
msg = ChatMessage(
role=MessageRole.USER,
blocks=[
TextBlock(text="Describe the image as an alternative text"),
ImageBlock(path=img_path, image_mimetype="image/jpeg"),
],
)
response = openai_llm.chat(messages=[msg])
print(response)
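An ImageBlock can also be built from raw bytes instead of a path or URL. The sketch below assumes the installed llama-index version exposes ImageBlock's `image` field (which encodes raw bytes internally); treat it as illustrative rather than canonical.

# Assumption: ImageBlock accepts raw bytes via its `image` field and encodes them internally.
img_bytes = img_path.read_bytes()
msg = ChatMessage(
    role=MessageRole.USER,
    blocks=[
        TextBlock(text="Describe the image as an alternative text"),
        ImageBlock(image=img_bytes, image_mimetype="image/jpeg"),
    ],
)
print(openai_llm.chat(messages=[msg]))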