Multi-Modal LLM using Azure OpenAI GPT-4o mini for image reasoning


<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/multi_modal/azure_openai_multi_modal.ipynb" target="_parent">Open in Colab</a>

In this notebook, we show how to use GPT-4o mini through the Azure OpenAI LLM class for image understanding and reasoning. For a more complete example, please visit this notebook.

python
%pip install llama-index-llms-azure-openai

Prerequisites

  1. Set up an Azure subscription - you can create one for free here
  2. Apply for access to Azure OpenAI Service here
  3. Create a resource in the Azure portal here
  4. Deploy a model in Azure OpenAI Studio here

You can find more details in this guide.

Note down the "model name" and "deployment name"; you'll need them when connecting to your LLM.

Use GPT-4o mini to understand Images from URLs / base64

python
import os

os.environ["AZURE_OPENAI_API_KEY"] = "xxx"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://YOUR_URL.openai.azure.com/"
os.environ["OPENAI_API_VERSION"] = "2024-02-15-preview"

Initialize AzureOpenAI and Load Images from URLs

Unlike regular OpenAI, you need to pass the engine argument in addition to model. The engine is the name you gave to your model when you deployed it in Azure OpenAI Studio.

python
from llama_index.llms.azure_openai import AzureOpenAI

azure_openai_llm = AzureOpenAI(
    engine="my-gpt-4o-mini",
    model="gpt-4o-mini",
    max_new_tokens=300,
)

Alternatively, you can skip setting environment variables and pass the parameters directly to the constructor.

python
azure_openai_llm = AzureOpenAI(
    azure_endpoint="https://YOUR_URL.openai.azure.com/",
    engine="my-gpt-4o-mini",
    api_version="2024-02-15-preview",
    model="gpt-4o-mini",
    max_new_tokens=300,
    api_key="xxx",
    supports_content_blocks=True,
)
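
Before moving on to images, a quick text-only call is an easy way to confirm that the deployment name, endpoint, and key are correct. This sketch assumes the azure_openai_llm instance created above; complete() is the standard llama-index text completion entry point.

python
# Text-only sanity check: verifies deployment name, endpoint, and API key
sanity = azure_openai_llm.complete("Say hello in one short sentence.")
print(sanity)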
python
import base64
import requests
from llama_index.core.schema import Document, MediaResource

image_url = "https://www.visualcapitalist.com/wp-content/uploads/2023/10/US_Mortgage_Rate_Surge-Sept-11-1.jpg"

response = requests.get(image_url)
if response.status_code != 200:
    raise ValueError("Error: Could not retrieve image from URL.")
img_data = base64.b64encode(response.content)

image_document = Document(image_resource=MediaResource(data=img_data))
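
The same MediaResource pattern also works for local files. Here is a minimal sketch that base64-encodes an image from disk instead of fetching it over HTTP; the filename image.jpg is a placeholder.

python
import base64

from llama_index.core.schema import Document, MediaResource

# Load a local image from disk; "image.jpg" is a placeholder path
with open("image.jpg", "rb") as f:
    local_img_data = base64.b64encode(f.read())

local_image_document = Document(
    image_resource=MediaResource(data=local_img_data)
)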
python
from IPython.display import HTML

# Render the base64-encoded image inline in the notebook
src = f'<img width="400" src="data:image/jpeg;base64,{img_data.decode("utf-8")}"/>'
HTML(src)

Complete a prompt with an image

python
from llama_index.core.llms import (
    ChatMessage,
    ImageBlock,
    TextBlock,
    MessageRole,
)

msg = ChatMessage(
    role=MessageRole.USER,
    blocks=[
        TextBlock(text="Describe the image as alternative text"),
        ImageBlock(image=image_document.image_resource.data),
    ],
)

response = azure_openai_llm.chat(messages=[msg])
python
print(response)
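
If you would rather receive the reply incrementally, llama-index LLMs also expose stream_chat, which yields response chunks as they arrive. A minimal sketch reusing the msg built above:

python
# Stream the reply token by token; each chunk's delta holds the newly generated text
for chunk in azure_openai_llm.stream_chat(messages=[msg]):
    print(chunk.delta, end="", flush=True)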