
Multi-Modal LLM using DashScope qwen-vl model for image reasoning

In this notebook, we show how to use the DashScope qwen-vl MultiModal LLM class/abstraction for image understanding and reasoning. Async is not currently supported.

We also show several functions that are supported for the DashScope multi-modal LLM:

  • complete (sync): for a single prompt and a list of images
  • chat (sync): for multiple chat messages
  • stream complete (sync): for streaming output of complete
  • stream chat (sync): for streaming output of chat
  • multi-round conversation

python
!pip install -U llama-index-multi-modal-llms-dashscope

Use DashScope to understand images from URLs

python
# Set API key
%env DASHSCOPE_API_KEY=YOUR_DASHSCOPE_API_KEY
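
The %env magic only works inside a notebook; as a minimal sketch, the same variable can be set from plain Python (the key value below is a placeholder):

python
import os

# Placeholder value -- replace with your real DashScope API key before running.
os.environ["DASHSCOPE_API_KEY"] = "YOUR_DASHSCOPE_API_KEY"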

Initialize DashScopeMultiModal and Load Images from URLs

python
from llama_index.multi_modal_llms.dashscope import (
    DashScopeMultiModal,
    DashScopeMultiModalModels,
)

from llama_index.core.multi_modal_llms.generic_utils import load_image_urls


image_urls = [
    "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg",
]

image_documents = load_image_urls(image_urls)

dashscope_multi_modal_llm = DashScopeMultiModal(
    model_name=DashScopeMultiModalModels.QWEN_VL_MAX,
)

Complete a prompt with images

python
complete_response = dashscope_multi_modal_llm.complete(
    prompt="What's in the image?",
    image_documents=image_documents,
)
print(complete_response)

Complete a prompt with multiple images

python
multi_image_urls = [
    "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg",
    "https://dashscope.oss-cn-beijing.aliyuncs.com/images/panda.jpeg",
]

multi_image_documents = load_image_urls(multi_image_urls)
complete_response = dashscope_multi_modal_llm.complete(
    prompt="What animals are in the pictures?",
    image_documents=multi_image_documents,
)
print(complete_response)

Stream complete a prompt with a list of images

python
stream_complete_response = dashscope_multi_modal_llm.stream_complete(
    prompt="What's in the image?",
    image_documents=image_documents,
)

for r in stream_complete_response:
    print(r.delta, end="")
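
If you want the full text instead of printing chunks as they arrive, a minimal sketch is to issue the call again and join the deltas (the streamed response is a generator, so it can only be iterated once):

python
# Re-run the streaming call; each chunk exposes the newly generated text in r.delta.
stream_complete_response = dashscope_multi_modal_llm.stream_complete(
    prompt="What's in the image?",
    image_documents=image_documents,
)
full_text = "".join(r.delta or "" for r in stream_complete_response)
print(full_text)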

Multi-round conversation with chat messages

python
from llama_index.core.base.llms.types import MessageRole
from llama_index.multi_modal_llms.dashscope.utils import (
    create_dashscope_multi_modal_chat_message,
)

chat_message_user_1 = create_dashscope_multi_modal_chat_message(
    "What's in the image?", MessageRole.USER, image_documents
)
chat_response = dashscope_multi_modal_llm.chat([chat_message_user_1])
print(chat_response.message.content[0]["text"])
chat_message_assistant_1 = create_dashscope_multi_modal_chat_message(
    chat_response.message.content[0]["text"], MessageRole.ASSISTANT, None
)
chat_message_user_2 = create_dashscope_multi_modal_chat_message(
    "What are they doing?", MessageRole.USER, None
)
chat_response = dashscope_multi_modal_llm.chat(
    [chat_message_user_1, chat_message_assistant_1, chat_message_user_2]
)
print(chat_response.message.content[0]["text"])

Stream Chat through a list of chat messages

python
stream_chat_response = dashscope_multi_modal_llm.stream_chat(
    [chat_message_user_1, chat_message_assistant_1, chat_message_user_2]
)
for r in stream_chat_response:
    print(r.delta, end="")

Use images from local files

Local files are referenced with the file URI scheme:

  • Linux/macOS: file:///home/images/test.png
  • Windows: file://D:/images/abc.png
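
If you prefer to build these URIs from ordinary paths, a minimal sketch (the paths below are hypothetical) is to prefix an absolute Linux/macOS path with file://:

python
import os

# Hypothetical local image files -- replace with real paths on your machine.
local_paths = ["/home/images/dog_and_girl.jpeg", "/home/images/panda.jpeg"]

# Prefixing the absolute path with file:// yields the Linux/macOS form shown above.
local_images = ["file://" + os.path.abspath(p) for p in local_paths]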

python
from llama_index.multi_modal_llms.dashscope.utils import load_local_images

local_images = [
    "file://THE_FILE_PATH1",
    "file://THE_FILE_PATH2",
]

image_documents = load_local_images(local_images)
chat_message_local = create_dashscope_multi_modal_chat_message(
    "What animals are in the pictures?", MessageRole.USER, image_documents
)
chat_response = dashscope_multi_modal_llm.chat([chat_message_local])
print(chat_response.message.content[0]["text"])