<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/asi1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

ASI LLM

ASI1-Mini is an advanced, agentic LLM designed by fetch.ai, a founding member of the Artificial Superintelligence Alliance, for decentralized operations. Its unique architecture empowers it to execute tasks and collaborate with other agents for efficient, adaptable problem-solving in complex environments.

This notebook demonstrates how to use ASI models with LlamaIndex. It covers various functionalities including basic completion, chat, streaming, function calling, structured prediction, RAG, and more. If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

Setup

First, let's install the required packages:

python
%pip install llama-index-llms-asi llama-index-llms-openai llama-index-core

Setting API Keys

You'll need to set your API key for ASI, and optionally an OpenAI API key, which is used later for embeddings in the RAG examples:

python
import os

# Set your API keys here - To get the API key visit https://asi1.ai/chat and login
os.environ["ASI_API_KEY"] = "your-api-key"

Basic Completion

Let's start with a basic completion example using ASI:

python
from llama_index.llms.asi import ASI

# Create an ASI LLM instance
llm = ASI(model="asi1-mini")

# Complete a prompt
response = llm.complete("Who is Paul Graham? ")
print(response)
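
The constructor also accepts common generation parameters. The sketch below is a minimal example; temperature and max_tokens are assumptions carried over from other OpenAI-compatible LlamaIndex LLMs rather than settings confirmed in this notebook:

python
# Hedged sketch: configuring generation parameters at construction time.
# temperature and max_tokens are assumed to be supported, as in other
# OpenAI-compatible LlamaIndex LLMs; check the ASI integration for the
# exact set of supported parameters.
llm_configured = ASI(
    model="asi1-mini",
    temperature=0.2,  # lower temperature for more deterministic output
    max_tokens=256,  # cap the length of the completion
)

response = llm_configured.complete(
    "Summarize who Paul Graham is in one sentence."
)
print(response)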

Chat

Now let's try chat functionality:

python
from llama_index.core.base.llms.types import ChatMessage

# Create messages
messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]

# Get chat response
chat_response = llm.chat(messages)
print(chat_response)

Streaming

ASI supports streaming for chat responses:

python
# Stream chat response
for chunk in llm.stream_chat(messages):
    print(chunk.delta, end="")

Using stream_chat endpoint

python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
python
for r in resp:
    print(r.delta, end="")

Using stream_complete endpoint

python
resp = llm.stream_complete("Paul Graham is ")
python
for r in resp:
    print(r.delta, end="")

Image Support

ASI supports image inputs in chat messages for many models.

Using the content blocks feature of chat messages, you can easily combine text and images in a single LLM prompt.

python
!wget https://cdn.pixabay.com/photo/2016/07/07/16/46/dice-1502706_640.jpg -O image.png
python
from llama_index.core.llms import ChatMessage, TextBlock, ImageBlock
from llama_index.llms.asi import ASI

llm = ASI(model="asi1-mini")

messages = [
    ChatMessage(
        role="user",
        blocks=[
            ImageBlock(path="image.png"),
            TextBlock(text="Describe the image in a few sentences."),
        ],
    )
]

resp = llm.chat(messages)
print(resp.message.content)

Function Calling/Tool Calling

ASI LLMs have native support for function calling. This conveniently integrates with LlamaIndex tool abstractions, letting you plug any arbitrary Python function into the LLM.

In the example below, we define a function to generate a Song object.

python
from pydantic import BaseModel
from llama_index.core.tools import FunctionTool
from llama_index.llms.asi import ASI


class Song(BaseModel):
    """A song with name and artist"""

    name: str
    artist: str


def generate_song(name: str, artist: str) -> Song:
    """Generates a song with provided name and artist."""
    return Song(name="Sky full of stars", artist="Coldplay")


# Create tool
tool = FunctionTool.from_defaults(fn=generate_song)

The strict parameter tells ASI whether or not to use constrained sampling when generating tool calls/structured outputs. When enabled, the generated tool call schema will always contain the expected fields.

Since this seems to increase latency, it defaults to false.

python
from llama_index.llms.asi import ASI

# Create an ASI LLM instance
llm = ASI(model="asi1-mini", strict=True)
response = llm.predict_and_call(
    [tool],
    "Pick a random song for me",
    # strict=True  # can also be set at the function level to override the class
)
print(str(response))
python
llm = ASI(model="asi1-mini")
response = llm.predict_and_call(
    [tool],
    "Generate five songs from the Beatles",
    allow_parallel_tool_calls=True,
)
for s in response.sources:
    print(f"Name: {s.tool_name}, Input: {s.raw_input}, Output: {str(s)}")

Manual Tool Calling

While automatic tool calling with predict_and_call provides a streamlined experience, manual tool calling gives you more control over the process. With manual tool calling, you can:

  1. Explicitly control when and how tools are called
  2. Process intermediate results before continuing the conversation
  3. Implement custom error handling and fallback strategies
  4. Chain multiple tool calls together in a specific sequence

ASI supports manual tool calling, but requires more specific prompting compared to some other LLMs. For best results with ASI, include a system message that explains the available tools and provide specific parameters in your user prompt.

The following example demonstrates manual tool calling with ASI to generate a song:

python
from pydantic import BaseModel
from llama_index.core.tools import FunctionTool
from llama_index.core.llms import ChatMessage


class Song(BaseModel):
    """A song with name and artist"""

    name: str
    artist: str


def generate_song(name: str, artist: str) -> Song:
    """Generates a song with provided name and artist."""
    return Song(name=name, artist=artist)


# Create tool
tool = FunctionTool.from_defaults(fn=generate_song)

# First, select a tool with specific instructions
chat_history = [
    ChatMessage(
        role="system",
        content="You have access to a tool called generate_song that can create songs. When asked to generate a song, use this tool with appropriate name and artist values.",
    ),
    ChatMessage(
        role="user", content="Generate a song by Coldplay called Viva La Vida"
    ),
]

# Get initial response
resp = llm.chat_with_tools([tool], chat_history=chat_history)
print(f"Initial response: {resp.message.content}")

# Check for tool calls
tool_calls = llm.get_tool_calls_from_response(
    resp, error_on_no_tool_call=False
)

# Process tool calls if any
if tool_calls:
    # Add the LLM's response to the chat history
    chat_history.append(resp.message)

    for tool_call in tool_calls:
        tool_name = tool_call.tool_name
        tool_kwargs = tool_call.tool_kwargs

        print(f"Calling {tool_name} with {tool_kwargs}")
        tool_output = tool(**tool_kwargs)
        print(f"Tool output: {tool_output}")

        # Add tool response to chat history
        chat_history.append(
            ChatMessage(
                role="tool",
                content=str(tool_output),
                additional_kwargs={"tool_call_id": tool_call.tool_id},
            )
        )

        # Get final response
        resp = llm.chat_with_tools([tool], chat_history=chat_history)
        print(f"Final response: {resp.message.content}")
else:
    print("No tool calls detected in the response.")

Structured Prediction

You can use ASI to extract structured data from text:

python
from llama_index.core.prompts import PromptTemplate
from pydantic import BaseModel
from typing import List


class MenuItem(BaseModel):
    """A menu item in a restaurant."""

    course_name: str
    is_vegetarian: bool


class Restaurant(BaseModel):
    """A restaurant with name, city, and cuisine."""

    name: str
    city: str
    cuisine: str
    menu_items: List[MenuItem]


# Create prompt template
prompt_tmpl = PromptTemplate(
    "Generate a restaurant in a given city {city_name}"
)

# Option 1: Use structured_predict
restaurant_obj = llm.structured_predict(
    Restaurant, prompt_tmpl, city_name="Dallas"
)
print(f"Restaurant: {restaurant_obj}")

# Option 2: Use as_structured_llm
structured_llm = llm.as_structured_llm(Restaurant)
restaurant_obj2 = structured_llm.complete(
    prompt_tmpl.format(city_name="Miami")
).raw
print(f"Restaurant: {restaurant_obj2}")

Note: Structured streaming is currently not supported with ASI.

Async

ASI supports async operations:

python
from llama_index.llms.asi import ASI

# Create an ASI LLM instance
llm = ASI(model="asi1-mini")
python
resp = await llm.acomplete("who is Paul Graham")
python
print(resp)
python
resp = await llm.astream_complete("Paul Graham is ")
python
async for delta in resp:
    print(delta.delta, end="")
python
import asyncio
import nest_asyncio

# Enable nest_asyncio for Jupyter notebooks
nest_asyncio.apply()


async def test_async():
    # Async completion
    resp = await llm.acomplete("Paul Graham is ")
    print(f"Async completion: {resp}")

    # Async chat
    resp = await llm.achat(messages)
    print(f"Async chat: {resp}")

    # Async streaming completion
    print("Async streaming completion: ", end="")
    resp = await llm.astream_complete("Paul Graham is ")
    async for delta in resp:
        print(delta.delta, end="")
    print()

    # Async streaming chat
    print("Async streaming chat: ", end="")
    resp = await llm.astream_chat(messages)
    async for delta in resp:
        print(delta.delta, end="")
    print()


# Run async tests
asyncio.run(test_async())

Simple RAG

Let's implement a simple RAG application with ASI:

python
%pip install llama-index-embeddings-openai
python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.openai import OpenAIEmbedding

os.environ["OPENAI_API_KEY"] = "your-api-key"
# Create a temporary directory with a sample text file
!mkdir -p temp_data
!echo "Paul Graham is a programmer, writer, and investor. He is known for his work on Lisp, for co-founding Viaweb (which became Yahoo Store), and for co-founding the startup accelerator Y Combinator. He is also known for his essays on his website. He studied at HolaHola High school" > temp_data/paul_graham.txt

# Load documents
documents = SimpleDirectoryReader("temp_data").load_data()

llm = ASI(model="asi1-mini")
# Create an index with ASI as the LLM
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(),  # Using OpenAI for embeddings
    llm=llm,  # Using ASI for generation
)

# Create a query engine
query_engine = index.as_query_engine()

# Query the index
response = query_engine.query("Where did Paul Graham study?")
print(response)

LlamaCloud RAG

If you have a LlamaCloud account, you can use ASI with LlamaCloud for RAG:

python
# Install required packages
%pip install llama-cloud-services
python
import os
from llama_cloud_services import LlamaCloudIndex
from llama_index.llms.asi import ASI

# Set your LlamaCloud API key
os.environ["LLAMA_CLOUD_API_KEY"] = "your-key"
os.environ["OPENAI_API_KEY"] = "your-key"

# Connect to an existing LlamaCloud index


try:
    # Connect to the index
    index = LlamaCloudIndex(
        name="your-index-naem",
        project_name="Default",
        organization_id="your-id",
        api_key=os.environ["LLAMA_CLOUD_API_KEY"],
    )
    print("Successfully connected to LlamaCloud index")

    # Create an ASI LLM
    llm = ASI(model="asi1-mini")

    # Create a retriever
    retriever = index.as_retriever()

    # Create a query engine with ASI
    query_engine = index.as_query_engine(llm=llm)

    # Test retriever
    query = "What is the revenue of Uber in 2021?"
    print(f"\nTesting retriever with query: {query}")
    nodes = retriever.retrieve(query)
    print(f"Retrieved {len(nodes)} nodes\n")

    # Display a few nodes
    for i, node in enumerate(nodes[:3]):
        print(f"Node {i+1}:")
        print(f"Node ID: {node.node_id}")
        print(f"Score: {node.score}")
        print(f"Text: {node.text[:200]}...\n")

    # Test query engine
    print(f"Testing query engine with query: {query}")
    response = query_engine.query(query)
    print(f"Response: {response}")
except Exception as e:
    print(f"Error: {e}")

Set API Key at a per-instance level

If desired, you can have separate LLM instances use separate API keys:

python
from llama_index.llms.asi import ASI

# Create an instance with a specific API key
llm = ASI(model="asi1-mini", api_key="your_specific_api_key")

# Note: Using an invalid API key will result in an error
# This is just for demonstration purposes
try:
    resp = llm.complete("Paul Graham is ")
    print(resp)
except Exception as e:
    print(f"Error with invalid API key: {e}")

Additional kwargs

Rather than adding the same parameters to each chat or completion call, you can set them at a per-instance level with additional_kwargs:

python
from llama_index.llms.asi import ASI

# Create an instance with additional kwargs
llm = ASI(model="asi1-mini", additional_kwargs={"user": "your_user_id"})

# Complete a prompt
resp = llm.complete("Paul Graham is ")
print(resp)
python
from llama_index.core.base.llms.types import ChatMessage

# Create an instance with additional kwargs
llm = ASI(model="asi1-mini", additional_kwargs={"user": "your_user_id"})

# Create messages
messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]

# Get chat response
resp = llm.chat(messages)
print(resp)

Conclusion

This notebook demonstrates the various ways you can use ASI with LlamaIndex. The integration supports most of the functionality available in LlamaIndex, including:

  • Basic completion and chat
  • Streaming responses
  • Multimodal support
  • Function calling
  • Structured prediction
  • Async operations
  • RAG applications
  • LlamaCloud integration
  • Per-instance API keys
  • Additional kwargs

Note that structured streaming is currently not supported with ASI.