docs/examples/llm/sagemaker_endpoint_llm.ipynb
<a href="https://colab.research.google.com/drive/104BZb4U1KLzOYArnCzhqN-CqJWZDeqhz?usp=sharing" target="_parent"></a>
An Amazon SageMaker endpoint is a fully managed resource for deploying machine learning models, including large language models (LLMs), to make predictions on new data.
This notebook demonstrates how to interact with LLM endpoints using SageMakerLLM, unlocking additional LlamaIndex features.
It is assumed that an LLM is already deployed to a SageMaker endpoint.
If you’re opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-sagemaker-endpoint
! pip install llama-index
You have to specify the endpoint name to interact with.
ENDPOINT_NAME = "<-YOUR-ENDPOINT-NAME->"
Credentials must be provided to connect to the endpoint. You can either:
1. Use an AWS profile by passing the profile_name parameter; if it is not specified, the default credential profile will be used.
2. Pass in explicit credentials (aws_access_key_id, aws_secret_access_key, aws_session_token, region_name).
For more details check this link.
With credentials:
from llama_index.llms.sagemaker_endpoint import SageMakerLLM
AWS_ACCESS_KEY_ID = "<-YOUR-AWS-ACCESS-KEY-ID->"
AWS_SECRET_ACCESS_KEY = "<-YOUR-AWS-SECRET-ACCESS-KEY->"
AWS_SESSION_TOKEN = "<-YOUR-AWS-SESSION-TOKEN->"
REGION_NAME = "<-YOUR-ENDPOINT-REGION-NAME->"
llm = SageMakerLLM(
endpoint_name=ENDPOINT_NAME,
aws_access_key_id=AWS_ACCESS_KEY_ID,
aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
aws_session_token=AWS_SESSION_TOKEN,
region_name=REGION_NAME,
)
With the AWS profile name:
from llama_index.llms.sagemaker_endpoint import SageMakerLLM
ENDPOINT_NAME = "<-YOUR-ENDPOINT-NAME->"
PROFILE_NAME = "<-YOUR-PROFILE-NAME->"
llm = SageMakerLLM(
endpoint_name=ENDPOINT_NAME, profile_name=PROFILE_NAME
) # Omit the profile name to use the default profile
Call complete with a prompt:
resp = llm.complete(
    "Paul Graham is ", formatted=True
)  # formatted=True to avoid adding a system prompt
print(resp)
Chat with a list of messages:
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(
role="system", content="You are a pirate with a colorful personality"
),
ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
print(resp)
Using the stream_complete endpoint:
resp = llm.stream_complete("Paul Graham is ", formatted=True)
for r in resp:
print(r.delta)
Using the stream_chat endpoint:
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(
role="system", content="You are a pirate with a colorful personality"
),
ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
for r in resp:
print(r.delta, end="")
SageMakerLLM is an abstraction for interacting with different language models (LLMs) deployed in Amazon SageMaker. All the default parameters are compatible with the Llama 2 model. Therefore, if you are using a different model, you will likely need to set the following parameters:
messages_to_prompt: A callable that accepts a list of ChatMessage objects and, if not specified in the message, a system prompt. It should return a string containing the messages in the endpoint LLM-compatible format.
completion_to_prompt: A callable that accepts a completion string with a system prompt and returns a string in the endpoint LLM-compatible format.
content_handler: A class that inherits from llama_index.llms.sagemaker_llm_endpoint_utils.BaseIOHandler and implements the following methods: serialize_input, deserialize_output, deserialize_streaming_output, and remove_prefix.