Interacting with LLM deployed in Amazon SageMaker Endpoint with LlamaIndex

An Amazon SageMaker endpoint is a fully managed resource that enables the deployment of machine learning models, specifically LLMs (Large Language Models), for making predictions on new data.

This notebook demonstrates how to interact with LLM endpoints using SageMakerLLM, unlocking additional LlamaIndex features. It is therefore assumed that an LLM is already deployed on a SageMaker endpoint.

Setting Up

If you’re opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

python
%pip install llama-index-llms-sagemaker-endpoint
python
! pip install llama-index

You have to specify the endpoint name to interact with.

python
ENDPOINT_NAME = "<-YOUR-ENDPOINT-NAME->"

Credentials must be provided to connect to the endpoint. You can either:

  • Use an AWS profile by specifying the profile_name parameter; if none is specified, the default credential profile is used.
  • Pass credentials as parameters (aws_access_key_id, aws_secret_access_key, aws_session_token, region_name).

For more details, see the AWS documentation on configuring credentials.

With credentials

python
from llama_index.llms.sagemaker_endpoint import SageMakerLLM

AWS_ACCESS_KEY_ID = "<-YOUR-AWS-ACCESS-KEY-ID->"
AWS_SECRET_ACCESS_KEY = "<-YOUR-AWS-SECRET-ACCESS-KEY->"
AWS_SESSION_TOKEN = "<-YOUR-AWS-SESSION-TOKEN->"
REGION_NAME = "<-YOUR-ENDPOINT-REGION-NAME->"
python
llm = SageMakerLLM(
    endpoint_name=ENDPOINT_NAME,
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    aws_session_token=AWS_SESSION_TOKEN,
    region_name=REGION_NAME,
)

AWS profile name

python
from llama_index.llms.sagemaker_endpoint import SageMakerLLM

ENDPOINT_NAME = "<-YOUR-ENDPOINT-NAME->"
PROFILE_NAME = "<-YOUR-PROFILE-NAME->"
llm = SageMakerLLM(
    endpoint_name=ENDPOINT_NAME, profile_name=PROFILE_NAME
)  # Omit the profile name to use the default profile

Basic Usage

Call complete with a prompt

python
resp = llm.complete(
    "Paul Graham is ", formatted=True
)  # formatted=True to avoid adding system prompt
print(resp)

Call chat with a list of messages

python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
python
print(resp)

Streaming

Using stream_complete endpoint

python
resp = llm.stream_complete("Paul Graham is ", formatted=True)
python
for r in resp:
    print(r.delta)

Using stream_chat endpoint

python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
python
for r in resp:
    print(r.delta, end="")

Configure Model

SageMakerLLM is an abstraction for interacting with different language models (LLMs) deployed in Amazon SageMaker. All of the default parameters are compatible with the Llama 2 model. Therefore, if you are using a different model, you will likely need to set the following parameters (a minimal sketch of the prompt callables follows the list):

  • messages_to_prompt: A callable that accepts a list of ChatMessage objects and, if one is not already included in the messages, a system prompt. It should return a string containing the messages in a format the endpoint's LLM understands.

  • completion_to_prompt: A callable that accepts a completion string together with a system prompt and returns a string in a format the endpoint's LLM understands.

  • content_handler: A class that inherits from llama_index.llms.sagemaker_llm_endpoint_utils.BaseIOHandler and implements the following methods: serialize_input, deserialize_output, deserialize_streaming_output, and remove_prefix.
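
For illustration, here is a minimal sketch of custom messages_to_prompt and completion_to_prompt callables. The "### System / ### User / ### Assistant" section markers below are hypothetical, not the format of any particular model, and the callable signatures simply follow the parameter descriptions above; adapt both to whatever format your deployed model expects.

python
from llama_index.llms.sagemaker_endpoint import SageMakerLLM

ENDPOINT_NAME = "<-YOUR-ENDPOINT-NAME->"


# Hypothetical prompt format: adjust the section markers to match
# the model deployed on your endpoint.
def messages_to_prompt(messages, system_prompt=None):
    # Collapse the chat history into a single prompt string,
    # prepending the fallback system prompt if one was supplied.
    prompt = f"### System\n{system_prompt}\n" if system_prompt else ""
    for message in messages:
        prompt += f"### {message.role.value.capitalize()}\n{message.content}\n"
    return prompt + "### Assistant\n"


def completion_to_prompt(completion, system_prompt=None):
    # Wrap a bare completion prompt in the same assumed format.
    prompt = f"### System\n{system_prompt}\n" if system_prompt else ""
    return prompt + f"### User\n{completion}\n### Assistant\n"


llm = SageMakerLLM(
    endpoint_name=ENDPOINT_NAME,
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
)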