docs/examples/llm/sagemaker_endpoint_llm.ipynb
<a href="https://colab.research.google.com/drive/104BZb4U1KLzOYArnCzhqN-CqJWZDeqhz?usp=sharing" target="_parent"></a>
An Amazon SageMaker endpoint is a fully managed resource for deploying machine learning models, including large language models (LLMs), to make predictions on new data.
This notebook demonstrates how to interact with LLM endpoints using SageMakerLLM, unlocking additional LlamaIndex features.
It is assumed that an LLM is already deployed to a SageMaker endpoint.
If you’re opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-sagemaker-endpoint
! pip install llama-index
You have to specify the endpoint name to interact with.
ENDPOINT_NAME = "<-YOUR-ENDPOINT-NAME->"
Credentials must be provided to connect to the endpoint. You can either:
1. Use an AWS profile by passing the profile_name parameter; if it is not specified, the default credential profile will be used.
2. Pass in explicit credentials (aws_access_key_id, aws_secret_access_key, aws_session_token, region_name).
For more details check this link.
With credentials:
from llama_index.llms.sagemaker_endpoint import SageMakerLLM
AWS_ACCESS_KEY_ID = "<-YOUR-AWS-ACCESS-KEY-ID->"
AWS_SECRET_ACCESS_KEY = "<-YOUR-AWS-SECRET-ACCESS-KEY->"
AWS_SESSION_TOKEN = "<-YOUR-AWS-SESSION-TOKEN->"
REGION_NAME = "<-YOUR-ENDPOINT-REGION-NAME->"
llm = SageMakerLLM(
endpoint_name=ENDPOINT_NAME,
aws_access_key_id=AWS_ACCESS_KEY_ID,
aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
aws_session_token=AWS_SESSION_TOKEN,
region_name=REGION_NAME,
)
With the AWS profile name:
from llama_index.llms.sagemaker_endpoint import SageMakerLLM
ENDPOINT_NAME = "<-YOUR-ENDPOINT-NAME->"
PROFILE_NAME = "<-YOUR-PROFILE-NAME->"
llm = SageMakerLLM(
endpoint_name=ENDPOINT_NAME, profile_name=PROFILE_NAME
) # Omit the profile name to use the default profile
Call complete with a prompt:
resp = llm.complete(
    "Paul Graham is ", formatted=True
)  # formatted=True to avoid adding a system prompt
print(resp)
Chat with a list of messages:
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(
role="system", content="You are a pirate with a colorful personality"
),
ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
print(resp)
Using the stream_complete endpoint:
resp = llm.stream_complete("Paul Graham is ", formatted=True)
for r in resp:
print(r.delta)
Using the stream_chat endpoint:
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(
role="system", content="You are a pirate with a colorful personality"
),
ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
for r in resp:
print(r.delta, end="")
SageMakerLLM is an abstraction for interacting with different language models (LLMs) deployed in Amazon SageMaker. All the default parameters are compatible with the Llama 2 model. Therefore, if you are using a different model, you will likely need to set the following parameters:
messages_to_prompt: A callable that accepts a list of ChatMessage objects and, if not specified in the message, a system prompt. It should return a string containing the messages in the endpoint LLM-compatible format.
completion_to_prompt: A callable that accepts a completion string with a system prompt and returns a string in the endpoint LLM-compatible format.
content_handler: A class that inherits from llama_index.llms.sagemaker_llm_endpoint_utils.BaseIOHandler and implements the following methods: serialize_input, deserialize_output, deserialize_streaming_output, and remove_prefix.