docs/examples/llm/azure_openai.ipynb
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/azure_openai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-azure-openai
!pip install llama-index
You can find more details in this guide.
Note down the "model name" and "deployment name"; you'll need them when connecting to your LLM.
To find the necessary setup information, do the following:
from IPython.display import Image
Image(filename="./azure_playground.png")
Note down the `api_type`, `api_base`, `api_version`, `engine` (this should be the same as the "deployment name" from before), and the key.
from IPython.display import Image
Image(filename="./azure_env.png")
Using an Azure deployment of OpenAI models is very similar to using normal OpenAI. You just need to configure a couple more environment variables:
- `OPENAI_API_VERSION`: set this to `2023-07-01-preview`. This may change in the future.
- `AZURE_OPENAI_ENDPOINT`: your endpoint should look like the following: `https://YOUR_RESOURCE_NAME.openai.azure.com/`
- `AZURE_OPENAI_API_KEY`: your API key
import os
os.environ["AZURE_OPENAI_API_KEY"] = "<your-api-key>"
os.environ[
"AZURE_OPENAI_ENDPOINT"
] = "https://<your-resource-name>.openai.azure.com/"
os.environ["OPENAI_API_VERSION"] = "2023-07-01-preview"
from llama_index.llms.azure_openai import AzureOpenAI
Unlike normal OpenAI, you need to pass an `engine` argument in addition to `model`. The engine is the name of the model deployment you selected in Azure OpenAI Studio. See the previous section on finding your setup information for more details.
llm = AzureOpenAI(
engine="simon-llm", model="gpt-35-turbo-16k", temperature=0.0
)
Alternatively, you can skip setting environment variables and pass the parameters directly via the constructor.
llm = AzureOpenAI(
engine="my-custom-llm",
model="gpt-35-turbo-16k",
temperature=0.0,
azure_endpoint="https://<your-resource-name>.openai.azure.com/",
api_key="<your-api-key>",
api_version="2023-07-01-preview",
)
Use the `complete` endpoint for text completion.
response = llm.complete("The sky is a beautiful blue and")
print(response)
response = llm.stream_complete("The sky is a beautiful blue and")
for r in response:
print(r.delta, end="")
Use the `chat` endpoint for conversations.
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(
role="system", content="You are a pirate with colorful personality."
),
ChatMessage(role="user", content="Hello"),
]
response = llm.chat(messages)
print(response)
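The return value is a `ChatResponse` object. If you only want the assistant's reply text rather than the formatted string printed above, it should be accessible via the response's `message` attribute (a minimal sketch, assuming llama_index's standard `ChatResponse.message.content` structure):
# Print only the assistant's reply text, without the role prefix
print(response.message.content)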
response = llm.stream_chat(messages)
for r in response:
print(r.delta, end="")
Rather than adding the same parameters to each chat or completion call, you can set them at the per-instance level with `additional_kwargs`.
llm = AzureOpenAI(
engine="simon-llm",
model="gpt-35-turbo-16k",
temperature=0.0,
additional_kwargs={"user": "your_user_id"},
)
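With the instance configured this way, every request sends the extra parameters automatically. A minimal sketch reusing the completion prompt from earlier (`your_user_id` above is a placeholder, not a real ID):
# The `user` field from additional_kwargs is attached to this request
response = llm.complete("The sky is a beautiful blue and")
print(response)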