# IBM watsonx.ai
This package integrates the LlamaIndex LLMs API with the IBM watsonx.ai Foundation Models API by leveraging the `ibm-watsonx-ai` SDK. With this integration, you can use any of the models available in IBM watsonx.ai to perform model inference.
## Installation

```bash
pip install llama-index-llms-ibm
```
## Setting up

To use IBM's models, you must have an IBM Cloud user API key. Here's how to obtain and set up your API key:
```python
import os
from getpass import getpass

watsonx_api_key = getpass()
os.environ["WATSONX_APIKEY"] = watsonx_api_key
```
Alternatively, you can set the environment variable in your terminal.
**Linux/macOS:** Open your terminal and execute the following command:

```bash
export WATSONX_APIKEY='your_ibm_api_key'
```
To make this environment variable persistent across terminal sessions, add the above line to your `~/.bashrc`, `~/.bash_profile`, or `~/.zshrc` file.
**Windows:** For Command Prompt, use:

```cmd
set WATSONX_APIKEY=your_ibm_api_key
```
## Load the model

You might need to adjust model parameters for different models or tasks. For more details on parameters, see Available MetaNames.
```python
temperature = 0.5
max_new_tokens = 50
additional_params = {
    "min_new_tokens": 1,
    "top_k": 50,
}
```
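Instead of raw string keys, you can also build `additional_params` with the `GenTextParamsMetaNames` constants from the underlying `ibm-watsonx-ai` SDK (a dependency of this package). A minimal sketch, assuming the constants resolve to the same string parameter names the SDK documents:

```python
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames

# Each constant resolves to the SDK's string key (e.g. "min_new_tokens"),
# so this dict is equivalent to the one above.
additional_params = {
    GenTextParamsMetaNames.MIN_NEW_TOKENS: 1,
    GenTextParamsMetaNames.TOP_K: 50,
}
```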
Initialize the `WatsonxLLM` class with the previously set parameters.
```python
from llama_index.llms.ibm import WatsonxLLM

watsonx_llm = WatsonxLLM(
    model_id="ibm/granite-4-h-small",
    url="PASTE_YOUR_URL_HERE",
    project_id="PASTE_YOUR_PROJECT_ID_HERE",
    temperature=temperature,
    max_new_tokens=max_new_tokens,
    additional_params=additional_params,
)
```
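If you prefer not to rely on the `WATSONX_APIKEY` environment variable, the key can also be passed at construction time. A hedged sketch; the `apikey` keyword below mirrors the `ibm-watsonx-ai` credential fields, so verify it against your installed version:

```python
# Assumption: WatsonxLLM accepts the SDK's credential fields directly,
# letting you pass the API key instead of setting WATSONX_APIKEY.
watsonx_llm = WatsonxLLM(
    model_id="ibm/granite-4-h-small",
    url="PASTE_YOUR_URL_HERE",
    apikey="PASTE_YOUR_API_KEY_HERE",
    project_id="PASTE_YOUR_PROJECT_ID_HERE",
)
```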
**Note:**

- You must provide a `project_id` or `space_id`. To get your project or space ID, open your project or space, go to the **Manage** tab, and click **General**. For more information, see: Project documentation or Deployment space documentation.
- You need to specify the model you want to use for inferencing through `model_id`. You can find the list of available models in Supported foundation models.

Alternatively, you can use Cloud Pak for Data credentials. For more details, refer to watsonx.ai software setup.
```python
watsonx_llm = WatsonxLLM(
    model_id="ibm/granite-4-h-small",
    url="PASTE_YOUR_URL_HERE",
    username="PASTE_YOUR_USERNAME_HERE",
    password="PASTE_YOUR_PASSWORD_HERE",
    instance_id="openshift",
    version="5.2",
    project_id="PASTE_YOUR_PROJECT_ID_HERE",
    temperature=temperature,
    max_new_tokens=max_new_tokens,
    additional_params=additional_params,
)
```
## Create a completion

Below is an example that shows how to call the model directly using a string type prompt:
```python
response = watsonx_llm.complete("What is a Generative AI?")
print(response)
```
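LlamaIndex LLMs also expose async counterparts of these calls. A minimal sketch, assuming your installed version of `WatsonxLLM` implements the `acomplete` method defined by the LlamaIndex LLM interface:

```python
import asyncio


async def main() -> None:
    # `acomplete` is the async counterpart of `complete`;
    # it returns the same kind of CompletionResponse.
    response = await watsonx_llm.acomplete("What is a Generative AI?")
    print(response)


asyncio.run(main())
```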
## Calling `chat` with a list of messages

To create chat completions by providing a list of messages, use the following code:
```python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are an AI assistant"),
    ChatMessage(role="user", content="Who are you?"),
]

response = watsonx_llm.chat(messages)
print(response)
```
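Printing the `ChatResponse` gives the rendered message; if you need only the reply text, the response object carries the assistant message as a `ChatMessage`:

```python
# .message is the assistant's ChatMessage; .content holds the plain text.
print(response.message.content)
```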
## Streaming the model output

To stream the model output, use the following code:
```python
for chunk in watsonx_llm.stream_complete(
    "Describe your favorite city and why it is your favorite."
):
    print(chunk.delta, end="")
```
Similarly, to stream the chat completions, use the following code:
```python
messages = [
    ChatMessage(role="system", content="You are an AI assistant"),
    ChatMessage(role="user", content="Who are you?"),
]

for chunk in watsonx_llm.stream_chat(messages):
    print(chunk.delta, end="")
```
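Once configured, the instance can serve as the default LLM for LlamaIndex's higher-level abstractions (query engines, agents, and so on) by registering it on the global `Settings` object. A minimal sketch:

```python
from llama_index.core import Settings

# Register watsonx.ai as the default LLM so query engines, agents, and
# other LlamaIndex components pick it up without explicit wiring.
Settings.llm = watsonx_llm
```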