WatsonX

IBM WatsonX offers a range of enterprise-grade foundation models optimized for various business use cases. This provider supports text generation and chat models from the Granite and Llama series, along with additional models for code generation and multilingual tasks.

Supported Models

IBM watsonx.ai provides foundation models through its inference API. The promptfoo WatsonX provider currently supports text generation and chat models that can be called directly via API.

:::tip Finding Available Models

To see the latest models available in your region, use IBM's API or review IBM's supported foundation models:

bash

curl "https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2024-05-01" \
  -H "Authorization: Bearer YOUR_TOKEN"

:::

Currently Available Models

The following are representative ready-to-use models that IBM currently provides for direct inferencing through the text generation or chat APIs:

IBM Granite

- `ibm/granite-4-h-small` - Latest ready-to-use Granite text model
- `ibm/granite-3-8b-instruct` - Older instruct model (deprecated)
- `ibm/granite-8b-code-instruct` - Code generation specialist

Meta Llama

- `meta-llama/llama-4-maverick-17b-128e-instruct-fp8` - Latest Llama 4 model
- `meta-llama/llama-3-3-70b-instruct` - Latest Llama 3.3 (70B)

Mistral

- `mistralai/mistral-large-2512` - Latest ready-to-use Mistral Large model
- `mistralai/mistral-medium-2505` - Mid-tier model
- `mistralai/mistral-small-3-1-24b-instruct-2503` - Smaller instruct model

Other Models

- `openai/gpt-oss-120b` - Open-weight model from OpenAI
- `sdaia/allam-1-13b-instruct` - Arabic and English instruct model

Other Model Types

IBM watsonx.ai also offers:

- Deploy on Demand Models - Curated models that require creating a dedicated deployment first
- Embedding Models - For generating text embeddings (e.g., `ibm/granite-embedding-278m-multilingual`)
- Reranker Models - For improving search results (e.g., `cross-encoder/ms-marco-minilm-l-12-v2`)
- Vision and Guardrail Models - Models with APIs or payloads that differ from the provider's current text/chat workflow

:::info Additional Model Types Not Currently Supported

The promptfoo WatsonX provider supports text generation and chat models only. Deploy on Demand, embedding, reranker, vision, and guardrail models use different API endpoints, payloads, or workflows. For these model types, use IBM's API directly or create a custom provider.

:::

:::note Model Availability

- Region-specific: Model availability varies by IBM Cloud region
- Version changes: IBM regularly updates available models
- Deprecation: Models marked "deprecated" will be removed in future releases

Always verify current availability using IBM's API or check your watsonx.ai project's model catalog.

:::

Prerequisites

Before integrating the WatsonX provider, ensure you have the following:

1. IBM Cloud Account: You need an IBM Cloud account to obtain API access to WatsonX models.
2. API Key or Bearer Token, and Project ID:
   - API Key: Retrieve your API key by logging in to your IBM Cloud account and navigating to the "API Keys" section.
   - Bearer Token: A bearer token is an IAM access token generated from your API key. IBM's IAM documentation explains how to obtain one.
   - Project ID: To find your Project ID, log in to IBM WatsonX Prompt Lab, select your project, and locate the project ID in the provided curl command.

Make sure you have either the API key or bearer token, along with the project ID, before proceeding.

Installation

To install the WatsonX provider, use the following steps:

Install the necessary dependencies:

npm install @ibm-cloud/watsonx-ai ibm-cloud-sdk-core

Set up the necessary environment variables:

You can choose between two authentication methods:

Option 1: IAM Authentication (Recommended)

```sh
export WATSONX_AI_APIKEY=your-ibm-cloud-api-key
export WATSONX_AI_PROJECT_ID=your-project-id
```

Option 2: Bearer Token Authentication

```sh
export WATSONX_AI_BEARER_TOKEN=your-bearer-token
export WATSONX_AI_PROJECT_ID=your-project-id
```

Force Specific Auth Method (Optional)

```sh
export WATSONX_AI_AUTH_TYPE=iam  # or 'bearertoken'
```

:::note Authentication Priority

If `WATSONX_AI_AUTH_TYPE` is not set, the provider will automatically use:
1. IAM authentication if `WATSONX_AI_APIKEY` is available
2. Bearer token authentication if `WATSONX_AI_BEARER_TOKEN` is available
:::

Alternatively, you can configure the authentication and project ID directly in the configuration file:

yaml

providers:
  - id: watsonx:ibm/granite-4-h-small
    config:
      # Option 1: IAM Authentication
      apiKey: your-ibm-cloud-api-key

      # Option 2: Bearer Token Authentication
      # apiBearerToken: your-ibm-cloud-bearer-token

      projectId: your-ibm-project-id
      serviceUrl: https://us-south.ml.cloud.ibm.com

Usage Examples

Once configured, you can use the WatsonX provider to generate text responses based on prompts. Here's an example using the Granite 4 H Small model:

yaml

providers:
  - watsonx:ibm/granite-4-h-small

prompts:
  - "Answer the following question: '{{question}}'"

tests:
  - vars:
      question: 'What is the capital of France?'
    assert:
      - type: contains
        value: 'Paris'

You can also use other models by changing the model ID:

yaml

providers:
  # IBM Granite models
  - watsonx:ibm/granite-4-h-small
  - watsonx:ibm/granite-8b-code-instruct

  # Meta Llama models
  - watsonx:meta-llama/llama-3-3-70b-instruct
  - watsonx:meta-llama/llama-4-maverick-17b-128e-instruct-fp8

  # Mistral models
  - watsonx:mistralai/mistral-large-2512
  - watsonx:mistralai/mistral-medium-2505

Configuration Options

Text Generation Parameters

The WatsonX provider supports the full range of text generation parameters from the IBM SDK:

| Parameter | Type | Description |
| --- | --- | --- |
| `maxNewTokens` | number | Maximum tokens to generate (default: 100) |
| `minNewTokens` | number | Minimum tokens before stop sequences apply |
| `temperature` | number | Sampling temperature (0-2) |
| `topP` | number | Nucleus sampling parameter (0-1) |
| `topK` | number | Top-k sampling parameter |
| `decodingMethod` | string | `'greedy'` or `'sample'` |
| `stopSequences` | string[] | Sequences that cause generation to stop |
| `repetitionPenalty` | number | Penalty for repeated tokens |
| `randomSeed` | number | Seed for reproducible outputs |
| `timeLimit` | number | Time limit in milliseconds |
| `truncateInputTokens` | number | Max input tokens before truncation |
| `includeStopSequence` | boolean | Include stop sequence in output |
| `lengthPenalty` | object | Length penalty configuration |
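
The input- and time-limiting parameters in the table can be combined in one provider entry. The following is a minimal sketch, assuming the parameters behave as described above; every value is a placeholder chosen for illustration, not a recommended setting.

```yaml
providers:
  - id: watsonx:ibm/granite-4-h-small
    config:
      maxNewTokens: 512
      # Stop at the first blank line and leave it out of the output
      stopSequences: ["\n\n"]
      includeStopSequence: false
      # Truncate prompts longer than this many tokens (placeholder value)
      truncateInputTokens: 4000
      # Give up after 30 seconds; the value is in milliseconds
      timeLimit: 30000
```

The stop sequence uses double quotes because YAML only interprets `\n` as a newline inside double-quoted strings.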

Example with Parameters

yaml

providers:
  - id: watsonx:ibm/granite-4-h-small
    config:
      temperature: 0.7
      topP: 0.9
      topK: 50
      maxNewTokens: 1024
      stopSequences: ['END', 'STOP']
      repetitionPenalty: 1.1
      decodingMethod: sample
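
When comparing runs, it can help to pin down the sources of randomness. The sketch below assumes `decodingMethod` and `randomSeed` behave as described in the parameter table. The `label` values are arbitrary names chosen for this example, and the seed is a placeholder.

```yaml
providers:
  # Greedy decoding: always picks the most likely next token
  - id: watsonx:ibm/granite-4-h-small
    label: granite-greedy
    config:
      decodingMethod: greedy
      maxNewTokens: 256

  # Sampling with a fixed seed, intended to make repeated runs reproducible
  - id: watsonx:ibm/granite-4-h-small
    label: granite-seeded
    config:
      decodingMethod: sample
      temperature: 0.7
      randomSeed: 42
      maxNewTokens: 256
```

Listing the same model twice with different labels lets the two decoding setups appear as separate columns in the evaluation results.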

Length Penalty

For more control over output length:

yaml

providers:
  - id: watsonx:ibm/granite-4-h-small
    config:
      lengthPenalty:
        decayFactor: 1.5
        startIndex: 10

Chat Mode

WatsonX also supports chat-style interactions using the `textChat` API. Use the `watsonx:chat:` prefix:

yaml

providers:
  - id: watsonx:chat:ibm/granite-4-h-small
    config:
      temperature: 0.7
      maxNewTokens: 1024

Chat mode automatically parses prompts that contain a JSON array of messages:

yaml

prompts:
  - |
    [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "{{question}}"}
    ]

providers:
  - watsonx:chat:ibm/granite-4-h-small

For plain text prompts, the chat provider automatically wraps them as a user message.
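
To illustrate the wrapping behavior, the sketch below runs a plain-text prompt next to an explicit single-message array. Based on the description above, the two prompts should reach the model as the same user message. That equivalence is an assumption drawn from this page, not something verified against the provider's source.

```yaml
prompts:
  # Plain text: wrapped by the chat provider as a single user message
  - "Answer the following question: '{{question}}'"
  # Explicit form of the same request as a messages array
  - |
    [
      {"role": "user", "content": "Answer the following question: '{{question}}'"}
    ]

providers:
  - watsonx:chat:ibm/granite-4-h-small

tests:
  - vars:
      question: 'What is the capital of France?'
    assert:
      - type: contains
        value: 'Paris'
```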

Chat vs Text Generation

| Feature | Text Generation (`watsonx:`) | Chat (`watsonx:chat:`) |
| --- | --- | --- |
| API Method | `generateText` | `textChat` |
| Input Format | Plain text | Messages array or plain text |
| Best For | Completion tasks | Conversational applications |
| System Messages | Not supported | Supported |
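
One way to see the difference in practice is to run the same prompt through both modes. The configuration below is a minimal sketch; the `label` values are arbitrary names chosen for this example.

```yaml
prompts:
  - "Answer the following question: '{{question}}'"

providers:
  # Text generation mode (generateText)
  - id: watsonx:ibm/granite-4-h-small
    label: granite-text
  # Chat mode (textChat); the plain-text prompt is sent as a user message
  - id: watsonx:chat:ibm/granite-4-h-small
    label: granite-chat

tests:
  - vars:
      question: 'What is the capital of France?'
    assert:
      - type: contains
        value: 'Paris'
```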

Environment Variables

| Variable | Description |
| --- | --- |
| `WATSONX_AI_APIKEY` | IBM Cloud API key for IAM authentication |
| `WATSONX_AI_BEARER_TOKEN` | Bearer token for token-based authentication |
| `WATSONX_AI_PROJECT_ID` | WatsonX project ID |
| `WATSONX_AI_AUTH_TYPE` | Force auth type: `iam` or `bearertoken` |

Migrating from IBM BAM

The IBM BAM provider has been deprecated (sunset March 2025). To migrate:

1. Change provider prefix from `bam:` to `watsonx:`
2. Update authentication to use WatsonX credentials
3. Update model IDs to WatsonX equivalents (e.g., `ibm/granite-4-h-small`)
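
The steps above can be sketched as a before-and-after provider entry. The old BAM line is shown only as a commented placeholder, since the exact model ID depends on your existing configuration. The credentials and service URL are the same placeholder values used earlier on this page.

```yaml
providers:
  # Before (BAM, deprecated); <old-model-id> stands in for your existing model:
  # - id: bam:<old-model-id>

  # After (WatsonX):
  - id: watsonx:ibm/granite-4-h-small
    config:
      apiKey: your-ibm-cloud-api-key
      projectId: your-ibm-project-id
      serviceUrl: https://us-south.ml.cloud.ibm.com
```

If your prompts are written as message arrays, the `watsonx:chat:` prefix may be the closer match.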