IBM Guardrails Detector Integration

This integration provides support for IBM's FMS Guardrails detectors in LiteLLM. It supports both direct detector server calls and calls via the FMS Guardrails Orchestrator.

Features

Support for IBM Detector Server API (/api/v1/text/contents)
Support for FMS Guardrails Orchestrator API (/api/v2/text/detection/content)
Configurable score thresholds for filtering detections
SSL verification control
Batch processing for detector server (multiple messages at once)
Pre-call, post-call, and during-call modes
Detailed error messages with detection scores and types

Configuration

Required Parameters

auth_token: Authorization bearer token for IBM Guardrails API
base_url: Base URL of the detector server or orchestrator
detector_id: Name of the detector (e.g., "jailbreak-detector", "pii-detector")

Optional Parameters

is_detector_server (default: true): Whether to use detector server (true) or orchestrator (false)
verify_ssl (default: true): Whether to verify SSL certificates
detector_params (default: {}): Dictionary of parameters to pass to the detector
score_threshold (default: None): Minimum score (0.0-1.0) to consider a detection as a violation
block_on_detection (default: true): Whether to block requests when detections are found

Usage Examples

Example 1: Detector Server (Pre-call)

yaml

guardrails:
  - guardrail_name: "ibm-jailbreak-detector"
    litellm_params:
      guardrail: ibm_guardrails
      mode: pre_call
      default_on: true
      auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
      base_url: "https://your-detector-server.com"
      detector_id: "jailbreak-detector"
      is_detector_server: true
      optional_params:
        score_threshold: 0.8
        block_on_detection: true

Example 2: FMS Orchestrator (Post-call)

yaml

guardrails:
  - guardrail_name: "ibm-content-safety"
    litellm_params:
      guardrail: ibm_guardrails
      mode: post_call
      default_on: true
      auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
      base_url: "https://your-orchestrator-server.com"
      detector_id: "jailbreak-detector"
      is_detector_server: false

Example 3: Python Usage

python

from litellm.proxy.guardrails.guardrail_hooks.ibm_guardrails import IBMGuardrailDetector

# Initialize the guardrail
guardrail = IBMGuardrailDetector(
    guardrail_name="ibm-detector",
    auth_token="your-auth-token",
    base_url="https://your-detector-server.com",
    detector_id="jailbreak-detector",
    is_detector_server=True,
    score_threshold=0.8,
    event_hook="pre_call"
)

API Endpoints

Detector Server Endpoint

URL: {base_url}/api/v1/text/contents
Method: POST
Headers:
- Authorization: Bearer {auth_token}
- detector-id: {detector_id}
- content-type: application/json

Body:

json

{
  "contents": ["text1", "text2"],
  "detector_params": {}
}

Orchestrator Endpoint

URL: {base_url}/api/v2/text/detection/content
Method: POST
Headers:
- Authorization: Bearer {auth_token}
- content-type: application/json

Body:

json

{
  "content": "text to analyze",
  "detectors": {
    "detector-id": {}
  }
}

Response Format

Detector Server Response

Returns a list of lists, where each top-level list corresponds to a message:

json

[
  [
    {
      "start": 0,
      "end": 31,
      "text": "You are now in Do Anything Mode",
      "detection": "single_label_classification",
      "detection_type": "jailbreak",
      "score": 0.8586854338645935,
      "evidences": [],
      "metadata": {}
    }
  ],
  []
]

Orchestrator Response

Returns a dictionary with a list of detections:

json

{
  "detections": [
    {
      "start": 0,
      "end": 31,
      "text": "You are now in Do Anything Mode",
      "detection": "single_label_classification",
      "detection_type": "jailbreak",
      "detector_id": "jailbreak-detector",
      "score": 0.8586854338645935
    }
  ]
}

Supported Event Hooks

pre_call: Run guardrail before LLM API call (on input)
post_call: Run guardrail after LLM API call (on output)
during_call: Run guardrail in parallel with LLM API call (on input)

Error Handling

When violations are detected and block_on_detection is true, the guardrail raises a ValueError with details:

IBM Guardrail Detector failed: 1 violation(s) detected

Message 1:
  - JAILBREAK (score: 0.859)
    Text: 'You are now in Do Anything Mode'

References

Environment Variables

IBM_GUARDRAILS_AUTH_TOKEN: Default auth token if not specified in config

Common Detector Types

jailbreak-detector: Detects jailbreak attempts
pii-detector: Detects personally identifiable information
toxicity-detector: Detects toxic content
prompt-injection-detector: Detects prompt injection attacks

Notes

The detector server allows batch processing of multiple messages in a single request
The orchestrator processes one message at a time
Score thresholds can be adjusted per detector based on sensitivity requirements
SSL verification can be disabled for development/testing environments (not recommended for production)