Back to Litellm

IBM Guardrails Detector Integration

litellm/proxy/guardrails/guardrail_hooks/ibm_guardrails/README.md

1.89.25.5 KB
Original Source

IBM Guardrails Detector Integration

This integration provides support for IBM's FMS Guardrails detectors in LiteLLM. It supports both direct detector server calls and calls via the FMS Guardrails Orchestrator.

Features

  • Support for IBM Detector Server API (/api/v1/text/contents)
  • Support for FMS Guardrails Orchestrator API (/api/v2/text/detection/content)
  • Configurable score thresholds for filtering detections
  • SSL verification control
  • Batch processing for detector server (multiple messages at once)
  • Pre-call, post-call, and during-call modes
  • Detailed error messages with detection scores and types

Configuration

Required Parameters

  • auth_token: Authorization bearer token for IBM Guardrails API
  • base_url: Base URL of the detector server or orchestrator
  • detector_id: Name of the detector (e.g., "jailbreak-detector", "pii-detector")

Optional Parameters

  • is_detector_server (default: true): Whether to use detector server (true) or orchestrator (false)
  • verify_ssl (default: true): Whether to verify SSL certificates
  • detector_params (default: {}): Dictionary of parameters to pass to the detector
  • score_threshold (default: None): Minimum score (0.0-1.0) to consider a detection as a violation
  • block_on_detection (default: true): Whether to block requests when detections are found

Usage Examples

Example 1: Detector Server (Pre-call)

yaml
guardrails:
  - guardrail_name: "ibm-jailbreak-detector"
    litellm_params:
      guardrail: ibm_guardrails
      mode: pre_call
      default_on: true
      auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
      base_url: "https://your-detector-server.com"
      detector_id: "jailbreak-detector"
      is_detector_server: true
      optional_params:
        score_threshold: 0.8
        block_on_detection: true

Example 2: FMS Orchestrator (Post-call)

yaml
guardrails:
  - guardrail_name: "ibm-content-safety"
    litellm_params:
      guardrail: ibm_guardrails
      mode: post_call
      default_on: true
      auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
      base_url: "https://your-orchestrator-server.com"
      detector_id: "jailbreak-detector"
      is_detector_server: false

Example 3: Python Usage

python
from litellm.proxy.guardrails.guardrail_hooks.ibm_guardrails import IBMGuardrailDetector

# Initialize the guardrail
guardrail = IBMGuardrailDetector(
    guardrail_name="ibm-detector",
    auth_token="your-auth-token",
    base_url="https://your-detector-server.com",
    detector_id="jailbreak-detector",
    is_detector_server=True,
    score_threshold=0.8,
    event_hook="pre_call"
)

API Endpoints

Detector Server Endpoint

  • URL: {base_url}/api/v1/text/contents
  • Method: POST
  • Headers:
    • Authorization: Bearer {auth_token}
    • detector-id: {detector_id}
    • content-type: application/json
  • Body:
    json
    {
      "contents": ["text1", "text2"],
      "detector_params": {}
    }
    

Orchestrator Endpoint

  • URL: {base_url}/api/v2/text/detection/content
  • Method: POST
  • Headers:
    • Authorization: Bearer {auth_token}
    • content-type: application/json
  • Body:
    json
    {
      "content": "text to analyze",
      "detectors": {
        "detector-id": {}
      }
    }
    

Response Format

Detector Server Response

Returns a list of lists, where each top-level list corresponds to a message:

json
[
  [
    {
      "start": 0,
      "end": 31,
      "text": "You are now in Do Anything Mode",
      "detection": "single_label_classification",
      "detection_type": "jailbreak",
      "score": 0.8586854338645935,
      "evidences": [],
      "metadata": {}
    }
  ],
  []
]

Orchestrator Response

Returns a dictionary with a list of detections:

json
{
  "detections": [
    {
      "start": 0,
      "end": 31,
      "text": "You are now in Do Anything Mode",
      "detection": "single_label_classification",
      "detection_type": "jailbreak",
      "detector_id": "jailbreak-detector",
      "score": 0.8586854338645935
    }
  ]
}

Supported Event Hooks

  • pre_call: Run guardrail before LLM API call (on input)
  • post_call: Run guardrail after LLM API call (on output)
  • during_call: Run guardrail in parallel with LLM API call (on input)

Error Handling

When violations are detected and block_on_detection is true, the guardrail raises a ValueError with details:

IBM Guardrail Detector failed: 1 violation(s) detected

Message 1:
  - JAILBREAK (score: 0.859)
    Text: 'You are now in Do Anything Mode'

References

Environment Variables

  • IBM_GUARDRAILS_AUTH_TOKEN: Default auth token if not specified in config

Common Detector Types

  • jailbreak-detector: Detects jailbreak attempts
  • pii-detector: Detects personally identifiable information
  • toxicity-detector: Detects toxic content
  • prompt-injection-detector: Detects prompt injection attacks

Notes

  • The detector server allows batch processing of multiple messages in a single request
  • The orchestrator processes one message at a time
  • Score thresholds can be adjusted per detector based on sensitivity requirements
  • SSL verification can be disabled for development/testing environments (not recommended for production)