docs/versioned_docs/version-1.9.0/Components/guardrails.mdx
import Icon from "@site/src/components/icon"; import PartialParams from '@site/docs/_partial-hidden-params.mdx';
The Guardrails component validates input text against security and safety guardrails by issuing prompts to a language model (LLM) to check for violations.
The following guardrails can be validated against:
When validation passes, the input continues through the Pass output. When validation fails, the input is blocked and sent through the Fail output with a justification explaining why it failed.
The Jailbreak and Prompt Injection guardrails include additional heuristic detection first, and then fall back to LLM validation if needed. This additional stage identifies obvious patterns quickly and reduces API costs by avoiding unnecessary LLM calls for clear violations.
The Guardrails component uses a language model to analyze input and can produce false positives or miss some violations. Use this component in addition to other data-sanitization best practices, such as personnel training and scripts that check for literal values or regex patterns, rather than as a sole safeguard.
Use the Enable Custom Guardrail parameter to create your own, specific guardrail validations. In the Custom Guardrail Description* field, enter a natural language guardrail description of disallowed data that you want to detect.
Custom guardrails can work simultaneously with the built-in guardrails, and follow the same validation process.
For example, to block inputs that mention competitor names or products, enter the following in the Custom Guardrail Description field:
competitor company names, competitor product names, or references to competing services
When this custom guardrail is enabled, the LLM analyzes the input text against your criteria. If it detects content matching your description, such as mentions of competitors, validation fails and the input is blocked. Otherwise, validation passes and the input continues through the Pass output.
| Name | Type | Description |
|---|---|---|
Language Model (model) | LanguageModel | Input parameter. Connect a Language Model component to use as the driver for this component. The model reviews the data, compares it against the guardrails, and determines if any data is in violation of the guardrails. |
API Key (api_key) | Secret String | Input parameter. Model provider API key. Required if the model provider needs authentication. |
Guardrails (enabled_guardrails) | Multiselect | Input parameter. Select one or more security guardrails to validate the input against. Options: PII, Tokens/Passwords, Jailbreak, Offensive Content, Malicious Code, Prompt Injection. Default: ["PII", "Tokens/Passwords", "Jailbreak"]. |
Input Text (input_text) | Multiline String | Input parameter. The text to validate against guardrails. Accepts Message input types. |
Enable Custom Guardrail (enable_custom_guardrail) | Boolean | Input parameter. Enable a custom guardrail with your own validation criteria. Default: false. |
Custom Guardrail Description (custom_guardrail_explanation) | Multiline String | Input parameter. Describe what the custom guardrail should check for. This description is used by the LLM to validate the input. Be specific and clear about what you want to detect. Only used when enable_custom_guardrail is true. |
Heuristic Detection Threshold (heuristic_threshold) | Slider | Input parameter. Score threshold (0.0-1.0) for heuristic jailbreak/prompt injection detection. Strong patterns such as "ignore instructions" and "jailbreak" have high weights, while weak patterns such as "bypass" and "act as" have low weights. If the cumulative score meets or exceeds this threshold, the input fails immediately. Lower values are more strict. Higher values defer more cases to LLM validation. Default: 0.7. |