The Audio strategy converts prompt text into speech audio and then encodes that audio as a base64 string. This lets you test how AI systems handle audio-encoded text, which may bypass text-based content filters or produce different behavior than plain text does.
This strategy is useful for:
Use it like so in your promptfooconfig.yaml:
strategies:
  - audio
Or with additional configuration:
strategies:
  - id: audio
    config:
      language: fr # Use French audio (ISO 639-1 code)
:::warning
This strategy requires remote generation to perform the text-to-speech conversion. An active internet connection is mandatory as this functionality is implemented exclusively on the server side.
If remote generation is disabled or unavailable, the strategy will throw an error rather than fall back to any local processing.
:::
The strategy performs the following operations:
The resulting test case contains the same semantic content as the original but in a different format that may be processed differently by AI systems.
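As a rough illustration of the text-to-audio-to-base64 flow described above, here is a minimal Python sketch. It is not the strategy's actual implementation: the real text-to-speech step runs remotely, so `placeholder_text_to_speech` is a hypothetical stand-in that returns deterministic bytes in place of real audio. Only the base64 encoding step reflects what the text describes.

```python
import base64


def placeholder_text_to_speech(text: str, language: str = "en") -> bytes:
    """Hypothetical stand-in for the remote text-to-speech step.

    The real conversion happens server-side; this stub only returns
    deterministic bytes so the encoding step can be demonstrated.
    """
    return f"[{language}] {text}".encode("utf-8")


def encode_prompt_as_audio(text: str, language: str = "en") -> str:
    """Turn text into (placeholder) audio bytes, then base64-encode them."""
    audio_bytes = placeholder_text_to_speech(text, language)
    # b64encode returns bytes; decode to ASCII to get a plain string payload.
    return base64.b64encode(audio_bytes).decode("ascii")


encoded = encode_prompt_as_audio("Hello, world", language="fr")
# Decoding the payload recovers the original (placeholder) audio bytes.
decoded = base64.b64decode(encoded)
```

The encoded string carries the same content as the original prompt, just wrapped in a different format, which is the property the strategy relies on.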
language: An ISO 639-1 language code to specify which language the text-to-speech system should use. This parameter controls the accent and pronunciation patterns of the generated audio. Defaults to 'en' (English) if not specified. Note that this parameter only changes the accent of the speech; it does not translate your text. If you provide English text with language: 'fr', you'll get English words spoken with a French accent.

This strategy is worth implementing because: