import { Callout, Steps, Tabs, Cards } from "@/docs/components";

Voice Cloning

Create custom AI voices and generate speech through the ElevenLabs integration.

Overview

Voice cloning on elizaOS Cloud enables you to:

Clone voices: Create AI replicas of voices you have the rights to use
Generate speech: Convert text to natural-sounding audio
Custom voices: Use cloned voices in your agents
Multi-language: Support for 29 languages with `eleven_multilingual_v2`

Quick Start

Dashboard

Navigate to Dashboard → Voices to use the visual interface.

API

<Tabs> <Tabs.Tab>

bash

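# Clone a voice from audio samples
# NOTE: the form field name "files" and the sample file names are assumptions;
# check the API reference for the exact multipart field names.
curl -X POST "https://elizacloud.ai/api/elevenlabs/voices/clone" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "name=My Voice Clone" \
  -F "files=@sample1.mp3" \
  -F "files=@sample2.mp3"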

</Tabs.Tab> <Tabs.Tab>

bash

# Generate speech
curl -X POST "https://elizacloud.ai/api/elevenlabs/tts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test of voice synthesis.",
    "voice_id": "voice_abc123",
    "model_id": "eleven_multilingual_v2"
  }' \
  --output speech.mp3

</Tabs.Tab> <Tabs.Tab>

bash

# List your voices
curl -X GET "https://elizacloud.ai/api/elevenlabs/voices/user" \
  -H "Authorization: Bearer YOUR_API_KEY"

</Tabs.Tab> </Tabs>

Voice Cloning

<Steps>

### Prepare Audio Samples

Gather 1-3 minutes of clean audio from the target voice.

Requirements:

Clear speech without background noise
Single speaker only
High quality (WAV, MP3, or M4A; 44.1 kHz, 16-bit or better)

### Upload Samples

Upload the audio files through the dashboard or the API.

### Create Clone

Submit the cloning request and wait for processing to finish.

### Verify Quality

Test the cloned voice with sample text.

</Steps>

Sample Requirements

Requirement	Recommendation
Duration	1-3 minutes total
Format	WAV, MP3, M4A
Quality	44.1kHz, 16-bit minimum
Content	Natural speech, varied intonation
Noise	Minimal background noise
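
The recommendations in the table can be checked locally before uploading. The following is a minimal sketch, assuming you have already read each file's metadata with an audio tool of your choice. `checkSamples` and the shape of the metadata objects are hypothetical and not part of the elizaOS Cloud API.

```javascript
// Hypothetical pre-upload check based on the recommendations in the table above.
// The metadata objects ({ name, durationSec, sampleRateHz, bitDepth }) are an
// assumed shape; obtain the values with whatever audio tool you already use.
const ALLOWED_EXTENSIONS = ["wav", "mp3", "m4a"];

function checkSamples(samples) {
  const issues = [];
  let totalSec = 0;

  for (const s of samples) {
    const ext = s.name.split(".").pop().toLowerCase();
    if (!ALLOWED_EXTENSIONS.includes(ext)) {
      issues.push(`${s.name}: unsupported format .${ext}`);
    }
    if (s.sampleRateHz < 44100) {
      issues.push(`${s.name}: sample rate below 44.1 kHz`);
    }
    // Bit depth is optional because compressed formats may not report one.
    if (s.bitDepth !== undefined && s.bitDepth < 16) {
      issues.push(`${s.name}: bit depth below 16-bit`);
    }
    totalSec += s.durationSec;
  }

  // The table recommends 1-3 minutes of audio in total.
  if (totalSec < 60) issues.push("total duration is under 1 minute");
  if (totalSec > 180) issues.push("total duration is over 3 minutes");

  return { ok: issues.length === 0, totalSec, issues };
}

const report = checkSamples([
  { name: "sample1.mp3", durationSec: 50, sampleRateHz: 44100 },
  { name: "sample2.wav", durationSec: 45, sampleRateHz: 48000, bitDepth: 24 },
]);
console.log(report.ok, report.issues);
```

The check only covers what can be measured from metadata; background noise and speaker count still need a listen.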

<Callout type="warning"> Using someone's voice without permission may violate their rights. Only clone voices you have rights to use. </Callout>

Text-to-Speech

Generate Speech

javascript

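// Request speech for the given text and voice
const response = await fetch("https://elizacloud.ai/api/elevenlabs/tts", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "Welcome to elizaOS Cloud!",
    voice_id: "voice_abc123",
    model_id: "eleven_multilingual_v2",
    voice_settings: {
      stability: 0.5,
      similarity_boost: 0.75,
    },
  }),
});

// Fail loudly on HTTP errors instead of treating an error body as audio
if (!response.ok) {
  throw new Error(`TTS request failed: ${response.status} ${response.statusText}`);
}

// Wrap the returned audio in an object URL that an <audio> element can play
const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);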

Voice Settings

Setting	Range	Description
`stability`	0-1	Higher = more consistent, lower = more expressive
`similarity_boost`	0-1	How closely to match the original voice
`style`	0-1	Style exaggeration (v2 models only)
`use_speaker_boost`	bool	Enhance speaker similarity
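
Because the numeric settings share the same 0-1 range, they can be normalized before a request is sent. This is a minimal sketch; `normalizeVoiceSettings` is a hypothetical helper, and clamping out-of-range values (rather than rejecting them) is a design assumption, not documented API behavior.

```javascript
// Hypothetical helper: validates a voice_settings object against the ranges
// in the table above before it is sent in a TTS request body.
const RANGED_SETTINGS = ["stability", "similarity_boost", "style"];

function normalizeVoiceSettings(settings) {
  const out = {};
  for (const key of RANGED_SETTINGS) {
    if (settings[key] === undefined) continue;
    const value = Number(settings[key]);
    if (Number.isNaN(value)) {
      throw new TypeError(`${key} must be a number between 0 and 1`);
    }
    // Clamp into the documented 0-1 range instead of rejecting the request.
    out[key] = Math.min(1, Math.max(0, value));
  }
  if (settings.use_speaker_boost !== undefined) {
    out.use_speaker_boost = Boolean(settings.use_speaker_boost);
  }
  return out;
}

const voiceSettings = normalizeVoiceSettings({
  stability: 0.5,
  similarity_boost: 1.2, // out of range, clamped to 1
  use_speaker_boost: true,
});
console.log(JSON.stringify(voiceSettings));
```

The result can be passed as the `voice_settings` field of the request body. Settings that were not supplied are left out so that the service defaults still apply.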

Available Models

Model	Languages	Quality	Speed
`eleven_multilingual_v2`	29	Highest	Medium
`eleven_monolingual_v1`	English	High	Fast
`eleven_turbo_v2`	English	Good	Fastest
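
The table suggests a simple selection rule: non-English text needs the multilingual model, while English text can trade quality for speed. The sketch below encodes that rule. `pickModel` is a hypothetical helper, and it does not check whether a given language is among the 29 supported ones.

```javascript
// Model facts copied from the table above; the selection rule is an assumption.
const MODELS = [
  { id: "eleven_multilingual_v2", englishOnly: false, speedRank: 3 },
  { id: "eleven_monolingual_v1", englishOnly: true, speedRank: 2 },
  { id: "eleven_turbo_v2", englishOnly: true, speedRank: 1 }, // fastest
];

// Hypothetical helper: picks a model for a language code such as "en" or "de".
function pickModel(language, { preferSpeed = false } = {}) {
  const isEnglish = language.toLowerCase().startsWith("en");
  // Non-English text needs the multilingual model; for English it is also
  // the highest-quality option, so it stays the default.
  if (!isEnglish || !preferSpeed) return "eleven_multilingual_v2";
  // For English with preferSpeed, take the fastest English-capable model.
  const fastest = MODELS.filter((m) => m.englishOnly).sort(
    (a, b) => a.speedRank - b.speedRank
  )[0];
  return fastest.id;
}

console.log(pickModel("de"));
console.log(pickModel("en-US", { preferSpeed: true }));
```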

Pre-built Voices

elizaOS Cloud provides pre-built voices. The request below lists them, and the JSON after it shows the shape of the response:

bash

curl -X GET "https://elizacloud.ai/api/elevenlabs/voices" \
  -H "Authorization: Bearer YOUR_API_KEY"

json

{
  "voices": [
    {
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "name": "Rachel",
      "labels": { "accent": "american", "age": "young" },
      "preview_url": "https://..."
    },
    {
      "voice_id": "AZnzlk1XvdvUeBnXmlld",
      "name": "Domi",
      "labels": { "accent": "american", "age": "young" }
    }
  ]
}
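
The `labels` object makes it possible to filter the list on the client. This is a minimal sketch that uses the sample response above as input; `findVoices` is a hypothetical helper, and label keys other than `accent` and `age` are not confirmed by this guide.

```javascript
// Sample data copied from the response above (preview_url omitted).
const sampleResponse = {
  voices: [
    {
      voice_id: "21m00Tcm4TlvDq8ikWAM",
      name: "Rachel",
      labels: { accent: "american", age: "young" },
    },
    {
      voice_id: "AZnzlk1XvdvUeBnXmlld",
      name: "Domi",
      labels: { accent: "american", age: "young" },
    },
  ],
};

// Hypothetical helper: returns the voices whose labels match every
// key/value pair in wantedLabels.
function findVoices(response, wantedLabels) {
  return response.voices.filter((voice) =>
    Object.entries(wantedLabels).every(
      ([key, value]) => voice.labels && voice.labels[key] === value
    )
  );
}

const matches = findVoices(sampleResponse, { accent: "american", age: "young" });
console.log(matches.map((v) => v.name));
```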

Voice Management

Get Voice Details

bash

curl -X GET "https://elizacloud.ai/api/elevenlabs/voices/voice_abc123" \
  -H "Authorization: Bearer YOUR_API_KEY"

Delete Voice

bash

curl -X DELETE "https://elizacloud.ai/api/elevenlabs/voices/voice_abc123" \
  -H "Authorization: Bearer YOUR_API_KEY"

Check Clone Status

bash

curl -X GET "https://elizacloud.ai/api/elevenlabs/voices/jobs" \
  -H "Authorization: Bearer YOUR_API_KEY"
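
Cloning is asynchronous, so a client typically polls this endpoint until the job finishes. The sketch below shows one way to do that. The response format of the jobs endpoint is not shown in this guide, so the readiness test (`isReady`) is supplied by the caller; `pollUntil` and `waitForClone` are hypothetical names, and the interval and attempt limits are arbitrary defaults.

```javascript
// Generic polling sketch. `check` is any async function that resolves to a
// truthy value once the work is finished.
async function pollUntil(check, { intervalMs = 5000, maxAttempts = 20 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await check(attempt);
    if (result) return result;
    // Wait before the next attempt, but not after the last one.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Gave up after ${maxAttempts} attempts`);
}

// Usage sketch: fetch the jobs list and let the caller decide readiness.
// isReady(jobs) is an assumed callback because the response shape is unknown.
async function waitForClone(apiKey, isReady) {
  return pollUntil(async () => {
    const res = await fetch("https://elizacloud.ai/api/elevenlabs/voices/jobs", {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!res.ok) throw new Error(`Status check failed: ${res.status}`);
    const jobs = await res.json();
    return isReady(jobs) ? jobs : null;
  });
}
```

Keeping the loop separate from the HTTP call makes it easy to test with a stub and to reuse for other long-running requests.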

Agent Integration

Use cloned voices with your agents by setting the `voice` block in the agent configuration:

json

{
  "name": "Voice Assistant",
  "bio": ["Helpful AI assistant with custom voice"],
  "settings": {
    "voice": {
      "provider": "elevenlabs",
      "voiceId": "voice_abc123",
      "model": "eleven_multilingual_v2"
    }
  }
}
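
If agent configurations are built in code, the voice block can be added without touching the rest of the settings. This is a minimal sketch; `withVoice` is a hypothetical helper, and the field names simply mirror the example above.

```javascript
// Hypothetical helper: returns a copy of a character config with the voice
// block from the example above filled in. The input object is not modified.
function withVoice(character, voiceId, model = "eleven_multilingual_v2") {
  return {
    ...character,
    settings: {
      ...(character.settings || {}),
      voice: { provider: "elevenlabs", voiceId, model },
    },
  };
}

const agent = withVoice(
  { name: "Voice Assistant", bio: ["Helpful AI assistant with custom voice"] },
  "voice_abc123"
);
console.log(JSON.stringify(agent, null, 2));
```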

Speech-to-Text

Convert audio to text. The JSON after the request shows an example response:

bash

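# NOTE: the form field name "audio" and the file name are assumptions;
# check the API reference for the exact multipart field name.
curl -X POST "https://elizacloud.ai/api/elevenlabs/stt" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "audio=@recording.mp3"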

json

{
  "text": "This is the transcribed text from the audio.",
  "confidence": 0.95,
  "words": [
    { "word": "This", "start": 0.0, "end": 0.2, "confidence": 0.98 },
    { "word": "is", "start": 0.2, "end": 0.3, "confidence": 0.99 }
  ]
}
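
The per-word `confidence` and timestamps can be used to flag parts of a transcript for review. The following sketch runs on the sample response above; `flagUncertainWords` and its default threshold of 0.9 are illustrative assumptions.

```javascript
// Sample data copied from the response above.
const transcript = {
  text: "This is the transcribed text from the audio.",
  confidence: 0.95,
  words: [
    { word: "This", start: 0.0, end: 0.2, confidence: 0.98 },
    { word: "is", start: 0.2, end: 0.3, confidence: 0.99 },
  ],
};

// Hypothetical helper: lists words whose confidence is below a threshold,
// with their time range, so they can be checked against the audio.
function flagUncertainWords(result, threshold = 0.9) {
  return (result.words || [])
    .filter((w) => w.confidence < threshold)
    .map((w) => ({ word: w.word, from: w.start, to: w.end }));
}

// With the default threshold, neither sample word is flagged.
console.log(flagUncertainWords(transcript));
// A stricter threshold flags "This" (0.98) but not "is" (0.99).
console.log(flagUncertainWords(transcript, 0.985));
```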

Pricing

See Billing & Credits and the dashboard/API Explorer for current voice pricing. Voice cloning and TTS/STT billing can vary by provider configuration.

<Callout type="info"> Monitor your voice usage in the [billing dashboard](/dashboard/billing). </Callout>

Best Practices

Quality Samples: Use high-quality, noise-free audio (44.1 kHz or higher, minimal background noise)
Natural Speech: Include varied intonation, pacing, and emotional range in samples
Sufficient Length: Provide 1-3 minutes of audio for best clone quality
Test Thoroughly: Verify clone quality with diverse text before production use

Next Steps

<Cards>
  <Cards.Card title="AI Agents" href="/docs/agents">
    Add voice to your agents
  </Cards.Card>
  <Cards.Card title="Video Generation" href="/docs/video-generation">
    Add voiceovers to videos
  </Cards.Card>
  <Cards.Card title="API Reference" href="/docs/api">
    Complete API documentation
  </Cards.Card>
</Cards>