Voice Cloning

Clone a voice from an audio sample and use it for TTS across any supported provider.

Clone a Voice

Upload an audio sample (up to 25MB) to create a custom voice:

```bash
curl -X POST https://persona-labsvoice-api-production.up.railway.app/v1/voices/clone \
  -H "Authorization: Bearer $PH0NY_API_KEY" \
  -F "file=@voice-sample.wav" \
  -F "name=My Custom Voice" \
  -F "description=Cloned from interview recording"
```

Response:

```json
{
  "id": "voice_abc123",
  "name": "My Custom Voice",
  "description": "Cloned from interview recording",
  "isCustom": true,
  "createdAt": "2026-03-21T00:00:00.000Z"
}
```
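
The response fields can be read into a typed object so that the voice ID is easy to pass on to later calls. This is a minimal sketch that assumes the response always has the shape shown above; `ClonedVoice` and `parse_clone_response` are illustrative names, not part of the API.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ClonedVoice:
    id: str
    name: str
    description: str
    is_custom: bool
    created_at: datetime


def parse_clone_response(body: str) -> ClonedVoice:
    """Parse the JSON body returned by POST /v1/voices/clone."""
    data = json.loads(body)
    return ClonedVoice(
        id=data["id"],
        name=data["name"],
        # Assumption: description may be absent if it was not supplied.
        description=data.get("description", ""),
        is_custom=data["isCustom"],
        # Older Python versions reject a trailing "Z", so normalise it to an offset.
        created_at=datetime.fromisoformat(data["createdAt"].replace("Z", "+00:00")),
    )
```

The `id` attribute of the result is the value to send as `voiceId` in the requests that follow.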

TIP

For best results, use a clear audio sample of 30 seconds to 5 minutes with minimal background noise. WAV or MP3 formats work best.
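
The size limit and the recommended duration can be checked locally before uploading. The sketch below assumes an uncompressed PCM WAV sample (Python's `wave` module cannot read MP3) and assumes that "25MB" means 25 MiB; `check_wav_sample` and the constants are illustrative names, and the API may enforce its limits differently.

```python
import os
import wave

# Assumption: "25MB" is read as 25 MiB; use 25_000_000 if the API counts decimal megabytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
MIN_SECONDS = 30.0   # lower bound recommended in the tip
MAX_SECONDS = 300.0  # upper bound recommended in the tip (5 minutes)


def check_wav_sample(path):
    """Return a list of problems; an empty list means the sample looks fine."""
    problems = []

    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size} bytes, over the upload limit")

    # wave.open raises wave.Error for files that are not PCM WAV.
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()

    if duration < MIN_SECONDS:
        problems.append(f"sample is {duration:.1f}s, shorter than 30s")
    elif duration > MAX_SECONDS:
        problems.append(f"sample is {duration:.1f}s, longer than 5 minutes")

    return problems
```

Note that the two limits interact: a 5-minute mono WAV at 44.1 kHz and 16 bits is about 26.5 million bytes, so long samples may need a lower sample rate or MP3 encoding to stay under the upload limit.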

Use a Cloned Voice

Reference the voice ID when creating or updating an agent:

```bash
curl -X POST https://persona-labsvoice-api-production.up.railway.app/v1/agents \
  -H "Authorization: Bearer $PH0NY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Custom Voice Agent",
    "systemPrompt": "You are a helpful assistant.",
    "voiceId": "voice_abc123",
    "ttsProvider": "cartesia"
  }'
```

Or use it directly with the synthesize endpoint:

```bash
curl -X POST https://persona-labsvoice-api-production.up.railway.app/v1/synthesize \
  -H "Authorization: Bearer $PH0NY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is my cloned voice speaking.",
    "voiceId": "voice_abc123",
    "format": "mp3"
  }'
```
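
For callers who prefer a script to curl, the same synthesize request can be assembled with Python's standard library. This is a sketch that only builds the request object; `build_synthesize_request` is an illustrative helper name, and the URL, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

BASE_URL = "https://persona-labsvoice-api-production.up.railway.app"


def build_synthesize_request(api_key, text, voice_id, audio_format="mp3"):
    """Build (but do not send) a POST /v1/synthesize request."""
    payload = {"text": text, "voiceId": voice_id, "format": audio_format}
    return urllib.request.Request(
        f"{BASE_URL}/v1/synthesize",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(req)`. This page does not say whether the response body is raw audio or a JSON envelope, so it is worth inspecting the response's `Content-Type` header before writing the bytes to a file.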

List Voices

```bash
curl https://persona-labsvoice-api-production.up.railway.app/v1/voices \
  -H "Authorization: Bearer $PH0NY_API_KEY"
```

Returns both built-in and custom cloned voices.
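
Because built-in and cloned voices come back together, a client will often want to separate them. The sketch below assumes the list response is a JSON array of voice objects carrying the same `isCustom` flag as the clone response; the response shape is not documented on this page, and `custom_voices` and the built-in voice shown are hypothetical.

```python
def custom_voices(voices):
    """Keep only the voices whose isCustom flag is true."""
    return [v for v in voices if v.get("isCustom") is True]


# Hypothetical decoded response, for illustration only.
sample = [
    {"id": "voice_builtin_1", "name": "Example Built-in", "isCustom": False},
    {"id": "voice_abc123", "name": "My Custom Voice", "isCustom": True},
]

custom_ids = [v["id"] for v in custom_voices(sample)]
```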

Delete a Voice

```bash
curl -X DELETE https://persona-labsvoice-api-production.up.railway.app/v1/voices/voice_abc123 \
  -H "Authorization: Bearer $PH0NY_API_KEY"
```

Supported Providers

Cloned voices can be used with the following TTS providers:

| Provider | Quality | Latency (TTFB) | Notes |
|---|---|---|---|
| Cartesia | High | ~40ms | Recommended for production |
| ElevenLabs | High | ~200ms | Wide language support |
| Fish Audio | Good | ~100ms | Cost-effective |
| Resemble AI | High | ~150ms | Enterprise features |
| Pocket TTS | Basic | ~13-30s | Free tier fallback |

Voice IDs are stored per provider. When you clone a voice, it may be registered with several providers, depending on your configuration.
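
The latency figures above can drive a simple provider choice. The sketch below is illustrative only: it copies the approximate TTFB values from the table, uses the lower bound for Pocket TTS, and keys on display names, since only the `cartesia` identifier appears on this page; `providers_within` is a hypothetical helper.

```python
# Approximate time to first byte in milliseconds, copied from the provider table.
# Pocket TTS is listed as ~13-30s; the lower bound is used here.
TTFB_MS = {
    "Cartesia": 40,
    "ElevenLabs": 200,
    "Fish Audio": 100,
    "Resemble AI": 150,
    "Pocket TTS": 13_000,
}


def providers_within(budget_ms):
    """Return the providers whose listed TTFB fits the budget, fastest first."""
    fits = [(ms, name) for name, ms in TTFB_MS.items() if ms <= budget_ms]
    return [name for ms, name in sorted(fits)]
```

For example, `providers_within(150)` yields Cartesia, Fish Audio, and Resemble AI, in that order. Real latency depends on region and load, so these numbers are a starting point, not a guarantee.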

Built by Persona Labs.