Voice Cloning
Clone a voice from an audio sample and use it for TTS across any supported provider.
Clone a Voice
Upload an audio sample (up to 25 MB) to create a custom voice:
bash
curl -X POST https://persona-labsvoice-api-production.up.railway.app/v1/voices/clone \
-H "Authorization: Bearer $PH0NY_API_KEY" \
-F "file=@voice-sample.wav" \
-F "name=My Custom Voice" \
-F "description=Cloned from interview recording"
Response:
json
{
"id": "voice_abc123",
"name": "My Custom Voice",
"description": "Cloned from interview recording",
"isCustom": true,
"createdAt": "2026-03-21T00:00:00.000Z"
}
TIP
For best results, use a clear audio sample of 30 seconds to 5 minutes with minimal background noise. WAV or MP3 formats work best.
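The size limit and format advice above can be checked locally before uploading. The following is a minimal sketch, not part of the API: the function name `check_sample` and its status words are hypothetical, and it assumes "25MB" means 25 × 1024 × 1024 bytes. The server's own validation remains authoritative.

```bash
# Hypothetical pre-upload check; not part of the API.
# Assumption: the 25 MB limit is 25 * 1024 * 1024 bytes.
MAX_BYTES=$((25 * 1024 * 1024))

check_sample() {
  file="$1"
  if [ ! -f "$file" ]; then
    echo "missing"
    return 1
  fi
  # wc -c pads its output on some systems, so strip spaces.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "too-large"
    return 1
  fi
  # WAV and MP3 are the formats the tip recommends; others are only flagged.
  case "${file##*.}" in
    wav|WAV|mp3|MP3) echo "ok" ;;
    *) echo "unrecommended-format" ;;
  esac
}

# Usage: run the clone request only if the check passes, e.g.
#   check_sample voice-sample.wav && curl -X POST .../v1/voices/clone ...
```

The check covers only file size and extension. Verifying the 30 second to 5 minute duration would need an audio tool and is left out of this sketch.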
Use a Cloned Voice
Reference the voice ID when creating or updating an agent:
bash
curl -X POST https://persona-labsvoice-api-production.up.railway.app/v1/agents \
-H "Authorization: Bearer $PH0NY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Custom Voice Agent",
"systemPrompt": "You are a helpful assistant.",
"voiceId": "voice_abc123",
"ttsProvider": "cartesia"
}'
Or use it directly with the synthesize endpoint:
bash
curl -X POST https://persona-labsvoice-api-production.up.railway.app/v1/synthesize \
-H "Authorization: Bearer $PH0NY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"text": "Hello, this is my cloned voice speaking.",
"voiceId": "voice_abc123",
"format": "mp3"
}'
List Voices
bash
curl https://persona-labsvoice-api-production.up.railway.app/v1/voices \
-H "Authorization: Bearer $PH0NY_API_KEY"
Returns both built-in and custom cloned voices.
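Because the list mixes built-in and cloned voices, it can be useful to filter for the custom ones. The sketch below assumes `jq` is installed and that the endpoint returns a JSON array of voice objects shaped like the clone response (with `id` and `isCustom` fields). This guide does not show the list response, so that shape is an assumption, and `custom_voice_ids` is a hypothetical helper name.

```bash
# Assumption: GET /v1/voices returns a JSON array of objects that carry
# "id" and "isCustom", as in the clone response. Adjust the jq filter if
# the real response wraps the array in an envelope.
custom_voice_ids() {
  # Reads JSON on stdin and prints one custom voice ID per line.
  jq -r '.[] | select(.isCustom == true) | .id'
}

# Usage:
#   curl -s https://persona-labsvoice-api-production.up.railway.app/v1/voices \
#     -H "Authorization: Bearer $PH0NY_API_KEY" | custom_voice_ids
```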
Delete a Voice
bash
curl -X DELETE https://persona-labsvoice-api-production.up.railway.app/v1/voices/voice_abc123 \
-H "Authorization: Bearer $PH0NY_API_KEY"
Supported Providers
Cloned voices can be used with the following TTS providers:
| Provider | Quality | Latency | Notes |
|---|---|---|---|
| Cartesia | High | ~40ms TTFB | Recommended for production |
| ElevenLabs | High | ~200ms TTFB | Wide language support |
| Fish Audio | Good | ~100ms TTFB | Cost-effective |
| Resemble AI | High | ~150ms TTFB | Enterprise features |
| Pocket TTS | Basic | ~13-30s TTFB | Free tier fallback |
Voice IDs are stored per provider. Depending on your configuration, cloning a voice may register it with more than one provider.
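The latency column in the table can drive provider selection when an application has a time-to-first-byte budget. The following is an illustrative sketch only: `providers_within` is a hypothetical helper, the figures are copied from the table as approximate values, and Pocket TTS is entered at the lower bound of its listed range (13 s).

```bash
# Approximate TTFB values from the table above, in milliseconds.
# Pocket TTS is listed as ~13-30s; its lower bound (13000 ms) is used here.
providers_within() {
  budget_ms="$1"
  # Each data line is "name|ttfb"; print names whose TTFB fits the budget.
  while IFS='|' read -r name ttfb; do
    if [ "$ttfb" -le "$budget_ms" ]; then
      echo "$name"
    fi
  done <<EOF
Cartesia|40
Fish Audio|100
Resemble AI|150
ElevenLabs|200
Pocket TTS|13000
EOF
}

# Usage: list providers with a TTFB of about 150 ms or less.
#   providers_within 150
```

With a budget of 150, the function prints Cartesia, Fish Audio, and Resemble AI, one per line. The names it prints are the table's display names, which may differ from the values the `ttsProvider` field accepts.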