Sessions API
Sessions connect agents to real-time voice channels (phone calls via Twilio, or direct WebSocket connections).
Create Twilio Session
Generate a signed relay context and TwiML for connecting a Twilio call to an agent.
POST /v1/sessions/twilio
Request Body
{
  "agentId": "agent_abc123",
  "from": "+18005551234",
  "to": "+19005551234",
  "instanceId": "inst_xyz"
}

| Field | Type | Required | Description |
|---|---|---|---|
| agentId | string | yes | Agent to connect |
| from | string | yes | Caller phone number |
| to | string | yes | Destination phone number |
| instanceId | string | no | Agent instance for personalized context |
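
As a minimal sketch, the request could be assembled from Python with only the standard library. The base URL and the bearer-token header are assumptions (this document does not state the host or the authentication scheme), and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical base URL; substitute your deployment's host.
BASE_URL = "https://api.example.com"


def build_twilio_session_body(agent_id: str, from_number: str, to_number: str,
                              instance_id: Optional[str] = None) -> dict:
    """Build the JSON body for POST /v1/sessions/twilio."""
    body = {"agentId": agent_id, "from": from_number, "to": to_number}
    # instanceId is optional, so include it only when provided.
    if instance_id is not None:
        body["instanceId"] = instance_id
    return body


def make_request(body: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the HTTP request."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/sessions/twilio",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; not confirmed by this document.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it; the response body is expected to carry the TwiML and session metadata described under Response 200.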
Response 200
Returns TwiML and session metadata for Twilio to connect the caller to the agent's WebSocket stream.
Twilio Media Stream
WS /v1/sessions/twilio/stream
WebSocket endpoint for Twilio media streams. Twilio connects here after receiving TwiML from the session creation endpoint.
The WebSocket handles:
- Bidirectional audio streaming (mulaw 8kHz)
- Real-time STT transcription
- LLM inference
- TTS synthesis and playback
- Tool execution
- Barge-in / interruption detection
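
Because the stream carries 8 kHz mu-law audio, each byte is one sample, so payload size maps directly to duration. The sketch below shows standard G.711 mu-law expansion to 16-bit PCM plus a duration helper. The function names are illustrative, and the server's actual decoder is not described in this document.

```python
from typing import List

SAMPLE_RATE_HZ = 8000  # mulaw 8kHz: one byte per sample
MULAW_BIAS = 0x84      # bias constant from the G.711 mu-law definition


def mulaw_to_pcm16(byte: int) -> int:
    """Expand one mu-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS
    return -magnitude if sign else magnitude


def decode_mulaw(data: bytes) -> List[int]:
    """Decode a mu-law byte string into a list of PCM samples."""
    return [mulaw_to_pcm16(b) for b in data]


def duration_ms(data: bytes) -> float:
    """Playback duration of a mu-law chunk at 8 kHz."""
    return len(data) * 1000 / SAMPLE_RATE_HZ
```

For example, a 160-byte chunk corresponds to 20 ms of audio.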
Authentication
Twilio strips query parameters from <Stream> WebSocket URLs, so authentication tokens cannot travel in the URL. Instead, they are passed as <Parameter> elements in the TwiML and read from customParameters in the WebSocket start message.
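
The round trip can be sketched as follows: one helper emits TwiML with <Parameter> children, and another reads them back from the start message. This assumes the TwiML wraps <Stream> in <Connect> (Twilio's verb for bidirectional streams) and uses a hypothetical parameter name, `token`; the exact TwiML this API returns is not shown in this document.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict


def build_twiml(stream_url: str, params: Dict[str, str]) -> str:
    """Build TwiML that carries custom parameters inside <Stream>."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    stream = ET.SubElement(connect, "Stream", {"url": stream_url})
    for name, value in params.items():
        # Each <Parameter> surfaces in start.customParameters on the WebSocket.
        ET.SubElement(stream, "Parameter", {"name": name, "value": value})
    return ET.tostring(response, encoding="unicode")


def read_custom_parameters(raw_message: str) -> Dict[str, str]:
    """Extract customParameters from a Twilio 'start' message."""
    message = json.loads(raw_message)
    if message.get("event") != "start":
        raise ValueError("expected a start message")
    return message["start"].get("customParameters", {})
```

The server would then validate the token value (for example, by checking its signature) before accepting any media.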
Message Flow
- Twilio sends a start message with session metadata
- Twilio streams media messages (base64-encoded mulaw audio)
- Server sends media messages back (synthesized speech)
- Server may send mark messages for synchronization
- Session ends with a stop message
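
The flow above can be sketched as a small event dispatcher. The JSON shapes follow Twilio's Media Streams message format; `StreamState` and the function names are hypothetical, and the real TwilioRuntimeSession presumably does far more (STT, LLM, TTS, barge-in).

```python
import base64
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StreamState:
    stream_sid: Optional[str] = None
    received_audio: bytearray = field(default_factory=bytearray)
    played_marks: List[str] = field(default_factory=list)
    stopped: bool = False


def handle_twilio_message(state: StreamState, raw: str) -> None:
    """Update session state from one inbound Twilio message."""
    msg = json.loads(raw)
    event = msg.get("event")
    if event == "start":
        state.stream_sid = msg["start"]["streamSid"]
    elif event == "media":
        # Payload is base64-encoded mulaw audio from the caller.
        state.received_audio.extend(base64.b64decode(msg["media"]["payload"]))
    elif event == "mark":
        # Twilio echoes a mark once the audio queued before it has played.
        state.played_marks.append(msg["mark"]["name"])
    elif event == "stop":
        state.stopped = True
    # Any other event type is ignored in this sketch.


def outbound_media(stream_sid: str, audio: bytes) -> str:
    """Serialize synthesized mulaw audio for playback to the caller."""
    payload = base64.b64encode(audio).decode("ascii")
    return json.dumps({"event": "media", "streamSid": stream_sid,
                       "media": {"payload": payload}})


def outbound_mark(stream_sid: str, name: str) -> str:
    """Serialize a mark used to learn when queued audio finished playing."""
    return json.dumps({"event": "mark", "streamSid": stream_sid,
                       "mark": {"name": name}})
```

Sending a mark right after each TTS utterance is one way to know when playback completed, which helps when deciding whether a caller interrupted mid-sentence.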
Agent WebSocket Session
Direct WebSocket connection to an agent, bypassing Twilio.
WS /v1/agents/:id/session
Connect directly from a browser or custom client. Useful for web-based voice interfaces without phone integration.
Task Callbacks
POST /v1/sessions/twilio/task/:taskId/status
POST /v1/sessions/twilio/task/:taskId/connect
Internal callback endpoints used by the task dispatcher for outbound calls. These are called by Twilio, not by API consumers.
Architecture
Phone Call Flow:
Caller -> Twilio -> POST /v1/sessions/twilio (TwiML)
-> WS /v1/sessions/twilio/stream
-> TwilioRuntimeSession
-> STT -> Tools -> LLM -> TTS -> Caller
Direct WebSocket Flow:
Browser -> WS /v1/agents/:id/session
-> AgentRuntime
-> STT -> LLM -> TTS -> Browser
Latency
- Target first response: < 1.5 seconds
- Steady-state turn latency: < 900ms
- Cartesia TTS TTFB: ~40ms
- Full pipeline includes STT + LLM + TTS
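
As a rough illustration of how these targets constrain each stage, the sketch below treats the pipeline stages as strictly sequential, so their latencies add, and ignores network overhead. Both are simplifying assumptions, and the per-stage numbers in the usage note are made up for the example.

```python
FIRST_RESPONSE_TARGET_MS = 1500  # target first response: < 1.5 seconds
STEADY_STATE_TARGET_MS = 900     # steady-state turn latency: < 900ms
TTS_TTFB_MS = 40                 # approximate Cartesia TTS time to first byte


def remaining_budget_ms(target_ms: int, tts_ttfb_ms: int = TTS_TTFB_MS) -> int:
    """Budget left for STT + LLM once TTS time-to-first-byte is spent."""
    return target_ms - tts_ttfb_ms


def within_target(stt_ms: int, llm_ms: int, tts_ms: int, target_ms: int) -> bool:
    """True if the summed stage latencies stay under the target."""
    return stt_ms + llm_ms + tts_ms < target_ms
```

Under these assumptions, a steady-state turn leaves about 860 ms for STT and LLM combined; a turn with 300 ms of STT and 500 ms of LLM time would fit, while 400 ms plus 500 ms would not.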