Sessions API

Sessions connect agents to real-time voice channels (phone calls via Twilio, or direct WebSocket connections).

Create Twilio Session

Generate a signed relay context and TwiML for connecting a Twilio call to an agent.

POST /v1/sessions/twilio

Request Body

json
{
  "agentId": "agent_abc123",
  "from": "+18005551234",
  "to": "+19005551234",
  "instanceId": "inst_xyz"
}

Field       Type    Required  Description
agentId     string  yes       Agent to connect
from        string  yes       Caller phone number
to          string  yes       Destination phone number
instanceId  string  no        Agent instance for personalized context
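
As a rough sketch, the request body above could be assembled in Python as follows. The host `https://api.example.com` is a placeholder and `build_twilio_session_request` is a hypothetical helper name. This page does not document authentication headers for the endpoint, so none are set here.

```python
import json
import urllib.request
from typing import Optional

# Assumption: placeholder host; substitute your deployment's base URL.
BASE_URL = "https://api.example.com"


def build_twilio_session_request(
    agent_id: str,
    from_number: str,
    to_number: str,
    instance_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/sessions/twilio request."""
    body = {"agentId": agent_id, "from": from_number, "to": to_number}
    # instanceId is optional, so it is only included when provided.
    if instance_id is not None:
        body["instanceId"] = instance_id
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/sessions/twilio",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_twilio_session_request("agent_abc123", "+18005551234", "+19005551234")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The exact shape of the 200 response is not specified on this page, so parsing it is left out.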

Response 200

Returns TwiML and session metadata for Twilio to connect the caller to the agent's WebSocket stream.

Twilio Media Stream

WS /v1/sessions/twilio/stream

WebSocket endpoint for Twilio media streams. Twilio connects here after receiving TwiML from the session creation endpoint.

The WebSocket handles:

  • Bidirectional audio streaming (mulaw 8kHz)
  • Real-time STT transcription
  • LLM inference
  • TTS synthesis and playback
  • Tool execution
  • Barge-in / interruption detection
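
To make the audio format concrete, here is a minimal pure-Python sketch of decoding G.711 mu-law bytes into 16-bit PCM samples, plus the frame-duration arithmetic for 8 kHz audio. The function names are illustrative and are not part of this API. A production pipeline would more likely use an optimized codec library.

```python
from typing import List


def mulaw_byte_to_pcm16(u: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit PCM sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored bit-inverted
    sign = u & 0x80                    # top bit: set means negative
    exponent = (u >> 4) & 0x07         # 3-bit segment number
    mantissa = u & 0x0F                # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_mulaw(frame: bytes) -> List[int]:
    """Decode a whole frame; mu-law uses one byte per sample."""
    return [mulaw_byte_to_pcm16(b) for b in frame]


def frame_duration_ms(frame: bytes, sample_rate_hz: int = 8000) -> float:
    """Duration of a mu-law frame, given one byte per sample."""
    return len(frame) * 1000 / sample_rate_hz


silence = bytes([0xFF]) * 160          # 0xFF decodes to a zero sample
print(decode_mulaw(silence)[:4], frame_duration_ms(silence))
```

At 8 kHz with one byte per sample, 160 bytes of payload correspond to 20 ms of audio.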

Authentication

Because Twilio strips query parameters from <Stream> WebSocket URLs, authentication tokens are passed as <Parameter> elements in the TwiML and read from customParameters in the WebSocket start message.
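
The round trip described above might look like the following sketch. It builds TwiML with a <Parameter> child on the <Stream> element, then reads the value back from a start message. The parameter name `token` and both function names are assumptions for illustration. The actual parameter names used by this API are not documented here.

```python
import json
import xml.etree.ElementTree as ET
from typing import Optional


def build_stream_twiml(ws_url: str, token: str) -> str:
    """Return TwiML that opens a media stream and carries a token."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    stream = ET.SubElement(connect, "Stream", url=ws_url)
    # Assumption: "token" is a hypothetical parameter name.
    ET.SubElement(stream, "Parameter", name="token", value=token)
    return ET.tostring(response, encoding="unicode")


def token_from_start_message(raw: str) -> Optional[str]:
    """Extract the token from a Twilio start message, if present."""
    message = json.loads(raw)
    if message.get("event") != "start":
        return None
    params = message.get("start", {}).get("customParameters", {})
    return params.get("token")


twiml = build_stream_twiml(
    "wss://api.example.com/v1/sessions/twilio/stream",  # placeholder host
    "signed-token",
)
start_message = json.dumps({
    "event": "start",
    "streamSid": "MZ0000",
    "start": {"streamSid": "MZ0000", "customParameters": {"token": "signed-token"}},
})
print(twiml)
print(token_from_start_message(start_message))
```

The token would then be verified server-side before any audio is processed. That verification step depends on how the relay context is signed and is not shown.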

Message Flow

  1. Twilio sends a start message with session metadata
  2. Twilio streams media messages (base64-encoded mulaw audio)
  3. The server sends media messages back (synthesized speech)
  4. The server may send mark messages for synchronization
  5. The session ends with a stop message
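
The flow above can be sketched as a small message handler. It follows the JSON message shapes of Twilio Media Streams (event, streamSid, media.payload, mark.name). `StreamState` and the function names are hypothetical and do not reflect the server's actual implementation.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StreamState:
    stream_sid: Optional[str] = None
    audio: bytearray = field(default_factory=bytearray)  # raw mulaw bytes
    marks: List[str] = field(default_factory=list)       # marks echoed by Twilio
    stopped: bool = False


def handle_twilio_message(state: StreamState, raw: str) -> None:
    """Apply one inbound Twilio message to the session state."""
    msg = json.loads(raw)
    event = msg.get("event")
    if event == "start":
        state.stream_sid = msg["start"]["streamSid"]
    elif event == "media":
        # Payload is base64-encoded mulaw audio.
        state.audio.extend(base64.b64decode(msg["media"]["payload"]))
    elif event == "mark":
        state.marks.append(msg["mark"]["name"])
    elif event == "stop":
        state.stopped = True


def outbound_media(stream_sid: str, mulaw_audio: bytes) -> str:
    """Build a media message carrying synthesized speech to Twilio."""
    payload = base64.b64encode(mulaw_audio).decode("ascii")
    return json.dumps(
        {"event": "media", "streamSid": stream_sid, "media": {"payload": payload}}
    )


def outbound_mark(stream_sid: str, name: str) -> str:
    """Build a mark message; Twilio echoes it once prior audio has played."""
    return json.dumps({"event": "mark", "streamSid": stream_sid, "mark": {"name": name}})
```

A mark sent after a block of synthesized audio lets the server learn when playback of that block has finished, which is one way to support synchronization and barge-in handling.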

Agent WebSocket Session

Direct WebSocket connection to an agent, bypassing Twilio.

WS /v1/agents/:id/session

Connect directly from a browser or custom client. Useful for web-based voice interfaces without phone integration.
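
As a small illustration, the connection URL for a given agent could be built like this. The host `wss://api.example.com` is a placeholder and `agent_session_url` is a hypothetical helper. The message protocol spoken over this socket is not described on this page, so only URL construction is shown.

```python
from urllib.parse import quote

# Assumption: placeholder host; substitute your deployment's WebSocket base URL.
WS_BASE_URL = "wss://api.example.com"


def agent_session_url(agent_id: str, base_url: str = WS_BASE_URL) -> str:
    """Fill the :id path segment of WS /v1/agents/:id/session."""
    # Percent-encode the ID so characters like "/" cannot alter the path.
    return f"{base_url.rstrip('/')}/v1/agents/{quote(agent_id, safe='')}/session"


print(agent_session_url("agent_abc123"))
```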

Task Callbacks

POST /v1/sessions/twilio/task/:taskId/status
POST /v1/sessions/twilio/task/:taskId/connect

Internal callback endpoints used by the task dispatcher for outbound calls. These are called by Twilio, not by API consumers.

Architecture

Phone Call Flow:
  Caller -> Twilio -> POST /v1/sessions/twilio (TwiML)
                   -> WS /v1/sessions/twilio/stream
                   -> TwilioRuntimeSession
                   -> STT -> Tools -> LLM -> TTS -> Caller

Direct WebSocket Flow:
  Browser -> WS /v1/agents/:id/session
          -> AgentRuntime
          -> STT -> LLM -> TTS -> Browser

Latency

  • Target first response: < 1.5 seconds
  • Steady-state turn latency: < 900 ms
  • Cartesia TTS TTFB: ~40 ms
  • Full pipeline includes STT + LLM + TTS
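
One way to reason about these targets is to sum per-stage timings and compare the total against the budget. The sketch below does that arithmetic. The STT and LLM figures in the example call are made-up inputs for illustration, not measurements. Only the 40 ms TTS figure and the two targets come from the list above.

```python
FIRST_RESPONSE_TARGET_MS = 1500   # target first response: < 1.5 seconds
TURN_TARGET_MS = 900              # steady-state turn latency: < 900 ms


def pipeline_latency_ms(stt_ms: float, llm_ms: float, tts_ms: float) -> float:
    """Total latency for one turn through STT, LLM, and TTS."""
    return stt_ms + llm_ms + tts_ms


def within_target(total_ms: float, target_ms: float) -> bool:
    """Targets are strict upper bounds, so equality does not pass."""
    return total_ms < target_ms


# Hypothetical stage timings: 200 ms STT, 500 ms LLM, 40 ms TTS.
total = pipeline_latency_ms(stt_ms=200, llm_ms=500, tts_ms=40)
print(total, within_target(total, TURN_TARGET_MS))
```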

Built by Persona Labs.