Voice Agents
Build real-time conversational AI agents with natural voice interaction.
Overview
Voice Agents enable you to build interactive, voice-driven AI experiences. Each agent maintains conversational context, supports tool calling, and streams audio in real time over WebSockets.
Quickstart
Create an agent and start a voice session in just a few lines of code.
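Below is a minimal Python sketch of that flow, using only the standard library. It rests on several assumptions that this page does not specify: a hypothetical base URL (`https://api.example.com`), bearer-token authentication, JSON request bodies, and POST for both creation endpoints. The `voice_123` voice ID is a placeholder. The HTTP call is passed in as a `transport` function, so the flow can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

# Hypothetical placeholder; substitute your actual API host.
BASE_URL = "https://api.example.com"

# A transport sends a prepared Request and returns the decoded JSON body.
Transport = Callable[[urllib.request.Request], dict]


def http_transport(req: urllib.request.Request) -> dict:
    """Send the request over the network and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def _post(path: str, body: dict, api_key: str, transport: Transport) -> dict:
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",  # assumption: creation endpoints accept POST
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: bearer-token auth
            "Content-Type": "application/json",
        },
    )
    return transport(req)


def quickstart(api_key: str, transport: Transport = http_transport) -> str:
    """Create an agent, open a session, and return the session's WebSocket URL."""
    agent = _post(
        "/v1/agents/create",
        {
            "name": "Support Bot",
            "voice_id": "voice_123",  # hypothetical voice ID
            "system_prompt": "You are a friendly support agent.",
        },
        api_key,
        transport,
    )
    agent_path = urllib.parse.quote(agent["id"], safe="")
    session = _post(
        f"/v1/agents/{agent_path}/sessions",
        {"initial_message": "Hi! How can I help you today?"},
        api_key,
        transport,
    )
    return session["websocket_url"]
```

Calling `quickstart("YOUR_API_KEY")` would send both requests and return the URL to stream audio over. The audio framing used on that socket is not covered by this sketch.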
Create Agent
/v1/agents/create
Create a new voice agent with a custom persona, voice, and optional tool definitions. Agents persist across sessions and can be reused.
| Parameter | Type | Description |
|---|---|---|
| name (required) | string | Display name for the agent |
| voice_id (required) | string | Voice to use for speech synthesis |
| system_prompt (required) | string | System instructions defining agent behavior |
| language | string | Language code (e.g. en, es, fr). Defaults to en |
| tools | object[] | Array of tool/function definitions the agent can call |
| max_turns | integer | Maximum conversation turns per session. Defaults to 50 |
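The parameters above can be assembled into a request body with a small client-side builder. The following is a sketch under stated assumptions: it rejects empty required fields early and omits unset optional fields so that the documented defaults (`en`, 50 turns) apply on the server. The positive-integer check on `max_turns` is an assumption, not a documented constraint.

```python
from typing import Any, Optional


def build_create_agent_payload(
    name: str,
    voice_id: str,
    system_prompt: str,
    language: Optional[str] = None,
    tools: Optional[list[dict[str, Any]]] = None,
    max_turns: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble the JSON body for /v1/agents/create.

    Required fields are checked client-side. Optional fields are left out
    when unset so the server's documented defaults apply.
    """
    required = {"name": name, "voice_id": voice_id, "system_prompt": system_prompt}
    missing = [key for key, value in required.items() if not value]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    payload: dict[str, Any] = dict(required)
    if language is not None:
        payload["language"] = language
    if tools is not None:
        payload["tools"] = tools
    if max_turns is not None:
        # Assumption: the API expects a positive integer.
        if isinstance(max_turns, bool) or not isinstance(max_turns, int) or max_turns < 1:
            raise ValueError("max_turns must be a positive integer")
        payload["max_turns"] = max_turns
    return payload
```

Omitting unset fields, rather than filling in the defaults yourself, keeps the client correct if the server-side defaults ever change.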
Create Session
/v1/agents/{agent_id}/sessions
Start a new conversational session with an agent. Returns a WebSocket URL for real-time audio streaming.
| Parameter | Type | Description |
|---|---|---|
| agent_id (required) | string | ID of the agent to start a session with |
| metadata | object | Custom key-value metadata to attach to the session |
| initial_message | string | Optional greeting message the agent speaks first |
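The sketch below shows how a session request might be prepared without sending it. It treats `agent_id` as the path parameter and places the optional fields in a JSON body. It assumes a hypothetical base URL, the POST method, and bearer-token authentication, none of which are specified above.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

BASE_URL = "https://api.example.com"  # hypothetical placeholder host


def build_session_request(
    agent_id: str,
    api_key: str,
    metadata: Optional[dict[str, Any]] = None,
    initial_message: Optional[str] = None,
) -> urllib.request.Request:
    """Prepare (but do not send) a request to /v1/agents/{agent_id}/sessions."""
    # agent_id is a path parameter; percent-encode it so it cannot alter the path.
    path = f"/v1/agents/{urllib.parse.quote(agent_id, safe='')}/sessions"
    body: dict[str, Any] = {}
    if metadata is not None:
        body["metadata"] = metadata
    if initial_message is not None:
        body["initial_message"] = initial_message
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",  # assumption: sessions are created with POST
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: bearer-token auth
            "Content-Type": "application/json",
        },
    )
```

Pass the result to `urllib.request.urlopen` to actually send it. Keeping construction separate from sending makes the request easy to inspect and test.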
List Agents
/v1/agents
Retrieve a paginated list of all agents in your account.
| Parameter | Type | Description |
|---|---|---|
| limit | integer | Number of agents to return. Defaults to 20, max 100 |
| offset | integer | Number of agents to skip for pagination |
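Walking every page with `limit` and `offset` might look like the sketch below. `fetch_page(limit, offset)` is a hypothetical callable that performs the request and returns that page's agents as a list. The stopping rule is also an assumption: a page shorter than `limit` is taken to mean no more agents remain.

```python
from typing import Any, Callable, Iterator

MAX_LIMIT = 100  # documented maximum page size


def iter_agents(
    fetch_page: Callable[[int, int], list[dict[str, Any]]],
    limit: int = 20,
) -> Iterator[dict[str, Any]]:
    """Yield every agent by walking /v1/agents with limit/offset pagination.

    fetch_page(limit, offset) is a hypothetical callable that performs the
    request and returns the list of agents on that page.
    """
    # Clamp the page size into the documented range of 1..100.
    limit = max(1, min(limit, MAX_LIMIT))
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            # A short (or empty) page means there is nothing left to fetch.
            break
        offset += len(page)
```

Offset pagination can skip or repeat items if agents are created or deleted while you iterate, so treat the result as a snapshot.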
Delete Agent
/v1/agents/{agent_id}
Permanently delete an agent and terminate all its active sessions. This action cannot be undone.
| Parameter | Type | Description |
|---|---|---|
| agent_id (required) | string | ID of the agent to delete |
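Deletion is irreversible, so a cautious client might check the ID before building the request. The sketch below rejects anything without the documented `agt_` prefix, such as a `ses_` session ID passed by mistake. The base URL, the DELETE method, and bearer-token authentication are all assumptions.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical placeholder host


def build_delete_agent_request(agent_id: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a request that permanently deletes an agent.

    Deletion also terminates the agent's active sessions and cannot be undone.
    """
    if not agent_id.startswith("agt_"):
        # Guard against passing a session ID (ses_...) or other value by mistake.
        raise ValueError(f"expected an agent ID prefixed with 'agt_', got {agent_id!r}")
    return urllib.request.Request(
        f"{BASE_URL}/v1/agents/{urllib.parse.quote(agent_id, safe='')}",
        method="DELETE",  # assumption: the endpoint uses the DELETE method
        headers={"Authorization": f"Bearer {api_key}"},  # assumption: bearer-token auth
    )
```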
Response Objects
Reference for the objects returned by Voice Agent endpoints.
Agent Object
| Field | Type | Description |
|---|---|---|
| id | string | Unique agent identifier (prefixed with agt_) |
| name | string | Display name of the agent |
| voice_id | string | Voice used for synthesis |
| system_prompt | string | System instructions for agent behavior |
| created_at | string | ISO 8601 creation timestamp |
| status | string | Agent status: active, paused, or deleted |
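One way to turn the fields above into a typed value is a frozen dataclass, as in the sketch below. It reads only the documented fields and ignores any extra keys. It also parses `created_at` as ISO 8601 and accepts a trailing `Z`, which `datetime.fromisoformat` supports only from Python 3.11. Rejecting unknown `status` values is a deliberately strict choice that you may want to relax.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any

AGENT_STATUSES = {"active", "paused", "deleted"}


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat only accepts 'Z' from Python 3.11 onward.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


@dataclass(frozen=True)
class Agent:
    id: str
    name: str
    voice_id: str
    system_prompt: str
    created_at: datetime
    status: str

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Agent":
        """Build an Agent from a decoded JSON object, validating id and status."""
        if not data["id"].startswith("agt_"):
            raise ValueError(f"unexpected agent ID: {data['id']!r}")
        if data["status"] not in AGENT_STATUSES:
            raise ValueError(f"unknown agent status: {data['status']!r}")
        return cls(
            id=data["id"],
            name=data["name"],
            voice_id=data["voice_id"],
            system_prompt=data["system_prompt"],
            created_at=parse_timestamp(data["created_at"]),
            status=data["status"],
        )
```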
Session Object
| Field | Type | Description |
|---|---|---|
| session_id | string | Unique session identifier (prefixed with ses_) |
| agent_id | string | ID of the agent this session belongs to |
| websocket_url | string | WebSocket URL for real-time audio streaming |
| created_at | string | ISO 8601 creation timestamp |
| expires_at | string | ISO 8601 session expiry timestamp |
| status | string | Session status: active, ended, or expired |
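Because sessions carry both a `status` and an `expires_at`, a client can check whether a session is still worth connecting to. The sketch below does that locally. It assumes that timestamps without an offset are UTC, and it lets the caller pass `now` so the check is deterministic in tests.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


def _parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp into a timezone-aware datetime."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption: timestamps without an offset are UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


@dataclass(frozen=True)
class Session:
    session_id: str
    agent_id: str
    websocket_url: str
    created_at: datetime
    expires_at: datetime
    status: str

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Session":
        return cls(
            session_id=data["session_id"],
            agent_id=data["agent_id"],
            websocket_url=data["websocket_url"],
            created_at=_parse_ts(data["created_at"]),
            expires_at=_parse_ts(data["expires_at"]),
            status=data["status"],
        )

    def is_usable(self, now: Optional[datetime] = None) -> bool:
        """True if the session is active and expires_at has not yet passed."""
        now = now or datetime.now(timezone.utc)
        return self.status == "active" and now < self.expires_at

    def seconds_remaining(self, now: Optional[datetime] = None) -> float:
        """Seconds until expires_at, floored at zero."""
        now = now or datetime.now(timezone.utc)
        return max(0.0, (self.expires_at - now).total_seconds())
```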
Best Practices
Keep system prompts focused
Write clear, concise system prompts that define a single persona. Avoid overloading agents with too many responsibilities; create separate agents for distinct use cases instead.
Handle WebSocket reconnections
Network interruptions are inevitable in real-time audio streaming. Implement automatic reconnection with exponential backoff, and use the metadata you attached when creating the session to restore conversation context.
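The following sketch shows exponential backoff with "full jitter", where each wait is drawn uniformly from zero up to a capped exponential bound. It also shows a retry loop around a hypothetical `connect` callable, for example one that opens the session's `websocket_url` with a WebSocket library of your choice. The base delay, cap, and attempt count are illustrative values, not recommendations from this API.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(
    base: float = 0.5,
    cap: float = 30.0,
    max_attempts: int = 6,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Full-jitter backoff: delay n is uniform in [0, min(cap, base * 2**n)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(max_attempts)]


def connect_with_retry(
    connect: Callable[[], T],
    delays: list[float],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(); on ConnectionError, wait the next delay and try again."""
    for delay in delays:
        try:
            return connect()
        except ConnectionError:
            sleep(delay)
    # Final attempt after the last delay; any error propagates to the caller.
    return connect()
```

Jitter spreads out reconnection attempts from many clients that dropped at the same moment. Without it, those clients would retry in lockstep.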
Set appropriate max_turns limits
Configure max_turns to prevent runaway sessions and control costs. For simple Q&A agents, 10-20 turns is typically sufficient. For complex workflows, consider 30-50.
Use tool calling for dynamic data
Instead of hardcoding information in prompts, define tools that fetch live data such as order status, account details, or inventory. This keeps responses accurate and up to date.
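The exact schema of the `tools` array and the way tool calls reach your code are not documented on this page. The sketch below therefore assumes a JSON-Schema-style definition (`name`, `description`, `parameters`) and a client-side dispatcher that receives a tool name plus JSON-encoded arguments. `get_order_status` and its in-memory data are hypothetical stand-ins for a live lookup.

```python
import json
from typing import Any, Callable

# Hypothetical tool definition, assuming a JSON-Schema-style shape
# (name / description / parameters); check the actual `tools` schema.
GET_ORDER_STATUS_TOOL = {
    "name": "get_order_status",
    "description": "Look up the current status of a customer order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}


def get_order_status(order_id: str) -> dict[str, Any]:
    # Stand-in for a real lookup against your order system.
    fake_db = {"A100": "shipped", "A101": "processing"}
    return {"order_id": order_id, "status": fake_db.get(order_id, "not_found")}


HANDLERS: dict[str, Callable[..., dict[str, Any]]] = {
    "get_order_status": get_order_status,
}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the named handler with JSON-encoded arguments and return a JSON result."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        result = handler(**json.loads(arguments_json))
    except (TypeError, ValueError) as exc:
        # Malformed JSON or wrong arguments: report the error instead of crashing.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)
```

Returning errors as data, rather than raising, gives the agent a chance to recover mid-conversation, for example by asking the caller for a valid order number.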