Conversational AI

Build real-time voice conversations with AI — speak naturally and get instant spoken responses.

Overview

The Conversational AI API enables real-time, spoken dialogue between users and AI. Create a conversation session, open a WebSocket audio stream, and begin talking. The AI listens, understands context, and responds in natural speech with minimal latency. Conversations maintain full context across turns so the AI remembers what was said earlier. Use it to power voice assistants, customer service agents, interactive tutors, companion apps, and any experience where users need to speak with AI as naturally as they would with another person.

Key Capabilities

  • Real-time dialogue
  • Context memory
  • Interruption handling
  • Multilingual support
  • Tool / function calling
  • Custom personas

Quickstart

Create a conversation session and start talking to AI in just a few lines of code.

# 1. Create a conversation session
curl -X POST https://api.nur.ai/v1/conversations/create \
  -H "Authorization: Bearer nur_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_id": "nadia_v2",
    "system_prompt": "You are a friendly travel assistant.",
    "language": "en"
  }'

# Response: { "session_id": "conv_abc123", ... }

# 2. Connect via WebSocket to stream audio
# wss://api.nur.ai/v1/conversations/conv_abc123/stream
# Send microphone audio frames, receive AI speech frames in real time.
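
The same two steps can be sketched in Python with only the standard library. This is a minimal sketch rather than an official client: the helper names `build_create_request` and `stream_url` are hypothetical, the URL pattern is copied from the curl example above, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.nur.ai/v1"  # base URL taken from the curl example


def build_create_request(api_key: str, voice_id: str, system_prompt: str,
                         language: str = "en") -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a session."""
    body = json.dumps({
        "voice_id": voice_id,
        "system_prompt": system_prompt,
        "language": language,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/conversations/create",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def stream_url(session_id: str) -> str:
    """Derive the WebSocket URL for a session, following the documented pattern."""
    return f"wss://api.nur.ai/v1/conversations/{session_id}/stream"


req = build_create_request("nur_your_api_key", "nadia_v2",
                           "You are a friendly travel assistant.")
# To send it: urllib.request.urlopen(req), then read "session_id" from the JSON body.
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test before any network call is made.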

Endpoints

POST /v1/conversations/create

Create a new conversation session. A session holds the AI persona, voice configuration, conversation history, and any registered tools. Once created, connect to the session via WebSocket to begin real-time voice dialogue.

| Parameter | Type | Description |
| --- | --- | --- |
| voice_id (required) | string | The voice the AI will use when speaking. See the Voices page for available options. |
| system_prompt (required) | string | Instructions that define the AI persona, tone, and behavior for the conversation. |
| language | string | Primary language for the conversation (ISO 639-1 code). Default: "en". |
| tools | array | Tool/function definitions the AI can invoke mid-conversation (e.g., booking, lookups). |
| context_window | integer | Number of previous turns to keep in context. Default: 50. |
| max_turns | integer | Maximum number of dialogue turns before the session auto-closes. Default: unlimited. |

curl -X POST https://api.nur.ai/v1/conversations/create \
  -H "Authorization: Bearer nur_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_id": "nadia_v2",
    "system_prompt": "You are a knowledgeable customer support agent for Acme Corp. Be concise and helpful.",
    "language": "en",
    "tools": [
      {
        "name": "lookup_order",
        "description": "Look up an order by its ID",
        "parameters": {
          "order_id": { "type": "string", "required": true }
        }
      }
    ],
    "context_window": 40,
    "max_turns": 100
  }'

{
  "session_id": "conv_abc123def456",
  "status": "ready",
  "voice_id": "nadia_v2",
  "language": "en",
  "context_window": 40,
  "max_turns": 100,
  "tools": ["lookup_order"],
  "created_at": "2025-06-15T10:30:00Z",
  "websocket_url": "wss://api.nur.ai/v1/conversations/conv_abc123def456/stream"
}

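
One way to assemble this request body and read the response in Python is sketched below. The function names are hypothetical, and the positive-integer check on `context_window` and `max_turns` is an assumption; the reference above documents only the types and defaults.

```python
from typing import Optional


def build_session_payload(voice_id: str, system_prompt: str, language: str = "en",
                          tools: Optional[list] = None,
                          context_window: Optional[int] = None,
                          max_turns: Optional[int] = None) -> dict:
    """Assemble the JSON body for POST /v1/conversations/create."""
    if not voice_id or not system_prompt:
        raise ValueError("voice_id and system_prompt are required")
    # Assumption: turn counts must be positive; the docs do not state a range.
    for name, value in (("context_window", context_window), ("max_turns", max_turns)):
        if value is not None and value < 1:
            raise ValueError(f"{name} must be a positive integer")
    payload = {"voice_id": voice_id, "system_prompt": system_prompt, "language": language}
    # Optional fields are left out when unset so the documented defaults apply.
    if tools:
        payload["tools"] = tools
    if context_window is not None:
        payload["context_window"] = context_window
    if max_turns is not None:
        payload["max_turns"] = max_turns
    return payload


def stream_url_from_response(response: dict) -> str:
    """Return the WebSocket URL once the session reports status "ready"."""
    if response.get("status") != "ready":
        raise RuntimeError(f"session not ready: {response.get('status')!r}")
    return response["websocket_url"]


lookup_order = {
    "name": "lookup_order",
    "description": "Look up an order by its ID",
    "parameters": {"order_id": {"type": "string", "required": True}},
}
payload = build_session_payload(
    "nadia_v2",
    "You are a knowledgeable customer support agent for Acme Corp. Be concise and helpful.",
    tools=[lookup_order], context_window=40, max_turns=100,
)
```

Omitting `max_turns` entirely, rather than sending a sentinel value, relies on the documented default of an unlimited session.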
WebSocket /v1/conversations/{session_id}/stream

Open a persistent WebSocket connection for real-time audio streaming on an active conversation session. Send raw audio frames from the user's microphone and receive AI-generated speech frames back. The connection supports voice activity detection, mid-utterance interruptions, and bidirectional audio flow for natural turn-taking.

| Parameter | Type | Description |
| --- | --- | --- |
| sample_rate | integer | Audio sample rate in Hz for both input and output. Default: 16000. |
| encoding | string | Audio encoding format: "pcm16", "opus", or "mulaw". Default: "pcm16". |
| vad_threshold | number | Voice activity detection sensitivity (0.0 to 1.0). Higher values require louder speech to trigger. Default: 0.5. |

# Connect via WebSocket (using websocat as an example)
websocat "wss://api.nur.ai/v1/conversations/conv_abc123def456/stream?sample_rate=16000&encoding=pcm16&vad_threshold=0.5" \
  -H "Authorization: Bearer nur_your_api_key"

# Once connected, send raw audio frames as binary messages.
# Receive AI speech audio frames and JSON event messages.

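
The query string in the command above can be built programmatically. The sketch below validates the three documented parameters before appending them to a session's `websocket_url`; `build_stream_url` is a hypothetical helper name, and rejecting out-of-range values on the client side is a design choice, not documented server behavior.

```python
from urllib.parse import urlencode

SUPPORTED_ENCODINGS = {"pcm16", "opus", "mulaw"}  # values listed in the parameter table


def build_stream_url(websocket_url: str, sample_rate: int = 16000,
                     encoding: str = "pcm16", vad_threshold: float = 0.5) -> str:
    """Append the streaming parameters to a session's websocket_url."""
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    if not 0.0 <= vad_threshold <= 1.0:
        raise ValueError("vad_threshold must be between 0.0 and 1.0")
    query = urlencode({
        "sample_rate": sample_rate,
        "encoding": encoding,
        "vad_threshold": vad_threshold,
    })
    return f"{websocket_url}?{query}"


url = build_stream_url("wss://api.nur.ai/v1/conversations/conv_abc123def456/stream")
```

Failing fast on a bad encoding or threshold keeps configuration mistakes out of the connection-handling code.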
// Connection established
{ "type": "session_started", "session_id": "conv_abc123def456" }

// User starts speaking (detected by VAD)
{ "type": "turn_start", "role": "user" }

// User transcript (generated incrementally)
{ "type": "transcript", "role": "user", "text": "What's the weather like in Tokyo?", "is_final": true }

// AI begins responding
{ "type": "turn_start", "role": "assistant", "latency_ms": 320 }

// AI audio chunks are sent as binary WebSocket frames

// AI transcript
{ "type": "transcript", "role": "assistant", "text": "Right now in Tokyo it's 22 degrees Celsius and partly cloudy.", "is_final": true }

// AI turn complete
{ "type": "turn_end", "role": "assistant", "turn_id": "turn_002", "duration_ms": 2450 }

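
A client has to tell binary audio frames apart from JSON event messages on the same connection. The following sketch shows one way to route them, using only the event types and fields that appear in the sample above. `ConversationState` and `handle_message` are hypothetical names, and the messages are fed in from a list rather than a live socket.

```python
import json
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class ConversationState:
    audio: bytearray = field(default_factory=bytearray)  # received AI speech
    transcripts: list = field(default_factory=list)      # (role, text) pairs
    last_turn_id: Optional[str] = None


def handle_message(state: ConversationState, message: Union[bytes, str]) -> None:
    """Route one WebSocket message: binary frames are audio, text frames are JSON events."""
    if isinstance(message, (bytes, bytearray)):
        state.audio.extend(message)
        return
    event = json.loads(message)
    kind = event.get("type")
    if kind == "transcript" and event.get("is_final"):
        state.transcripts.append((event["role"], event["text"]))
    elif kind == "turn_end":
        state.last_turn_id = event.get("turn_id")
    # Other event types (session_started, turn_start) are ignored in this sketch.


state = ConversationState()
incoming = [
    json.dumps({"type": "session_started", "session_id": "conv_abc123def456"}),
    json.dumps({"type": "transcript", "role": "user",
                "text": "What's the weather like in Tokyo?", "is_final": True}),
    b"\x00\x01" * 4,  # stand-in for an audio chunk
    json.dumps({"type": "turn_end", "role": "assistant",
                "turn_id": "turn_002", "duration_ms": 2450}),
]
for message in incoming:
    handle_message(state, message)
```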
GET /v1/conversations/{session_id}

Retrieve details and full conversation history for a session. Returns the session configuration, current status, and an ordered list of all dialogue turns including transcripts and metadata.

curl -X GET https://api.nur.ai/v1/conversations/conv_abc123def456 \
  -H "Authorization: Bearer nur_your_api_key"

{
  "session_id": "conv_abc123def456",
  "status": "active",
  "voice_id": "nadia_v2",
  "language": "en",
  "system_prompt": "You are a knowledgeable customer support agent for Acme Corp.",
  "total_turns": 6,
  "context_window": 40,
  "created_at": "2025-06-15T10:30:00Z",
  "last_activity_at": "2025-06-15T10:34:12Z",
  "history": [
    {
      "turn_id": "turn_001",
      "role": "user",
      "text": "Hi, I need help with my recent order.",
      "timestamp": "2025-06-15T10:30:45Z",
      "duration_ms": 1820
    },
    {
      "turn_id": "turn_002",
      "role": "assistant",
      "text": "Of course! I'd be happy to help. Could you give me your order number?",
      "timestamp": "2025-06-15T10:30:47Z",
      "duration_ms": 2340,
      "latency_ms": 310
    }
  ]
}

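
The `history` array lends itself to simple post-processing. As an illustration, the sketch below turns a session response into a readable transcript and an average assistant latency. `summarize_history` is a hypothetical helper, and it assumes that `latency_ms` appears only on assistant turns, as in the sample above.

```python
def summarize_history(session: dict) -> dict:
    """Summarize the turns returned by GET /v1/conversations/{session_id}."""
    history = session.get("history", [])
    latencies = [
        turn["latency_ms"] for turn in history
        if turn.get("role") == "assistant" and "latency_ms" in turn
    ]
    return {
        "turns_returned": len(history),
        "speech_ms": sum(turn.get("duration_ms", 0) for turn in history),
        "avg_assistant_latency_ms": sum(latencies) / len(latencies) if latencies else None,
        "transcript": "\n".join(f"{turn['role']}: {turn['text']}" for turn in history),
    }


session = {
    "session_id": "conv_abc123def456",
    "history": [
        {"turn_id": "turn_001", "role": "user",
         "text": "Hi, I need help with my recent order.", "duration_ms": 1820},
        {"turn_id": "turn_002", "role": "assistant",
         "text": "Of course! I'd be happy to help. Could you give me your order number?",
         "duration_ms": 2340, "latency_ms": 310},
    ],
}
summary = summarize_history(session)
```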
DELETE /v1/conversations/{session_id}

End a conversation session and release all associated resources. Any active WebSocket connections on the session will be closed immediately. The conversation history remains available for retrieval for 30 days after deletion.

curl -X DELETE https://api.nur.ai/v1/conversations/conv_abc123def456 \
  -H "Authorization: Bearer nur_your_api_key"

{
  "session_id": "conv_abc123def456",
  "status": "ended",
  "total_turns": 12,
  "total_duration_ms": 184320,
  "ended_at": "2025-06-15T10:37:45Z"
}
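
To know when a deleted session's history will stop being retrievable, a client can compute the cutoff from `ended_at`. The sketch below assumes the 30-day retention period is counted from the `ended_at` timestamp, which the text above does not state explicitly; both helper names are hypothetical.

```python
from datetime import datetime, timedelta

HISTORY_RETENTION = timedelta(days=30)  # retention period stated for deleted sessions


def history_available_until(ended_at: str) -> datetime:
    """Assumed cutoff for retrieving history: ended_at plus the retention period."""
    # Rewrite the trailing "Z" so fromisoformat also accepts it on older Python versions.
    ended = datetime.fromisoformat(ended_at.replace("Z", "+00:00"))
    return ended + HISTORY_RETENTION


def format_duration(total_ms: int) -> str:
    """Render total_duration_ms as minutes and seconds."""
    minutes, remainder_ms = divmod(total_ms, 60_000)
    return f"{minutes}m {remainder_ms / 1000:.1f}s"


cutoff = history_available_until("2025-06-15T10:37:45Z")
duration = format_duration(184320)
```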

Response Objects

The API uses the following core object schemas across conversation endpoints and WebSocket events.

{
  "session_id": "string (unique conversation session identifier)",
  "status": "string ('ready' | 'active' | 'ended')",
  "voice_id": "string (voice used by the AI)",
  "language": "string (ISO 639-1 language code)",
  "system_prompt": "string (AI persona instructions)",
  "tools": "array (registered tool definitions)",
  "context_window": "integer (number of turns retained in context)",
  "max_turns": "integer | null (turn limit, null if unlimited)",
  "total_turns": "integer (number of turns completed so far)",
  "created_at": "string (ISO 8601 timestamp)",
  "last_activity_at": "string (ISO 8601 timestamp)",
  "websocket_url": "string (WebSocket URL for streaming)",
  "history": "array of TurnEvent objects"
}
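
On the client side, this schema can be mirrored with a small typed container. The dataclass below is a sketch, not an official SDK type. Making most fields optional is an assumption, based on the example responses above, which each return only a subset of these fields.

```python
from dataclasses import dataclass, field, fields
from typing import Optional

VALID_STATUSES = {"ready", "active", "ended"}


@dataclass
class Session:
    session_id: str
    status: str
    voice_id: Optional[str] = None
    language: Optional[str] = None
    system_prompt: Optional[str] = None
    tools: list = field(default_factory=list)
    context_window: Optional[int] = None
    max_turns: Optional[int] = None  # None means unlimited
    total_turns: Optional[int] = None
    created_at: Optional[str] = None
    last_activity_at: Optional[str] = None
    websocket_url: Optional[str] = None
    history: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")

    @classmethod
    def from_dict(cls, data: dict) -> "Session":
        """Build a Session from a response body, ignoring keys outside the schema."""
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in data.items() if key in known})


ended = Session.from_dict({
    "session_id": "conv_abc123def456",
    "status": "ended",
    "total_turns": 12,
    "total_duration_ms": 184320,  # not part of the schema above, so it is dropped
    "ended_at": "2025-06-15T10:37:45Z",
})
```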

Best Practices

Tune VAD sensitivity for your environment

In noisy environments (call centers, outdoors), raise vad_threshold toward 0.7-0.8 to avoid false activations. In quiet environments, lower it to 0.3-0.4 for more responsive detection. Test with real users to find the right balance between responsiveness and false triggers.
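
One way to turn this guidance into code is to start from a preset and nudge the threshold based on what testing reveals. The sketch below is purely illustrative: the preset values are midpoints of the ranges mentioned above, and the step size and the adjustment rule are assumptions, not part of the API.

```python
# Midpoints of the ranges suggested above; "default" is the documented default.
VAD_PRESETS = {"quiet": 0.35, "default": 0.5, "noisy": 0.75}


def adjust_threshold(current: float, false_triggers: int, missed_utterances: int,
                     step: float = 0.05) -> float:
    """Raise the threshold when false activations dominate, lower it when speech is missed."""
    if false_triggers > missed_utterances:
        current += step
    elif missed_utterances > false_triggers:
        current -= step
    # Clamp to the documented 0.0 to 1.0 range and round away float noise.
    return round(min(1.0, max(0.0, current)), 2)


noisy_start = VAD_PRESETS["noisy"]
after_testing = adjust_threshold(VAD_PRESETS["default"], false_triggers=8, missed_utterances=1)
```

The resulting value would be passed as `vad_threshold` when opening the stream.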

Craft detailed system prompts for natural dialogue

A well-written system prompt dramatically improves conversation quality. Specify the persona, tone, response length preferences, and how the AI should handle ambiguity. For example: "Keep responses under two sentences unless the user asks for detail. Always confirm before taking actions."

Design tool calls for conversational flow

When registering tools, write clear descriptions so the AI knows when to invoke them. Keep tool execution fast (under 2 seconds) to avoid awkward pauses. If a tool takes longer, configure the AI to say a brief hold message like "Let me look that up for you" while waiting.
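
The hold-message pattern can be sketched on the application side as follows. This page does not describe how tool invocations are delivered to your code or how a hold message is spoken, so `run_tool_with_hold` and the `say_hold` callback are hypothetical stand-ins; only the 2-second budget and the example phrase come from the guidance above.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_tool_with_hold(tool, args: dict, budget_s: float = 2.0, say_hold=print):
    """Run a tool; if it exceeds the latency budget, emit a hold message and keep waiting."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(tool, **args)
        try:
            return future.result(timeout=budget_s)
        except FutureTimeout:
            say_hold("Let me look that up for you")
            return future.result()  # block until the tool finishes


def slow_lookup(order_id: str) -> dict:
    """Hypothetical slow tool used only for this demonstration."""
    time.sleep(0.2)
    return {"order_id": order_id, "status": "shipped"}


spoken = []
result = run_tool_with_hold(slow_lookup, {"order_id": "A1"},
                            budget_s=0.05, say_hold=spoken.append)
```

A tool that returns within the budget never triggers the callback, so fast lookups stay silent.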

Handle errors and disconnections gracefully

WebSocket connections can drop due to network issues. Implement automatic reconnection with exponential backoff and resume the session using the same session_id. Store the last turn_id locally so you can detect and skip duplicate events after reconnecting.
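
A reconnection loop along these lines is sketched below. The `connect` argument is a placeholder for whatever WebSocket client you use, the delay values are illustrative, and deduplicating by `turn_id` assumes that events replayed after a reconnect carry the same `turn_id` as before.

```python
import time


def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   cap: float = 30.0, attempts: int = 6):
    """Yield exponentially growing delays in seconds, capped at `cap`."""
    for attempt in range(attempts):
        yield min(cap, base * factor ** attempt)


def connect_with_retry(connect, delays, sleep=time.sleep):
    """Call connect() until it succeeds, sleeping for the next delay after each failure."""
    last_error = None
    for delay in delays:
        try:
            return connect()
        except OSError as exc:  # ConnectionError and socket errors are OSError subclasses
            last_error = exc
            sleep(delay)
    raise ConnectionError("gave up reconnecting") from last_error


class TurnDeduplicator:
    """Remember the turn_ids already processed so replayed events can be skipped."""

    def __init__(self) -> None:
        self.seen = set()

    def is_new(self, event: dict) -> bool:
        turn_id = event.get("turn_id")
        if turn_id is None:
            return True  # events without a turn_id cannot be deduplicated
        if turn_id in self.seen:
            return False
        self.seen.add(turn_id)
        return True


# Demonstration with a fake connect() that fails twice before succeeding.
failures = [ConnectionError("dropped"), ConnectionError("dropped")]


def fake_connect():
    if failures:
        raise failures.pop(0)
    return "connected"


slept = []
connection = connect_with_retry(fake_connect, backoff_delays(), sleep=slept.append)
dedup = TurnDeduplicator()
```

In production, adding random jitter to each delay helps avoid many clients reconnecting at the same moment.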