Conversational AI
Build real-time voice conversations with AI — speak naturally and get instant spoken responses.
Overview
The Conversational AI API enables real-time, spoken dialogue between users and AI. Create a conversation session, open a WebSocket audio stream, and begin talking. The AI listens, understands context, and responds in natural speech with minimal latency. Conversations maintain full context across turns so the AI remembers what was said earlier. Use it to power voice assistants, customer service agents, interactive tutors, companion apps, and any experience where users need to speak with AI as naturally as they would with another person.
Key Capabilities
- Real-time dialogue
- Context memory
- Interruption handling
- Multilingual support
- Tool / function calling
- Custom personas
Quickstart
Create a conversation session and start talking to AI in just a few lines of code.
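As a starting point, here is a minimal Python sketch of assembling the request body for the create endpoint. The host URL, voice_id, and system prompt below are placeholders, and the HTTP call itself is left to your client of choice; only the payload shape follows the parameter table in this document.

```python
import json

API_BASE = "https://api.example.com"  # placeholder; substitute your API host


def build_create_session_payload(voice_id, system_prompt, language="en",
                                 tools=None, context_window=50, max_turns=None):
    """Assemble the JSON body for /v1/conversations/create.

    voice_id and system_prompt are required; the remaining fields
    mirror the documented defaults and are omitted when unset.
    """
    payload = {
        "voice_id": voice_id,
        "system_prompt": system_prompt,
        "language": language,
        "context_window": context_window,
    }
    if tools:
        payload["tools"] = tools
    if max_turns is not None:
        payload["max_turns"] = max_turns
    return json.dumps(payload)


body = build_create_session_payload(
    voice_id="voice_abc123",  # placeholder voice ID
    system_prompt="You are a friendly booking assistant. Keep answers short.",
)
# POST `body` to f"{API_BASE}/v1/conversations/create", then open a
# WebSocket to the returned session's stream endpoint to start talking.
```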
Endpoints
/v1/conversations/create
Create a new conversation session. A session holds the AI persona, voice configuration, conversation history, and any registered tools. Once created, connect to the session via WebSocket to begin real-time voice dialogue.
| Parameter | Type | Description |
|---|---|---|
| voice_id (required) | string | The voice the AI will use when speaking. See the Voices page for available options. |
| system_prompt (required) | string | Instructions that define the AI persona, tone, and behavior for the conversation. |
| language | string | Primary language for the conversation (ISO 639-1 code). Default: "en". |
| tools | array | Tool/function definitions the AI can invoke mid-conversation (e.g., booking, lookups). |
| context_window | integer | Number of previous turns to keep in context. Default: 50. |
| max_turns | integer | Maximum number of dialogue turns before the session auto-closes. Default: unlimited. |
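The exact shape of a tool definition is not specified on this page; the sketch below assumes a JSON-Schema-style function definition, as is common in function-calling APIs, and the booking tool itself is hypothetical. Adapt the field names to the real schema.

```python
# Hypothetical tool definition, assuming a JSON-Schema-style format.
# A clear description helps the AI decide when to invoke the tool
# mid-conversation (see Best Practices below).
book_table = {
    "name": "book_table",
    "description": "Reserve a restaurant table. Use when the user asks to book.",
    "parameters": {
        "type": "object",
        "properties": {
            "party_size": {"type": "integer", "description": "Number of guests"},
            "time": {"type": "string", "description": "ISO 8601 reservation time"},
        },
        "required": ["party_size", "time"],
    },
}

# Passed as the `tools` array when creating the session.
tools = [book_table]
```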
/v1/conversations/{session_id}/stream
Open a persistent WebSocket connection for real-time audio streaming on an active conversation session. Send raw audio frames from the user's microphone and receive AI-generated speech frames back. The connection supports voice activity detection, mid-utterance interruptions, and bidirectional audio flow for natural turn-taking.
| Parameter | Type | Description |
|---|---|---|
| sample_rate | integer | Audio sample rate in Hz for both input and output. Default: 16000. |
| encoding | string | Audio encoding format: "pcm16", "opus", or "mulaw". Default: "pcm16". |
| vad_threshold | number | Voice activity detection sensitivity (0.0 to 1.0). Higher values require louder speech to trigger. Default: 0.5. |
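For illustration, a small helper that builds the stream URL with the parameters above. The wss host is a placeholder, and passing these values as query parameters is an assumption; depending on the API they may instead be sent in a setup message after the socket opens.

```python
from urllib.parse import urlencode


def stream_url(session_id, sample_rate=16000, encoding="pcm16", vad_threshold=0.5):
    """Build a WebSocket URL for a session's audio stream.

    Defaults mirror the documented values. Query-string transport of
    these parameters is assumed, not confirmed by this page.
    """
    params = urlencode({
        "sample_rate": sample_rate,
        "encoding": encoding,
        "vad_threshold": vad_threshold,
    })
    return f"wss://api.example.com/v1/conversations/{session_id}/stream?{params}"


# Example: a noisier environment warrants a higher VAD threshold.
url = stream_url("sess_123", vad_threshold=0.7)
```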
/v1/conversations/{session_id}
Retrieve details and full conversation history for a session. Returns the session configuration, current status, and an ordered list of all dialogue turns including transcripts and metadata.
/v1/conversations/{session_id}
End a conversation session and release all associated resources. Any active WebSocket connections on the session will be closed immediately. The conversation history remains available for retrieval for 30 days after deletion.
Response Objects
The API uses the following core object schemas across conversation endpoints and WebSocket events.
Best Practices
Tune VAD sensitivity for your environment
In noisy environments (call centers, outdoors), raise vad_threshold toward 0.7-0.8 to avoid false activations. In quiet environments, lower it to 0.3-0.4 for more responsive detection. Test with real users to find the right balance between responsiveness and false triggers.
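The tuning advice above can be captured as starting presets; the helper below is only a sketch of that guidance, and the exact values should still be validated with real users.

```python
def pick_vad_threshold(environment):
    """Heuristic starting points for vad_threshold by environment.

    These presets restate the tuning guidance above; they are a
    starting point, not measured optima.
    """
    presets = {
        "quiet": 0.35,   # quiet rooms: more responsive detection
        "normal": 0.5,   # the documented default
        "noisy": 0.75,   # call centers, outdoors: fewer false activations
    }
    return presets[environment]
```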
Craft detailed system prompts for natural dialogue
A well-written system prompt dramatically improves conversation quality. Specify the persona, tone, response length preferences, and how the AI should handle ambiguity. For example: "Keep responses under two sentences unless the user asks for detail. Always confirm before taking actions."
Design tool calls for conversational flow
When registering tools, write clear descriptions so the AI knows when to invoke them. Keep tool execution fast (under 2 seconds) to avoid awkward pauses. If a tool takes longer, configure the AI to say a brief hold message like "Let me look that up for you" while waiting.
Handle errors and disconnections gracefully
WebSocket connections can drop due to network issues. Implement automatic reconnection with exponential backoff and resume the session using the same session_id. Store the last turn_id locally so you can detect and skip duplicate events after reconnecting.
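The reconnection pattern above can be sketched as two small pieces of pure logic: an exponential backoff schedule and a deduper keyed on the last seen turn_id. This assumes turn_ids are monotonically increasing integers, which this page does not confirm.

```python
import random


def backoff_delays(max_retries=5, base=0.5, cap=30.0, jitter=False):
    """Exponential backoff schedule (in seconds) for reconnect attempts.

    Doubles the delay each attempt, capped at `cap`; optional jitter
    spreads simultaneous reconnects from many clients.
    """
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            delay *= random.uniform(0.5, 1.0)
        delays.append(delay)
    return delays


class TurnDeduper:
    """Track the last turn_id seen so events replayed after a
    reconnect can be detected and skipped.

    Assumes integer turn_ids that only increase within a session.
    """

    def __init__(self):
        self.last_turn_id = -1

    def is_new(self, turn_id):
        if turn_id <= self.last_turn_id:
            return False  # duplicate delivered after reconnecting
        self.last_turn_id = turn_id
        return True
```

On disconnect, walk the schedule from `backoff_delays`, reconnect with the same session_id, and pass every incoming turn event through `TurnDeduper.is_new` before handling it.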