Rate Limits

Understand request quotas and how to optimize your API usage.

Overview

Rate limits protect the Nur API from abuse and ensure fair usage across all customers. Limits are applied at two levels: per-minute request limits that control the rate of incoming requests, and monthly usage quotas that cap the total volume of resources consumed within a billing cycle.

Your rate limits are determined by your account tier. When you exceed a limit, the API returns a 429 Too Many Requests response with a Retry-After header indicating how long to wait before retrying.

Limits by Tier

Each successive tier raises the request rate, the monthly usage quotas, and the resource allowances. You can upgrade your plan at any time from the Nur Dashboard.

Limit	Free	Pro	Business	Enterprise
Requests / min	100	1,000	5,000	Custom
Characters / month	10,000	500,000	2,000,000	Custom
Voice clones	1	10	50	Custom
Dubbing minutes / month	5	120	600	Custom
Music tracks / month	10	200	1,000	Custom

Per-Endpoint Limits

Some resource-intensive endpoints have their own per-minute limits, which are lower than the global request limit. These apply in addition to the tier-based limits, not in place of them.

Endpoint	Free	Pro	Business
/v1/dubbing/create	10 / min	50 / min	200 / min
/v1/voices/clone	5 / min	20 / min	100 / min
/v1/music/generate	10 / min	100 / min	500 / min
/v1/audio/enhance	20 / min	100 / min	500 / min
/v1/tts/generate	50 / min	500 / min	2,000 / min
/v1/stt/transcribe	30 / min	200 / min	1,000 / min
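
Assuming a request has to satisfy both the global and the per-endpoint limit, the practical ceiling for an endpoint is the smaller of the two values. The sketch below encodes the figures from the tables above; the dictionary and function names are illustrative and are not part of any Nur SDK.

```python
# Illustrative lookup tables built from the published limits (requests per minute).
# These names are assumptions for this example, not part of the Nur API.
GLOBAL_LIMITS = {"free": 100, "pro": 1_000, "business": 5_000}

ENDPOINT_LIMITS = {
    "/v1/dubbing/create": {"free": 10, "pro": 50, "business": 200},
    "/v1/voices/clone": {"free": 5, "pro": 20, "business": 100},
    "/v1/music/generate": {"free": 10, "pro": 100, "business": 500},
    "/v1/audio/enhance": {"free": 20, "pro": 100, "business": 500},
    "/v1/tts/generate": {"free": 50, "pro": 500, "business": 2_000},
    "/v1/stt/transcribe": {"free": 30, "pro": 200, "business": 1_000},
}


def effective_limit(endpoint: str, tier: str) -> int:
    """Return the requests-per-minute ceiling for an endpoint on a given tier."""
    global_limit = GLOBAL_LIMITS[tier]
    # Endpoints without a dedicated limit fall back to the global limit.
    endpoint_limit = ENDPOINT_LIMITS.get(endpoint, {}).get(tier, global_limit)
    return min(global_limit, endpoint_limit)


print(effective_limit("/v1/voices/clone", "pro"))   # 20
print(effective_limit("/v1/tts/generate", "free"))  # 50
```

Enterprise limits are custom, so they are left out of the lookup tables here.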

Rate Limit Headers

Every API response includes rate limit headers so you can monitor your current usage and remaining capacity in real time.

HTTP/1.1 200 OK
X-RateLimit-Limit: 1000        # Maximum requests allowed per minute
X-RateLimit-Remaining: 742     # Requests remaining in the current window
X-RateLimit-Reset: 1706108460  # Unix timestamp when the window resets

Header	Type	Description
X-RateLimit-Limit	Integer	The maximum number of requests allowed in the current time window.
X-RateLimit-Remaining	Integer	The number of requests remaining before the limit is reached.
X-RateLimit-Reset	Unix timestamp (seconds)	The time at which the current rate limit window resets.
Retry-After	Integer (seconds)	Seconds to wait before retrying. Only present on 429 responses.
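
As a rough illustration of reading these headers on the client, the sketch below converts them into a small typed record. The `RateLimitStatus` class and `parse_rate_limit_headers` function are hypothetical helpers written for this example; only the header names come from the table above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping


@dataclass
class RateLimitStatus:
    """Hypothetical container for the three rate limit headers."""
    limit: int
    remaining: int
    reset_at: datetime


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Convert the X-RateLimit-* header strings into typed values."""
    return RateLimitStatus(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        # The reset header is a Unix timestamp; interpret it as UTC.
        reset_at=datetime.fromtimestamp(int(headers["X-RateLimit-Reset"]), tz=timezone.utc),
    )


# Values taken from the sample response above.
status = parse_rate_limit_headers({
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "742",
    "X-RateLimit-Reset": "1706108460",
})
print(status.limit, status.remaining, status.reset_at.isoformat())
```

A plain dictionary is case-sensitive, so the keys must match exactly. The headers object returned by the `requests` library is case-insensitive and can be passed in directly.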

Handling 429 Responses

When you receive a 429 status code, read the Retry-After header to determine how long to wait. If the header is not present, use exponential backoff starting at 1 second.

import time

import requests


def make_request_with_backoff(url, headers, payload, max_retries=5):
    """Make an API request with automatic retry on rate limits."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 429:
            # Raise on any other error status; otherwise return the parsed body.
            response.raise_for_status()
            return response.json()

        # Prefer the server-provided Retry-After value (in seconds). Fall back to
        # exponential backoff (1s, 2s, 4s, ...) if it is missing or not an integer.
        try:
            retry_after = int(response.headers["Retry-After"])
        except (KeyError, ValueError):
            retry_after = 2 ** attempt
        remaining = response.headers.get("X-RateLimit-Remaining", "unknown")
        print(f"Rate limited. Remaining: {remaining}. Waiting {retry_after}s...")
        time.sleep(retry_after)

    raise RuntimeError("Max retries exceeded due to rate limiting")


# Usage
result = make_request_with_backoff(
    url="https://api.nur.ai/v1/tts/generate",
    headers={"Authorization": "Bearer nur_your_api_key"},
    payload={"text": "Hello world", "voice_id": "rachel_v2"},
)

Best Practices

Follow these recommendations to make the most of your rate limits and avoid unnecessary throttling.

Batch requests when possible

Some endpoints support batch operations. For example, use the batch transcription endpoint to process multiple audio files in a single request instead of making individual calls.

Implement client-side caching

Cache voice lists, transcription results, and other responses that do not change frequently. This reduces redundant API calls and keeps you well within your limits.
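
One simple way to do this is a small in-memory cache with a time-to-live. The sketch below is a generic pattern, not a Nur SDK feature; `TTLCache` and `fetch_voices` are hypothetical names, and `fetch_voices` stands in for whatever API call returns the data you want to cache.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling fetch() only if it is missing or stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value


calls = []


def fetch_voices():
    # Stand-in for a real API call that returns the voice list.
    calls.append(1)
    return ["rachel_v2"]


cache = TTLCache(ttl_seconds=300)
cache.get_or_fetch("voices", fetch_voices)
cache.get_or_fetch("voices", fetch_voices)
print(len(calls))  # 1: the second lookup was served from the cache
```

Pick a time-to-live that matches how often the underlying data changes; the 300 seconds used here is an arbitrary value for illustration.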

Use webhooks instead of polling

For long-running operations like dubbing or voice cloning, register a webhook to receive a notification when the job completes instead of polling the status endpoint repeatedly.

Monitor rate limit headers

Read the X-RateLimit-Remaining header on every response and throttle your client when the value approaches zero. Slowing down before the limit is reached avoids the failed requests and retry delays that come with 429 responses.
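
A minimal sketch of this kind of proactive throttling: pause until the window resets once the remaining count drops to a chosen threshold. The `throttle_delay` helper and its `threshold` default of 5 are assumptions made for this example; tune the threshold to your own traffic.

```python
import time
from typing import Mapping, Optional


def throttle_delay(headers: Mapping[str, str], threshold: int = 5,
                   now: Optional[float] = None) -> float:
    """Return how many seconds to pause before the next request (0.0 for no pause)."""
    now = time.time() if now is None else now
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset_at = int(headers["X-RateLimit-Reset"])
    except (KeyError, ValueError):
        # Headers missing or malformed: do not throttle pre-emptively.
        return 0.0
    if remaining > threshold:
        return 0.0
    # Wait until the window resets; never return a negative delay.
    return float(max(0.0, reset_at - now))


# Plenty of capacity left, so no pause is needed.
print(throttle_delay({"X-RateLimit-Remaining": "742", "X-RateLimit-Reset": "1706108460"}))  # 0.0
```

In a request loop, this could be called after each response, for example `time.sleep(throttle_delay(response.headers))`.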

Spread requests evenly

Avoid burst patterns where you send many requests at once. Distribute your requests evenly across the rate limit window for smoother throughput.
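
One way to spread requests is to enforce a minimum interval of 60 / limit seconds between calls. The `RequestPacer` class below is a hypothetical helper sketched for this purpose; it assumes a single-threaded client and a fixed per-minute limit.

```python
import time
from typing import Callable


class RequestPacer:
    """Space calls at least 60 / requests_per_minute seconds apart."""

    def __init__(self, requests_per_minute: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_slot = clock()

    def wait(self) -> None:
        """Block until the next send slot, then reserve the slot after it."""
        now = self._clock()
        if now < self._next_slot:
            self._sleep(self._next_slot - now)
            now = self._next_slot
        self._next_slot = now + self.interval


# For the Pro tier's 1,000 requests per minute, this is one request every 0.06 seconds.
pacer = RequestPacer(requests_per_minute=1000)
print(pacer.interval)
```

Call `pacer.wait()` immediately before each request. The first call returns at once, and later calls sleep only as long as needed to keep the spacing.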

Request a limit increase

If your use case requires higher limits, upgrade to a higher tier or contact the Nur team. Enterprise plans offer custom limits for every quota listed above.