Rate limits & quotas

Per-tier request limits, concurrency caps, credit rates, and a recommended retry strategy.

Request limits

API endpoints are rate-limited per API secret to protect service quality.

Tier               Concurrent sessions   Agent generations/day
Free               2                     5
Creator ($20/mo)   5                     20
Pro ($99/mo)       10                    50
Enterprise         Custom                Custom

Check your current tier and usage at Developer → API Keys.

Concurrency limits

Resource                Limit           Notes
Cloud avatar sessions   Based on tier   Active WebRTC sessions.
Agent generation        3 concurrent    Queued if exceeded.
Dynamics generation     2 concurrent    Queued if exceeded.
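
The service queues requests beyond these caps, but a client can also hold them back locally so that only a few are in flight at once. The following is a minimal sketch, not an official client: `ConcurrencyGate` and the function passed to `run` are hypothetical names, and only the cap of 3 comes from the table above.

```python
import threading

AGENT_GENERATION_CAP = 3  # concurrent agent generations, from the table above


class ConcurrencyGate:
    """Allow at most `limit` wrapped calls to run at the same time."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def run(self, fn, *args, **kwargs):
        # Blocks until a slot is free, then releases it when fn returns or raises.
        with self._slots:
            return fn(*args, **kwargs)
```

Calling `gate.run(submit_generation, payload)` from several worker threads, where `submit_generation` is your own request function, keeps at most three submissions running concurrently.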

Credit rates

Live sessions are billed per minute at a rate that depends on the model and host; some operations carry a one-time charge instead.

Feature                                  Credits/min
Voice chat (managed agent, no avatar)    10
Camera chat (managed agent, camera on)   30
Essence (cloud)                          2
Essence (self-hosted)                    1
Expression (cloud)                       4
Expression (self-hosted)                 2

One-time operation    Credits
Agent generation      250
Dynamics generation   250

Check your balance with GET /v2/credit-summaries — see Billing.
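
To budget a session in advance, the per-minute rates can be turned into a small estimator. This is a rough sketch under stated assumptions: the dictionary keys are hypothetical labels rather than API identifiers, and rounding partial minutes up is an assumption, since the rounding rule is not stated here.

```python
import math

# Credits per minute, copied from the rates table above.
# The keys are illustrative labels, not identifiers used by the API.
CREDITS_PER_MINUTE = {
    "voice_chat": 10,
    "camera_chat": 30,
    "essence_cloud": 2,
    "essence_self_hosted": 1,
    "expression_cloud": 4,
    "expression_self_hosted": 2,
}


def estimate_session_credits(feature, seconds):
    """Estimate the credits used by one live session of the given length."""
    minutes = math.ceil(seconds / 60)  # assumption: partial minutes round up
    return CREDITS_PER_MINUTE[feature] * minutes
```

For example, `estimate_session_credits("camera_chat", 600)` gives 300 credits for a ten-minute camera chat.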

Endpoint guidelines

Endpoint                     Guidance
POST /v1/validate            Lightweight; use for health checks.
POST /v1/agent/generate      Heavy; a 2–5 min async operation.
GET /v1/agent/status/*       Poll at 5 s intervals; avoid sub-second polling.
POST /v1/agent/*/speak       Per active session; the agent must be in a room.
POST /v1/files/upload        Max 10 MB per image, 100 MB per video; size limits are enforced.
POST /v1/dynamics/generate   Heavy; triggers video generation.
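
The 5-second polling guidance for the status endpoint could be implemented along these lines. This is a sketch under assumptions: `fetch_status` is a hypothetical callable that wraps your own GET request and returns a status string, and the values in `TERMINAL_STATES` are placeholders, because the status response format is not shown here. The 10-minute default timeout is chosen to sit comfortably above the 2–5 minute generation time.

```python
import time

# Assumption: placeholder status values; substitute the ones the API returns.
TERMINAL_STATES = {"completed", "failed"}


def poll_until_done(fetch_status, interval=5.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() every `interval` seconds until a terminal state."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        # Give up if the next poll would land after the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"still '{status}' after {timeout} s")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without waiting in real time.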

Handling limits

If you exceed a limit or run out of credits, the API returns an error. For example, a request made with insufficient credits returns:

{
  "error": {
    "code": "INSUFFICIENT_BALANCE",
    "message": "Insufficient credits",
    "httpStatus": 402
  },
  "status": "error",
  "status_code": 402
}

Common status codes: 402 (no credits), 429 (rate limited), 503 (workers busy). See the full error reference.
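
A client may want to branch on these codes before deciding what to do next. The helper below is a minimal sketch: the action names `retry`, `top_up`, and `fail` are hypothetical, and the body layout is assumed to match the sample error shown above.

```python
RETRYABLE_STATUS = {429, 503}  # rate limited, workers busy


def classify_error(status_code, body):
    """Return (action, code, message) for an error response.

    `body` is the parsed JSON payload, assumed to follow the sample above.
    """
    error = body.get("error", {}) if isinstance(body, dict) else {}
    code = error.get("code")
    message = error.get("message", "")
    if status_code in RETRYABLE_STATUS:
        action = "retry"      # transient: back off and try again
    elif status_code == 402:
        action = "top_up"     # retrying will not help until credits are added
    else:
        action = "fail"
    return action, code, message
```

Treating 402 separately avoids retry loops that cannot succeed until the balance changes.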

Use exponential backoff with jitter for 429 and 503:

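import random
import time

import requests

def api_request_with_retry(url, headers, max_retries=3):
    """POST to url, retrying on 429/503 with exponential backoff plus jitter."""
    resp = None
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, timeout=30)
        if resp.status_code not in (429, 503):
            return resp
        if attempt < max_retries - 1:  # do not sleep after the final attempt
            # Waits about 1 s, 2 s, 4 s, ... plus up to 1 s of random jitter.
            time.sleep((2 ** attempt) + random.uniform(0, 1))
    return resp  # last 429/503 response, or None if max_retries < 1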

Best practices

Use webhooks instead of polling

Rather than polling /v1/agent/status/{id} in a loop, prefer a webhook notification that fires when generation completes (where available).

Cache agent details

Agent data rarely changes. Cache GET /v1/agent/{code} responses locally and refresh only when needed.
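
A small time-based cache is one way to do this. The sketch below makes several assumptions: `AgentCache` is a hypothetical helper, `fetch` is your own function that performs the GET request and returns the parsed response, and the 5-minute default lifetime is an arbitrary choice.

```python
import time


class AgentCache:
    """Cache agent details in memory for a fixed time-to-live."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch        # callable: agent code -> agent details
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # agent code -> (expires_at, details)

    def get(self, code):
        now = self._clock()
        entry = self._entries.get(code)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no API call
        details = self._fetch(code)
        self._entries[code] = (now + self._ttl, details)
        return details

    def invalidate(self, code):
        """Drop one entry, for example right after updating that agent."""
        self._entries.pop(code, None)
```

Calling `invalidate` after you change an agent covers the "refresh only when needed" case without waiting for the entry to expire.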

Reuse sessions

Keep avatar sessions alive between conversations instead of creating new ones — session creation is the most expensive operation.

Check credits before heavy operations

Call GET /v2/credit-summaries before agent generation (250 credits) or dynamics creation (250 credits) to avoid calls that fail with 402.
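
The pre-flight check itself is simple arithmetic. In the sketch below, the balance is passed in as a plain number because the response format of GET /v2/credit-summaries is not shown here; the operation labels are hypothetical, and the costs are the one-time rates listed under Credit rates.

```python
# One-time costs in credits, from the Credit rates section.
# The keys are illustrative labels, not identifiers used by the API.
ONE_TIME_COSTS = {
    "agent_generation": 250,
    "dynamics_generation": 250,
}


def can_afford(balance, operations):
    """Return (ok, needed): whether `balance` covers every planned operation."""
    needed = sum(ONE_TIME_COSTS[op] for op in operations)
    return balance >= needed, needed
```

For instance, a balance of 400 credits covers one agent generation but not an agent generation followed by a dynamics generation, which together need 500.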

Need higher limits?

Contact us via Discord or email for enterprise pricing with custom limits.