Text to speech

Turn text into natural speech with bitHuman's real-time TTS — built-in voices, inline tuning, and shareable voice codes designed in the playground.

bitHuman’s text-to-speech runs the same in-house voice engine that powers live agents. One POST turns text into a WAV you can save or stream on the fly. It supports 31 languages, ten built-in voices, fine-grained tuning, and voice codes — opaque handles for a voice you’ve designed in the Voice Designer.

Authentication

Every call uses your bitHuman API secret in the api-secret header. Get one at Developer → API Keys (free tier, no card), then export it so the examples below pick it up:

export BITHUMAN_API_SECRET=your_api_secret

Synthesize speech

POST https://api.bithuman.ai/v1/tts returns audio bytes (a WAV by default).

curl

curl -X POST https://api.bithuman.ai/v1/tts \
  -H "api-secret: $BITHUMAN_API_SECRET" \
  -H "content-type: application/json" \
  -d '{"text": "Hello from bitHuman.", "voice": "F1", "language": "en"}' \
  --output voice.wav

Python

import os, requests

resp = requests.post(
    "https://api.bithuman.ai/v1/tts",
    headers={"api-secret": os.environ["BITHUMAN_API_SECRET"]},
    json={"text": "Hello from bitHuman.", "voice": "F1", "language": "en"},
    timeout=60,
)
resp.raise_for_status()
with open("voice.wav", "wb") as f:
    f.write(resp.content)

Request fields

Field        Type     Notes
text         string   Required. Any length; multi-sentence is supported.
voice        string   Built-in voice id (M1–M5, F1–F5). Defaults to M1.
voice_code   string   A designed-voice handle (see Voice codes). Takes precedence over voice.
axes         object   Inline tuning; see Tuning a voice. Ignored when voice_code is set.
language     string   ISO-2 code. 31 languages supported. Defaults to en.
total_steps  integer  Quality vs. speed: 5 fast, 8 balanced (default), 12 highest.
speed        number   Playback rate, 0.7–2.0. Defaults to 1.05.
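
The precedence rules in the field list can be sketched as a small payload builder. This is an illustrative sketch, not part of any bitHuman SDK: the helper names build_tts_payload and synthesize are hypothetical, the client-side speed check is an added convenience, and the standard library is used so the block runs without extra packages.

```python
import json
import os
import urllib.request

TTS_URL = "https://api.bithuman.ai/v1/tts"


def build_tts_payload(text, voice=None, voice_code=None, axes=None,
                      language="en", total_steps=8, speed=1.05):
    """Assemble a /v1/tts request body (hypothetical helper)."""
    if not text:
        raise ValueError("text is required")
    # Client-side check added for illustration; the server remains authoritative.
    if not 0.7 <= speed <= 2.0:
        raise ValueError("speed must be between 0.7 and 2.0")
    payload = {"text": text, "language": language,
               "total_steps": total_steps, "speed": speed}
    if voice_code:
        # voice_code takes precedence over voice, and axes are ignored
        # when it is set, so neither is sent.
        payload["voice_code"] = voice_code
    else:
        payload["voice"] = voice or "M1"
        if axes:
            payload["axes"] = axes
    return payload


def synthesize(payload, out_path="voice.wav"):
    """POST the payload and save the returned audio bytes."""
    req = urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"api-secret": os.environ["BITHUMAN_API_SECRET"],
                 "content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

For example, synthesize(build_tts_payload("Hello from bitHuman.", voice="F1", total_steps=12)) would request the highest-quality setting and write voice.wav.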

List voices

GET /v1/voices returns the catalog: ten built-ins (M1–M5, F1–F5) plus any custom voices.

curl https://api.bithuman.ai/v1/voices -H "api-secret: $BITHUMAN_API_SECRET"
# {"voices":[{"id":"F1","kind":"builtin"}, ... ]}
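
A Python version of the same call might look like the sketch below. It assumes only the response shape shown in the comment above (a "voices" list whose entries carry "id" and "kind"); the function names are hypothetical, and the "kind" value used for custom voices is not stated here, so the code groups by whatever value it finds.

```python
import json
import os
import urllib.request

VOICES_URL = "https://api.bithuman.ai/v1/voices"


def fetch_catalog():
    """GET /v1/voices and return the decoded JSON body."""
    req = urllib.request.Request(
        VOICES_URL,
        headers={"api-secret": os.environ["BITHUMAN_API_SECRET"]},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def group_by_kind(catalog):
    """Map each "kind" in the catalog to the list of voice ids of that kind."""
    grouped = {}
    for voice in catalog.get("voices", []):
        grouped.setdefault(voice.get("kind", "unknown"), []).append(voice["id"])
    return grouped
```

Calling group_by_kind(fetch_catalog()) gives a dictionary you can use to populate a voice picker or to check that an id exists before synthesizing.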

Tuning a voice

Shape any built-in voice with semantic axes: gender, pitch, rate, and brightness. Offsets are small (roughly −0.3 to 0.3); 0 is neutral. Call GET /v1/studio/axes for each axis’s suggested range and per-voice anchors.

curl -X POST https://api.bithuman.ai/v1/tts \
  -H "api-secret: $BITHUMAN_API_SECRET" \
  -H "content-type: application/json" \
  -d '{
    "text": "Tuned, warm, and a touch brighter.",
    "voice": "F3",
    "axes": {"gender": 0.1, "pitch": 0.05, "rate": -0.1, "brightness": 0.2}
  }' \
  --output voice.wav
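
If axis offsets come from user input (sliders, config files), it can help to sanitize them before sending. The sketch below is a hypothetical helper, not bitHuman code: it rejects unknown axis names and clamps values to ±0.3, which is only the rough range quoted above. The authoritative per-axis ranges come from GET /v1/studio/axes.

```python
AXES = ("gender", "pitch", "rate", "brightness")


def clamp_axes(axes, limit=0.3):
    """Return a copy of axes with each offset clamped to [-limit, limit].

    limit=0.3 is an assumption based on the approximate range in the text.
    """
    unknown = set(axes) - set(AXES)
    if unknown:
        raise ValueError(f"unknown axes: {sorted(unknown)}")
    return {name: max(-limit, min(limit, float(value)))
            for name, value in axes.items()}
```

The result can be passed directly as the "axes" object in a /v1/tts request body.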

Voice codes

Rather than hand-tuning axes, design a voice from a description in the Voice Designer (“a calm meditation guide”, “a gruff old captain”). When you open Use in your app, you get a voice code — a single opaque handle that already encodes the base voice and its tuning. Pass it as voice_code and skip voice/axes entirely:

curl -X POST https://api.bithuman.ai/v1/tts \
  -H "api-secret: $BITHUMAN_API_SECRET" \
  -H "content-type: application/json" \
  -d '{"text": "Hello from my custom voice.", "voice_code": "YOUR_VOICE_CODE"}' \
  --output voice.wav

A voice code is a UUID (e.g. f8fb5feb-8a19-435c-89e5-a286a03565ec). The endpoint expands it to the underlying voice + tuning, so your integration only ever references the code — re-tune the voice in the playground without touching your code path.
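
Because a voice code is a UUID, a client can catch an unreplaced placeholder or a mistyped code before making a request. A minimal sketch, assuming the code always parses as a standard UUID string; voice_code_payload is a hypothetical helper name.

```python
import uuid


def voice_code_payload(text, voice_code):
    """Build a /v1/tts body for a designed voice, checking the code's shape."""
    try:
        uuid.UUID(voice_code)  # raises ValueError if not UUID-shaped
    except ValueError as exc:
        raise ValueError(f"not a UUID-shaped voice code: {voice_code!r}") from exc
    # voice and axes are omitted: the code already encodes both.
    return {"text": text, "voice_code": voice_code}
```

Passing the literal placeholder "YOUR_VOICE_CODE" raises ValueError, which surfaces a forgotten substitution locally instead of as a server-side error.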

Stream and play on the fly

/v1/tts returns standard WAV bytes, so you can pipe the response straight into a player instead of saving a file — handy for quick local testing:

curl -sN -X POST https://api.bithuman.ai/v1/tts \
  -H "api-secret: $BITHUMAN_API_SECRET" \
  -H "content-type: application/json" \
  -d '{"text": "Playing instantly.", "voice_code": "YOUR_VOICE_CODE"}' \
  | ffplay -autoexit -nodisp -i -
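
Since the response is a standard WAV, Python's built-in wave module can inspect it without extra dependencies. The sketch below assumes the audio is PCM-encoded, which is the encoding the wave module is designed to read; wav_info is a hypothetical helper name.

```python
import io
import wave


def wav_info(data):
    """Return (channels, sample_rate_hz, duration_seconds) for WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, frames / rate
```

With the earlier Python example, wav_info(resp.content) reports the clip's channel count, sample rate, and length, which is useful for sizing buffers or logging.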

For sentence-by-sentence streaming of length-prefixed PCM frames (lowest latency for long text), set "stream": true.

OpenAI-compatible endpoint

Already calling OpenAI’s TTS? Point existing clients at POST /v1/audio/speech — swap the base URL to https://api.bithuman.ai/v1 and the auth header to api-secret. See the API reference for the full schema.
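
As a rough illustration of the swap, the sketch below builds a request against /v1/audio/speech with the api-secret header. The body fields ("model", "input", "voice") are assumptions carried over from OpenAI's speech API, the "tts-1" model value is only a placeholder from an existing OpenAI client, and using a built-in voice id such as "F1" here is also an assumption; check the API reference for the actual schema. build_speech_request is a hypothetical helper name.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.bithuman.ai/v1"


def build_speech_request(text, voice="F1", api_secret=None):
    """Build (but do not send) a POST to the OpenAI-compatible endpoint."""
    body = {
        "model": "tts-1",  # ASSUMPTION: placeholder kept from an OpenAI client
        "input": text,     # ASSUMPTION: OpenAI-style field name
        "voice": voice,    # ASSUMPTION: built-in voice ids are accepted here
    }
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "api-secret": api_secret or os.environ["BITHUMAN_API_SECRET"],
            "content-type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a matter of urllib.request.urlopen(build_speech_request("Hello"), timeout=60) and writing the response bytes to a file.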

Errors

401 means a missing or invalid api-secret; 400 is a malformed body; 503 means the queue is briefly full — retry with backoff. See Errors.
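
One way to implement "retry with backoff" is an exponential delay around the call, retrying only on 503. This is a minimal sketch under stated assumptions: with_backoff is a hypothetical helper, the retry count and delays are illustrative rather than recommended values, and it expects the wrapped call to raise urllib.error.HTTPError on failure.

```python
import time
import urllib.error


def with_backoff(call, retries=4, base_delay=0.5, sleep=time.sleep):
    """Run call(); on HTTP 503, wait and retry with doubling delays."""
    for attempt in range(retries + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # 400 and 401 will not succeed on retry, so only 503 is retried.
            if err.code != 503 or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

If you use requests instead, the same loop applies: check resp.status_code == 503 before calling raise_for_status(), and sleep before the next attempt.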