

This page covers the Python SDK path (Essence on CPU, self-hosted). If you're building on a different platform, use the corresponding platform guide instead.

1. Get Credentials

1. Sign up: create an account at www.bithuman.ai.

2. Copy your API Secret: go to Developer → API Keys and copy your API Secret.

3. Download an avatar model: from the Explore page, click the menu on any agent and select Download to get an .imx model file.

2. Install

pip install bithuman --upgrade
The SDK includes opencv-python-headless automatically. Do not install opencv-python (full) separately — it conflicts with PyAV and causes FFmpeg warnings on macOS.
Source, changelog, and issue tracker for the bithuman PyPI package live at bithuman-python-sdk-public. The runtime source is private (signing material is baked in), but the public repo mirrors the README + changelog and is where to file bugs or feature requests.
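To confirm the package landed in the environment you are actually running, a quick standard-library check (no SDK import required) looks like this:

```python
from importlib import metadata

try:
    # queries pip's installed-package metadata for the current interpreter
    print("bithuman", metadata.version("bithuman"))
except metadata.PackageNotFoundError:
    print("bithuman is not installed here; run: pip install bithuman --upgrade")
```

If this prints the not-installed message while another shell can import bithuman, you have two Python environments in play (see Troubleshooting below).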

3. Run Your First Avatar

You need a .wav audio file to drive the avatar. A sample speech.wav is included in each example directory, or generate your own with any TTS service.

Option A — CLI (fastest, no coding)

export BITHUMAN_API_SECRET=your_api_secret
bithuman generate avatar.imx --audio speech.wav --output demo.mp4
Open demo.mp4 to see your avatar talking.
Don’t have a WAV yet? Grab the bundled sample in one line: curl -O https://raw.githubusercontent.com/bithuman-product/bithuman-examples/main/essence-selfhosted/speech.wav

Option B — Python

import asyncio, os
from bithuman import AsyncBithuman
from bithuman.audio import load_audio, float32_to_int16

async def main():
    runtime = await AsyncBithuman.create(
        model_path="avatar.imx",
        api_secret=os.environ["BITHUMAN_API_SECRET"],
    )
    await runtime.start()

    pcm, sr = load_audio("speech.wav")
    pcm = float32_to_int16(pcm)
    chunk = sr // 25                          # one chunk per video frame
    for i in range(0, len(pcm), chunk):
        await runtime.push_audio(pcm[i : i + chunk].tobytes(), sr)
    await runtime.flush()

    try:
        async for frame in runtime.run():
            if frame.has_image:
                image = frame.bgr_image        # numpy (H, W, 3) uint8
            if frame.end_of_speech:
                break
    finally:
        await runtime.stop()

asyncio.run(main())
Want to display the frames live? See Audio Clip for the OpenCV + speaker playback pattern, or the full working example on GitHub.
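The chunk size in the loop above is just the sample rate divided by the 25 FPS frame rate, so each push_audio call carries exactly one video frame's worth of audio. The arithmetic, with an assumed 16 kHz sample rate:

```python
SR = 16000          # assumed sample rate; check your WAV's actual rate
FPS = 25            # the runtime yields video at 25 FPS
chunk = SR // FPS   # samples per video frame

print(chunk)                       # 640 samples per chunk
print(1000 * chunk // SR, "ms")    # 40 ms of audio per video frame

# one second of silent 16-bit samples, chunked like the push_audio loop
pcm = [0] * SR
frames = [pcm[i:i + chunk] for i in range(0, len(pcm), chunk)]
print(len(frames))                 # 25 chunks -> one second of video
```

At 44.1 kHz the chunk would be 1764 samples instead; the division by 25 is what matters, not the absolute size.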

Key Concepts

| Concept | Description |
| --- | --- |
| Runtime | AsyncBithuman instance that processes audio into video |
| push_audio | Feed audio bytes — the avatar lip-syncs in real time |
| flush | Signals end of audio input |
| run() | Async generator that yields frames at 25 FPS |
| Frame | Contains .bgr_image (numpy), .audio_chunk, .end_of_speech |

Troubleshooting

Import fails with "No module named 'bithuman'"

The SDK is not installed. Run:
pip install bithuman --upgrade
Make sure you’re using the correct Python environment (virtualenv, conda, etc.).
Authentication fails

Your API secret is invalid or missing. Check:
  1. You copied the full secret from Developer → API Keys
  2. The api_secret parameter or BITHUMAN_API_SECRET env var is set correctly
  3. Your account is active with available credits
Quick test:
curl -X POST https://api.bithuman.ai/v1/validate \
  -H "api-secret: YOUR_SECRET"
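The same check from Python, using only the standard library. The endpoint and header are taken straight from the curl command above; the send is left commented out so you can run it deliberately:

```python
import os
import urllib.request

# builds the same request as the curl quick test above
secret = os.environ.get("BITHUMAN_API_SECRET", "YOUR_SECRET")
req = urllib.request.Request(
    "https://api.bithuman.ai/v1/validate",
    method="POST",
    headers={"api-secret": secret},
)

# uncomment to send; an invalid or missing secret raises urllib.error.HTTPError
# with urllib.request.urlopen(req) as resp:
#     print("validate returned HTTP", resp.status)
```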
The avatar doesn't animate

The avatar needs audio input to animate:
  1. Ensure you’re calling push_audio() with valid audio data
  2. Call flush() after pushing all audio
  3. Check that the audio is 16-bit PCM format (use float32_to_int16() helper)
  4. Verify audio sample rate matches the file (typically 16000 or 44100)
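For intuition on point 3, here is a hand-rolled sketch of what a float32-to-int16 conversion does: clamp samples to [-1.0, 1.0], then scale into the int16 range. Use the SDK's float32_to_int16 helper in real code; this stand-in is only illustrative:

```python
def float32_to_int16_sketch(samples):
    """Illustrative stand-in for the SDK's float32_to_int16 helper."""
    out = []
    for x in samples:
        x = max(-1.0, min(1.0, x))   # clamp out-of-range floats
        out.append(int(x * 32767))   # scale into int16 range
    return out

print(float32_to_int16_sketch([0.0, 0.5, -1.0, 2.0]))
# [0, 16383, -32767, 32767] — note 2.0 was clamped before scaling
```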
The first session is slow to start

This is normal for the first session — the .imx model takes time to load and initialize. Subsequent sessions in the same process start instantly. To reduce perceived latency, keep the runtime alive between sessions instead of recreating it.
Model file not found

The model file path is wrong. Check:
  1. The .imx file exists at the path you specified
  2. Use an absolute path if running from a different directory
  3. Download a model from the Explore page if you don’t have one

Next Steps

Audio Clip

Play audio file through avatar (5 min)

Live Microphone

Real-time mic input (10 min)

AI Conversation

OpenAI voice chat (15 min)
Or jump straight to the Docker App for a complete end-to-end setup.

Guides

System Requirements

  • Python 3.9 – 3.14
  • Essence (CPU): Linux (x86_64 / ARM64), macOS 13+ (Intel or Apple Silicon), or Windows 10+. 1–2 CPU cores, 4 GB RAM typical.
  • Expression on-device: macOS 14+ on Apple Silicon M3 or later, 16 GB RAM. Elsewhere, use the self-hosted GPU deployment on Linux + NVIDIA.
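A quick way to check your interpreter and platform against the list above (the 3.9–3.14 range is taken from it):

```python
import platform
import sys

print("Python:", sys.version_info[:2])
print("OS/arch:", platform.system(), platform.machine())

# supported range per the requirements above
in_range = (3, 9) <= sys.version_info[:2] <= (3, 14)
print("Python version in supported range:", in_range)
```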