Python — Hello, avatar

Get a real-time, on-device bitHuman avatar rendering with a core loop of under 20 lines of Python.

Prerequisites

pip install bithuman==2.3.0 soundfile
  • Runs on macOS arm64 (M3+), Linux x86_64, and Linux aarch64 — fully on-device.
  • Two input files the script reads by name:
    • avatar.imx — an avatar model. Run bithuman pull modern-court-jester once (the CLI caches it at ~/.cache/bithuman/showcase/modern-court-jester.imx), then copy it next to your script as avatar.imx. Or download one from Explore.
    • speech.wav — any speech clip from any TTS (ElevenLabs, OpenAI, your own recording, …).
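
A quick preflight check can catch a missing input file or an unset secret before the SDK is involved. The sketch below uses only the standard library. The preflight helper is hypothetical and not part of the bithuman package; it assumes the two file names and the BITHUMAN_API_SECRET variable used on this page.

```python
import os
from pathlib import Path

REQUIRED_FILES = ("avatar.imx", "speech.wav")  # names the script on this page reads
SECRET_VAR = "BITHUMAN_API_SECRET"             # variable the script reads at startup

def preflight(directory=".", env=None):
    """Return a list of human-readable problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_FILES:
        if not (Path(directory) / name).is_file():
            problems.append(f"missing file: {name}")
    if not env.get(SECRET_VAR):
        problems.append(f"environment variable not set: {SECRET_VAR}")
    return problems
```

Calling preflight() from the directory that holds hello.py and printing each returned string gives a short checklist of what is still missing.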

Run it

  1. Set your API secret in the same shell you’ll run from.

export BITHUMAN_API_SECRET=your_secret

  2. Save the Full code below as hello.py, with avatar.imx and speech.wav beside it.

  3. Run it.

python hello.py

What you’ll see

The program prints nothing and exits 0. It renders frames into frame.bgr_image but doesn’t display them — that’s the minimal loop, by design. To actually watch the avatar (OpenCV window + speaker playback), run the canonical example:

git clone https://github.com/bithuman-product/bithuman-sdk-public.git
cd bithuman-sdk-public/Examples/python/local-essence
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt      # incl. opencv — not bundled with the SDK
python quickstart.py \
  --model ~/.cache/bithuman/showcase/modern-court-jester.imx \
  --audio-file speech.wav            # this example dir ships a speech.wav

The streaming contract (push_audio → flush → run) is documented in full in Audio streaming.
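
The push side of that contract amounts to slicing PCM into short, fixed-duration chunks. As a minimal sketch of the arithmetic, assuming 16-bit mono PCM as in the script on this page (iter_chunks is a hypothetical helper, not an SDK function):

```python
def iter_chunks(pcm_bytes: bytes, sample_rate: int, chunk_ms: int = 10,
                bytes_per_sample: int = 2):
    """Yield consecutive chunk_ms-long slices of mono PCM; the last may be shorter."""
    samples_per_chunk = sample_rate * chunk_ms // 1000
    step = samples_per_chunk * bytes_per_sample
    if step <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm_bytes), step):
        yield pcm_bytes[start:start + step]

# One second of silence at 16 kHz, 16-bit mono
one_second = bytes(16000 * 2)
chunks = list(iter_chunks(one_second, 16000))
print(len(chunks), len(chunks[0]))  # 100 320
```

At 16 kHz a 10 ms chunk holds 160 samples, or 320 bytes, so one second of audio becomes 100 pushes followed by a single flush.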

Full code

# hello.py — minimal avatar render loop
import asyncio, os
import numpy as np
import soundfile as sf
from bithuman import AsyncBithuman

# bithuman 2.3 is library-only — the old bithuman.audio helpers were
# removed. Inline what we need: load a WAV, downmix to mono 16 kHz,
# convert float32 → int16 PCM.
def load_audio(path: str, target_sr: int = 16000) -> tuple[np.ndarray, int]:
    audio, sr = sf.read(path, dtype="float32", always_2d=False)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)
    if sr != target_sr:
        n = int(round(len(audio) * target_sr / sr))
        audio = np.interp(
            np.linspace(0, len(audio), n, endpoint=False),
            np.arange(len(audio)), audio,
        ).astype(np.float32)
        sr = target_sr
    return audio, sr

def float32_to_int16(arr: np.ndarray) -> np.ndarray:
    return (np.clip(arr, -1.0, 1.0) * 32767.0).astype(np.int16)

async def main():
    runtime = await AsyncBithuman.create(
        model_path="avatar.imx",
        api_secret=os.environ["BITHUMAN_API_SECRET"],
    )

    pcm, sr = load_audio("speech.wav")
    pcm = float32_to_int16(pcm)
    chunk = sr // 100  # 10 ms per chunk
    for i in range(0, len(pcm), chunk):
        await runtime.push_audio(
            pcm[i : i + chunk].tobytes(), sr, last_chunk=False,
        )
    await runtime.flush()

    try:
        async for frame in runtime.run():
            if frame.has_image:
                bgr = frame.bgr_image            # (H, W, 3) uint8 numpy array
                # Encode it, display it, push it to a video sink — your choice.
            if frame.end_of_speech:
                break
    finally:
        await runtime.stop()

asyncio.run(main())

Full source: GitHub
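
The loop above leaves bgr unused. One dependency-free way to confirm that frames are arriving is to dump a single frame to a binary PPM file, which many image viewers can open. This is an illustrative sketch, not part of the SDK: write_ppm is a hypothetical helper, and it assumes the frame is an (H, W, 3) uint8 array in BGR order, as the comment in the script states.

```python
def write_ppm(path, bgr_bytes: bytes, width: int, height: int) -> None:
    """Write raw BGR pixel bytes (row-major, 3 bytes per pixel) as a binary PPM (P6)."""
    if len(bgr_bytes) != width * height * 3:
        raise ValueError("buffer size does not match width * height * 3")
    rgb = bytearray(len(bgr_bytes))
    rgb[0::3] = bgr_bytes[2::3]  # R comes from the third byte of each BGR pixel
    rgb[1::3] = bgr_bytes[1::3]  # G stays in the middle
    rgb[2::3] = bgr_bytes[0::3]  # B comes from the first byte
    with open(path, "wb") as f:
        f.write(f"P6\n{width} {height}\n255\n".encode("ascii"))
        f.write(rgb)
```

Inside the render loop this could be called as h, w, _ = bgr.shape followed by write_ppm("frame.ppm", bgr.tobytes(), w, h), ideally for one frame only, since writing every frame to disk would slow the loop.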

Next steps

  • Audio streaming — the canonical push_audio / flush / run contract in depth.
  • Python SDK — full API surface, LiveKit agents, troubleshooting.
  • AI voice chat — OpenAI Realtime voice chat driving the avatar.