Python — Hello, avatar
Get a real-time, on-device bitHuman avatar rendering with a single short Python script.
Prerequisites
- A bitHuman API secret — get one at Developer → API Keys; see Authentication.
- Python 3.10–3.14 (use a virtualenv). Install the library:
pip install bithuman==2.3.0 soundfile
- Runs on macOS arm64 (M3+), Linux x86_64, and Linux aarch64 — fully on-device.
- Two input files the script reads by name:
  - avatar.imx: an avatar model. Run bithuman pull modern-court-jester once (the CLI caches it at ~/.cache/bithuman/showcase/modern-court-jester.imx), then copy it next to your script as avatar.imx. Or download one from Explore.
  - speech.wav: any speech clip from any TTS (ElevenLabs, OpenAI, your own recording, …).
Run it
- Set your API secret in the same shell you’ll run from.
export BITHUMAN_API_SECRET=your_secret
- Save the Full code below as hello.py, with avatar.imx and speech.wav beside it.
- Run it.
python hello.py
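A missing prerequisite can be easy to mistake for an SDK error. The sketch below is a small stdlib-only preflight check that you could run before hello.py. The names `preflight` and `REQUIRED_FILES` are hypothetical and introduced here for illustration; they are not part of the bithuman package. The version bounds simply mirror the 3.10 to 3.14 range listed under Prerequisites.

```python
import os
import sys
from pathlib import Path

# File names that hello.py opens by name (taken from the Prerequisites list).
REQUIRED_FILES = ("avatar.imx", "speech.wav")


def preflight(directory=".", env=None, version=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []
    # Assumption: the supported range is exactly the one the page states.
    if not ((3, 10) <= version <= (3, 14)):
        problems.append(f"Python {version[0]}.{version[1]} is outside 3.10-3.14")
    if not env.get("BITHUMAN_API_SECRET"):
        problems.append("BITHUMAN_API_SECRET is not set in this shell")
    for name in REQUIRED_FILES:
        if not (Path(directory) / name).is_file():
            problems.append(f"missing input file: {name}")
    return problems


if __name__ == "__main__":
    for issue in preflight():
        print("problem:", issue)
```

An empty result means the environment variable and both input files are in place. It does not validate the secret itself or the contents of the files.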
What you’ll see
The program prints nothing and exits 0. It renders frames into frame.bgr_image but doesn’t display them — that’s the minimal loop, by design. To actually watch the avatar (OpenCV window + speaker playback), run the canonical example:
git clone https://github.com/bithuman-product/bithuman-sdk-public.git
cd bithuman-sdk-public/Examples/python/local-essence
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt # incl. opencv — not bundled with the SDK
python quickstart.py \
--model ~/.cache/bithuman/showcase/modern-court-jester.imx \
--audio-file speech.wav # this example dir ships a speech.wav
The streaming contract — push_audio → flush → run — is documented in full in Audio streaming.
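The Full code below feeds push_audio in 10 ms chunks, and the arithmetic behind that is worth seeing once in isolation. The following is a minimal sketch that works on raw bytes only and makes no bitHuman calls. `iter_pcm_chunks` is a hypothetical helper name, and it assumes mono int16 PCM (2 bytes per sample), which is what hello.py produces.

```python
from typing import Iterator

BYTES_PER_SAMPLE = 2  # assumption: mono int16 PCM, as produced by hello.py


def iter_pcm_chunks(
    pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 10
) -> Iterator[bytes]:
    """Yield consecutive chunk_ms-long slices of raw PCM; the last may be shorter."""
    samples_per_chunk = sample_rate * chunk_ms // 1000
    if samples_per_chunk <= 0:
        raise ValueError("sample_rate * chunk_ms must cover at least one sample")
    step = samples_per_chunk * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]


# One second of silence at 16 kHz: 160 samples (320 bytes) per 10 ms chunk.
one_second = bytes(16000 * BYTES_PER_SAMPLE)
chunks = list(iter_pcm_chunks(one_second))
print(len(chunks), len(chunks[0]))  # 100 320
```

At 16 kHz, `sr // 100` in hello.py gives the same 160 samples per chunk. Each slice from this helper would be passed to push_audio in place of `pcm[i : i + chunk].tobytes()`.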
Full code
# hello.py: minimal avatar render loop
import asyncio, os
import numpy as np
import soundfile as sf
from bithuman import AsyncBithuman

# bithuman 2.3 is library-only; the old bithuman.audio helpers were
# removed. Inline what we need: load a WAV, downmix to mono 16 kHz,
# convert float32 → int16 PCM.
def load_audio(path: str, target_sr: int = 16000) -> tuple[np.ndarray, int]:
    audio, sr = sf.read(path, dtype="float32", always_2d=False)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)
    if sr != target_sr:
        n = int(round(len(audio) * target_sr / sr))
        audio = np.interp(
            np.linspace(0, len(audio), n, endpoint=False),
            np.arange(len(audio)), audio,
        ).astype(np.float32)
        sr = target_sr
    return audio, sr

def float32_to_int16(arr: np.ndarray) -> np.ndarray:
    return (np.clip(arr, -1.0, 1.0) * 32767.0).astype(np.int16)

async def main():
    runtime = await AsyncBithuman.create(
        model_path="avatar.imx",
        api_secret=os.environ["BITHUMAN_API_SECRET"],
    )
    pcm, sr = load_audio("speech.wav")
    pcm = float32_to_int16(pcm)
    chunk = sr // 100  # 10 ms per chunk
    for i in range(0, len(pcm), chunk):
        await runtime.push_audio(
            pcm[i : i + chunk].tobytes(), sr, last_chunk=False,
        )
    await runtime.flush()
    try:
        async for frame in runtime.run():
            if frame.has_image:
                bgr = frame.bgr_image  # (H, W, 3) uint8 numpy array
                # Encode it, display it, push it to a video sink; your choice.
            if frame.end_of_speech:
                break
    finally:
        await runtime.stop()

asyncio.run(main())
Full source: GitHub
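The loop in hello.py leaves frame.bgr_image unused. One dependency-free way to confirm that frames are real is to dump a single frame as a binary PPM image, which most image viewers can open. The sketch below assumes only what the code comment states, namely that the frame is an (H, W, 3) uint8 array in BGR order. `bgr_bytes_to_ppm` is a hypothetical helper name, not an SDK function.

```python
from pathlib import Path


def bgr_bytes_to_ppm(bgr: bytes, width: int, height: int) -> bytes:
    """Convert packed 8-bit BGR pixel data to a binary PPM (P6) file body."""
    expected = width * height * 3
    if len(bgr) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(bgr)}")
    rgb = bytearray(expected)
    rgb[0::3] = bgr[2::3]  # R comes from the third byte of each BGR pixel
    rgb[1::3] = bgr[1::3]  # G stays in the middle
    rgb[2::3] = bgr[0::3]  # B comes from the first byte
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(rgb)


# Tiny demo: a 2x1 image with one blue pixel and one red pixel, in BGR order.
demo = bgr_bytes_to_ppm(bytes([255, 0, 0, 0, 0, 255]), width=2, height=1)
Path("demo.ppm").write_bytes(demo)

# Inside the hello.py loop this could be used as:
#   h, w, _ = frame.bgr_image.shape
#   Path("frame.ppm").write_bytes(bgr_bytes_to_ppm(frame.bgr_image.tobytes(), w, h))
```

Writing one frame this way is enough for a sanity check. For live display, the canonical example with OpenCV remains the better route.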
Next steps
- Audio streaming: the canonical push_audio / flush / run contract in depth.
- Python SDK: full API surface, LiveKit agents, troubleshooting.
- AI voice chat: OpenAI Realtime voice chat driving the avatar.