What you get
Three inputs, one output. The `.imx` bundle is the single weights file; the portrait is the identity you pass at runtime.

Requirements: macOS 14+, Apple Silicon M3 or later (M5+ recommended for the smoothest experience), 16 GB RAM, ~5 GB free disk, Python 3.9–3.14. M1 / M2 / Intel machines fall back to the self-hosted GPU deployment.
Quick start (3 steps)
Install the SDK
The `bithuman` wheel for macOS arm64 ships `bithuman-expression-daemon` pre-built — no extra setup.

Get an API secret
From bithuman.ai → Developer → API Keys. Expression costs 2 credits / minute of rendered video.
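As a sanity check on that rate, the cost of a clip is simple arithmetic. A minimal sketch, assuming pro-rata per-second billing (the real meter may round up to whole minutes):

```python
# Sketch: estimate Expression credit cost for a rendered clip.
# The 2 credits / minute rate comes from the docs above; pro-rata
# per-second billing is an assumption on my part.
CREDITS_PER_MINUTE = 2

def render_cost(seconds: float) -> float:
    """Credits consumed for `seconds` of rendered video."""
    return CREDITS_PER_MINUTE * seconds / 60.0

print(render_cost(15))  # the 15 s demo clip -> 0.5 credits
```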
Run it
Zero-argument path — the CLI auto-downloads the ~3.7 GB demo `.imx` bundle on first run (cached to `~/.cache/bithuman/models/`) and renders a 15 s sample clip. Or pass your own portrait + audio.

Expect:
✓ Wrote demo.mp4 — N frames @ 25 FPS. Model load: ~10 s. First frame: < 1.3 s. Rendering: ~0.5× real time on M3, ~1× on M5.

Use a gallery portrait (instead of your own)
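Those numbers imply some quick arithmetic you can sanity-check a run against. A sketch, where the 25 FPS and real-time factors come from the text and the helper names are mine:

```python
# Sketch: what the quick-start numbers above imply for the 15 s demo clip.
FPS = 25  # frame rate reported by the CLI

def expected_frames(clip_seconds: float) -> int:
    """Frame count the CLI should report for a clip of this length."""
    return round(clip_seconds * FPS)

def render_wall_time(clip_seconds: float, realtime_factor: float) -> float:
    """Wall-clock seconds to render, given throughput as a multiple of
    real time (~0.5x on M3, ~1x on M5 per the docs)."""
    return clip_seconds / realtime_factor

print(expected_frames(15))        # 375 frames
print(render_wall_time(15, 0.5))  # ~30 s of rendering on an M3
```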
Every persona from the Halo app is a public portrait URL. Pass it directly to `--identity` — no download step; the CLI caches it under `~/.cache/bithuman/identities/` after first use.
Agent gallery
| Code | Name | Character |
|---|---|---|
| `A74NWD9723` | Energetic Audio Story Buddy | Podcast-host storyteller (Halo’s default) |
| `A91MJY5711` | Warm Relativity Mentor Einstein | Einstein reimagined as a curious mentor |
| `A22MCJ3461` | Late-Night Interview Host | Charming talk-show riff |
| `A32XFH3193` | Ethics Advisor | Boardroom-grade principled advisor |
| `A43XYD7624` | Stage Presence Coach | Stand-up comic + coach |
| `A24HAC6344` | Fairy-Tale Grandmother | Storytime narrator |
| `A02GXF3393` | Whimsical Bee Entertainer | Giggly bee mascot |
| `A37QAW0225` | Pirate Trivia Host | Captain Quizbeard |
| `A23WJF0199` | Wise Pup | Sir Barksworth the British dog |
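A small helper can check whether a persona's portrait is already in that cache before touching the network. A sketch: the cache directory matches the text above, but the hashed filename scheme is my assumption, not the SDK's actual layout.

```python
# Sketch: resolve a gallery portrait URL to a local cache path,
# mirroring the "--identity caches under ~/.cache/bithuman/identities/
# after first use" behavior described above. The sha256-based filename
# is an assumption, not the SDK's real naming scheme.
import hashlib
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "bithuman" / "identities"

def cached_identity_path(url: str) -> Path:
    """Deterministic local path for a given portrait URL."""
    name = hashlib.sha256(url.encode()).hexdigest()[:16] + ".jpg"
    return CACHE_DIR / name

def is_cached(url: str) -> bool:
    """True if the portrait has already been fetched once."""
    return cached_identity_path(url).exists()
```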
In Python
Same pipeline, from the SDK.

Change the face mid-session
| `identity=` value | Cost at load / swap |
|---|---|
| `None` (default) | 0 — uses the bundle’s baked-in face |
| `"portrait.jpg"` / `.png` | ~300 ms (encoder pass) |
| `"portrait.npy"` (cached) | instant |
Skip the encoder pass on repeat runs by saving the encoded identity as a `.npy` to disk and reusing it across sessions.
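That caching pattern can be sketched in a few lines. Assumptions are flagged in the comments: `encode_identity` is a hypothetical stand-in for whatever encoder call your integration exposes, and the serialized file here is generic rather than a true `.npy`.

```python
# Sketch: pre-encode a portrait once, then reuse the cached identity so
# later sessions hit the "instant" row of the table above instead of
# the ~300 ms encoder pass. `encode_identity` is a hypothetical
# callable (not a real SDK function); the real cache is a .npy, which
# this sketch stands in for with a generic serialized file.
import pickle
from pathlib import Path

def load_or_encode(portrait: Path, encode_identity):
    cached = portrait.with_suffix(".npy")
    if cached.exists():                        # instant path
        return pickle.loads(cached.read_bytes())
    latents = encode_identity(portrait)        # ~300 ms encoder pass
    cached.write_bytes(pickle.dumps(latents))  # reuse next session
    return latents
```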
How it works
One `pip install bithuman` dispatches to the right runtime automatically:
Inside the Swift subprocess: MLX on the GPU (speech encoder + diffusion animator), CoreML on the Neural Engine (face renderer). Python only shuffles PCM + frames over a framed stdio pipe.
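A framed stdio pipe of the kind described above can be sketched with length-prefixed messages. This is illustrative only: the 4-byte big-endian length prefix is my assumption, not the daemon's actual wire format.

```python
# Sketch of a framed stdio pipe: Python writes length-prefixed PCM
# chunks to the Swift subprocess and reads length-prefixed frames back.
# The 4-byte big-endian prefix is an assumed framing, not the daemon's
# real protocol.
import struct
from typing import BinaryIO

def write_frame(pipe: BinaryIO, payload: bytes) -> None:
    """Send one message: 4-byte length header, then the payload."""
    pipe.write(struct.pack(">I", len(payload)) + payload)
    pipe.flush()

def read_frame(pipe: BinaryIO) -> bytes:
    """Read one message; raises EOFError if the peer closed the pipe."""
    header = pipe.read(4)
    if len(header) < 4:
        raise EOFError("daemon closed the pipe")
    (length,) = struct.unpack(">I", header)
    return pipe.read(length)
```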
Performance contract
- Per-frame budget ≤ 40 ms (25 FPS enforced by the actor)
- First-frame latency ≤ 1.3 s (full receptive field) / ≤ 450 ms (partial window)
- Bounded memory — working set caps at ~4 GB during a burst; `shutdown()` releases it
- One model evaluation at a time — Halo uses `setLLMGenerating(true/false)` to keep the avatar from contending with the LLM for the GPU during generation
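The per-frame budget above can be pictured as a simple pacing loop. A sketch only, similar in spirit to what the actor enforces; `render_frame` is a hypothetical stand-in for one model evaluation.

```python
# Sketch: pace one model evaluation per 40 ms tick (25 FPS) and count
# frames that blow their budget. `render_frame` is a hypothetical
# stand-in for the daemon's per-frame work.
import time

FRAME_BUDGET = 1 / 25  # 40 ms per frame

def run(render_frame, n_frames: int) -> int:
    """Render n_frames at 25 FPS; return how many missed the budget."""
    overruns = 0
    next_deadline = time.monotonic()
    for _ in range(n_frames):
        start = time.monotonic()
        render_frame()
        if time.monotonic() - start > FRAME_BUDGET:
            overruns += 1  # this frame took longer than 40 ms
        next_deadline += FRAME_BUDGET
        time.sleep(max(0.0, next_deadline - time.monotonic()))
    return overruns
```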
Troubleshooting
| Symptom | Cause + fix |
|---|---|
| `ExpressionModelNotSupported` on an M1 / M2 Mac | The animator requires M3-or-later memory bandwidth. Use an M3+ Mac, or the self-hosted GPU deployment on Linux + NVIDIA. |
| `ExpressionModelNotSupported` on Intel / Linux / Windows | No local path — use the self-hosted GPU deployment. |
| `pre-encoded identity spatial dim N ≠ pipeline dim M` | The cached `.npy` was encoded for a different renderer resolution than the one in your `.imx`. Re-encode from the source portrait. |
| First-frame latency > 1.3 s | Usually another MLX workload (LLM, another avatar) is contending for the GPU. Serialize with `setLLMGenerating(true/false)` while the LLM is generating. |
| Stuttering lip-sync during long replies on M3 / M4 | Requires `bithuman` ≥ 1.10.6 + `BithumanAvatar` 0.6.2 (or newer) — they ship the concurrent-MLX fix. |
Advanced
Build your own Expression bundle from raw weights
The `bithuman pack` CLI packs raw animator + encoder + renderer weights into an `.imx`. This is for model authors, not consumers — if you’re here to integrate Expression into an app, you don’t need it.

Clone the full example repo
Next steps
PyPI: bithuman
`pip install bithuman` — SDK reference, changelog, release history.

bitHuman Halo (consumer app)
The free desktop companion built on this exact pipeline.
Self-hosted GPU (Linux + NVIDIA)
Same Expression animator, CUDA backend — for hosts without Apple Silicon.
AI voice agent (LiveKit)
Wire this pipeline into LiveKit + an LLM + TTS for a full voice agent.
