Essence vs Expression
The two bitHuman avatar models — what each does, where each runs, and which one to pick.
At a glance
bitHuman ships two avatar models. Both share the same .imx file format, the same SDK methods, and the same runtime loop: push audio in, drain video frames out. Essence is the default: it runs on virtually every modern CPU, and the showcase avatars that bithuman pull downloads are Essence models. Expression is the heavier, higher-fidelity option for on-device Apple Silicon or GPU-server use cases.
| | Essence (default) | Expression |
|---|---|---|
| What it does | Pre-built avatar identity packaged in an .imx file. Real-time lip-sync. | Dynamic facial animation from any portrait image at runtime. |
| Avatar source | .imx you build once from a photo or video. | Any face image — provide at runtime, no build step. |
| Custom gestures | Yes (wave, nod, laugh, etc.) | No |
| Idle animation | Pre-recorded natural movement | AI-generated micro-movements |
| Compute needed | Any modern CPU | Apple Silicon M3+ (demo apps) or NVIDIA GPU |
| Memory footprint | Low (~200–500 MB) | Higher (~2–6 GB) |
| Best for | Kiosks, mobile, edge, 24/7 deployments, high concurrency | Close-up native consumer apps, custom faces per session |
| Pricing | 1 credit/min self-hosted · 2 credits/min cloud | 2 credits/min self-hosted · 4 credits/min cloud |
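The pricing row above can be turned into a quick estimate. This is a minimal sketch, assuming credits scale linearly with session minutes and stream count; the function name and the linear model are illustrative assumptions, and the Pricing page remains the authority on how usage is actually metered and rounded.

```python
# Credits per minute, copied from the pricing row of the comparison table.
CREDITS_PER_MINUTE = {
    ("essence", "self-hosted"): 1,
    ("essence", "cloud"): 2,
    ("expression", "self-hosted"): 2,
    ("expression", "cloud"): 4,
}


def estimate_credits(model: str, hosting: str, minutes: float, streams: int = 1) -> float:
    """Estimate credits for `streams` concurrent sessions of `minutes` each.

    Assumption: billing is linear in minutes with no rounding or minimums.
    """
    rate = CREDITS_PER_MINUTE[(model, hosting)]
    return rate * minutes * streams


if __name__ == "__main__":
    # One self-hosted Essence kiosk running for 8 hours.
    print(estimate_credits("essence", "self-hosted", minutes=8 * 60))
    # Ten 5-minute cloud Expression sessions.
    print(estimate_credits("expression", "cloud", minutes=5, streams=10))
```

Under these assumptions, the same workload costs twice as much on Expression as on Essence, and twice as much in the cloud as self-hosted.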
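Both models are available across the integration surfaces (SDKs, REST API, LiveKit plugin, CLI, on-device, and embed widget), subject to the per-platform support listed under "Where each model runs". The same .imx file works everywhere.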
Where each model runs
| Surface | Essence | Expression |
|---|---|---|
| iOS / iPadOS | iPhone 17 Pro+, iPad Pro M4+ | iPad Pro M4+ only |
| macOS arm64 | Any Apple Silicon | M3+ |
| macOS Intel | Pending (2.3 ships arm64 only) | — |
| Android | arm64-v8a, Android 10+ | — |
| Linux x86_64 / aarch64 | Any modern CPU | via NVIDIA GPU (Docker) |
| Windows | Pending (use WSL2 today) | — |
| Raspberry Pi 4B+ | Supported | — |
| bitHuman Cloud | Managed | Managed |
| Self-hosted CPU | Python SDK / LiveKit plugin | — |
| Self-hosted GPU | — | Docker container |
Native macOS-Intel and Windows wheels are pending for the 2.3 line; the architecture page tracks per-platform shipping status. iPhone Expression is not currently supported — use Essence on iPhone.
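The matrix above can be encoded as a small lookup for deployment scripts or pre-flight checks. This is a sketch only: the surface keys and helper names are hypothetical, the values are transcribed from the table, and "pending" platforms are treated as unavailable today.

```python
from typing import List, Optional

# Support matrix transcribed from the table above.
# None means "not available today" (unsupported or listed as pending).
SUPPORT = {
    "ios_ipados":   {"essence": "iPhone 17 Pro+, iPad Pro M4+", "expression": "iPad Pro M4+ only"},
    "macos_arm64":  {"essence": "Any Apple Silicon", "expression": "M3+"},
    "macos_intel":  {"essence": None, "expression": None},  # Essence pending
    "android":      {"essence": "arm64-v8a, Android 10+", "expression": None},
    "linux":        {"essence": "Any modern CPU", "expression": "NVIDIA GPU (Docker)"},
    "windows":      {"essence": None, "expression": None},  # Essence pending, WSL2 today
    "raspberry_pi": {"essence": "Supported (4B+)", "expression": None},
    "cloud":        {"essence": "Managed", "expression": "Managed"},
}


def requirement(surface: str, model: str) -> Optional[str]:
    """Return the hardware note for a surface/model pair, or None if unavailable."""
    return SUPPORT[surface][model]


def models_for(surface: str) -> List[str]:
    """List the models that can run on a surface today."""
    return [model for model, note in SUPPORT[surface].items() if note is not None]


if __name__ == "__main__":
    for surface in SUPPORT:
        print(surface, "->", models_for(surface) or "none yet")
```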
Essence
Essence packages a complete avatar identity (face, body, gestures) into an .imx file. At runtime, the SDK plays back pre-rendered base motion and patches the mouth region in real time to match incoming audio.
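The playback-and-patch idea can be illustrated with a toy model of the push audio, drain frames loop. This is not the bitHuman SDK: the class, its method names, the 16 kHz sample rate, and the amplitude-based mouth value are all assumptions chosen to keep the sketch self-contained. A real runtime derives the mouth region from a lip-sync model rather than from peak amplitude.

```python
from collections import deque
from dataclasses import dataclass
from typing import List

FPS = 25                                 # frame rate quoted in this page
SAMPLE_RATE = 16_000                     # assumption: 16 kHz mono PCM
SAMPLES_PER_FRAME = SAMPLE_RATE // FPS   # 640 samples of audio per video frame


@dataclass
class Frame:
    base_index: int    # which pre-rendered base-motion frame is shown
    mouth_open: float  # 0.0 to 1.0, stand-in for the patched mouth region


class ToyAvatar:
    """Hypothetical stand-in that loops base motion and patches the mouth."""

    def __init__(self, base_frame_count: int) -> None:
        self.base_frame_count = base_frame_count
        self._audio: deque = deque()
        self._cursor = 0

    def push_audio(self, samples: List[int]) -> None:
        """Queue 16-bit PCM samples for later rendering."""
        self._audio.extend(samples)

    def drain_frames(self) -> List[Frame]:
        """Emit one frame for every full 1/25 s of buffered audio."""
        frames = []
        while len(self._audio) >= SAMPLES_PER_FRAME:
            chunk = [self._audio.popleft() for _ in range(SAMPLES_PER_FRAME)]
            peak = max(abs(s) for s in chunk)
            mouth = min(1.0, peak / 32767)
            frames.append(Frame(self._cursor % self.base_frame_count, mouth))
            self._cursor += 1
        return frames


if __name__ == "__main__":
    avatar = ToyAvatar(base_frame_count=10)
    avatar.push_audio([0] * SAMPLE_RATE)  # one second of silence
    print(len(avatar.drain_frames()))     # one frame per 640 samples
```

The point of the shape is that audio and video stay decoupled: the caller pushes audio whenever it arrives, and frames are produced only once enough audio has accumulated to cover a frame interval.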
Runtime characteristics
- ~200–500 MB resident, 1–2 CPU cores, real-time at 25 FPS.
- Runs on macOS arm64, Linux x86_64 / aarch64, iOS, iPadOS, Android, Raspberry Pi 4B+, and in the browser via WASM.
- No idle timeout — sessions can run 24/7. Reliable for unattended kiosks and lobby displays.
- Supports custom gestures (wave, nod, laugh) triggered by keywords or API.
- Predictable, consistent behavior. Lower per-stream cost — the right pick for high-concurrency self-hosted deployments.
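The per-stream figures above lend themselves to a rough capacity estimate for self-hosted Essence. The sketch below assumes streams scale linearly and that the quoted ranges (1–2 cores, ~200–500 MB) apply per stream; the function name, the reserved-RAM default, and the linear model are illustrative assumptions, so measure on real hardware before committing to a number.

```python
def max_streams(
    cpu_cores: int,
    ram_mb: int,
    cores_per_stream: float = 2,   # conservative end of the 1-2 core range
    mb_per_stream: int = 500,      # conservative end of the ~200-500 MB range
    reserved_ram_mb: int = 1024,   # assumption: headroom for the OS and other services
) -> int:
    """Estimate concurrent Essence streams a host can sustain."""
    by_cpu = int(cpu_cores // cores_per_stream)
    by_ram = max(0, ram_mb - reserved_ram_mb) // mb_per_stream
    return max(0, min(by_cpu, by_ram))


# At 25 FPS each frame must be produced within this many milliseconds.
FRAME_BUDGET_MS = 1000 / 25

if __name__ == "__main__":
    # 16 cores, 32 GB RAM, conservative per-stream figures.
    print(max_streams(16, 32 * 1024))
    # Same host, optimistic per-stream figures.
    print(max_streams(16, 32 * 1024, cores_per_stream=1, mb_per_stream=200))
    print(FRAME_BUDGET_MS)
```

With these figures CPU is the binding constraint on the example host, which is typical when memory per stream is small relative to total RAM.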
Try it from the showcase
The CLI ships a curated set of ready-to-run Essence .imx avatars:
bithuman list # browse the showcase
bithuman pull modern-court-jester # downloads to ~/.cache/bithuman/showcase/<slug>.imx
bithuman run modern-court-jester.imx # live browser-served avatar
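To check from a script whether a showcase avatar has already been pulled, the cache location in the comment above can be reconstructed with pathlib. This is a sketch: the helper name is hypothetical, and whether the CLI resolves a bare filename against this cache directory is not stated here, so treat the path as informational.

```python
from pathlib import Path
from typing import Optional


def showcase_path(slug: str, cache_root: Optional[Path] = None) -> Path:
    """Return where `bithuman pull <slug>` is documented to store the avatar.

    Assumption: the layout is ~/.cache/bithuman/showcase/<slug>.imx, as shown
    in the CLI comment; `cache_root` is a hypothetical override for testing.
    """
    root = cache_root or Path.home() / ".cache" / "bithuman" / "showcase"
    return root / f"{slug}.imx"


if __name__ == "__main__":
    path = showcase_path("modern-court-jester")
    status = "already pulled" if path.exists() else "not pulled yet"
    print(f"{path}: {status}")
```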
How to ship it
- Python SDK — self-host on macOS arm64 + Linux x86_64 / aarch64.
- Swift SDK — native Mac, iPad, iPhone apps.
- Kotlin SDK — native Android apps (Beta).
- bitHuman CLI — no code, terminal or browser.
- REST API — backend integration in any language.
- Cloud LiveKit plugin — managed, no infrastructure.
- Embed widget — drop-in iframe for websites.
Expression
Expression generates real-time facial animation directly from a portrait image. The face can change between sessions or even mid-session — no avatar build step is required.
Runtime characteristics
- ~2–6 GB resident; needs Apple Silicon M3+ (Mac) / M4+ (iPad Pro) or an NVIDIA GPU (8 GB+ VRAM).
- Works with any face image — drag-and-drop swap, photo, video frame, anything.
- AI-driven expressions adapt to speech content and emotional context.
- Higher visual fidelity for close-up conversational interactions.
- On-device demo apps target macOS M3+ and iPad Pro M4+. iPhone Expression and macOS-Intel are not currently supported.
- On Apple Silicon the Swift SDK auto-spawns a bithuman-expression-daemon subprocess to drive the model.
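The hardware requirements listed above can be expressed as a pre-flight check. This is a minimal sketch under stated assumptions: the Host type, its field names, and the kind strings are hypothetical, and the thresholds (M3+ on Mac, M4+ on iPad Pro, 8 GB+ VRAM on NVIDIA) are taken from the list. It does not detect hardware; the caller supplies the values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    kind: str                                  # "mac", "ipad_pro", "iphone", "nvidia_gpu", "cpu_only"
    apple_m_generation: Optional[int] = None   # 3 for M3, 4 for M4; None for Intel or non-Apple
    vram_gb: Optional[float] = None            # NVIDIA GPU memory, if any


def can_run_expression(host: Host) -> bool:
    """Apply the documented Expression minimums to a described host."""
    if host.kind == "mac":
        return (host.apple_m_generation or 0) >= 3
    if host.kind == "ipad_pro":
        return (host.apple_m_generation or 0) >= 4
    if host.kind == "nvidia_gpu":
        return (host.vram_gb or 0) >= 8
    # iPhone, Intel Mac, and CPU-only hosts are not supported for Expression.
    return False


if __name__ == "__main__":
    for host in (
        Host("mac", apple_m_generation=3),
        Host("mac"),                       # Intel Mac: no M-series generation
        Host("ipad_pro", apple_m_generation=2),
        Host("nvidia_gpu", vram_gb=12),
        Host("iphone", apple_m_generation=None),
    ):
        print(host.kind, can_run_expression(host))
```

A host that fails this check can still run Essence wherever the platform matrix allows it.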
How to ship it
- Cloud LiveKit plugin — bitHuman hosts the GPU worker (set model="expression").
- Self-hosted GPU — your own NVIDIA GPU via the Docker container.
- On-device macOS / iPadOS — Apple Silicon M3+, via the Swift SDK.
- bitHuman CLI — bithuman run with an Expression .imx.
- REST API — same endpoint as Essence; the model is selected per agent.
Which should I use?
24/7 kiosk or always-on display
Essence. No idle timeout, runs on CPU, predictable for unattended deployments.
iPhone app
Essence. Expression on iPhone isn’t currently supported — iPad and Mac are the on-device Expression hosts.
Android app
Essence via the Kotlin SDK (Beta).
Native Mac or iPad app with close-up dynamic faces
Expression on-device via the Swift SDK or the Mac/iPad reference apps.
Need custom gestures (wave, nod, laugh)
Essence. Expression doesn’t support gesture triggers.
Quickest setup with any face photo
Expression via the cloud plugin. Pass the image at session start — no build step.
Voice agent on LiveKit with maximum concurrency
Essence. Lower per-stream cost makes it the right pick for high-concurrency deployments.
Edge hardware (Raspberry Pi, low-power laptop)
Essence. Runs on 1–2 CPU cores at 25 FPS.
Highest visual quality for offline video generation
Expression with quality="high". Best for offline batch jobs rather than real-time streaming.
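The recommendations above can be condensed into a small decision helper. This is a sketch of one reasonable ordering, not an official selection algorithm: the function, its parameter names, and the platform strings are hypothetical, and when requirements conflict (for example gestures plus runtime face swaps) it falls back to Essence because only Essence supports gestures.

```python
# Platforms where this page recommends or requires Essence.
ESSENCE_ONLY_PLATFORMS = {"iphone", "android", "raspberry_pi", "cpu_server"}


def pick_model(
    platform: str,
    needs_gestures: bool = False,
    always_on: bool = False,
    high_concurrency: bool = False,
    runtime_face_swap: bool = False,
) -> str:
    """Return "essence" or "expression" following the guidance above."""
    if platform in ESSENCE_ONLY_PLATFORMS:
        return "essence"
    if needs_gestures or always_on or high_concurrency:
        return "essence"
    if runtime_face_swap:
        return "expression"
    return "essence"  # Essence is the documented default


if __name__ == "__main__":
    print(pick_model("iphone", runtime_face_swap=True))
    print(pick_model("cloud", runtime_face_swap=True))
    print(pick_model("mac", needs_gestures=True, runtime_face_swap=True))
```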
Where to go next
- Quickstart — get your first avatar running in ~2 minutes.
- Architecture — engine layering and the full per-platform device matrix.
- Pricing — credits, tiers, and what’s metered.
- Avatars and the .imx format — how avatars are packaged.