Essence vs Expression

The two bitHuman avatar models — what each does, where each runs, and which one to pick.

At a glance

bitHuman ships two avatar models. Both share the same .imx file format, the same SDK methods, and the same runtime loop: push audio in, drain video frames out. Essence is the default: it runs on virtually every CPU and powers the showcase avatars that bithuman pull downloads. Expression is the heavier, high-fidelity option for specific use cases on Apple Silicon devices or GPU servers.

| | Essence (default) | Expression |
|---|---|---|
| What it does | Pre-built avatar identity packaged in an .imx file. Real-time lip-sync. | Dynamic facial animation from any portrait image at runtime. |
| Avatar source | .imx you build once from a photo or video. | Any face image, provided at runtime; no build step. |
| Custom gestures | Yes (wave, nod, laugh, etc.) | No |
| Idle animation | Pre-recorded natural movement | AI-generated micro-movements |
| Compute needed | Any modern CPU | Apple Silicon M3+ (demo apps) or NVIDIA GPU |
| Memory footprint | Low (~200–500 MB) | Higher (~2–6 GB) |
| Best for | Kiosks, mobile, edge, 24/7 deployments, high concurrency | Close-up native consumer apps, custom faces per session |
| Pricing | 1 credit/min self-hosted · 2 credits/min cloud | 2 credits/min self-hosted · 4 credits/min cloud |

Both ship to every surface — SDKs, REST API, LiveKit plugin, CLI, on-device, embed widget. The same .imx file works everywhere.
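
To make the pricing figures concrete, here is a small sketch that converts session minutes into credits using the rates quoted above. The dictionary and function names are illustrative and not part of any bitHuman SDK, and the sketch assumes billing is linear in minutes, which this page does not state (actual rounding rules may differ).

```python
# Credits per minute, copied from the pricing figures above.
# These names are illustrative; they are not bitHuman SDK identifiers.
CREDITS_PER_MINUTE = {
    ("essence", "self-hosted"): 1,
    ("essence", "cloud"): 2,
    ("expression", "self-hosted"): 2,
    ("expression", "cloud"): 4,
}


def estimate_credits(model: str, hosting: str, minutes: float) -> float:
    """Estimate credits for a session, assuming linear per-minute billing."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    try:
        rate = CREDITS_PER_MINUTE[(model, hosting)]
    except KeyError:
        raise ValueError(
            f"unknown model/hosting pair: {model!r}, {hosting!r}"
        ) from None
    return rate * minutes
```

Under these assumptions, estimate_credits("expression", "cloud", 30) comes to 120 credits, while the same 30 minutes of self-hosted Essence comes to 30.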

Where each model runs

| Surface | Essence | Expression |
|---|---|---|
| iOS / iPadOS | iPhone 17 Pro+, iPad Pro M4+ | iPad Pro M4+ only |
| macOS arm64 | Any Apple Silicon | M3+ |
| macOS Intel | Pending (2.3 ships arm64 only) | |
| Android | arm64-v8a, Android 10+ | |
| Linux x86_64 / aarch64 | Any modern CPU | via NVIDIA GPU (Docker) |
| Windows | Pending (use WSL2 today) | |
| Raspberry Pi 4B+ | Supported | |
| bitHuman Cloud | Managed | Managed |
| Self-hosted CPU | Python SDK / LiveKit plugin | |
| Self-hosted GPU | | Docker container |

Native macOS-Intel and Windows wheels are pending for the 2.3 line; the architecture page tracks per-platform shipping status. iPhone Expression is not currently supported — use Essence on iPhone.

Essence

Essence packages a complete avatar identity (face, body, gestures) into an .imx file. At runtime, the SDK plays back pre-rendered base motion and patches the mouth region in real time to match incoming audio.
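
The push-audio, drain-frames loop that both models share can be sketched with a stand-in class. Everything in this block is hypothetical: ToyAvatarRuntime, push_audio, and drain_frames are placeholder names rather than the real SDK methods, and the 16 kHz sample rate is an assumption chosen so that one 25 FPS frame lines up with 640 audio samples.

```python
from collections import deque

SAMPLE_RATE = 16_000                    # assumed input rate (not from the docs)
FPS = 25                                # frame rate quoted for Essence
SAMPLES_PER_FRAME = SAMPLE_RATE // FPS  # 640 audio samples per video frame


class ToyAvatarRuntime:
    """Hypothetical stand-in that mimics a push-audio / drain-frames loop."""

    def __init__(self) -> None:
        self._pending = deque()  # audio samples not yet paired with a frame
        self._frame_index = 0

    def push_audio(self, samples: list[int]) -> None:
        """Queue PCM samples; no rendering happens here."""
        self._pending.extend(samples)

    def drain_frames(self) -> list[dict]:
        """Return one frame record per full audio window queued so far."""
        frames = []
        while len(self._pending) >= SAMPLES_PER_FRAME:
            window = [self._pending.popleft() for _ in range(SAMPLES_PER_FRAME)]
            # A real runtime would patch the mouth region here; this toy only
            # records which frame index the audio window is paired with.
            frames.append({"index": self._frame_index, "audio": window})
            self._frame_index += 1
        return frames


runtime = ToyAvatarRuntime()
runtime.push_audio([0] * 1000)   # less than two frames' worth of audio
first = runtime.drain_frames()   # one full 640-sample window is available
runtime.push_audio([0] * 1000)   # 360 left over + 1000 new = 1360 samples
second = runtime.drain_frames()  # two more full windows
```

The point of the shape is that audio arrival and frame consumption are decoupled: the caller pushes audio whenever it has some and drains whatever frames are ready, and leftover samples wait for the next push.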

Runtime characteristics

  • ~200–500 MB resident, 1–2 CPU cores, real-time at 25 FPS.
  • Runs on macOS arm64, Linux x86_64 / aarch64, iOS, iPadOS, Android, Raspberry Pi 4B+, and in the browser via WASM.
  • No idle timeout — sessions can run 24/7. Reliable for unattended kiosks and lobby displays.
  • Supports custom gestures (wave, nod, laugh) triggered by keywords or API.
  • Predictable, consistent behavior. Lower per-stream cost — the right pick for high-concurrency self-hosted deployments.
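
The footprint figures above support a rough capacity estimate for self-hosted Essence. The sketch below assumes the quoted memory and core numbers apply per concurrent stream and scale linearly, which this page does not confirm; the function name and the worst-case defaults (500 MB, 2 cores) are illustrative.

```python
def max_essence_streams(ram_mb: int, cpu_cores: int,
                        mb_per_stream: int = 500,
                        cores_per_stream: int = 2) -> int:
    """Conservative stream count, limited by whichever resource runs out first.

    Assumes per-stream memory and CPU needs add up linearly (an assumption,
    not a documented guarantee).
    """
    if mb_per_stream <= 0 or cores_per_stream <= 0:
        raise ValueError("per-stream requirements must be positive")
    by_memory = ram_mb // mb_per_stream
    by_cpu = cpu_cores // cores_per_stream
    return max(0, min(by_memory, by_cpu))
```

With the worst-case defaults, a 16 GB, 8-core host comes out CPU-bound at 4 streams. Treat the result as a starting point for load testing rather than a guarantee.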

Try it from the showcase

The CLI ships a curated set of ready-to-run Essence .imx avatars:

bithuman list                          # browse the showcase
bithuman pull modern-court-jester      # downloads to ~/.cache/bithuman/showcase/<slug>.imx
bithuman run modern-court-jester.imx   # live browser-served avatar

Expression

Expression generates real-time facial animation directly from a portrait image. The face can change between sessions or even mid-session — no avatar build step is required.

Runtime characteristics

  • ~2–6 GB resident; needs Apple Silicon M3+ (Mac) / M4+ (iPad Pro) or an NVIDIA GPU (8 GB+ VRAM).
  • Works with any face image: a photo, a video frame, or a drag-and-drop swap at runtime.
  • AI-driven expressions adapt to speech content and emotional context.
  • Higher visual fidelity for close-up conversational interactions.
  • On-device demo apps target macOS M3+ and iPad Pro M4+. iPhone Expression and macOS-Intel are not currently supported.
  • On Apple Silicon the Swift SDK auto-spawns a bithuman-expression-daemon subprocess to drive the model.
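
The hardware requirements listed above can be expressed as a simple eligibility check. This is a sketch only: the Host dataclass and its fields are hypothetical, and real capability detection would need platform-specific probing that is out of scope here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    """Hypothetical description of a target machine."""
    kind: str                                 # "mac", "ipad_pro", "iphone", "server"
    apple_m_generation: Optional[int] = None  # 3 for M3, 4 for M4, ...
    nvidia_vram_gb: Optional[float] = None    # None when no NVIDIA GPU is present


def can_run_expression(host: Host) -> bool:
    """Apply the stated requirements: M3+ Mac, M4+ iPad Pro, or 8 GB+ NVIDIA GPU."""
    if host.nvidia_vram_gb is not None and host.nvidia_vram_gb >= 8:
        return True
    gen = host.apple_m_generation
    if gen is None:
        return False
    if host.kind == "mac":
        return gen >= 3
    if host.kind == "ipad_pro":
        return gen >= 4
    return False  # iPhone and anything else: fall back to Essence
```

A False result here does not mean the device is unsupported overall; it means Essence is the model to use on that host.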

Which should I use?

24/7 kiosk or always-on display

Essence. No idle timeout, runs on CPU, predictable for unattended deployments.

iPhone app

Essence. Expression on iPhone isn’t currently supported — iPad and Mac are the on-device Expression hosts.

Android app

Essence via the Kotlin SDK (Beta).

Native Mac or iPad app with close-up dynamic faces

Expression on-device via the Swift SDK or the Mac/iPad reference apps.

Need custom gestures (wave, nod, laugh)

Essence. Expression doesn’t support gesture triggers.

Quickest setup with any face photo

Expression via the cloud plugin. Pass the image at session start — no build step.

Voice agent on LiveKit with maximum concurrency

Essence. Lower per-stream cost makes it the right pick for high-concurrency deployments.

Edge hardware (Raspberry Pi, low-power laptop)

Essence. Runs on 1–2 CPU cores at 25 FPS.

Highest visual quality for offline video generation

Expression with quality="high". Best for offline batch jobs rather than real-time streaming.
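
The recommendations in this section can be condensed into a small decision function. This is one possible reading of the guidance, not an official selector: the function and parameter names are made up for illustration, and the order of checks (Essence-only cases first, Expression preferences second) is a design choice of this sketch.

```python
def pick_model(*, platform: str = "server",
               needs_gestures: bool = False,
               always_on: bool = False,
               high_concurrency: bool = False,
               face_changes_per_session: bool = False,
               close_up_fidelity: bool = False) -> str:
    """Return "essence" or "expression" following the guidance in this section."""
    # Cases where the guidance points to Essence regardless of preferences.
    if needs_gestures or always_on or high_concurrency:
        return "essence"
    if platform in {"iphone", "android", "raspberry-pi"}:
        return "essence"
    # Preferences that favor Expression when the platform allows it.
    if face_changes_per_session or close_up_fidelity:
        return "expression"
    return "essence"  # Essence is the documented default
```

For instance, pick_model(platform="iphone", close_up_fidelity=True) still returns "essence", because the platform check runs before any fidelity preference is considered.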
