bitHuman offers two avatar models. Pick the one that fits your use case.

At a Glance

| | Essence | Expression |
|---|---|---|
| What it does | Pre-recorded animations with real-time lip-sync | AI-generated facial expressions from any photo |
| Input | `.imx` model file (generated from your photo/video) | Any face image |
| Runs on | CPU (any device) | GPU (cloud or NVIDIA) or Apple Silicon (M3+) |
| Lip-sync | Yes | Yes |
| Custom gestures | Yes (wave, nod, laugh, etc.) | No |
| Idle animations | Pre-recorded (natural movement loops) | AI-generated micro-movements |
| Session timeout | None (runs 24/7) | 10 min idle (GPU protection) |
| Best for | Kiosks, always-on displays, consistent branding | Dynamic conversations, custom faces, quick prototyping |

Essence

The Essence model uses pre-built avatar animations with real-time lip-sync. You create an agent on bithuman.ai from a photo or video, which generates an `.imx` model file containing idle loops, talking animations, and gesture videos.

Strengths:
  • Runs on CPU only — no GPU required. Works on Raspberry Pi, laptops, edge devices.
  • Supports custom gestures (wave, nod, laugh) triggered by keywords or API.
  • No idle timeout — sessions run indefinitely. Ideal for 24/7 museum kiosks and lobby displays.
  • Consistent, predictable avatar behavior with smooth transitions between states.

Expression

The Expression model generates real-time facial expressions using a GPU-accelerated diffusion pipeline. It takes any face image and produces dynamic lip-sync, eye movement, and emotional expressions, with no pre-built model file needed.

Strengths:
  • Works with any face image — no avatar generation step required.
  • AI-driven expressions respond to speech content and context.
  • Higher visual fidelity for close-up conversational interactions.
Deployment options:
  • Cloud Plugin — bitHuman hosts the GPU worker (add model="expression")
  • Self-Hosted GPU — run the Docker container on your own NVIDIA GPU
  • macOS Local — run natively on Apple Silicon M3+ via the Swift SDK

Which Should I Use?

  • Need a 24/7 kiosk or always-on display? Use Essence. It has no idle timeout and runs on CPU, making it reliable for unattended deployments.
  • Want to animate an arbitrary photo right away? Use Expression with the Cloud Plugin: pass avatar_image=Image.open("face.jpg") and skip the generation step.
  • Need custom gestures (wave, nod, laugh)? Use Essence. Expression does not support gesture triggers.
  • Building a general conversational agent? Either model works. Start with Essence (the default) for reliability, and switch to Expression when you need dynamic faces.
  • Running on low-power or edge hardware? Use Essence. It runs at 25 FPS on 1-2 CPU cores.
  • Producing pre-rendered video rather than a live stream? Use Expression with quality="high". It is best suited to offline video generation, not real-time streaming.
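The guidance above can be condensed into a small chooser function. This is an illustrative sketch of the decision logic only; the function and its parameter names are not part of the bitHuman SDK.

```python
def choose_model(needs_gestures: bool = False,
                 unattended_24_7: bool = False,
                 custom_photo: bool = False,
                 offline_video: bool = False) -> str:
    """Map the decision guide onto a model name (illustrative only)."""
    # Gesture triggers and no-timeout operation are Essence-only features.
    if needs_gestures or unattended_24_7:
        return "essence"
    # Arbitrary photos and offline high-quality rendering need Expression.
    if custom_photo or offline_video:
        return "expression"
    # Default recommendation from the docs: start with Essence.
    return "essence"

print(choose_model(custom_photo=True))  # expression
```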