Documentation Index
Fetch the complete documentation index at: https://docs.bithuman.ai/llms.txt
Use this file to discover all available pages before exploring further.
The big picture
A bitHuman avatar is a virtual character that moves its lips, face, and body in real time based on audio input. The sections below cover the key concepts and walk through what happens when someone talks to an avatar.

Key concepts
Avatar model — Essence (.imx) vs Expression
bitHuman ships two avatar models.

Essence uses a pre-built .imx model file. Build it once from a photo or video on bithuman.ai, then run it anywhere — CPU only, no GPU. Supports gestures, animal mode, and full body. Ideal for kiosks and 24/7 displays.

Expression generates facial animation from any face image at runtime — no .imx step. Needs an NVIDIA GPU (server-side) or Apple Silicon M3+ (on-device). Ideal for dynamic faces and consumer apps.

Side-by-side comparison: Essence vs Expression →
LiveKit room (for real-time conversation)
A room is a virtual meeting space where participants communicate in real time using audio and video — similar to a Zoom or Google Meet call.

In a bitHuman session, the room typically has:
- Your user — the person talking to the avatar
- An AI agent — handles conversation logic (speech-to-text, AI response, text-to-speech)
- The avatar — renders animated video frames based on the agent’s speech
Avatar Session (LiveKit integration)
An AvatarSession is the integration point that connects your AI agent to a bitHuman avatar inside a LiveKit room.

When you create an AvatarSession, bitHuman:
- Loads the avatar model (cloud or local)
- Joins the LiveKit room as a participant
- Listens for audio from your AI agent
- Generates animated video frames in real time
- Publishes the video back to the room
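For example, in a LiveKit agent the construction step can look like the minimal sketch below. It only uses names that appear on this page (bithuman.AvatarSession, avatar_id, model_path, avatar.start); treat the import path and keyword arguments as assumptions and check the quickstart for the current signature.

```python
import os

# Import path assumed from the LiveKit plugin packaging; verify against the quickstart.
from livekit.plugins import bithuman

# Cloud mode: reference a pre-built avatar by ID and authenticate with your API secret.
avatar = bithuman.AvatarSession(
    avatar_id="YOUR_AVATAR_ID",                    # placeholder ID from your bitHuman dashboard
    api_secret=os.environ["BITHUMAN_API_SECRET"],  # or rely on the env var being set
)

# Self-hosted Essence instead: point at a local .imx model file.
# avatar = bithuman.AvatarSession(model_path="/models/my_avatar.imx")
```

Starting it is then the single call shown later on this page: await avatar.start(session, room=ctx.room).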
API secret
Your API secret authenticates your application with bitHuman services. Create one at Developer → API Keys.

It’s used for:
- Verifying your identity
- Tracking usage and billing (2 credits/min for Expression, 1–2 cr/min for Essence; see pricing)
- Downloading cloud avatar models
The Swift SDK uses VoiceChatConfig.apiKey or the BITHUMAN_API_KEY env var; the Python SDK and REST API use the api-secret header or BITHUMAN_API_SECRET env var.
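For a backend calling the REST API directly, authentication is just that header. A minimal sketch follows; the endpoint path below is a placeholder, not a documented route, so consult the REST API pages for the real ones under api.bithuman.ai/v1/...

```python
import os

import requests

API_SECRET = os.environ["BITHUMAN_API_SECRET"]

# Placeholder endpoint for illustration only; see the REST API reference
# for the actual routes under api.bithuman.ai/v1/...
resp = requests.get(
    "https://api.bithuman.ai/v1/<your-endpoint>",
    headers={"api-secret": API_SECRET},  # header name as described above
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```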
The full matrix

bitHuman’s two models map onto three runtime surfaces. Find your row, then follow the links.

| | bitHuman Cloud | Self-hosted server | On-device (Apple Silicon) |
|---|---|---|---|
| Essence (CPU, .imx) | Cloud Plugin — avatar_id + API secret | Python SDK — pip install bithuman, run .imx locally | — |
| Expression (GPU / M3+) | Cloud Plugin — avatar_id or any face image, model="expression" | Docker container — Linux + NVIDIA, dynamic face from any image | Swift SDK + bithuman-cli — Mac/iPad/iPhone, all inference on-device |
| Direct REST control (any model) | REST API — api.bithuman.ai/v1/... for agent generation, speak, dynamics, embed tokens | Same REST API works against your hosted agents | n/a |
Which approach should I use?
Start here:
- Building a Mac/iPad/iPhone app? → Apple Silicon Swift SDK — runs on the user’s device, no infra.
- Building a website or web app? → Cloud Plugin — fastest, scales for you.
- Need a 24/7 kiosk on a tiny CPU box? → Self-Hosted CPU (Essence) — no GPU, no idle timeout.
- Need dynamic faces + on-prem privacy? → Self-Hosted GPU (Expression) — Docker on your NVIDIA box.
- Just calling endpoints from a backend? → REST API — curl against api.bithuman.ai, any language.
Side-by-side
| | Cloud Plugin | Self-Hosted CPU | Self-Hosted GPU | Apple Silicon Swift SDK |
|---|---|---|---|---|
| Setup time | ~2 min | ~5 min | ~10 min | ~10 min |
| Compute | bitHuman cloud | Your CPU | Your GPU (8 GB+ VRAM) | User’s Apple Silicon GPU + Neural Engine |
| Network | Cloud round-trip per turn | None after auth | None after auth | Heartbeat only (avatar mode); none in audio-only mode |
| Avatar source | Pre-built agent ID, or face image (Expression) | .imx model file | Any face image | .bhx weights bundle + bundled / drag-dropped portraits |
| Where it runs | Server | Server | Server | End-user’s Mac / iPad / iPhone |
| Models supported | Essence, Expression | Essence | Expression | Expression |
| Best for | Web apps, quick demos, scaling | Edge, offline, privacy | Dynamic faces, high volume | Native consumer apps, privacy-strict verticals |
Four ways to use bitHuman
Cloud Plugin
Easiest. Avatar runs on bitHuman’s servers. No model files to manage. Provide an Agent ID and API secret. Works with both Essence and Expression.

Best for: getting started quickly, web apps, and production deployments.
Self-Hosted CPU (Essence)
Most private (server-side). Avatar runs on your machine. Download an .imx model and run locally. Works offline after auth. Python SDK (pip install bithuman).

Best for: privacy-sensitive backends, edge servers, kiosks.

Self-Hosted GPU (Expression)
Most flexible (server-side). GPU container on your infrastructure. Linux + NVIDIA Docker image. Use any face image to create avatars on the fly. No pre-built models needed.

Best for: dynamic avatars, high volume, full infrastructure control.
Apple Silicon Swift SDK
Most user-private. Avatar runs on the end-user’s device.
bitHumanKit Swift Package for Mac/iPad/iPhone. Drop in voice + a lip-synced avatar; all inference runs locally. Ships with bithuman-cli (Homebrew) and three reference apps.

Best for: native consumer / prosumer apps, offline-first products, privacy-strict verticals.

How the avatar joins a room
This describes the three server-side surfaces (Cloud, Self-hosted CPU, Self-hosted GPU). The Apple Silicon Swift SDK doesn’t use LiveKit — see the Swift SDK quickstart for that flow instead.

Your agent connects to a LiveKit room
Your AI agent (the code you write) connects to a LiveKit room and waits for a user to join. This is where the conversation will happen.
You create an AvatarSession
In your agent code, you create a bithuman.AvatarSession with either a cloud avatar_id or a local model_path. This tells bitHuman which avatar to use.

The avatar session starts
When you call avatar.start(session, room=ctx.room), bitHuman:
- Cloud mode: Sends a request to bitHuman’s servers, which launch an avatar worker that joins your room
- Self-hosted mode: Loads the .imx model (Essence) or hits your GPU container (Expression) and starts generating frames
The avatar appears in the room
The avatar joins the LiveKit room as a video participant. Users in the room see the avatar’s video feed — a lifelike face that moves and speaks.
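Put together, a server-side agent entrypoint roughly follows the shape below. This is a hedged sketch, not the canonical quickstart code: it assumes the LiveKit Agents Python framework plus the bitHuman and OpenAI plugins, and the avatar ID, instructions, and model choice are placeholders.

```python
import os

from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import bithuman, openai  # plugin imports assumed; see the quickstart


async def entrypoint(ctx: agents.JobContext):
    # 1. Your agent connects to the LiveKit room and waits for a user.
    await ctx.connect()

    # 2. Conversation logic (placeholder realtime model standing in for STT -> LLM -> TTS).
    session = AgentSession(llm=openai.realtime.RealtimeModel())

    # 3. Create the AvatarSession (cloud avatar_id shown; model_path works for local Essence).
    avatar = bithuman.AvatarSession(
        avatar_id="YOUR_AVATAR_ID",
        api_secret=os.environ["BITHUMAN_API_SECRET"],
    )

    # 4. Start the avatar: it joins the room as a video participant and
    #    animates whatever the agent says. Depending on the plugin version you may
    #    also need to route the agent's audio output through the avatar; see the quickstart.
    await avatar.start(session, room=ctx.room)

    # 5. Start the agent itself in the same room.
    await session.start(agent=Agent(instructions="You are a helpful avatar."), room=ctx.room)


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```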
Visual flow
What you need
| Component | What it is | Where to get it |
|---|---|---|
| API secret / API key | Authenticates your app | Developer → API Keys |
| Avatar source | Essence: .imx model. Expression: any face image (server) or bundled portrait (Swift) | Explore page or your own photo/video |
| LiveKit server (server-side flows only) | Real-time communication | LiveKit Cloud (free tier) or self-hosted |
| AI agent (server-side flows) | Conversation logic | Your code + an LLM (OpenAI, Anthropic, etc.) — the Swift SDK includes an on-device LLM |
Next steps
Quickstart (Python)
Get an avatar running in 5 minutes
Quickstart (Swift)
On-device voice + avatar in 10 minutes
Avatar Sessions
Cloud, CPU, GPU — every mode with code
Examples
Working examples for every surface
