
The big picture

A bitHuman avatar is a virtual character that moves its lips, face, and body in real time based on audio input. Here’s what happens when someone talks to an avatar:
  1. You speak into a microphone
  2. Audio is sent to an AI agent (e.g. ChatGPT)
  3. The AI generates a text response
  4. Text is converted to speech (TTS)
  5. bitHuman animates the avatar's face to match the speech
  6. You see a lifelike avatar talking back to you

All of this happens fast enough for a natural conversation.

Key concepts

bitHuman ships two avatar models.
  • Essence uses a pre-built .imx model file. Build it once from a photo or video on bithuman.ai, then run it anywhere — CPU only, no GPU. Supports gestures, animal mode, and full body. Ideal for kiosks and 24/7 displays.
  • Expression generates facial animation from any face image at runtime — no .imx step. Needs an NVIDIA GPU (server-side) or Apple Silicon M3+ (on-device). Ideal for dynamic faces and consumer apps.
Side-by-side comparison: Essence vs Expression →.
A room is a virtual meeting space where participants communicate in real time using audio and video — similar to a Zoom or Google Meet call. In a bitHuman session, the room typically has:
  • Your user — the person talking to the avatar
  • An AI agent — handles conversation logic (speech-to-text, AI response, text-to-speech)
  • The avatar — renders animated video frames based on the agent’s speech
LiveKit is the open-source platform that powers this real-time communication. You don’t need to understand LiveKit deeply — bitHuman handles the complex parts.
Note: real-time rooms are only used for interactive conversation flows. Batch video generation and the on-device Swift SDK don’t use LiveKit at all.
An AvatarSession is the integration point that connects your AI agent to a bitHuman avatar inside a LiveKit room. When you create an AvatarSession, bitHuman:
  1. Loads the avatar model (cloud or local)
  2. Joins the LiveKit room as a participant
  3. Listens for audio from your AI agent
  4. Generates animated video frames in real time
  5. Publishes the video back to the room
It takes only a few lines of code — the session handles everything else. See deployment/avatar-sessions.
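A minimal sketch of that integration point, built only from the calls named on this page (bithuman.AvatarSession and avatar.start(session, room=ctx.room)); the import path, the wrapper function, and the avatar ID are assumptions/placeholders:

```python
from livekit.plugins import bithuman  # assumed import path for the bitHuman plugin


async def attach_avatar(session, ctx):
    """Attach a bitHuman avatar to an existing agent session and LiveKit room."""
    # Reference a cloud avatar by ID (placeholder value); for self-hosted Essence,
    # pass a local model_path to a downloaded .imx file instead.
    avatar = bithuman.AvatarSession(avatar_id="your-avatar-id")

    # bitHuman loads the model, joins the room, listens for the agent's audio,
    # and publishes the animated video frames back to the room.
    await avatar.start(session, room=ctx.room)
```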
Your API secret authenticates your application with bitHuman services. Create one at Developer → API Keys. It’s used for:
  • Verifying your identity
  • Tracking usage and billing (2 credits/min for Expression, 1–2 credits/min for Essence; see pricing)
  • Downloading cloud avatar models
The Swift SDK reads it from VoiceChatConfig.apiKey or the BITHUMAN_API_KEY env var; the Python SDK and REST API use the api-secret header or BITHUMAN_API_SECRET env var.
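For server-side calls, a helper along these lines would attach the secret; only the api-secret header name, the BITHUMAN_API_SECRET variable, and the api.bithuman.ai host come from this page. The helper itself and the request shape are illustrative assumptions, and the actual routes live in the REST API reference:

```python
import os

import requests


def bithuman_get(path: str) -> requests.Response:
    """Call a bitHuman REST endpoint, authenticating with the api-secret header.

    `path` is whichever documented route you need (routes are not reproduced here).
    """
    response = requests.get(
        f"https://api.bithuman.ai{path}",
        headers={"api-secret": os.environ["BITHUMAN_API_SECRET"]},
        timeout=10,
    )
    response.raise_for_status()
    return response
```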

The full matrix

bitHuman’s two models map onto three runtime surfaces. Find your row, then follow the links.
| | bitHuman Cloud | Self-hosted server | On-device (Apple Silicon) |
| --- | --- | --- | --- |
| Essence (CPU, .imx) | Cloud Plugin: avatar_id + API secret | Python SDK: pip install bithuman, run .imx locally | |
| Expression (GPU / M3+) | Cloud Plugin: avatar_id or any face image, model="expression" | Docker container — Linux + NVIDIA, dynamic face from any image | Swift SDK + bithuman-cli — Mac/iPad/iPhone, all inference on-device |
| Direct REST control (any model) | REST API: api.bithuman.ai/v1/... for agent generation, speak, dynamics, embed tokens | Same REST API works against your hosted agents | n/a |

Which approach should I use?

Start with the side-by-side comparison below.

Side-by-side

| | Cloud Plugin | Self-Hosted CPU | Self-Hosted GPU | Apple Silicon Swift SDK |
| --- | --- | --- | --- | --- |
| Setup time | ~2 min | ~5 min | ~10 min | ~10 min |
| Compute | bitHuman cloud | Your CPU | Your GPU (8 GB+ VRAM) | User’s Apple Silicon GPU + Neural Engine |
| Network | Cloud round-trip per turn | None after auth | None after auth | Heartbeat only (avatar mode); none in audio-only mode |
| Avatar source | Pre-built agent ID, or face image (Expression) | .imx model file | Any face image | .bhx weights bundle + bundled / drag-dropped portraits |
| Where it runs | Server | Server | Server | End-user’s Mac / iPad / iPhone |
| Models supported | Essence, Expression | Essence | Expression | Expression |
| Best for | Web apps, quick demos, scaling | Edge, offline, privacy | Dynamic faces, high volume | Native consumer apps, privacy-strict verticals |

Four ways to use bitHuman

Cloud Plugin

Easiest. Avatar runs on bitHuman’s servers. No model files to manage. Provide an Agent ID and API secret. Works with both Essence and Expression.
Best for: getting started quickly, web apps, and production deployments.

Self-Hosted CPU (Essence)

Most private (server-side). Avatar runs on your machine. Download an .imx model and run locally. Works offline after auth. Python SDK (pip install bithuman).
Best for: privacy-sensitive backends, edge servers, kiosks.

Self-Hosted GPU (Expression)

Most flexible (server-side). GPU container on your infrastructure. Linux + NVIDIA Docker image. Use any face image to create avatars on the fly. No pre-built models needed.
Best for: dynamic avatars, high volume, full infrastructure control.

Apple Silicon Swift SDK

Most user-private. Avatar runs on the end-user’s device. bitHumanKit Swift Package for Mac/iPad/iPhone. Drop in voice + a lip-synced avatar; all inference runs locally. Ships with bithuman-cli (Homebrew) and three reference apps.
Best for: native consumer / prosumer apps, offline-first products, privacy-strict verticals.

How the avatar joins a room

This describes the three server-side surfaces (Cloud, Self-hosted CPU, Self-hosted GPU). The Apple Silicon Swift SDK doesn’t use LiveKit — see Swift SDK quickstart for that flow instead.
1. Your agent connects to a LiveKit room

Your AI agent (the code you write) connects to a LiveKit room and waits for a user to join. This is where the conversation will happen.
2. You create an AvatarSession

In your agent code, you create a bithuman.AvatarSession with either a cloud avatar_id or a local model_path. This tells bitHuman which avatar to use.
3. The avatar session starts

When you call avatar.start(session, room=ctx.room), bitHuman:
  • Cloud mode: Sends a request to bitHuman’s servers, which launch an avatar worker that joins your room
  • Self-hosted mode: Loads the .imx model (Essence) or hits your GPU container (Expression) and starts generating frames
4. The avatar appears in the room

The avatar joins the LiveKit room as a video participant. Users in the room see the avatar’s video feed — a lifelike face that moves and speaks.
5. Real-time conversation begins

As your AI agent produces speech audio, the avatar animates in real time (a consolidated code sketch follows this list):
  • Audio from TTS flows to the avatar
  • The avatar lip-syncs and generates video frames at 25 FPS
  • Video is published to the room for all participants to see
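Putting the five steps together, here is a rough end-to-end sketch. It assumes the LiveKit Agents Python framework on the agent side (JobContext, AgentSession, the worker CLI) and uses an OpenAI realtime model purely as a stand-in for your conversation logic; only bithuman.AvatarSession, the avatar_id / model_path options, and avatar.start(session, room=ctx.room) come from this page, so treat everything else as an adaptable assumption rather than bitHuman's prescribed setup:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import bithuman, openai  # assumed plugin import paths


async def entrypoint(ctx: agents.JobContext):
    # Step 1: your agent connects to the LiveKit room and waits for the user.
    await ctx.connect()

    # Illustrative conversation logic: any STT -> LLM -> TTS pipeline that
    # produces speech audio works; the OpenAI realtime model is a placeholder.
    session = AgentSession(llm=openai.realtime.RealtimeModel())

    # Step 2: choose an avatar -- a cloud avatar_id as shown here, or
    # model_path="/path/to/model.imx" for a self-hosted Essence model.
    avatar = bithuman.AvatarSession(avatar_id="your-avatar-id")

    # Step 3: start the avatar. In cloud mode bitHuman launches a worker that
    # joins the room; in self-hosted mode frames are generated locally.
    await avatar.start(session, room=ctx.room)

    # Steps 4-5: the avatar is now a video participant; as the agent speaks,
    # it lip-syncs and publishes video (~25 FPS) to everyone in the room.
    await session.start(agent=Agent(instructions="You are a friendly assistant."), room=ctx.room)


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```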

Visual flow

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Your User  │     │  AI Agent    │     │   Avatar     │
│  (browser)   │     │  (your code) │     │  (bitHuman)  │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       │   User speaks      │                    │
       │ ──────────────────>│                    │
       │                    │                    │
       │    AI processes    │                    │
       │    & responds      │                    │
       │                    │  TTS audio         │
       │                    │ ──────────────────>│
       │                    │                    │
       │                    │  Animated video    │
       │<───────────────────│<───────────────────│
       │                    │                    │
       │  User sees avatar  │                    │
       │  speaking          │                    │
       └────────────────────┴────────────────────┘
                    LiveKit Room
For the on-device Swift flow, every box above lives inside the user’s app — speech recognition, LLM, TTS, and lip-sync all run on Apple Silicon.

What you need

| Component | What it is | Where to get it |
| --- | --- | --- |
| API secret / API key | Authenticates your app | Developer → API Keys |
| Avatar source | Essence: .imx model. Expression: any face image (server) or bundled portrait (Swift) | Explore page or your own photo/video |
| LiveKit server (server-side flows only) | Real-time communication | LiveKit Cloud (free tier) or self-hosted |
| AI agent (server-side flows) | Conversation logic | Your code + an LLM (OpenAI, Anthropic, etc.) — the Swift SDK includes an on-device LLM |

Next steps

  • Quickstart (Python): get an avatar running in 5 minutes
  • Quickstart (Swift): on-device voice + avatar in 10 minutes
  • Avatar Sessions: cloud, CPU, GPU — every mode with code
  • Examples: working examples for every surface