The Big Picture
A bitHuman avatar is a virtual character that moves its lips, face, and body in real time based on audio input. Here’s what happens when someone talks to an avatar:

Key Concepts
Avatar Model (.imx file)
An .imx file is a pre-built avatar model. It contains everything needed to animate a specific character: face data, lip-sync mappings, and appearance information.

Think of it like a “character file” in a video game: it defines what the avatar looks like and how it moves.

You can create your own avatar from any photo or video at bithuman.ai, or download models from the Explore page.
LiveKit Room
A room is a virtual meeting space where participants communicate in real time using audio and video, similar to a Zoom or Google Meet call.

In a bitHuman session, the room typically has:
- Your user — the person talking to the avatar
- An AI agent — handles conversation logic (speech-to-text, AI response, text-to-speech)
- The avatar — renders animated video frames based on the agent’s speech
Avatar Session
An AvatarSession is the main integration point. It connects your AI agent to a bitHuman avatar inside a LiveKit room.

When you create an AvatarSession, bitHuman:

- Loads the avatar model (cloud or local)
- Joins the LiveKit room as a participant
- Listens for audio from your AI agent
- Generates animated video frames in real-time
- Publishes the video back to the room
API Secret
Your API secret is the key that authenticates your application with bitHuman services. You can create one from Developer → API Keys.

It’s used for:
- Verifying your identity
- Tracking usage and billing
- Downloading cloud avatar models
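Because the secret ties directly to billing, keep it out of source code. One common pattern is an environment variable; the variable name below is an assumption, so check your SDK’s documentation for the exact name it reads:

```shell
# Store the secret in the environment rather than in code.
# BITHUMAN_API_SECRET is an assumed name; verify against your SDK docs.
export BITHUMAN_API_SECRET="your-secret-here"   # from Developer → API Keys
```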
Which Approach Should I Use?
Start here:

- No GPU? → Use Cloud Plugin (easiest) or Self-Hosted CPU (most private)
- Have a GPU? → Use Self-Hosted GPU for dynamic face images without pre-built models
- Want the fastest setup? → Cloud Plugin — just an API secret and agent ID
- Need privacy? → Self-Hosted CPU — audio never leaves your machine
| | Cloud Plugin | Self-Hosted CPU | Self-Hosted GPU |
|---|---|---|---|
| Setup time | ~2 min | ~5 min | ~10 min |
| GPU required | No | No | Yes (8 GB+ VRAM) |
| Privacy | Audio sent to cloud | Audio stays local | Audio stays local |
| Avatar source | Pre-built agent ID | .imx model file | Any face image |
| Best for | Web apps, quick demos | Edge, offline, privacy | Dynamic faces, high volume |
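The decision points above can be sketched as a tiny helper. The function and its return labels are purely illustrative (not part of any SDK); they just encode the table’s guidance:

```python
# Encode the "Which Approach Should I Use?" guidance as a function.
def recommend_approach(has_gpu: bool, needs_privacy: bool) -> str:
    """Pick a bitHuman deployment approach from two key questions."""
    if has_gpu:
        # A GPU (8 GB+ VRAM) allows dynamic face images
        # without pre-built .imx models.
        return "self-hosted-gpu"
    if needs_privacy:
        # No GPU, but audio must stay on your machine.
        return "self-hosted-cpu"
    # Fastest setup: just an API secret and agent ID.
    return "cloud-plugin"

print(recommend_approach(has_gpu=False, needs_privacy=False))  # → cloud-plugin
```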
Three Ways to Use bitHuman
Choose the approach that fits your project:

Cloud Plugin
Easiest. Avatar runs on bitHuman’s servers. No model files to manage; just provide an Agent ID and API secret.

Best for: getting started quickly, web apps, and production deployments.
Self-Hosted (CPU)
Most private. Avatar runs on your machine. Download an .imx model and run it locally; it works offline after setup.

Best for: privacy-sensitive apps, edge devices, custom deployments.

Self-Hosted GPU
Most flexible. GPU container on your infrastructure. Use any face image to create avatars on the fly; no pre-built models needed.

Best for: dynamic avatars, high-volume deployments, full infrastructure control.
How the Avatar Joins a Room
Here’s what happens step-by-step when an avatar session starts:

Your agent connects to a LiveKit room
Your AI agent (the code you write) connects to a LiveKit room and waits for a user to join. This is where the conversation will happen.
You create an AvatarSession
In your agent code, you create a bithuman.AvatarSession with either a cloud avatar_id or a local model_path. This tells bitHuman which avatar to use.

The avatar session starts
When you call avatar.start(session, room=ctx.room), bitHuman:

- Cloud mode: Sends a request to bitHuman’s servers, which launch an avatar worker that joins your room
- Self-hosted mode: Loads the .imx model locally and starts generating frames
The avatar appears in the room
The avatar joins the LiveKit room as a video participant. Users in the room see the avatar’s video feed — a lifelike face that moves and speaks.
Visual Flow

User speaks → AI agent (speech-to-text → AI response → text-to-speech) → bitHuman avatar (audio → animated video frames) → video published to the LiveKit room
What You Need
| Component | What it is | Where to get it |
|---|---|---|
| API Secret | Authenticates your app | Developer → API Keys |
| Avatar Model | The character to animate | Explore page or create your own |
| LiveKit Server | Real-time communication | LiveKit Cloud (free tier) or self-hosted |
| AI Agent | Conversation logic | Your code + an LLM (OpenAI, Anthropic, etc.) |
