
The Big Picture

A bitHuman avatar is a virtual character that moves its lips, face, and body in real-time based on audio input. Here’s what happens when someone talks to an avatar:
1. You speak into a microphone
2. Audio is sent to an AI agent (like ChatGPT)
3. The AI generates a text response
4. Text is converted to speech (TTS)
5. bitHuman animates the avatar's face to match the speech
6. You see a lifelike avatar talking back to you

All of this happens in real-time — fast enough for a natural conversation.
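The round-trip above can be sketched as a chain of stages. This is an illustrative sketch with stub functions, not bitHuman's actual API — in a real deployment each stage is handled by your STT, LLM, and TTS providers and by the bitHuman renderer:

```python
# Illustrative pipeline sketch (stub functions, not the real service APIs).

def speech_to_text(audio: bytes) -> str:
    """Stand-in for an STT service transcribing the user's audio."""
    return "Hello, avatar!"

def generate_response(prompt: str) -> str:
    """Stand-in for an LLM (e.g. ChatGPT) producing a text reply."""
    return f"You said: {prompt}"

def text_to_speech(text: str) -> bytes:
    """Stand-in for a TTS service synthesizing speech audio."""
    return text.encode("utf-8")

def animate_avatar(speech_audio: bytes) -> list:
    """Stand-in for bitHuman turning speech audio into video frames."""
    return [f"frame_{i}" for i in range(3)]

def handle_user_turn(mic_audio: bytes) -> list:
    """One conversational turn: microphone audio in, animated frames out."""
    transcript = speech_to_text(mic_audio)
    reply = generate_response(transcript)
    speech = text_to_speech(reply)
    return animate_avatar(speech)

frames = handle_user_turn(b"...")
```

In the real system these stages run concurrently and stream, rather than completing one turn at a time.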

Key Concepts

An .imx file is a pre-built avatar model. It contains everything needed to animate a specific character: face data, lip-sync mappings, and appearance information. Think of it like a “character file” in a video game — it defines what the avatar looks like and how it moves. You can create your own avatar from any photo or video at bithuman.ai, or download models from the Explore page.
A room is a virtual meeting space where participants communicate in real-time using audio and video — similar to a Zoom or Google Meet call. In a bitHuman session, the room typically has:
  • Your user — the person talking to the avatar
  • An AI agent — handles conversation logic (speech-to-text, AI response, text-to-speech)
  • The avatar — renders animated video frames based on the agent’s speech
LiveKit is the open-source platform that powers this real-time communication. You don’t need to understand LiveKit deeply — bitHuman handles the complex parts.
An AvatarSession is the main integration point. It connects your AI agent to a bitHuman avatar inside a LiveKit room. When you create an AvatarSession, bitHuman:
  1. Loads the avatar model (cloud or local)
  2. Joins the LiveKit room as a participant
  3. Listens for audio from your AI agent
  4. Generates animated video frames in real-time
  5. Publishes the video back to the room
You interact with just a few lines of code — the session handles everything else.
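In code, the integration point looks roughly like the sketch below. The import paths and `AgentSession` wiring are assumptions based on the LiveKit Agents plugin layout — only `bithuman.AvatarSession`, `avatar_id`, `model_path`, and `avatar.start(session, room=ctx.room)` come from this page, so verify the rest against your SDK version:

```python
# Sketch of wiring an AvatarSession into a LiveKit agent.
# Import paths and AgentSession setup are assumptions -- check your SDK docs.
from livekit import agents
from livekit.plugins import bithuman

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    # Your agent session: STT -> LLM -> TTS (providers are your choice).
    session = agents.AgentSession(...)  # configure with your chosen providers

    # The avatar: either a cloud avatar_id or a local .imx model_path.
    avatar = bithuman.AvatarSession(
        avatar_id="your-agent-id",         # cloud mode
        # model_path="/path/to/model.imx", # or self-hosted mode
    )

    # Joins the room, listens to the agent's TTS audio, publishes video.
    await avatar.start(session, room=ctx.room)
```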
Your API secret is the key that authenticates your application with bitHuman services. You can create one from Developer → API Keys. It’s used for:
  • Verifying your identity
  • Tracking usage and billing
  • Downloading cloud avatar models
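A common pattern is to keep the secret out of source code and load it from the environment at startup, failing fast if it is missing. The variable name BITHUMAN_API_SECRET below is an assumption for illustration, not a documented name:

```python
import os

def load_api_secret(env: dict) -> str:
    """Read the bitHuman API secret from the environment.

    The variable name BITHUMAN_API_SECRET is illustrative only --
    use whatever name your deployment convention dictates.
    """
    secret = env.get("BITHUMAN_API_SECRET", "").strip()
    if not secret:
        raise RuntimeError(
            "BITHUMAN_API_SECRET is not set; "
            "create a key under Developer -> API Keys"
        )
    return secret

# Typical call site:
# secret = load_api_secret(os.environ)
```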

Which Approach Should I Use?

Start here:
  • No GPU? → Use Cloud Plugin (easiest) or Self-Hosted CPU (most private)
  • Have a GPU? → Use Self-Hosted GPU for dynamic face images without pre-built models
  • Want the fastest setup? → Cloud Plugin — just an API secret and agent ID
  • Need privacy? → Self-Hosted CPU — audio never leaves your machine
                 Cloud Plugin           Self-Hosted CPU          Self-Hosted GPU
Setup time       ~2 min                 ~5 min                   ~10 min
GPU required     No                     No                       Yes (8 GB+ VRAM)
Privacy          Audio sent to cloud    Audio stays local        Audio stays local
Avatar source    Pre-built agent ID     .imx model file          Any face image
Best for         Web apps, quick demos  Edge, offline, privacy   Dynamic faces, high volume
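The decision rules above can be condensed into a small helper. This is an illustrative sketch that encodes the table's logic, not part of the bitHuman SDK:

```python
def choose_approach(has_gpu: bool, needs_privacy: bool) -> str:
    """Map the two key constraints from the comparison table to an approach.

    Illustrative only -- mirrors the "Start here" decision rules.
    """
    if has_gpu:
        return "Self-Hosted GPU"   # dynamic face images, high volume
    if needs_privacy:
        return "Self-Hosted CPU"   # audio never leaves your machine
    return "Cloud Plugin"          # fastest setup: API secret + agent ID
```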

Three Ways to Use bitHuman

Choose the approach that fits your project: the Cloud Plugin, Self-Hosted CPU, or Self-Hosted GPU, as compared above.

How the Avatar Joins a Room

Here’s what happens step-by-step when an avatar session starts:
1. Your agent connects to a LiveKit room
   Your AI agent (the code you write) connects to a LiveKit room and waits for a user to join. This is where the conversation will happen.

2. You create an AvatarSession
   In your agent code, you create a bithuman.AvatarSession with either a cloud avatar_id or a local model_path. This tells bitHuman which avatar to use.

3. The avatar session starts
   When you call avatar.start(session, room=ctx.room), bitHuman:
   • Cloud mode: Sends a request to bitHuman’s servers, which launch an avatar worker that joins your room
   • Self-hosted mode: Loads the .imx model locally and starts generating frames

4. The avatar appears in the room
   The avatar joins the LiveKit room as a video participant. Users in the room see the avatar’s video feed — a lifelike face that moves and speaks.

5. Real-time conversation begins
   As your AI agent produces speech audio, the avatar animates in real-time:
   • Audio from TTS flows to the avatar
   • The avatar lip-syncs and generates video frames at 25 FPS
   • Video is published to the room for all participants to see
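At 25 FPS, the renderer must deliver a new frame every 40 ms. The arithmetic can be sketched as below; this is illustrative only, since the real renderer paces frames internally:

```python
# Frame pacing arithmetic for 25 FPS video (illustrative).
FPS = 25
FRAME_INTERVAL = 1.0 / FPS  # 0.04 s -> one new frame every 40 ms

def frame_deadlines(start: float, n_frames: int) -> list:
    """Wall-clock deadlines for the next n frames after `start` (seconds)."""
    return [start + i * FRAME_INTERVAL for i in range(1, n_frames + 1)]

# One second of video is exactly 25 frames; the last deadline lands at t = 1.0 s.
deadlines = frame_deadlines(0.0, FPS)
```

Missing a deadline shows up as a stutter in the avatar's video, which is why the whole pipeline has to keep up in real-time.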

Visual Flow

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Your User  │     │  AI Agent    │     │   Avatar     │
│  (browser)   │     │  (your code) │     │  (bitHuman)  │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       │   User speaks      │                    │
       │ ──────────────────>│                    │
       │                    │                    │
       │    AI processes    │                    │
       │    & responds      │                    │
       │                    │  TTS audio         │
       │                    │ ──────────────────>│
       │                    │                    │
       │                    │  Animated video    │
       │<───────────────────│<───────────────────│
       │                    │                    │
       │  User sees avatar  │                    │
       │  speaking          │                    │
       └────────────────────┴────────────────────┘
                    LiveKit Room

What You Need

Component        What it is                 Where to get it
API Secret       Authenticates your app     Developer → API Keys
Avatar Model     The character to animate   Explore page or create your own
LiveKit Server   Real-time communication    LiveKit Cloud (free tier) or self-hosted
AI Agent         Conversation logic         Your code + an LLM (OpenAI, Anthropic, etc.)

Next Steps