An AvatarSession is how you bring a bitHuman avatar into a LiveKit room. This guide covers every way to do it, with complete working examples.
New to bitHuman? Start with How It Works to understand the core concepts first.

Choose Your Approach

| Approach | Best For | Model Files | GPU Required | Internet Required |
|---|---|---|---|---|
| Cloud Plugin | Getting started, web apps | No | No | Yes |
| Self-Hosted CPU | Privacy, edge devices | Yes (.imx) | No | Only for auth |
| Self-Hosted GPU | Dynamic faces, custom images | No (uses images) | Yes | Only for auth |
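The three approaches differ mainly in which keyword arguments you pass to bithuman.AvatarSession. A hypothetical helper (the function and structure are my own, mirroring the three examples on this page) sketches the split:

```python
import os

def avatar_kwargs(mode: str) -> dict:
    """Return AvatarSession keyword arguments for a deployment mode.

    Hypothetical helper -- mirrors the three configurations shown below.
    """
    secret = {"api_secret": os.getenv("BITHUMAN_API_SECRET")}
    if mode == "cloud":
        # Cloud plugin: reference a pre-built agent by ID
        return {**secret, "avatar_id": os.getenv("BITHUMAN_AGENT_ID")}
    if mode == "cpu":
        # Self-hosted CPU: local .imx model file
        return {**secret, "model_path": os.getenv("BITHUMAN_MODEL_PATH")}
    if mode == "gpu":
        # Self-hosted GPU: point at your container and supply a face image
        return {
            **secret,
            "api_url": os.getenv("CUSTOM_GPU_URL", "http://localhost:8089/launch"),
            "avatar_image": os.getenv("AVATAR_IMAGE_URL"),
        }
    raise ValueError(f"unknown mode: {mode}")
```

Everything else in the examples below (the AgentSession, the entrypoint, the LiveKit wiring) stays the same across modes.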

Prerequisites

Each approach lists its own prerequisites below. All of them need a LiveKit server; if you don't have one:
# Option 1: LiveKit Cloud (easiest)
# Sign up at https://cloud.livekit.io — free tier available

# Option 2: Self-hosted LiveKit
docker run --rm -p 7880:7880 -p 7881:7881 -p 7882:7882/udp \
    livekit/livekit-server --dev

Cloud Plugin

The cloud plugin runs the avatar on bitHuman’s servers. You just provide an Agent ID and API secret — no model files, no GPU.

Complete Working Example

import os
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomOutputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import openai, silero, bithuman

# 1. Define your AI agent
class MyAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="""You are a helpful and friendly assistant.
            Keep responses concise — 1-2 sentences.""",
        )

# 2. Set up the session when a user connects
async def entrypoint(ctx: JobContext):
    await ctx.connect()

    # Wait for a user to join the room
    await ctx.wait_for_participant()

    # Create the avatar session (cloud-hosted)
    avatar = bithuman.AvatarSession(
        avatar_id=os.getenv("BITHUMAN_AGENT_ID"),    # e.g. "A78WKV4515"
        api_secret=os.getenv("BITHUMAN_API_SECRET"),
    )

    # Create the agent session with AI components
    session = AgentSession(
        stt=openai.STT(),                 # Speech-to-text
        llm=openai.LLM(),                 # AI language model
        tts=openai.TTS(),                 # Text-to-speech
        vad=silero.VAD.load(),            # Voice activity detection
    )

    # Start everything — avatar joins the room automatically
    await avatar.start(session, room=ctx.room)

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_output_options=RoomOutputOptions(audio_enabled=False),
    )

# 3. Launch
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Environment Variables

# Required
export BITHUMAN_API_SECRET="your_api_secret"   # From www.bithuman.ai/#developer
export BITHUMAN_AGENT_ID="A78WKV4515"          # Your agent's ID
export OPENAI_API_KEY="sk-..."                 # For STT, LLM, TTS

# LiveKit connection
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="APIxxxxxxxx"
export LIVEKIT_API_SECRET="xxxxxxxx"
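A missing variable usually surfaces as a confusing auth or connection error at runtime, so it can be worth checking up front. A small sketch (the variable names are the ones listed above):

```python
import os

REQUIRED_VARS = [
    "BITHUMAN_API_SECRET",
    "BITHUMAN_AGENT_ID",
    "OPENAI_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
]

def missing_vars(required, environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not environ.get(name)]

if __name__ == "__main__":
    missing = missing_vars(REQUIRED_VARS)
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```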

Run It

python agent.py dev
Then open agents-playground.livekit.io to connect and talk to your avatar.

How It Works Behind the Scenes

When avatar.start() and session.start() run:
  1. The plugin sends a request to bitHuman’s cloud API
  2. A cloud avatar worker receives the request
  3. The worker downloads the avatar model (cached after first time)
  4. The worker joins your LiveKit room as a participant named bithuman-avatar-agent
  5. As your agent produces TTS audio, the worker generates animated video frames
  6. Video is published to the room — users see the avatar speaking
Essence vs Expression model: By default, the cloud plugin uses the Essence (CPU) model, which works with pre-built .imx avatars. Add model="expression" to use the Expression (GPU) model, which supports custom face images.

Using Expression Model (GPU) with Custom Image

from PIL import Image

avatar = bithuman.AvatarSession(
    avatar_image=Image.open("face.jpg"),    # Any face image
    api_secret=os.getenv("BITHUMAN_API_SECRET"),
    model="expression",
)

Self-Hosted CPU

Run the avatar entirely on your own machine using a downloaded .imx model file. Great for privacy and offline use.

Complete Working Example

import os
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomOutputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import openai, silero, bithuman

class MyAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful assistant. Keep responses brief.",
        )

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    await ctx.wait_for_participant()

    # Create the avatar session (self-hosted, CPU)
    avatar = bithuman.AvatarSession(
        model_path=os.getenv("BITHUMAN_MODEL_PATH"),  # e.g. "/models/avatar.imx"
        api_secret=os.getenv("BITHUMAN_API_SECRET"),
    )

    session = AgentSession(
        stt=openai.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
        vad=silero.VAD.load(),
    )

    await avatar.start(session, room=ctx.room)

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_output_options=RoomOutputOptions(audio_enabled=False),
    )

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Environment Variables

# Required
export BITHUMAN_API_SECRET="your_api_secret"
export BITHUMAN_MODEL_PATH="/path/to/avatar.imx"
export OPENAI_API_KEY="sk-..."

# LiveKit connection
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="APIxxxxxxxx"
export LIVEKIT_API_SECRET="xxxxxxxx"

How It Differs from Cloud

| Aspect | Cloud | Self-Hosted CPU |
|---|---|---|
| Model location | bitHuman's servers | Your machine |
| Avatar parameter | avatar_id="A78WKV4515" | model_path="/path/to/avatar.imx" |
| Internet needed | Yes (always) | Only for authentication |
| First frame latency | 2-4 seconds | ~20 seconds (model load) |
| Privacy | Audio sent to cloud | Audio stays local |

System Requirements

  • CPU: 4+ cores (8 recommended)
  • RAM: 8 GB minimum
  • Disk: ~500 MB per .imx model
  • OS: Linux (x64/ARM64), macOS (M2+), or Windows (WSL)

Self-Hosted GPU

Use a GPU container that generates avatars from any face image — no pre-built models needed.

Complete Working Example

import os
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomOutputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import openai, silero, bithuman

class MyAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful assistant. Keep responses brief.",
        )

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    await ctx.wait_for_participant()

    # Create the avatar session (self-hosted GPU container)
    avatar = bithuman.AvatarSession(
        api_url=os.getenv("CUSTOM_GPU_URL", "http://localhost:8089/launch"),
        api_secret=os.getenv("BITHUMAN_API_SECRET"),
        avatar_image="https://example.com/face.jpg",    # Any face image URL
    )

    session = AgentSession(
        stt=openai.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
        vad=silero.VAD.load(),
    )

    await avatar.start(session, room=ctx.room)

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_output_options=RoomOutputOptions(audio_enabled=False),
    )

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Start the GPU Container First

# Pull and run the GPU avatar container
docker run --gpus all -p 8089:8089 \
    -v /path/to/model-storage:/data/models \
    -e BITHUMAN_API_SECRET=your_api_secret \
    docker.io/sgubithuman/expression-avatar:latest

Environment Variables

# Required
export BITHUMAN_API_SECRET="your_api_secret"
export CUSTOM_GPU_URL="http://localhost:8089/launch"
export OPENAI_API_KEY="sk-..."

# LiveKit connection
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="APIxxxxxxxx"
export LIVEKIT_API_SECRET="xxxxxxxx"
For detailed GPU container setup, see Self-Hosted GPU Container.

Adding Gestures (Dynamics)

Make your avatar perform gestures like waving, nodding, or laughing in response to conversation keywords.
Dynamics require a cloud-generated agent with gestures enabled. Create one at www.bithuman.ai.

Step 1: Check Available Gestures

import requests

agent_id = "A78WKV4515"
headers = {"api-secret": os.getenv("BITHUMAN_API_SECRET")}

resp = requests.get(
    f"https://api.bithuman.ai/v1/dynamics/{agent_id}",
    headers=headers,
)
gestures = resp.json()["data"].get("gestures", {})
print(list(gestures.keys()))
# Example: ["mini_wave_hello", "talk_head_nod_subtle", "laugh_react"]
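Gesture names must match exactly, so it's worth validating any keyword map you build (as in Step 2) against this list before wiring it up. A small sketch (my own helper, not part of the SDK):

```python
def unknown_actions(keyword_map: dict, available: set) -> set:
    """Return gesture names referenced in the map but not offered by the agent.

    A typo like "laugh_reaction" shows up here instead of silently doing
    nothing at runtime.
    """
    return set(keyword_map.values()) - available

# Using the example gesture list above:
available = {"mini_wave_hello", "talk_head_nod_subtle", "laugh_react"}
keyword_map = {"hello": "mini_wave_hello", "funny": "laugh_reaction"}  # typo on purpose
print(unknown_actions(keyword_map, available))  # → {'laugh_reaction'}
```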

Step 2: Trigger Gestures from Keywords

import asyncio

from livekit.agents import AgentSession, UserInputTranscribedEvent
from bithuman.api import VideoControl

KEYWORD_ACTION_MAP = {
    "hello": "mini_wave_hello",
    "hi": "mini_wave_hello",
    "funny": "laugh_react",
    "laugh": "laugh_react",
    "yes": "talk_head_nod_subtle",
}

# Inside your entrypoint, after session.start():
@session.on("user_input_transcribed")
def on_transcribed(event: UserInputTranscribedEvent):
    if not event.is_final:
        return
    text = event.transcript.lower()
    for keyword, action in KEYWORD_ACTION_MAP.items():
        if keyword in text:
            asyncio.create_task(
                avatar.runtime.push(VideoControl(action=action))
            )
            break

Controlling the Avatar via REST API

Once an avatar is running in a room, you can control it from any backend using the REST API — no LiveKit connection needed.

Make the Avatar Speak

curl -X POST "https://api.bithuman.ai/v1/agent/A78WKV4515/speak" \
  -H "api-secret: $BITHUMAN_API_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello! Welcome to our demo."}'

Add Context (Silent Knowledge)

curl -X POST "https://api.bithuman.ai/v1/agent/A78WKV4515/add-context" \
  -H "api-secret: $BITHUMAN_API_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "context": "The customer just purchased a premium plan.",
    "type": "add_context"
  }'
The avatar won’t say this aloud, but it will use the information in future responses.
These REST API calls work from any language or platform — use them to integrate avatars into existing apps without touching the agent code.
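The same calls from Python, sketched with the requests library (the endpoints, header, and payload shapes are the ones shown in the curl examples above; the helper functions are my own):

```python
import requests

BASE = "https://api.bithuman.ai/v1/agent"

def speak_payload(message: str) -> dict:
    """Request body for the /speak endpoint."""
    return {"message": message}

def context_payload(context: str) -> dict:
    """Request body for the /add-context endpoint."""
    return {"context": context, "type": "add_context"}

def post(agent_id: str, endpoint: str, payload: dict, api_secret: str):
    """POST a control payload to a running avatar agent."""
    return requests.post(
        f"{BASE}/{agent_id}/{endpoint}",
        headers={"api-secret": api_secret},
        json=payload,
        timeout=10,
    )

# Example (not executed here):
# post("A78WKV4515", "speak", speak_payload("Hello!"), api_secret)
```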

Using the SDK Without LiveKit

If you don’t need real-time rooms (e.g., generating video files or building a custom UI), use the Python SDK directly:
import asyncio
import cv2
from bithuman import AsyncBithuman
from bithuman.audio import load_audio, float32_to_int16

async def main():
    # Initialize the runtime
    runtime = await AsyncBithuman.create(
        model_path="avatar.imx",
        api_secret="your_api_secret",
    )
    await runtime.start()

    # Load an audio file and push it
    audio, sr = load_audio("speech.wav")
    audio_int16 = float32_to_int16(audio)
    await runtime.push_audio(audio_int16.tobytes(), sr)
    await runtime.flush()

    # Get animated video frames
    async for frame in runtime.run():
        if frame.has_image:
            cv2.imshow("Avatar", frame.bgr_image)
            cv2.waitKey(1)

        if frame.end_of_speech:
            break

asyncio.run(main())
This gives you raw numpy frames — display them however you want.
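push_audio expects 16-bit PCM bytes. If your audio starts as float32 samples in [-1, 1], the conversion that float32_to_int16 performs can be sketched with numpy (my own equivalent for illustration, not the SDK source):

```python
import numpy as np

def float32_to_int16_sketch(samples: np.ndarray) -> np.ndarray:
    """Clip float samples to [-1, 1], then scale to the int16 range."""
    clipped = np.clip(samples, -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)

audio = np.array([0.0, 0.5, 1.0, -1.0, 2.0], dtype=np.float32)
pcm = float32_to_int16_sketch(audio)  # 2.0 is clipped to 1.0 before scaling
```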

Complete Docker Example

For the fastest path to a working demo, use the Docker example that packages everything together:
# Clone the examples repo
git clone https://github.com/bithuman-product/examples.git
cd examples/essence-selfhosted

# Configure
cat > .env << 'EOF'
BITHUMAN_API_SECRET=your_api_secret
OPENAI_API_KEY=sk-...
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=APIxxxxxxxx
LIVEKIT_API_SECRET=xxxxxxxx
EOF

# Add your avatar model
mkdir -p models
cp ~/Downloads/avatar.imx models/

# Launch
docker compose up
Open http://localhost:4202 to talk to your avatar.

Troubleshooting

Cloud mode: Check that your avatar_id exists — look it up in your Library. Verify your API secret is valid with:
curl -X POST https://api.bithuman.ai/v1/validate \
  -H "api-secret: $BITHUMAN_API_SECRET"
Self-hosted mode: Check that the .imx file path is correct and the file is not corrupted:
bithuman validate --model-path /path/to/avatar.imx
The avatar needs audio input to animate. If it appears but doesn't move:
  1. Confirm your TTS is producing audio (test with openai.TTS() separately)
  2. Call avatar.start(session, room=ctx.room) before session.start()
  3. Check agent logs for audio pipeline errors
If authentication fails:
  • Verify your API secret is correct (copy-paste from Developer → API Keys)
  • Check you have credits remaining in your account
  • Ensure the BITHUMAN_API_SECRET environment variable is set
Startup latency varies by mode:
  • Cloud: the first request downloads the model (~2-4 seconds); subsequent requests use the cache (~1-2 seconds).
  • Self-hosted CPU: the first load takes ~20 seconds (model initialization); keep the process running for fast subsequent sessions.
  • Self-hosted GPU: cold start takes ~30-40 seconds; use long-running containers with preset avatars for ~4 second startup.
If all avatar workers are busy, the system retries automatically (up to 5 times with backoff). If the error persists:
  • Check your usage limits
  • Try again in a few seconds
  • For self-hosted: increase the number of worker replicas
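If you drive the REST API yourself and want similar resilience, a minimal retry-with-exponential-backoff sketch (the delay schedule here is my own choice, not bitHuman's documented one):

```python
import time

def backoff_delays(attempts: int, base_delay: float = 1.0):
    """Delay schedule between retries: base_delay * 2**i for each retry."""
    return [base_delay * 2 ** i for i in range(attempts - 1)]

def retry(fn, attempts: int = 5, base_delay: float = 1.0):
    """Call fn until it succeeds, sleeping between failed attempts."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * 2 ** i)

print(backoff_delays(5))  # → [1.0, 2.0, 4.0, 8.0]
```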

Billing & Credits

Avatar sessions consume credits based on the deployment mode and session duration.
| Deployment | Credit Cost | Billed By | Notes |
|---|---|---|---|
| Cloud Plugin | Per session minute | Session duration | Includes GPU rendering |
| Self-Hosted CPU | Per authentication | Auth call | Rendering is free (your hardware) |
| Self-Hosted GPU | Per authentication | Auth call | Rendering is free (your hardware) |
Check your remaining credits at www.bithuman.ai — your credit balance is shown in the top navigation bar. Credits are consumed only for active sessions — idle containers cost nothing.

Next Steps