An AvatarSession is how you bring a bitHuman avatar into a LiveKit room. This guide covers every way to do it, with complete working examples.
New to bitHuman? Start with How It Works to understand the core concepts first.

Choose Your Approach

Approach | Best For | Model Files | GPU Required | Internet Required
Cloud Plugin | Getting started, web apps | No | No | Yes
Self-Hosted CPU | Privacy, edge devices | Yes (.imx) | No | Only for auth
Self-Hosted GPU | Dynamic faces, custom images | No (uses images) | Yes | Only for auth
On-Device macOS | Apple Silicon, privacy-first | No (uses images) | No (Apple M3+) | Only for auth

Prerequisites

All approaches need these basics: a bitHuman API secret (from www.bithuman.ai/#developer) and an OpenAI API key for the STT, LLM, and TTS stack used in the examples below.
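
You will also need the LiveKit Agents Python packages. A minimal install sketch (assuming the standard PyPI names for the plugins used in the examples):

pip install livekit-agents livekit-plugins-bithuman livekit-plugins-openai livekit-plugins-silero

You also need a LiveKit server. If you don’t have one: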
# Option 1: LiveKit Cloud (easiest)
# Sign up at https://cloud.livekit.io — free tier available

# Option 2: Self-hosted LiveKit
docker run --rm -p 7880:7880 -p 7881:7881 -p 7882:7882/udp \
    livekit/livekit-server --dev

Cloud Plugin

The cloud plugin runs the avatar on bitHuman’s servers. You just provide an Agent ID and API secret — no model files, no GPU.

Complete Working Example

import os
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomOutputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import openai, silero, bithuman

# 1. Define your AI agent
class MyAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="""You are a helpful and friendly assistant.
            Keep responses concise — 1-2 sentences.""",
        )

# 2. Set up the session when a user connects
async def entrypoint(ctx: JobContext):
    await ctx.connect()

    # Wait for a user to join the room
    await ctx.wait_for_participant()

    # Create the avatar session (cloud-hosted)
    avatar = bithuman.AvatarSession(
        avatar_id=os.getenv("BITHUMAN_AGENT_ID"),    # e.g. "A78WKV4515"
        api_secret=os.getenv("BITHUMAN_API_SECRET"),
    )

    # Create the agent session with AI components
    session = AgentSession(
        stt=openai.STT(),                 # Speech-to-text
        llm=openai.LLM(),                 # AI language model
        tts=openai.TTS(),                 # Text-to-speech
        vad=silero.VAD.load(),            # Voice activity detection
    )

    # Start everything — avatar joins the room automatically
    await avatar.start(session, room=ctx.room)

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_output_options=RoomOutputOptions(audio_enabled=False),
    )

# 3. Launch
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Environment Variables

# Required
export BITHUMAN_API_SECRET="your_api_secret"   # From www.bithuman.ai/#developer
export BITHUMAN_AGENT_ID="A78WKV4515"          # Your agent's ID
export OPENAI_API_KEY="sk-..."                 # For STT, LLM, TTS

# LiveKit connection
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="APIxxxxxxxx"
export LIVEKIT_API_SECRET="xxxxxxxx"

Run It

python agent.py dev
Then open agents-playground.livekit.io to connect and talk to your avatar.

How It Works Behind the Scenes

When avatar.start() and session.start() run:
  1. The plugin sends a request to bitHuman’s cloud API
  2. A cloud avatar worker receives the request
  3. The worker downloads the avatar model (cached after first time)
  4. The worker joins your LiveKit room as a participant named bithuman-avatar-agent (see the sketch after this list)
  5. As your agent produces TTS audio, the worker generates animated video frames
  6. Video is published to the room — users see the avatar speaking
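
To confirm step 4 programmatically, you can watch for the worker joining the room. A minimal sketch, assuming it runs inside entrypoint after ctx.connect() (the identity string comes from the list above):

from livekit import rtc

def on_participant_connected(participant: rtc.RemoteParticipant):
    # The cloud worker joins under this identity (step 4 above)
    if participant.identity == "bithuman-avatar-agent":
        print("avatar worker joined; video frames should follow")

ctx.room.on("participant_connected", on_participant_connected)
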
Essence vs Expression model: By default, the cloud plugin uses the Essence (CPU) model, which works with pre-built .imx avatars. Add model="expression" to use the Expression (GPU) model, which supports custom face images.

Using Expression Model (GPU) with Custom Image

from PIL import Image

avatar = bithuman.AvatarSession(
    avatar_image=Image.open("face.jpg"),    # Any face image
    api_secret=os.getenv("BITHUMAN_API_SECRET"),
    model="expression",
)

Self-Hosted CPU

Run the avatar entirely on your own machine using a downloaded .imx model file. Great for privacy and offline use.

Complete Working Example

import os
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomOutputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import openai, silero, bithuman

class MyAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful assistant. Keep responses brief.",
        )

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    await ctx.wait_for_participant()

    # Create the avatar session (self-hosted, CPU)
    avatar = bithuman.AvatarSession(
        model_path=os.getenv("BITHUMAN_MODEL_PATH"),  # e.g. "/models/avatar.imx"
        api_secret=os.getenv("BITHUMAN_API_SECRET"),
    )

    session = AgentSession(
        stt=openai.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
        vad=silero.VAD.load(),
    )

    await avatar.start(session, room=ctx.room)

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_output_options=RoomOutputOptions(audio_enabled=False),
    )

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Environment Variables

# Required
export BITHUMAN_API_SECRET="your_api_secret"
export BITHUMAN_MODEL_PATH="/path/to/avatar.imx"
export OPENAI_API_KEY="sk-..."

# LiveKit connection
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="APIxxxxxxxx"
export LIVEKIT_API_SECRET="xxxxxxxx"

How It Differs from Cloud

Aspect | Cloud | Self-Hosted CPU
Model location | bitHuman’s servers | Your machine
Avatar parameter | avatar_id="A78WKV4515" | model_path="/path/to/avatar.imx"
Internet needed | Yes (always) | Only for authentication
First frame latency | 2-4 seconds | ~20 seconds (model load)
Privacy | Audio sent to cloud | Audio stays local

System Requirements

  • CPU: 1–2 cores sustain 25 FPS on modern chips; 4+ is comfortable for headroom
  • RAM: 4 GB minimum, 8 GB recommended
  • Disk: ~500 MB per .imx model
  • OS: Linux (x86_64 / ARM64), macOS 13+ (Intel or Apple Silicon), or Windows 10+

Self-Hosted GPU

Use a GPU container that generates avatars from any face image — no pre-built models needed.

Complete Working Example

import os
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomOutputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import openai, silero, bithuman

class MyAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful assistant. Keep responses brief.",
        )

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    await ctx.wait_for_participant()

    # Create the avatar session (self-hosted GPU container)
    avatar = bithuman.AvatarSession(
        api_url=os.getenv("CUSTOM_GPU_URL", "http://localhost:8089/launch"),
        api_secret=os.getenv("BITHUMAN_API_SECRET"),
        avatar_image="https://example.com/face.jpg",    # Any face image URL
    )

    session = AgentSession(
        stt=openai.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
        vad=silero.VAD.load(),
    )

    await avatar.start(session, room=ctx.room)

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_output_options=RoomOutputOptions(audio_enabled=False),
    )

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Start the GPU Container First

# Pull and run the GPU avatar container
docker run --gpus all -p 8089:8089 \
    -v /path/to/model-storage:/data/models \
    -e BITHUMAN_API_SECRET=your_api_secret \
    docker.io/sgubithuman/expression-avatar:latest

Environment Variables

# Required
export BITHUMAN_API_SECRET="your_api_secret"
export CUSTOM_GPU_URL="http://localhost:8089/launch"
export OPENAI_API_KEY="sk-..."

# LiveKit connection
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="APIxxxxxxxx"
export LIVEKIT_API_SECRET="xxxxxxxx"
For detailed GPU container setup, see Self-Hosted GPU Container.

On-Device Apple Silicon

Run the Expression model entirely on the end-user’s Apple Silicon device — no LiveKit, no AvatarSession, no server. The Swift SDK bundles speech recognition, an on-device LLM, TTS, and the lip-sync engine. The only network traffic is a 1-request-per-minute billing heartbeat to api.bithuman.ai.

Vanilla integration (audio + avatar)

import bitHumanKit

// 1. Download / verify the universal weights bundle (~1.6 GB on first launch).
let weights = try await ExpressionWeights.ensureAvailable()
let portrait = AgentCatalog.thumbnailURL(for: AgentCatalog.defaultAgent)!

// 2. Configure. The avatar pipeline is metered (2 cr/min); audio-only mode is free.
var config = VoiceChatConfig()
config.systemPrompt = "You are a calm assistant. One sentence per turn."
config.voice = .preset("Aiden")
config.avatar = AvatarConfig(modelPath: weights, portraitPath: portrait)
config.apiKey = ProcessInfo.processInfo.environment["BITHUMAN_API_KEY"]

// 3. Boot. Authenticates via heartbeat synchronously — bad keys fail fast.
let chat = VoiceChat(config: config)
try await chat.start()

Hardware floor

Platform | Minimum
macOS | M3+ Apple Silicon, macOS 26 (Tahoe)
iPad | iPad Pro M4+, 16 GB unified memory, iPadOS 26
iPhone | iPhone 16 Pro+ (A18 Pro), iOS 26
HardwareCheck.evaluate() gates this at runtime — under-spec devices see a polite refusal screen.

Try without writing code

brew tap bithuman-product/bithuman
brew install bithuman-cli
bithuman-cli video        # voice + lip-synced floating avatar window
Full guide: Swift SDK overview →, 10-minute quickstart →, bithuman-cli reference →.

What’s Next

Once your avatar session is running, explore these features:

Gestures & Dynamics

Add wave, nod, and laugh animations (Essence only)

Control via REST API

Make avatars speak or inject context from any backend

Python SDK (No LiveKit)

Generate video frames directly without real-time rooms

Docker Examples

Pre-built Docker stacks for every deployment mode

Troubleshooting

Cloud mode: Check that your avatar_id exists — look it up in your Library. Verify your API secret is valid with:
curl -X POST https://api.bithuman.ai/v1/validate \
  -H "api-secret: $BITHUMAN_API_SECRET"
Self-hosted mode: Check that the .imx file path is correct and the file is not corrupted:
bithuman validate --model-path /path/to/avatar.imx
The avatar needs audio input to animate. Ensure:
  1. Your TTS is producing audio (test openai.TTS() in isolation; see the sketch after this list)
  2. avatar.start(session, room=ctx.room) is called before session.start()
  3. The agent logs show no audio pipeline errors
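
A minimal sketch for testing TTS in isolation (assuming a recent livekit-agents release, where TTS.synthesize() returns an async stream of audio chunks; OPENAI_API_KEY must be set):

import asyncio
from livekit.plugins import openai

async def main():
    tts = openai.TTS()
    chunks = 0
    async for _ in tts.synthesize("Testing one two three."):
        chunks += 1  # each item carries a synthesized audio frame
    print(f"received {chunks} audio chunks")  # zero chunks means TTS is misconfigured

asyncio.run(main())
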
If you hit authentication errors:
  • Verify your API secret is correct (copy-paste from Developer → API Keys)
  • Check that you have credits remaining in your account
  • Ensure the BITHUMAN_API_SECRET environment variable is set
Startup times vary by deployment:
  • Cloud: the first request downloads the model (~2-4 seconds); subsequent requests use the cache (~1-2 seconds)
  • Self-hosted CPU: the first load takes ~20 seconds for model initialization; keep the process running for fast subsequent sessions
  • Self-hosted GPU: cold start takes ~30-40 seconds; use long-running containers with preset avatars for ~4 second startup
If all avatar workers are busy, the system retries automatically (up to 5 times with backoff). If the problem persists:
  • Check your usage limits
  • Try again in a few seconds
  • For self-hosted: increase the number of worker replicas

Session Lifecycle

Understanding how sessions behave helps you build reliable integrations.
Behavior | Essence (CPU) | Expression (GPU server) | Expression (on-device)
Idle timeout | None — sessions run indefinitely | 10 minutes of inactivity | None — runs while app is open
Gestures | Supported | Not supported | Not supported
Use case | Kiosks, always-on displays | Interactive conversations | Native consumer apps
Essence sessions are designed for 24/7 deployments like museum kiosks and lobby displays. They run until the client disconnects — there is no idle timeout. Expression sessions automatically close after 10 minutes of inactivity to free GPU resources.

Billing & Credits

Avatar sessions consume credits based on the deployment mode and session duration.
Deployment | Model | Credit Cost | Notes
Cloud | Essence | 2 cr/min | CPU rendering on bitHuman servers
Cloud | Expression | 4 cr/min | GPU rendering on bitHuman servers
Self-Hosted | Essence | 1 cr/min | CPU rendering on your hardware
Self-Hosted | Expression | 2 cr/min | GPU rendering on your hardware
On-Device | Expression (Swift SDK) | 2 cr/min | Active avatar minutes only — audio-only mode is unmetered
Check your remaining credits at www.bithuman.ai — your credit balance is shown in the top navigation bar. Credits are consumed only for active sessions — idle containers cost nothing.
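
Credit burn scales linearly with session length: a 30-minute cloud Expression session, for example, consumes 30 × 4 = 120 credits. A small helper sketch (illustrative only, not part of any SDK):

# Estimate credits from the documented cr/min rates above.
RATES_PER_MIN = {
    ("cloud", "essence"): 2,
    ("cloud", "expression"): 4,
    ("self-hosted", "essence"): 1,
    ("self-hosted", "expression"): 2,
    ("on-device", "expression"): 2,
}

def estimate_credits(deployment: str, model: str, minutes: float) -> float:
    return RATES_PER_MIN[(deployment, model)] * minutes

print(estimate_credits("cloud", "expression", 30))  # -> 120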

Next Steps

Dynamics API

Add gestures and movements

Webhooks

Get notified about session events

Embed Avatars

Put avatars on any website