An AvatarSession is how you bring a bitHuman avatar into a LiveKit room. This guide covers every way to do it, with complete working examples.
New to bitHuman? Start with How It Works to understand the core concepts first.

Choose Your Approach

Approach | Best For | Model Files | GPU Required | Internet Required
Cloud Plugin | Getting started, web apps | No | No | Yes
Self-Hosted CPU | Privacy, edge devices | Yes (.imx) | No | Only for auth
Self-Hosted GPU | Dynamic faces, custom images | No (uses images) | Yes | Only for auth
On-Device macOS | Apple Silicon, privacy-first | No (uses images) | No (Apple M3+) | Only for auth

Prerequisites

All approaches need these basics: a bitHuman API secret (from www.bithuman.ai/#developer) and an OpenAI API key for the STT, LLM, and TTS stack used in the examples below.
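
You will also need the LiveKit Agents Python packages. A minimal install sketch (assuming the standard PyPI names for the plugins used in the examples):

pip install livekit-agents livekit-plugins-bithuman livekit-plugins-openai livekit-plugins-silero

You also need a LiveKit server. If you don’t have one: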
# Option 1: LiveKit Cloud (easiest)
# Sign up at https://cloud.livekit.io — free tier available

# Option 2: Self-hosted LiveKit
docker run --rm -p 7880:7880 -p 7881:7881 -p 7882:7882/udp \
    livekit/livekit-server --dev

Cloud Plugin

The cloud plugin runs the avatar on bitHuman’s servers. You just provide an Agent ID and API secret — no model files, no GPU.

Complete Working Example

import os
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomOutputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import openai, silero, bithuman

# 1. Define your AI agent
class MyAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="""You are a helpful and friendly assistant.
            Keep responses concise — 1-2 sentences.""",
        )

# 2. Set up the session when a user connects
async def entrypoint(ctx: JobContext):
    await ctx.connect()

    # Wait for a user to join the room
    await ctx.wait_for_participant()

    # Create the avatar session (cloud-hosted)
    avatar = bithuman.AvatarSession(
        avatar_id=os.getenv("BITHUMAN_AGENT_ID"),    # e.g. "A78WKV4515"
        api_secret=os.getenv("BITHUMAN_API_SECRET"),
    )

    # Create the agent session with AI components
    session = AgentSession(
        stt=openai.STT(),                 # Speech-to-text
        llm=openai.LLM(),                 # AI language model
        tts=openai.TTS(),                 # Text-to-speech
        vad=silero.VAD.load(),            # Voice activity detection
    )

    # Start everything — avatar joins the room automatically
    await avatar.start(session, room=ctx.room)

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_output_options=RoomOutputOptions(audio_enabled=False),
    )

# 3. Launch
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Environment Variables

# Required
export BITHUMAN_API_SECRET="your_api_secret"   # From www.bithuman.ai/#developer
export BITHUMAN_AGENT_ID="A78WKV4515"          # Your agent's ID
export OPENAI_API_KEY="sk-..."                 # For STT, LLM, TTS

# LiveKit connection
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="APIxxxxxxxx"
export LIVEKIT_API_SECRET="xxxxxxxx"

Run It

python agent.py dev
Then open agents-playground.livekit.io to connect and talk to your avatar.

How It Works Behind the Scenes

When avatar.start() and session.start() run:
  1. The plugin sends a request to bitHuman’s cloud API
  2. A cloud avatar worker receives the request
  3. The worker downloads the avatar model (cached after first time)
  4. The worker joins your LiveKit room as a participant named bithuman-avatar-agent (see the sketch after this list)
  5. As your agent produces TTS audio, the worker generates animated video frames
  6. Video is published to the room — users see the avatar speaking
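
To confirm step 4 programmatically, you can watch for the worker joining the room. A minimal sketch, assuming it runs inside entrypoint after ctx.connect() (the identity string comes from the list above):

from livekit import rtc

def on_participant_connected(participant: rtc.RemoteParticipant):
    # The cloud worker joins under this identity (step 4 above)
    if participant.identity == "bithuman-avatar-agent":
        print("avatar worker joined; video frames should follow")

ctx.room.on("participant_connected", on_participant_connected)
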
Essence vs Expression model: By default, the cloud plugin uses the Essence (CPU) model, which works with pre-built .imx avatars. Add model="expression" to use the Expression (GPU) model, which supports custom face images.

Using Expression Model (GPU) with Custom Image

from PIL import Image

avatar = bithuman.AvatarSession(
    avatar_image=Image.open("face.jpg"),    # Any face image
    api_secret=os.getenv("BITHUMAN_API_SECRET"),
    model="expression",
)

Self-Hosted CPU

Run the avatar entirely on your own machine using a downloaded .imx model file. Great for privacy and offline use.

Complete Working Example

import os
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomOutputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import openai, silero, bithuman

class MyAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful assistant. Keep responses brief.",
        )

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    await ctx.wait_for_participant()

    # Create the avatar session (self-hosted, CPU)
    avatar = bithuman.AvatarSession(
        model_path=os.getenv("BITHUMAN_MODEL_PATH"),  # e.g. "/models/avatar.imx"
        api_secret=os.getenv("BITHUMAN_API_SECRET"),
    )

    session = AgentSession(
        stt=openai.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
        vad=silero.VAD.load(),
    )

    await avatar.start(session, room=ctx.room)

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_output_options=RoomOutputOptions(audio_enabled=False),
    )

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Environment Variables

# Required
export BITHUMAN_API_SECRET="your_api_secret"
export BITHUMAN_MODEL_PATH="/path/to/avatar.imx"
export OPENAI_API_KEY="sk-..."

# LiveKit connection
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="APIxxxxxxxx"
export LIVEKIT_API_SECRET="xxxxxxxx"

How It Differs from Cloud

Aspect | Cloud | Self-Hosted CPU
Model location | bitHuman’s servers | Your machine
Avatar parameter | avatar_id="A78WKV4515" | model_path="/path/to/avatar.imx"
Internet needed | Yes (always) | Only for authentication
First frame latency | 2-4 seconds | ~20 seconds (model load)
Privacy | Audio sent to cloud | Audio stays local

System Requirements

  • CPU: 1–2 cores sustain 25 FPS on modern chips; 4+ is comfortable for headroom
  • RAM: 4 GB minimum, 8 GB recommended
  • Disk: ~500 MB per .imx model
  • OS: Linux (x86_64 / ARM64), macOS 13+ (Intel or Apple Silicon), or Windows 10+

Self-Hosted GPU

Use a GPU container that generates avatars from any face image — no pre-built models needed.

Complete Working Example

import os
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomOutputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import openai, silero, bithuman

class MyAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful assistant. Keep responses brief.",
        )

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    await ctx.wait_for_participant()

    # Create the avatar session (self-hosted GPU container)
    avatar = bithuman.AvatarSession(
        api_url=os.getenv("CUSTOM_GPU_URL", "http://localhost:8089/launch"),
        api_secret=os.getenv("BITHUMAN_API_SECRET"),
        avatar_image="https://example.com/face.jpg",    # Any face image URL
    )

    session = AgentSession(
        stt=openai.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
        vad=silero.VAD.load(),
    )

    await avatar.start(session, room=ctx.room)

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_output_options=RoomOutputOptions(audio_enabled=False),
    )

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Start the GPU Container First

# Pull and run the GPU avatar container
docker run --gpus all -p 8089:8089 \
    -v /path/to/model-storage:/data/models \
    -e BITHUMAN_API_SECRET=your_api_secret \
    docker.io/sgubithuman/expression-avatar:latest

Environment Variables

# Required
export BITHUMAN_API_SECRET="your_api_secret"
export CUSTOM_GPU_URL="http://localhost:8089/launch"
export OPENAI_API_KEY="sk-..."

# LiveKit connection
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="APIxxxxxxxx"
export LIVEKIT_API_SECRET="xxxxxxxx"
For detailed GPU container setup, see Self-Hosted GPU Container.

On-Device Apple Silicon

Run the Expression model entirely on the end-user’s Apple Silicon device — no LiveKit, no AvatarSession, no server. The Swift SDK bundles speech recognition, an on-device LLM, TTS, and the lip-sync engine. The only network traffic is a 1-request-per-minute billing heartbeat to api.bithuman.ai.

Vanilla integration (audio + avatar)

import bitHumanKit

// 1. Download / verify the universal weights bundle (~1.6 GB on first launch).
let weights = try await ExpressionWeights.ensureAvailable()
let portrait = AgentCatalog.thumbnailURL(for: AgentCatalog.defaultAgent)!

// 2. Configure. The avatar pipeline is metered (2 cr/min); audio-only mode is free.
var config = VoiceChatConfig()
config.systemPrompt = "You are a calm assistant. One sentence per turn."
config.voice = .preset("Aiden")
config.avatar = AvatarConfig(modelPath: weights, portraitPath: portrait)
config.apiKey = ProcessInfo.processInfo.environment["BITHUMAN_API_KEY"]

// 3. Boot. Authenticates via heartbeat synchronously — bad keys fail fast.
let chat = VoiceChat(config: config)
try await chat.start()

Hardware floor

Platform | Minimum
macOS | M3+ Apple Silicon, macOS 26 (Tahoe)
iPad | iPad Pro M4+, 16 GB unified memory, iPadOS 26
iPhone | iPhone 16 Pro+ (A18 Pro), iOS 26
HardwareCheck.evaluate() gates this at runtime — under-spec devices see a polite refusal screen.

Try without writing code

brew tap bithuman-product/bithuman
brew install bithuman-cli
bithuman-cli video        # voice + lip-synced floating avatar window
Full guide: Swift SDK overview →, 10-minute quickstart →, bithuman-cli reference →.

What’s Next

Once your avatar session is running, explore these features:

Gestures & Dynamics

Add wave, nod, and laugh animations (Essence only)

Control via REST API

Make avatars speak or inject context from any backend

Python SDK (No LiveKit)

Generate video frames directly without real-time rooms

Docker Examples

Pre-built Docker stacks for every deployment mode

Troubleshooting

Cloud mode: Check that your avatar_id exists — look it up in your Library. Verify your API secret is valid with:
curl -X POST https://api.bithuman.ai/v1/validate \
  -H "api-secret: $BITHUMAN_API_SECRET"
Self-hosted mode: Check that the .imx file path is correct and the file is not corrupted:
bithuman validate --model-path /path/to/avatar.imx
The avatar needs audio input to animate. Ensure:
  1. Your TTS is producing audio (test openai.TTS() in isolation; see the sketch after this list)
  2. avatar.start(session, room=ctx.room) is called before session.start()
  3. The agent logs show no audio pipeline errors
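
A minimal sketch for testing TTS in isolation (assuming a recent livekit-agents release, where TTS.synthesize() returns an async stream of audio chunks; OPENAI_API_KEY must be set):

import asyncio
from livekit.plugins import openai

async def main():
    tts = openai.TTS()
    chunks = 0
    async for _ in tts.synthesize("Testing one two three."):
        chunks += 1  # each item carries a synthesized audio frame
    print(f"received {chunks} audio chunks")  # zero chunks means TTS is misconfigured

asyncio.run(main())
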
If you hit authentication errors:
  • Verify your API secret is correct (copy-paste from Developer → API Keys)
  • Check that you have credits remaining in your account
  • Ensure the BITHUMAN_API_SECRET environment variable is set
Startup times vary by deployment:
  • Cloud: the first request downloads the model (~2-4 seconds); subsequent requests use the cache (~1-2 seconds)
  • Self-hosted CPU: the first load takes ~20 seconds for model initialization; keep the process running for fast subsequent sessions
  • Self-hosted GPU: cold start takes ~30-40 seconds; use long-running containers with preset avatars for ~4 second startup
If all avatar workers are busy, the system retries automatically (up to 5 times with backoff). If the problem persists:
  • Check your usage limits
  • Try again in a few seconds
  • For self-hosted: increase the number of worker replicas

Session Lifecycle

Understanding how sessions behave helps you build reliable integrations.
Behavior | Essence (CPU) | Expression (GPU server) | Expression (on-device)
Idle timeout | None — sessions run indefinitely | 10 minutes of inactivity | None — runs while app is open
Gestures | Supported | Not supported | Not supported
Use case | Kiosks, always-on displays | Interactive conversations | Native consumer apps
Essence sessions are designed for 24/7 deployments like museum kiosks and lobby displays. They run until the client disconnects — there is no idle timeout. Expression sessions automatically close after 10 minutes of inactivity to free GPU resources.

Billing & Credits

Avatar sessions consume credits based on the deployment mode and session duration.
Deployment | Model | Credit Cost | Notes
Cloud | Essence | 2 cr/min | CPU rendering on bitHuman servers
Cloud | Expression | 4 cr/min | GPU rendering on bitHuman servers
Self-Hosted | Essence | 1 cr/min | CPU rendering on your hardware
Self-Hosted | Expression | 2 cr/min | GPU rendering on your hardware
On-Device | Expression (Swift SDK) | 2 cr/min | Active avatar minutes only — audio-only mode is unmetered
Check your remaining credits at www.bithuman.ai — your credit balance is shown in the top navigation bar. Credits are consumed only for active sessions — idle containers cost nothing.
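
Credit burn scales linearly with session length: a 30-minute cloud Expression session, for example, consumes 30 × 4 = 120 credits. A small helper sketch (illustrative only, not part of any SDK):

# Estimate credits from the documented cr/min rates above.
RATES_PER_MIN = {
    ("cloud", "essence"): 2,
    ("cloud", "expression"): 4,
    ("self-hosted", "essence"): 1,
    ("self-hosted", "expression"): 2,
    ("on-device", "expression"): 2,
}

def estimate_credits(deployment: str, model: str, minutes: float) -> float:
    return RATES_PER_MIN[(deployment, model)] * minutes

print(estimate_credits("cloud", "expression", 30))  # -> 120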

Next Steps

Dynamics API

Add gestures and movements

Webhooks

Get notified about session events

Embed Avatars

Put avatars on any website