Avatar Sessions: Cloud, CPU & GPU Deployment Guide
Complete guide to creating bitHuman avatar sessions — cloud plugin, self-hosted CPU (.imx), and self-hosted GPU (any face image). Working code examples for each.
An AvatarSession is how you bring a bitHuman avatar into a LiveKit room. This guide covers every way to do it, with complete working examples.
New to bitHuman? Start with How It Works to understand the core concepts first.
1. The plugin sends a request to bitHuman's cloud API
2. A cloud avatar worker receives the request
3. The worker downloads the avatar model (cached after the first time)
4. The worker joins your LiveKit room as a participant named `bithuman-avatar-agent`
5. As your agent produces TTS audio, the worker generates animated video frames
6. Video is published to the room — users see the avatar speaking
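The caching in step 3 is why only the first session pays the full download cost. That behavior can be sketched as a simple memoized fetch — illustrative only; `download_model` and the cache path are assumptions, not the worker's real implementation:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def download_model(avatar_id: str) -> str:
    """Simulate fetching an avatar model; the result is cached per avatar_id."""
    time.sleep(0.01)  # stand-in for the ~2-4 s first download
    return f"/cache/models/{avatar_id}.imx"

start = time.perf_counter()
download_model("my-avatar")  # first call: pays the download cost
first = time.perf_counter() - start

start = time.perf_counter()
download_model("my-avatar")  # second call: served from the cache
second = time.perf_counter() - start

print(second < first)  # cached call is faster
```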
Essence vs Expression model: By default, the cloud plugin uses the Essence (CPU) model, which works with pre-built .imx avatars. Add model="expression" to use the Expression (GPU) model, which supports custom face images.
```python
import os

from PIL import Image
from livekit.plugins import bithuman

avatar = bithuman.AvatarSession(
    avatar_image=Image.open("face.jpg"),  # Any face image
    api_secret=os.getenv("BITHUMAN_API_SECRET"),
    model="expression",
)
```
```bash
# Pull and run the GPU avatar container
docker run --gpus all -p 8089:8089 \
  -v /path/to/model-storage:/data/models \
  -e BITHUMAN_API_SECRET=your_api_secret \
  docker.io/sgubithuman/expression-avatar:latest
```
```python
import asyncio

from livekit.agents import AgentSession, UserInputTranscribedEvent
from bithuman.api import VideoControl

KEYWORD_ACTION_MAP = {
    "hello": "mini_wave_hello",
    "hi": "mini_wave_hello",
    "funny": "laugh_react",
    "laugh": "laugh_react",
    "yes": "talk_head_nod_subtle",
}

# Inside your entrypoint, after session.start():
@session.on("user_input_transcribed")
def on_transcribed(event: UserInputTranscribedEvent):
    if not event.is_final:
        return
    text = event.transcript.lower()
    for keyword, action in KEYWORD_ACTION_MAP.items():
        if keyword in text:
            asyncio.create_task(
                avatar.runtime.push(VideoControl(action=action))
            )
            break
```
```python
import asyncio
import json
from datetime import datetime

from livekit import rtc
from livekit.agents import UserInputTranscribedEvent

KEYWORD_ACTION_MAP = {
    "hello": "mini_wave_hello",
    "funny": "laugh_react",
}

async def trigger_gesture(participant: rtc.LocalParticipant, target: str, action: str):
    await participant.perform_rpc(
        destination_identity=target,
        method="trigger_dynamics",
        payload=json.dumps({
            "action": action,
            "identity": participant.identity,
            "timestamp": datetime.utcnow().isoformat(),
        }),
    )

# Inside your entrypoint, after session.start():
@session.on("user_input_transcribed")
def on_transcribed(event: UserInputTranscribedEvent):
    if not event.is_final:
        return
    text = event.transcript.lower()
    for keyword, action in KEYWORD_ACTION_MAP.items():
        if keyword in text:
            for identity in ctx.room.remote_participants.keys():
                asyncio.create_task(
                    trigger_gesture(ctx.room.local_participant, identity, action)
                )
            break
```
Ensure the `BITHUMAN_API_SECRET` environment variable is set
High latency / slow first frame
- **Cloud:** First request downloads the model (~2-4 seconds). Subsequent requests use the cache (~1-2 seconds).
- **Self-hosted CPU:** First load takes ~20 seconds (model initialization). Keep the process running for fast subsequent sessions.
- **Self-hosted GPU:** Cold start takes ~30-40 seconds. Use long-running containers with preset avatars for ~4 second startup.
'No available workers' or 503 errors
All avatar workers are busy. The system retries automatically (up to 5 times with backoff). If it persists:
Check your usage limits
Try again in a few seconds
For self-hosted: increase the number of worker replicas
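The retry behavior described above (up to 5 attempts with backoff) can be approximated on the client side like this — a minimal sketch, where `request_worker` is a hypothetical callable that raises on a 503-style failure:

```python
import random
import time

def retry_with_backoff(request_worker, max_attempts=5, base_delay=0.5):
    """Call request_worker(), retrying on failure with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return request_worker()
        except RuntimeError:  # e.g. "No available workers" / 503
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Example: a request that succeeds once a worker frees up on the third try
attempts = {"n": 0}
def flaky_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("503: no available workers")
    return "worker-assigned"

print(retry_with_backoff(flaky_request, base_delay=0.01))  # worker-assigned
```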
Avatar sessions consume credits based on the deployment mode and session duration.
| Deployment | Credit Cost | Billed By | Notes |
| --- | --- | --- | --- |
| Cloud Plugin | Per session minute | Session duration | Includes GPU rendering |
| Self-Hosted CPU | Per authentication | Auth call | Rendering is free (your hardware) |
| Self-Hosted GPU | Per authentication | Auth call | Rendering is free (your hardware) |
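As a rough back-of-the-envelope for the billing model above — the per-minute and per-auth rates here are placeholder numbers for illustration, not bitHuman's actual pricing:

```python
# Placeholder rates for illustration only; check bithuman.ai for real pricing.
RATES = {
    "cloud": {"per_minute": 1.0},          # billed by session duration
    "self_hosted_cpu": {"per_auth": 1.0},  # billed per authentication call
    "self_hosted_gpu": {"per_auth": 1.0},  # rendering itself is free
}

def estimate_credits(mode: str, session_minutes: float = 0, auth_calls: int = 0) -> float:
    """Estimate credit usage for a deployment mode under the placeholder rates."""
    rate = RATES[mode]
    if "per_minute" in rate:
        return rate["per_minute"] * session_minutes
    return rate["per_auth"] * auth_calls

# A 10-minute cloud session vs. one self-hosted session (single auth call):
print(estimate_credits("cloud", session_minutes=10))      # 10.0
print(estimate_credits("self_hosted_gpu", auth_calls=1))  # 1.0
```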
Check your remaining credits at www.bithuman.ai — your credit balance is shown in the top navigation bar. Credits are consumed only for active sessions — idle containers cost nothing.