# bitHuman — Complete Documentation
> Real-time avatar animation API. Turn any face image or pre-built .imx
> model into a lifelike talking avatar with audio-driven lip sync.
> Python SDK, REST API, LiveKit plugin, and Docker containers.
Base URL: https://api.bithuman.ai
Authentication: api-secret header on every request
Python SDK: pip install bithuman
LiveKit Plugin: pip install livekit-plugins-bithuman
GPU Container: docker.io/sgubithuman/expression-avatar:latest
Examples: https://github.com/bithuman-product/examples
Dashboard: https://www.bithuman.ai
---
# Getting Started
## bitHuman — Real-Time Avatar Animation API
URL: https://docs.bithuman.ai/introduction
bitHuman creates digital avatars that lip-sync to audio in real-time. Feed in audio — get back an animated face at 25 FPS. Use it to build AI companions, customer support avatars, virtual tutors, game NPCs, and anything that needs a visual character that speaks.
**Three ways to run:**
- **Cloud** — no GPU, no model files. Just an API secret.
- **Self-Hosted CPU** — download an `.imx` model, run on any machine.
- **Self-Hosted GPU** — any face image, 1.3B parameter model, 250+ FPS.
## Quick Start
```bash Docker (Recommended)
git clone https://github.com/bithuman-product/examples.git
cd examples/essence-cloud
# Add your API keys to .env
cp .env.example .env
# Edit .env: set BITHUMAN_API_SECRET, BITHUMAN_AGENT_ID, and OPENAI_API_KEY
docker compose up
# Open http://localhost:4202
```
```python Python SDK
from bithuman import AsyncBithuman

# Create the runtime (top-level await shown for brevity)
runtime = await AsyncBithuman.create(
    model_path="avatar.imx",
    api_secret="your_api_secret",
)
await runtime.start()

# Push audio and get animated frames
await runtime.push_audio(audio_bytes, sample_rate=16000)
await runtime.flush()

async for frame in runtime.run():
    frame.bgr_image      # numpy array (H, W, 3)
    frame.audio_chunk    # synchronized audio output
    frame.end_of_speech  # True when the utterance ends
```
## What Can You Build?
- **Customer support** — replace hold music with a talking avatar that answers questions.
- **AI tutor** — a patient virtual teacher that explains and adapts to students.
- **Digital receptionist** — lobby kiosks and tablets that greet and direct visitors.
- **AI companion** — persistent characters with personality and memory.
- **Game NPC** — dynamic characters that react to player actions with gestures.
- **Content creation** — batch-generate talking-head videos from scripts.
[See all use cases with architecture patterns →](/getting-started/use-cases)
## Developer Guides
- [Quickstart](/getting-started/quickstart) — get an avatar running in 5 minutes
- [REST API](/api-reference/overview) — agent generation and management
- [Examples](/examples/overview) — 10+ working examples, from basic to advanced
## Core SDK API
| Method | Description |
|--------|-------------|
| `AsyncBithuman.create(model_path, api_secret)` | Initialize the avatar runtime |
| `runtime.start()` | Begin processing |
| `runtime.push_audio(data, sample_rate)` | Send audio for lip-sync |
| `runtime.flush()` | Signal end of audio input |
| `runtime.run()` | Async generator yielding video + audio frames |
| `runtime.get_frame_size()` | Returns `(width, height)` of output |
## REST API
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/validate` | POST | Verify API secret |
| `/v1/agent/generate` | POST | Generate new avatar agent |
| `/v1/agent/{code}` | GET/POST | Get or update agent |
| `/v1/agent/{code}/speak` | POST | Make avatar speak text |
| `/v1/agent/{code}/add-context` | POST | Inject silent knowledge |
| `/v1/files/upload` | POST | Upload image/video/audio |
| `/v1/dynamics/generate` | POST | Generate gesture animations |
Base URL: `https://api.bithuman.ai` — Auth: `api-secret` header — [Full reference →](/api-reference/overview)
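As a sketch, a call to the validate endpoint only needs the base URL and the `api-secret` header documented above; the actual HTTP send is commented out so the snippet stays offline-safe (the `requests` usage is an assumption, any HTTP client works):

```python
import os

BASE_URL = "https://api.bithuman.ai"

def build_request(path: str, api_secret: str) -> dict:
    """Assemble the URL and headers for a bitHuman REST call."""
    return {
        "url": f"{BASE_URL}{path}",
        "headers": {"api-secret": api_secret},
    }

req = build_request("/v1/validate", os.getenv("BITHUMAN_API_SECRET", ""))
# import requests
# resp = requests.post(req["url"], headers=req["headers"])
print(req["url"])  # https://api.bithuman.ai/v1/validate
```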
## Platform Support
| Platform | Status | Notes |
|----------|--------|-------|
| **Linux (x86_64)** | Full Support | Production ready |
| **Linux (ARM64)** | Full Support | Edge deployments |
| **macOS (Apple Silicon)** | Full Support | M2+, M4 ideal |
| **Windows** | Full Support | Via WSL |
## AI Agent Integration
bitHuman provides `llms.txt` and an OpenAPI specification for AI coding agent discoverability:
- **[llms.txt](/llms.txt)** — Curated documentation index for LLM consumption
- **[llms-full.txt](/llms-full.txt)** — Complete documentation in single markdown file
- **[OpenAPI Spec](/api-reference/openapi.yaml)** — Machine-readable API contract
- **[AGENTS.md](https://github.com/bithuman-product/examples/blob/main/AGENTS.md)** — Repository-level agent instructions
## Quick Start: Real-Time Avatar API in 5 Minutes
URL: https://docs.bithuman.ai/getting-started/quickstart
## 1. Get Credentials
Create an account at [www.bithuman.ai](https://www.bithuman.ai)
Go to the Developer page and copy your **API Secret**.
Download an avatar model (`.imx` file) from [Community Models](https://www.bithuman.ai/#community).
## 2. Install
```bash
pip install bithuman opencv-python --upgrade
```
## 3. Run Your First Avatar
You need a `.wav` audio file to drive the avatar. A sample `speech.wav` is included in each
[example directory](https://github.com/bithuman-product/examples), or generate your own with any TTS service.
```python
import asyncio

import cv2
from bithuman import AsyncBithuman
from bithuman.audio import load_audio, float32_to_int16


async def main():
    # Initialize
    runtime = await AsyncBithuman.create(
        model_path="avatar.imx",
        api_secret="your_api_secret",
    )
    await runtime.start()

    # Load and push audio
    audio, sr = load_audio("speech.wav")
    await runtime.push_audio(float32_to_int16(audio).tobytes(), sr)
    await runtime.flush()

    # Display animated frames
    async for frame in runtime.run():
        if frame.has_image:
            cv2.imshow("Avatar", frame.bgr_image)  # numpy (H, W, 3)
            cv2.waitKey(1)


asyncio.run(main())
```
[Full working example on GitHub](https://github.com/bithuman-product/examples/tree/main/essence-selfhosted)
---
## Key Concepts
| Concept | Description |
|---------|-------------|
| **Runtime** | `AsyncBithuman` instance that processes audio into video |
| **push_audio** | Feed audio bytes — avatar lip-syncs in real-time |
| **flush** | Signals end of audio input |
| **run()** | Async generator that yields frames at 25 FPS |
| **Frame** | Contains `.bgr_image` (numpy), `.audio_chunk`, `.end_of_speech` |
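Since `run()` yields frames at 25 FPS, each frame spans 40 ms of audio/video. A small helper for mapping frame indices to timestamps (plain arithmetic, not part of the SDK):

```python
FPS = 25  # bitHuman output frame rate

def frame_timestamp_ms(frame_index: int) -> float:
    """Wall-clock offset of a frame in a 25 FPS stream."""
    return frame_index * 1000.0 / FPS

print(frame_timestamp_ms(25))  # 1000.0 — one second of video per 25 frames
```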
---
## Troubleshooting
**SDK import fails** — the SDK is not installed. Run:
```bash
pip install bithuman --upgrade
```
Make sure you're using the correct Python environment (virtualenv, conda, etc.).

**Authentication errors** — your API secret is invalid or missing. Check that:
1. You copied the full secret from the [Developer Dashboard](https://www.bithuman.ai/#developer)
2. The `api_secret` parameter or `BITHUMAN_API_SECRET` env var is set correctly
3. Your account is active with available credits

Quick test:
```bash
curl -X POST https://api.bithuman.ai/v1/validate \
  -H "api-secret: YOUR_SECRET"
```

**Avatar doesn't animate** — the avatar needs audio input to animate:
1. Ensure you're calling `push_audio()` with valid audio data
2. Call `flush()` after pushing all audio
3. Check that the audio is 16-bit PCM (use the `float32_to_int16()` helper)
4. Verify the sample rate you pass matches the file (typically 16000 or 44100)

**Slow first startup** — this is normal for the first session; the `.imx` model takes time to load and initialize. Subsequent sessions in the same process start instantly. To reduce perceived latency, keep the runtime alive between sessions instead of recreating it.

**Model not found** — the model file path is wrong. Check that:
1. The `.imx` file exists at the path you specified
2. You use an absolute path if running from a different directory
3. You have a model — download one from [Community Models](https://www.bithuman.ai/#community) if not
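For reference, the 16-bit PCM conversion mentioned above can be illustrated in pure Python. This is a simplified stand-in for the SDK's `float32_to_int16()` helper, assuming float samples in [-1.0, 1.0]:

```python
import struct

def float_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)

pcm = float_to_pcm16([0.0, 0.5, -1.0])
print(len(pcm))  # 6 — two bytes per sample
```

In practice, use the SDK helper; this sketch just shows the format `push_audio()` expects.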
---
## Next Steps
- Play an audio file through the avatar (5 min)
- Real-time mic input (10 min)
- OpenAI voice chat (15 min)
Or jump straight to the [Docker App](https://github.com/bithuman-product/examples/tree/main/essence-selfhosted) for a complete end-to-end setup.
### Guides
- **[Prompt Guide](/getting-started/prompts)** — Master the CO-STAR framework for avatar personality
- **[Media Guide](/getting-started/media-guide)** — Upload voice, image, and video assets
- **[Animal Mode](/getting-started/animal-mode)** — Create animal avatars
### System Requirements
- Python 3.9+, 4+ CPU cores, 8GB RAM
- macOS (M2+), Linux (x64/ARM64), or Windows (WSL)
## How bitHuman Works: Audio-to-Avatar Architecture
URL: https://docs.bithuman.ai/getting-started/how-it-works
## The Big Picture
A bitHuman avatar is a virtual character that moves its lips, face, and body in real-time based on audio input. Here's what happens when someone talks to an avatar:
```
You speak into a microphone
↓
Audio is sent to an AI agent (like ChatGPT)
↓
The AI generates a text response
↓
Text is converted to speech (TTS)
↓
bitHuman animates the avatar's face to match the speech
↓
You see a lifelike avatar talking back to you
```
All of this happens in real-time — fast enough for a natural conversation.
---
## Key Concepts
### The `.imx` Model File
An `.imx` file is a pre-built avatar model. It contains everything needed to animate a specific character: face data, lip-sync mappings, and appearance information.
Think of it like a "character file" in a video game — it defines what the avatar looks like and how it moves.
You can create your own avatar from any photo or video using the [bitHuman dashboard](https://www.bithuman.ai), or download community models.
### Rooms (LiveKit)
A **room** is a virtual meeting space where participants communicate in real-time using audio and video — similar to a Zoom or Google Meet call.
In a bitHuman session, the room typically has:
- **Your user** — the person talking to the avatar
- **An AI agent** — handles conversation logic (speech-to-text, AI response, text-to-speech)
- **The avatar** — renders animated video frames based on the agent's speech
LiveKit is the open-source platform that powers this real-time communication. You don't need to understand LiveKit deeply — bitHuman handles the complex parts.
### AvatarSession
An **AvatarSession** is the main integration point. It connects your AI agent to a bitHuman avatar inside a LiveKit room.
When you create an `AvatarSession`, bitHuman:
1. Loads the avatar model (cloud or local)
2. Joins the LiveKit room as a participant
3. Listens for audio from your AI agent
4. Generates animated video frames in real-time
5. Publishes the video back to the room
You interact with just a few lines of code — the session handles everything else.
### API Secret
Your **API secret** is the key that authenticates your application with bitHuman services. You can create one from the [Developer Dashboard](https://www.bithuman.ai/#developer).
It's used for:
- Verifying your identity
- Tracking usage and billing
- Downloading cloud avatar models
---
## Which Approach Should I Use?
Start here:
- **No GPU?** → Use **Cloud Plugin** (easiest) or **Self-Hosted CPU** (most private)
- **Have a GPU?** → Use **Self-Hosted GPU** for dynamic face images without pre-built models
- **Want the fastest setup?** → Cloud Plugin — just an API secret and agent ID
- **Need privacy?** → Self-Hosted CPU — audio never leaves your machine
| | Cloud Plugin | Self-Hosted CPU | Self-Hosted GPU |
|---|---|---|---|
| **Setup time** | ~2 min | ~5 min | ~10 min |
| **GPU required** | No | No | Yes (8 GB+ VRAM) |
| **Privacy** | Audio sent to cloud | Audio stays local | Audio stays local |
| **Avatar source** | Pre-built agent ID | `.imx` model file | Any face image |
| **Best for** | Web apps, quick demos | Edge, offline, privacy | Dynamic faces, high volume |
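The decision rules above can be condensed into a toy helper; the return labels are mine, mirroring the table columns:

```python
def recommend_deployment(has_gpu: bool, needs_privacy: bool) -> str:
    """Pick a deployment mode using the rules from the table above."""
    if has_gpu:
        return "Self-Hosted GPU"   # dynamic faces, high volume
    if needs_privacy:
        return "Self-Hosted CPU"   # audio stays local
    return "Cloud Plugin"          # fastest setup, no infrastructure

print(recommend_deployment(has_gpu=False, needs_privacy=True))  # Self-Hosted CPU
```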
## Three Ways to Use bitHuman
Choose the approach that fits your project:
**Cloud Plugin** — easiest. The avatar runs on bitHuman's servers. No model files to manage; just provide an Agent ID and API secret. Best for: getting started quickly, web apps, and production deployments.

**Self-Hosted CPU** — most private. The avatar runs on your machine. Download an `.imx` model and run locally; works offline after setup. Best for: privacy-sensitive apps, edge devices, custom deployments.

**Self-Hosted GPU** — most flexible. A GPU container on your infrastructure. Use any face image to create avatars on the fly; no pre-built models needed. Best for: dynamic avatars, high volume, full infrastructure control.
---
## How the Avatar Joins a Room
Here's what happens step-by-step when an avatar session starts:
1. **Your agent connects** — your AI agent (the code you write) connects to a LiveKit room and waits for a user to join. This is where the conversation will happen.
2. **You create the session** — in your agent code, you create a `bithuman.AvatarSession` with either a cloud `avatar_id` or a local `model_path`. This tells bitHuman which avatar to use.
3. **You start the avatar** — when you call `avatar.start(session, room=ctx.room)`, bitHuman:
   - **Cloud mode:** sends a request to bitHuman's servers, which launch an avatar worker that joins your room
   - **Self-hosted mode:** loads the `.imx` model locally and starts generating frames
4. **The avatar joins** — the avatar joins the LiveKit room as a video participant. Users in the room see the avatar's video feed — a lifelike face that moves and speaks.
5. **Animation begins** — as your AI agent produces speech audio, the avatar animates in real-time:
   - Audio from TTS flows to the avatar
   - The avatar lip-syncs and generates video frames at 25 FPS
   - Video is published to the room for all participants to see
### Visual Flow
```
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Your User │ │ AI Agent │ │ Avatar │
│ (browser) │ │ (your code) │ │ (bitHuman) │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
│ User speaks │ │
│ ──────────────────>│ │
│ │ │
│ AI processes │ │
│ & responds │ │
│ │ TTS audio │
│ │ ──────────────────>│
│ │ │
│ │ Animated video │
│<───────────────────│<───────────────────│
│ │ │
│ User sees avatar │ │
│ speaking │ │
└────────────────────┴────────────────────┘
LiveKit Room
```
---
## What You Need
| Component | What it is | Where to get it |
|-----------|-----------|-----------------|
| **API Secret** | Authenticates your app | [Developer Dashboard](https://www.bithuman.ai/#developer) |
| **Avatar Model** | The character to animate | [Community Models](https://www.bithuman.ai/#community) or create your own |
| **LiveKit Server** | Real-time communication | [LiveKit Cloud](https://cloud.livekit.io) (free tier) or self-hosted |
| **AI Agent** | Conversation logic | Your code + an LLM (OpenAI, Anthropic, etc.) |
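A startup sanity check for these components might look like the following; `BITHUMAN_API_SECRET` appears throughout this guide, while the LiveKit variable names are assumptions based on common LiveKit conventions:

```python
import os

REQUIRED_VARS = [
    "BITHUMAN_API_SECRET",  # from the Developer Dashboard
    "LIVEKIT_URL",          # assumed name for your LiveKit server URL
    "LIVEKIT_API_KEY",      # assumed LiveKit credential names
    "LIVEKIT_API_SECRET",
]

def missing_vars(env: dict) -> list:
    """Return the required settings that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Check the real environment at startup:
print(missing_vars(dict(os.environ)))
```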
---
## Next Steps
- [Quickstart](/getting-started/quickstart) — get an avatar running in 5 minutes
- [Avatar Sessions](/deployment/avatar-sessions) — complete guide to all avatar session modes
## Use Cases
URL: https://docs.bithuman.ai/getting-started/use-cases
## What Can You Build?
bitHuman turns audio into a real-time talking avatar. Anywhere you need a visual character that speaks — that's where bitHuman fits.
---
## Customer Support Avatar
Replace hold music with a face. An avatar greets visitors, answers FAQs, and escalates to a human when needed.
**Architecture:** Website embed (iframe) → bitHuman cloud → OpenAI for conversation
```python
import os

from livekit.plugins import bithuman

avatar = bithuman.AvatarSession(
    avatar_id="YOUR_SUPPORT_AGENT",
    api_secret=os.getenv("BITHUMAN_API_SECRET"),
)
```
**Best deployment:** [Cloud Plugin](/deployment/livekit-cloud-plugin) for fastest setup, or [Website Embed](/integrations/embed) for dropping into existing pages.
---
## AI Tutor / Virtual Teacher
A patient avatar that explains concepts, answers questions, and adapts to the student's pace. Lip-sync creates presence that text chat can't match.
**Architecture:** LiveKit room → AI agent (GPT-4 + domain knowledge) → bitHuman avatar
**Best deployment:** [Cloud Plugin](/deployment/livekit-cloud-plugin) with custom system prompt via [Prompt Guide](/getting-started/prompts).
---
## Digital Receptionist / Kiosk
A lobby screen or tablet that greets visitors, provides directions, and handles check-in. Runs on a Raspberry Pi or any Linux machine.
**Architecture:** Kiosk browser → LiveKit → AI agent → bitHuman (self-hosted CPU)
**Best deployment:** [Self-Hosted CPU](/examples/self-hosted-plugin) for offline capability, or [Raspberry Pi](/examples/raspberry-pi) for dedicated hardware.
---
## AI Companion / Virtual Friend
A character with a persistent personality that remembers past conversations. Use [context injection](/api-reference/agent-context) to maintain relationship state.
**Architecture:** Mobile app (Flutter/React) → LiveKit → AI agent with memory → bitHuman
**Best deployment:** [Cloud Plugin](/deployment/livekit-cloud-plugin) + [Flutter integration](/integrations/flutter).
---
## Game NPC / Interactive Character
Non-player characters that respond dynamically to player actions. Use [dynamics](/api-reference/dynamics) for gestures (wave, nod, laugh) triggered by game events.
**Architecture:** Game client → WebSocket → AI agent → bitHuman (GPU for custom faces)
**Best deployment:** [Self-Hosted GPU](/deployment/self-hosted-gpu) for dynamic face generation from character art.
---
## Accessibility Tool
Give a face and voice to text-to-speech output. Visual lip-sync helps hearing-impaired users follow along. Audio output helps visually impaired users interact with content.
**Architecture:** Screen reader / TTS → bitHuman SDK → overlay window
**Best deployment:** [Python SDK directly](/deployment/avatar-sessions#using-the-sdk-without-livekit) (no LiveKit needed).
---
## Content Creation / Video Generation
Generate talking-head videos from scripts without recording. Batch-process audio files through avatars to create training videos, announcements, or social content.
**Architecture:** Script → TTS → bitHuman SDK → video frames → ffmpeg → MP4
```python
runtime = await AsyncBithuman.create(model_path="presenter.imx", api_secret="...")
await runtime.start()
await runtime.push_audio(tts_audio, sample_rate=16000)  # 16-bit PCM bytes from your TTS
await runtime.flush()

# Collect frames and encode to video
frames = []
async for frame in runtime.run():
    if frame.has_image:
        frames.append(frame.bgr_image)
```
**Best deployment:** [Python SDK directly](/deployment/avatar-sessions#using-the-sdk-without-livekit) with [Self-Hosted GPU](/deployment/self-hosted-gpu) for custom faces.
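To finish the script-to-MP4 pipeline, collected BGR frames can be piped into ffmpeg. A sketch that only builds the command line (standard ffmpeg flags; the output filename is arbitrary):

```python
def ffmpeg_cmd(width: int, height: int, fps: int = 25, out: str = "avatar.mp4") -> list:
    """Build an ffmpeg invocation that reads raw BGR frames on stdin."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "bgr24",   # matches frame.bgr_image layout
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",                               # frames arrive on stdin
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        out,
    ]

cmd = ffmpeg_cmd(512, 512)
# Feed frames with subprocess.Popen(cmd, stdin=subprocess.PIPE), then
# proc.stdin.write(frame.bgr_image.tobytes()) for each collected frame.
print(cmd[0])  # ffmpeg
```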
---
## Choosing the Right Deployment
| Use Case | Recommended Deployment | Why |
|----------|----------------------|-----|
| Customer support | Cloud Plugin | Fast setup, scales automatically |
| AI tutor | Cloud Plugin | Low latency, no infrastructure |
| Receptionist kiosk | Self-Hosted CPU | Offline capable, privacy |
| AI companion | Cloud Plugin + Flutter | Mobile-friendly, cross-platform |
| Game NPC | Self-Hosted GPU | Dynamic faces, low latency |
| Accessibility | Python SDK | Lightweight, no WebRTC overhead |
| Content creation | Python SDK + GPU | Batch processing, custom faces |
---
## Next Steps
- [Quickstart](/getting-started/quickstart) — run your first avatar in 5 minutes
- [Avatar Sessions](/deployment/avatar-sessions) — complete deployment guide
- [Examples](/examples/overview) — working code for every use case
## Avatar Prompt Engineering: CO-STAR Framework
URL: https://docs.bithuman.ai/getting-started/prompts
Learn the structure that won Singapore's GPT-4 prompt engineering competition.
## The CO-STAR Framework
The **CO-STAR framework** is an award-winning method for creating effective prompts. It considers all key aspects that influence an AI's response quality.
### C - Context
**Provide background information.** Give your avatar the setting and situation they need to understand.
```text
CONTEXT: You are working as a customer service representative for a tech company.
Customers often call frustrated with technical issues.
```
### O - Objective
**Define the specific task.** Be crystal clear about what you want your avatar to accomplish.
```text
OBJECTIVE: Help customers solve their technical problems while making them feel
heard and valued. Always aim to resolve issues on the first interaction.
```
### S - Style
**Specify the communication style.** This could be like a famous person, profession, or communication approach.
```text
STYLE: Communicate like an experienced Apple Genius Bar technician -
knowledgeable but approachable, using analogies to explain technical concepts.
```
### T - Tone
**Set the emotional attitude.** Define how your avatar should "feel" in their responses.
```text
TONE: Patient, empathetic, and solution-focused. Remain calm even when
customers are frustrated.
```
### A - Audience
**Identify who they're talking to.** Tailor responses to the specific audience characteristics.
```text
AUDIENCE: Everyday technology users with varying technical skill levels,
from beginners to intermediate users.
```
### R - Response
**Specify the output format.** Define exactly how responses should be structured.
```text
RESPONSE: Always follow this format:
1. Acknowledge the customer's concern
2. Ask one clarifying question if needed
3. Provide step-by-step solution
4. Confirm understanding
5. Offer additional help
```
---
## Complete CO-STAR Examples
### E-commerce Assistant
```text
CONTEXT: You work for an online fashion retailer during the busy holiday season.
Customers are shopping for gifts and need quick, helpful guidance.
OBJECTIVE: Help customers find the perfect products for their needs and guide
them through purchase decisions confidently.
STYLE: Like a knowledgeable personal shopper at a high-end boutique -
attentive, stylish, and detail-oriented.
TONE: Enthusiastic, helpful, and fashion-forward while being respectful
of different budgets and styles.
AUDIENCE: Online shoppers aged 25-45 looking for clothing and accessories,
with varying fashion knowledge and budget ranges.
RESPONSE:
- Start with a warm greeting
- Ask 2-3 targeted questions about their needs
- Suggest 3 specific product options with reasons
- Mention current promotions if relevant
- End with "How else can I help you today?"
```
### Educational Tutor
```text
CONTEXT: You are an online tutor helping high school students with mathematics
during exam preparation season. Students are stressed and need both academic
and emotional support.
OBJECTIVE: Explain mathematical concepts clearly, help solve specific problems,
and build student confidence in their abilities.
STYLE: Like an award-winning high school teacher who makes complex topics
accessible - using real-world examples and breaking down problems step-by-step.
TONE: Encouraging, patient, and supportive. Celebrate small victories and
reframe mistakes as learning opportunities.
AUDIENCE: High school students (ages 14-18) with varying math abilities,
some struggling with confidence and test anxiety.
RESPONSE:
- Acknowledge their question/concern
- Break complex problems into smaller steps
- Use encouraging phrases like "Great question!" or "You're on the right track!"
- Provide visual or real-world analogies when possible
- End with a confidence-building statement
```
### Healthcare Assistant
```text
CONTEXT: You work for a telehealth platform where patients schedule appointments
and ask general health questions. You cannot provide medical diagnoses but can
offer guidance and support.
OBJECTIVE: Help patients understand their symptoms, schedule appropriate care,
and provide reassurance while maintaining appropriate medical boundaries.
STYLE: Like an experienced nurse practitioner - knowledgeable, professional,
but warm and approachable in explanations.
TONE: Compassionate, professional, and reassuring while being appropriately
cautious about medical advice.
AUDIENCE: Patients of all ages with varying health literacy levels, often
anxious about their symptoms or conditions.
RESPONSE:
- Express empathy for their concern
- Provide general health education when appropriate
- Always recommend consulting healthcare providers for medical advice
- Offer to help schedule appointments
- Use clear, non-medical language
```
---
## Tips for CO-STAR Success
### Do This
**Be Specific in Context**
```text
Bad: "You work in customer service"
Good: "You work as a Level 2 technical support specialist for a cloud software company,
handling escalated cases from customers who've already tried basic troubleshooting"
```
**Use Professional Examples in Style**
```text
Bad: "Be professional"
Good: "Communicate like a McKinsey consultant -- structured, data-driven, and confident
while remaining accessible to non-experts"
```
**Define Clear Response Formats**
```text
Bad: "Give helpful responses"
Good: "Always structure responses as: Problem Summary | Root Cause Analysis |
3 Recommended Solutions | Next Steps"
```
### Avoid This
- **Vague objectives** — "Be helpful" vs "Increase customer satisfaction scores by resolving issues in under 5 minutes"
- **Conflicting tones** — Don't mix "professional" with "casual and fun"
- **Unclear audiences** — "Everyone" vs "Small business owners with 10-50 employees"
- **Missing context** — Jumping straight to objectives without setting the scene
---
## Quick CO-STAR Template
Use this template for any avatar:
```text
CONTEXT: [Describe the situation/setting where your avatar operates]
OBJECTIVE: [What specific goal should your avatar achieve?]
STYLE: [How should they communicate? Like which profession/person?]
TONE: [What emotional attitude should they convey?]
AUDIENCE: [Who are they talking to? Demographics/characteristics?]
RESPONSE: [What format/structure should responses follow?]
```
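If you generate prompts programmatically, the template can be filled in with a small helper (my own convenience function, not part of any bitHuman API):

```python
def costar_prompt(context, objective, style, tone, audience, response):
    """Assemble a CO-STAR system prompt from its six sections."""
    sections = [
        ("CONTEXT", context), ("OBJECTIVE", objective), ("STYLE", style),
        ("TONE", tone), ("AUDIENCE", audience), ("RESPONSE", response),
    ]
    return "\n\n".join(f"{name}: {text}" for name, text in sections)

prompt = costar_prompt(
    context="You work at a telehealth platform.",
    objective="Help patients schedule appropriate care.",
    style="Like an experienced nurse practitioner.",
    tone="Compassionate and professional.",
    audience="Patients with varying health literacy.",
    response="Empathy first, then guidance, then next steps.",
)
print(prompt.splitlines()[0])  # CONTEXT: You work at a telehealth platform.
```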
---
## Next Steps
1. **Write your CO-STAR prompt** using the template above
2. **Test with sample conversations** to refine it
3. **Try it in the [Examples](/examples/overview)** to see it in action
## Media Upload Guide: Images, Video & Audio for Avatars
URL: https://docs.bithuman.ai/getting-started/media-guide
Learn how to prepare and upload media for optimal avatar generation results.
---
## Image Upload
**Perfect for**: Facial likeness and character appearance
### Requirements
| Requirement | Value |
|-------------|-------|
| File Size | Less than 10MB |
| Characters | One person only |
| Position | Centered in frame |
| Orientation | Front-facing |
| Expression | Calm and gentle |
| Quality | High resolution, well-lit |
### Best Practices
- **Good lighting** — avoid shadows on face
- **Clear focus** — sharp, not blurry
- **Solo shots** — no other people visible
- **Neutral expression** — avoid extreme emotions
- **Professional quality** — passport-style photos work well
---
## Video Upload
**Perfect for**: Movement patterns and dynamic expressions
### Requirements
| Requirement | Value |
|-------------|-------|
| Duration | Less than 30 seconds |
| Characters | One person only |
| Position | Centered in frame |
| Movement | Minimal distracting movement |
| Quality | High resolution, stable footage |
### Best Practices
- **Stable camera** — use tripod if possible
- **Consistent framing** — keep character centered
- **Subtle movements** — gentle head movements, natural blinking
- **Good lighting** — consistent throughout video
- **Audio optional** — focus on visual quality
---
## Voice Upload
**Perfect for**: Voice cloning and personalized speech patterns
### Requirements
| Requirement | Value |
|-------------|-------|
| Duration | Less than 1 minute |
| Quality | Clear voice, no background noise |
| Format | MP3, WAV, or M4A |
| Content | Natural speech in your target language |
### Best Practices
- Record in a quiet environment
- Use a good quality microphone
- Speak naturally and clearly
- Avoid music or sound effects
- Include varied sentences for better voice modeling
---
## Media Priority System
Understanding how different uploads influence and overwrite each other:
```mermaid
graph TD
    subgraph "User Uploads"
        A[Prompt<br/>Character Description]
        B[Image<br/>Face/Appearance]
        C[Video<br/>Face + Movement]
        D[Voice<br/>Speech Audio]
    end
    subgraph "Likeness Generation"
        E{Video<br/>Uploaded?}
        E -->|Yes| F[Video OVERWRITES Image<br/>Uses video for likeness]
        E -->|No| G{Image<br/>Uploaded?}
        G -->|Yes| H[Image for Likeness<br/>Auto-generates persona<br/>Prompt becomes optional]
        G -->|No| I[Prompt-Only<br/>Generates appearance<br/>from description]
    end
    subgraph "Voice Generation"
        J{Voice<br/>Uploaded?}
        J -->|Yes| K[Uses Uploaded Voice<br/>Clones speech patterns]
        J -->|No| L[Auto-Generated Voice<br/>Matches persona/appearance]
    end
    subgraph "Final Result"
        M[Complete Avatar<br/>Likeness + Voice + Personality]
    end
    A --> E
    B --> E
    C --> E
    D --> J
    F --> J
    H --> J
    I --> J
    K --> M
    L --> M
```
### Key Priority Rules
1. **Video > Image** — Video always overwrites image for likeness
2. **Image = Auto-Prompt** — Images auto-generate persona, making manual prompts optional
3. **Voice** — When uploaded, replaces auto-generated voice
4. **Prompt** — Required only when no image/video provided
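The priority rules can be expressed as a tiny resolver (a restatement of the rules above, not bitHuman code):

```python
def likeness_source(has_video: bool, has_image: bool) -> str:
    """Which upload drives the avatar's likeness, per the priority rules."""
    if has_video:
        return "video"   # video always overwrites image
    if has_image:
        return "image"   # persona auto-generated, prompt optional
    return "prompt"      # appearance generated from description

print(likeness_source(has_video=True, has_image=True))  # video
```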
### Upload Combinations
| Combination | What Happens |
|-------------|-------------|
| **Prompt Only** | Generates likeness, voice, and movement from text description |
| **Image Only** | Uses image for likeness, auto-generates persona and voice |
| **Voice + Image** | Image for likeness, voice for speech patterns |
| **Video + Voice + Prompt** | Full character control — video for likeness, voice for speech, prompt for personality |
---
## Best Practices
**Start simple.** Upload an image for instant results, or use prompts for creative characters. You can always add voice or refine later.
**Recommended Approaches:**
- **Prompts Only** — Good for creative/fictional characters
- **Image Only** — Instant avatar from photo (no prompt needed)
- **Image + Voice** — Realistic character recreation
**Common Issues and Fixes:**
| Issue | Fix |
|-------|-----|
| Poor lighting in images/videos | Use photo editing to improve lighting |
| Background noise in audio | Record audio in quiet spaces |
| Multiple people in frame | Crop images to show only target person |
| Excessive movement in videos | Keep movements subtle and natural |
## Animal Avatars: Create Talking Animal Characters
URL: https://docs.bithuman.ai/getting-started/animal-mode
Transform animals into interactive avatars using animal mode.
---
## Available Animal Characters
Use prompts to generate or upload your own pet photos:
**Capybara**
**Cat in Hat**
**Rainbow Creature**
**Koala**
**Pixar Turtle**
**Bunny with Glasses**
**White Bunny**
**English Sheepdog**
**Teddy Bear**
**Character D12**
**Character D7**
**Fluffy Creature**
---
## Automatic Face Detection
The AI system automatically locates the character's face and body in animal images, enabling natural movement and expression mapping.
**What works automatically:**
- Eye tracking for natural gaze
- Mouth detection for speech sync
- Expression mapping for emotions
- Facial landmarks for precise animation
---
## Manual Face Marking
When the AI cannot locate facial features automatically, you'll be prompted to manually mark key points.
### Marking Process
1. When the "Help Needed" prompt appears, click the **Mark Face** button.
2. Draw a rectangle around the entire facial area — eyes, nose, mouth, and chin.
3. The system extracts facial landmarks from your selection.
The rectangle should cover all key features: both eyes, nose, mouth, and chin. No need to select individual points — just one bounding rectangle.
---
## Best Practices
**For optimal results:**
- **Clear facial features** — ensure eyes, nose, mouth are visible
- **Front-facing pose** — straight-on view works best
- **Good contrast** — features should stand out from background
- **High resolution** — more detail means better detection
**Troubleshooting:**
| Problem | Solution |
|---------|----------|
| Face not detected | Use a front-facing photo with clear eyes, nose, and mouth visible |
| Poor lip-sync | Try a higher-resolution image with more contrast around the mouth |
| Unnatural movement | Avoid side profiles — straight-on views work best |
**Tips:**
- Start with the pre-built animals above for guaranteed compatibility
- Use well-lit, high-contrast images
- For custom pets, crop the image so the face fills most of the frame
- Test with simple expressions first
---
## Getting Started
1. Pick an animal character from the grid above
2. The system automatically attempts face detection
3. Manually mark facial points if prompted
4. Your interactive animal avatar is ready
---
# Deployment
## Avatar Sessions: Cloud, CPU & GPU Deployment Guide
URL: https://docs.bithuman.ai/deployment/avatar-sessions
An **AvatarSession** is how you bring a bitHuman avatar into a LiveKit room. This guide covers every way to do it, with complete working examples.
**New to bitHuman?** Start with [How It Works](/getting-started/how-it-works) to understand the core concepts first.
---
## Choose Your Approach
| Approach | Best For | Model Files | GPU Required | Internet Required |
|----------|----------|-------------|--------------|-------------------|
| [Cloud Plugin](#cloud-plugin) | Getting started, web apps | No | No | Yes |
| [Self-Hosted CPU](#self-hosted-cpu) | Privacy, edge devices | Yes (.imx) | No | Only for auth |
| [Self-Hosted GPU](#self-hosted-gpu) | Dynamic faces, custom images | No (uses images) | Yes | Only for auth |
---
## Prerequisites
All approaches need a bitHuman API secret from [www.bithuman.ai](https://www.bithuman.ai/#developer) and the LiveKit Agents framework with the bitHuman plugin (`pip install livekit-plugins-bithuman`).
You also need a LiveKit server. If you don't have one:
```bash
# Option 1: LiveKit Cloud (easiest)
# Sign up at https://cloud.livekit.io — free tier available
# Option 2: Self-hosted LiveKit
docker run --rm -p 7880:7880 -p 7881:7881 -p 7882:7882/udp \
livekit/livekit-server --dev
```
---
## Cloud Plugin
The cloud plugin runs the avatar on bitHuman's servers. You just provide an Agent ID and API secret — no model files, no GPU.
### Complete Working Example
```python
import asyncio
import os
from livekit.agents import (
Agent,
AgentSession,
JobContext,
RoomOutputOptions,
WorkerOptions,
cli,
llm,
)
from livekit.plugins import openai, silero, bithuman
# 1. Define your AI agent
class MyAgent(Agent):
def __init__(self):
super().__init__(
instructions="""You are a helpful and friendly assistant.
Keep responses concise — 1-2 sentences.""",
)
# 2. Set up the session when a user connects
async def entrypoint(ctx: JobContext):
await ctx.connect()
# Wait for a user to join the room
await ctx.wait_for_participant()
# Create the avatar session (cloud-hosted)
avatar = bithuman.AvatarSession(
avatar_id=os.getenv("BITHUMAN_AGENT_ID"), # e.g. "A78WKV4515"
api_secret=os.getenv("BITHUMAN_API_SECRET"),
)
# Create the agent session with AI components
session = AgentSession(
stt=openai.STT(), # Speech-to-text
llm=openai.LLM(), # AI language model
tts=openai.TTS(), # Text-to-speech
vad=silero.VAD.load(), # Voice activity detection
)
# Start everything — avatar joins the room automatically
await avatar.start(session, room=ctx.room)
await session.start(
agent=MyAgent(),
room=ctx.room,
room_output_options=RoomOutputOptions(audio_enabled=False),
)
# 3. Launch
if __name__ == "__main__":
cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
### Environment Variables
```bash
# Required
export BITHUMAN_API_SECRET="your_api_secret" # From www.bithuman.ai/#developer
export BITHUMAN_AGENT_ID="A78WKV4515" # Your agent's ID
export OPENAI_API_KEY="sk-..." # For STT, LLM, TTS
# LiveKit connection
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="APIxxxxxxxx"
export LIVEKIT_API_SECRET="xxxxxxxx"
```
### Run It
```bash
python agent.py dev
```
Then open [agents-playground.livekit.io](https://agents-playground.livekit.io) to connect and talk to your avatar.
### How It Works Behind the Scenes
When `avatar.start()` and `session.start()` run:
1. The plugin sends a request to bitHuman's cloud API
2. A cloud avatar worker receives the request
3. The worker downloads the avatar model (cached after first time)
4. The worker joins your LiveKit room as a participant named `bithuman-avatar-agent`
5. As your agent produces TTS audio, the worker generates animated video frames
6. Video is published to the room — users see the avatar speaking
**Essence vs Expression model:** By default, the cloud plugin uses the **Essence** (CPU) model, which works with pre-built `.imx` avatars. Add `model="expression"` to use the **Expression** (GPU) model, which supports custom face images.
### Using Expression Model (GPU) with Custom Image
```python
from PIL import Image
avatar = bithuman.AvatarSession(
avatar_image=Image.open("face.jpg"), # Any face image
api_secret=os.getenv("BITHUMAN_API_SECRET"),
model="expression",
)
```
---
## Self-Hosted CPU
Run the avatar entirely on your own machine using a downloaded `.imx` model file. Great for privacy and offline use.
### Complete Working Example
```python
import asyncio
import os
from livekit.agents import (
Agent,
AgentSession,
JobContext,
RoomOutputOptions,
WorkerOptions,
cli,
llm,
)
from livekit.plugins import openai, silero, bithuman
class MyAgent(Agent):
def __init__(self):
super().__init__(
instructions="You are a helpful assistant. Keep responses brief.",
)
async def entrypoint(ctx: JobContext):
await ctx.connect()
await ctx.wait_for_participant()
# Create the avatar session (self-hosted, CPU)
avatar = bithuman.AvatarSession(
model_path=os.getenv("BITHUMAN_MODEL_PATH"), # e.g. "/models/avatar.imx"
api_secret=os.getenv("BITHUMAN_API_SECRET"),
)
session = AgentSession(
stt=openai.STT(),
llm=openai.LLM(),
tts=openai.TTS(),
vad=silero.VAD.load(),
)
await avatar.start(session, room=ctx.room)
await session.start(
agent=MyAgent(),
room=ctx.room,
room_output_options=RoomOutputOptions(audio_enabled=False),
)
if __name__ == "__main__":
cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
### Environment Variables
```bash
# Required
export BITHUMAN_API_SECRET="your_api_secret"
export BITHUMAN_MODEL_PATH="/path/to/avatar.imx"
export OPENAI_API_KEY="sk-..."
# LiveKit connection
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="APIxxxxxxxx"
export LIVEKIT_API_SECRET="xxxxxxxx"
```
### How It Differs from Cloud
| Aspect | Cloud | Self-Hosted CPU |
|--------|-------|-----------------|
| Model location | bitHuman's servers | Your machine |
| Avatar parameter | `avatar_id="A78WKV4515"` | `model_path="/path/to/avatar.imx"` |
| Internet needed | Yes (always) | Only for authentication |
| First frame latency | 2-4 seconds | ~20 seconds (model load) |
| Privacy | Audio sent to cloud | Audio stays local |
### System Requirements
- **CPU:** 4+ cores (8 recommended)
- **RAM:** 8 GB minimum
- **Disk:** ~500 MB per `.imx` model
- **OS:** Linux (x64/ARM64), macOS (M2+), or Windows (WSL)
---
## Self-Hosted GPU
Use a GPU container that generates avatars from any face image — no pre-built models needed.
### Complete Working Example
```python
import asyncio
import os
from livekit.agents import (
Agent,
AgentSession,
JobContext,
RoomOutputOptions,
WorkerOptions,
cli,
llm,
)
from livekit.plugins import openai, silero, bithuman
class MyAgent(Agent):
def __init__(self):
super().__init__(
instructions="You are a helpful assistant. Keep responses brief.",
)
async def entrypoint(ctx: JobContext):
await ctx.connect()
await ctx.wait_for_participant()
# Create the avatar session (self-hosted GPU container)
avatar = bithuman.AvatarSession(
api_url=os.getenv("CUSTOM_GPU_URL", "http://localhost:8089/launch"),
api_secret=os.getenv("BITHUMAN_API_SECRET"),
avatar_image="https://example.com/face.jpg", # Any face image URL
)
session = AgentSession(
stt=openai.STT(),
llm=openai.LLM(),
tts=openai.TTS(),
vad=silero.VAD.load(),
)
await avatar.start(session, room=ctx.room)
await session.start(
agent=MyAgent(),
room=ctx.room,
room_output_options=RoomOutputOptions(audio_enabled=False),
)
if __name__ == "__main__":
cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
### Start the GPU Container First
```bash
# Pull and run the GPU avatar container
docker run --gpus all -p 8089:8089 \
-v /path/to/model-storage:/data/models \
-e BITHUMAN_API_SECRET=your_api_secret \
docker.io/sgubithuman/expression-avatar:latest
```
### Environment Variables
```bash
# Required
export BITHUMAN_API_SECRET="your_api_secret"
export CUSTOM_GPU_URL="http://localhost:8089/launch"
export OPENAI_API_KEY="sk-..."
# LiveKit connection
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="APIxxxxxxxx"
export LIVEKIT_API_SECRET="xxxxxxxx"
```
For detailed GPU container setup, see [Self-Hosted GPU Container](/deployment/self-hosted-gpu).
---
## Adding Gestures (Dynamics)
Make your avatar perform gestures like waving, nodding, or laughing in response to conversation keywords.
Dynamics require a cloud-generated agent with gestures enabled. Create one at [www.bithuman.ai](https://www.bithuman.ai).
### Step 1: Check Available Gestures
```python
import os
import requests
agent_id = "A78WKV4515"
headers = {"api-secret": os.getenv("BITHUMAN_API_SECRET")}
resp = requests.get(
f"https://api.bithuman.ai/v1/dynamics/{agent_id}",
headers=headers,
)
gestures = resp.json()["data"].get("gestures", {})
print(list(gestures.keys()))
# Example: ["mini_wave_hello", "talk_head_nod_subtle", "laugh_react"]
```
### Step 2: Trigger Gestures from Keywords
```python
import asyncio
from livekit.agents import AgentSession, UserInputTranscribedEvent
from bithuman.api import VideoControl
KEYWORD_ACTION_MAP = {
"hello": "mini_wave_hello",
"hi": "mini_wave_hello",
"funny": "laugh_react",
"laugh": "laugh_react",
"yes": "talk_head_nod_subtle",
}
# Inside your entrypoint, after session.start():
@session.on("user_input_transcribed")
def on_transcribed(event: UserInputTranscribedEvent):
if not event.is_final:
return
text = event.transcript.lower()
for keyword, action in KEYWORD_ACTION_MAP.items():
if keyword in text:
asyncio.create_task(
avatar.runtime.push(VideoControl(action=action))
)
break
```
For cloud-hosted avatars, an alternative is to send a `trigger_dynamics` RPC to the avatar participant (the same approach used in the cloud plugin guide):
```python
import asyncio
import json
from datetime import datetime
from livekit import rtc
from livekit.agents import UserInputTranscribedEvent
KEYWORD_ACTION_MAP = {
"hello": "mini_wave_hello",
"funny": "laugh_react",
}
async def trigger_gesture(participant: rtc.LocalParticipant, target: str, action: str):
await participant.perform_rpc(
destination_identity=target,
method="trigger_dynamics",
payload=json.dumps({
"action": action,
"identity": participant.identity,
"timestamp": datetime.utcnow().isoformat(),
}),
)
# Inside your entrypoint, after session.start():
@session.on("user_input_transcribed")
def on_transcribed(event: UserInputTranscribedEvent):
if not event.is_final:
return
text = event.transcript.lower()
for keyword, action in KEYWORD_ACTION_MAP.items():
if keyword in text:
for identity in ctx.room.remote_participants.keys():
asyncio.create_task(
trigger_gesture(ctx.room.local_participant, identity, action)
)
break
```
---
## Controlling the Avatar via REST API
Once an avatar is running in a room, you can control it from any backend using the REST API — no LiveKit connection needed.
### Make the Avatar Speak
```bash
curl -X POST "https://api.bithuman.ai/v1/agent/A78WKV4515/speak" \
-H "api-secret: $BITHUMAN_API_SECRET" \
-H "Content-Type: application/json" \
-d '{"message": "Hello! Welcome to our demo."}'
```
### Add Context (Silent Knowledge)
```bash
curl -X POST "https://api.bithuman.ai/v1/agent/A78WKV4515/add-context" \
-H "api-secret: $BITHUMAN_API_SECRET" \
-H "Content-Type: application/json" \
-d '{
"context": "The customer just purchased a premium plan.",
"type": "add_context"
}'
```
The avatar won't say this aloud, but it will use the information in future responses.
These REST API calls work from any language or platform — use them to integrate avatars into existing apps without touching the agent code.
---
## Using the SDK Without LiveKit
If you don't need real-time rooms (e.g., generating video files or building a custom UI), use the Python SDK directly:
```python
import asyncio
import cv2
from bithuman import AsyncBithuman
from bithuman.audio import load_audio, float32_to_int16
async def main():
# Initialize the runtime
runtime = await AsyncBithuman.create(
model_path="avatar.imx",
api_secret="your_api_secret",
)
await runtime.start()
# Load an audio file and push it
audio, sr = load_audio("speech.wav")
audio_int16 = float32_to_int16(audio)
await runtime.push_audio(audio_int16.tobytes(), sr)
await runtime.flush()
# Get animated video frames
async for frame in runtime.run():
if frame.has_image:
cv2.imshow("Avatar", frame.bgr_image)
cv2.waitKey(1)
if frame.end_of_speech:
break
asyncio.run(main())
```
This gives you raw numpy frames — display them however you want.
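The `float32_to_int16` helper converts the float samples `load_audio` returns into the 16-bit PCM that `push_audio` expects. A rough numpy equivalent, for illustration only (`float32_to_int16_sketch` is a hypothetical re-implementation; use the SDK helper in real code):

```python
import numpy as np

def float32_to_int16_sketch(audio: np.ndarray) -> np.ndarray:
    """Clip float32 samples to [-1, 1] and scale them to the int16 range."""
    clipped = np.clip(audio, -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)

samples = np.array([0.0, 0.5, -1.0, 2.0], dtype=np.float32)
pcm = float32_to_int16_sketch(samples)
# pcm.tobytes() is the kind of int16 PCM buffer runtime.push_audio() accepts
```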
---
## Complete Docker Example
For the fastest path to a working demo, use the Docker example that packages everything together:
```bash
# Clone the examples repo
git clone https://github.com/bithuman-product/examples.git
cd examples/essence-selfhosted
# Configure
cat > .env << 'EOF'
BITHUMAN_API_SECRET=your_api_secret
OPENAI_API_KEY=sk-...
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=APIxxxxxxxx
LIVEKIT_API_SECRET=xxxxxxxx
EOF
# Add your avatar model
mkdir -p models
cp ~/Downloads/avatar.imx models/
# Launch
docker compose up
```
Open [http://localhost:4202](http://localhost:4202) to talk to your avatar.
---
## Troubleshooting
**Cloud mode:** Check that your `avatar_id` exists — look it up in the [bitHuman dashboard](https://www.bithuman.ai). Verify your API secret is valid with:
```bash
curl -X POST https://api.bithuman.ai/v1/validate \
-H "api-secret: $BITHUMAN_API_SECRET"
```
**Self-hosted mode:** Check that the `.imx` file path is correct and the file is not corrupted:
```bash
bithuman validate --model-path /path/to/avatar.imx
```
**No lip movement:** The avatar needs audio input to animate. Ensure:
1. Your TTS is producing audio (test with `openai.TTS()` separately)
2. `avatar.start(session, room=ctx.room)` is called before `session.start()`
3. Agent logs show no audio pipeline errors
**Authentication errors:**
- Verify your API secret is correct (copy-paste from dashboard)
- Check you have credits remaining in your account
- Ensure the `BITHUMAN_API_SECRET` environment variable is set
**Cloud:** First request downloads the model (~2-4 seconds). Subsequent requests use cache (~1-2 seconds).
**Self-hosted CPU:** First load takes ~20 seconds (model initialization). Keep the process running for fast subsequent sessions.
**Self-hosted GPU:** Cold start takes ~30-40 seconds. Use long-running containers with preset avatars for ~4 second startup.
**All avatar workers are busy:** The system retries automatically (up to 5 times with backoff). If it persists:
- Check your usage limits
- Try again in a few seconds
- For self-hosted: increase the number of worker replicas
---
## Billing & Credits
Avatar sessions consume credits based on the deployment mode and session duration.
| Deployment | Credit Cost | Billed By | Notes |
|------------|-------------|-----------|-------|
| **Cloud Plugin** | Per session minute | Session duration | Includes GPU rendering |
| **Self-Hosted CPU** | Per authentication | Auth call | Rendering is free (your hardware) |
| **Self-Hosted GPU** | Per authentication | Auth call | Rendering is free (your hardware) |
Check your remaining credits at [www.bithuman.ai](https://www.bithuman.ai) > Developer section. Credits are consumed only for active sessions — idle containers cost nothing.
---
## Next Steps
- Add gestures and movements
- Get notified about session events
- Put avatars on any website
## LiveKit Cloud Plugin: Zero-GPU Avatar Setup
URL: https://docs.bithuman.ai/deployment/livekit-cloud-plugin
Use existing bitHuman agents in real-time applications with our cloud-hosted LiveKit plugin. The avatar runs on bitHuman's servers — no model files, no GPU needed on your side.
**New here?** Read [How It Works](/getting-started/how-it-works) first to understand rooms, sessions, and avatars.
## Quick Start
The bitHuman plugin ships inside the livekit/agents repository. Remove any PyPI version first to avoid conflicts, then install from GitHub:
```bash
# Remove old PyPI version if present (safe to ignore "not installed" warnings)
uv pip uninstall livekit-plugins-bithuman
# Install the latest version
GIT_LFS_SKIP_SMUDGE=1 uv pip install git+https://github.com/livekit/agents@main#subdirectory=livekit-plugins/livekit-plugins-bithuman
```
Go to [www.bithuman.ai](https://www.bithuman.ai/#developer) and copy your **API Secret**.
Click on any agent card in [your dashboard](https://www.bithuman.ai). The **Agent Settings** dialog shows your Agent ID (e.g., `A78WKV4515`).
```bash
export BITHUMAN_API_SECRET="your_api_secret"
export BITHUMAN_AGENT_ID="A78WKV4515"
export OPENAI_API_KEY="sk-..."
# LiveKit (get from cloud.livekit.io)
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="APIxxxxxxxx"
export LIVEKIT_API_SECRET="xxxxxxxx"
```
---
## Complete Working Example
Here's a full agent that uses a cloud-hosted avatar:
```python
import asyncio
import os
from livekit.agents import (
Agent,
AgentSession,
JobContext,
RoomOutputOptions,
WorkerOptions,
cli,
llm,
)
from livekit.plugins import openai, silero, bithuman
class MyAgent(Agent):
def __init__(self):
super().__init__(
instructions="""You are a friendly assistant.
Keep responses to 1-2 sentences.""",
)
async def entrypoint(ctx: JobContext):
# Connect to the LiveKit room
await ctx.connect()
# Wait for a human to join
await ctx.wait_for_participant()
# Create a cloud-hosted avatar
avatar = bithuman.AvatarSession(
avatar_id=os.getenv("BITHUMAN_AGENT_ID"),
api_secret=os.getenv("BITHUMAN_API_SECRET"),
)
# Wire up the AI pipeline
session = AgentSession(
stt=openai.STT(), # Listens to the user
llm=openai.LLM(), # Generates responses
tts=openai.TTS(), # Converts text to speech
vad=silero.VAD.load(), # Detects when user is speaking
)
# Start — avatar joins room and begins animating
await avatar.start(session, room=ctx.room)
await session.start(
agent=MyAgent(),
room=ctx.room,
room_output_options=RoomOutputOptions(audio_enabled=False),
)
if __name__ == "__main__":
cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
Run it:
```bash
python agent.py dev
```
Open [agents-playground.livekit.io](https://agents-playground.livekit.io) and talk to your avatar.
### What Happens When You Run This
1. Your agent connects to a LiveKit room and waits for a user
2. When a user joins, `AvatarSession` sends a request to bitHuman's cloud
3. A cloud avatar worker downloads the model (cached after first time) and joins the room
4. The user speaks → STT transcribes → LLM responds → TTS generates audio → Avatar animates
5. The avatar publishes video to the room — the user sees a talking face
---
## Avatar Modes
### Essence Model (CPU) — Default
Pre-built avatars with full body support, animal mode, and fast response times.
```python
avatar = bithuman.AvatarSession(
avatar_id="A78WKV4515",
api_secret="your_api_secret",
)
```
### Expression Model (GPU) — Agent ID
Higher-fidelity face animation for platform-created agents.
```python
avatar = bithuman.AvatarSession(
avatar_id="A78WKV4515",
api_secret="your_api_secret",
model="expression",
)
```
### Expression Model (GPU) — Custom Image
Create an avatar from any face image on-the-fly.
```python
from PIL import Image
avatar = bithuman.AvatarSession(
avatar_image=Image.open("face.jpg"),
api_secret="your_api_secret",
model="expression",
)
```
### Model Comparison
| Feature | Essence (CPU) | Expression (GPU) |
|---------|--------------|------------------|
| Personalities | Pre-trained | Dynamic |
| Response time | Faster (~2s) | Standard (~4s) |
| Body support | Full body + animal mode | Face and shoulders |
| Animal mode | Yes | No |
| Custom images | No | Yes |
---
## Adding Gestures (Dynamics)
Make the avatar wave, nod, or laugh in response to conversation keywords.
### Step 1: Get Available Gestures
```python
import requests
import os
agent_id = os.getenv("BITHUMAN_AGENT_ID")
headers = {"api-secret": os.getenv("BITHUMAN_API_SECRET")}
response = requests.get(
f"https://api.bithuman.ai/v1/dynamics/{agent_id}",
headers=headers,
)
gestures = response.json()["data"].get("gestures", {})
print(list(gestures.keys()))
# Example: ["mini_wave_hello", "talk_head_nod_subtle", "laugh_react"]
```
### Step 2: Trigger on Keywords
```python
import asyncio
import json
from datetime import datetime
from livekit import rtc
from livekit.agents import UserInputTranscribedEvent
KEYWORD_ACTION_MAP = {
"laugh": "laugh_react",
"funny": "laugh_react",
"hello": "mini_wave_hello",
"hi": "mini_wave_hello",
}
async def send_dynamics_trigger(
local_participant: rtc.LocalParticipant,
destination_identity: str,
action: str,
) -> None:
await local_participant.perform_rpc(
destination_identity=destination_identity,
method="trigger_dynamics",
payload=json.dumps({
"action": action,
"identity": local_participant.identity,
"timestamp": datetime.utcnow().isoformat(),
}),
)
# Add this after session.start() in your entrypoint:
@session.on("user_input_transcribed")
def on_user_input_transcribed(event: UserInputTranscribedEvent):
if not event.is_final:
return
transcript = event.transcript.lower()
for keyword, action in KEYWORD_ACTION_MAP.items():
if keyword in transcript:
for identity in ctx.room.remote_participants.keys():
asyncio.create_task(
send_dynamics_trigger(
ctx.room.local_participant, identity, action
)
)
break
```
Gesture actions vary by agent. Always check the Dynamics API response first to see what's available for your specific agent.
---
## Configuration
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `avatar_id` | string | Yes* | Agent ID from the bitHuman dashboard |
| `avatar_image` | PIL.Image | Yes* | Face image for on-the-fly avatar (Expression only) |
| `api_secret` | string | Yes | Your API secret |
| `model` | string | No | `"essence"` (default) or `"expression"` |
*Either `avatar_id` or `avatar_image` is required.
---
## Cloud Advantages
- **No Local Storage** — No large model files to download or manage
- **Auto-Updates** — Always uses the latest model versions
- **Scalability** — Handles multiple concurrent sessions automatically
- **Cross-Platform** — Works on any device with internet
---
## Pricing
Visit [www.bithuman.ai](https://www.bithuman.ai/#api) for current pricing.
**Free Tier:** 199 credits per month, community support
**Pro:** Unlimited credits, priority support
---
## Troubleshooting
| Problem | Solution |
|---------|----------|
| Authentication errors | Verify API secret at [www.bithuman.ai](https://www.bithuman.ai/#developer) |
| Avatar doesn't appear | Check agent_id exists in your dashboard |
| Network timeouts | Ensure stable internet; the plugin retries automatically |
| Plugin installation fails | Use `uv` with `GIT_LFS_SKIP_SMUDGE=1` flag |
| No lip movement | Ensure `avatar.start(session, room=ctx.room)` is called before `session.start()` |
---
## Next Steps
- All avatar modes explained with complete examples
- Run on your own infrastructure
- Configure gestures and animations
## Self-Hosted GPU: Expression Avatar Docker Container
URL: https://docs.bithuman.ai/deployment/self-hosted-gpu
**Preview Feature** — 2 credits per minute while using the GPU container.
## Overview
The self-hosted GPU avatar container (`docker.io/sgubithuman/expression-avatar:latest`) enables production-grade avatar generation on your own GPU infrastructure.
- **Full Control** — Complete control over deployment, scaling, and configuration
- **Cost Optimization** — Pay only for the GPU resources you use
- **Data Privacy** — Avatar images and audio never leave your infrastructure
- **Customization** — Extend the worker with custom logic and integrations
### How It Works
The container is a GPU worker that joins a [LiveKit](https://livekit.io) room and streams avatar video frames in real time. Your application calls the `/launch` endpoint with LiveKit room credentials and an avatar image; the container connects to the room, listens for audio, and generates lip-synced video at 25 FPS — entirely on your GPU.
```
Your Agent (LiveKit)
│
│ POST /launch
│ { livekit_url, livekit_token, room_name, avatar_image }
▼
expression-avatar container
│
├─ Joins LiveKit room as video publisher
├─ Receives audio from agent via data stream
└─ Generates 25 FPS lip-synced video → streams to room
↑
100% local GPU — no cloud calls during inference
```
---
## Prerequisites
- NVIDIA GPU with **≥8 GB VRAM** (RTX 3080 or better)
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed
- Docker 24+ with Compose v2
- bitHuman API secret from the [bitHuman Console](https://www.bithuman.ai)
- Model weights download automatically on first start (~5 GB, cached in Docker volume)
- A running [LiveKit server](https://docs.livekit.io/home/self-hosting/local/) (or LiveKit Cloud)
---
## Quick Start
Model weights download automatically on first run — just provide your API secret:
```bash
# 1. Pull the image (includes wav2vec2 audio encoder, ~360 MB)
docker pull docker.io/sgubithuman/expression-avatar:latest
# 2. Run — proprietary weights (~4.7 GB) download automatically on first start
docker run --gpus all -p 8089:8089 \
-v bithuman-models:/data/models \
-e BITHUMAN_API_SECRET=your_api_secret \
docker.io/sgubithuman/expression-avatar:latest
```
```bash
# 3. Wait for startup (first run: ~3 min download + ~48s GPU compilation)
# Subsequent starts: ~48s (weights already cached in the named volume)
curl http://localhost:8089/health
# {"status": "healthy", "service": "expression-avatar", "active_sessions": 0, "max_sessions": 8}
```
The `-v bithuman-models:/data/models` named volume caches the downloaded weights so you only pay the download cost once.
Once healthy, the container is ready to accept avatar sessions via `/launch`.
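For scripted startup, you can wait for the container programmatically by polling `/ready` until the model is loaded and a slot is free. A stdlib sketch (the response fields follow the HTTP API reference in this guide; `wait_for_container` itself is illustrative):

```python
import json
import time
import urllib.error
import urllib.request

def is_ready(payload: dict) -> bool:
    """True when a /ready response says the model is loaded and a slot is free."""
    return payload.get("status") == "ready" and payload.get("model_ready", False)

def wait_for_container(base_url: str = "http://localhost:8089",
                       timeout_s: float = 600.0) -> dict:
    """Poll /ready until the container can accept sessions; return its payload."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/ready", timeout=2) as resp:
                payload = json.load(resp)
            if is_ready(payload):
                return payload
        except urllib.error.URLError:
            pass  # unreachable, or 503 during model loading
        time.sleep(5.0)
    raise TimeoutError(f"container at {base_url} not ready after {timeout_s}s")
```

The generous default timeout covers a cold start, where weights download before GPU compilation.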
---
## Docker Compose Setup
Use the [full example](https://github.com/bithuman-product/examples/tree/main/expression-selfhosted) for a complete setup with LiveKit, an AI agent, and a web frontend:
```bash
git clone https://github.com/bithuman-product/examples.git
cd examples/expression-selfhosted
# Configure environment
cp .env.example .env
# Edit .env with your API secret, OpenAI key, and avatar image
# Copy your avatar image into ./avatars/
mkdir -p avatars
cp /path/to/your/avatar.jpg avatars/
# Model weights download automatically on first run — nothing to pre-download!
docker compose up
```
Open `http://localhost:4202` to start a conversation with your GPU avatar.
---
## Integration Guide
The container exposes a simple HTTP API. Your LiveKit agent calls `/launch` to start an avatar session. There are two ways to integrate:
### Option 1: LiveKit Python Plugin (Recommended)
Install the bitHuman LiveKit plugin:
```bash
pip install livekit-plugins-bithuman
```
In your LiveKit agent, point `AvatarSession` at your container's `/launch` endpoint:
```python
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, WorkerType, cli
from livekit.plugins import bithuman, openai, silero
async def entrypoint(ctx: JobContext):
await ctx.connect()
await ctx.wait_for_participant()
avatar = bithuman.AvatarSession(
api_url="http://localhost:8089/launch", # your container
api_secret="your_api_secret", # for billing
avatar_image="/path/to/avatar.jpg", # local file or HTTPS URL
)
session = AgentSession(
llm=openai.realtime.RealtimeModel(voice="coral"),
vad=silero.VAD.load(),
)
await avatar.start(session, room=ctx.room)
await session.start(
agent=Agent(instructions="You are a helpful assistant."),
room=ctx.room,
)
if __name__ == "__main__":
cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, worker_type=WorkerType.ROOM))
```
The plugin handles room token generation and calls `/launch` automatically when a participant joins.
### Option 2: Direct HTTP API
You can call `/launch` directly from any HTTP client. The container joins the LiveKit room as a video publisher.
```bash
# Generate a LiveKit room token first (using livekit-server-sdk or CLI)
TOKEN=$(livekit-token create --room my-room --identity avatar-worker \
--api-key devkey --api-secret your-livekit-secret)
# Launch with an image URL
curl -X POST http://localhost:8089/launch \
-F "livekit_url=ws://your-livekit-server:7880" \
-F "livekit_token=$TOKEN" \
-F "room_name=my-room" \
-F "avatar_image_url=https://example.com/avatar.jpg"
# Or upload an image file directly
curl -X POST http://localhost:8089/launch \
-F "livekit_url=ws://your-livekit-server:7880" \
-F "livekit_token=$TOKEN" \
-F "room_name=my-room" \
-F "avatar_image=@./avatar.jpg"
```
Response (async by default):
```json
{
"status": "pending",
"task_id": "a1b2c3d4",
"room_name": "my-room"
}
```
The avatar is live in the room within ~4–6 seconds.
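Because async mode returns before the session actually starts, a caller may want to poll the task until it reaches `running`. A minimal stdlib sketch (the base URL and helper names are assumptions; status values match the task endpoints documented below):

```python
import json
import time
import urllib.request

BASE = "http://localhost:8089"  # your container

def task_status(task_id: str) -> str:
    """Fetch the current status of a launched avatar session."""
    with urllib.request.urlopen(f"{BASE}/tasks/{task_id}", timeout=5) as resp:
        return json.load(resp)["status"]

def is_terminal(status: str) -> bool:
    """pending/running are in flight; these three states mean the task ended."""
    return status in ("completed", "failed", "cancelled")

def wait_until_running(task_id: str, timeout_s: float = 30.0) -> str:
    """Poll until the avatar is streaming, or the task ends early."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = task_status(task_id)
        if status == "running" or is_terminal(status):
            return status
        time.sleep(1.0)
    raise TimeoutError(f"task {task_id} still pending after {timeout_s}s")
```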
---
## HTTP API Reference
All endpoints are served on port `8089` (default).
### `POST /launch`
Start an avatar session for a LiveKit room. The container joins the room and begins streaming lip-synced video.
**Content-Type:** `multipart/form-data`
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `livekit_url` | string | Yes | LiveKit server WebSocket URL (e.g. `ws://livekit:7880`) |
| `livekit_token` | string | Yes | LiveKit room token with publish permissions |
| `room_name` | string | Yes | LiveKit room name (must match token) |
| `avatar_image` | file | No* | Avatar image file upload (JPEG/PNG) |
| `avatar_image_url` | string | No* | Avatar image HTTPS URL (alternative to file upload) |
| `prompt` | string | No | Motion prompt (default: `"A person is talking naturally."`) |
| `api_secret` | string | No | Override billing secret (defaults to `BITHUMAN_API_SECRET`) |
| `async_mode` | bool | No | Return immediately (`true`, default) or wait for session to end |
*Provide either `avatar_image` or `avatar_image_url`. If neither is given, a default image is used.
**Response (async_mode=true):**
```json
{ "status": "pending", "task_id": "a1b2c3d4", "room_name": "my-room" }
```
**Error responses:**
- `503 Service Unavailable` — container still initializing, or at session capacity
- `400 Bad Request` — invalid image or download failed
---
### `GET /health`
Lightweight health check. Always returns 200 once the container is running (even during model loading).
```json
{
"status": "healthy",
"service": "expression-avatar",
"active_sessions": 2,
"max_sessions": 8
}
```
---
### `GET /ready`
Readiness check. Returns `200` only when the model is loaded and a session slot is available. Use this to gate traffic in load balancers or health checks.
```json
{
"status": "ready",
"model_ready": true,
"active_sessions": 2,
"available_sessions": 6,
"max_sessions": 8
}
```
Returns `503` with `"status": "not_ready"` during model loading, or `"status": "at_capacity"` when all session slots are in use.
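If you run several containers behind your own dispatcher, `/ready` can drive capacity-aware routing: probe each container and send `/launch` to one with free slots. A sketch (the multi-container setup and helper names are hypothetical):

```python
import json
import urllib.error
import urllib.request

def ready_slots(base_url: str) -> int:
    """Available session slots, or 0 if the container is not ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/ready", timeout=2) as resp:
            return json.load(resp).get("available_sessions", 0)
    except urllib.error.URLError:
        return 0  # unreachable, or 503 (not_ready / at_capacity)

def pick_container(base_urls, probe=ready_slots):
    """Return the container URL with the most free slots, or None if none."""
    scored = [(probe(url), url) for url in base_urls]
    best = max(scored, default=(0, None))
    return best[1] if best[0] > 0 else None
```

Picking the least-loaded container spreads GPU work evenly; returning `None` lets the caller queue or retry instead of hitting a `503 at_capacity`.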
---
### `GET /tasks`
List all sessions (active and completed).
```bash
curl http://localhost:8089/tasks
```
```json
{
"tasks": [
{
"task_id": "a1b2c3d4",
"room_name": "my-room",
"status": "running",
"created_at": "2024-01-01T12:00:00",
"completed_at": null,
"error": null
}
]
}
```
---
### `GET /tasks/{task_id}`
Check the status of a specific session.
```json
{
"task_id": "a1b2c3d4",
"room_name": "my-room",
"status": "running",
"created_at": "2024-01-01T12:00:00",
"completed_at": null,
"error": null
}
```
Status values: `pending` → `running` → `completed` / `failed` / `cancelled`
---
### `POST /tasks/{task_id}/stop`
Stop a running session and release the session slot.
```bash
curl -X POST http://localhost:8089/tasks/a1b2c3d4/stop
```
---
### `POST /benchmark`
Run an inference benchmark and return per-stage timing. Useful for verifying GPU performance.
```bash
curl -X POST "http://localhost:8089/benchmark?iterations=10"
```
```json
{
"iterations": 10,
"frames_per_generate": 24,
"avg_ms": 79.3,
"fps": 302.6,
"stages": {
"dit_ms": 41.2,
"vae_decode_ms": 13.1,
"vae_encode_ms": 8.5,
"color_correct_ms": 6.1,
"postprocess_ms": 2.8,
"audio_ms": 7.1
},
"vram_gb": 6.2,
"gpu": "NVIDIA GPU"
}
```
---
### `GET /test-frame`
Generate a few chunks and return the last frame as a JPEG. Useful for verifying the model is producing valid output.
```bash
curl http://localhost:8089/test-frame --output frame.jpg
open frame.jpg
```
---
## Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `BITHUMAN_API_SECRET` | Yes | — | API secret for billing and weight download |
| `MAX_SESSIONS` | No | `8` | Max concurrent avatar sessions |
| `CUDA_VISIBLE_DEVICES` | No | all GPUs | Restrict to specific GPU (e.g. `0`) |
| `BITHUMAN_API_URL` | No | `https://api.bithuman.ai` | Override API endpoint (for testing) |
| `FAST_DECODER_CONFIG` | No | — | Path to fast decoder config JSON (optional speedup) |
| `FAST_DECODER_CHECKPOINT` | No | — | Path to fast decoder weights (optional speedup) |
Without `BITHUMAN_API_SECRET`, avatar sessions will run but usage will not be tracked or billed; running without a secret is not permitted in production.
---
## Performance Characteristics
| GPU Tier | VRAM Usage | Concurrent Sessions |
|----------|------------|---------------------|
| High-end (data center) | ~6 GB | up to 8 concurrent |
| High-end (consumer) | ~6 GB | up to 4 concurrent |
| Mid-range | ~6 GB | up to 2 concurrent |

| Configuration | Time to First Frame | Description |
|---------------|---------------------|-------------|
| Long-running container | ~4–6 seconds | Model loaded at startup; new sessions encode image (~2s) then stream |
| Cold start | ~48 seconds | Full GPU model compilation on first start (cached on subsequent starts) |
### Long-Running Containers (Recommended)
Keep the container running between sessions. The model loads once at startup (~48s including GPU compilation), and subsequent sessions start in ~4–6 seconds.
```bash
docker run --gpus all -p 8089:8089 --restart always \
-v bithuman-models:/data/models \
-e BITHUMAN_API_SECRET=your_api_secret \
docker.io/sgubithuman/expression-avatar:latest
```
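In automated deployments, wait for the container to report ready before routing sessions to it. A minimal sketch, assuming `/ready` returns JSON with a `model_ready` flag (as referenced in the troubleshooting table):

```python
import time

import requests

def model_ready(payload):
    """Interpret the /ready response; a missing flag counts as not ready."""
    return bool(payload.get("model_ready"))

def wait_until_ready(base="http://localhost:8089", timeout=300.0):
    """Poll /ready until the model is loaded or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if model_ready(requests.get(f"{base}/ready", timeout=5).json()):
                return True
        except requests.exceptions.RequestException:
            pass  # container still initializing — connection refused is expected
        time.sleep(3)
    return False
```

With a cold start the first call can take ~48 seconds to succeed; warm restarts should pass within a few seconds.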
---
## Troubleshooting
| Problem | Solution |
|---------|----------|
| Container won't start | Check GPU: `nvidia-smi`; check logs: `docker logs <container>` |
| First start takes >5 minutes | Normal — weights are downloading (~4.7 GB). Check logs for download progress. |
| Download fails with 401 | Verify `BITHUMAN_API_SECRET` is set and valid |
| Download fails with connection error | Check outbound internet access from the container |
| `/health` returns connection refused | Container still initializing — wait for `PREWARM: Pipeline loaded` in logs |
| `/launch` returns `503 not_ready` | Model still loading — poll `/ready` until `model_ready: true` |
| `/launch` returns `503 at_capacity` | All session slots in use; increase `MAX_SESSIONS` or scale horizontally |
| Startup takes >2 minutes (after download) | GPU compilation runs once per container — subsequent starts reuse compiled cache |
| Out of memory | Use a GPU with ≥8 GB VRAM; reduce `MAX_SESSIONS` if needed |
| Billing not working | Verify `BITHUMAN_API_SECRET` is set; check logs for `[HEARTBEAT]` messages |
| Avatar image not showing | Check `/test-frame` — if it returns a valid JPEG, image encoding is working |
---
## Next Steps
- Full GPU setup with LiveKit server, AI agent, and web frontend
- LiveKit Agents Python SDK documentation
---
# Integrations
## Embed Avatars on Any Website (iframe)
URL: https://docs.bithuman.ai/integrations/embed
Embed a bitHuman avatar directly on your website so visitors can have real-time conversations without leaving your page.
---
## How Embedding Works
```
Your Website                          bitHuman Cloud
┌─────────────────────┐              ┌─────────────────────┐
│  Your page content  │              │                     │
│                     │              │      AI Agent       │
│  ┌───────────────┐  │◄────────────►│   (conversation)    │
│  │ avatar iframe │  │              │                     │
│  └───────────────┘  │              └─────────────────────┘
└─────────────────────┘
```
The avatar runs entirely in bitHuman's cloud. Your website just needs a small embed snippet.
---
## Quick Start
Call the embed token API from your **backend** (never expose your API secret in the browser):
```python
import requests
resp = requests.post(
"https://api.bithuman.ai/v1/embed-tokens/request",
headers={"api-secret": "your_api_secret"},
json={
"agent_id": "A78WKV4515",
"fingerprint": "unique-visitor-id",
},
)
data = resp.json()["data"]
token = data["token"] # Short-lived JWT
session_id = data["sid"] # Session identifier
```
Send the token to your frontend via your own API endpoint:
```javascript
// Your backend endpoint (Express example)
app.get("/api/avatar-token", async (req, res) => {
const response = await fetch(
"https://api.bithuman.ai/v1/embed-tokens/request",
{
method: "POST",
headers: {
"api-secret": process.env.BITHUMAN_API_SECRET,
"Content-Type": "application/json",
},
body: JSON.stringify({
agent_id: "A78WKV4515",
fingerprint: req.query.fp || "anonymous",
}),
}
);
const data = await response.json();
res.json(data.data);
});
```
Use the token to load the avatar widget. Copy the exact embed snippet from your bitHuman dashboard; the markup below is an illustrative sketch (the `src` URL and query parameter are assumptions):
```html
<!-- Illustrative only — use the real embed snippet from your dashboard -->
<iframe
  src="https://bithuman.ai/embed?token=YOUR_EMBED_TOKEN"
  width="400"
  height="600"
  allow="microphone"
></iframe>
```
---
## Token Details
| Property | Value |
|----------|-------|
| **Lifetime** | 1 hour |
| **Scope** | Single agent, single session |
| **JWT claims** | `userId`, `sessionId`, `agentCode`, `model`, `app` |
**Never put your API secret in frontend code.** Always generate embed tokens from your backend server. The API secret grants full access to your account.
---
## Complete Example (Python + HTML)
### Backend (Flask)
```python
from flask import Flask, jsonify, request
import requests
import os
app = Flask(__name__)
@app.route("/api/avatar-token")
def get_token():
resp = requests.post(
"https://api.bithuman.ai/v1/embed-tokens/request",
headers={"api-secret": os.environ["BITHUMAN_API_SECRET"]},
json={
"agent_id": "A78WKV4515",
"fingerprint": request.args.get("fp", "web-visitor"),
},
)
return jsonify(resp.json()["data"])
if __name__ == "__main__":
app.run(port=3000)
```
### Frontend (HTML)
A minimal page sketch: fetch a short-lived token from the Flask backend, then point the embed iframe at it (the iframe `src` is an assumption — copy the real snippet from your dashboard):
```html
<!DOCTYPE html>
<html>
<head><title>My Website with Avatar</title></head>
<body>
  <h1>Talk to Our AI Assistant</h1>
  <!-- Illustrative embed — use the real snippet from your dashboard -->
  <iframe id="avatar" width="400" height="600" allow="microphone"></iframe>
  <script>
    fetch("/api/avatar-token")
      .then((r) => r.json())
      .then(({ token }) => {
        document.getElementById("avatar").src =
          `https://bithuman.ai/embed?token=${token}`;
      });
  </script>
</body>
</html>
```
---
## Customization
### Responsive Sizing
Wrap the iframe in a container that scales with the page (a generic CSS sketch):
```html
<div style="position: relative; width: 100%; max-width: 400px; aspect-ratio: 2 / 3;">
  <iframe style="position: absolute; inset: 0; width: 100%; height: 100%;"
          allow="microphone"></iframe>
</div>
```
### Control the Avatar from Your Page
Use the REST API to send messages to an active avatar session:
```javascript
// Make the avatar say something
await fetch("https://api.bithuman.ai/v1/agent/A78WKV4515/speak", {
method: "POST",
headers: {
"api-secret": API_SECRET, // Call from backend!
"Content-Type": "application/json",
},
body: JSON.stringify({
message: "Welcome! How can I help you today?",
}),
});
```
---
## Troubleshooting
| Problem | Solution |
|---------|----------|
| Blank iframe | Check that the token is valid and not expired (1 hour TTL) |
| No audio | Ensure `allow="microphone"` is set on the iframe |
| CORS errors | Embed tokens must be generated from your backend, not frontend |
| Avatar not responding | Check agent has an active session — verify agent_id is correct |
---
## Next Steps
Get notified when users join sessions
Control what the avatar says programmatically
## Webhooks: Real-Time Avatar Event Notifications
URL: https://docs.bithuman.ai/integrations/webhooks
## Quick Setup
1. Go to [www.bithuman.ai/#developer](https://www.bithuman.ai/#developer) and open the **Webhooks** section.
2. Enter your webhook endpoint URL. Must be HTTPS.
3. Select which events to receive: **room.join**, **chat.push**, or both.
4. Optionally, add custom headers to be sent with every delivery, for example:
```http
Authorization: Bearer your-api-token
X-API-Key: your-secret-key
```
---
## Payload Format
All payloads follow the same structure:
| Field | Type | Description |
|-------|------|-------------|
| `agent_id` | string | The agent that triggered the event |
| `event_type` | string | Event name (`room.join` or `chat.push`) |
| `data` | object | Event-specific data |
| `timestamp` | float | Unix timestamp |
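For typed handlers, the envelope maps onto a small dataclass (the class is illustrative, not part of any bitHuman SDK):

```python
from dataclasses import dataclass

@dataclass
class WebhookEvent:
    """Typed view of the common webhook envelope."""
    agent_id: str
    event_type: str
    data: dict
    timestamp: float

    @classmethod
    def from_payload(cls, payload):
        return cls(payload["agent_id"], payload["event_type"],
                   payload["data"], payload["timestamp"])

evt = WebhookEvent.from_payload({
    "agent_id": "agent_abc123",
    "event_type": "room.join",
    "data": {"session_id": "session_xyz789"},
    "timestamp": 1705312200.0,
})
print(evt.event_type)  # room.join
```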
### room.join
```json
{
"agent_id": "agent_abc123",
"event_type": "room.join",
"data": {
"room_name": "customer-support",
"participant_count": 1,
"session_id": "session_xyz789"
},
"timestamp": 1705312200.0
}
```
### chat.push
```json
{
"agent_id": "agent_abc123",
"event_type": "chat.push",
"data": {
"role": "user",
"message": "Hello, I need help with my order",
"session_id": "session_xyz789",
"timestamp": 1705312285.0
},
"timestamp": 1705312285.0
}
```
---
## Implementation Examples
```python Flask (Python)
from flask import Flask, request, jsonify
import hmac, hashlib
app = Flask(__name__)
WEBHOOK_SECRET = "your-webhook-secret"
@app.route('/webhook', methods=['POST'])
def handle_webhook():
signature = request.headers.get('X-bitHuman-Signature', '')
if not verify_signature(request.data, signature):
return jsonify({'error': 'Invalid signature'}), 401
data = request.json
event_type = data.get('event_type')
if event_type == 'room.join':
print(f"User joined session {data['data']['session_id']}")
elif event_type == 'chat.push':
print(f"[{data['data']['role']}] {data['data']['message']}")
return jsonify({'status': 'ok'})
def verify_signature(payload, signature):
expected = hmac.new(
WEBHOOK_SECRET.encode(), payload, hashlib.sha256
).hexdigest()
return hmac.compare_digest(f"sha256={expected}", signature)
if __name__ == '__main__':
app.run(port=3000)
```
```javascript Express (Node.js)
const express = require('express');
const crypto = require('crypto');
const app = express();
// Keep the raw body bytes — the signature is computed over the exact payload
// sent, so verifying against re-stringified JSON can fail.
app.use(express.json({ verify: (req, _res, buf) => { req.rawBody = buf; } }));
const WEBHOOK_SECRET = 'your-webhook-secret';
app.post('/webhook', (req, res) => {
  const signature = req.headers['x-bithuman-signature'] || '';
  if (!verifySignature(req.rawBody, signature)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }
  const { event_type, data } = req.body;
  if (event_type === 'room.join') {
    console.log(`User joined session ${data.session_id}`);
  } else if (event_type === 'chat.push') {
    console.log(`[${data.role}] ${data.message}`);
  }
  res.json({ status: 'ok' });
});
function verifySignature(payload, signature) {
  const expected = crypto
    .createHmac('sha256', WEBHOOK_SECRET)
    .update(payload)
    .digest('hex');
  const expectedBuf = Buffer.from(`sha256=${expected}`);
  const signatureBuf = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so check lengths first
  return (
    expectedBuf.length === signatureBuf.length &&
    crypto.timingSafeEqual(expectedBuf, signatureBuf)
  );
}
app.listen(3000, () => console.log('Listening on port 3000'));
```
---
## Signature Verification
All webhook requests include an `X-bitHuman-Signature` header. Verify it using HMAC SHA-256:
1. Compute `HMAC-SHA256(secret, raw_request_body)`
2. Compare the hex digest against the signature header (strip `sha256=` prefix)
3. Use constant-time comparison to prevent timing attacks
Always use HTTPS. HTTP endpoints are rejected.
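The three steps condense to a few lines of Python:

```python
import hashlib
import hmac

def verify_signature(secret, raw_body, header):
    """Steps 1-3: HMAC the raw body, strip the prefix, compare in constant time."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    received = header.removeprefix("sha256=")
    return hmac.compare_digest(expected, received)

# Self-check against a locally computed signature
body = b'{"event_type": "room.join"}'
sig = "sha256=" + hmac.new(b"topsecret", body, hashlib.sha256).hexdigest()
assert verify_signature("topsecret", body, sig)
assert not verify_signature("topsecret", b"tampered", sig)
```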
---
## Testing
### Local development with ngrok
```bash
ngrok http 3000
# Use the resulting HTTPS URL as your webhook endpoint
```
### Manual curl test
```bash
curl -X POST https://your-app.com/webhook \
-H "Content-Type: application/json" \
-H "X-bitHuman-Signature: sha256=test" \
-d '{
"agent_id": "test_agent",
"event_type": "room.join",
"data": {
"room_name": "test-room",
"participant_count": 1,
"session_id": "session_123"
},
"timestamp": 1705312200.0
}'
```
---
## Retry Policy
Failed deliveries (non-2xx responses) are retried automatically:
| Attempt | Delay |
|---------|-------|
| 1st retry | 1 second |
| 2nd retry | 5 seconds |
| 3rd retry | 30 seconds |
Maximum 3 retries. Your endpoint must respond within 30 seconds.
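Because retries can deliver the same event more than once, handlers should be idempotent. A minimal in-memory dedup sketch (the dedup key is an assumption — pick whatever uniquely identifies an event in your system):

```python
seen = set()

def handle_once(event):
    """Process an event exactly once; repeated deliveries are ignored."""
    key = (event["event_type"], event["data"]["session_id"], event["timestamp"])
    if key in seen:
        return False  # duplicate delivery from a retry
    seen.add(key)
    # ... real processing (DB write, analytics, etc.) goes here ...
    return True

event = {
    "agent_id": "agent_abc123",
    "event_type": "room.join",
    "data": {"room_name": "test-room", "participant_count": 1,
             "session_id": "session_123"},
    "timestamp": 1705312200.0,
}
assert handle_once(event) is True    # first delivery processed
assert handle_once(event) is False   # retried delivery skipped
```

A production service would back `seen` with a store that survives restarts (e.g. a database table with a unique constraint).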
## Troubleshooting
| Issue | Solution |
|-------|----------|
| Signature invalid | Verify HMAC SHA-256 against raw request body |
| Timeout errors | Return 200 immediately, process async |
| 404 Not Found | Check endpoint URL in dashboard |
| SSL errors | Use a valid HTTPS certificate |
## Webhook Event Types: room.join & chat.push
URL: https://docs.bithuman.ai/integrations/events
Webhooks deliver HTTP POST requests to your endpoint when avatar events occur. For setup instructions, handler examples, and retry policies, see the [Webhook Integration Guide](/integrations/webhooks).
## Event Types
### room.join
Fired once when a user connects to an avatar session.
```json
{
"agent_id": "agent_customer_support",
"event_type": "room.join",
"data": {
"room_name": "customer-support-room",
"participant_count": 1,
"session_id": "session_xyz789"
},
"timestamp": 1705312200.0
}
```
### chat.push
Fired for each message sent in the conversation (both user and agent).
```json
{
"agent_id": "agent_customer_support",
"event_type": "chat.push",
"data": {
"role": "user",
"message": "I need help with my order #12345",
"session_id": "session_xyz789",
"timestamp": 1705312285.0
},
"timestamp": 1705312285.0
}
```
---
For complete handler examples (Flask, Express), signature verification, endpoint setup, testing, and retry policy, see the [Webhook Integration Guide](/integrations/webhooks).
---
## Async Processing
Return `200` immediately and process events in the background. Long-running work (database writes, API calls, analytics) should be offloaded to a task queue so your endpoint responds within the timeout window. Any standard job queue (Celery, BullMQ, Sidekiq, etc.) works.
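A minimal in-process sketch of this pattern with a worker thread, standing in for a real job queue:

```python
import queue
import threading

events = queue.Queue()

def worker():
    """Drain the queue in the background; the slow work happens here."""
    while True:
        event = events.get()
        if event is None:  # shutdown sentinel
            break
        # database writes, downstream API calls, analytics ...
        events.task_done()

threading.Thread(target=worker, daemon=True).start()

def webhook_handler(event):
    """Acknowledge instantly; the worker thread does the heavy lifting."""
    events.put(event)
    return {"status": "ok"}, 200

body, status = webhook_handler({"event_type": "chat.push"})
```

An in-memory queue loses events if the process dies; a durable queue (Celery, BullMQ, Sidekiq) gives the same shape with persistence.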
---
## Resources
- [Webhook Integration Guide](/integrations/webhooks) — endpoint setup, signature verification, testing, and retry policy
- [Discord](https://discord.gg/ES953n7bPA) — community support
## Flutter + bitHuman: Mobile Avatar App
URL: https://docs.bithuman.ai/integrations/flutter
## Architecture
```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Flutter App │ │ LiveKit Room │ │ Python Agent │
│ Video View │◄──►│ Real-time │◄──►│ bitHuman │
│ Audio Capture │ │ Streaming │ │ Avatar + LLM │
└─────────────────┘ └─────────────────┘ └─────────────────┘
```
- **Flutter App**: Cross-platform UI, camera/microphone capture, video rendering
- **LiveKit Room**: Real-time media routing, participant management
- **Python Agent**: AI conversation processing, avatar rendering
## Prerequisites
- Flutter SDK 3.0+
- Python 3.11+
- bitHuman API Secret
- LiveKit Cloud account
- OpenAI API Key
---
## Quick Start
```bash
mkdir flutter-bithuman-avatar
cd flutter-bithuman-avatar
mkdir -p backend frontend/lib
```
```bash
cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install "livekit-agents[openai,bithuman,silero]~=1.4" flask flask-cors python-dotenv
```
Create `.env`:
```bash
BITHUMAN_API_SECRET=your_api_secret
BITHUMAN_AGENT_ID=A33NZN6384
OPENAI_API_KEY=sk-proj_your_key_here
LIVEKIT_API_KEY=APIyour_key
LIVEKIT_API_SECRET=your_secret
LIVEKIT_URL=wss://your-project.livekit.cloud
```
```bash
cd ../frontend
flutter create . --org com.bithuman.avatar
```
Update `pubspec.yaml` dependencies:
```yaml
dependencies:
flutter:
sdk: flutter
livekit_components: 1.2.2+hotfix.1
livekit_client: ^2.5.3
provider: ^6.1.1
http: ^1.1.0
```
```bash
flutter pub get
```
```bash
# Terminal 1: Start Backend
cd backend && source .venv/bin/activate
python token_server.py &
python agent.py dev
# Terminal 2: Start Frontend
cd frontend
flutter run -d chrome --web-port 8080
```
---
## Token Server
LiveKit requires a JWT to join rooms. Never ship LiveKit API keys in client apps. Use a server endpoint to mint short-lived tokens.
```python token_server.py
from flask import Flask, request, jsonify
from livekit import api
from datetime import timedelta
import os
from dotenv import load_dotenv
load_dotenv()
app = Flask(__name__)
LIVEKIT_API_KEY = os.getenv("LIVEKIT_API_KEY")
LIVEKIT_API_SECRET = os.getenv("LIVEKIT_API_SECRET")
LIVEKIT_URL = os.getenv("LIVEKIT_URL")
@app.route('/token', methods=['POST'])
def create_token():
data = request.get_json() or {}
room = data.get('room', 'flutter-avatar-room')
identity = data.get('participant', 'Flutter User')
    at = (
        api.AccessToken(LIVEKIT_API_KEY, LIVEKIT_API_SECRET)
        .with_identity(identity)
        .with_grants(api.VideoGrants(room_join=True, room=room))
        .with_ttl(timedelta(hours=1))
    )
    return jsonify({'token': at.to_jwt(), 'server_url': LIVEKIT_URL})
if __name__ == '__main__':
app.run(host='0.0.0.0', port=3000)
```
---
## Python Agent
```python agent.py
import os
from dotenv import load_dotenv
from livekit.agents import (
Agent,
AgentSession,
JobContext,
RoomOutputOptions,
WorkerOptions,
WorkerType,
cli,
)
from livekit.plugins import bithuman, openai, silero
load_dotenv()
async def entrypoint(ctx: JobContext):
await ctx.connect()
await ctx.wait_for_participant()
avatar = bithuman.AvatarSession(
avatar_id=os.getenv("BITHUMAN_AGENT_ID"),
api_secret=os.getenv("BITHUMAN_API_SECRET"),
)
session = AgentSession(
llm=openai.realtime.RealtimeModel(
voice="coral",
model="gpt-4o-mini-realtime-preview",
),
vad=silero.VAD.load(),
)
await avatar.start(session, room=ctx.room)
await session.start(
agent=Agent(
instructions="You are a helpful assistant. Respond concisely."
),
room=ctx.room,
room_output_options=RoomOutputOptions(audio_enabled=False),
)
if __name__ == "__main__":
cli.run_app(WorkerOptions(
entrypoint_fnc=entrypoint,
worker_type=WorkerType.ROOM,
job_memory_warn_mb=2000,
num_idle_processes=1,
initialize_process_timeout=180,
))
```
---
## Flutter App
### LiveKit Configuration
```dart config/livekit_config.dart
import 'dart:convert';
import 'dart:math';
import 'package:http/http.dart' as http;
class LiveKitConfig {
static const String serverUrl = 'wss://your-project.livekit.cloud';
static const String? tokenEndpoint = 'http://localhost:3000/token';
static String get roomName {
const chars = 'abcdefghijklmnopqrstuvwxyz0123456789';
final random = Random();
return 'room-${String.fromCharCodes(
Iterable.generate(12, (_) => chars.codeUnitAt(random.nextInt(chars.length)))
)}';
}
static String get participantName {
const chars = 'abcdefghijklmnopqrstuvwxyz0123456789';
final random = Random();
return 'user-${String.fromCharCodes(
Iterable.generate(8, (_) => chars.codeUnitAt(random.nextInt(chars.length)))
)}';
}
  static Future<String> getToken() async {
final response = await http.post(
Uri.parse(tokenEndpoint!),
headers: {'Content-Type': 'application/json'},
body: jsonEncode({
'room': roomName,
'participant': participantName,
}),
);
if (response.statusCode == 200) {
return jsonDecode(response.body)['token'] as String;
}
throw Exception('Token server returned ${response.statusCode}');
}
}
```
### Main App
```dart main.dart
import 'package:flutter/material.dart';
import 'package:livekit_client/livekit_client.dart' as lk;
import 'package:livekit_components/livekit_components.dart';
import 'config/livekit_config.dart';
void main() => runApp(const BitHumanFlutterApp());
class BitHumanFlutterApp extends StatelessWidget {
const BitHumanFlutterApp({super.key});
@override
Widget build(BuildContext context) {
return MaterialApp(
title: 'bitHuman Flutter Integration',
theme: LiveKitTheme().buildThemeData(context),
themeMode: ThemeMode.dark,
home: const ConnectionScreen(),
);
}
}
class ConnectionScreen extends StatefulWidget {
const ConnectionScreen({super.key});
@override
  State<ConnectionScreen> createState() => _ConnectionScreenState();
}
class _ConnectionScreenState extends State<ConnectionScreen> {
bool _isConnecting = false;
@override
void initState() {
super.initState();
WidgetsBinding.instance.addPostFrameCallback((_) => _connect());
}
  Future<void> _connect() async {
setState(() => _isConnecting = true);
final token = await LiveKitConfig.getToken();
if (!mounted) return;
Navigator.of(context).pushReplacement(
MaterialPageRoute(
builder: (_) => VideoRoomScreen(
url: LiveKitConfig.serverUrl,
token: token,
roomName: LiveKitConfig.roomName,
),
),
);
}
@override
Widget build(BuildContext context) {
return Scaffold(
backgroundColor: const Color(0xFF1a1a1a),
body: Center(
child: Column(
mainAxisAlignment: MainAxisAlignment.center,
children: [
const CircularProgressIndicator(),
const SizedBox(height: 20),
Text(
_isConnecting ? 'Connecting...' : 'Failed',
style: const TextStyle(color: Colors.white70, fontSize: 18),
),
],
),
),
);
}
}
class VideoRoomScreen extends StatelessWidget {
final String url, token, roomName;
const VideoRoomScreen({
super.key, required this.url, required this.token, required this.roomName,
});
@override
Widget build(BuildContext context) {
return LivekitRoom(
roomContext: RoomContext(
url: url,
token: token,
connect: true,
roomOptions: lk.RoomOptions(adaptiveStream: true, dynacast: true),
),
builder: (context, roomCtx) {
return Scaffold(
appBar: AppBar(title: Text('Room: $roomName')),
backgroundColor: const Color(0xFF1a1a1a),
body: const Center(child: Text('AI Avatar Video Here')),
);
},
);
}
}
```
---
## Platform-Specific Setup
### iOS (`ios/Runner/Info.plist`)
```xml
<key>NSCameraUsageDescription</key>
<string>Camera access for video calls with AI avatar</string>
<key>NSMicrophoneUsageDescription</key>
<string>Microphone access for voice interaction with AI avatar</string>
```
### Android (`AndroidManifest.xml`)
Add the standard network, camera, and microphone permissions (a typical set for LiveKit apps):
```xml
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
```
---
## Deployment
```bash iOS
flutter build ios --release
```
```bash Android
flutter build apk --release
```
```bash Web
flutter build web --release
```
---
## Troubleshooting
| Problem | Solution |
|---------|----------|
| Avatar session failed | Check bitHuman API secret and avatar ID |
| Connection failed | Verify LiveKit server URL, ensure backend is running |
| No camera found | Check device permissions |
| Avatar not showing | Check backend logs, verify API key |
| Shader compilation errors | Run `flutter clean && flutter pub get` |
---
## Resources
- [Flutter Documentation](https://docs.flutter.dev)
- [LiveKit Flutter SDK](https://pub.dev/packages/livekit_client)
- [LiveKit Agents Documentation](https://docs.livekit.io/agents)
---
# Changelog
## Changelog
URL: https://docs.bithuman.ai/changelog
## February 2026
### Expression Avatar v2 — Turbo VAE Decoder
- 2.5x faster VAE decode (32ms → 13ms) with distilled Turbo-VAED decoder
- Total pipeline: 103ms → 79ms per chunk (24% faster)
- Throughput: 233 → 305 FPS on H100
- Per-session TRT contexts eliminate concurrent session artifacts
### Self-Hosted GPU Container
- Published `sgubithuman/expression-avatar:latest` Docker image
- Supports up to 8 concurrent sessions per GPU
- Cold start ~50s, warm start 4-6s
- ~5 GB auto-downloaded model weights (cached in Docker volume)
### Developer Examples Overhaul
- Fixed Docker Compose env_file handling across all 4 example stacks
- Standardized `.env.example` files with section headers and inline help
- Expanded READMEs with architecture diagrams, config tables, verification steps
- Added `api/test.py` for zero-friction API credential validation
- Added `AGENTS.md` for AI coding agent discoverability
- Added `llms.txt` and `llms-full.txt` for AI documentation indexing
- Published OpenAPI specification
### REST API
- `POST /v1/agent/{code}/speak` — make avatar speak text in active sessions
- `POST /v1/agent/{code}/add-context` — inject silent background knowledge
- Improved error responses with consistent error codes and messages
### SDK & Plugin
- `livekit-plugins-bithuman` — Expression model support with `model="expression"`
- `bithuman.AvatarSession` — unified interface for cloud, CPU, and GPU modes
- Animal mode support for Essence avatars
---
## January 2026
### Essence Avatar
- CPU-only avatar rendering from `.imx` model files
- 25 FPS real-time on 4+ core machines
- Cross-platform: Linux, macOS (M2+), Windows (WSL)
### Platform API
- Agent generation from text prompts + image/video/audio
- Agent management (CRUD operations)
- File upload (URL and base64)
- Dynamics/gesture generation and triggering
### Integrations
- LiveKit Cloud Plugin
- Website embed (iframe with JWT)
- Webhooks (room.join, chat.push events)
- Flutter full-stack example
---
For feature requests and bug reports, visit our [GitHub](https://github.com/bithuman-product/examples/issues) or [Discord](https://discord.gg/ES953n7bPA).
---
# API Reference
## bitHuman REST API Reference
URL: https://docs.bithuman.ai/api-reference/overview
The bitHuman API lets you programmatically create, manage, and interact with avatar agents.
## Base URL
```
https://api.bithuman.ai
```
## Authentication
All requests require the `api-secret` header. Get your API secret from [www.bithuman.ai](https://www.bithuman.ai/#developer).
```http
api-secret: YOUR_API_SECRET
```
## Agent Identifiers
All endpoints use the **agent code** (e.g. `A91XMB7113`) to identify agents. This is the same value across all endpoints, referred to as `{code}`, `{agent_code}`, or `{agent_id}` depending on the endpoint.
You receive this code when you [generate an agent](/api-reference/agent-generation) or find it in the [bitHuman dashboard](https://www.bithuman.ai).
## Available APIs
Create new avatar agents from prompts, images, or video
Validate credentials, retrieve agent details, update prompts
Send real-time messages and inject context into live sessions
Upload images, audio, video, and documents
Generate and manage avatar movements and gestures
## Common Error Format
All errors follow the same structure:
```json
{
"error": {
"code": "ERROR_CODE",
"message": "Human-readable error description",
"httpStatus": 401
},
"status": "error",
"status_code": 401
}
```
See the [Error Reference](/api-reference/errors) for all error codes.
| HTTP Status | Meaning |
|-------------|---------|
| `200` | Success |
| `400` | Invalid request parameters |
| `401` | Invalid or missing `api-secret` |
| `402` | Insufficient credits |
| `404` | Resource not found |
| `413` | Payload too large |
| `415` | Unsupported media type |
| `422` | Validation error |
| `429` | Rate limit exceeded |
| `500` | Internal server error |
| `503` | Service unavailable (workers busy) |
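Because every error uses the same envelope, a single client-side helper can turn it into an exception (the exception class is ours, not part of any bitHuman SDK):

```python
class BitHumanAPIError(Exception):
    """Raised when a response body carries the common error envelope."""
    def __init__(self, code, message, http_status):
        super().__init__(f"{code} ({http_status}): {message}")
        self.code = code
        self.http_status = http_status

def raise_for_error(payload):
    """Pass successful payloads through; raise on the error envelope."""
    if payload.get("status") == "error":
        err = payload.get("error", {})
        raise BitHumanAPIError(err.get("code", "UNKNOWN"),
                               err.get("message", ""),
                               payload.get("status_code", 0))
    return payload
```

Call `raise_for_error(response.json())` after each request to convert API failures into exceptions instead of silently reading a half-populated body.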
## Agent Generation API: Create Avatar Agents
URL: https://docs.bithuman.ai/api-reference/agent-generation
## Generate Agent
```
POST /v1/agent/generate
```
Creates a new avatar agent. Generation is asynchronous — poll the status endpoint for completion.
**Headers**
| Header | Value |
|--------|-------|
| `Content-Type` | `application/json` |
| `api-secret` | Your API secret |
**Request Body**
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `prompt` | string | No | Random | System prompt for the agent |
| `image` | string | No | — | Image URL or base64 data for the agent's appearance |
| `video` | string | No | — | Video URL or base64 data |
| `audio` | string | No | — | Audio URL or base64 data for the agent's voice |
| `aspect_ratio` | string | No | `16:9` | Aspect ratio for image generation (`16:9`, `9:16`, `1:1`) |
| `video_aspect_ratio` | string | No | `16:9` | Aspect ratio for video generation (`16:9`, `9:16`, `1:1`) |
| `agent_id` | string | No | Auto-generated | Custom agent identifier |
| `duration` | number | No | `10` | Video duration in seconds |
**Response**
```json
{
"success": true,
"message": "Agent generation started",
"agent_id": "A91XMB7113",
"status": "processing"
}
```
**Example**
```python
import requests
response = requests.post(
"https://api.bithuman.ai/v1/agent/generate",
headers={
"Content-Type": "application/json",
"api-secret": "YOUR_API_SECRET"
},
json={
"prompt": "You are a professional video content creator.",
"image": "https://example.com/avatar.jpg"
}
)
print(response.json())
```
---
## Get Agent Status
```
GET /v1/agent/status/{agent_id}
```
Returns the current status of an agent generation request.
**Status Values**
| Status | Description |
|--------|-------------|
| `processing` | Agent is being generated (initial state) |
| `generating` | Active generation in progress (sub-steps running) |
| `completed` | All generation steps finished (transitional, becomes `ready`) |
| `ready` | Generation completed successfully — model available for use |
| `failed` | Generation failed — check `error_message` for details |
For polling, check for `ready` or `failed` as terminal states. The `generating` and `completed` states are intermediate — keep polling.
**Response**
```json
{
"success": true,
"data": {
"agent_id": "A91XMB7113",
"event_type": "lip_created",
"status": "ready",
"error_message": null,
"created_at": "2025-08-01T13:58:51.907177+00:00",
"updated_at": "2025-08-01T09:59:15.159901+00:00",
"system_prompt": "You are a professional video content creator.",
"image_url": "https://...",
"video_url": "https://...",
"name": "agent name",
"model_url": "https://..."
}
}
```
**Example**
```python
import requests
response = requests.get(
"https://api.bithuman.ai/v1/agent/status/A91XMB7113",
headers={"api-secret": "YOUR_API_SECRET"}
)
print(response.json())
```
---
## Complete Example: Generate and Poll
```python
import requests
import time
API_SECRET = "YOUR_API_SECRET"
BASE = "https://api.bithuman.ai"
headers = {"Content-Type": "application/json", "api-secret": API_SECRET}
# Generate agent
resp = requests.post(f"{BASE}/v1/agent/generate", headers=headers, json={
"prompt": "You are a friendly AI assistant."
})
agent_id = resp.json()["agent_id"]
print(f"Agent created: {agent_id}")
# Poll until ready
while True:
status = requests.get(
f"{BASE}/v1/agent/status/{agent_id}",
headers={"api-secret": API_SECRET}
).json()
if status["data"]["status"] == "ready":
print(f"Agent ready: {status['data']['model_url']}")
break
elif status["data"]["status"] == "failed":
print(f"Generation failed: {status['data']['error_message']}")
break
time.sleep(5)
```
## Error Codes
| HTTP Status | Meaning |
|-------------|---------|
| `200` | Success |
| `400` | Invalid request parameters |
| `401` | Invalid or missing `api-secret` |
| `429` | Rate limit exceeded |
| `500` | Internal server error |
## Agent Management API: Validate, Get & Update Agents
URL: https://docs.bithuman.ai/api-reference/agent-management
## Validate API Secret
```
POST /v1/validate
```
Verify that your API secret is valid before making other API calls.
```python Python
import requests
response = requests.post(
"https://api.bithuman.ai/v1/validate",
headers={"api-secret": "YOUR_API_SECRET"}
)
result = response.json()
if result["valid"]:
print("API secret is valid.")
else:
print("Invalid API secret.")
```
```javascript JavaScript
const response = await fetch('https://api.bithuman.ai/v1/validate', {
method: 'POST',
headers: { 'api-secret': 'YOUR_API_SECRET' }
});
const result = await response.json();
console.log('Valid:', result.valid);
```
**Response**
```json
{ "valid": true }
```
---
## Get Agent Info
```
GET /v1/agent/{code}
```
Retrieve detailed information about an agent by its code identifier.
**Path Parameters**
| Parameter | Type | Description |
|-----------|------|-------------|
| `code` | string | The agent code identifier (e.g., `A12345678`) |
**Response**
```json
{
"success": true,
"data": {
"agent_id": "A91XMB7113",
"event_type": "lip_created",
"status": "ready",
"error_message": null,
"created_at": "2025-08-01T13:58:51.907177+00:00",
"updated_at": "2025-08-01T09:59:15.159901+00:00",
"system_prompt": "You are a friendly AI assistant",
"image_url": "https://storage.supabase.co/image.jpg",
"video_url": "https://storage.supabase.co/video.mp4",
"name": "My Agent",
"model_url": "https://storage.supabase.co/model.imx"
}
}
```
```python Python
import requests
code = "A91XMB7113"
response = requests.get(
f"https://api.bithuman.ai/v1/agent/{code}",
headers={"api-secret": "YOUR_API_SECRET"}
)
data = response.json()
if data["success"]:
agent = data["data"]
print(f"Agent: {agent['name']}")
print(f"Status: {agent['status']}")
```
```javascript JavaScript
const code = 'A91XMB7113';
const response = await fetch(`https://api.bithuman.ai/v1/agent/${code}`, {
headers: { 'api-secret': 'YOUR_API_SECRET' }
});
const data = await response.json();
if (data.success) {
console.log('Agent:', data.data.name);
}
```
This endpoint uses the agent **code** (e.g., `A91XMB7113`), which is the same as the agent ID used across the platform. For checking generation progress, you can also use [`GET /v1/agent/status/{agent_id}`](/api-reference/agent-generation).
---
## Update Agent Prompt
```
POST /v1/agent/{code}
```
Update the system prompt of an existing agent without regenerating it.
**Request Body**
```json
{
"system_prompt": "You are a helpful customer service agent who speaks Spanish"
}
```
**Response**
```json
{
"agent_code": "A91XMB7113",
"updated": true
}
```
```python Python
import requests
code = "A91XMB7113"
response = requests.post(
f"https://api.bithuman.ai/v1/agent/{code}",
headers={
"Content-Type": "application/json",
"api-secret": "YOUR_API_SECRET"
},
json={
"system_prompt": "You are a professional sales assistant."
}
)
print(response.json())
```
```javascript JavaScript
const code = 'A91XMB7113';
const response = await fetch(`https://api.bithuman.ai/v1/agent/${code}`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'api-secret': 'YOUR_API_SECRET'
},
body: JSON.stringify({
system_prompt: 'You are a professional sales assistant.'
})
});
const result = await response.json();
console.log('Update result:', result);
```
---
## Complete Example
```python
import requests
import time
headers = {
"Content-Type": "application/json",
"api-secret": "YOUR_API_SECRET"
}
# Step 1: Create agent
response = requests.post(
"https://api.bithuman.ai/v1/agent/generate",
headers=headers,
json={"prompt": "You are a friendly greeter."}
)
agent_id = response.json()["agent_id"]
# Step 2: Wait for agent to be ready
while True:
status = requests.get(
f"https://api.bithuman.ai/v1/agent/status/{agent_id}",
headers={"api-secret": "YOUR_API_SECRET"}
).json()
if status["data"]["status"] == "ready":
break
time.sleep(5)
# Step 3: Get agent info
info = requests.get(
f"https://api.bithuman.ai/v1/agent/{agent_id}",
headers={"api-secret": "YOUR_API_SECRET"}
).json()
print(f"Current prompt: {info['data']['system_prompt']}")
# Step 4: Update the prompt
update = requests.post(
f"https://api.bithuman.ai/v1/agent/{agent_id}",
headers=headers,
json={"system_prompt": "You are now a technical support specialist."}
).json()
print(f"Prompt updated: {update}")
```
## Error Codes
| Code | Description |
|------|-------------|
| `UNAUTHORIZED` | Invalid or missing API secret |
| `MISSING_PARAM` | Required parameter not provided |
| `AGENT_NOT_FOUND` | No agent found with the given code |
| `VALIDATION_ERROR` | Invalid request body format |
## Agent Context API: Speak & Inject Knowledge
URL: https://docs.bithuman.ai/api-reference/agent-context
Send real-time messages to agents deployed on the [www.bithuman.ai](https://www.bithuman.ai) platform. Make agents speak proactively or inject background knowledge to improve their responses.
This API is for agents created on the bitHuman platform, not for local SDK agents.
## Make Agent Speak
```
POST /v1/agent/{agent_code}/speak
```
Triggers the agent to speak a message to users in the session.
**Request Body**
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `message` | string | Yes | Text the agent will speak |
| `room_id` | string | No | Target a specific room. If omitted, delivers to all active rooms. |
**Response**
```json
{
"agent_code": "A12345678",
"context_type": "speak",
"delivered_to_rooms": 1
}
```
**Example**
```python
import requests

response = requests.post(
    "https://api.bithuman.ai/v1/agent/A12345678/speak",
    headers={
        "Content-Type": "application/json",
        "api-secret": "YOUR_API_SECRET"
    },
    json={
        "message": "We have a 20% discount available today.",
        "room_id": "customer_session_1"
    }
)
print(response.json())
```
---
## Add Context
```
POST /v1/agent/{agent_code}/add-context
```
Adds background knowledge the agent will use to inform future responses. Can also trigger speech by setting `type` to `speak`.
**Request Body**
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `context` | string | Yes | — | Context text or message to speak |
| `type` | string | No | `add_context` | `add_context` to inject knowledge silently, `speak` to trigger speech |
| `room_id` | string | No | — | Target a specific room. If omitted, delivers to all active rooms. |
**Response**
```json
{
"agent_code": "A12345678",
"context_type": "add_context",
"delivered_to_rooms": 1
}
```
### Adding background context
```python
import requests

response = requests.post(
    "https://api.bithuman.ai/v1/agent/A12345678/add-context",
    headers={
        "Content-Type": "application/json",
        "api-secret": "YOUR_API_SECRET"
    },
    json={
        "context": "Customer has VIP status. Preferred name: Alex. Account since 2021.",
        "type": "add_context",
        "room_id": "vip_session_42"
    }
)
```
### Triggering speech via add-context
```python
response = requests.post(
    "https://api.bithuman.ai/v1/agent/A12345678/add-context",
    headers={
        "Content-Type": "application/json",
        "api-secret": "YOUR_API_SECRET"
    },
    json={
        "context": "Your issue has been resolved. Let me know if you need anything else.",
        "type": "speak",
        "room_id": "support_session_1"
    }
)
```
## Error Codes
| HTTP Status | Error Code | Description |
|-------------|------------|-------------|
| `401` | `UNAUTHORIZED` | Invalid or missing `api-secret` |
| `404` | `AGENT_NOT_FOUND` | No agent with the given code exists |
| `404` | `NO_ACTIVE_ROOMS` | Agent has no active sessions |
| `422` | `VALIDATION_ERROR` | Invalid request body (e.g., bad `type` value) |
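For proactive speech, `NO_ACTIVE_ROOMS` usually just means the agent is idle, so it is often worth treating it as a no-op rather than a failure. A minimal sketch; the `speak_if_active` helper, its name, and its return convention are this guide's own illustration, not part of the SDK:

```python
import requests

API_BASE = "https://api.bithuman.ai"

def build_speak_payload(message, room_id=None):
    """Request body for POST /v1/agent/{agent_code}/speak."""
    payload = {"message": message}
    if room_id is not None:
        payload["room_id"] = room_id
    return payload

def speak_if_active(agent_code, message, api_secret, room_id=None):
    """Make the agent speak; returns rooms reached, or 0 if nobody is in a session."""
    resp = requests.post(
        f"{API_BASE}/v1/agent/{agent_code}/speak",
        headers={"Content-Type": "application/json", "api-secret": api_secret},
        json=build_speak_payload(message, room_id),
    )
    if resp.status_code == 404 and resp.json().get("error", {}).get("code") == "NO_ACTIVE_ROOMS":
        return 0  # agent exists but has no live session; nothing to do
    resp.raise_for_status()  # other errors (401, 422, ...) still raise
    return resp.json().get("delivered_to_rooms", 0)
```

Called from a cron job or webhook handler, this degrades gracefully whenever the agent has no audience.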
## File Upload API: Images, Video & Audio
URL: https://docs.bithuman.ai/api-reference/file-upload
Upload files to the system for processing. Supports both URL downloads and direct file uploads.
## Upload File
```
POST /v1/files/upload
```
Files are automatically organized by type:
| Category | Storage Path | Extensions |
|----------|-------------|------------|
| **Images** | `assets/image/` | `.jpg`, `.jpeg`, `.png`, `.gif`, `.webp`, `.bmp`, `.svg` |
| **Videos** | `assets/video/` | `.mp4`, `.avi`, `.mov`, `.wmv`, `.flv`, `.webm`, `.mkv` |
| **Audio** | `assets/audio/` | `.mp3`, `.wav`, `.flac`, `.aac`, `.ogg`, `.m4a` |
| **Documents** | `assets/docs/` | `.pdf`, `.doc`, `.docx`, `.txt`, `.ppt`, `.pptx`, `.xls`, `.xlsx`, `.csv` |
---
### Method 1: URL Upload
Provide a URL and the server downloads the file for you; no need to transfer the bytes yourself.
```json
{
"file_url": "https://example.com/document.pdf",
"file_type": "auto"
}
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `file_url` | string | URL of the file to download |
| `file_type` | string | Type of file (`pdf`, `image`, `audio`, `video`, `auto`) |
```python
import requests

response = requests.post(
    "https://api.bithuman.ai/v1/files/upload",
    headers={
        "Content-Type": "application/json",
        "api-secret": "YOUR_API_SECRET"
    },
    json={
        "file_url": "https://example.com/presentation.pdf",
        "file_type": "auto"
    }
)
print(response.json())
```
### Method 2: Direct Upload
Upload base64-encoded file data directly.
```json
{
"file_data": "JVBERi0xLjQKJcOkw7zDtsO...",
"file_name": "document.pdf",
"file_type": "auto"
}
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `file_data` | string | Base64 encoded file data |
| `file_name` | string | Original filename |
| `file_type` | string | Type of file (`pdf`, `image`, `audio`, `video`, `auto`) |
```python
import requests
import base64

with open("document.pdf", "rb") as f:
    file_data = base64.b64encode(f.read()).decode('utf-8')

response = requests.post(
    "https://api.bithuman.ai/v1/files/upload",
    headers={
        "Content-Type": "application/json",
        "api-secret": "YOUR_API_SECRET"
    },
    json={
        "file_data": file_data,
        "file_name": "document.pdf",
        "file_type": "auto"
    }
)
print(response.json())
```
---
### Response (both methods)
```json
{
"success": true,
"message": "File uploaded successfully",
"data": {
"file_url": "https://storage.supabase.co/assets/docs/20250115_103000_abc12345.pdf",
"original_source": "https://example.com/document.pdf",
"file_type": "auto",
"file_size": 1024000,
"mime_type": "application/pdf",
"asset_category": "docs",
"uploaded_at": "2025-01-15T10:30:00Z"
}
}
```
---
## Size Limits
| Category | Max Size |
|----------|----------|
| **Images** | 10 MB |
| **Videos** | 100 MB |
| **Audio** | 50 MB |
| **Documents** | 10 MB |
Exceeding these limits returns HTTP `413`.
## Upload Methods Comparison
| Method | Best For | Pros | Cons |
|--------|----------|------|------|
| **URL Upload** | External files, cloud storage | No request-body size limit; server fetches the file | Requires a publicly accessible URL |
| **Direct Upload** | Local files, form uploads | Works with any file source | Base64 encoding inflates the payload; limited by request size |
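A small dispatcher can pick the right method automatically and enforce the size limits from the table above before encoding anything. This is a sketch under the documented limits; the `upload`, `category_for`, and `MAX_BYTES` names are this guide's own, not SDK functions:

```python
import base64
from pathlib import Path

import requests

# Per-category limits from the Size Limits table (bytes)
MAX_BYTES = {
    "image": 10 * 1024 * 1024,
    "video": 100 * 1024 * 1024,
    "audio": 50 * 1024 * 1024,
    "docs": 10 * 1024 * 1024,
}

def category_for(suffix):
    """Map a file extension to the storage category used by the API."""
    suffix = suffix.lower()
    if suffix in {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".svg"}:
        return "image"
    if suffix in {".mp4", ".avi", ".mov", ".wmv", ".flv", ".webm", ".mkv"}:
        return "video"
    if suffix in {".mp3", ".wav", ".flac", ".aac", ".ogg", ".m4a"}:
        return "audio"
    return "docs"

def upload(source, api_secret):
    """URL upload for http(s) sources, direct base64 upload for local paths."""
    headers = {"Content-Type": "application/json", "api-secret": api_secret}
    if source.startswith(("http://", "https://")):
        body = {"file_url": source, "file_type": "auto"}
    else:
        path = Path(source)
        if path.stat().st_size > MAX_BYTES[category_for(path.suffix)]:
            raise ValueError(f"{path.name} exceeds the size limit; host it and pass a URL")
        body = {
            "file_data": base64.b64encode(path.read_bytes()).decode(),
            "file_name": path.name,
            "file_type": "auto",
        }
    resp = requests.post("https://api.bithuman.ai/v1/files/upload",
                         headers=headers, json=body)
    resp.raise_for_status()
    return resp.json()
```

Checking sizes client-side avoids paying the base64 encoding cost for a file the server would reject with `413` anyway.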
## Complete Examples
### Batch Upload
```python
import requests
import base64
from pathlib import Path

def batch_upload_files(directory_path):
    results = []
    for file_path in Path(directory_path).iterdir():
        if file_path.is_file():
            with open(file_path, "rb") as f:
                file_data = base64.b64encode(f.read()).decode('utf-8')
            response = requests.post(
                "https://api.bithuman.ai/v1/files/upload",
                headers={
                    "Content-Type": "application/json",
                    "api-secret": "YOUR_API_SECRET"
                },
                json={
                    "file_data": file_data,
                    "file_name": file_path.name,
                    "file_type": "auto"
                }
            )
            results.append({
                "filename": file_path.name,
                "status": "success" if response.status_code == 200 else "error"
            })
    return results

results = batch_upload_files("./documents")
for r in results:
    print(f"{r['filename']}: {r['status']}")
```
## Error Codes
| HTTP Status | Meaning |
|-------------|---------|
| `200` | Success |
| `400` | Bad request (invalid parameters) |
| `401` | Unauthorized (invalid API secret) |
| `413` | File too large |
| `415` | Unsupported file type |
| `500` | Internal server error |
## Dynamics API: Gestures & Animations
URL: https://docs.bithuman.ai/api-reference/dynamics
## Generate Dynamics
```
POST /v1/dynamics/generate
```
Generate dynamic movements and animations for an agent. Returns immediately with a "processing" status — use the GET endpoint to check completion.
**Request Body**
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `agent_id` | string | Yes | — | Agent ID to generate dynamics for |
| `image_url` | string | No | (from agent) | Agent image URL (fetched from agent data if not provided) |
| `duration` | number | No | `5` | Duration of each motion in seconds |
| `model` | string | No | `seedance` | Model to use (`seedance`, `kling`) |
**Response**
```json
{
"success": true,
"message": "Dynamics generation started",
"agent_id": "A91XMB7113",
"status": "processing"
}
```
```python
import requests

response = requests.post(
    "https://api.bithuman.ai/v1/dynamics/generate",
    headers={
        "Content-Type": "application/json",
        "api-secret": "YOUR_API_SECRET"
    },
    json={
        "agent_id": "A91XMB7113",
        "duration": 5,
        "model": "seedance"
    }
)
print(response.json())
```
---
## Get Dynamics
```
GET /v1/dynamics/{agent_id}
```
Retrieve the current dynamics configuration and available gestures for an agent.
**Response (dynamics generated)**
```json
{
"success": true,
"data": {
"url": "https://storage.supabase.co/dynamics-model.imx",
"status": "ready",
"agent_id": "A91XMB7113",
"gestures": {
"mini_wave_hello": "https://storage.supabase.co/mini_wave_hello.mp4",
"talk_head_nod_subtle": "https://storage.supabase.co/talk_head_nod_subtle.mp4",
"blow_kiss_heart": "https://storage.supabase.co/blow_kiss_heart.mp4"
}
}
}
```
**Response (not yet generated)**
```json
{
"success": true,
"data": {
"url": null,
"status": "ready",
"agent_id": "A91XMB7113",
"gestures": {}
}
}
```
**Response Fields**
| Field | Type | Description |
|-------|------|-------------|
| `url` | string \| null | URL to the dynamics model file, or null if not generated |
| `status` | string | `generating` while in progress, `ready` when complete |
| `agent_id` | string | The agent ID |
| `gestures` | object | Map of gesture action names to video URLs (e.g. `mini_wave_hello`, `talk_head_nod_subtle`) |
Gesture names like `mini_wave_hello` and `talk_head_nod_subtle` are the action identifiers you pass to `VideoControl(action=...)` or the RPC `trigger_dynamics` method. See [Avatar Sessions](/deployment/avatar-sessions#adding-gestures-dynamics) for integration examples.
```python
agent_id = "A91XMB7113"
response = requests.get(
f"https://api.bithuman.ai/v1/dynamics/{agent_id}",
headers={"api-secret": "YOUR_API_SECRET"}
)
print(response.json())
```
---
## Update Dynamics
```
PUT /v1/dynamics/{agent_id}
```
Update dynamics configuration for an agent. After a successful update, movement regeneration is triggered automatically in the background.
**Request Body**
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `dynamics` | object | Yes | Dynamics configuration to merge with existing data |
| `dynamics.enabled` | boolean | No | Enable or disable dynamics for this agent |
| `dynamics.batch_results` | object | No | Map of gesture names to video generation results |
| `dynamics.result` | object | No | Result model path and hash (set when dynamics generation completes) |
| `dynamics.talking` | object | No | Default talking model path and hash (used when dynamics are disabled) |
| `toggle_enabled` | boolean | No | `true` to switch to dynamics model, `false` to restore default talking model |
**Example: Enable dynamics after generation**
```json
{
"dynamics": {
"enabled": true
},
"toggle_enabled": true
}
```
**Response (with regeneration)**
```json
{
"success": true,
"message": "Dynamics updated successfully and movements regeneration started",
"agent_id": "A91XMB7113",
"regeneration_status": "started"
}
```
**Response (regeneration failed to start)**
```json
{
"success": true,
"message": "Dynamics updated successfully, but movements regeneration failed to start",
"agent_id": "A91XMB7113",
"regeneration_status": "failed",
"regeneration_error": "Connection refused"
}
```
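The enable toggle above can be sent with a short Python call. A minimal sketch; the `toggle_payload` and `toggle_dynamics` helper names are illustrative, not part of the SDK:

```python
import requests

def toggle_payload(enabled):
    """Body for PUT /v1/dynamics/{agent_id}: flip dynamics on or off together."""
    return {"dynamics": {"enabled": enabled}, "toggle_enabled": enabled}

def toggle_dynamics(agent_id, api_secret, enabled=True):
    """Switch an agent between its dynamics model and the default talking model."""
    resp = requests.put(
        f"https://api.bithuman.ai/v1/dynamics/{agent_id}",
        headers={"Content-Type": "application/json", "api-secret": api_secret},
        json=toggle_payload(enabled),
    )
    resp.raise_for_status()
    return resp.json()  # check regeneration_status: "started" or "failed"
```

Inspect `regeneration_status` in the returned JSON; a `"failed"` value means the update was saved but movement regeneration did not start.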
---
## Gesture Names
When dynamics are generated, the available gestures use descriptive action names:
| Gesture Action | Category | Typical Use |
|----------------|----------|-------------|
| `mini_wave_hello` | wave | Greeting |
| `talk_head_nod_subtle` | nod | Agreement, acknowledgment |
| `blow_kiss_heart` | expression | Playful reaction |
| `laugh_react` | expression | Humor response |
| `idle_subtle` | idle | Background movement |
The exact gesture names depend on what was generated. Use `GET /v1/dynamics/{agent_id}` to discover available gestures for each agent.
## Configuration Options
**Duration Settings:**
- `1-3 seconds`: Quick gestures (waves, nods)
- `3-5 seconds`: Standard motions (default)
- `5-10 seconds`: Extended animations
**Model Options:**
- `seedance`: High-quality motion generation (default)
- `kling`: Alternative motion model
---
## Integration Example
```python
import requests
import time

headers = {"Content-Type": "application/json", "api-secret": "YOUR_API_SECRET"}

# Step 1: Create an agent
resp = requests.post(
    "https://api.bithuman.ai/v1/agent/generate",
    headers=headers,
    json={"prompt": "You are a friendly customer service representative."}
)
agent_id = resp.json()["agent_id"]

# Step 2: Wait for agent to be ready
while True:
    status = requests.get(
        f"https://api.bithuman.ai/v1/agent/status/{agent_id}",
        headers={"api-secret": "YOUR_API_SECRET"}
    ).json()
    if status["data"]["status"] in ("ready", "failed"):
        break
    time.sleep(5)

# Step 3: Generate dynamics
resp = requests.post(
    "https://api.bithuman.ai/v1/dynamics/generate",
    headers=headers,
    json={"agent_id": agent_id, "duration": 5, "model": "seedance"}
)
print("Dynamics generation started:", resp.json())

# Step 4: Poll until gestures are available
gestures = {}
for _ in range(60):  # up to ~5 minutes at 5s intervals
    resp = requests.get(
        f"https://api.bithuman.ai/v1/dynamics/{agent_id}",
        headers={"api-secret": "YOUR_API_SECRET"}
    )
    gestures = resp.json()["data"].get("gestures", {})
    if gestures:
        break
    time.sleep(5)
print(f"Available gestures: {list(gestures.keys())}")
```
## Error Codes
| HTTP Status | Meaning |
|-------------|---------|
| `200` | Success |
| `400` | Invalid parameters |
| `401` | Unauthorized |
| `402` | Insufficient credits |
| `404` | Agent not found |
| `500` | Internal server error |
## Rate Limits & Quotas
URL: https://docs.bithuman.ai/api-reference/rate-limits
## Request Limits
API endpoints are rate-limited to protect service quality. Limits are applied per API secret.
| Tier | Concurrent Sessions | Agent Generations/day |
|------|---------------------|----------------------|
| **Free** | 2 | 5 |
| **Pro** | 10 | 50 |
| **Enterprise** | Custom | Custom |
Check your current tier and usage at [www.bithuman.ai](https://www.bithuman.ai) > Developer section.
## Handling Errors
If you exceed limits or run out of credits, the API returns an error:
```json
{
"error": {
"code": "INSUFFICIENT_BALANCE",
"message": "Insufficient credits",
"httpStatus": 402
},
"status": "error",
"status_code": 402
}
```
Common status codes: `402` (no credits), `429` (rate limited), `503` (workers busy).
### Recommended Retry Strategy
Use exponential backoff with jitter:
```python
import time
import random
import requests

def api_request_with_retry(url, headers, payload=None, max_retries=3):
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code not in (429, 503):
            return resp
        # Exponential backoff with jitter
        wait = (2 ** attempt) + random.uniform(0, 1)
        time.sleep(wait)
    return resp  # Return last response if all retries exhausted
```
## Concurrency Limits
Avatar sessions have per-account concurrency limits:
| Resource | Limit | Notes |
|----------|-------|-------|
| **Cloud avatar sessions** | Based on tier | Active WebRTC sessions |
| **Agent generation** | 3 concurrent | Queued if exceeded |
| **Dynamics generation** | 2 concurrent | Queued if exceeded |
## Endpoint Guidelines
| Endpoint | Guidance | Notes |
|----------|----------|-------|
| `POST /v1/validate` | Lightweight | Use for health checks |
| `POST /v1/agent/generate` | Heavy | Triggers GPU pipeline, ~2-5 min |
| `GET /v1/agent/status/*` | Poll at 5s intervals | Avoid sub-second polling |
| `POST /v1/agent/*/speak` | Per active session | Agent must be in a room |
| `POST /v1/files/upload` | 10 MB image, 100 MB video | Size limits enforced |
| `POST /v1/dynamics/generate` | Heavy | Triggers video generation |
## Best Practices
- **Use webhooks instead of polling.** Instead of polling `/v1/agent/status/{id}` in a loop, configure [webhooks](/integrations/webhooks) to get notified when generation completes.
- **Cache agent data.** Agent data rarely changes. Cache `GET /v1/agent/{code}` responses locally and refresh only when needed.
- **Reuse sessions.** Keep avatar sessions alive between conversations instead of creating new ones. Session creation is the most expensive operation.
- **Validate first.** Use `POST /v1/validate` to verify your account is active before starting agent generation or dynamics creation.
## Need Higher Limits?
Contact us via [Discord](https://discord.gg/ES953n7bPA) or email for enterprise tier pricing with custom limits.
## Error Reference
URL: https://docs.bithuman.ai/api-reference/errors
## Error Response Format
All error responses follow a consistent format:
```json
{
"error": {
"code": "ERROR_CODE",
"message": "Human-readable description of what went wrong.",
"httpStatus": 401
},
"status": "error",
"status_code": 401
}
```
## HTTP Status Codes
| Status | Meaning | Common Cause |
|--------|---------|-------------|
| `200` | Success | Request completed |
| `400` | Bad Request | Invalid parameters or missing required fields |
| `401` | Unauthorized | Invalid or missing `api-secret` header |
| `404` | Not Found | Agent, resource, or endpoint doesn't exist |
| `413` | Payload Too Large | File exceeds size limit |
| `415` | Unsupported Media Type | File type not supported |
| `422` | Validation Error | Parameters are present but invalid |
| `429` | Rate Limited | Too many requests — see [Rate Limits](/api-reference/rate-limits) |
| `500` | Internal Error | Server-side error — retry or contact support |
| `503` | Service Unavailable | All workers busy — retry with backoff |
## Error Codes
### Authentication
| Code | HTTP | Message | Resolution |
|------|------|---------|------------|
| `UNAUTHORIZED` | 401 | Invalid API secret | Check your `api-secret` header value. Get a valid secret from [Developer Dashboard](https://www.bithuman.ai/#developer). |
| `MISSING_AUTH` | 401 | Missing api-secret header | Add `api-secret` header to your request. |
| `ACCOUNT_SUSPENDED` | 401 | Account suspended | Contact support via [Discord](https://discord.gg/ES953n7bPA). |
| `INSUFFICIENT_BALANCE` | 402 | Insufficient credits | Top up credits at [www.bithuman.ai](https://www.bithuman.ai). |
### Agent Operations
| Code | HTTP | Message | Resolution |
|------|------|---------|------------|
| `AGENT_NOT_FOUND` | 404 | Agent not found | Check the agent code. Use `POST /v1/validate` to verify your API secret has access. |
| `AGENT_PROCESSING` | 409 | Agent is still generating | Wait for generation to complete. Poll `/v1/agent/status/{id}`. |
| `AGENT_FAILED` | 400 | Agent generation failed | Check generation logs. Retry with different parameters. |
| `VALIDATION_ERROR` | 422 | prompt is required | Include all required fields. See endpoint documentation. |
| `NO_ACTIVE_ROOMS` | 404 | No active rooms for agent | The agent must be in an active LiveKit session for `/speak` and `/add-context`. |
### File Operations
| Code | HTTP | Message | Resolution |
|------|------|---------|------------|
| `FILE_TOO_LARGE` | 413 | File exceeds size limit | Images: 10 MB max. Videos: 100 MB max. Audio: 50 MB max. |
| `UNSUPPORTED_TYPE` | 415 | Unsupported file type | Supported: JPEG, PNG, WebP, MP4, WAV, MP3, OGG. |
| `DOWNLOAD_FAILED` | 400 | Could not download URL | Ensure the URL is publicly accessible and returns a valid file. |
### Dynamics
| Code | HTTP | Message | Resolution |
|------|------|---------|------------|
| `DYNAMICS_NOT_FOUND` | 404 | No dynamics for agent | Generate dynamics first with `POST /v1/dynamics/generate`. |
| `DYNAMICS_PROCESSING` | 409 | Dynamics still generating | Wait for generation to complete. |
### Session & Infrastructure
| Code | HTTP | Message | Resolution |
|------|------|---------|------------|
| `RATE_LIMITED` | 429 | Rate limit exceeded | Back off and retry. See [Rate Limits](/api-reference/rate-limits). |
| `NO_AVAILABLE_WORKERS` | 503 | All workers busy | Retry with exponential backoff (up to 5 times). |
| `SESSION_LIMIT` | 429 | Concurrent session limit reached | Wait for an existing session to end, or upgrade your tier. |
| `INTERNAL_ERROR` | 500 | Internal server error | Retry once. If persistent, report via [Discord](https://discord.gg/ES953n7bPA). |
## Handling Errors in Python
```python
import requests

resp = requests.post(
    "https://api.bithuman.ai/v1/agent/generate",
    headers={"api-secret": api_secret, "Content-Type": "application/json"},
    json={"prompt": "You are a helpful assistant"},
)

if resp.status_code == 200:
    result = resp.json()
    print(f"Agent {result['agent_id']} is generating...")
elif resp.status_code == 401:
    print("Invalid API secret. Check BITHUMAN_API_SECRET.")
elif resp.status_code == 429:
    print("Rate limited. Wait a moment and retry.")
elif resp.status_code == 503:
    print("Workers busy. Retry in a few seconds.")
else:
    error = resp.json().get("error", {})
    print(f"Error {error.get('code')}: {error.get('message')}")
```
## GPU Container Errors
The self-hosted expression-avatar container returns its own error responses:
| Endpoint | Error | Resolution |
|----------|-------|------------|
| `GET /health` | Connection refused | Container not started or still initializing |
| `GET /ready` | `503 Not Ready` | Model still loading (~50s cold start) or all session slots full |
| `POST /launch` | `401 Unauthorized` | Invalid `BITHUMAN_API_SECRET` in container env |
| `POST /launch` | `400 No face detected` | Image has no detectable face. Use a clear front-facing photo. |
| `POST /launch` | `503 No capacity` | All session slots in use. Wait or add more containers. |
For GPU container troubleshooting, see [Self-Hosted GPU](/deployment/self-hosted-gpu#troubleshooting).
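Since `/ready` returns `503` during the ~50s cold start, a startup script can poll it before routing any traffic to the container. A sketch, assuming the container's HTTP port is reachable at whatever base URL you pass in (the localhost example below is a placeholder, not a documented default):

```python
import time

import requests

def wait_until_ready(base_url, timeout_s=120.0, interval_s=2.0):
    """Poll GET /ready until the container reports 200, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # 200 = model loaded and a session slot free; 503 = still loading or full
            if requests.get(f"{base_url}/ready", timeout=2).status_code == 200:
                return True
        except requests.RequestException:
            pass  # connection refused: container not started or still initializing
        time.sleep(interval_s)
    return False
```

For example, call `wait_until_ready("http://localhost:8080")` before POSTing to `/launch`; the port depends on how you mapped the container.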
---
# Examples
## bitHuman Code Examples: Audio, Microphone, AI Chat & More
URL: https://docs.bithuman.ai/examples/overview
---
## Platform API
Programmatic agent management -- no SDK or local runtime needed.
| Example | What It Does | Source |
|---------|-------------|--------|
| **Agent Management** | Validate credentials, get/update agents | [api/](https://github.com/bithuman-product/examples/tree/main/api) |
| **Agent Generation** | Create agents from prompt, poll status | [api/](https://github.com/bithuman-product/examples/tree/main/api) |
| **Dynamics** | Generate gestures, list available gestures | [api/](https://github.com/bithuman-product/examples/tree/main/api) |
## Avatar Integration
Four combinations of model type and deployment mode.
| Example | Model | Deployment | Source |
|---------|-------|------------|--------|
| **Essence + Cloud** | Essence (CPU) | bitHuman Cloud | [essence-cloud/](https://github.com/bithuman-product/examples/tree/main/essence-cloud) |
| **[Essence + Self-Hosted](/examples/audio-clip)** | Essence (CPU) | Your machine | [essence-selfhosted/](https://github.com/bithuman-product/examples/tree/main/essence-selfhosted) |
| **Expression + Cloud** | Expression (GPU) | bitHuman Cloud | [expression-cloud/](https://github.com/bithuman-product/examples/tree/main/expression-cloud) |
| **Expression + Self-Hosted** | Expression (GPU) | Your machine | [expression-selfhosted/](https://github.com/bithuman-product/examples/tree/main/expression-selfhosted) |
## Full-Stack & Integration Examples
| Example | What It Does | Source |
|---------|-------------|--------|
| **[Apple Local Agent](/examples/apple-local)** | 100% offline on macOS (Siri + Ollama) | [integrations/macos-offline/](https://github.com/bithuman-product/examples/tree/main/integrations/macos-offline) |
| **[Raspberry Pi](/examples/raspberry-pi)** | Edge deployment on Raspberry Pi | — |
| **Web UI** | Browser-based Gradio interface | [integrations/web-ui/](https://github.com/bithuman-product/examples/tree/main/integrations/web-ui) |
| **Java Client** | WebSocket streaming from Java | [integrations/java/](https://github.com/bithuman-product/examples/tree/main/integrations/java) |
| **Next.js UI** | Drop-in LiveKit web interface | [integrations/nextjs-ui/](https://github.com/bithuman-product/examples/tree/main/integrations/nextjs-ui) |
---
**New to bitHuman?** Start with [Essence + Cloud](https://github.com/bithuman-product/examples/tree/main/essence-cloud) -- the simplest setup with no models to download.
## Example: Play Audio Through a Talking Avatar (Python)
URL: https://docs.bithuman.ai/examples/audio-clip
A simple first example that works reliably.
## Quick Start
```bash
pip install bithuman --upgrade opencv-python sounddevice
```
```bash
export BITHUMAN_API_SECRET="your_secret"
export BITHUMAN_MODEL_PATH="/path/to/model.imx"
export BITHUMAN_AUDIO_PATH="/path/to/audio.wav" # optional
```
```bash
python examples/avatar-with-audio-clip.py
```
[View source code on GitHub](https://github.com/bithuman-product/examples/tree/main/essence-selfhosted)
- **Press `1`** — Play audio with avatar
- **Press `2`** — Stop playback
- **Press `q`** — Quit
---
## What It Does
1. Loads your audio file (WAV, MP3, M4A supported)
2. Creates synchronized avatar animation
3. Shows real-time video in OpenCV window
4. Plays audio through speakers with sounddevice
**Key features:**
- Smooth audio playback with buffering
- Real-time video display at 25 FPS
- Keyboard controls for interaction
- Supports multiple audio formats
---
## Command Line Options
```bash
# Use specific files
python examples/avatar-with-audio-clip.py \
--model /path/to/model.imx \
--audio-file /path/to/audio.wav \
--api-secret your_secret
# Use JWT token instead of API secret
python examples/avatar-with-audio-clip.py \
--token your_jwt_token \
--model /path/to/model.imx
```
| Option | Description |
|--------|-------------|
| `--model` | Path to .imx model file |
| `--audio-file` | Path to audio file |
| `--api-secret` | Your bitHuman API secret |
| `--token` | JWT token (alternative to API secret) |
| `--insecure` | Disable SSL verification (dev only) |
---
## Common Issues
| Problem | Solution |
|---------|----------|
| No audio playing | Install sounddevice: `pip install sounddevice`. Try WAV format. |
| Avatar not loading | Verify `BITHUMAN_API_SECRET` and `BITHUMAN_MODEL_PATH`. |
| Video choppy | Close other applications using GPU/CPU. |
| Controls not working | Click on the OpenCV window to focus it. |
---
## Technical Details
| Component | Specification |
|-----------|--------------|
| Audio sample rate | 16kHz (auto-converted) |
| Audio channels | Mono (stereo auto-converted) |
| Video resolution | 512x512 pixels |
| Frame rate | 25 FPS |
| Audio formats | WAV, MP3, M4A, FLAC |
---
## Next Steps
- [Microphone to avatar](/examples/microphone): real-time interaction with your voice
- [AI voice chat](/examples/ai-conversation): full OpenAI voice chat with avatar
## Example: Real-Time Microphone to Avatar Lip-Sync
URL: https://docs.bithuman.ai/examples/microphone
Speak and see your avatar respond instantly.
## Quick Start
```bash
pip install bithuman --upgrade livekit-rtc livekit-agents
```
```bash
export BITHUMAN_API_SECRET="your_secret"
export BITHUMAN_MODEL_PATH="/path/to/model.imx"
```
```bash
python examples/avatar-with-microphone.py
```
[View source code on GitHub](https://github.com/bithuman-product/examples/tree/main/essence-selfhosted)
- **Speak into microphone** — Avatar animates in real-time
- **Stay quiet** — Avatar stops after silence timeout (3 seconds)
- **Press `q`** — Quit application
---
## What It Does
1. Captures audio from your default microphone
2. Creates real-time avatar animation as you speak
3. Shows live video using LocalVideoPlayer
4. Automatically detects voice activity and silence
**Key features:**
- Real-time audio processing at 24kHz
- Voice activity detection with configurable threshold (-40dB)
- Automatic silence detection (3-second timeout)
- Local audio/video processing (no web interface)
---
## Command Line Options
```bash
# Adjust volume and silence detection
python examples/avatar-with-microphone.py \
--volume 1.5 \
--silent-threshold-db -35
# Enable audio echo for testing
python examples/avatar-with-microphone.py --echo
```
| Option | Default | Description |
|--------|---------|-------------|
| `--model` | env | Path to .imx model file |
| `--api-secret` | env | Your bitHuman API secret |
| `--volume` | 1.0 | Audio volume multiplier |
| `--silent-threshold-db` | -40 | Silence threshold in dB |
| `--echo` | off | Enable audio echo for testing |
---
## Advanced Usage
```bash
# More sensitive (picks up quieter voices)
python examples/avatar-with-microphone.py --silent-threshold-db -50
# Less sensitive (only loud voices)
python examples/avatar-with-microphone.py --silent-threshold-db -30
# Boost quiet microphones
python examples/avatar-with-microphone.py --volume 2.0
```
---
## Common Issues
| Problem | Solution |
|---------|----------|
| No microphone input | Check microphone permissions in system settings |
| Avatar not responding | Speak louder or adjust `--silent-threshold-db` to a lower value (e.g. `-50`) |
| Performance lag | Close other audio applications, use wired microphone |
| Audio echo/feedback | Don't use `--echo` flag, use headphones |
---
## Technical Details
| Component | Specification |
|-----------|--------------|
| Audio sample rate | 24kHz |
| Input | Mono microphone |
| Buffer | 240 samples per chunk (10ms) |
| Silence detection | -40dB threshold, 3s timeout |
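The `-40dB` threshold is an RMS level relative to full scale: a chunk counts as silence when `20 * log10(rms)` falls below it. A rough sketch of that check, assuming float samples normalized to [-1, 1] (the function names here are illustrative, not SDK APIs):

```python
import numpy as np

def chunk_db(samples):
    """RMS level of an audio chunk in dB relative to full scale (0 dB = max)."""
    rms = np.sqrt(np.mean(np.square(samples.astype(np.float64))))
    if rms <= 0.0:
        return float("-inf")  # digital silence
    return 20.0 * np.log10(rms)

def is_silent(samples, threshold_db=-40.0):
    """True when the chunk falls below the --silent-threshold-db level."""
    return chunk_db(samples) < threshold_db
```

At 24kHz, each 240-sample chunk covers 10ms, so the 3-second timeout corresponds to roughly 300 consecutive silent chunks.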
---
## Next Steps
- [AI voice chat](/examples/ai-conversation): full OpenAI voice chat with avatar
- [Apple local agent](/examples/apple-local): 100% private on-device processing
## Example: AI Voice Chat with Avatar (OpenAI + LiveKit)
URL: https://docs.bithuman.ai/examples/ai-conversation
Complete chatbot with avatar that users can talk to on the web.
## Quick Start
```bash
pip install bithuman --upgrade livekit-agents openai
```
Get API keys from all three services:
- **bitHuman**: [www.bithuman.ai](https://www.bithuman.ai)
- **OpenAI**: [openai.com](https://openai.com)
- **LiveKit**: [livekit.io](https://livekit.io) (free)
```bash
export BITHUMAN_API_SECRET="your_secret"
export BITHUMAN_MODEL_PATH="/path/to/model.imx"
export OPENAI_API_KEY="your_openai_key"
export LIVEKIT_API_KEY="your_livekit_key"
export LIVEKIT_API_SECRET="your_livekit_secret"
export LIVEKIT_URL="wss://your-project.livekit.cloud"
```
```bash
git clone https://github.com/livekit/agents-playground.git
cd agents-playground
npm install && npm run dev
```
[View source code on GitHub](https://github.com/bithuman-product/examples/tree/main/essence-selfhosted)
```bash Web streaming (recommended)
python examples/agent-livekit-openai.py dev
```
```bash Command line testing
python examples/agent-livekit-openai.py console
```
Go to `http://localhost:3000` and join a room to chat.
---
## What It Does
1. User speaks in browser
2. AI processes speech and responds intelligently
3. Avatar shows AI's response with dynamic movement
4. Works from any device with internet
**Built with:**
- **OpenAI GPT-4** for intelligent conversation
- **LiveKit** for web streaming
- **bitHuman** for avatar animation
---
## Run Modes
| Mode | Use Case | Description |
|------|----------|-------------|
| `dev` | Production | Connects to LiveKit for web browsers |
| `console` | Testing | Runs in terminal for debugging |
---
## Customization
Change the agent's personality by editing the `instructions`:
```python
agent=Agent(
    instructions=(
        "You are a helpful customer service assistant. "
        "Be friendly, professional, and solve problems quickly."
    )
)
```
**Example personalities:**
- **Tech Support**: "You are a patient tech expert who explains things simply"
- **Sales Assistant**: "You are an enthusiastic product advisor"
- **Teacher**: "You are an encouraging tutor who makes learning fun"
---
## Common Issues
| Problem | Solution |
|---------|----------|
| Agent won't start | Check all API keys are set |
| No audio in browser | Allow microphone permissions, try Chrome |
| Can't connect | Check LiveKit URL format: `wss://your-project.livekit.cloud` |
---
## Next Steps
- [Apple local agent](/examples/apple-local): full privacy, speech never leaves your Mac
- [Raspberry Pi](/examples/raspberry-pi): edge deployment on IoT devices
## Example: 100% Local Avatar on macOS (Apple Silicon)
URL: https://docs.bithuman.ai/examples/apple-local
Full privacy — speech never leaves your Mac.
## Quick Start
- macOS 13+ (Apple Silicon recommended)
- Microphone permissions
```bash
pip install https://github.com/bithuman-product/examples/releases/download/v0.1/bithuman_voice-1.3.2-py3-none-any.whl
```
```bash
bithuman-voice serve --port 8091
```
macOS will ask for Speech permissions — approve this.
```bash
pip install --upgrade bithuman livekit-agents openai livekit-plugins-silero
```
```bash
export BITHUMAN_API_SECRET="your_secret"
export BITHUMAN_MODEL_PATH="/path/to/model.imx"
export LIVEKIT_API_KEY="your_livekit_key"
export LIVEKIT_API_SECRET="your_livekit_secret"
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export OPENAI_API_KEY="your_openai_key" # Only for AI brain
```
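A small preflight check catches missing variables before the agent starts. This is a sketch; `missing_env` is our own helper, not part of the SDK.

```python
import os

# Variables the macOS example expects (OPENAI_API_KEY only if using the OpenAI brain)
REQUIRED_VARS = [
    "BITHUMAN_API_SECRET", "BITHUMAN_MODEL_PATH",
    "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL",
]

def missing_env(names, env=os.environ):
    """Return the subset of names that are unset or empty."""
    return [name for name in names if not env.get(name)]

unset = missing_env(REQUIRED_VARS)
if unset:
    print("Set these first:", ", ".join(unset))
```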
[View source code on GitHub](https://github.com/bithuman-product/examples/tree/main/integrations/macos-offline)
```bash Web streaming
python examples/agent-livekit-apple-local.py dev
```
```bash Command line testing
python examples/agent-livekit-apple-local.py console
```
---
## What It Does
**Stays on your Mac:**
- Speech-to-text (Apple Speech Framework)
- Text-to-speech (Apple Voice Synthesis)
- Avatar animation (bitHuman)
- Voice activity detection (Silero)
**Uses internet:**
- Only AI conversation (OpenAI LLM)
**Privacy benefits:**
- Voice patterns never leave your device
- Apple's hardware-accelerated speech processing
- Full control over your data
---
## Make it 100% Private
For 100% local operation with no internet required, use the complete Docker setup:
[Complete macOS Offline Example](https://github.com/bithuman-product/examples/tree/main/integrations/macos-offline)
**What you get:**
- **Apple Speech Recognition** — Local STT
- **Apple Voices/Siri** — Local TTS
- **Ollama LLM** — Local language models (Llama 3.2)
- **bitHuman Avatar** — Real-time facial animation
- **LiveKit + Web UI** — Complete conversation interface
- **Zero Internet Dependency**
```bash
git clone https://github.com/bithuman-product/examples.git
cd examples/integrations/macos-offline
pip install https://github.com/bithuman-product/examples/releases/download/v0.1/bithuman_voice-1.3.2-py3-none-any.whl
bithuman-voice serve --port 8000
ollama run llama3.2:1b
docker compose up
# Access at http://localhost:4202
```
**Enterprise Offline Mode:** Contact bitHuman for offline tokens to eliminate all internet requirements for authentication and metering.
---
## Common Issues
| Problem | Solution |
|---------|----------|
| Voice service won't start | Check microphone permissions, enable "Speech Recognition" in Privacy & Security |
| No speech recognition | Restart `bithuman-voice` service, test with built-in dictation |
| Permission errors | Run voice service from Terminal (not IDE) |
---
## Performance
**Recommended specs:**
- M2+ Mac (M4 ideal)
- 16GB+ RAM
- macOS 13+
---
## Next Steps
- **Raspberry Pi** — edge deployment on IoT devices
- **Cloud avatars** — simpler cloud-based setup
## Example: Avatar on Raspberry Pi (Edge / IoT / Kiosk)
URL: https://docs.bithuman.ai/examples/raspberry-pi
## Quick Start
- Raspberry Pi 4B (8GB RAM recommended)
- microSD card (32GB+, Class 10)
- USB microphone
- Stable internet connection
- **Separate computer** for web interface (recommended)
Use **Raspberry Pi OS (64-bit)** with Raspberry Pi Imager.
```bash
sudo apt update && sudo apt upgrade -y
sudo apt install python3.11 python3.11-venv -y
python3.11 -m venv bithuman-env
source bithuman-env/bin/activate
```
```bash
pip install --upgrade bithuman livekit-agents openai
sudo apt install portaudio19-dev -y
```
```bash
export BITHUMAN_API_SECRET="your_secret"
export BITHUMAN_MODEL_PATH="/home/pi/model.imx"
export LIVEKIT_API_KEY="your_livekit_key"
export LIVEKIT_API_SECRET="your_livekit_secret"
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export OPENAI_API_KEY="your_openai_key"
export LOADING_MODE="SYNC" # Important for Pi performance
```
[View source code on GitHub](https://github.com/bithuman-product/examples/tree/main/essence-selfhosted)
```bash Web streaming (recommended)
python examples/agent-livekit-rasp-pi.py dev
```
```bash Command line testing
python examples/agent-livekit-rasp-pi.py console
```
For best results, run the web interface on a **separate computer**. Running both agent and web UI on the same Pi causes significant slowdown.
---
## What It Does
1. Runs avatar agent optimized for Raspberry Pi
2. Uses `SYNC` loading mode for memory efficiency
3. Connects to web browsers via LiveKit
4. Suited for always-on edge applications
**Pi-specific optimizations:**
- Synchronous model loading (`LOADING_MODE="SYNC"`)
- Lower memory limits (1500MB warning threshold)
- Single process mode for stability
- Extended initialization timeout (120s)
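These tunables can be gathered from the environment in one place. A sketch: the default values mirror the list above, and `MEMORY_WARN_MB` / `INIT_TIMEOUT_S` are illustrative variable names, not SDK settings.

```python
import os

# Pi-tuned runtime settings; defaults mirror the optimizations listed above
pi_config = {
    "loading_mode": os.environ.get("LOADING_MODE", "SYNC"),           # synchronous model load
    "memory_warn_mb": int(os.environ.get("MEMORY_WARN_MB", "1500")),  # warning threshold
    "init_timeout_s": int(os.environ.get("INIT_TIMEOUT_S", "120")),   # extended init timeout
}
```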
---
## Auto-start Service
Make it run automatically on boot:
```ini /etc/systemd/system/bithuman-agent.service
[Unit]
Description=bitHuman Avatar Agent
After=network.target
[Service]
Type=simple
User=pi
WorkingDirectory=/home/pi
Environment=LOADING_MODE=SYNC
Environment=BITHUMAN_API_SECRET=your_secret
Environment=BITHUMAN_MODEL_PATH=/home/pi/model.imx
ExecStart=/home/pi/bithuman-env/bin/python examples/agent-livekit-rasp-pi.py dev
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl enable bithuman-agent
sudo systemctl start bithuman-agent
sudo systemctl status bithuman-agent
```
---
## Performance Tips
```bash
# Enable performance governor
echo 'performance' | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Disable unnecessary services
sudo systemctl disable bluetooth
sudo systemctl disable wpa_supplicant  # disables Wi-Fi (only if using ethernet)
```
- Use swap file for extra memory
- Store models on USB SSD if possible
- Monitor with `htop` or `free -h`
---
## Common Issues
| Problem | Solution |
|---------|----------|
| Out of memory | Use Pi 4B 8GB, enable swap: `sudo dphys-swapfile swapon` |
| Slow performance | Use ethernet, check CPU temp: `vcgencmd measure_temp` |
| Audio problems | Check USB mic: `arecord -l`, test: `arecord -d 5 test.wav` |
| Model loading timeout | Ensure `LOADING_MODE="SYNC"`, use faster storage |
---
## Hardware Add-ons
```python
import board
import adafruit_dht

# Read a DHT22 temperature/humidity sensor on GPIO4
# (requires: pip install adafruit-circuitpython-dht)
dht = adafruit_dht.DHT22(board.D4)
temperature = dht.temperature  # degrees Celsius
humidity = dht.humidity        # percent relative humidity
```
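One way to use the readings is to fold them into the agent's `instructions` so the avatar can answer questions about the room. A sketch; `build_instructions` is our own helper, not part of the SDK.

```python
def build_instructions(temperature_c: float, humidity_pct: float) -> str:
    """Fold live sensor readings into the agent's system prompt."""
    return (
        "You are a kiosk assistant. Current room conditions: "
        f"{temperature_c:.1f} C, {humidity_pct:.0f}% humidity. "
        "Mention them only if the user asks about the environment."
    )

print(build_instructions(22.5, 48.0))
```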
---
## Next Steps
- **Add sensors** — Integrate environmental awareness
- **Add camera** — Use Pi camera for visual context
- **Scale up** — Deploy multiple Pi devices
- **Go local** — Replace OpenAI with local LLM
## Example: Self-Hosted LiveKit Agent with Gestures
URL: https://docs.bithuman.ai/examples/self-hosted-plugin
Run bitHuman avatars in real-time applications with a self-hosted deployment: direct `.imx` model file access plus `VideoControl`-based gesture triggering.
## Quick Start
```bash
cd examples/self-hosted
pip install -r requirements.txt
```
- **API Secret**: [www.bithuman.ai](https://www.bithuman.ai/#developer)
- **Model File**: Download your `.imx` model from the platform
```bash
# bitHuman Configuration
BITHUMAN_API_SECRET=your_api_secret_here
BITHUMAN_MODEL_PATH=/path/to/your/avatar_model.imx
BITHUMAN_AGENT_ID=A31KJC8622 # Optional: enables dynamics-based gestures
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
# LiveKit Configuration
LIVEKIT_URL=wss://your-livekit-server.com
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
```
```bash
python agent.py dev
```
---
## Basic Self-Hosted Agent
Standard avatar interactions without dynamics:
```python
import os

from livekit.plugins import bithuman

bithuman_avatar = bithuman.AvatarSession(
    api_secret=os.getenv("BITHUMAN_API_SECRET"),
    model_path=os.getenv("BITHUMAN_MODEL_PATH"),
)
```
**Key features:**
- Direct model file access (`.imx` format)
- Native AsyncBithuman runtime integration
- High-performance streaming with VideoGenerator pattern
- Real-time audio/video processing
---
## Self-Hosted Agent with Dynamics
For reactive avatar gestures triggered by user speech keywords.
### Step 1: Get Available Gestures
Retrieve available gesture actions for your agent via the [Dynamics API](/api-reference/dynamics).
```python
import requests
agent_id = "A31KJC8622"
url = f"https://api.bithuman.ai/v1/dynamics/{agent_id}"
headers = {"api-secret": "YOUR_API_SECRET"}
response = requests.get(url, headers=headers)
dynamics_data = response.json()
if dynamics_data.get("success"):
gestures_dict = dynamics_data["data"].get("gestures", {})
available_gestures = list(gestures_dict.keys())
print(f"Available gestures: {available_gestures}")
# Example: ["mini_wave_hello", "talk_head_nod_subtle", "laugh_react"]
```
Gesture actions are user-defined and vary based on your agent's dynamics generation. Always check the API response to see what's available.
### Step 2: Set Up Keyword-to-Action Mapping
```python
from livekit.agents import AgentSession, JobContext, UserInputTranscribedEvent
from livekit.plugins import bithuman
from bithuman.api import VideoControl
import asyncio
import os
KEYWORD_ACTION_MAP = {
"laugh": "laugh_react",
"laughing": "laugh_react",
"haha": "laugh_react",
"funny": "laugh_react",
"hello": "mini_wave_hello",
"hi": "mini_wave_hello",
}
async def entrypoint(ctx: JobContext):
await ctx.connect()
await ctx.wait_for_participant()
bithuman_avatar = bithuman.AvatarSession(
api_secret=os.getenv("BITHUMAN_API_SECRET"),
model_path=os.getenv("BITHUMAN_MODEL_PATH"),
)
session = AgentSession(...)
await bithuman_avatar.start(session, room=ctx.room)
@session.on("user_input_transcribed")
def on_user_input_transcribed(event: UserInputTranscribedEvent):
if not event.is_final:
return
transcript = event.transcript.lower()
for keyword, action in KEYWORD_ACTION_MAP.items():
if keyword in transcript:
asyncio.create_task(
bithuman_avatar.runtime.push(VideoControl(action=action))
)
break
```
**How it works:**
1. Get available gestures from the Dynamics API
2. Map keywords to gesture action names
3. Listen for user speech via `user_input_transcribed` events
4. Trigger gestures via `VideoControl(action=action)`
Always verify that a gesture action exists in the API response before using it. Non-existent gestures will be silently ignored.
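That check can be done once at startup by filtering the keyword map against the Dynamics API response. A sketch: `available` stands in for `set(gestures_dict.keys())` from Step 1, and the `"mini_wave_bye"` entry is a deliberately invalid example.

```python
# In practice: available = set(gestures_dict.keys()) from the Step 1 response
available = {"mini_wave_hello", "laugh_react"}

KEYWORD_ACTION_MAP = {
    "hello": "mini_wave_hello",
    "laugh": "laugh_react",
    "bye": "mini_wave_bye",  # not offered by this agent; will be dropped
}

# Keep only mappings whose action the agent actually supports
valid_map = {kw: act for kw, act in KEYWORD_ACTION_MAP.items() if act in available}
print(valid_map)  # the "bye" entry is filtered out
```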
---
## Configuration
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model_path` | string | Yes | Path to the `.imx` model file |
| `api_secret` | string | Yes | Authentication secret |
| `api_token` | string | No | Optional API token |
| `agent_id` | string | No | Agent ID for fetching dynamics gestures |
## Self-Hosted Advantages
- **Full Control** — Complete control over model files and deployment
- **Privacy** — Models stay on your infrastructure
- **Customization** — Modify and extend agent behavior
- **Performance** — Optimize for your specific hardware
- **Offline Capable** — Works without internet after initial setup
---
## Common Issues
| Problem | Solution |
|---------|----------|
| Model loading errors | Verify model file path and permissions |
| Memory issues | Minimum 4GB RAM, recommended 8GB+ |
| Gesture not triggering | Verify gesture name exists in dynamics API response |
| Connection issues | Verify LiveKit server URL and credentials |
---
## Model Requirements
| Specification | Value |
|---------------|-------|
| Format | `.imx` files |
| Minimum RAM | 4GB |
| Recommended RAM | 8GB+ |
| Initialization time | ~20 seconds |
| Frame rate | 25 FPS |
---
## Next Steps
- **Cloud plugin** — cloud-hosted deployment option
- **Dynamics** — configure gestures and animations