# bitHuman — Complete Documentation > Real-time avatar animation API. Turn any face image or pre-built .imx > model into a lifelike talking avatar with audio-driven lip sync. > Python SDK, REST API, LiveKit plugin, and Docker containers. Base URL: https://api.bithuman.ai Authentication: api-secret header on every request Python SDK: pip install bithuman LiveKit Plugin: pip install livekit-plugins-bithuman GPU Container: docker.io/sgubithuman/expression-avatar:latest Examples: https://github.com/bithuman-product/examples Dashboard: https://www.bithuman.ai --- # Getting Started ## bitHuman — Real-Time Avatar Animation API URL: https://docs.bithuman.ai/introduction bitHuman creates digital avatars that lip-sync to audio in real-time. Feed in audio — get back an animated face at 25 FPS. Use it to build AI companions, customer support avatars, virtual tutors, game NPCs, and anything that needs a visual character that speaks. **Three ways to run:** - **Cloud** — no GPU, no model files. Just an API secret. - **Self-Hosted CPU** — download an `.imx` model, run on any machine. - **Self-Hosted GPU** — any face image, 1.3B parameter model, 250+ FPS. ## Quick Start ```bash Docker (Recommended) git clone https://github.com/bithuman-product/examples.git cd examples/essence-cloud # Add your API keys to .env cp .env.example .env # Edit .env: set BITHUMAN_API_SECRET, BITHUMAN_AGENT_ID, and OPENAI_API_KEY docker compose up # Open http://localhost:4202 ``` ```python Python SDK from bithuman import AsyncBithuman # Create runtime runtime = await AsyncBithuman.create( model_path="avatar.imx", api_secret="your_api_secret" ) await runtime.start() # Push audio and get animated frames await runtime.push_audio(audio_bytes, sample_rate=16000) await runtime.flush() async for frame in runtime.run(): frame.bgr_image # numpy array (H, W, 3) frame.audio_chunk # synchronized audio output frame.end_of_speech # True when utterance ends ``` ## What Can You Build? 
- **Customer support** — Replace hold music with a talking avatar that answers questions.
- **AI tutor** — A patient virtual teacher that explains concepts and adapts to students.
- **Receptionist kiosk** — Lobby kiosks and tablets that greet and direct visitors.
- **AI companion** — Persistent characters with personality and memory.
- **Game NPC** — Dynamic characters that react to player actions with gestures.
- **Content creation** — Batch-generate talking-head videos from scripts.

[See all use cases with architecture patterns →](/getting-started/use-cases)

## Developer Guides

- Get an avatar running in 5 minutes
- REST API for agent generation and management
- 10+ working examples from basic to advanced

## Core SDK API

| Method | Description |
|--------|-------------|
| `AsyncBithuman.create(model_path, api_secret)` | Initialize the avatar runtime |
| `runtime.start()` | Begin processing |
| `runtime.push_audio(data, sample_rate)` | Send audio for lip-sync |
| `runtime.flush()` | Signal end of audio input |
| `runtime.run()` | Async generator yielding video + audio frames |
| `runtime.get_frame_size()` | Returns `(width, height)` of output |

## REST API

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/validate` | POST | Verify API secret |
| `/v1/agent/generate` | POST | Generate new avatar agent |
| `/v1/agent/{code}` | GET/POST | Get or update agent |
| `/v1/agent/{code}/speak` | POST | Make avatar speak text |
| `/v1/agent/{code}/add-context` | POST | Inject silent knowledge |
| `/v1/files/upload` | POST | Upload image/video/audio |
| `/v1/dynamics/generate` | POST | Generate gesture animations |

Base URL: `https://api.bithuman.ai` — Auth: `api-secret` header — [Full reference →](/api-reference/overview)

## Platform Support

| Platform | Status | Notes |
|----------|--------|-------|
| **Linux (x86_64)** | Full Support | Production ready |
| **Linux (ARM64)** | Full Support | Edge deployments |
| **macOS (Apple Silicon)** | Full Support | M2+, M4 ideal |
| **Windows** | Full Support | Via WSL |

## AI Agent Integration

bitHuman provides `llms.txt` and an OpenAPI
specification for AI coding agent discoverability: - **[llms.txt](/llms.txt)** — Curated documentation index for LLM consumption - **[llms-full.txt](/llms-full.txt)** — Complete documentation in single markdown file - **[OpenAPI Spec](/api-reference/openapi.yaml)** — Machine-readable API contract - **[AGENTS.md](https://github.com/bithuman-product/examples/blob/main/AGENTS.md)** — Repository-level agent instructions ## Quick Start: Real-Time Avatar API in 5 Minutes URL: https://docs.bithuman.ai/getting-started/quickstart ## 1. Get Credentials Create an account at [www.bithuman.ai](https://www.bithuman.ai) Go to the Developer page and copy your **API Secret**. Download an avatar model (`.imx` file) from [Community Models](https://www.bithuman.ai/#community). ## 2. Install ```bash pip install bithuman opencv-python --upgrade ``` ## 3. Run Your First Avatar You need a `.wav` audio file to drive the avatar. A sample `speech.wav` is included in each [example directory](https://github.com/bithuman-product/examples), or generate your own with any TTS service. 
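If you don't have a WAV file handy, you can synthesize a short placeholder with Python's standard library to verify the pipeline end to end. This is a minimal sketch: 1 second of a 440 Hz tone at 16 kHz mono, 16-bit PCM. A pure tone won't produce meaningful lip-sync, but it confirms that audio flows through the runtime and frames come back.

```python
import math
import struct
import wave

# Write a 1-second 440 Hz sine tone as 16 kHz mono 16-bit PCM.
SAMPLE_RATE = 16000
DURATION_S = 1.0
FREQ_HZ = 440.0

samples = [
    int(32767 * 0.3 * math.sin(2 * math.pi * FREQ_HZ * n / SAMPLE_RATE))
    for n in range(int(SAMPLE_RATE * DURATION_S))
]

with wave.open("speech.wav", "wb") as wav:
    wav.setnchannels(1)        # mono
    wav.setsampwidth(2)        # 16-bit samples
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

Replace `speech.wav` with real TTS output once the plumbing works — the avatar only lip-syncs convincingly to actual speech.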
```python import asyncio import cv2 from bithuman import AsyncBithuman from bithuman.audio import load_audio, float32_to_int16 async def main(): # Initialize runtime = await AsyncBithuman.create( model_path="avatar.imx", api_secret="your_api_secret" ) await runtime.start() # Load and push audio audio, sr = load_audio("speech.wav") await runtime.push_audio( float32_to_int16(audio).tobytes(), sr ) await runtime.flush() # Display animated frames async for frame in runtime.run(): if frame.has_image: cv2.imshow("Avatar", frame.bgr_image) # numpy (H, W, 3) cv2.waitKey(1) asyncio.run(main()) ``` [Full working example on GitHub](https://github.com/bithuman-product/examples/tree/main/essence-selfhosted) --- ## Key Concepts | Concept | Description | |---------|-------------| | **Runtime** | `AsyncBithuman` instance that processes audio into video | | **push_audio** | Feed audio bytes — avatar lip-syncs in real-time | | **flush** | Signals end of audio input | | **run()** | Async generator that yields frames at 25 FPS | | **Frame** | Contains `.bgr_image` (numpy), `.audio_chunk`, `.end_of_speech` | --- ## Troubleshooting The SDK is not installed. Run: ```bash pip install bithuman --upgrade ``` Make sure you're using the correct Python environment (virtualenv, conda, etc.). Your API secret is invalid or missing. Check: 1. You copied the full secret from [Developer Dashboard](https://www.bithuman.ai/#developer) 2. The `api_secret` parameter or `BITHUMAN_API_SECRET` env var is set correctly 3. Your account is active with available credits Quick test: ```bash curl -X POST https://api.bithuman.ai/v1/validate \ -H "api-secret: YOUR_SECRET" ``` The avatar needs audio input to animate: 1. Ensure you're calling `push_audio()` with valid audio data 2. Call `flush()` after pushing all audio 3. Check that the audio is 16-bit PCM format (use `float32_to_int16()` helper) 4. 
Verify the sample rate you pass matches the audio file (typically 16000 or 44100)

**First session slow to start?** This is normal — the `.imx` model takes time to load and initialize. Subsequent sessions in the same process start instantly. To reduce perceived latency, keep the runtime alive between sessions instead of recreating it.

**Model file not found?** The model file path is wrong. Check:

1. The `.imx` file exists at the path you specified
2. Use an absolute path if running from a different directory
3. Download a model from [Community Models](https://www.bithuman.ai/#community) if you don't have one

---

## Next Steps

- Play an audio file through the avatar (5 min)
- Real-time mic input (10 min)
- OpenAI voice chat (15 min)

Or jump straight to the [Docker App](https://github.com/bithuman-product/examples/tree/main/essence-selfhosted) for a complete end-to-end setup.

### Guides

- **[Prompt Guide](/getting-started/prompts)** — Master the CO-STAR framework for avatar personality
- **[Media Guide](/getting-started/media-guide)** — Upload voice, image, and video assets
- **[Animal Mode](/getting-started/animal-mode)** — Create animal avatars

### System Requirements

- Python 3.9+, 4+ CPU cores, 8GB RAM
- macOS (M2+), Linux (x64/ARM64), or Windows (WSL)

## How bitHuman Works: Audio-to-Avatar Architecture

URL: https://docs.bithuman.ai/getting-started/how-it-works

## The Big Picture

A bitHuman avatar is a virtual character that moves its lips, face, and body in real-time based on audio input. Here's what happens when someone talks to an avatar:

```
You speak into a microphone
        ↓
Audio is sent to an AI agent (like ChatGPT)
        ↓
The AI generates a text response
        ↓
Text is converted to speech (TTS)
        ↓
bitHuman animates the avatar's face to match the speech
        ↓
You see a lifelike avatar talking back to you
```

All of this happens in real-time — fast enough for a natural conversation.

---

## Key Concepts

An `.imx` file is a pre-built avatar model.
It contains everything needed to animate a specific character: face data, lip-sync mappings, and appearance information. Think of it like a "character file" in a video game — it defines what the avatar looks like and how it moves. You can create your own avatar from any photo or video using the [bitHuman dashboard](https://www.bithuman.ai), or download community models. A **room** is a virtual meeting space where participants communicate in real-time using audio and video — similar to a Zoom or Google Meet call. In a bitHuman session, the room typically has: - **Your user** — the person talking to the avatar - **An AI agent** — handles conversation logic (speech-to-text, AI response, text-to-speech) - **The avatar** — renders animated video frames based on the agent's speech LiveKit is the open-source platform that powers this real-time communication. You don't need to understand LiveKit deeply — bitHuman handles the complex parts. An **AvatarSession** is the main integration point. It connects your AI agent to a bitHuman avatar inside a LiveKit room. When you create an `AvatarSession`, bitHuman: 1. Loads the avatar model (cloud or local) 2. Joins the LiveKit room as a participant 3. Listens for audio from your AI agent 4. Generates animated video frames in real-time 5. Publishes the video back to the room You interact with just a few lines of code — the session handles everything else. Your **API secret** is the key that authenticates your application with bitHuman services. You can create one from the [Developer Dashboard](https://www.bithuman.ai/#developer). It's used for: - Verifying your identity - Tracking usage and billing - Downloading cloud avatar models --- ## Which Approach Should I Use? 
Start here: - **No GPU?** → Use **Cloud Plugin** (easiest) or **Self-Hosted CPU** (most private) - **Have a GPU?** → Use **Self-Hosted GPU** for dynamic face images without pre-built models - **Want the fastest setup?** → Cloud Plugin — just an API secret and agent ID - **Need privacy?** → Self-Hosted CPU — audio never leaves your machine | | Cloud Plugin | Self-Hosted CPU | Self-Hosted GPU | |---|---|---|---| | **Setup time** | ~2 min | ~5 min | ~10 min | | **GPU required** | No | No | Yes (8 GB+ VRAM) | | **Privacy** | Audio sent to cloud | Audio stays local | Audio stays local | | **Avatar source** | Pre-built agent ID | `.imx` model file | Any face image | | **Best for** | Web apps, quick demos | Edge, offline, privacy | Dynamic faces, high volume | ## Three Ways to Use bitHuman Choose the approach that fits your project: **Easiest.** Avatar runs on bitHuman's servers. No model files to manage. Just provide an Agent ID and API secret. Best for: getting started quickly, web apps, and production deployments. **Most private.** Avatar runs on your machine. Download an `.imx` model and run locally. Works offline after setup. Best for: privacy-sensitive apps, edge devices, custom deployments. **Most flexible.** GPU container on your infrastructure. Use any face image to create avatars on-the-fly. No pre-built models needed. Best for: dynamic avatars, high-volume, full infrastructure control. --- ## How the Avatar Joins a Room Here's what happens step-by-step when an avatar session starts: Your AI agent (the code you write) connects to a LiveKit room and waits for a user to join. This is where the conversation will happen. In your agent code, you create a `bithuman.AvatarSession` with either a cloud `avatar_id` or a local `model_path`. This tells bitHuman which avatar to use. 
When you call `avatar.start(session, room=ctx.room)`, bitHuman: - **Cloud mode:** Sends a request to bitHuman's servers, which launch an avatar worker that joins your room - **Self-hosted mode:** Loads the `.imx` model locally and starts generating frames The avatar joins the LiveKit room as a video participant. Users in the room see the avatar's video feed — a lifelike face that moves and speaks. As your AI agent produces speech audio, the avatar animates in real-time: - Audio from TTS flows to the avatar - The avatar lip-syncs and generates video frames at 25 FPS - Video is published to the room for all participants to see ### Visual Flow ``` ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ Your User │ │ AI Agent │ │ Avatar │ │ (browser) │ │ (your code) │ │ (bitHuman) │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ │ │ │ User speaks │ │ │ ──────────────────>│ │ │ │ │ │ AI processes │ │ │ & responds │ │ │ │ TTS audio │ │ │ ──────────────────>│ │ │ │ │ │ Animated video │ │<───────────────────│<───────────────────│ │ │ │ │ User sees avatar │ │ │ speaking │ │ └────────────────────┴────────────────────┘ LiveKit Room ``` --- ## What You Need | Component | What it is | Where to get it | |-----------|-----------|-----------------| | **API Secret** | Authenticates your app | [Developer Dashboard](https://www.bithuman.ai/#developer) | | **Avatar Model** | The character to animate | [Community Models](https://www.bithuman.ai/#community) or create your own | | **LiveKit Server** | Real-time communication | [LiveKit Cloud](https://cloud.livekit.io) (free tier) or self-hosted | | **AI Agent** | Conversation logic | Your code + an LLM (OpenAI, Anthropic, etc.) | --- ## Next Steps Get an avatar running in 5 minutes Complete guide to all avatar session modes ## Use Cases URL: https://docs.bithuman.ai/getting-started/use-cases ## What Can You Build? bitHuman turns audio into a real-time talking avatar. 
Anywhere you need a visual character that speaks — that's where bitHuman fits. --- ## Customer Support Avatar Replace hold music with a face. An avatar greets visitors, answers FAQs, and escalates to a human when needed. **Architecture:** Website embed (iframe) → bitHuman cloud → OpenAI for conversation ```python avatar = bithuman.AvatarSession( avatar_id="YOUR_SUPPORT_AGENT", api_secret=os.getenv("BITHUMAN_API_SECRET"), ) ``` **Best deployment:** [Cloud Plugin](/deployment/livekit-cloud-plugin) for fastest setup, or [Website Embed](/integrations/embed) for dropping into existing pages. --- ## AI Tutor / Virtual Teacher A patient avatar that explains concepts, answers questions, and adapts to the student's pace. Lip-sync creates presence that text chat can't match. **Architecture:** LiveKit room → AI agent (GPT-4 + domain knowledge) → bitHuman avatar **Best deployment:** [Cloud Plugin](/deployment/livekit-cloud-plugin) with custom system prompt via [Prompt Guide](/getting-started/prompts). --- ## Digital Receptionist / Kiosk A lobby screen or tablet that greets visitors, provides directions, and handles check-in. Runs on a Raspberry Pi or any Linux machine. **Architecture:** Kiosk browser → LiveKit → AI agent → bitHuman (self-hosted CPU) **Best deployment:** [Self-Hosted CPU](/examples/self-hosted-plugin) for offline capability, or [Raspberry Pi](/examples/raspberry-pi) for dedicated hardware. --- ## AI Companion / Virtual Friend A character with a persistent personality that remembers past conversations. Use [context injection](/api-reference/agent-context) to maintain relationship state. **Architecture:** Mobile app (Flutter/React) → LiveKit → AI agent with memory → bitHuman **Best deployment:** [Cloud Plugin](/deployment/livekit-cloud-plugin) + [Flutter integration](/integrations/flutter). --- ## Game NPC / Interactive Character Non-player characters that respond dynamically to player actions. 
Use [dynamics](/api-reference/dynamics) for gestures (wave, nod, laugh) triggered by game events. **Architecture:** Game client → WebSocket → AI agent → bitHuman (GPU for custom faces) **Best deployment:** [Self-Hosted GPU](/deployment/self-hosted-gpu) for dynamic face generation from character art. --- ## Accessibility Tool Give a face and voice to text-to-speech output. Visual lip-sync helps hearing-impaired users follow along. Audio output helps visually impaired users interact with content. **Architecture:** Screen reader / TTS → bitHuman SDK → overlay window **Best deployment:** [Python SDK directly](/deployment/avatar-sessions#using-the-sdk-without-livekit) (no LiveKit needed). --- ## Content Creation / Video Generation Generate talking-head videos from scripts without recording. Batch-process audio files through avatars to create training videos, announcements, or social content. **Architecture:** Script → TTS → bitHuman SDK → video frames → ffmpeg → MP4 ```python runtime = await AsyncBithuman.create(model_path="presenter.imx", api_secret="...") await runtime.start() await runtime.push_audio(tts_audio, sample_rate=16000) await runtime.flush() # Collect frames and encode to video frames = [] async for frame in runtime.run(): if frame.has_image: frames.append(frame.bgr_image) ``` **Best deployment:** [Python SDK directly](/deployment/avatar-sessions#using-the-sdk-without-livekit) with [Self-Hosted GPU](/deployment/self-hosted-gpu) for custom faces. 
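The final encoding step in the architecture above can be sketched by piping raw BGR frames into ffmpeg. This is illustrative, not part of the bitHuman SDK: it assumes ffmpeg is on your PATH, `make_ffmpeg_cmd` and `encode_frames` are hypothetical helper names, and 25 FPS matches the runtime's documented output rate.

```python
import subprocess

def make_ffmpeg_cmd(width: int, height: int, out_path: str, fps: int = 25) -> list:
    """Build an ffmpeg command that reads raw BGR24 frames from stdin."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo",       # input is headerless raw video
        "-pix_fmt", "bgr24",    # matches frame.bgr_image's channel order
        "-s", f"{width}x{height}",
        "-r", str(fps),         # 25 FPS, the runtime's output rate
        "-i", "-",              # frames arrive on stdin
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",  # widest player compatibility
        out_path,
    ]

def encode_frames(frames, width, height, out_path):
    # Each frame is an (H, W, 3) BGR numpy array from frame.bgr_image.
    proc = subprocess.Popen(
        make_ffmpeg_cmd(width, height, out_path), stdin=subprocess.PIPE
    )
    for frame in frames:
        proc.stdin.write(frame.tobytes())
    proc.stdin.close()
    proc.wait()
```

Get `width` and `height` from `runtime.get_frame_size()` so the raw-frame dimensions passed to ffmpeg match what the runtime emits.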
--- ## Choosing the Right Deployment | Use Case | Recommended Deployment | Why | |----------|----------------------|-----| | Customer support | Cloud Plugin | Fast setup, scales automatically | | AI tutor | Cloud Plugin | Low latency, no infrastructure | | Receptionist kiosk | Self-Hosted CPU | Offline capable, privacy | | AI companion | Cloud Plugin + Flutter | Mobile-friendly, cross-platform | | Game NPC | Self-Hosted GPU | Dynamic faces, low latency | | Accessibility | Python SDK | Lightweight, no WebRTC overhead | | Content creation | Python SDK + GPU | Batch processing, custom faces | --- ## Next Steps Run your first avatar in 5 minutes Complete deployment guide Working code for every use case ## Avatar Prompt Engineering: CO-STAR Framework URL: https://docs.bithuman.ai/getting-started/prompts Learn the structure that won Singapore's GPT-4 prompt engineering competition. ## The CO-STAR Framework The **CO-STAR framework** is an award-winning method for creating effective prompts. It considers all key aspects that influence an AI's response quality. ### C - Context **Provide background information.** Give your avatar the setting and situation they need to understand. ```text CONTEXT: You are working as a customer service representative for a tech company. Customers often call frustrated with technical issues. ``` ### O - Objective **Define the specific task.** Be crystal clear about what you want your avatar to accomplish. ```text OBJECTIVE: Help customers solve their technical problems while making them feel heard and valued. Always aim to resolve issues on the first interaction. ``` ### S - Style **Specify the communication style.** This could be like a famous person, profession, or communication approach. ```text STYLE: Communicate like an experienced Apple Genius Bar technician - knowledgeable but approachable, using analogies to explain technical concepts. 
``` ### T - Tone **Set the emotional attitude.** Define how your avatar should "feel" in their responses. ```text TONE: Patient, empathetic, and solution-focused. Remain calm even when customers are frustrated. ``` ### A - Audience **Identify who they're talking to.** Tailor responses to the specific audience characteristics. ```text AUDIENCE: Everyday technology users with varying technical skill levels, from beginners to intermediate users. ``` ### R - Response **Specify the output format.** Define exactly how responses should be structured. ```text RESPONSE: Always follow this format: 1. Acknowledge the customer's concern 2. Ask one clarifying question if needed 3. Provide step-by-step solution 4. Confirm understanding 5. Offer additional help ``` --- ## Complete CO-STAR Examples ### E-commerce Assistant ```text CONTEXT: You work for an online fashion retailer during the busy holiday season. Customers are shopping for gifts and need quick, helpful guidance. OBJECTIVE: Help customers find the perfect products for their needs and guide them through purchase decisions confidently. STYLE: Like a knowledgeable personal shopper at a high-end boutique - attentive, stylish, and detail-oriented. TONE: Enthusiastic, helpful, and fashion-forward while being respectful of different budgets and styles. AUDIENCE: Online shoppers aged 25-45 looking for clothing and accessories, with varying fashion knowledge and budget ranges. RESPONSE: - Start with a warm greeting - Ask 2-3 targeted questions about their needs - Suggest 3 specific product options with reasons - Mention current promotions if relevant - End with "How else can I help you today?" ``` ### Educational Tutor ```text CONTEXT: You are an online tutor helping high school students with mathematics during exam preparation season. Students are stressed and need both academic and emotional support. 
OBJECTIVE: Explain mathematical concepts clearly, help solve specific problems, and build student confidence in their abilities. STYLE: Like an award-winning high school teacher who makes complex topics accessible - using real-world examples and breaking down problems step-by-step. TONE: Encouraging, patient, and supportive. Celebrate small victories and reframe mistakes as learning opportunities. AUDIENCE: High school students (ages 14-18) with varying math abilities, some struggling with confidence and test anxiety. RESPONSE: - Acknowledge their question/concern - Break complex problems into smaller steps - Use encouraging phrases like "Great question!" or "You're on the right track!" - Provide visual or real-world analogies when possible - End with a confidence-building statement ``` ### Healthcare Assistant ```text CONTEXT: You work for a telehealth platform where patients schedule appointments and ask general health questions. You cannot provide medical diagnoses but can offer guidance and support. OBJECTIVE: Help patients understand their symptoms, schedule appropriate care, and provide reassurance while maintaining appropriate medical boundaries. STYLE: Like an experienced nurse practitioner - knowledgeable, professional, but warm and approachable in explanations. TONE: Compassionate, professional, and reassuring while being appropriately cautious about medical advice. AUDIENCE: Patients of all ages with varying health literacy levels, often anxious about their symptoms or conditions. 
RESPONSE: - Express empathy for their concern - Provide general health education when appropriate - Always recommend consulting healthcare providers for medical advice - Offer to help schedule appointments - Use clear, non-medical language ``` --- ## Tips for CO-STAR Success ### Do This **Be Specific in Context** ```text Bad: "You work in customer service" Good: "You work as a Level 2 technical support specialist for a cloud software company, handling escalated cases from customers who've already tried basic troubleshooting" ``` **Use Professional Examples in Style** ```text Bad: "Be professional" Good: "Communicate like a McKinsey consultant -- structured, data-driven, and confident while remaining accessible to non-experts" ``` **Define Clear Response Formats** ```text Bad: "Give helpful responses" Good: "Always structure responses as: Problem Summary | Root Cause Analysis | 3 Recommended Solutions | Next Steps" ``` ### Avoid This - **Vague objectives** — "Be helpful" vs "Increase customer satisfaction scores by resolving issues in under 5 minutes" - **Conflicting tones** — Don't mix "professional" with "casual and fun" - **Unclear audiences** — "Everyone" vs "Small business owners with 10-50 employees" - **Missing context** — Jumping straight to objectives without setting the scene --- ## Quick CO-STAR Template Use this template for any avatar: ```text CONTEXT: [Describe the situation/setting where your avatar operates] OBJECTIVE: [What specific goal should your avatar achieve?] STYLE: [How should they communicate? Like which profession/person?] TONE: [What emotional attitude should they convey?] AUDIENCE: [Who are they talking to? Demographics/characteristics?] RESPONSE: [What format/structure should responses follow?] ``` --- ## Next Steps 1. **Write your CO-STAR prompt** using the template above 2. **Test with sample conversations** to refine it 3. 
**Try it in the [Examples](/examples/overview)** to see it in action ## Media Upload Guide: Images, Video & Audio for Avatars URL: https://docs.bithuman.ai/getting-started/media-guide Learn how to prepare and upload media for optimal avatar generation results. --- ## Image Upload **Perfect for**: Facial likeness and character appearance ### Requirements | Requirement | Value | |-------------|-------| | File Size | Less than 10MB | | Characters | One person only | | Position | Centered in frame | | Orientation | Front-facing | | Expression | Calm and gentle | | Quality | High resolution, well-lit | ### Best Practices - **Good lighting** — avoid shadows on face - **Clear focus** — sharp, not blurry - **Solo shots** — no other people visible - **Neutral expression** — avoid extreme emotions - **Professional quality** — passport-style photos work well --- ## Video Upload **Perfect for**: Movement patterns and dynamic expressions ### Requirements | Requirement | Value | |-------------|-------| | Duration | Less than 30 seconds | | Characters | One person only | | Position | Centered in frame | | Movement | Minimal distracting movement | | Quality | High resolution, stable footage | ### Best Practices - **Stable camera** — use tripod if possible - **Consistent framing** — keep character centered - **Subtle movements** — gentle head movements, natural blinking - **Good lighting** — consistent throughout video - **Audio optional** — focus on visual quality --- ## Voice Upload **Perfect for**: Voice cloning and personalized speech patterns ### Requirements | Requirement | Value | |-------------|-------| | Duration | Less than 1 minute | | Quality | Clear voice, no background noise | | Format | MP3, WAV, or M4A | | Content | Natural speech in your target language | ### Best Practices - Record in a quiet environment - Use a good quality microphone - Speak naturally and clearly - Avoid music or sound effects - Include varied sentences for better voice modeling --- ## Media 
Priority System

Understanding how different uploads influence and overwrite each other:

```mermaid
graph TD
    subgraph "User Uploads"
        A[Prompt<br/>Character Description]
        B[Image<br/>Face/Appearance]
        C[Video<br/>Face + Movement]
        D[Voice<br/>Speech Audio]
    end

    subgraph "Likeness Generation"
        E{Video<br/>Uploaded?}
        E -->|Yes| F[Video OVERWRITES Image<br/>Uses video for likeness]
        E -->|No| G{Image<br/>Uploaded?}
        G -->|Yes| H[Image for Likeness<br/>Auto-generates persona<br/>Prompt becomes optional]
        G -->|No| I[Prompt-Only<br/>Generates appearance<br/>from description]
    end

    subgraph "Voice Generation"
        J{Voice<br/>Uploaded?}
        J -->|Yes| K[Uses Uploaded Voice<br/>Clones speech patterns]
        J -->|No| L[Auto-Generated Voice<br/>Matches persona/appearance]
    end

    subgraph "Final Result"
        M[Complete Avatar<br/>Likeness + Voice + Personality]
    end

    A --> E
    B --> E
    C --> E
    F --> J
    H --> J
    I --> J
    K --> M
    L --> M
```

### Key Priority Rules

1. **Video > Image** — Video always overwrites image for likeness
2. **Image = Auto-Prompt** — Images auto-generate persona, making manual prompts optional
3. **Voice** — When uploaded, replaces auto-generated voice
4. **Prompt** — Required only when no image/video provided

### Upload Combinations

| Combination | What Happens |
|-------------|-------------|
| **Prompt Only** | Generates likeness, voice, and movement from text description |
| **Image Only** | Uses image for likeness, auto-generates persona and voice |
| **Voice + Image** | Image for likeness, voice for speech patterns |
| **Video + Voice + Prompt** | Full character control — video for likeness, voice for speech, prompt for personality |

---

## Best Practices

**Start simple.** Upload an image for instant results, or use prompts for creative characters. You can always add voice or refine later.

**Recommended Approaches:**

- **Prompts Only** — Good for creative/fictional characters
- **Image Only** — Instant avatar from photo (no prompt needed)
- **Image + Voice** — Realistic character recreation

**Common Issues and Fixes:**

| Issue | Fix |
|-------|-----|
| Poor lighting in images/videos | Use photo editing to improve lighting |
| Background noise in audio | Record audio in quiet spaces |
| Multiple people in frame | Crop images to show only target person |
| Excessive movement in videos | Keep movements subtle and natural |

## Animal Avatars: Create Talking Animal Characters

URL: https://docs.bithuman.ai/getting-started/animal-mode

Transform animals into interactive avatars using animal mode.
--- ## Available Animal Characters Use prompts to generate or upload your own pet photos: **Capybara** **Cat in Hat** **Rainbow Creature** **Koala** **Pixar Turtle** **Bunny with Glasses** **White Bunny** **English Sheepdog** **Teddy Bear** **Character D12** **Character D7** **Fluffy Creature** --- ## Automatic Face Detection The AI system automatically locates the character's face and body in animal images, enabling natural movement and expression mapping. **What works automatically:** - Eye tracking for natural gaze - Mouth detection for speech sync - Expression mapping for emotions - Facial landmarks for precise animation --- ## Manual Face Marking When the AI cannot locate facial features automatically, you'll be prompted to manually mark key points. ### Marking Process When the "Help Needed" prompt appears, click the **Mark Face** button. Draw a rectangle around the entire facial area — eyes, nose, mouth, and chin. The system will extract facial landmarks from your selection. The rectangle should cover all key features: both eyes, nose, mouth, and chin. No need to select individual points — just one bounding rectangle. 
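As a rough illustration of what the marking step produces: a bounding rectangle is just four pixel coordinates that must lie inside the image. The helper below is hypothetical (not part of the bitHuman SDK or dashboard) and simply shows the normalization a client might apply before submitting the rectangle — ordering the corners and clipping to image bounds.

```python
def clamp_face_rect(x1, y1, x2, y2, img_w, img_h):
    """Normalize a user-drawn rectangle: order the corners and clip to image bounds."""
    left, right = sorted((x1, x2))   # user may drag right-to-left
    top, bottom = sorted((y1, y2))   # or bottom-to-top
    left = max(0, min(left, img_w))
    right = max(0, min(right, img_w))
    top = max(0, min(top, img_h))
    bottom = max(0, min(bottom, img_h))
    return left, top, right, bottom

# A rectangle dragged right-to-left and past the bottom edge of a 640x480 image:
print(clamp_face_rect(300, 50, 120, 900, 640, 480))  # (120, 50, 300, 480)
```

The rectangle should still cover all key features — both eyes, nose, mouth, and chin — as described above; the clamp only guards against out-of-bounds coordinates.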
---

## Best Practices

**For optimal results:**

- **Clear facial features** — ensure eyes, nose, mouth are visible
- **Front-facing pose** — straight-on view works best
- **Good contrast** — features should stand out from background
- **High resolution** — more detail means better detection

**Troubleshooting:**

| Problem | Solution |
|---------|----------|
| Face not detected | Use a front-facing photo with clear eyes, nose, and mouth visible |
| Poor lip-sync | Try a higher-resolution image with more contrast around the mouth |
| Unnatural movement | Avoid side profiles — straight-on views work best |

**Tips:**

- Start with the pre-built animals above for guaranteed compatibility
- Use well-lit, high-contrast images
- For custom pets, crop the image so the face fills most of the frame
- Test with simple expressions first

---

## Getting Started

1. Pick an animal character from the grid above
2. The system automatically attempts face detection
3. Manually mark facial points if prompted
4. Your interactive animal avatar is ready

---

# Deployment

## Avatar Sessions: Cloud, CPU & GPU Deployment Guide

URL: https://docs.bithuman.ai/deployment/avatar-sessions

An **AvatarSession** is how you bring a bitHuman avatar into a LiveKit room. This guide covers every way to do it, with complete working examples.

**New to bitHuman?** Start with [How It Works](/getting-started/how-it-works) to understand the core concepts first.

---

## Choose Your Approach

| Approach | Best For | Model Files | GPU Required | Internet Required |
|----------|----------|-------------|--------------|-------------------|
| [Cloud Plugin](#cloud-plugin) | Getting started, web apps | No | No | Yes |
| [Self-Hosted CPU](#self-hosted-cpu) | Privacy, edge devices | Yes (.imx) | No | Only for auth |
| [Self-Hosted GPU](#self-hosted-gpu) | Dynamic faces, custom images | No (uses images) | Yes | Only for auth |

---

## Prerequisites

All approaches need these basics:

You also need a LiveKit server.
If you don't have one: ```bash # Option 1: LiveKit Cloud (easiest) # Sign up at https://cloud.livekit.io — free tier available # Option 2: Self-hosted LiveKit docker run --rm -p 7880:7880 -p 7881:7881 -p 7882:7882/udp \ livekit/livekit-server --dev ``` --- ## Cloud Plugin The cloud plugin runs the avatar on bitHuman's servers. You just provide an Agent ID and API secret — no model files, no GPU. ### Complete Working Example ```python import asyncio import os from livekit.agents import ( Agent, AgentSession, JobContext, RoomOutputOptions, WorkerOptions, cli, llm, ) from livekit.plugins import openai, silero, bithuman # 1. Define your AI agent class MyAgent(Agent): def __init__(self): super().__init__( instructions="""You are a helpful and friendly assistant. Keep responses concise — 1-2 sentences.""", ) # 2. Set up the session when a user connects async def entrypoint(ctx: JobContext): await ctx.connect() # Wait for a user to join the room await ctx.wait_for_participant() # Create the avatar session (cloud-hosted) avatar = bithuman.AvatarSession( avatar_id=os.getenv("BITHUMAN_AGENT_ID"), # e.g. "A78WKV4515" api_secret=os.getenv("BITHUMAN_API_SECRET"), ) # Create the agent session with AI components session = AgentSession( stt=openai.STT(), # Speech-to-text llm=openai.LLM(), # AI language model tts=openai.TTS(), # Text-to-speech vad=silero.VAD.load(), # Voice activity detection ) # Start everything — avatar joins the room automatically await avatar.start(session, room=ctx.room) await session.start( agent=MyAgent(), room=ctx.room, room_output_options=RoomOutputOptions(audio_enabled=False), ) # 3. Launch if __name__ == "__main__": cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) ``` ### Environment Variables ```bash # Required export BITHUMAN_API_SECRET="your_api_secret" # From www.bithuman.ai/#developer export BITHUMAN_AGENT_ID="A78WKV4515" # Your agent's ID export OPENAI_API_KEY="sk-..." 
# For STT, LLM, TTS # LiveKit connection export LIVEKIT_URL="wss://your-project.livekit.cloud" export LIVEKIT_API_KEY="APIxxxxxxxx" export LIVEKIT_API_SECRET="xxxxxxxx" ``` ### Run It ```bash python agent.py dev ``` Then open [agents-playground.livekit.io](https://agents-playground.livekit.io) to connect and talk to your avatar. ### How It Works Behind the Scenes When `avatar.start()` and `session.start()` run: 1. The plugin sends a request to bitHuman's cloud API 2. A cloud avatar worker receives the request 3. The worker downloads the avatar model (cached after first time) 4. The worker joins your LiveKit room as a participant named `bithuman-avatar-agent` 5. As your agent produces TTS audio, the worker generates animated video frames 6. Video is published to the room — users see the avatar speaking **Essence vs Expression model:** By default, the cloud plugin uses the **Essence** (CPU) model, which works with pre-built `.imx` avatars. Add `model="expression"` to use the **Expression** (GPU) model, which supports custom face images. ### Using Expression Model (GPU) with Custom Image ```python from PIL import Image avatar = bithuman.AvatarSession( avatar_image=Image.open("face.jpg"), # Any face image api_secret=os.getenv("BITHUMAN_API_SECRET"), model="expression", ) ``` --- ## Self-Hosted CPU Run the avatar entirely on your own machine using a downloaded `.imx` model file. Great for privacy and offline use. ### Complete Working Example ```python import asyncio import os from livekit.agents import ( Agent, AgentSession, JobContext, RoomOutputOptions, WorkerOptions, cli, llm, ) from livekit.plugins import openai, silero, bithuman class MyAgent(Agent): def __init__(self): super().__init__( instructions="You are a helpful assistant. 
Keep responses brief.", ) async def entrypoint(ctx: JobContext): await ctx.connect() await ctx.wait_for_participant() # Create the avatar session (self-hosted, CPU) avatar = bithuman.AvatarSession( model_path=os.getenv("BITHUMAN_MODEL_PATH"), # e.g. "/models/avatar.imx" api_secret=os.getenv("BITHUMAN_API_SECRET"), ) session = AgentSession( stt=openai.STT(), llm=openai.LLM(), tts=openai.TTS(), vad=silero.VAD.load(), ) await avatar.start(session, room=ctx.room) await session.start( agent=MyAgent(), room=ctx.room, room_output_options=RoomOutputOptions(audio_enabled=False), ) if __name__ == "__main__": cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) ``` ### Environment Variables ```bash # Required export BITHUMAN_API_SECRET="your_api_secret" export BITHUMAN_MODEL_PATH="/path/to/avatar.imx" export OPENAI_API_KEY="sk-..." # LiveKit connection export LIVEKIT_URL="wss://your-project.livekit.cloud" export LIVEKIT_API_KEY="APIxxxxxxxx" export LIVEKIT_API_SECRET="xxxxxxxx" ``` ### How It Differs from Cloud | Aspect | Cloud | Self-Hosted CPU | |--------|-------|-----------------| | Model location | bitHuman's servers | Your machine | | Avatar parameter | `avatar_id="A78WKV4515"` | `model_path="/path/to/avatar.imx"` | | Internet needed | Yes (always) | Only for authentication | | First frame latency | 2-4 seconds | ~20 seconds (model load) | | Privacy | Audio sent to cloud | Audio stays local | ### System Requirements - **CPU:** 4+ cores (8 recommended) - **RAM:** 8 GB minimum - **Disk:** ~500 MB per `.imx` model - **OS:** Linux (x64/ARM64), macOS (M2+), or Windows (WSL) --- ## Self-Hosted GPU Use a GPU container that generates avatars from any face image — no pre-built models needed. 
### Complete Working Example ```python import asyncio import os from livekit.agents import ( Agent, AgentSession, JobContext, RoomOutputOptions, WorkerOptions, cli, llm, ) from livekit.plugins import openai, silero, bithuman class MyAgent(Agent): def __init__(self): super().__init__( instructions="You are a helpful assistant. Keep responses brief.", ) async def entrypoint(ctx: JobContext): await ctx.connect() await ctx.wait_for_participant() # Create the avatar session (self-hosted GPU container) avatar = bithuman.AvatarSession( api_url=os.getenv("CUSTOM_GPU_URL", "http://localhost:8089/launch"), api_secret=os.getenv("BITHUMAN_API_SECRET"), avatar_image="https://example.com/face.jpg", # Any face image URL ) session = AgentSession( stt=openai.STT(), llm=openai.LLM(), tts=openai.TTS(), vad=silero.VAD.load(), ) await avatar.start(session, room=ctx.room) await session.start( agent=MyAgent(), room=ctx.room, room_output_options=RoomOutputOptions(audio_enabled=False), ) if __name__ == "__main__": cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) ``` ### Start the GPU Container First ```bash # Pull and run the GPU avatar container docker run --gpus all -p 8089:8089 \ -v /path/to/model-storage:/data/models \ -e BITHUMAN_API_SECRET=your_api_secret \ docker.io/sgubithuman/expression-avatar:latest ``` ### Environment Variables ```bash # Required export BITHUMAN_API_SECRET="your_api_secret" export CUSTOM_GPU_URL="http://localhost:8089/launch" export OPENAI_API_KEY="sk-..." # LiveKit connection export LIVEKIT_URL="wss://your-project.livekit.cloud" export LIVEKIT_API_KEY="APIxxxxxxxx" export LIVEKIT_API_SECRET="xxxxxxxx" ``` For detailed GPU container setup, see [Self-Hosted GPU Container](/deployment/self-hosted-gpu). --- ## Adding Gestures (Dynamics) Make your avatar perform gestures like waving, nodding, or laughing in response to conversation keywords. Dynamics require a cloud-generated agent with gestures enabled. Create one at [www.bithuman.ai](https://www.bithuman.ai). 
### Step 1: Check Available Gestures

```python
import os

import requests

agent_id = "A78WKV4515"
headers = {"api-secret": os.getenv("BITHUMAN_API_SECRET")}

resp = requests.get(
    f"https://api.bithuman.ai/v1/dynamics/{agent_id}",
    headers=headers,
)
gestures = resp.json()["data"].get("gestures", {})
print(list(gestures.keys()))
# Example: ["mini_wave_hello", "talk_head_nod_subtle", "laugh_react"]
```

### Step 2: Trigger Gestures from Keywords

If the avatar runtime is running in your own process (self-hosted), push a `VideoControl` to it directly:

```python
import asyncio

from livekit.agents import AgentSession, UserInputTranscribedEvent
from bithuman.api import VideoControl

KEYWORD_ACTION_MAP = {
    "hello": "mini_wave_hello",
    "hi": "mini_wave_hello",
    "funny": "laugh_react",
    "laugh": "laugh_react",
    "yes": "talk_head_nod_subtle",
}

# Inside your entrypoint, after session.start():
@session.on("user_input_transcribed")
def on_transcribed(event: UserInputTranscribedEvent):
    if not event.is_final:
        return
    text = event.transcript.lower()
    for keyword, action in KEYWORD_ACTION_MAP.items():
        if keyword in text:
            asyncio.create_task(
                avatar.runtime.push(VideoControl(action=action))
            )
            break
```

For a cloud-hosted avatar, the worker runs remotely, so trigger gestures over LiveKit RPC instead:

```python
import asyncio
import json
from datetime import datetime

from livekit import rtc
from livekit.agents import UserInputTranscribedEvent

KEYWORD_ACTION_MAP = {
    "hello": "mini_wave_hello",
    "funny": "laugh_react",
}

async def trigger_gesture(participant: rtc.LocalParticipant, target: str, action: str):
    await participant.perform_rpc(
        destination_identity=target,
        method="trigger_dynamics",
        payload=json.dumps({
            "action": action,
            "identity": participant.identity,
            "timestamp": datetime.utcnow().isoformat(),
        }),
    )

# Inside your entrypoint, after session.start():
@session.on("user_input_transcribed")
def on_transcribed(event: UserInputTranscribedEvent):
    if not event.is_final:
        return
    text = event.transcript.lower()
    for keyword, action in KEYWORD_ACTION_MAP.items():
        if keyword in text:
            for identity in ctx.room.remote_participants.keys():
                asyncio.create_task(
                    trigger_gesture(ctx.room.local_participant, identity, action)
                )
            break
```

---

## Controlling the Avatar via REST API

Once an avatar is
running in a room, you can control it from any backend using the REST API — no LiveKit connection needed. ### Make the Avatar Speak ```bash curl -X POST "https://api.bithuman.ai/v1/agent/A78WKV4515/speak" \ -H "api-secret: $BITHUMAN_API_SECRET" \ -H "Content-Type: application/json" \ -d '{"message": "Hello! Welcome to our demo."}' ``` ### Add Context (Silent Knowledge) ```bash curl -X POST "https://api.bithuman.ai/v1/agent/A78WKV4515/add-context" \ -H "api-secret: $BITHUMAN_API_SECRET" \ -H "Content-Type: application/json" \ -d '{ "context": "The customer just purchased a premium plan.", "type": "add_context" }' ``` The avatar won't say this aloud, but it will use the information in future responses. These REST API calls work from any language or platform — use them to integrate avatars into existing apps without touching the agent code. --- ## Using the SDK Without LiveKit If you don't need real-time rooms (e.g., generating video files or building a custom UI), use the Python SDK directly: ```python import asyncio import cv2 from bithuman import AsyncBithuman from bithuman.audio import load_audio, float32_to_int16 async def main(): # Initialize the runtime runtime = await AsyncBithuman.create( model_path="avatar.imx", api_secret="your_api_secret", ) await runtime.start() # Load an audio file and push it audio, sr = load_audio("speech.wav") audio_int16 = float32_to_int16(audio) await runtime.push_audio(audio_int16.tobytes(), sr) await runtime.flush() # Get animated video frames async for frame in runtime.run(): if frame.has_image: cv2.imshow("Avatar", frame.bgr_image) cv2.waitKey(1) if frame.end_of_speech: break asyncio.run(main()) ``` This gives you raw numpy frames — display them however you want. 
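The same frame loop also works for offline rendering — the "generating video files" case mentioned above. A sketch using OpenCV's `VideoWriter` (the codec and file paths are illustrative assumptions); note it writes video only, so muxing each `frame.audio_chunk` into the file would need an extra tool such as ffmpeg:

```python
import asyncio

async def render_to_file(model_path: str, audio_path: str, out_path: str = "avatar.mp4") -> None:
    # Third-party imports live inside the function so this module still imports
    # cleanly on machines without the bitHuman SDK or OpenCV installed.
    import cv2
    from bithuman import AsyncBithuman
    from bithuman.audio import load_audio, float32_to_int16

    runtime = await AsyncBithuman.create(model_path=model_path, api_secret="your_api_secret")
    await runtime.start()

    # Push the whole audio file, then signal end of input
    audio, sr = load_audio(audio_path)
    await runtime.push_audio(float32_to_int16(audio).tobytes(), sr)
    await runtime.flush()

    # 25 FPS matches the runtime's output rate
    width, height = runtime.get_frame_size()
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), 25, (width, height))
    try:
        async for frame in runtime.run():
            if frame.has_image:
                writer.write(frame.bgr_image)  # BGR numpy array, as VideoWriter expects
            if frame.end_of_speech:
                break
    finally:
        writer.release()

# asyncio.run(render_to_file("avatar.imx", "speech.wav"))
```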
---

## Complete Docker Example

For the fastest path to a working demo, use the Docker example that packages everything together:

```bash
# Clone the examples repo
git clone https://github.com/bithuman-product/examples.git
cd examples/essence-selfhosted

# Configure
cat > .env << 'EOF'
BITHUMAN_API_SECRET=your_api_secret
OPENAI_API_KEY=sk-...
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=APIxxxxxxxx
LIVEKIT_API_SECRET=xxxxxxxx
EOF

# Add your avatar model
mkdir -p models
cp ~/Downloads/avatar.imx models/

# Launch
docker compose up
```

Open [http://localhost:4202](http://localhost:4202) to talk to your avatar.

---

## Troubleshooting

**Avatar doesn't appear (cloud mode):** Check that your `avatar_id` exists — look it up in the [bitHuman dashboard](https://www.bithuman.ai). Verify your API secret is valid with:

```bash
curl -X POST https://api.bithuman.ai/v1/validate \
  -H "api-secret: $BITHUMAN_API_SECRET"
```

**Avatar doesn't appear (self-hosted mode):** Check that the `.imx` file path is correct and the file is not corrupted:

```bash
bithuman validate --model-path /path/to/avatar.imx
```

**No lip movement:** The avatar needs audio input to animate. Ensure:

1. Your TTS is producing audio (test with `openai.TTS()` separately)
2. `avatar.start(session, room=ctx.room)` is called before `session.start()`
3. Agent logs show no audio pipeline errors

**Authentication errors:**

- Verify your API secret is correct (copy-paste from the dashboard)
- Check you have credits remaining in your account
- Ensure the `BITHUMAN_API_SECRET` environment variable is set

**Slow first frame:**

- **Cloud:** the first request downloads the model (~2-4 seconds); subsequent requests use the cache (~1-2 seconds)
- **Self-hosted CPU:** the first load takes ~20 seconds (model initialization); keep the process running for fast subsequent sessions
- **Self-hosted GPU:** cold start takes ~30-40 seconds; use long-running containers with preset avatars for ~4 second startup

**All avatar workers are busy:** The system retries automatically (up to 5 times with backoff).
If it persists: - Check your usage limits - Try again in a few seconds - For self-hosted: increase the number of worker replicas --- ## Billing & Credits Avatar sessions consume credits based on the deployment mode and session duration. | Deployment | Credit Cost | Billed By | Notes | |------------|-------------|-----------|-------| | **Cloud Plugin** | Per session minute | Session duration | Includes GPU rendering | | **Self-Hosted CPU** | Per authentication | Auth call | Rendering is free (your hardware) | | **Self-Hosted GPU** | Per authentication | Auth call | Rendering is free (your hardware) | Check your remaining credits at [www.bithuman.ai](https://www.bithuman.ai) > Developer section. Credits are consumed only for active sessions — idle containers cost nothing. --- ## Next Steps Add gestures and movements Get notified about session events Put avatars on any website ## LiveKit Cloud Plugin: Zero-GPU Avatar Setup URL: https://docs.bithuman.ai/deployment/livekit-cloud-plugin Use existing bitHuman agents in real-time applications with our cloud-hosted LiveKit plugin. The avatar runs on bitHuman's servers — no model files, no GPU needed on your side. **New here?** Read [How It Works](/getting-started/how-it-works) first to understand rooms, sessions, and avatars. ## Quick Start The bitHuman plugin ships inside the livekit/agents repository. Remove any PyPI version first to avoid conflicts, then install from GitHub: ```bash # Remove old PyPI version if present (safe to ignore "not installed" warnings) uv pip uninstall livekit-plugins-bithuman # Install the latest version GIT_LFS_SKIP_SMUDGE=1 uv pip install git+https://github.com/livekit/agents@main#subdirectory=livekit-plugins/livekit-plugins-bithuman ``` Go to [www.bithuman.ai](https://www.bithuman.ai/#developer) and copy your **API Secret**. Click on any agent card in [your dashboard](https://www.bithuman.ai). The **Agent Settings** dialog shows your Agent ID (e.g., `A78WKV4515`). 
```bash export BITHUMAN_API_SECRET="your_api_secret" export BITHUMAN_AGENT_ID="A78WKV4515" export OPENAI_API_KEY="sk-..." # LiveKit (get from cloud.livekit.io) export LIVEKIT_URL="wss://your-project.livekit.cloud" export LIVEKIT_API_KEY="APIxxxxxxxx" export LIVEKIT_API_SECRET="xxxxxxxx" ``` --- ## Complete Working Example Here's a full agent that uses a cloud-hosted avatar: ```python import asyncio import os from livekit.agents import ( Agent, AgentSession, JobContext, RoomOutputOptions, WorkerOptions, cli, llm, ) from livekit.plugins import openai, silero, bithuman class MyAgent(Agent): def __init__(self): super().__init__( instructions="""You are a friendly assistant. Keep responses to 1-2 sentences.""", ) async def entrypoint(ctx: JobContext): # Connect to the LiveKit room await ctx.connect() # Wait for a human to join await ctx.wait_for_participant() # Create a cloud-hosted avatar avatar = bithuman.AvatarSession( avatar_id=os.getenv("BITHUMAN_AGENT_ID"), api_secret=os.getenv("BITHUMAN_API_SECRET"), ) # Wire up the AI pipeline session = AgentSession( stt=openai.STT(), # Listens to the user llm=openai.LLM(), # Generates responses tts=openai.TTS(), # Converts text to speech vad=silero.VAD.load(), # Detects when user is speaking ) # Start — avatar joins room and begins animating await avatar.start(session, room=ctx.room) await session.start( agent=MyAgent(), room=ctx.room, room_output_options=RoomOutputOptions(audio_enabled=False), ) if __name__ == "__main__": cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) ``` Run it: ```bash python agent.py dev ``` Open [agents-playground.livekit.io](https://agents-playground.livekit.io) and talk to your avatar. ### What Happens When You Run This 1. Your agent connects to a LiveKit room and waits for a user 2. When a user joins, `AvatarSession` sends a request to bitHuman's cloud 3. A cloud avatar worker downloads the model (cached after first time) and joins the room 4. 
The user speaks → STT transcribes → LLM responds → TTS generates audio → Avatar animates 5. The avatar publishes video to the room — the user sees a talking face --- ## Avatar Modes ### Essence Model (CPU) — Default Pre-built avatars with full body support, animal mode, and fast response times. ```python avatar = bithuman.AvatarSession( avatar_id="A78WKV4515", api_secret="your_api_secret", ) ``` ### Expression Model (GPU) — Agent ID Higher-fidelity face animation for platform-created agents. ```python avatar = bithuman.AvatarSession( avatar_id="A78WKV4515", api_secret="your_api_secret", model="expression", ) ``` ### Expression Model (GPU) — Custom Image Create an avatar from any face image on-the-fly. ```python from PIL import Image avatar = bithuman.AvatarSession( avatar_image=Image.open("face.jpg"), api_secret="your_api_secret", model="expression", ) ``` ### Model Comparison | Feature | Essence (CPU) | Expression (GPU) | |---------|--------------|------------------| | Personalities | Pre-trained | Dynamic | | Response time | Faster (~2s) | Standard (~4s) | | Body support | Full body + animal mode | Face and shoulders | | Animal mode | Yes | No | | Custom images | No | Yes | --- ## Adding Gestures (Dynamics) Make the avatar wave, nod, or laugh in response to conversation keywords. 
### Step 1: Get Available Gestures

```python
import os

import requests

agent_id = os.getenv("BITHUMAN_AGENT_ID")
headers = {"api-secret": os.getenv("BITHUMAN_API_SECRET")}

response = requests.get(
    f"https://api.bithuman.ai/v1/dynamics/{agent_id}",
    headers=headers,
)
gestures = response.json()["data"].get("gestures", {})
print(list(gestures.keys()))
# Example: ["mini_wave_hello", "talk_head_nod_subtle", "laugh_react"]
```

### Step 2: Trigger on Keywords

```python
import asyncio
import json
from datetime import datetime

from livekit import rtc
from livekit.agents import UserInputTranscribedEvent

KEYWORD_ACTION_MAP = {
    "laugh": "laugh_react",
    "funny": "laugh_react",
    "hello": "mini_wave_hello",
    "hi": "mini_wave_hello",
}

async def send_dynamics_trigger(
    local_participant: rtc.LocalParticipant,
    destination_identity: str,
    action: str,
) -> None:
    await local_participant.perform_rpc(
        destination_identity=destination_identity,
        method="trigger_dynamics",
        payload=json.dumps({
            "action": action,
            "identity": local_participant.identity,
            "timestamp": datetime.utcnow().isoformat(),
        }),
    )

# Add this after session.start() in your entrypoint:
@session.on("user_input_transcribed")
def on_user_input_transcribed(event: UserInputTranscribedEvent):
    if not event.is_final:
        return
    transcript = event.transcript.lower()
    for keyword, action in KEYWORD_ACTION_MAP.items():
        if keyword in transcript:
            for identity in ctx.room.remote_participants.keys():
                asyncio.create_task(
                    send_dynamics_trigger(
                        ctx.room.local_participant, identity, action
                    )
                )
            break
```

Gesture actions vary by agent. Always check the Dynamics API response first to see what's available for your specific agent.
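One way to enforce that advice in code is to build the keyword map from the Dynamics response at startup, silently dropping any action the agent doesn't support. A sketch — the helper name is illustrative:

```python
import os

import requests

def build_keyword_map(agent_id: str, desired: dict) -> dict:
    """Keep only keyword -> action pairs whose action this agent actually has."""
    resp = requests.get(
        f"https://api.bithuman.ai/v1/dynamics/{agent_id}",
        headers={"api-secret": os.environ["BITHUMAN_API_SECRET"]},
        timeout=10,
    )
    resp.raise_for_status()
    available = set(resp.json()["data"].get("gestures", {}))
    return {kw: action for kw, action in desired.items() if action in available}

# KEYWORD_ACTION_MAP = build_keyword_map(
#     os.getenv("BITHUMAN_AGENT_ID"),
#     {"hello": "mini_wave_hello", "funny": "laugh_react"},
# )
```

This way a keyword for a missing gesture never reaches the trigger handler at all.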
---

## Configuration

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `avatar_id` | string | Yes* | Agent ID from the bitHuman dashboard |
| `avatar_image` | PIL.Image | Yes* | Face image for on-the-fly avatar (Expression only) |
| `api_secret` | string | Yes | Your API secret |
| `model` | string | No | `"essence"` (default) or `"expression"` |

*Either `avatar_id` or `avatar_image` is required.

---

## Cloud Advantages

- **No Local Storage** — no large model files to download or manage
- **Auto-Updates** — always uses the latest model versions
- **Scalability** — handles multiple concurrent sessions automatically
- **Cross-Platform** — works on any device with internet access

---

## Pricing

Visit [www.bithuman.ai](https://www.bithuman.ai/#api) for current pricing.

- **Free Tier:** 199 credits per month, community support
- **Pro:** unlimited credits, priority support

---

## Troubleshooting

| Problem | Solution |
|---------|----------|
| Authentication errors | Verify your API secret at [www.bithuman.ai](https://www.bithuman.ai/#developer) |
| Avatar doesn't appear | Check that the `avatar_id` exists in your dashboard |
| Network timeouts | Ensure a stable internet connection; the plugin retries automatically |
| Plugin installation fails | Use `uv` with the `GIT_LFS_SKIP_SMUDGE=1` flag |
| No lip movement | Ensure `avatar.start(session, room=ctx.room)` is called before `session.start()` |

---

## Next Steps

All avatar modes explained with complete examples

Run on your own infrastructure

Configure gestures and animations

## Self-Hosted GPU: Expression Avatar Docker Container

URL: https://docs.bithuman.ai/deployment/self-hosted-gpu

**Preview Feature** — 2 credits per minute while using the GPU container.

## Overview

The self-hosted GPU avatar container (`docker.io/sgubithuman/expression-avatar:latest`) enables production-grade avatar generation on your own GPU infrastructure.
- **Full Control** — Complete control over deployment, scaling, and configuration - **Cost Optimization** — Pay only for the GPU resources you use - **Data Privacy** — Avatar images and audio never leave your infrastructure - **Customization** — Extend the worker with custom logic and integrations ### How It Works The container is a GPU worker that joins a [LiveKit](https://livekit.io) room and streams avatar video frames in real time. Your application calls the `/launch` endpoint with LiveKit room credentials and an avatar image; the container connects to the room, listens for audio, and generates lip-synced video at 25 FPS — entirely on your GPU. ``` Your Agent (LiveKit) │ │ POST /launch │ { livekit_url, livekit_token, room_name, avatar_image } ▼ expression-avatar container │ ├─ Joins LiveKit room as video publisher ├─ Receives audio from agent via data stream └─ Generates 25 FPS lip-synced video → streams to room ↑ 100% local GPU — no cloud calls during inference ``` --- ## Prerequisites - NVIDIA GPU with **≥8 GB VRAM** (RTX 3080 or better) - [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed - Docker 24+ with Compose v2 - bitHuman API secret from the [bitHuman Console](https://www.bithuman.ai) - Model weights download automatically on first start (~5 GB, cached in Docker volume) - A running [LiveKit server](https://docs.livekit.io/home/self-hosting/local/) (or LiveKit Cloud) --- ## Quick Start Model weights download automatically on first run — just provide your API secret: ```bash # 1. Pull the image (includes wav2vec2 audio encoder, ~360 MB) docker pull docker.io/sgubithuman/expression-avatar:latest # 2. Run — proprietary weights (~4.7 GB) download automatically on first start docker run --gpus all -p 8089:8089 \ -v bithuman-models:/data/models \ -e BITHUMAN_API_SECRET=your_api_secret \ docker.io/sgubithuman/expression-avatar:latest ``` ```bash # 3. 
Wait for startup (first run: ~3 min download + ~48s GPU compilation) # Subsequent starts: ~48s (weights already cached in the named volume) curl http://localhost:8089/health # {"status": "healthy", "service": "expression-avatar", "active_sessions": 0, "max_sessions": 8} ``` The `-v bithuman-models:/data/models` named volume caches the downloaded weights so you only pay the download cost once. Once healthy, the container is ready to accept avatar sessions via `/launch`. --- ## Docker Compose Setup Use the [full example](https://github.com/bithuman-product/examples/tree/main/expression-selfhosted) for a complete setup with LiveKit, an AI agent, and a web frontend: ```bash git clone https://github.com/bithuman-product/examples.git cd examples/expression-selfhosted # Configure environment cp .env.example .env # Edit .env with your API secret, OpenAI key, and avatar image # Copy your avatar image into ./avatars/ mkdir -p avatars cp /path/to/your/avatar.jpg avatars/ # Model weights download automatically on first run — nothing to pre-download! docker compose up ``` Open `http://localhost:4202` to start a conversation with your GPU avatar. --- ## Integration Guide The container exposes a simple HTTP API. Your LiveKit agent calls `/launch` to start an avatar session. 
There are two ways to integrate: ### Option 1: LiveKit Python Plugin (Recommended) Install the bitHuman LiveKit plugin: ```bash pip install livekit-plugins-bithuman ``` In your LiveKit agent, point `AvatarSession` at your container's `/launch` endpoint: ```python from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, WorkerType, cli from livekit.plugins import bithuman, openai, silero async def entrypoint(ctx: JobContext): await ctx.connect() await ctx.wait_for_participant() avatar = bithuman.AvatarSession( api_url="http://localhost:8089/launch", # your container api_secret="your_api_secret", # for billing avatar_image="/path/to/avatar.jpg", # local file or HTTPS URL ) session = AgentSession( llm=openai.realtime.RealtimeModel(voice="coral"), vad=silero.VAD.load(), ) await avatar.start(session, room=ctx.room) await session.start( agent=Agent(instructions="You are a helpful assistant."), room=ctx.room, ) if __name__ == "__main__": cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, worker_type=WorkerType.ROOM)) ``` The plugin handles room token generation and calls `/launch` automatically when a participant joins. ### Option 2: Direct HTTP API You can call `/launch` directly from any HTTP client. The container joins the LiveKit room as a video publisher. 
```bash
# Generate a LiveKit room token first (e.g. with the LiveKit CLI or a server SDK)
TOKEN=$(lk token create --join --room my-room --identity avatar-worker \
  --api-key devkey --api-secret your-livekit-secret)

# Launch with an image URL
curl -X POST http://localhost:8089/launch \
  -F "livekit_url=ws://your-livekit-server:7880" \
  -F "livekit_token=$TOKEN" \
  -F "room_name=my-room" \
  -F "avatar_image_url=https://example.com/avatar.jpg"

# Or upload an image file directly
curl -X POST http://localhost:8089/launch \
  -F "livekit_url=ws://your-livekit-server:7880" \
  -F "livekit_token=$TOKEN" \
  -F "room_name=my-room" \
  -F "avatar_image=@./avatar.jpg"
```

Response (async by default):

```json
{
  "status": "pending",
  "task_id": "a1b2c3d4",
  "room_name": "my-room"
}
```

The avatar is live in the room within ~4–6 seconds.

---

## HTTP API Reference

All endpoints are served on port `8089` (default).

### `POST /launch`

Start an avatar session for a LiveKit room. The container joins the room and begins streaming lip-synced video.

**Content-Type:** `multipart/form-data`

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `livekit_url` | string | Yes | LiveKit server WebSocket URL (e.g. `ws://livekit:7880`) |
| `livekit_token` | string | Yes | LiveKit room token with publish permissions |
| `room_name` | string | Yes | LiveKit room name (must match the token) |
| `avatar_image` | file | No* | Avatar image file upload (JPEG/PNG) |
| `avatar_image_url` | string | No* | Avatar image HTTPS URL (alternative to file upload) |
| `prompt` | string | No | Motion prompt (default: `"A person is talking naturally."`) |
| `api_secret` | string | No | Override billing secret (defaults to `BITHUMAN_API_SECRET`) |
| `async_mode` | bool | No | Return immediately (`true`, default) or wait for the session to end |

*Provide either `avatar_image` or `avatar_image_url`. If neither is given, a default image is used.
**Response (async_mode=true):** ```json { "status": "pending", "task_id": "a1b2c3d4", "room_name": "my-room" } ``` **Error responses:** - `503 Service Unavailable` — container still initializing, or at session capacity - `400 Bad Request` — invalid image or download failed --- ### `GET /health` Lightweight health check. Always returns 200 once the container is running (even during model loading). ```json { "status": "healthy", "service": "expression-avatar", "active_sessions": 2, "max_sessions": 8 } ``` --- ### `GET /ready` Readiness check. Returns `200` only when the model is loaded and a session slot is available. Use this to gate traffic in load balancers or health checks. ```json { "status": "ready", "model_ready": true, "active_sessions": 2, "available_sessions": 6, "max_sessions": 8 } ``` Returns `503` with `"status": "not_ready"` during model loading, or `"status": "at_capacity"` when all session slots are in use. --- ### `GET /tasks` List all sessions (active and completed). ```bash curl http://localhost:8089/tasks ``` ```json { "tasks": [ { "task_id": "a1b2c3d4", "room_name": "my-room", "status": "running", "created_at": "2024-01-01T12:00:00", "completed_at": null, "error": null } ] } ``` --- ### `GET /tasks/{task_id}` Check the status of a specific session. ```json { "task_id": "a1b2c3d4", "room_name": "my-room", "status": "running", "created_at": "2024-01-01T12:00:00", "completed_at": null, "error": null } ``` Status values: `pending` → `running` → `completed` / `failed` / `cancelled` --- ### `POST /tasks/{task_id}/stop` Stop a running session and release the session slot. ```bash curl -X POST http://localhost:8089/tasks/a1b2c3d4/stop ``` --- ### `POST /benchmark` Run an inference benchmark and return per-stage timing. Useful for verifying GPU performance. 
```bash curl -X POST "http://localhost:8089/benchmark?iterations=10" ``` ```json { "iterations": 10, "frames_per_generate": 24, "avg_ms": 79.3, "fps": 302.6, "stages": { "dit_ms": 41.2, "vae_decode_ms": 13.1, "vae_encode_ms": 8.5, "color_correct_ms": 6.1, "postprocess_ms": 2.8, "audio_ms": 7.1 }, "vram_gb": 6.2, "gpu": "NVIDIA GPU" } ``` --- ### `GET /test-frame` Generate a few chunks and return the last frame as a JPEG. Useful for verifying the model is producing valid output. ```bash curl http://localhost:8089/test-frame --output frame.jpg open frame.jpg ``` --- ## Environment Variables | Variable | Required | Default | Description | |----------|----------|---------|-------------| | `BITHUMAN_API_SECRET` | Yes | — | API secret for billing and weight download | | `MAX_SESSIONS` | No | `8` | Max concurrent avatar sessions | | `CUDA_VISIBLE_DEVICES` | No | all GPUs | Restrict to specific GPU (e.g. `0`) | | `BITHUMAN_API_URL` | No | `https://api.bithuman.ai` | Override API endpoint (for testing) | | `FAST_DECODER_CONFIG` | No | — | Path to fast decoder config JSON (optional speedup) | | `FAST_DECODER_CHECKPOINT` | No | — | Path to fast decoder weights (optional speedup) | Without `BITHUMAN_API_SECRET`, avatar sessions will run but usage will not be tracked or billed. This is not permitted for production use. 
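Tying the endpoints above together, a small operational client can gate on `/ready`, call `/launch` with form fields, and poll the task. A sketch using `requests` (helper names and timeouts are illustrative; the `(None, value)` entries force a `multipart/form-data` body, matching the endpoint's contract):

```python
import time

import requests

def wait_until_ready(base_url: str, timeout: float = 120.0, interval: float = 2.0) -> dict:
    """Poll GET /ready until the model is loaded and a session slot is free."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            resp = requests.get(f"{base_url}/ready", timeout=5)
            if resp.status_code == 200:
                return resp.json()  # e.g. {"status": "ready", "model_ready": true, ...}
        except requests.ConnectionError:
            pass  # container process may still be starting up
        time.sleep(interval)
    raise TimeoutError(f"{base_url}/ready did not report ready within {timeout}s")

def launch_avatar(base_url: str, livekit_url: str, livekit_token: str,
                  room_name: str, image_url: str) -> dict:
    """POST /launch with an avatar image URL; returns the pending task info."""
    fields = {
        "livekit_url": livekit_url,
        "livekit_token": livekit_token,
        "room_name": room_name,
        "avatar_image_url": image_url,
    }
    resp = requests.post(
        f"{base_url}/launch",
        files={name: (None, value) for name, value in fields.items()},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"status": "pending", "task_id": "...", "room_name": "..."}

def task_status(base_url: str, task_id: str) -> str:
    """GET /tasks/{task_id}; status moves pending -> running -> completed/failed."""
    resp = requests.get(f"{base_url}/tasks/{task_id}", timeout=10)
    resp.raise_for_status()
    return resp.json()["status"]
```

For example, a deploy script might call `wait_until_ready(...)` once after `docker run`, then `launch_avatar(...)` per session.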
--- ## Performance Characteristics | GPU Tier | VRAM Usage | Concurrent Sessions | |----------|------------|---------------------| | High-end (data center) | ~6 GB | up to 8 concurrent | | High-end (consumer) | ~6 GB | up to 4 concurrent | | Mid-range | ~6 GB | up to 2 concurrent | | Configuration | Time to First Frame | Description | |---------------|---------------------|-------------| | Long-running container | ~4–6 seconds | Model loaded at startup; new sessions encode image (~2s) then stream | | Cold start | ~48 seconds | Full GPU model compilation on first start (cached on subsequent starts) | ### Long-Running Containers (Recommended) Keep the container running between sessions. The model loads once at startup (~48s including GPU compilation), and subsequent sessions start in ~4–6 seconds. ```bash docker run --gpus all -p 8089:8089 --restart always \ -v bithuman-models:/data/models \ -e BITHUMAN_API_SECRET=your_api_secret \ docker.io/sgubithuman/expression-avatar:latest ``` --- ## Troubleshooting | Problem | Solution | |---------|----------| | Container won't start | Check GPU: `nvidia-smi`; check logs: `docker logs ` | | First start takes >5 minutes | Normal — weights are downloading (~4.7 GB). Check logs for download progress. 
|
| Download fails with 401 | Verify `BITHUMAN_API_SECRET` is set and valid |
| Download fails with connection error | Check outbound internet access from the container |
| `/health` returns connection refused | Container still initializing — wait for `PREWARM: Pipeline loaded` in logs |
| `/launch` returns `503 not_ready` | Model still loading — poll `/ready` until `model_ready: true` |
| `/launch` returns `503 at_capacity` | All session slots in use; increase `MAX_SESSIONS` or scale horizontally |
| Startup takes >2 minutes (after download) | GPU compilation runs once per container — subsequent starts reuse compiled cache |
| Out of memory | Use a GPU with ≥8 GB VRAM; reduce `MAX_SESSIONS` if needed |
| Billing not working | Verify `BITHUMAN_API_SECRET` is set; check logs for `[HEARTBEAT]` messages |
| Avatar image not showing | Check `/test-frame` — if it returns a valid JPEG, image encoding is working |

---

## Next Steps

- Full GPU setup with LiveKit server, AI agent, and web frontend
- LiveKit Agents Python SDK documentation

---

# Integrations

## Embed Avatars on Any Website (iframe)

URL: https://docs.bithuman.ai/integrations/embed

Embed a bitHuman avatar directly on your website so visitors can have real-time conversations without leaving your page.

---

## How Embedding Works

```
     Your Website                          bitHuman Cloud
┌─────────────────────┐   embed iframe   ┌─────────────────────┐
│                     │◄────────────────►│                     │
│  Your page content  │                  │      AI Agent       │
│                     │                  │   (conversation)    │
└─────────────────────┘                  └─────────────────────┘
```

The avatar runs entirely in bitHuman's cloud. Your website just needs a small embed snippet.
---

## Quick Start

Call the embed token API from your **backend** (never expose your API secret in the browser):

```python
import requests

resp = requests.post(
    "https://api.bithuman.ai/v1/embed-tokens/request",
    headers={"api-secret": "your_api_secret"},
    json={
        "agent_id": "A78WKV4515",
        "fingerprint": "unique-visitor-id",
    },
)
data = resp.json()["data"]
token = data["token"]     # Short-lived JWT
session_id = data["sid"]  # Session identifier
```

Send the token to your frontend via your own API endpoint:

```javascript
// Your backend endpoint (Express example)
app.get("/api/avatar-token", async (req, res) => {
  const response = await fetch(
    "https://api.bithuman.ai/v1/embed-tokens/request",
    {
      method: "POST",
      headers: {
        "api-secret": process.env.BITHUMAN_API_SECRET,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        agent_id: "A78WKV4515",
        fingerprint: req.query.fp || "anonymous",
      }),
    }
  );
  const data = await response.json();
  res.json(data.data);
});
```

Use the token to load the avatar widget (the `WIDGET_URL` below is a placeholder; copy the exact embed snippet from your bitHuman dashboard):

```html
<!-- WIDGET_URL is a placeholder: use the embed URL from your dashboard. -->
<iframe
  id="bithuman-avatar"
  allow="microphone"
  width="400"
  height="600"
  style="border: none;"
></iframe>
<script>
  fetch("/api/avatar-token")
    .then((r) => r.json())
    .then(({ token }) => {
      document.getElementById("bithuman-avatar").src =
        "WIDGET_URL?token=" + encodeURIComponent(token);
    });
</script>
```

---

## Token Details

| Property | Value |
|----------|-------|
| **Lifetime** | 1 hour |
| **Scope** | Single agent, single session |
| **JWT claims** | `userId`, `sessionId`, `agentCode`, `model`, `app` |

**Never put your API secret in frontend code.** Always generate embed tokens from your backend server. The API secret grants full access to your account.

---

## Complete Example (Python + HTML)

### Backend (Flask)

```python
from flask import Flask, jsonify, request
import requests
import os

app = Flask(__name__)

@app.route("/api/avatar-token")
def get_token():
    resp = requests.post(
        "https://api.bithuman.ai/v1/embed-tokens/request",
        headers={"api-secret": os.environ["BITHUMAN_API_SECRET"]},
        json={
            "agent_id": "A78WKV4515",
            "fingerprint": request.args.get("fp", "web-visitor"),
        },
    )
    return jsonify(resp.json()["data"])

if __name__ == "__main__":
    app.run(port=3000)
```

### Frontend (HTML)

```html
<!-- WIDGET_URL is a placeholder: use the embed URL from your dashboard. -->
<!DOCTYPE html>
<html>
  <head>
    <title>My Website with Avatar</title>
  </head>
  <body>
    <h1>Talk to Our AI Assistant</h1>
    <iframe
      id="bithuman-avatar"
      allow="microphone"
      width="400"
      height="600"
      style="border: none;"
    ></iframe>
    <script>
      fetch("http://localhost:3000/api/avatar-token")
        .then((r) => r.json())
        .then(({ token }) => {
          document.getElementById("bithuman-avatar").src =
            "WIDGET_URL?token=" + encodeURIComponent(token);
        });
    </script>
  </body>
</html>
```

---

## Customization

### Responsive Sizing

```html
<!-- A responsive wrapper sketch: the iframe fills its container. -->
<div style="position: relative; width: 100%; max-width: 480px; aspect-ratio: 2 / 3;">
  <iframe
    id="bithuman-avatar"
    allow="microphone"
    style="position: absolute; inset: 0; width: 100%; height: 100%; border: none;"
  ></iframe>
</div>
``` ### Control the Avatar from Your Page Use the REST API to send messages to an active avatar session: ```javascript // Make the avatar say something await fetch("https://api.bithuman.ai/v1/agent/A78WKV4515/speak", { method: "POST", headers: { "api-secret": API_SECRET, // Call from backend! "Content-Type": "application/json", }, body: JSON.stringify({ message: "Welcome! How can I help you today?", }), }); ``` --- ## Troubleshooting | Problem | Solution | |---------|----------| | Blank iframe | Check that the token is valid and not expired (1 hour TTL) | | No audio | Ensure `allow="microphone"` is set on the iframe | | CORS errors | Embed tokens must be generated from your backend, not frontend | | Avatar not responding | Check agent has an active session — verify agent_id is correct | --- ## Next Steps Get notified when users join sessions Control what the avatar says programmatically ## Webhooks: Real-Time Avatar Event Notifications URL: https://docs.bithuman.ai/integrations/webhooks ## Quick Setup Go to [www.bithuman.ai/#developer](https://www.bithuman.ai/#developer) and open the **Webhooks** section. Must be HTTPS. Choose **room.join**, **chat.push**, or both. 
Optionally, add custom headers to be sent with every webhook delivery (for your own endpoint authentication):

```http
Authorization: Bearer your-api-token
X-API-Key: your-secret-key
```

---

## Payload Format

All payloads follow the same structure:

| Field | Type | Description |
|-------|------|-------------|
| `agent_id` | string | The agent that triggered the event |
| `event_type` | string | Event name (`room.join` or `chat.push`) |
| `data` | object | Event-specific data |
| `timestamp` | float | Unix timestamp |

### room.join

```json
{
  "agent_id": "agent_abc123",
  "event_type": "room.join",
  "data": {
    "room_name": "customer-support",
    "participant_count": 1,
    "session_id": "session_xyz789"
  },
  "timestamp": 1705312200.0
}
```

### chat.push

```json
{
  "agent_id": "agent_abc123",
  "event_type": "chat.push",
  "data": {
    "role": "user",
    "message": "Hello, I need help with my order",
    "session_id": "session_xyz789",
    "timestamp": 1705312285.0
  },
  "timestamp": 1705312285.0
}
```

---

## Implementation Examples

```python Flask (Python)
from flask import Flask, request, jsonify
import hmac, hashlib

app = Flask(__name__)
WEBHOOK_SECRET = "your-webhook-secret"

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    # Verify against the raw request body, not the parsed JSON.
    signature = request.headers.get('X-bitHuman-Signature', '')
    if not verify_signature(request.data, signature):
        return jsonify({'error': 'Invalid signature'}), 401

    data = request.json
    event_type = data.get('event_type')

    if event_type == 'room.join':
        print(f"User joined session {data['data']['session_id']}")
    elif event_type == 'chat.push':
        print(f"[{data['data']['role']}] {data['data']['message']}")

    return jsonify({'status': 'ok'})

def verify_signature(payload, signature):
    expected = hmac.new(
        WEBHOOK_SECRET.encode(), payload, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)

if __name__ == '__main__':
    app.run(port=3000)
```

```javascript Express (Node.js)
const express = require('express');
const crypto = require('crypto');

const app = express();
// Keep the raw body: the signature is computed over the raw bytes,
// not over re-serialized JSON.
app.use(express.json({
  verify: (req, res, buf) => { req.rawBody = buf; }
}));

const WEBHOOK_SECRET = 'your-webhook-secret';

app.post('/webhook', (req, res) => {
  const signature = req.headers['x-bithuman-signature'] || '';
  if (!verifySignature(req.rawBody, signature)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }

  const { event_type, data } = req.body;
  if (event_type === 'room.join') {
    console.log(`User joined session ${data.session_id}`);
  } else if (event_type === 'chat.push') {
    console.log(`[${data.role}] ${data.message}`);
  }

  res.json({ status: 'ok' });
});

function verifySignature(rawBody, signature) {
  const expected = Buffer.from(
    `sha256=${crypto.createHmac('sha256', WEBHOOK_SECRET).update(rawBody).digest('hex')}`
  );
  const provided = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === provided.length &&
    crypto.timingSafeEqual(expected, provided);
}

app.listen(3000, () => console.log('Listening on port 3000'));
```

---

## Signature Verification

All webhook requests include an `X-bitHuman-Signature` header. Verify it using HMAC SHA-256:

1. Compute `HMAC-SHA256(secret, raw_request_body)`
2. Compare the hex digest against the signature header (strip `sha256=` prefix)
3. Use constant-time comparison to prevent timing attacks

Always use HTTPS. HTTP endpoints are rejected.

---

## Testing

### Local development with ngrok

```bash
ngrok http 3000
# Use the resulting HTTPS URL as your webhook endpoint
```

### Manual curl test

```bash
curl -X POST https://your-app.com/webhook \
  -H "Content-Type: application/json" \
  -H "X-bitHuman-Signature: sha256=test" \
  -d '{
    "agent_id": "test_agent",
    "event_type": "room.join",
    "data": {
      "room_name": "test-room",
      "participant_count": 1,
      "session_id": "session_123"
    },
    "timestamp": 1705312200.0
  }'
```

---

## Retry Policy

Failed deliveries (non-2xx responses) are retried automatically:

| Attempt | Delay |
|---------|-------|
| 1st retry | 1 second |
| 2nd retry | 5 seconds |
| 3rd retry | 30 seconds |

Maximum 3 retries. Your endpoint must respond within 30 seconds.
## Troubleshooting | Issue | Solution | |-------|----------| | Signature invalid | Verify HMAC SHA-256 against raw request body | | Timeout errors | Return 200 immediately, process async | | 404 Not Found | Check endpoint URL in dashboard | | SSL errors | Use a valid HTTPS certificate | ## Webhook Event Types: room.join & chat.push URL: https://docs.bithuman.ai/integrations/events Webhooks deliver HTTP POST requests to your endpoint when avatar events occur. For setup instructions, handler examples, and retry policies, see the [Webhook Integration Guide](/integrations/webhooks). ## Event Types ### room.join Fired once when a user connects to an avatar session. ```json { "agent_id": "agent_customer_support", "event_type": "room.join", "data": { "room_name": "customer-support-room", "participant_count": 1, "session_id": "session_xyz789" }, "timestamp": 1705312200.0 } ``` ### chat.push Fired for each message sent in the conversation (both user and agent). ```json { "agent_id": "agent_customer_support", "event_type": "chat.push", "data": { "role": "user", "message": "I need help with my order #12345", "session_id": "session_xyz789", "timestamp": 1705312285.0 }, "timestamp": 1705312285.0 } ``` --- For complete handler examples (Flask, Express), signature verification, endpoint setup, testing, and retry policy, see the [Webhook Integration Guide](/integrations/webhooks). --- ## Async Processing Return `200` immediately and process events in the background. Long-running work (database writes, API calls, analytics) should be offloaded to a task queue so your endpoint responds within the timeout window. Any standard job queue (Celery, BullMQ, Sidekiq, etc.) works. 
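The advice above (acknowledge fast, work later) can be sketched with the standard library alone: the webhook handler enqueues the event and returns `200` immediately, while a daemon thread drains the queue. The function names here are ours; in production a real job queue is the better fit:

```python
import queue
import threading

events = queue.Queue()

def process_event(event: dict) -> str:
    # Replace with real work: database writes, analytics, downstream API calls.
    if event.get("event_type") == "chat.push":
        return f"[{event['data']['role']}] {event['data']['message']}"
    return f"joined {event['data']['session_id']}"

def start_worker() -> threading.Thread:
    """Start a background thread that drains the event queue."""
    def run():
        while True:
            event = events.get()
            try:
                process_event(event)
            finally:
                events.task_done()
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t

# In your webhook handler (see the Flask example above), enqueue and ack:
#   events.put(request.json)
#   return jsonify({"status": "ok"}), 200
```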
--- ## Resources - [Webhook Integration Guide](/integrations/webhooks) — endpoint setup, signature verification, testing, and retry policy - [Discord](https://discord.gg/ES953n7bPA) — community support ## Flutter + bitHuman: Mobile Avatar App URL: https://docs.bithuman.ai/integrations/flutter ## Architecture ``` ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ Flutter App │ │ LiveKit Room │ │ Python Agent │ │ Video View │◄──►│ Real-time │◄──►│ bitHuman │ │ Audio Capture │ │ Streaming │ │ Avatar + LLM │ └─────────────────┘ └─────────────────┘ └─────────────────┘ ``` - **Flutter App**: Cross-platform UI, camera/microphone capture, video rendering - **LiveKit Room**: Real-time media routing, participant management - **Python Agent**: AI conversation processing, avatar rendering ## Prerequisites - Flutter SDK 3.0+ - Python 3.11+ - bitHuman API Secret - LiveKit Cloud account - OpenAI API Key --- ## Quick Start ```bash mkdir flutter-bithuman-avatar cd flutter-bithuman-avatar mkdir -p backend frontend/lib ``` ```bash cd backend python3 -m venv .venv source .venv/bin/activate pip install "livekit-agents[openai,bithuman,silero]~=1.4" flask flask-cors python-dotenv ``` Create `.env`: ```bash BITHUMAN_API_SECRET=your_api_secret BITHUMAN_AGENT_ID=A33NZN6384 OPENAI_API_KEY=sk-proj_your_key_here LIVEKIT_API_KEY=APIyour_key LIVEKIT_API_SECRET=your_secret LIVEKIT_URL=wss://your-project.livekit.cloud ``` ```bash cd ../frontend flutter create . --org com.bithuman.avatar ``` Update `pubspec.yaml` dependencies: ```yaml dependencies: flutter: sdk: flutter livekit_components: 1.2.2+hotfix.1 livekit_client: ^2.5.3 provider: ^6.1.1 http: ^1.1.0 ``` ```bash flutter pub get ``` ```bash # Terminal 1: Start Backend cd backend && source .venv/bin/activate python token_server.py & python agent.py dev # Terminal 2: Start Frontend cd frontend flutter run -d chrome --web-port 8080 ``` --- ## Token Server LiveKit requires a JWT to join rooms. Never ship LiveKit API keys in client apps. 
Use a server endpoint to mint short-lived tokens.

```python token_server.py
from flask import Flask, request, jsonify
from livekit import api
from datetime import timedelta
import os
from dotenv import load_dotenv

load_dotenv()
app = Flask(__name__)

LIVEKIT_API_KEY = os.getenv("LIVEKIT_API_KEY")
LIVEKIT_API_SECRET = os.getenv("LIVEKIT_API_SECRET")
LIVEKIT_URL = os.getenv("LIVEKIT_URL")

@app.route('/token', methods=['POST'])
def create_token():
    data = request.get_json() or {}
    room = data.get('room', 'flutter-avatar-room')
    identity = data.get('participant', 'Flutter User')

    token = (
        api.AccessToken(LIVEKIT_API_KEY, LIVEKIT_API_SECRET)
        .with_identity(identity)
        .with_grants(api.VideoGrants(room_join=True, room=room))
        .with_ttl(timedelta(hours=1))
        .to_jwt()
    )
    return jsonify({'token': token, 'server_url': LIVEKIT_URL})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=3000)
```

---

## Python Agent

```python agent.py
import os
from dotenv import load_dotenv
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomOutputOptions,
    WorkerOptions,
    WorkerType,
    cli,
)
from livekit.plugins import bithuman, openai, silero

load_dotenv()

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    await ctx.wait_for_participant()

    avatar = bithuman.AvatarSession(
        avatar_id=os.getenv("BITHUMAN_AGENT_ID"),
        api_secret=os.getenv("BITHUMAN_API_SECRET"),
    )

    session = AgentSession(
        llm=openai.realtime.RealtimeModel(
            voice="coral",
            model="gpt-4o-mini-realtime-preview",
        ),
        vad=silero.VAD.load(),
    )

    await avatar.start(session, room=ctx.room)
    await session.start(
        agent=Agent(
            instructions="You are a helpful assistant. Respond concisely."
), room=ctx.room, room_output_options=RoomOutputOptions(audio_enabled=False), ) if __name__ == "__main__": cli.run_app(WorkerOptions( entrypoint_fnc=entrypoint, worker_type=WorkerType.ROOM, job_memory_warn_mb=2000, num_idle_processes=1, initialize_process_timeout=180, )) ``` --- ## Flutter App ### LiveKit Configuration ```dart config/livekit_config.dart import 'dart:convert'; import 'dart:math'; import 'package:http/http.dart' as http; class LiveKitConfig { static const String serverUrl = 'wss://your-project.livekit.cloud'; static const String? tokenEndpoint = 'http://localhost:3000/token'; static String get roomName { const chars = 'abcdefghijklmnopqrstuvwxyz0123456789'; final random = Random(); return 'room-${String.fromCharCodes( Iterable.generate(12, (_) => chars.codeUnitAt(random.nextInt(chars.length))) )}'; } static String get participantName { const chars = 'abcdefghijklmnopqrstuvwxyz0123456789'; final random = Random(); return 'user-${String.fromCharCodes( Iterable.generate(8, (_) => chars.codeUnitAt(random.nextInt(chars.length))) )}'; } static Future getToken() async { final response = await http.post( Uri.parse(tokenEndpoint!), headers: {'Content-Type': 'application/json'}, body: jsonEncode({ 'room': roomName, 'participant': participantName, }), ); if (response.statusCode == 200) { return jsonDecode(response.body)['token'] as String; } throw Exception('Token server returned ${response.statusCode}'); } } ``` ### Main App ```dart main.dart import 'package:flutter/material.dart'; import 'package:livekit_client/livekit_client.dart' as lk; import 'package:livekit_components/livekit_components.dart'; import 'config/livekit_config.dart'; void main() => runApp(const BitHumanFlutterApp()); class BitHumanFlutterApp extends StatelessWidget { const BitHumanFlutterApp({super.key}); @override Widget build(BuildContext context) { return MaterialApp( title: 'bitHuman Flutter Integration', theme: LiveKitTheme().buildThemeData(context), themeMode: ThemeMode.dark, home: 
        const ConnectionScreen(),
    );
  }
}

class ConnectionScreen extends StatefulWidget {
  const ConnectionScreen({super.key});

  @override
  State<ConnectionScreen> createState() => _ConnectionScreenState();
}

class _ConnectionScreenState extends State<ConnectionScreen> {
  bool _isConnecting = false;

  @override
  void initState() {
    super.initState();
    WidgetsBinding.instance.addPostFrameCallback((_) => _connect());
  }

  Future<void> _connect() async {
    setState(() => _isConnecting = true);
    final token = await LiveKitConfig.getToken();
    if (!mounted) return;
    Navigator.of(context).pushReplacement(
      MaterialPageRoute(
        builder: (_) => VideoRoomScreen(
          url: LiveKitConfig.serverUrl,
          token: token,
          roomName: LiveKitConfig.roomName,
        ),
      ),
    );
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      backgroundColor: const Color(0xFF1a1a1a),
      body: Center(
        child: Column(
          mainAxisAlignment: MainAxisAlignment.center,
          children: [
            const CircularProgressIndicator(),
            const SizedBox(height: 20),
            Text(
              _isConnecting ? 'Connecting...' : 'Failed',
              style: const TextStyle(color: Colors.white70, fontSize: 18),
            ),
          ],
        ),
      ),
    );
  }
}

class VideoRoomScreen extends StatelessWidget {
  final String url, token, roomName;

  const VideoRoomScreen({
    super.key,
    required this.url,
    required this.token,
    required this.roomName,
  });

  @override
  Widget build(BuildContext context) {
    return LivekitRoom(
      roomContext: RoomContext(
        url: url,
        token: token,
        connect: true,
        roomOptions: lk.RoomOptions(adaptiveStream: true, dynacast: true),
      ),
      builder: (context, roomCtx) {
        return Scaffold(
          appBar: AppBar(title: Text('Room: $roomName')),
          backgroundColor: const Color(0xFF1a1a1a),
          body: const Center(child: Text('AI Avatar Video Here')),
        );
      },
    );
  }
}
```

---

## Platform-Specific Setup

### iOS (`ios/Runner/Info.plist`)

```xml
<key>NSCameraUsageDescription</key>
<string>Camera access for video calls with AI avatar</string>
<key>NSMicrophoneUsageDescription</key>
<string>Microphone access for voice interaction with AI avatar</string>
```

### Android (`AndroidManifest.xml`)

```xml
<!-- Typical permission set for a LiveKit camera/voice app; adjust as needed. -->
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
```

---

## Deployment

```bash iOS
flutter build ios --release
``` ```bash Android flutter build apk --release ``` ```bash Web flutter build web --release ``` --- ## Troubleshooting | Problem | Solution | |---------|----------| | Avatar session failed | Check bitHuman API secret and avatar ID | | Connection failed | Verify LiveKit server URL, ensure backend is running | | No camera found | Check device permissions | | Avatar not showing | Check backend logs, verify API key | | Shader compilation errors | Run `flutter clean && flutter pub get` | --- ## Resources - [Flutter Documentation](https://docs.flutter.dev) - [LiveKit Flutter SDK](https://pub.dev/packages/livekit_client) - [LiveKit Agents Documentation](https://docs.livekit.io/agents) --- # Changelog ## Changelog URL: https://docs.bithuman.ai/changelog ## February 2026 ### Expression Avatar v2 — Turbo VAE Decoder - 2.5x faster VAE decode (32ms → 13ms) with distilled Turbo-VAED decoder - Total pipeline: 103ms → 79ms per chunk (24% faster) - Throughput: 233 → 305 FPS on H100 - Per-session TRT contexts eliminate concurrent session artifacts ### Self-Hosted GPU Container - Published `sgubithuman/expression-avatar:latest` Docker image - Supports up to 8 concurrent sessions per GPU - Cold start ~50s, warm start 4-6s - ~5 GB auto-downloaded model weights (cached in Docker volume) ### Developer Examples Overhaul - Fixed Docker Compose env_file handling across all 4 example stacks - Standardized `.env.example` files with section headers and inline help - Expanded READMEs with architecture diagrams, config tables, verification steps - Added `api/test.py` for zero-friction API credential validation - Added `AGENTS.md` for AI coding agent discoverability - Added `llms.txt` and `llms-full.txt` for AI documentation indexing - Published OpenAPI specification ### REST API - `POST /v1/agent/{code}/speak` — make avatar speak text in active sessions - `POST /v1/agent/{code}/add-context` — inject silent background knowledge - Improved error responses with consistent error codes and messages 
### SDK & Plugin - `livekit-plugins-bithuman` — Expression model support with `model="expression"` - `bithuman.AvatarSession` — unified interface for cloud, CPU, and GPU modes - Animal mode support for Essence avatars --- ## January 2026 ### Essence Avatar - CPU-only avatar rendering from `.imx` model files - 25 FPS real-time on 4+ core machines - Cross-platform: Linux, macOS (M2+), Windows (WSL) ### Platform API - Agent generation from text prompts + image/video/audio - Agent management (CRUD operations) - File upload (URL and base64) - Dynamics/gesture generation and triggering ### Integrations - LiveKit Cloud Plugin - Website embed (iframe with JWT) - Webhooks (room.join, chat.push events) - Flutter full-stack example --- For feature requests and bug reports, visit our [GitHub](https://github.com/bithuman-product/examples/issues) or [Discord](https://discord.gg/ES953n7bPA). --- # API Reference ## bitHuman REST API Reference URL: https://docs.bithuman.ai/api-reference/overview The bitHuman API lets you programmatically create, manage, and interact with avatar agents. ## Base URL ``` https://api.bithuman.ai ``` ## Authentication All requests require the `api-secret` header. Get your API secret from [www.bithuman.ai](https://www.bithuman.ai/#developer). ```http api-secret: YOUR_API_SECRET ``` ## Agent Identifiers All endpoints use the **agent code** (e.g. `A91XMB7113`) to identify agents. This is the same value across all endpoints, referred to as `{code}`, `{agent_code}`, or `{agent_id}` depending on the endpoint. You receive this code when you [generate an agent](/api-reference/agent-generation) or find it in the [bitHuman dashboard](https://www.bithuman.ai). 
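Every endpoint below shares the `api-secret` header and, on failure, the common error envelope. A small response-checking helper can make error handling uniform across calls; this is a sketch of ours, not part of any official SDK:

```python
def check_response(status_code: int, body: dict) -> dict:
    """Return the body on success; raise a readable error built from the envelope."""
    if status_code == 200:
        return body
    err = body.get("error", {})
    code = err.get("code", status_code)
    message = err.get("message", "unknown error")
    raise RuntimeError(f"bitHuman API error {code}: {message}")

# Typical use with requests:
#   resp = requests.get(f"https://api.bithuman.ai/v1/agent/{code}",
#                       headers={"api-secret": API_SECRET})
#   data = check_response(resp.status_code, resp.json())
```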
## Available APIs Create new avatar agents from prompts, images, or video Validate credentials, retrieve agent details, update prompts Send real-time messages and inject context into live sessions Upload images, audio, video, and documents Generate and manage avatar movements and gestures ## Common Error Format All errors follow the same structure: ```json { "error": { "code": "ERROR_CODE", "message": "Human-readable error description", "httpStatus": 401 }, "status": "error", "status_code": 401 } ``` See the [Error Reference](/api-reference/errors) for all error codes. | HTTP Status | Meaning | |-------------|---------| | `200` | Success | | `400` | Invalid request parameters | | `401` | Invalid or missing `api-secret` | | `402` | Insufficient credits | | `404` | Resource not found | | `413` | Payload too large | | `415` | Unsupported media type | | `422` | Validation error | | `429` | Rate limit exceeded | | `500` | Internal server error | | `503` | Service unavailable (workers busy) | ## Agent Generation API: Create Avatar Agents URL: https://docs.bithuman.ai/api-reference/agent-generation ## Generate Agent ``` POST /v1/agent/generate ``` Creates a new avatar agent. Generation is asynchronous — poll the status endpoint for completion. 
**Headers** | Header | Value | |--------|-------| | `Content-Type` | `application/json` | | `api-secret` | Your API secret | **Request Body** | Parameter | Type | Required | Default | Description | |-----------|------|----------|---------|-------------| | `prompt` | string | No | Random | System prompt for the agent | | `image` | string | No | — | Image URL or base64 data for the agent's appearance | | `video` | string | No | — | Video URL or base64 data | | `audio` | string | No | — | Audio URL or base64 data for the agent's voice | | `aspect_ratio` | string | No | `16:9` | Aspect ratio for image generation (`16:9`, `9:16`, `1:1`) | | `video_aspect_ratio` | string | No | `16:9` | Aspect ratio for video generation (`16:9`, `9:16`, `1:1`) | | `agent_id` | string | No | Auto-generated | Custom agent identifier | | `duration` | number | No | `10` | Video duration in seconds | **Response** ```json { "success": true, "message": "Agent generation started", "agent_id": "A91XMB7113", "status": "processing" } ``` **Example** ```python import requests response = requests.post( "https://api.bithuman.ai/v1/agent/generate", headers={ "Content-Type": "application/json", "api-secret": "YOUR_API_SECRET" }, json={ "prompt": "You are a professional video content creator.", "image": "https://example.com/avatar.jpg" } ) print(response.json()) ``` --- ## Get Agent Status ``` GET /v1/agent/status/{agent_id} ``` Returns the current status of an agent generation request. **Status Values** | Status | Description | |--------|-------------| | `processing` | Agent is being generated (initial state) | | `generating` | Active generation in progress (sub-steps running) | | `completed` | All generation steps finished (transitional, becomes `ready`) | | `ready` | Generation completed successfully — model available for use | | `failed` | Generation failed — check `error_message` for details | For polling, check for `ready` or `failed` as terminal states. 
The `generating` and `completed` states are intermediate — keep polling. **Response** ```json { "success": true, "data": { "agent_id": "A91XMB7113", "event_type": "lip_created", "status": "ready", "error_message": null, "created_at": "2025-08-01T13:58:51.907177+00:00", "updated_at": "2025-08-01T09:59:15.159901+00:00", "system_prompt": "You are a professional video content creator.", "image_url": "https://...", "video_url": "https://...", "name": "agent name", "model_url": "https://..." } } ``` **Example** ```python import requests response = requests.get( "https://api.bithuman.ai/v1/agent/status/A91XMB7113", headers={"api-secret": "YOUR_API_SECRET"} ) print(response.json()) ``` --- ## Complete Example: Generate and Poll ```python import requests import time API_SECRET = "YOUR_API_SECRET" BASE = "https://api.bithuman.ai" headers = {"Content-Type": "application/json", "api-secret": API_SECRET} # Generate agent resp = requests.post(f"{BASE}/v1/agent/generate", headers=headers, json={ "prompt": "You are a friendly AI assistant." }) agent_id = resp.json()["agent_id"] print(f"Agent created: {agent_id}") # Poll until ready while True: status = requests.get( f"{BASE}/v1/agent/status/{agent_id}", headers={"api-secret": API_SECRET} ).json() if status["data"]["status"] == "ready": print(f"Agent ready: {status['data']['model_url']}") break elif status["data"]["status"] == "failed": print(f"Generation failed: {status['data']['error_message']}") break time.sleep(5) ``` ## Error Codes | HTTP Status | Meaning | |-------------|---------| | `200` | Success | | `400` | Invalid request parameters | | `401` | Invalid or missing `api-secret` | | `429` | Rate limit exceeded | | `500` | Internal server error | ## Agent Management API: Validate, Get & Update Agents URL: https://docs.bithuman.ai/api-reference/agent-management ## Validate API Secret ``` POST /v1/validate ``` Verify that your API secret is valid before making other API calls. 
```python Python import requests response = requests.post( "https://api.bithuman.ai/v1/validate", headers={"api-secret": "YOUR_API_SECRET"} ) result = response.json() if result["valid"]: print("API secret is valid.") else: print("Invalid API secret.") ``` ```javascript JavaScript const response = await fetch('https://api.bithuman.ai/v1/validate', { method: 'POST', headers: { 'api-secret': 'YOUR_API_SECRET' } }); const result = await response.json(); console.log('Valid:', result.valid); ``` **Response** ```json { "valid": true } ``` --- ## Get Agent Info ``` GET /v1/agent/{code} ``` Retrieve detailed information about an agent by its code identifier. **Path Parameters** | Parameter | Type | Description | |-----------|------|-------------| | `code` | string | The agent code identifier (e.g., `A12345678`) | **Response** ```json { "success": true, "data": { "agent_id": "A91XMB7113", "event_type": "lip_created", "status": "ready", "error_message": null, "created_at": "2025-08-01T13:58:51.907177+00:00", "updated_at": "2025-08-01T09:59:15.159901+00:00", "system_prompt": "You are a friendly AI assistant", "image_url": "https://storage.supabase.co/image.jpg", "video_url": "https://storage.supabase.co/video.mp4", "name": "My Agent", "model_url": "https://storage.supabase.co/model.imx" } } ``` ```python Python import requests code = "A91XMB7113" response = requests.get( f"https://api.bithuman.ai/v1/agent/{code}", headers={"api-secret": "YOUR_API_SECRET"} ) data = response.json() if data["success"]: agent = data["data"] print(f"Agent: {agent['name']}") print(f"Status: {agent['status']}") ``` ```javascript JavaScript const code = 'A91XMB7113'; const response = await fetch(`https://api.bithuman.ai/v1/agent/${code}`, { headers: { 'api-secret': 'YOUR_API_SECRET' } }); const data = await response.json(); if (data.success) { console.log('Agent:', data.data.name); } ``` This endpoint uses the agent **code** (e.g., `A91XMB7113`), which is the same as the agent ID used across the 
platform. For checking generation progress, you can also use [`GET /v1/agent/status/{agent_id}`](/api-reference/agent-generation). --- ## Update Agent Prompt ``` POST /v1/agent/{code} ``` Update the system prompt of an existing agent without regenerating it. **Request Body** ```json { "system_prompt": "You are a helpful customer service agent who speaks Spanish" } ``` **Response** ```json { "agent_code": "A91XMB7113", "updated": true } ``` ```python Python import requests code = "A91XMB7113" response = requests.post( f"https://api.bithuman.ai/v1/agent/{code}", headers={ "Content-Type": "application/json", "api-secret": "YOUR_API_SECRET" }, json={ "system_prompt": "You are a professional sales assistant." } ) print(response.json()) ``` ```javascript JavaScript const code = 'A91XMB7113'; const response = await fetch(`https://api.bithuman.ai/v1/agent/${code}`, { method: 'POST', headers: { 'Content-Type': 'application/json', 'api-secret': 'YOUR_API_SECRET' }, body: JSON.stringify({ system_prompt: 'You are a professional sales assistant.' 
}) }); const result = await response.json(); console.log('Update result:', result); ``` --- ## Complete Example ```python import requests import time headers = { "Content-Type": "application/json", "api-secret": "YOUR_API_SECRET" } # Step 1: Create agent response = requests.post( "https://api.bithuman.ai/v1/agent/generate", headers=headers, json={"prompt": "You are a friendly greeter."} ) agent_id = response.json()["agent_id"] # Step 2: Wait for agent to be ready while True: status = requests.get( f"https://api.bithuman.ai/v1/agent/status/{agent_id}", headers={"api-secret": "YOUR_API_SECRET"} ).json() if status["data"]["status"] == "ready": break time.sleep(5) # Step 3: Get agent info info = requests.get( f"https://api.bithuman.ai/v1/agent/{agent_id}", headers={"api-secret": "YOUR_API_SECRET"} ).json() print(f"Current prompt: {info['data']['system_prompt']}") # Step 4: Update the prompt update = requests.post( f"https://api.bithuman.ai/v1/agent/{agent_id}", headers=headers, json={"system_prompt": "You are now a technical support specialist."} ).json() print(f"Prompt updated: {update}") ``` ## Error Codes | Code | Description | |------|-------------| | `UNAUTHORIZED` | Invalid or missing API secret | | `MISSING_PARAM` | Required parameter not provided | | `AGENT_NOT_FOUND` | No agent found with the given code | | `VALIDATION_ERROR` | Invalid request body format | ## Agent Context API: Speak & Inject Knowledge URL: https://docs.bithuman.ai/api-reference/agent-context Send real-time messages to agents deployed on the [www.bithuman.ai](https://www.bithuman.ai) platform. Make agents speak proactively or inject background knowledge to improve their responses. This API is for agents created on the bitHuman platform, not for local SDK agents. ## Make Agent Speak ``` POST /v1/agent/{agent_code}/speak ``` Triggers the agent to speak a message to users in the session. 
**Request Body** | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `message` | string | Yes | Text the agent will speak | | `room_id` | string | No | Target a specific room. If omitted, delivers to all active rooms. | **Response** ```json { "agent_code": "A12345678", "context_type": "speak", "delivered_to_rooms": 1 } ``` **Example** ```python import requests response = requests.post( "https://api.bithuman.ai/v1/agent/A12345678/speak", headers={ "Content-Type": "application/json", "api-secret": "YOUR_API_SECRET" }, json={ "message": "We have a 20% discount available today.", "room_id": "customer_session_1" } ) print(response.json()) ``` --- ## Add Context ``` POST /v1/agent/{agent_code}/add-context ``` Adds background knowledge the agent will use to inform future responses. Can also trigger speech by setting `type` to `speak`. **Request Body** | Parameter | Type | Required | Default | Description | |-----------|------|----------|---------|-------------| | `context` | string | Yes | — | Context text or message to speak | | `type` | string | No | `add_context` | `add_context` to inject knowledge silently, `speak` to trigger speech | | `room_id` | string | No | — | Target a specific room. If omitted, delivers to all active rooms. | **Response** ```json { "agent_code": "A12345678", "context_type": "add_context", "delivered_to_rooms": 1 } ``` ### Adding background context ```python import requests response = requests.post( "https://api.bithuman.ai/v1/agent/A12345678/add-context", headers={ "Content-Type": "application/json", "api-secret": "YOUR_API_SECRET" }, json={ "context": "Customer has VIP status. Preferred name: Alex. 
Account since 2021.", "type": "add_context", "room_id": "vip_session_42" } ) ``` ### Triggering speech via add-context ```python response = requests.post( "https://api.bithuman.ai/v1/agent/A12345678/add-context", headers={ "Content-Type": "application/json", "api-secret": "YOUR_API_SECRET" }, json={ "context": "Your issue has been resolved. Let me know if you need anything else.", "type": "speak", "room_id": "support_session_1" } ) ``` ## Error Codes | HTTP Status | Error Code | Description | |-------------|------------|-------------| | `401` | `UNAUTHORIZED` | Invalid or missing `api-secret` | | `404` | `AGENT_NOT_FOUND` | No agent with the given code exists | | `404` | `NO_ACTIVE_ROOMS` | Agent has no active sessions | | `422` | `VALIDATION_ERROR` | Invalid request body (e.g., bad `type` value) | ## File Upload API: Images, Video & Audio URL: https://docs.bithuman.ai/api-reference/file-upload Upload files to the system for processing. Supports both URL downloads and direct file uploads. ## Upload File ``` POST /v1/files/upload ``` Files are automatically organized by type: | Category | Storage Path | Extensions | |----------|-------------|------------| | **Images** | `assets/image/` | `.jpg`, `.jpeg`, `.png`, `.gif`, `.webp`, `.bmp`, `.svg` | | **Videos** | `assets/video/` | `.mp4`, `.avi`, `.mov`, `.wmv`, `.flv`, `.webm`, `.mkv` | | **Audio** | `assets/audio/` | `.mp3`, `.wav`, `.flac`, `.aac`, `.ogg`, `.m4a` | | **Documents** | `assets/docs/` | `.pdf`, `.doc`, `.docx`, `.txt`, `.ppt`, `.pptx`, `.xls`, `.xlsx`, `.csv` | --- ### Method 1: URL Upload Download a file from a URL. 
```json { "file_url": "https://example.com/document.pdf", "file_type": "auto" } ``` | Parameter | Type | Description | |-----------|------|-------------| | `file_url` | string | URL of the file to download | | `file_type` | string | Type of file (`pdf`, `image`, `audio`, `video`, `auto`) | ```python import requests response = requests.post( "https://api.bithuman.ai/v1/files/upload", headers={ "Content-Type": "application/json", "api-secret": "YOUR_API_SECRET" }, json={ "file_url": "https://example.com/presentation.pdf", "file_type": "auto" } ) print(response.json()) ``` ### Method 2: Direct Upload Upload base64-encoded file data directly. ```json { "file_data": "JVBERi0xLjQKJcOkw7zDtsO...", "file_name": "document.pdf", "file_type": "auto" } ``` | Parameter | Type | Description | |-----------|------|-------------| | `file_data` | string | Base64 encoded file data | | `file_name` | string | Original filename | | `file_type` | string | Type of file (`pdf`, `image`, `audio`, `video`, `auto`) | ```python import requests import base64 with open("document.pdf", "rb") as f: file_data = base64.b64encode(f.read()).decode('utf-8') response = requests.post( "https://api.bithuman.ai/v1/files/upload", headers={ "Content-Type": "application/json", "api-secret": "YOUR_API_SECRET" }, json={ "file_data": file_data, "file_name": "document.pdf", "file_type": "auto" } ) print(response.json()) ``` --- ### Response (both methods) ```json { "success": true, "message": "File uploaded successfully", "data": { "file_url": "https://storage.supabase.co/assets/docs/20250115_103000_abc12345.pdf", "original_source": "https://example.com/document.pdf", "file_type": "auto", "file_size": 1024000, "mime_type": "application/pdf", "asset_category": "docs", "uploaded_at": "2025-01-15T10:30:00Z" } } ``` --- ## Size Limits | Category | Max Size | |----------|----------| | **Images** | 10 MB | | **Videos** | 100 MB | | **Audio** | 50 MB | | **Documents** | 10 MB | Exceeding these limits returns HTTP `413`. 
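The size limits above can be enforced client-side before uploading, avoiding a guaranteed `413` round-trip. A minimal sketch under the documented limits and extensions (the `category_for` and `upload_checked` helpers are illustrative, not part of any SDK):

```python
import base64
from pathlib import Path

import requests

# Per-category size limits from the table above (bytes)
LIMITS = {
    "image": 10 * 1024 * 1024,
    "video": 100 * 1024 * 1024,
    "audio": 50 * 1024 * 1024,
    "docs": 10 * 1024 * 1024,
}

# Extension-to-category mapping from the storage table above
EXTENSIONS = {
    "image": {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".svg"},
    "video": {".mp4", ".avi", ".mov", ".wmv", ".flv", ".webm", ".mkv"},
    "audio": {".mp3", ".wav", ".flac", ".aac", ".ogg", ".m4a"},
    "docs": {".pdf", ".doc", ".docx", ".txt", ".ppt", ".pptx", ".xls", ".xlsx", ".csv"},
}


def category_for(filename: str) -> str:
    """Map a filename to its asset category, mirroring the server's rules."""
    ext = Path(filename).suffix.lower()
    for category, exts in EXTENSIONS.items():
        if ext in exts:
            return category
    raise ValueError(f"Unsupported file type: {ext}")


def upload_checked(path: str, api_secret: str) -> dict:
    """Direct upload with a client-side size check to avoid an HTTP 413."""
    category = category_for(path)
    size = Path(path).stat().st_size
    if size > LIMITS[category]:
        raise ValueError(
            f"{path} is {size} bytes; {category} limit is {LIMITS[category]}"
        )
    file_data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    response = requests.post(
        "https://api.bithuman.ai/v1/files/upload",
        headers={"Content-Type": "application/json", "api-secret": api_secret},
        json={"file_data": file_data, "file_name": Path(path).name, "file_type": "auto"},
    )
    response.raise_for_status()
    return response.json()
```

For files near the video limit, the URL upload method also avoids the roughly 33% size inflation that base64 encoding adds to a direct upload body.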
## Upload Methods Comparison | Method | Best For | Pros | Cons | |--------|----------|------|------| | **URL Upload** | External files, cloud storage | Small request body; server fetches the file | Requires a publicly accessible URL | | **Direct Upload** | Local files, form uploads | Works for local and private files | Limited by request size | The per-category size limits above apply to both methods. ## Complete Examples ### Batch Upload ```python import requests import base64 from pathlib import Path def batch_upload_files(directory_path): results = [] for file_path in Path(directory_path).iterdir(): if file_path.is_file(): with open(file_path, "rb") as f: file_data = base64.b64encode(f.read()).decode('utf-8') response = requests.post( "https://api.bithuman.ai/v1/files/upload", headers={ "Content-Type": "application/json", "api-secret": "YOUR_API_SECRET" }, json={ "file_data": file_data, "file_name": file_path.name, "file_type": "auto" } ) results.append({ "filename": file_path.name, "status": "success" if response.status_code == 200 else "error" }) return results results = batch_upload_files("./documents") for r in results: print(f"{r['filename']}: {r['status']}") ``` ## Error Codes | HTTP Status | Meaning | |-------------|---------| | `200` | Success | | `400` | Bad request (invalid parameters) | | `401` | Unauthorized (invalid API secret) | | `413` | File too large | | `415` | Unsupported file type | | `500` | Internal server error | ## Dynamics API: Gestures & Animations URL: https://docs.bithuman.ai/api-reference/dynamics ## Generate Dynamics ``` POST /v1/dynamics/generate ``` Generate dynamic movements and animations for an agent. Returns immediately with a "processing" status — use the GET endpoint to check completion.
**Request Body** | Parameter | Type | Required | Default | Description | |-----------|------|----------|---------|-------------| | `agent_id` | string | Yes | — | Agent ID to generate dynamics for | | `image_url` | string | No | (from agent) | Agent image URL (fetched from agent data if not provided) | | `duration` | number | No | `5` | Duration of each motion in seconds | | `model` | string | No | `seedance` | Model to use (`seedance`, `kling`) | **Response** ```json { "success": true, "message": "Dynamics generation started", "agent_id": "A91XMB7113", "status": "processing" } ``` ```python import requests response = requests.post( "https://api.bithuman.ai/v1/dynamics/generate", headers={ "Content-Type": "application/json", "api-secret": "YOUR_API_SECRET" }, json={ "agent_id": "A91XMB7113", "duration": 5, "model": "seedance" } ) print(response.json()) ``` --- ## Get Dynamics ``` GET /v1/dynamics/{agent_id} ``` Retrieve the current dynamics configuration and available gestures for an agent. **Response (dynamics generated)** ```json { "success": true, "data": { "url": "https://storage.supabase.co/dynamics-model.imx", "status": "ready", "agent_id": "A91XMB7113", "gestures": { "mini_wave_hello": "https://storage.supabase.co/mini_wave_hello.mp4", "talk_head_nod_subtle": "https://storage.supabase.co/talk_head_nod_subtle.mp4", "blow_kiss_heart": "https://storage.supabase.co/blow_kiss_heart.mp4" } } } ``` **Response (not yet generated)** ```json { "success": true, "data": { "url": null, "status": "ready", "agent_id": "A91XMB7113", "gestures": {} } } ``` **Response Fields** | Field | Type | Description | |-------|------|-------------| | `url` | string \| null | URL to the dynamics model file, or null if not generated | | `status` | string | `generating` while in progress, `ready` when complete | | `agent_id` | string | The agent ID | | `gestures` | object | Map of gesture action names to video URLs (e.g. 
`mini_wave_hello`, `talk_head_nod_subtle`) | Gesture names like `mini_wave_hello` and `talk_head_nod_subtle` are the action identifiers you pass to `VideoControl(action=...)` or the RPC `trigger_dynamics` method. See [Avatar Sessions](/deployment/avatar-sessions#adding-gestures-dynamics) for integration examples. ```python agent_id = "A91XMB7113" response = requests.get( f"https://api.bithuman.ai/v1/dynamics/{agent_id}", headers={"api-secret": "YOUR_API_SECRET"} ) print(response.json()) ``` --- ## Update Dynamics ``` PUT /v1/dynamics/{agent_id} ``` Update dynamics configuration for an agent. After a successful update, movements regeneration is automatically triggered in the background. **Request Body** | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `dynamics` | object | Yes | Dynamics configuration to merge with existing data | | `dynamics.enabled` | boolean | No | Enable or disable dynamics for this agent | | `dynamics.batch_results` | object | No | Map of gesture names to video generation results | | `dynamics.result` | object | No | Result model path and hash (set when dynamics generation completes) | | `dynamics.talking` | object | No | Default talking model path and hash (used when dynamics are disabled) | | `toggle_enabled` | boolean | No | `true` to switch to dynamics model, `false` to restore default talking model | **Example: Enable dynamics after generation** ```json { "dynamics": { "enabled": true }, "toggle_enabled": true } ``` **Response (with regeneration)** ```json { "success": true, "message": "Dynamics updated successfully and movements regeneration started", "agent_id": "A91XMB7113", "regeneration_status": "started" } ``` **Response (regeneration failed to start)** ```json { "success": true, "message": "Dynamics updated successfully, but movements regeneration failed to start", "agent_id": "A91XMB7113", "regeneration_status": "failed", "regeneration_error": "Connection refused" } ``` --- ## Gesture 
Names When dynamics are generated, the available gestures use descriptive action names: | Gesture Action | Category | Typical Use | |----------------|----------|-------------| | `mini_wave_hello` | wave | Greeting | | `talk_head_nod_subtle` | nod | Agreement, acknowledgment | | `blow_kiss_heart` | expression | Playful reaction | | `laugh_react` | expression | Humor response | | `idle_subtle` | idle | Background movement | The exact gesture names depend on what was generated. Use `GET /v1/dynamics/{agent_id}` to discover available gestures for each agent. ## Configuration Options **Duration Settings:** - `1-3 seconds`: Quick gestures (waves, nods) - `3-5 seconds`: Standard motions (default) - `5-10 seconds`: Extended animations **Model Options:** - `seedance`: High-quality motion generation (default) - `kling`: Alternative motion model --- ## Integration Example ```python import requests import time headers = {"Content-Type": "application/json", "api-secret": "YOUR_API_SECRET"} # Step 1: Create an agent resp = requests.post( "https://api.bithuman.ai/v1/agent/generate", headers=headers, json={"prompt": "You are a friendly customer service representative."} ) agent_id = resp.json()["agent_id"] # Step 2: Wait for agent to be ready while True: status = requests.get( f"https://api.bithuman.ai/v1/agent/status/{agent_id}", headers={"api-secret": "YOUR_API_SECRET"} ).json() if status["data"]["status"] in ("ready", "failed"): break time.sleep(5) # Step 3: Generate dynamics resp = requests.post( "https://api.bithuman.ai/v1/dynamics/generate", headers=headers, json={"agent_id": agent_id, "duration": 5, "model": "seedance"} ) print("Dynamics generation started:", resp.json()) # Step 4: Check available gestures time.sleep(30) # Wait for generation resp = requests.get( f"https://api.bithuman.ai/v1/dynamics/{agent_id}", headers={"api-secret": "YOUR_API_SECRET"} ) gestures = resp.json()["data"].get("gestures", {}) print(f"Available gestures: {list(gestures.keys())}") ``` ## Error 
Codes | HTTP Status | Meaning | |-------------|---------| | `200` | Success | | `400` | Invalid parameters | | `401` | Unauthorized | | `402` | Insufficient credits | | `404` | Agent not found | | `500` | Internal server error | ## Rate Limits & Quotas URL: https://docs.bithuman.ai/api-reference/rate-limits ## Request Limits API endpoints are rate-limited to protect service quality. Limits are applied per API secret. | Tier | Concurrent Sessions | Agent Generations/day | |------|---------------------|----------------------| | **Free** | 2 | 5 | | **Pro** | 10 | 50 | | **Enterprise** | Custom | Custom | Check your current tier and usage at [www.bithuman.ai](https://www.bithuman.ai) > Developer section. ## Handling Errors If you exceed limits or run out of credits, the API returns an error: ```json { "error": { "code": "INSUFFICIENT_BALANCE", "message": "Insufficient credits", "httpStatus": 402 }, "status": "error", "status_code": 402 } ``` Common status codes: `402` (no credits), `429` (rate limited), `503` (workers busy). 
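Beyond raw status codes, the JSON envelope identifies whether a failure is worth retrying. A minimal sketch (the retryable/fatal groupings are an assumption based on the codes documented in the Error Reference; `classify` is an illustrative helper, not part of the SDK):

```python
# Decide how to react to a parsed bitHuman API error envelope.
# Assumption: these groupings follow the documented error codes.
RETRYABLE = {"RATE_LIMITED", "NO_AVAILABLE_WORKERS", "INTERNAL_ERROR"}
FATAL = {"UNAUTHORIZED", "INSUFFICIENT_BALANCE", "ACCOUNT_SUSPENDED"}


def classify(body: dict) -> str:
    """Return 'ok', 'retry', or 'fail' for a parsed JSON response body."""
    if body.get("status") != "error":
        return "ok"
    code = body.get("error", {}).get("code")
    if code in RETRYABLE:
        return "retry"
    # Fatal and unknown errors: surface them rather than loop forever.
    return "fail"


body = {
    "error": {
        "code": "INSUFFICIENT_BALANCE",
        "message": "Insufficient credits",
        "httpStatus": 402,
    },
    "status": "error",
    "status_code": 402,
}
print(classify(body))  # fail: top up credits instead of retrying
```

Treating unknown codes as fatal is a deliberately conservative choice; a retry loop that spins on an unrecognized error only burns rate-limit budget.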
### Recommended Retry Strategy Use exponential backoff with jitter: ```python import time import random import requests def api_request_with_retry(url, headers, max_retries=3): for attempt in range(max_retries): resp = requests.post(url, headers=headers) if resp.status_code not in (429, 503): return resp # Exponential backoff with jitter wait = (2 ** attempt) + random.uniform(0, 1) time.sleep(wait) return resp # Return last response if all retries exhausted ``` ## Concurrency Limits Avatar sessions have per-account concurrency limits: | Resource | Limit | Notes | |----------|-------|-------| | **Cloud avatar sessions** | Based on tier | Active WebRTC sessions | | **Agent generation** | 3 concurrent | Queued if exceeded | | **Dynamics generation** | 2 concurrent | Queued if exceeded | ## Endpoint Guidelines | Endpoint | Guidance | Notes | |----------|----------|-------| | `POST /v1/validate` | Lightweight | Use for health checks | | `POST /v1/agent/generate` | Heavy | Triggers GPU pipeline, ~2-5 min | | `GET /v1/agent/status/*` | Poll at 5s intervals | Avoid sub-second polling | | `POST /v1/agent/*/speak` | Per active session | Agent must be in a room | | `POST /v1/files/upload` | 10 MB image, 100 MB video | Size limits enforced | | `POST /v1/dynamics/generate` | Heavy | Triggers video generation | ## Best Practices Instead of polling `/v1/agent/status/{id}` in a loop, configure [webhooks](/integrations/webhooks) to get notified when generation completes. Agent data rarely changes. Cache `GET /v1/agent/{code}` responses locally and refresh only when needed. Keep avatar sessions alive between conversations instead of creating new ones. Session creation is the most expensive operation. Use `POST /v1/validate` to verify your account is active before starting agent generation or dynamics creation. ## Need Higher Limits? Contact us via [Discord](https://discord.gg/ES953n7bPA) or email for enterprise tier pricing with custom limits. 
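The agent-data caching best practice above can be sketched with a small in-process TTL cache (the 300-second TTL and the injectable `fetch` parameter are illustrative choices, not SDK features):

```python
import time

import requests

_cache: dict[str, tuple[float, dict]] = {}
TTL_SECONDS = 300  # assumption: agent data rarely changes, per the note above


def get_agent(code: str, api_secret: str, fetch=None) -> dict:
    """Return agent info, hitting the API at most once per TTL window."""
    now = time.monotonic()
    cached = _cache.get(code)
    if cached and now - cached[0] < TTL_SECONDS:
        return cached[1]  # fresh enough: skip the network round-trip
    if fetch is None:
        def fetch(c):
            resp = requests.get(
                f"https://api.bithuman.ai/v1/agent/{c}",
                headers={"api-secret": api_secret},
            )
            resp.raise_for_status()
            return resp.json()
    data = fetch(code)
    _cache[code] = (now, data)
    return data
```

In production, also drop the cache entry after a successful `POST /v1/agent/{code}` update so the next read reflects the new prompt.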
## Error Reference URL: https://docs.bithuman.ai/api-reference/errors ## Error Response Format All error responses follow a consistent format: ```json { "error": { "code": "ERROR_CODE", "message": "Human-readable description of what went wrong.", "httpStatus": 401 }, "status": "error", "status_code": 401 } ``` ## HTTP Status Codes | Status | Meaning | Common Cause | |--------|---------|-------------| | `200` | Success | Request completed | | `400` | Bad Request | Invalid parameters or missing required fields | | `401` | Unauthorized | Invalid or missing `api-secret` header | | `404` | Not Found | Agent, resource, or endpoint doesn't exist | | `413` | Payload Too Large | File exceeds size limit | | `415` | Unsupported Media Type | File type not supported | | `422` | Validation Error | Parameters are present but invalid | | `429` | Rate Limited | Too many requests — see [Rate Limits](/api-reference/rate-limits) | | `500` | Internal Error | Server-side error — retry or contact support | | `503` | Service Unavailable | All workers busy — retry with backoff | ## Error Codes ### Authentication | Code | HTTP | Message | Resolution | |------|------|---------|------------| | `UNAUTHORIZED` | 401 | Invalid API secret | Check your `api-secret` header value. Get a valid secret from [Developer Dashboard](https://www.bithuman.ai/#developer). | | `MISSING_AUTH` | 401 | Missing api-secret header | Add `api-secret` header to your request. | | `ACCOUNT_SUSPENDED` | 401 | Account suspended | Contact support via [Discord](https://discord.gg/ES953n7bPA). | | `INSUFFICIENT_BALANCE` | 402 | Insufficient credits | Top up credits at [www.bithuman.ai](https://www.bithuman.ai). | ### Agent Operations | Code | HTTP | Message | Resolution | |------|------|---------|------------| | `AGENT_NOT_FOUND` | 404 | Agent not found | Check the agent code. Use `POST /v1/validate` to verify your API secret has access. 
| | `AGENT_PROCESSING` | 409 | Agent is still generating | Wait for generation to complete. Poll `/v1/agent/status/{id}`. | | `AGENT_FAILED` | 400 | Agent generation failed | Check generation logs. Retry with different parameters. | | `VALIDATION_ERROR` | 422 | prompt is required | Include all required fields. See endpoint documentation. | | `NO_ACTIVE_ROOMS` | 404 | No active rooms for agent | The agent must be in an active LiveKit session for `/speak` and `/add-context`. | ### File Operations | Code | HTTP | Message | Resolution | |------|------|---------|------------| | `FILE_TOO_LARGE` | 413 | File exceeds size limit | Images: 10 MB max. Videos: 100 MB max. Audio: 50 MB max. | | `UNSUPPORTED_TYPE` | 415 | Unsupported file type | Supported: JPEG, PNG, WebP, MP4, WAV, MP3, OGG. | | `DOWNLOAD_FAILED` | 400 | Could not download URL | Ensure the URL is publicly accessible and returns a valid file. | ### Dynamics | Code | HTTP | Message | Resolution | |------|------|---------|------------| | `DYNAMICS_NOT_FOUND` | 404 | No dynamics for agent | Generate dynamics first with `POST /v1/dynamics/generate`. | | `DYNAMICS_PROCESSING` | 409 | Dynamics still generating | Wait for generation to complete. | ### Session & Infrastructure | Code | HTTP | Message | Resolution | |------|------|---------|------------| | `RATE_LIMITED` | 429 | Rate limit exceeded | Back off and retry. See [Rate Limits](/api-reference/rate-limits). | | `NO_AVAILABLE_WORKERS` | 503 | All workers busy | Retry with exponential backoff (up to 5 times). | | `SESSION_LIMIT` | 429 | Concurrent session limit reached | Wait for an existing session to end, or upgrade your tier. | | `INTERNAL_ERROR` | 500 | Internal server error | Retry once. If persistent, report via [Discord](https://discord.gg/ES953n7bPA). 
| ## Handling Errors in Python ```python import requests resp = requests.post( "https://api.bithuman.ai/v1/agent/generate", headers={"api-secret": api_secret, "Content-Type": "application/json"}, json={"prompt": "You are a helpful assistant"}, ) if resp.status_code == 200: result = resp.json() print(f"Agent {result['agent_id']} is generating...") elif resp.status_code == 401: print("Invalid API secret. Check BITHUMAN_API_SECRET.") elif resp.status_code == 429: print("Rate limited. Wait a moment and retry.") elif resp.status_code == 503: print("Workers busy. Retry in a few seconds.") else: error = resp.json().get("error", {}) print(f"Error {error.get('code')}: {error.get('message')}") ``` ## GPU Container Errors The self-hosted expression-avatar container returns its own error responses: | Endpoint | Error | Resolution | |----------|-------|------------| | `GET /health` | Connection refused | Container not started or still initializing | | `GET /ready` | `503 Not Ready` | Model still loading (~50s cold start) or all session slots full | | `POST /launch` | `401 Unauthorized` | Invalid `BITHUMAN_API_SECRET` in container env | | `POST /launch` | `400 No face detected` | Image has no detectable face. Use a clear front-facing photo. | | `POST /launch` | `503 No capacity` | All session slots in use. Wait or add more containers. | For GPU container troubleshooting, see [Self-Hosted GPU](/deployment/self-hosted-gpu#troubleshooting). --- # Examples ## bitHuman Code Examples: Audio, Microphone, AI Chat & More URL: https://docs.bithuman.ai/examples/overview --- ## Platform API Programmatic agent management -- no SDK or local runtime needed. 
| Example | What It Does | Source | |---------|-------------|--------| | **Agent Management** | Validate credentials, get/update agents | [api/](https://github.com/bithuman-product/examples/tree/main/api) | | **Agent Generation** | Create agents from prompt, poll status | [api/](https://github.com/bithuman-product/examples/tree/main/api) | | **Dynamics** | Generate gestures, list available gestures | [api/](https://github.com/bithuman-product/examples/tree/main/api) | ## Avatar Integration Four combinations of model type and deployment mode. | Example | Model | Deployment | Source | |---------|-------|------------|--------| | **Essence + Cloud** | Essence (CPU) | bitHuman Cloud | [essence-cloud/](https://github.com/bithuman-product/examples/tree/main/essence-cloud) | | **[Essence + Self-Hosted](/examples/audio-clip)** | Essence (CPU) | Your machine | [essence-selfhosted/](https://github.com/bithuman-product/examples/tree/main/essence-selfhosted) | | **Expression + Cloud** | Expression (GPU) | bitHuman Cloud | [expression-cloud/](https://github.com/bithuman-product/examples/tree/main/expression-cloud) | | **Expression + Self-Hosted** | Expression (GPU) | Your machine | [expression-selfhosted/](https://github.com/bithuman-product/examples/tree/main/expression-selfhosted) | ## Full-Stack & Integration Examples | Example | What It Does | Source | |---------|-------------|--------| | **[Apple Local Agent](/examples/apple-local)** | 100% offline on macOS (Siri + Ollama) | [integrations/macos-offline/](https://github.com/bithuman-product/examples/tree/main/integrations/macos-offline) | | **[Raspberry Pi](/examples/raspberry-pi)** | Edge deployment on Raspberry Pi | — | | **Web UI** | Browser-based Gradio interface | [integrations/web-ui/](https://github.com/bithuman-product/examples/tree/main/integrations/web-ui) | | **Java Client** | WebSocket streaming from Java | [integrations/java/](https://github.com/bithuman-product/examples/tree/main/integrations/java) | | 
**Next.js UI** | Drop-in LiveKit web interface | [integrations/nextjs-ui/](https://github.com/bithuman-product/examples/tree/main/integrations/nextjs-ui) | --- ## Prerequisites --- **New to bitHuman?** Start with [Essence + Cloud](https://github.com/bithuman-product/examples/tree/main/essence-cloud) -- the simplest setup with no models to download. ## Example: Play Audio Through a Talking Avatar (Python) URL: https://docs.bithuman.ai/examples/audio-clip A simple first example that works reliably. ## Quick Start ```bash pip install bithuman --upgrade opencv-python sounddevice ``` ```bash export BITHUMAN_API_SECRET="your_secret" export BITHUMAN_MODEL_PATH="/path/to/model.imx" export BITHUMAN_AUDIO_PATH="/path/to/audio.wav" # optional ``` ```bash python examples/avatar-with-audio-clip.py ``` [View source code on GitHub](https://github.com/bithuman-product/examples/tree/main/essence-selfhosted) - **Press `1`** — Play audio with avatar - **Press `2`** — Stop playback - **Press `q`** — Quit --- ## What It Does 1. Loads your audio file (WAV, MP3, M4A supported) 2. Creates synchronized avatar animation 3. Shows real-time video in OpenCV window 4. 
Plays audio through speakers with sounddevice **Key features:** - Smooth audio playback with buffering - Real-time video display at 25 FPS - Keyboard controls for interaction - Supports multiple audio formats --- ## Command Line Options ```bash # Use specific files python examples/avatar-with-audio-clip.py \ --model /path/to/model.imx \ --audio-file /path/to/audio.wav \ --api-secret your_secret # Use JWT token instead of API secret python examples/avatar-with-audio-clip.py \ --token your_jwt_token \ --model /path/to/model.imx ``` | Option | Description | |--------|-------------| | `--model` | Path to .imx model file | | `--audio-file` | Path to audio file | | `--api-secret` | Your bitHuman API secret | | `--token` | JWT token (alternative to API secret) | | `--insecure` | Disable SSL verification (dev only) | --- ## Common Issues | Problem | Solution | |---------|----------| | No audio playing | Install sounddevice: `pip install sounddevice`. Try WAV format. | | Avatar not loading | Verify `BITHUMAN_API_SECRET` and `BITHUMAN_MODEL_PATH`. | | Video choppy | Close other applications using GPU/CPU. | | Controls not working | Click on the OpenCV window to focus it. | --- ## Technical Details | Component | Specification | |-----------|--------------| | Audio sample rate | 16kHz (auto-converted) | | Audio channels | Mono (stereo auto-converted) | | Video resolution | 512x512 pixels | | Frame rate | 25 FPS | | Audio formats | WAV, MP3, M4A, FLAC | --- ## Next Steps Real-time interaction with your voice Full OpenAI voice chat with avatar ## Example: Real-Time Microphone to Avatar Lip-Sync URL: https://docs.bithuman.ai/examples/microphone Speak and see your avatar respond instantly. 
## Quick Start ```bash pip install bithuman --upgrade livekit-rtc livekit-agents ``` ```bash export BITHUMAN_API_SECRET="your_secret" export BITHUMAN_MODEL_PATH="/path/to/model.imx" ``` ```bash python examples/avatar-with-microphone.py ``` [View source code on GitHub](https://github.com/bithuman-product/examples/tree/main/essence-selfhosted) - **Speak into microphone** — Avatar animates in real-time - **Stay quiet** — Avatar stops after silence timeout (3 seconds) - **Press `q`** — Quit application --- ## What It Does 1. Captures audio from your default microphone 2. Creates real-time avatar animation as you speak 3. Shows live video using LocalVideoPlayer 4. Automatically detects voice activity and silence **Key features:** - Real-time audio processing at 24kHz - Voice activity detection with configurable threshold (-40dB) - Automatic silence detection (3-second timeout) - Local audio/video processing (no web interface) --- ## Command Line Options ```bash # Adjust volume and silence detection python examples/avatar-with-microphone.py \ --volume 1.5 \ --silent-threshold-db -35 # Enable audio echo for testing python examples/avatar-with-microphone.py --echo ``` | Option | Default | Description | |--------|---------|-------------| | `--model` | env | Path to .imx model file | | `--api-secret` | env | Your bitHuman API secret | | `--volume` | 1.0 | Audio volume multiplier | | `--silent-threshold-db` | -40 | Silence threshold in dB | | `--echo` | off | Enable audio echo for testing | --- ## Advanced Usage ```bash # More sensitive (picks up quieter voices) python examples/avatar-with-microphone.py --silent-threshold-db -50 # Less sensitive (only loud voices) python examples/avatar-with-microphone.py --silent-threshold-db -30 # Boost quiet microphones python examples/avatar-with-microphone.py --volume 2.0 ``` --- ## Common Issues | Problem | Solution | |---------|----------| | No microphone input | Check microphone permissions in system settings | | Avatar not responding 
| Speak louder or adjust `--silent-threshold-db` to a lower value | | Performance lag | Close other audio applications, use a wired microphone | | Audio echo/feedback | Don't use the `--echo` flag; use headphones | --- ## Technical Details | Component | Specification | |-----------|--------------| | Audio sample rate | 24kHz | | Input | Mono microphone | | Buffer | 240 samples per chunk (10ms) | | Silence detection | -40dB threshold, 3s timeout | --- ## Next Steps Full OpenAI voice chat with avatar 100% private on-device processing ## Example: AI Voice Chat with Avatar (OpenAI + LiveKit) URL: https://docs.bithuman.ai/examples/ai-conversation Complete chatbot with avatar that users can talk to on the web. ## Quick Start ```bash pip install bithuman --upgrade livekit-agents openai ``` - **bitHuman**: [www.bithuman.ai](https://www.bithuman.ai) - **OpenAI**: [openai.com](https://openai.com) - **LiveKit**: [livekit.io](https://livekit.io) (free) ```bash export BITHUMAN_API_SECRET="your_secret" export BITHUMAN_MODEL_PATH="/path/to/model.imx" export OPENAI_API_KEY="your_openai_key" export LIVEKIT_API_KEY="your_livekit_key" export LIVEKIT_API_SECRET="your_livekit_secret" export LIVEKIT_URL="wss://your-project.livekit.cloud" ``` ```bash git clone https://github.com/livekit/agents-playground.git cd agents-playground npm install && npm run dev ``` [View source code on GitHub](https://github.com/bithuman-product/examples/tree/main/essence-selfhosted) ```bash Web streaming (recommended) python examples/agent-livekit-openai.py dev ``` ```bash Command line testing python examples/agent-livekit-openai.py console ``` Go to `http://localhost:3000` and join a room to chat. --- ## What It Does 1. User speaks in browser 2. AI processes speech and responds intelligently 3. Avatar shows AI's response with dynamic movement 4.
Works from any device with internet **Built with:** - **OpenAI GPT-4** for intelligent conversation - **LiveKit** for web streaming - **bitHuman** for avatar animation --- ## Run Modes | Mode | Use Case | Description | |------|----------|-------------| | `dev` | Production | Connects to LiveKit for web browsers | | `console` | Testing | Runs in terminal for debugging | --- ## Customization Change the agent's personality by editing the `instructions`: ```python agent=Agent( instructions=( "You are a helpful customer service assistant. " "Be friendly, professional, and solve problems quickly." ) ) ``` **Example personalities:** - **Tech Support**: "You are a patient tech expert who explains things simply" - **Sales Assistant**: "You are an enthusiastic product advisor" - **Teacher**: "You are an encouraging tutor who makes learning fun" --- ## Common Issues | Problem | Solution | |---------|----------| | Agent won't start | Check all API keys are set | | No audio in browser | Allow microphone permissions, try Chrome | | Can't connect | Check LiveKit URL format: `wss://your-project.livekit.cloud` | --- ## Next Steps Full privacy — speech never leaves your Mac Edge deployment on IoT devices ## Example: 100% Local Avatar on macOS (Apple Silicon) URL: https://docs.bithuman.ai/examples/apple-local Full privacy — speech never leaves your Mac. ## Quick Start - macOS 13+ (Apple Silicon recommended) - Microphone permissions ```bash pip install https://github.com/bithuman-product/examples/releases/download/v0.1/bithuman_voice-1.3.2-py3-none-any.whl ``` ```bash bithuman-voice serve --port 8091 ``` macOS will ask for Speech permissions — approve this. 
```bash pip install bithuman --upgrade livekit-agents openai livekit-plugins-silero ``` ```bash export BITHUMAN_API_SECRET="your_secret" export BITHUMAN_MODEL_PATH="/path/to/model.imx" export LIVEKIT_API_KEY="your_livekit_key" export LIVEKIT_API_SECRET="your_livekit_secret" export LIVEKIT_URL="wss://your-project.livekit.cloud" export OPENAI_API_KEY="your_openai_key" # Only for AI brain ``` [View source code on GitHub](https://github.com/bithuman-product/examples/tree/main/integrations/macos-offline) ```bash Web streaming python examples/agent-livekit-apple-local.py dev ``` ```bash Command line testing python examples/agent-livekit-apple-local.py console ``` --- ## What It Does **Stays on your Mac:** - Speech-to-text (Apple Speech Framework) - Text-to-speech (Apple Voice Synthesis) - Avatar animation (bitHuman) - Voice activity detection (Silero) **Uses internet:** - Only AI conversation (OpenAI LLM) **Privacy benefits:** - Voice patterns never leave your device - Apple's hardware-accelerated speech processing - Full control over your data --- ## Make it 100% Private For 100% local operation with no internet required, use the complete Docker setup: [Complete macOS Offline Example](https://github.com/bithuman-product/examples/tree/main/integrations/macos-offline) **What you get:** - **Apple Speech Recognition** — Local STT - **Apple Voices/Siri** — Local TTS - **Ollama LLM** — Local language models (Llama 3.2) - **bitHuman Avatar** — Real-time facial animation - **LiveKit + Web UI** — Complete conversation interface - **Zero Internet Dependency** ```bash git clone https://github.com/bithuman-product/examples.git cd examples/integrations/macos-offline pip install https://github.com/bithuman-product/examples/releases/download/v0.1/bithuman_voice-1.3.2-py3-none-any.whl bithuman-voice serve --port 8000 ollama run llama3.2:1b docker compose up # Access at http://localhost:4202 ``` **Enterprise Offline Mode:** Contact bitHuman for offline tokens to eliminate all internet 
requirements for authentication and metering.

---

## Common Issues

| Problem | Solution |
|---------|----------|
| Voice service won't start | Check microphone permissions; enable "Speech Recognition" in Privacy & Security |
| No speech recognition | Restart the `bithuman-voice` service; test with built-in dictation |
| Permission errors | Run the voice service from Terminal (not an IDE) |

---

## Performance

**Recommended specs:**

- M2+ Mac (M4 ideal)
- 16GB+ RAM
- macOS 13+

---

## Next Steps

- Edge deployment on IoT devices
- Simpler cloud-based setup

## Example: Avatar on Raspberry Pi (Edge / IoT / Kiosk)

URL: https://docs.bithuman.ai/examples/raspberry-pi

## Quick Start

- Raspberry Pi 4B (8GB RAM recommended)
- microSD card (32GB+, Class 10)
- USB microphone
- Stable internet connection
- **Separate computer** for the web interface (recommended)

Use **Raspberry Pi OS (64-bit)** with Raspberry Pi Imager.

```bash
sudo apt update && sudo apt upgrade -y
sudo apt install python3.11 python3.11-venv -y
python3.11 -m venv bithuman-env
source bithuman-env/bin/activate
```

```bash
pip install bithuman --upgrade livekit-agents openai
sudo apt install portaudio19-dev -y
```

```bash
export BITHUMAN_API_SECRET="your_secret"
export BITHUMAN_MODEL_PATH="/home/pi/model.imx"
export LIVEKIT_API_KEY="your_livekit_key"
export LIVEKIT_API_SECRET="your_livekit_secret"
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export OPENAI_API_KEY="your_openai_key"
export LOADING_MODE="SYNC"  # Important for Pi performance
```

[View source code on GitHub](https://github.com/bithuman-product/examples/tree/main/essence-selfhosted)

```bash Web streaming (recommended)
python examples/agent-livekit-rasp-pi.py dev
```

```bash Command line testing
python examples/agent-livekit-rasp-pi.py console
```

For best results, run the web interface on a **separate computer**. Running both the agent and the web UI on the same Pi causes significant slowdown.

---

## What It Does

1. Runs an avatar agent optimized for Raspberry Pi
2. Uses `SYNC` loading mode for memory efficiency
3. Connects to web browsers via LiveKit
4. Suited for always-on edge applications

**Pi-specific optimizations:**

- Synchronous model loading (`LOADING_MODE="SYNC"`)
- Lower memory limits (1500MB warning threshold)
- Single-process mode for stability
- Extended initialization timeout (120s)

---

## Auto-start Service

Make it run automatically on boot:

```ini /etc/systemd/system/bithuman-agent.service
[Unit]
Description=bitHuman Avatar Agent
After=network.target

[Service]
Type=simple
User=pi
WorkingDirectory=/home/pi
Environment=LOADING_MODE=SYNC
Environment=BITHUMAN_API_SECRET=your_secret
Environment=BITHUMAN_MODEL_PATH=/home/pi/model.imx
ExecStart=/home/pi/bithuman-env/bin/python examples/agent-livekit-rasp-pi.py dev
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

```bash
sudo systemctl enable bithuman-agent
sudo systemctl start bithuman-agent
sudo systemctl status bithuman-agent
```

---

## Performance Tips

```bash
# Enable the performance CPU governor
echo 'performance' | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Disable unnecessary services
sudo systemctl disable bluetooth
sudo systemctl disable wpa_supplicant  # Wi-Fi daemon; only if using ethernet
```

- Use a swap file for extra memory
- Store models on a USB SSD if possible
- Monitor with `htop` or `free -h`

---

## Common Issues

| Problem | Solution |
|---------|----------|
| Out of memory | Use a Pi 4B 8GB; enable swap: `sudo dphys-swapfile swapon` |
| Slow performance | Use ethernet; check CPU temp: `vcgencmd measure_temp` |
| Audio problems | Check the USB mic: `arecord -l`; test: `arecord -d 5 test.wav` |
| Model loading timeout | Ensure `LOADING_MODE="SYNC"`; use faster storage |

---

## Hardware Add-ons

```python
import board
import adafruit_dht

# Add environmental awareness
dht = adafruit_dht.DHT22(board.D4)
temperature = dht.temperature
humidity = dht.humidity
```

---

## Next Steps

- **Add sensors** — Integrate environmental awareness
- **Add camera** — Use Pi
camera for visual context
- **Scale up** — Deploy multiple Pi devices
- **Go local** — Replace OpenAI with a local LLM

## Example: Self-Hosted LiveKit Agent with Gestures

URL: https://docs.bithuman.ai/examples/self-hosted-plugin

Use bitHuman agents in real-time applications with a self-hosted deployment, featuring direct model file access and VideoControl-based gesture triggering.

## Quick Start

```bash
cd examples/self-hosted
pip install -r requirements.txt
```

- **API Secret**: [www.bithuman.ai](https://www.bithuman.ai/#developer)
- **Model File**: Download your `.imx` model from the platform

```bash
# bitHuman Configuration
BITHUMAN_API_SECRET=your_api_secret_here
BITHUMAN_MODEL_PATH=/path/to/your/avatar_model.imx
BITHUMAN_AGENT_ID=A31KJC8622  # Optional: for dynamics gestures

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here

# LiveKit Configuration
LIVEKIT_URL=wss://your-livekit-server.com
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
```

```bash
python agent.py dev
```

---

## Basic Self-Hosted Agent

Standard avatar interactions without dynamics:

```python
import os

from livekit.plugins import bithuman

bithuman_avatar = bithuman.AvatarSession(
    api_secret=os.getenv("BITHUMAN_API_SECRET"),
    model_path=os.getenv("BITHUMAN_MODEL_PATH"),
)
```

**Key features:**

- Direct model file access (`.imx` format)
- Native AsyncBithuman runtime integration
- High-performance streaming with the VideoGenerator pattern
- Real-time audio/video processing

---

## Self-Hosted Agent with Dynamics

For reactive avatar gestures triggered by keywords in user speech.

### Step 1: Get Available Gestures

Retrieve available gesture actions for your agent via the [Dynamics API](/api-reference/dynamics).
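The endpoint returns gestures keyed by action name under `data.gestures`. Since gesture sets are user-defined, it can help to isolate the parsing in a small helper — a sketch, tested here against a canned response that only mirrors the documented shape (the gesture names and empty metadata dicts are illustrative):

```python
def extract_gestures(dynamics_data: dict) -> list[str]:
    """Return gesture action names from a /v1/dynamics/{agent_id} response body."""
    if not dynamics_data.get("success"):
        return []
    return list(dynamics_data.get("data", {}).get("gestures", {}).keys())


# Canned response mirroring the documented shape (names are illustrative)
sample = {
    "success": True,
    "data": {
        "gestures": {
            "mini_wave_hello": {},  # per-gesture metadata omitted
            "laugh_react": {},
        }
    },
}

print(extract_gestures(sample))  # → ['mini_wave_hello', 'laugh_react']
```

The helper returns an empty list for error responses, so callers can treat "no gestures" and "request failed" uniformly.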
```python
import requests

agent_id = "A31KJC8622"
url = f"https://api.bithuman.ai/v1/dynamics/{agent_id}"
headers = {"api-secret": "YOUR_API_SECRET"}

response = requests.get(url, headers=headers)
dynamics_data = response.json()

if dynamics_data.get("success"):
    gestures_dict = dynamics_data["data"].get("gestures", {})
    available_gestures = list(gestures_dict.keys())
    print(f"Available gestures: {available_gestures}")
    # Example: ["mini_wave_hello", "talk_head_nod_subtle", "laugh_react"]
```

Gesture actions are user-defined and vary based on your agent's dynamics generation. Always check the API response to see what's available.

### Step 2: Set Up Keyword-to-Action Mapping

```python
import asyncio
import os

from livekit.agents import AgentSession, JobContext, UserInputTranscribedEvent
from livekit.plugins import bithuman
from bithuman.api import VideoControl

KEYWORD_ACTION_MAP = {
    "laugh": "laugh_react",
    "laughing": "laugh_react",
    "haha": "laugh_react",
    "funny": "laugh_react",
    "hello": "mini_wave_hello",
    "hi": "mini_wave_hello",
}


async def entrypoint(ctx: JobContext):
    await ctx.connect()
    await ctx.wait_for_participant()

    bithuman_avatar = bithuman.AvatarSession(
        api_secret=os.getenv("BITHUMAN_API_SECRET"),
        model_path=os.getenv("BITHUMAN_MODEL_PATH"),
    )

    session = AgentSession(...)  # configure your STT/LLM/TTS plugins here
    await bithuman_avatar.start(session, room=ctx.room)

    @session.on("user_input_transcribed")
    def on_user_input_transcribed(event: UserInputTranscribedEvent):
        if not event.is_final:
            return
        transcript = event.transcript.lower()
        for keyword, action in KEYWORD_ACTION_MAP.items():
            # Note: simple substring match — "hi" also matches words like "this"
            if keyword in transcript:
                asyncio.create_task(
                    bithuman_avatar.runtime.push(VideoControl(action=action))
                )
                break
```

**How it works:**

1. Get available gestures from the Dynamics API
2. Map keywords to gesture action names
3. Listen for user speech via `user_input_transcribed` events
4. Trigger gestures via `VideoControl(action=action)`

Always verify that a gesture action exists in the API response before using it.
Non-existent gestures will be silently ignored.

---

## Configuration

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model_path` | string | Yes | Path to the `.imx` model file |
| `api_secret` | string | Yes | Authentication secret |
| `api_token` | string | No | Optional API token |
| `agent_id` | string | No | Agent ID for fetching dynamics gestures |

## Self-Hosted Advantages

- **Full Control** — Complete control over model files and deployment
- **Privacy** — Models stay on your infrastructure
- **Customization** — Modify and extend agent behavior
- **Performance** — Optimize for your specific hardware
- **Offline Capable** — Works without internet after initial setup

---

## Common Issues

| Problem | Solution |
|---------|----------|
| Model loading errors | Verify the model file path and permissions |
| Memory issues | Minimum 4GB RAM; 8GB+ recommended |
| Gesture not triggering | Verify the gesture name exists in the Dynamics API response |
| Connection issues | Verify the LiveKit server URL and credentials |

---

## Model Requirements

| Specification | Value |
|---------------|-------|
| Format | `.imx` files |
| Minimum RAM | 4GB |
| Recommended RAM | 8GB+ |
| Initialization time | ~20 seconds |
| Frame rate | 25 FPS |

---

## Next Steps

- Cloud-hosted deployment option
- Configure gestures and animations
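Because unknown gesture actions are silently ignored, one defensive pattern is to prune the keyword map at startup against the list returned by the Step 1 Dynamics API call. A sketch, reusing action names from the examples above (`big_wave_goodbye` is hypothetical, standing in for a gesture that was never generated for this agent):

```python
KEYWORD_ACTION_MAP = {
    "laugh": "laugh_react",
    "hello": "mini_wave_hello",
    "bye": "big_wave_goodbye",  # hypothetical action, not in this agent's dynamics
}


def prune_action_map(action_map: dict[str, str], available: list[str]) -> dict[str, str]:
    """Drop mappings whose gesture action the Dynamics API did not report."""
    missing = set(action_map.values()) - set(available)
    if missing:
        print(f"Warning: ignoring unknown gesture actions: {sorted(missing)}")
    return {k: v for k, v in action_map.items() if v in available}


# `available` would normally come from GET /v1/dynamics/{agent_id}
available = ["mini_wave_hello", "laugh_react"]
safe_map = prune_action_map(KEYWORD_ACTION_MAP, available)
# safe_map now keeps only the "laugh" and "hello" entries
```

Pruning once at startup keeps the per-transcript keyword loop unchanged while surfacing misconfigured gesture names in the logs instead of letting them fail silently.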