Architecture

How bitHuman is built and shipped — the libessence engine, the language SDKs that wrap it, the SDK-to-engine ABI compatibility matrix, and the per-device hardware matrix.

The three layers

bitHuman is one portable engine with thin language bindings on top and apps on top of that. Everything below reads the same .imx model file and produces byte-equivalent, lip-synced visual frames — on an iPhone, a Raspberry Pi, a MacBook, a browser, or a cloud GPU.

L3  Apps & tools
    bitHuman CLI · reference apps (Mac, iPad, iPhone, Flutter) · LiveKit transport for WebRTC
L2  Language SDKs
    Python · Swift · Kotlin · JS (thin, idiomatic bindings over the same engine)
L1  libessence, the engine
    Portable C++ avatar renderer behind a stable C ABI. Statically linked into every SDK.
    macOS · iOS · Android · Linux · Windows · WASM

Every layer drives the same pipeline — audio goes in, lip-synced visual frames come out at a steady 25 FPS:

16 kHz mono audio → libessence engine → 25 FPS visual frames
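As a rough illustration of the timing this pipeline implies, 16 kHz audio at 25 FPS works out to 640 samples (40 ms) per video frame. The sketch below splits a PCM buffer into frame-sized chunks. The 16-bit sample width and the names `chunk_pcm` and `SAMPLES_PER_FRAME` are assumptions made for illustration; they are not part of any bitHuman SDK.

```python
SAMPLE_RATE_HZ = 16_000   # from the pipeline description
FRAME_RATE_FPS = 25       # from the pipeline description
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM samples

SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // FRAME_RATE_FPS  # 640 samples = 40 ms


def chunk_pcm(pcm: bytes) -> list[bytes]:
    """Split mono PCM into chunks that each cover one video frame."""
    step = SAMPLES_PER_FRAME * BYTES_PER_SAMPLE
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]


one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # 1 s of silence
chunks = chunk_pcm(one_second)
print(len(chunks), len(chunks[0]))
```

One second of audio yields 25 chunks, one per output frame, which is a convenient unit when feeding audio in step with video.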

Most developers integrate at the SDK layer (L2) — you never need to know what’s underneath. The engine is statically linked into each SDK distribution, so there are no extra system libraries to install.

What runs where

bitHuman is shipped as a single cross-platform runtime with idiomatic SDKs in each language. All SDKs and the CLI read the same .imx model file and produce byte-equivalent frames. Third-party dependencies are bundled inside each distribution — your app’s manifest only needs the bithuman dependency, never a transitive onnxruntime / webp / livekit-server.

Goal | Install | What you get
Embed in a Python app | pip install bithuman | Python SDK (library only, ~5 MB)
Embed in a Swift app | SwiftPM bitHumanKit | Swift SDK + bundled libessence XCFramework
Embed in an Android app | ai.bithuman:sdk (Maven Central) | Kotlin SDK + AAR (arm64-v8a)
Run from the CLI on Mac | brew install bithuman-cli | bithuman CLI (single Rust binary)
Run from the CLI in any Python env | pip install bithuman-cli | Same Rust binary, inside a Python wheel
Cloud LiveKit avatar | pip install livekit-plugins-bithuman | Managed avatar session

Engine layering

The platform is three owned layers — engine, SDKs, apps — plus an upstream integration layer that wires bithuman into other ecosystems.

L4  Upstream integrations
    livekit-plugins-bithuman (lives in livekit/agents)
        |
        v depends on
L3  Apps (consume the SDKs)
    bithuman CLI · Flutter reference app · Mac + iPad reference apps
        |
        v builds on
L2  bitHuman SDKs (language bindings)
    Python · Swift (bitHumanKit) · Kotlin (ai.bithuman:sdk) · Rust (in-tree)
        |
        v wraps
L1  bitHuman Engine — libessence (cross-platform C++/Rust)
    Audio in: 16 kHz mono PCM   ·   Video out: 25 FPS BGR frames
    macOS · iOS · Android · Linux · Windows

The bithuman CLI (L3) consumes the Rust SDK the same way any third-party app would.

Cross-layer contracts

The owned layers ship independently but agree on a small set of stable contracts:

  • Engine ABI (versioned). libessence exposes a C ABI tagged with an explicit version. The current shipping ABI is v7, introduced in libessence 2.3.0. New ABI versions are additive — old SDK builds keep working with newer engines until a version is formally retired. bithuman --version prints libessence <ver> ABI <n> / bithuman <ver>.
  • SDK public API (SemVer-stable). The public surface in each language is stable across patch and minor releases. Patches never break source compatibility; minors add APIs without removing old ones; majors call out breaks explicitly.
  • One .imx, every surface. A model file packed for one SDK runs identically across all of them — enforced by an in-tree parity/ contract test suite that streams the same audio through every SDK and asserts byte-equal frames.
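The byte-equality contract in the last bullet can be pictured with a small check like the one below. This is only a sketch of the idea, not the in-tree parity/ suite: `Renderer`, `frame_digest`, `assert_parity`, and the toy `fake_sdk` are hypothetical names, and a real test would call each SDK instead of the stand-in function.

```python
import hashlib
from typing import Callable, Dict, Iterable

# Hypothetical stand-in for an SDK: takes PCM audio, yields raw frame bytes.
Renderer = Callable[[bytes], Iterable[bytes]]


def frame_digest(render: Renderer, audio: bytes) -> str:
    """Hash every frame a renderer emits, in order, into one digest."""
    h = hashlib.sha256()
    for frame in render(audio):
        h.update(len(frame).to_bytes(8, "little"))  # preserve frame boundaries
        h.update(frame)
    return h.hexdigest()


def assert_parity(renderers: Dict[str, Renderer], audio: bytes) -> None:
    """Raise AssertionError if any renderer's frames differ from the first."""
    digests = {name: frame_digest(r, audio) for name, r in renderers.items()}
    reference = next(iter(digests.values()))
    mismatched = [n for n, d in digests.items() if d != reference]
    assert not mismatched, f"frames differ for: {mismatched}"


# Toy renderer so the sketch runs without any SDK installed.
def fake_sdk(audio: bytes) -> Iterable[bytes]:
    for i in range(0, len(audio), 4):
        yield audio[i:i + 4][::-1]


assert_parity({"python": fake_sdk, "swift": fake_sdk}, b"\x01\x02\x03\x04" * 3)
```

Hashing the frame length ahead of each frame means two streams with the same bytes but different frame boundaries still count as a mismatch.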

SDK ↔ engine compatibility matrix

Each artifact declares the libessence ABI it builds against. Artifacts with a matching ABI are interoperable even when their headline versions differ.

Artifact | Latest version | Channel | libessence ABI
Python SDK (bithuman) | 2.3.0 | PyPI | v7
Swift SDK (bitHumanKit) | 0.8.2 | SwiftPM | v7
Kotlin SDK (ai.bithuman:sdk) | 1.17.1 | Maven Central | v6
Rust SDK (bithuman-core) | in-tree crate | source-only (not on crates.io) | v7
bithuman CLI | 2.3.0 | Homebrew · PyPI bithuman-cli · universal installer | v7

Engine ABI history

ABI | Introduced | Notes
v7 | libessence 2.3.0 | Adds be_set_default_audio_encoder for fallback audio-encoder registration. Backwards-compatible with v6 callers.
v6 | libessence 1.16.0 | Streaming push-audio / pull-frame API. Current production baseline; covers every shipping SDK.
v5 and earlier | pre-1.16 | Retired: synchronous compose only, no streaming.

Skew policy

  • Patch skew within a minor (2.3.0 → 2.3.1) is always safe.
  • Minor skew (2.2.x → 2.3.x) is safe: minors only add APIs.
  • SDK / engine skew is safe across ABI versions that share a major. A 1.17.1 Kotlin AAR (ABI v6) talks to a 2.3.0 engine because v7 is additive on v6 — you simply can’t call v7-only entry points from the older binding.
  • Major skew on the Python side (1.x → 2.x PyPI) is not supported, because the 2.0 streaming API reshaped the surface. Pin the whole Python stack to one major.
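The rules above can be condensed into a small check. This is a sketch under stated assumptions rather than anything the SDKs ship: the function names are hypothetical, the retirement cutoff is taken from the ABI history table, and treating an SDK built against a newer ABI than the engine as incompatible is an inference from the additive-ABI rule, not something the policy states outright.

```python
RETIRED_BELOW = 6  # v5 and earlier are retired, per the ABI history


def abi_compatible(sdk_abi: int, engine_abi: int) -> bool:
    """True if an SDK built against sdk_abi can drive an engine at engine_abi."""
    if sdk_abi < RETIRED_BELOW:
        return False  # the SDK targets a retired ABI
    # Assumption: newer engines are additive over older ABIs, so the
    # engine must be at least as new as the ABI the SDK was built for.
    return engine_abi >= sdk_abi


def python_pin_ok(installed: str, required: str) -> bool:
    """Python stack rule: major versions must match (1.x and 2.x do not mix)."""
    return installed.split(".")[0] == required.split(".")[0]


print(abi_compatible(6, 7))              # Kotlin AAR (v6) on a v7 engine
print(python_pin_ok("1.9.0", "2.3.0"))   # major skew on the Python side
```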

Platform / device matrix

What ships in 2.3

Platform | CLI binary | Python wheel | Swift SDK | Kotlin SDK
macOS arm64 (M-series) | Homebrew + bithuman-cli wheel | bithuman (3.10–3.14) | SwiftPM | -
macOS x86_64 (Intel) | Pending | Pending (1.x was last) | - | -
Linux x86_64 | Tarball + bithuman-cli wheel | bithuman (manylinux) | - | -
Linux aarch64 | Tarball + bithuman-cli wheel | bithuman (manylinux) | - | -
Windows | Not shipping (use WSL2) | Not shipping (1.9.0 was last) | - | -
iOS / iPadOS | - | - | SwiftPM | -
Android | - | - | - | Maven Central ai.bithuman:sdk

macOS-Intel and Windows are tracked but not part of the 2.3 cut. The 1.x line still has Windows wheels and a macOS-Intel build if you’re stuck on either target.

Essence hardware floor

Host | Status | Notes
Apple M-series Mac | Real-time, large headroom | Any Apple Silicon (arm64)
iPhone 17 Pro+ | Real-time, smallest footprint | iOS 26
iPad Pro M4+ | Real-time | Pairs well with an on-device LLM
Android (arm64-v8a) | Real-time | Snapdragon 8 Gen 2+, Android 10+
Linux x86_64 / aarch64 | Real-time | Modern CPU + 4 GB RAM
Raspberry Pi 4B / 5 | Near real-time | Adequate for kiosks at modest FPS
Intel Mac / Windows | Pending | Use WSL2 or the 1.x wheel today

Expression hardware floor

Host | Status | Notes
Mac M3+ (arm64) | On-device | Demo app target
iPad Pro M4+ | On-device | Sized for 16 GB+ devices
iPhone | Not supported | Exceeds the iOS per-app memory budget; use Essence
Android | Not supported | Use Essence
Linux + NVIDIA GPU | Server | 8 GB+ VRAM via the Docker container
Mac Intel / Linux CPU / Windows / Raspberry Pi | Not supported | Requires Apple Silicon or NVIDIA GPU

Avatar resolutions

Resolution | Best for
384×384 | Mobile and edge (the default sweet spot)
512×512 | Mac and iPad Pro (comfortable on M-series)
1280×720 | Desktop and cloud streaming (default for the CLI and LiveKit plugin)

Frames are delivered at 1280×720 by every SDK; smaller avatars are letterboxed into that frame. All hosts produce identical frames — your device decision is about form factor, memory, and latency budget, not visual quality.
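To make the letterboxing concrete, the sketch below computes where a smaller avatar would sit inside the 1280×720 output. It assumes an aspect-preserving scale-to-fit with centered placement, which the text does not spell out; the engine may instead place the avatar unscaled. The function name `letterbox` is hypothetical.

```python
def letterbox(src_w: int, src_h: int, dst_w: int = 1280, dst_h: int = 720):
    """Return (scaled_w, scaled_h, x_offset, y_offset) for a centered fit.

    Assumption: the avatar is scaled uniformly until it touches the
    frame on one axis, then centered on the other.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    out_w, out_h = round(src_w * scale), round(src_h * scale)
    return out_w, out_h, (dst_w - out_w) // 2, (dst_h - out_h) // 2


print(letterbox(384, 384))    # (720, 720, 280, 0)
print(letterbox(1280, 720))   # (1280, 720, 0, 0)
```

Under these assumptions a square avatar fills the full frame height and leaves 280-pixel bars on the left and right.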

Authentication and billing

One credential drives every surface; only the env-var name differs by platform convention:

BITHUMAN_API_SECRET    # Python, Kotlin, REST API, CLI
BITHUMAN_API_KEY       # Swift (Apple convention)
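A host application that wants to accept either name could wrap the lookup as sketched here. The helper `load_credential` and the order in which it tries the two variables are assumptions for illustration; each SDK reads its own variable as listed above.

```python
import os
from typing import Mapping


def load_credential(env: Mapping[str, str] = os.environ) -> str:
    """Return the bitHuman credential from whichever variable is set."""
    # Assumption: prefer BITHUMAN_API_SECRET, fall back to BITHUMAN_API_KEY.
    for name in ("BITHUMAN_API_SECRET", "BITHUMAN_API_KEY"):
        value = env.get(name)
        if value:
            return value
    raise RuntimeError("set BITHUMAN_API_SECRET (or BITHUMAN_API_KEY)")


print(load_credential({"BITHUMAN_API_KEY": "example-key"}))
```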

The SDK never holds the long-lived secret in process memory — it exchanges the secret for a short-lived runtime token at startup, auto-renewing on the billing heartbeat. Failed heartbeats trigger a 5-minute offline grace window before the avatar pauses. Audio-only mode (no attached avatar) is fully offline and bills nothing. See Pricing.
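The grace-window behavior can be modeled as a small state tracker. This is a sketch of the described policy only, not the SDK's implementation: the class `HeartbeatGrace` is hypothetical, and starting the window at the first failed heartbeat (and resetting it on any success) is an assumption.

```python
from typing import Optional

GRACE_SECONDS = 5 * 60  # 5-minute offline grace window, from the text


class HeartbeatGrace:
    """Tracks when a run of failed heartbeats should pause the avatar."""

    def __init__(self) -> None:
        self.first_failure_at: Optional[float] = None

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.first_failure_at = None   # a success resets the window
        elif self.first_failure_at is None:
            self.first_failure_at = now    # window opens at the first failure

    def should_pause(self, now: float) -> bool:
        if self.first_failure_at is None:
            return False
        return now - self.first_failure_at >= GRACE_SECONDS


grace = HeartbeatGrace()
grace.record(ok=False, now=0.0)
print(grace.should_pause(now=299.0), grace.should_pause(now=300.0))
```

Passing the clock in as `now` keeps the tracker deterministic, so the timing rule can be exercised without waiting five real minutes.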

Where to go next