bitHumanKit ships an on-device runtime for the Essence avatar model
alongside the existing on-device Expression runtime. One Swift
Package, one API to pick between the two —
`Bithuman.createRuntime(modelPath:)` inspects the file you pass and
hands back the right runtime.
## What is Essence on Swift?
Essence on Swift is a 720p+ on-device avatar runtime. It plays back
the pre-rendered base movement baked into your `.imx` model and
applies real-time, audio-driven lip patches to it. The heavy lifting
runs on CPU + ANE: a small audio encoder runs on the Neural Engine
through Metal (MLX), and the renderer composites lip patches over the
decoded base movement on CPU. There is no on-device DiT — only a
compact lip-patch model.
That makes Essence on Swift a different shape from Expression on Swift:
| | Essence (Swift) | Expression (Swift) |
|---|---|---|
| What renders | Pre-rendered base movement + audio-driven lip patches | Diffusion-generated facial animation from a portrait |
| Avatar source | `.imx` model file (built from your video on the dashboard) | Any portrait image — no build step |
| Resolution | 720p+ | 384×384 |
| Custom gestures | Baked into the `.imx` | No |
| Runtime cost | 1 cr/min on-device | 2 cr/min on-device |
| Memory footprint | Lower — no DiT in memory | Higher — DiT weights resident |
| Best for | Branded characters, kiosks, polished playback | Dynamic faces, drag-drop swap, conversational micro-expression |
## Quickstart
### Get an `.imx` from the bitHuman dashboard
Sign in at https://www.bithuman.ai → Agents → New Agent, pick the
Essence model, upload your source video, and download the resulting
`.imx` once generation finishes. Drop it in your app bundle (or
download it on first launch and cache it on disk).

If you don’t have a video yet, the dashboard ships a handful of
royalty-free Essence agents you can use as placeholders.

### Add the package
bitHumanKit is a single Swift Package — the same dependency as the
Expression quickstart. In an Xcode project: File → Add Package
Dependencies → paste the URL → click “Add Package”. Or declare the
dependency in `Package.swift`.
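A sketch of the manifest entry; the package URL, version, and
platform floors below are placeholders, not official coordinates
(the real hardware gate happens at runtime, see Hardware):

```swift
// swift-tools-version:5.9
// Package.swift — the URL and version are assumptions; substitute
// the official bitHumanKit package URL from the docs.
import PackageDescription

let package = Package(
    name: "MyAvatarApp",
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        .package(url: "https://github.com/bithuman/bitHumanKit.git", from: "1.0.0")
    ],
    targets: [
        .executableTarget(
            name: "MyAvatarApp",
            dependencies: ["bitHumanKit"]
        )
    ]
)
```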
### Boot the runtime

`Bithuman.createRuntime(modelPath:)` returns a sum type — switch on
it and drive whichever runtime came back. The same call site handles
both `.imx` (Essence) and Expression weight bundles.
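A minimal call site, assuming `BithumanRuntime` exposes `.essence`
and `.expression` cases and that `resolution` is a `CGSize` (neither
is pinned down in this section; check the DocC reference). The model
filename is a placeholder:

```swift
import Foundation
import QuartzCore
import bitHumanKit

/// Sketch only: the enum case names, the CGSize assumption, and the
/// Expression-side handling are assumptions, not the documented API.
@MainActor
func bootAvatar(into renderer: CALayer) throws {
    // An .imx yields the Essence runtime; an Expression weight
    // bundle comes back as Expression from the same call.
    let modelPath = Bundle.main.path(forResource: "brand-avatar", ofType: "imx")!

    switch try Bithuman.createRuntime(modelPath: modelPath) {
    case .essence(let essence):
        // Essence frames are full-resolution CGImages; size the
        // host layer to the model's native pixel size.
        renderer.frame.size = essence.resolution
        Task {
            for await frame in essence.frames() {
                // nil means "idle": keep the last frame on screen.
                if let frame { renderer.contents = frame }
            }
        }
    case .expression(let expression):
        // Drive Expression exactly as in the Expression quickstart
        // (its API is unchanged; omitted here).
        _ = expression
    }
}
```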
`renderer` here is whatever you already use to display frames — a
`CALayer`, an `NSImageView` / `UIImageView`, or the existing
`AvatarRendererView` from the Expression quickstart. Essence frames
are full-resolution `CGImage`s, so size your host view to match
`essence.resolution`.

## API surface
Full DocC reference: https://docs.bithuman.ai/swift-sdk/overview.
The signatures below are the minimum you need to integrate Essence.
### `Bithuman.createRuntime(modelPath:)`

Inspects the model file at `modelPath` and returns the matching
runtime. Throws if the file is missing, malformed, or asks for
hardware the device can’t satisfy (see Hardware).
### `EssenceRuntime`

- `pushAudio(_:)` — feed 16 kHz mono PCM as it arrives. Safe to call
  from any actor; back-pressure is handled internally.
- `frames()` — an `AsyncStream` of `CGImage?` at the model’s native
  frame rate. A `nil` element means “render the idle frame” — keep
  your last frame on screen or composite a static idle. Don’t blank
  the view on `nil`.
- `stop()` — cancels the audio encoder, drains the frame stream, and
  releases ANE resources. Call this when the user leaves the
  conversation screen.
- `resolution` — the native pixel size of the loaded `.imx`
  (typically 720p or higher). Size your renderer to match or scale
  with `CALayer`’s `contentsGravity`.
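`pushAudio(_:)`’s exact parameter type isn’t shown above; this sketch
assumes it accepts an `AVAudioPCMBuffer` of 16 kHz mono PCM. Adapt
the final call if it takes raw samples instead:

```swift
import AVFoundation
import bitHumanKit

/// Sketch: tap the microphone, downsample to 16 kHz mono Int16 PCM,
/// and push it to the Essence runtime as it arrives.
func streamMicrophone(into essence: EssenceRuntime) throws {
    let engine = AVAudioEngine()
    let input = engine.inputNode
    let inputFormat = input.outputFormat(forBus: 0)
    let targetFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                     sampleRate: 16_000,
                                     channels: 1,
                                     interleaved: true)!
    let converter = AVAudioConverter(from: inputFormat, to: targetFormat)!

    input.installTap(onBus: 0, bufferSize: 1024, format: inputFormat) { buffer, _ in
        let ratio = targetFormat.sampleRate / inputFormat.sampleRate
        let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio)
        guard let converted = AVAudioPCMBuffer(pcmFormat: targetFormat,
                                               frameCapacity: capacity) else { return }
        var fed = false
        _ = converter.convert(to: converted, error: nil) { _, status in
            // Hand the tap buffer to the converter exactly once.
            if fed {
                status.pointee = .noDataNow
                return nil
            }
            fed = true
            status.pointee = .haveData
            return buffer
        }
        // Safe from any thread; the runtime handles back-pressure.
        essence.pushAudio(converted)
    }
    try engine.start()
}
```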
## Hardware
Essence on Swift is hardware-gated at runtime via
`HardwareCheck.evaluate()`. Phase 1 supports:
| Platform | Minimum | Notes |
|---|---|---|
| macOS | M3+ Apple Silicon, macOS 26 | Recommended development target |
| iPad | iPad Pro M4+, 16 GB unified memory, iPadOS 26 | Requires the increased-memory-limit entitlement |
| iPhone | Not supported in Phase 1 | Memory budget too tight for 720p+ pipelines — see Roadmap |
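If you ship both models, you can keep unsupported devices on
Expression with a pre-flight gate. A sketch that leans on
`createRuntime` throwing on unsupported hardware (per above);
`expressionBundlePath` is a hypothetical path to your Expression
weights:

```swift
import bitHumanKit

// Prefer Essence; fall back to Expression when the device can't
// satisfy Essence's hardware floor (e.g. iPhone in Phase 1).
func makeRuntime(imxPath: String, expressionBundlePath: String) throws -> BithumanRuntime {
    do {
        return try Bithuman.createRuntime(modelPath: imxPath)
    } catch {
        // createRuntime throws when hardware requirements aren't met.
        return try Bithuman.createRuntime(modelPath: expressionBundlePath)
    }
}
```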
## Essence vs Expression on Swift
When deciding, the question is usually “do I want a baked character
or a swappable face?”:

| Pick Essence when | Pick Expression when |
|---|---|
| You ship a branded character your users don’t customise | You let users drag-drop their own face |
| You want 720p+ visual fidelity | 384² fits your UI (PiP, widget, side panel) |
| Your `.imx` already has the gestures you want | You want diffusion-driven micro-expression |
| You want the lower 1 cr/min rate | 2 cr/min is fine for your unit economics |
Either way, it’s the same `BithumanRuntime` sum type: pick a model
file, get a runtime back, drive it.
## Reference apps
The bithuman-apps repo holds Mac, iPad, and iPhone reference apps
that consume the SDK.

## Limitations & roadmap
- iPhone is deferred to Phase 2. 720p+ playback plus the audio encoder doesn’t fit the iPhone memory budget without streaming the base movement off disk. Phase 2 will add disk-streamed playback and re-enable the iPhone target.
- No runtime face swap. Essence’s identity is baked into the `.imx`
  at generation time. To switch faces, generate (or download) a
  different `.imx` and re-call `Bithuman.createRuntime(modelPath:)`
  (see the sketch after this list). If you want runtime face swap on
  Apple Silicon, use Expression instead.
- Action triggers and video graph (gestures, transitions) are
  Phase 2. Phase 1 plays the bundled base movement and applies lip
  patches; keyword-triggered gestures and explicit transition control
  land in a follow-up release.
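A sketch of that swap, reusing the assumed case names from the boot
example above; `stop()` releases the old runtime’s ANE resources
before the new `.imx` loads:

```swift
import bitHumanKit

/// Swap characters by tearing down the old Essence runtime and
/// booting a new .imx — identity can't change in place.
func swapCharacter(from old: EssenceRuntime, toModelAt path: String) throws -> EssenceRuntime {
    // Cancel the encoder, drain frames, release ANE resources.
    old.stop()
    guard case .essence(let fresh) = try Bithuman.createRuntime(modelPath: path) else {
        fatalError("expected an .imx (Essence) model at \(path)")
    }
    return fresh
}
```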
## Next
- Quickstart — the Expression on-device walkthrough; the package install and entitlements section apply identically to Essence.
- Essence vs Expression — the cross-platform comparison (cloud + Python + Swift).
- Pricing — credit rates per model and surface.
- Reference apps — source for the Mac, iPad, and iPhone reference apps.
