
bitHumanKit ships an on-device runtime for the Essence avatar model alongside the existing on-device Expression runtime. One Swift Package, one API to pick between the two — Bithuman.createRuntime(modelPath:) inspects the file you pass and hands back the right runtime.

What is Essence on Swift?

Essence on Swift is a 720p+ on-device avatar runtime. It plays back the pre-rendered base movement baked into your .imx model and applies real-time, audio-driven lip patches to it. The heavy lifting runs on CPU + ANE: a small audio encoder runs on the Neural Engine through Metal (MLX), and the renderer composites lip patches over the decoded base movement on CPU. There is no on-device DiT — only a compact lip-patch model. That makes Essence on Swift a different shape from Expression on Swift:
|  | Essence (Swift) | Expression (Swift) |
| --- | --- | --- |
| What renders | Pre-rendered base movement + audio-driven lip patches | Diffusion-generated facial animation from a portrait |
| Avatar source | .imx model file (built from your video on the dashboard) | Any portrait image — no build step |
| Resolution | 720p+ | 384×384 |
| Custom gestures | Baked into the .imx | No |
| Runtime cost | 1 cr/min on-device | 2 cr/min on-device |
| Memory footprint | Lower — no DiT in memory | Higher — DiT weights resident |
| Best for | Branded characters, kiosks, polished playback | Dynamic faces, drag-drop swap, conversational micro-expression |
Different trade-offs — pick the one that matches your app. You can also ship both and let the user switch.
Pricing. Essence on-device is 1 credit per active minute and Expression on-device is 2 credits per active minute, billed via a 1-request-per-minute heartbeat to api.bithuman.ai. See pricing.

Quickstart

1. Get an .imx from the bitHuman dashboard

Sign in at https://www.bithuman.ai, go to Agents → New Agent, pick the Essence model, upload your source video, and download the resulting .imx once generation finishes. Drop it in your app bundle, or download it on first launch and cache it on disk, as sketched below. If you don’t have a video yet, the dashboard ships a handful of royalty-free Essence agents you can use as placeholders.
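Resolving the model path at launch is plain Foundation work. Here is a minimal sketch that prefers a bundled copy and otherwise downloads once and caches on disk; the file name avatar.imx and the hosting URL are placeholders, not bitHuman endpoints:

import Foundation

// Prefer a bundled .imx; otherwise download it once and cache it.
// "avatar.imx" and the remote URL are placeholders for your own.
func essenceModelURL() async throws -> URL {
    if let bundled = Bundle.main.url(forResource: "avatar", withExtension: "imx") {
        return bundled
    }
    let caches = try FileManager.default.url(for: .cachesDirectory,
                                             in: .userDomainMask,
                                             appropriateFor: nil,
                                             create: true)
    let cached = caches.appendingPathComponent("avatar.imx")
    if FileManager.default.fileExists(atPath: cached.path) {
        return cached
    }
    let remote = URL(string: "https://example.com/models/avatar.imx")!  // placeholder host
    let (downloaded, _) = try await URLSession.shared.download(from: remote)
    try FileManager.default.moveItem(at: downloaded, to: cached)
    return cached
}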
2. Add the package

bitHumanKit is a single Swift Package — same dependency as the Expression quickstart. In an Xcode project: File → Add Package Dependencies → paste the URL → click “Add Package”:
https://github.com/bithuman-product/bithuman-kit-public.git
In a Package.swift:
dependencies: [
    .package(url: "https://github.com/bithuman-product/bithuman-kit-public.git",
             from: "0.8.1")
],
targets: [
    .target(
        name: "MyApp",
        dependencies: [
            .product(name: "bitHumanKit", package: "bithuman-kit-public")
        ]
    )
]
3. Boot the runtime

Bithuman.createRuntime(modelPath:) returns a sum type — switch on it and drive whichever runtime came back. The same call site handles both .imx (Essence) and Expression weight bundles.
import bitHumanKit

let runtime = try await Bithuman.createRuntime(modelPath: imxURL)

switch runtime {
case .essence(let essence):
    // 720p+ pre-rendered + lip-patch path.
    Task {
        for await frame in essence.frames() {
            // `frame` is `nil` when the runtime wants you to render
            // the idle frame — keep the last good CGImage on screen
            // or composite your own idle layer.
            await renderer.present(frame ?? idleFrame)
        }
    }
    // Push 16 kHz mono PCM as it arrives from your TTS / mic.
    try await essence.pushAudio(pcmChunk)

case .expression(let bithuman):
    // Existing Expression Bithuman actor — see /swift-sdk/quickstart.
    try await driveExpression(bithuman)
}
renderer here is whatever you already use to display frames — a CALayer, an NSImageView / UIImageView, or the existing AvatarRendererView from the Expression quickstart. Essence frames are full-resolution CGImages, so size your host view to match essence.resolution.
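If your audio source is the microphone rather than TTS, resample to 16 kHz mono before calling pushAudio(_:). Here is a sketch using AVFoundation; it assumes the runtime accepts 16-bit little-endian samples (this page only says “16 kHz mono PCM”, so check the sample format against the DocC reference):

import AVFoundation

// Taps the mic and emits 16 kHz mono Int16 PCM chunks.
// Error handling is elided; the Int16 format is an assumption.
final class MicCapture {
    private let engine = AVAudioEngine()

    func start(onChunk: @escaping (Data) -> Void) throws {
        let input = engine.inputNode
        let inFormat = input.outputFormat(forBus: 0)
        let outFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                      sampleRate: 16_000,
                                      channels: 1,
                                      interleaved: true)!
        let converter = AVAudioConverter(from: inFormat, to: outFormat)!

        input.installTap(onBus: 0, bufferSize: 1024, format: inFormat) { buffer, _ in
            let ratio = outFormat.sampleRate / inFormat.sampleRate
            let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 16
            guard let out = AVAudioPCMBuffer(pcmFormat: outFormat,
                                             frameCapacity: capacity) else { return }
            var served = false
            _ = converter.convert(to: out, error: nil) { _, status in
                if served { status.pointee = .noDataNow; return nil }
                served = true
                status.pointee = .haveData
                return buffer
            }
            guard let samples = out.int16ChannelData, out.frameLength > 0 else { return }
            onChunk(Data(bytes: samples[0],
                         count: Int(out.frameLength) * MemoryLayout<Int16>.size))
        }
        try engine.start()
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
    }
}

From the tap callback, hop into a Task and forward each chunk with try await essence.pushAudio(chunk); the runtime handles back-pressure internally.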

API surface

Full DocC reference: https://docs.bithuman.ai/swift-sdk/overview. The signatures below are the minimum you need to integrate Essence.

Bithuman.createRuntime(modelPath:)

public enum BithumanRuntime {
    case expression(Bithuman)
    case essence(EssenceRuntime)
}

extension Bithuman {
    public static func createRuntime(modelPath: URL) async throws -> BithumanRuntime
}
Inspects the file at modelPath and returns the matching runtime. Throws if the file is missing, malformed, or asks for hardware the device can’t satisfy (see Hardware).
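A plain do/catch is enough for a load screen. The thrown error types aren’t enumerated here, so this sketch handles them generically; showModelLoadFailure is a hypothetical UI hook:

import bitHumanKit

do {
    let runtime = try await Bithuman.createRuntime(modelPath: imxURL)
    // switch on `runtime` as in the quickstart above
} catch {
    // Missing file, malformed model, or an unsatisfied hardware
    // requirement all land here; match generically.
    showModelLoadFailure(String(describing: error))  // hypothetical UI hook
}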

EssenceRuntime

public actor EssenceRuntime {
    public var resolution: CGSize { get }

    public func pushAudio(_ pcm: Data) async throws
    public func frames() -> AsyncStream<CGImage?>
    public func stop() async
}
  • pushAudio(_:) — feed 16 kHz mono PCM as it arrives. Safe to call from any actor; back-pressure is handled internally.
  • frames() — an AsyncStream of CGImage? at the model’s native frame rate. A nil element means “render the idle frame” — keep your last frame on screen or composite a static idle. Don’t blank the view on nil.
  • stop() — cancels the audio encoder, drains the frame stream, and releases ANE resources. Call this when the user leaves the conversation screen.
  • resolution — the native pixel size of the loaded .imx (typically 720p or higher). Size your renderer to match or scale with CALayer’s contentsGravity.
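A minimal presenter that honours the nil-means-idle contract could look like this; FramePresenter and its CALayer backing are illustrative, not part of the SDK:

import QuartzCore

// Keeps the last good frame on screen when the stream yields `nil`.
// `FramePresenter` is illustrative, not part of bitHumanKit.
@MainActor
final class FramePresenter {
    let layer = CALayer()
    private var lastFrame: CGImage?

    init(resolution: CGSize) {
        layer.frame = CGRect(origin: .zero, size: resolution)
        layer.contentsGravity = .resizeAspect
    }

    func present(_ frame: CGImage?) {
        if let frame { lastFrame = frame }
        layer.contents = lastFrame  // never blank the view on `nil`
    }
}

Size it from essence.resolution and feed it from the frames() loop in the quickstart.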

Hardware

Essence on Swift is hardware-gated at runtime via HardwareCheck.evaluate(). Phase 1 supports:
| Platform | Minimum | Notes |
| --- | --- | --- |
| macOS | M3+ Apple Silicon, macOS 26 | Recommended development target |
| iPad | iPad Pro M4+, 16 GB unified memory, iPadOS 26 | Requires the increased-memory-limit entitlement |
| iPhone | Not supported in Phase 1 | Memory budget too tight for 720p+ pipelines — see Roadmap |
switch HardwareCheck.evaluate() {
case .supported:
    // boot the runtime
case .unsupported(let reason):
    // show a polite refusal screen; `reason` says what's missing
}
Under-spec devices see a refusal screen instead of a half-loaded engine. Don’t try to bypass the gate — the engine will OOM mid-turn and iOS will terminate your app.
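In practice you run the gate before touching the model file. One possible shape, assuming you’re happy to signal unsupported hardware with nil rather than a custom error:

import bitHumanKit

// Gate first, then load. Returning `nil` on unsupported hardware
// is one possible shape; a custom error works just as well.
func bootEssence(modelURL: URL) async throws -> EssenceRuntime? {
    guard case .supported = HardwareCheck.evaluate() else { return nil }
    guard case .essence(let essence) =
            try await Bithuman.createRuntime(modelPath: modelURL) else { return nil }
    return essence
}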

Essence vs Expression on Swift

When deciding, the question is usually “do I want a baked character or a swappable face?”:
| Pick Essence when | Pick Expression when |
| --- | --- |
| You ship a branded character your users don’t customise | You let users drag-drop their own face |
| You want 720p+ visual fidelity | 384² fits your UI (PiP, widget, side panel) |
| Your .imx already has the gestures you want | You want diffusion-driven micro-expression |
| You want the lower 1 cr/min rate | 2 cr/min is fine for your unit economics |
You can ship both runtimes in the same app — they’re both behind the same BithumanRuntime sum type. Pick a model file, get a runtime back, drive it.

Reference apps

The bithuman-apps repo holds Mac, iPad, and iPhone reference apps that consume the SDK.
Essence integration is being added to the reference apps. Watch the repo’s main branch — the Mac and iPad targets will gain a model picker (Essence vs Expression) once the integration commit lands. Until then the reference apps demonstrate the Expression quickstart end-to-end and the API call sites are identical.

Limitations & roadmap

  • iPhone is deferred to Phase 2. 720p+ playback plus the audio encoder doesn’t fit the iPhone memory budget without streaming the base movement off disk. Phase 2 will add disk-streamed playback and re-enable the iPhone target.
  • No runtime face swap. Essence’s identity is baked into the .imx at generation time. To switch faces, generate (or download) a different .imx and re-call Bithuman.createRuntime(modelPath:); a minimal swap sketch follows this list. If you want runtime face swap on Apple Silicon, use Expression instead.
  • Action triggers and video graph (gestures, transitions) are Phase 2. Phase 1 plays the bundled base movement and applies lip patches; keyword-triggered gestures and explicit transition control land in a follow-up release.
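The swap sketch referenced above: stop the old runtime to release its ANE resources, then boot the replacement. Names here are illustrative:

import bitHumanKit

// Tear down the old runtime before booting the replacement .imx.
func swapAvatar(from old: EssenceRuntime?, to newModel: URL) async throws -> EssenceRuntime? {
    await old?.stop()  // cancels audio, drains frames, frees ANE resources
    guard case .essence(let essence) =
            try await Bithuman.createRuntime(modelPath: newModel) else { return nil }
    return essence
}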

Next

  • Quickstart — the Expression on-device walkthrough; the package-install and entitlements sections apply identically to Essence.
  • Essence vs Expression — the cross-platform comparison (cloud + Python + Swift).
  • Pricing — credit rates per model and surface.
  • Reference apps — source for the Mac, iPad, and iPhone reference apps.