bitHumanKit ships an on-device runtime for the Essence avatar model
alongside the existing on-device Expression runtime. One Swift
Package, one API to pick between the two —
`Bithuman.createRuntime(modelPath:)` inspects the file you pass and
hands back the right runtime.
## What is Essence on Swift?
Essence on Swift is a 720p+ on-device avatar runtime. It plays back
the pre-rendered base movement baked into your `.imx` model and
applies real-time, audio-driven lip patches to it. The heavy lifting
runs on CPU + ANE: a small audio encoder runs on the Neural Engine
through Metal (MLX), and the renderer composites lip patches over the
decoded base movement on CPU. There is no on-device DiT — only a
compact lip-patch model.
That makes Essence on Swift a different shape from Expression on Swift:
| | Essence (Swift) | Expression (Swift) |
|---|---|---|
| What renders | Pre-rendered base movement + audio-driven lip patches | Diffusion-generated facial animation from a portrait |
| Avatar source | `.imx` model file (built from your video on the dashboard) | Any portrait image — no build step |
| Resolution | 720p+ | 384×384 |
| Custom gestures | Baked into the `.imx` | No |
| Runtime cost | 1 cr/min on-device | 2 cr/min on-device |
| Memory footprint | Lower — no DiT in memory | Higher — DiT weights resident |
| Best for | Branded characters, kiosks, polished playback | Dynamic faces, drag-drop swap, conversational micro-expression |
## Quickstart
### Get an `.imx` from the bitHuman dashboard
Sign in at https://www.bithuman.ai → Agents → New Agent, pick the
Essence model, upload your source video, and download the resulting
`.imx` once generation finishes. Drop it in your app bundle (or
download it on first launch and cache it on disk).

If you don’t have a video yet, the dashboard ships a handful of
royalty-free Essence agents you can use as placeholders.

### Add the package
bitHumanKit is a single Swift Package — the same dependency as the
Expression quickstart. In an Xcode project: File → Add Package
Dependencies → paste the URL → click “Add Package”. Or declare the
dependency in `Package.swift`.
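A sketch of the manifest entry; the package URL, version, and
platform floors below are placeholders, not official coordinates
(the real hardware gate happens at runtime, see Hardware):

```swift
// swift-tools-version:5.9
// Package.swift — the URL and version are assumptions; substitute
// the official bitHumanKit package URL from the docs.
import PackageDescription

let package = Package(
    name: "MyAvatarApp",
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        .package(url: "https://github.com/bithuman/bitHumanKit.git", from: "1.0.0")
    ],
    targets: [
        .executableTarget(
            name: "MyAvatarApp",
            dependencies: ["bitHumanKit"]
        )
    ]
)
```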
### Boot the runtime

`Bithuman.createRuntime(modelPath:)` returns a sum type — switch on
it and drive whichever runtime came back. The same call site handles
both `.imx` (Essence) and Expression weight bundles.
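A minimal call site, assuming `BithumanRuntime` exposes `.essence`
and `.expression` cases and that `resolution` is a `CGSize` (neither
is pinned down in this section; check the DocC reference). The model
filename is a placeholder:

```swift
import Foundation
import QuartzCore
import bitHumanKit

/// Sketch only: the enum case names, the CGSize assumption, and the
/// Expression-side handling are assumptions, not the documented API.
@MainActor
func bootAvatar(into renderer: CALayer) throws {
    // An .imx yields the Essence runtime; an Expression weight
    // bundle comes back as Expression from the same call.
    let modelPath = Bundle.main.path(forResource: "brand-avatar", ofType: "imx")!

    switch try Bithuman.createRuntime(modelPath: modelPath) {
    case .essence(let essence):
        // Essence frames are full-resolution CGImages; size the
        // host layer to the model's native pixel size.
        renderer.frame.size = essence.resolution
        Task {
            for await frame in essence.frames() {
                // nil means "idle": keep the last frame on screen.
                if let frame { renderer.contents = frame }
            }
        }
    case .expression(let expression):
        // Drive Expression exactly as in the Expression quickstart
        // (its API is unchanged; omitted here).
        _ = expression
    }
}
```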
`renderer` here is whatever you already use to display frames — a
`CALayer`, an `NSImageView` / `UIImageView`, or the existing
`AvatarRendererView` from the Expression quickstart. Essence frames
are full-resolution `CGImage`s, so size your host view to match
`essence.resolution`.

## API surface
Full DocC reference: https://docs.bithuman.ai/swift-sdk/overview.
The signatures below are the minimum you need to integrate Essence.
### `Bithuman.createRuntime(modelPath:)`

Inspects the model file at `modelPath` and returns the matching
runtime. Throws if the file is missing, malformed, or asks for
hardware the device can’t satisfy (see Hardware).
### `EssenceRuntime`

- `pushAudio(_:)` — feed 16 kHz mono PCM as it arrives. Safe to call
  from any actor; back-pressure is handled internally.
- `frames()` — an `AsyncStream` of `CGImage?` at the model’s native
  frame rate. A `nil` element means “render the idle frame” — keep
  your last frame on screen or composite a static idle. Don’t blank
  the view on `nil`.
- `stop()` — cancels the audio encoder, drains the frame stream, and
  releases ANE resources. Call this when the user leaves the
  conversation screen.
- `resolution` — the native pixel size of the loaded `.imx`
  (typically 720p or higher). Size your renderer to match or scale
  with `CALayer`’s `contentsGravity`.
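`pushAudio(_:)`’s exact parameter type isn’t shown above; this sketch
assumes it accepts an `AVAudioPCMBuffer` of 16 kHz mono PCM. Adapt
the final call if it takes raw samples instead:

```swift
import AVFoundation
import bitHumanKit

/// Sketch: tap the microphone, downsample to 16 kHz mono Int16 PCM,
/// and push it to the Essence runtime as it arrives.
func streamMicrophone(into essence: EssenceRuntime) throws {
    let engine = AVAudioEngine()
    let input = engine.inputNode
    let inputFormat = input.outputFormat(forBus: 0)
    let targetFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                     sampleRate: 16_000,
                                     channels: 1,
                                     interleaved: true)!
    let converter = AVAudioConverter(from: inputFormat, to: targetFormat)!

    input.installTap(onBus: 0, bufferSize: 1024, format: inputFormat) { buffer, _ in
        let ratio = targetFormat.sampleRate / inputFormat.sampleRate
        let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio)
        guard let converted = AVAudioPCMBuffer(pcmFormat: targetFormat,
                                               frameCapacity: capacity) else { return }
        var fed = false
        _ = converter.convert(to: converted, error: nil) { _, status in
            // Hand the tap buffer to the converter exactly once.
            if fed {
                status.pointee = .noDataNow
                return nil
            }
            fed = true
            status.pointee = .haveData
            return buffer
        }
        // Safe from any thread; the runtime handles back-pressure.
        essence.pushAudio(converted)
    }
    try engine.start()
}
```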
## Hardware
Essence on Swift is hardware-gated at runtime via
`HardwareCheck.evaluate()`. Phase 1 supports:
| Platform | Minimum | Notes |
|---|---|---|
| macOS | M3+ Apple Silicon, macOS 26 | Recommended development target |
| iPad | iPad Pro M4+, 16 GB unified memory, iPadOS 26 | Requires the increased-memory-limit entitlement |
| iPhone | Not supported in Phase 1 | Memory budget too tight for 720p+ pipelines — see Roadmap |
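If you ship both models, you can keep unsupported devices on
Expression with a pre-flight gate. A sketch that leans on
`createRuntime` throwing on unsupported hardware (per above);
`expressionBundlePath` is a hypothetical path to your Expression
weights:

```swift
import bitHumanKit

// Prefer Essence; fall back to Expression when the device can't
// satisfy Essence's hardware floor (e.g. iPhone in Phase 1).
func makeRuntime(imxPath: String, expressionBundlePath: String) throws -> BithumanRuntime {
    do {
        return try Bithuman.createRuntime(modelPath: imxPath)
    } catch {
        // createRuntime throws when hardware requirements aren't met.
        return try Bithuman.createRuntime(modelPath: expressionBundlePath)
    }
}
```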
## Essence vs Expression on Swift
When deciding, the question is usually “do I want a baked character
or a swappable face?”:

| Pick Essence when | Pick Expression when |
|---|---|
| You ship a branded character your users don’t customise | You let users drag-drop their own face |
| You want 720p+ visual fidelity | 384² fits your UI (PiP, widget, side panel) |
| Your `.imx` already has the gestures you want | You want diffusion-driven micro-expression |
| You want the lower 1 cr/min rate | 2 cr/min is fine for your unit economics |
Either way, it’s the same `BithumanRuntime` sum type: pick a model
file, get a runtime back, drive it.
## Reference apps
The bithuman-apps repo holds Mac, iPad, and iPhone reference apps
that consume the SDK.

## Limitations & roadmap
- iPhone is deferred to Phase 2. 720p+ playback plus the audio encoder doesn’t fit the iPhone memory budget without streaming the base movement off disk. Phase 2 will add disk-streamed playback and re-enable the iPhone target.
- No runtime face swap. Essence’s identity is baked into the `.imx`
  at generation time. To switch faces, generate (or download) a
  different `.imx` and re-call `Bithuman.createRuntime(modelPath:)`
  (see the sketch after this list). If you want runtime face swap on
  Apple Silicon, use Expression instead.
- Action triggers and video graph (gestures, transitions) are
  Phase 2. Phase 1 plays the bundled base movement and applies lip
  patches; keyword-triggered gestures and explicit transition control
  land in a follow-up release.
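A sketch of that swap, reusing the assumed case names from the boot
example above; `stop()` releases the old runtime’s ANE resources
before the new `.imx` loads:

```swift
import bitHumanKit

/// Swap characters by tearing down the old Essence runtime and
/// booting a new .imx — identity can't change in place.
func swapCharacter(from old: EssenceRuntime, toModelAt path: String) throws -> EssenceRuntime {
    // Cancel the encoder, drain frames, release ANE resources.
    old.stop()
    guard case .essence(let fresh) = try Bithuman.createRuntime(modelPath: path) else {
        fatalError("expected an .imx (Essence) model at \(path)")
    }
    return fresh
}
```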
## Next
- Quickstart — the Expression on-device walkthrough; the package install and entitlements section apply identically to Essence.
- Essence vs Expression — the cross-platform comparison (cloud + Python + Swift).
- Pricing — credit rates per model and surface.
- Reference apps — source for the Mac, iPad, and iPhone reference apps.
