Swift SDK
On-device, real-time, lip-synced avatars for iOS, iPadOS, and macOS via SwiftPM. Apple Silicon only. GA.
Overview
The Swift SDK drops a real-time, lip-synced avatar into a native Mac, iPad, or
iPhone app. Audio in (16 kHz mono PCM), CGImage / BGR frames out at 25 FPS.
All inference runs on-device; a once-per-minute billing heartbeat meters
avatar mode (audio-only is unmetered). This SDK is GA.
Note Swift has two surfaces today, both on the same
libessenceengine:
bitHumanKit— the currently-published SwiftPM package. Full-stack and on-device: a high-levelVoiceChatagent (built-in STT + LLM + TTS) plus a low-level streaming runtime. This is what you install today.Bithuman— a newer binding that maps directly onto thelibessencestreaming engine (Fixture/Runtime/pushAudio/pullFrame). It is rolling out; both are documented below.
It ships as a single binary XCFramework (macOS arm64 + iOS device + simulator) with zero transitive SwiftPM dependencies — everything is statically linked.
Install — bitHumanKit (published)
In Xcode: File → Add Package Dependencies… → paste the URL → pick the latest
tag → attach bitHumanKit to your target.
// Package.swift
dependencies: [
.package(url: "https://github.com/bithuman-product/bithuman-sdk-public.git",
from: "0.8.1")
],
targets: [
.target(name: "MyApp",
dependencies: [.product(name: "bitHumanKit", package: "bithuman")])
]
The SwiftPM package was renamed bithuman-sdk-public → bithuman in Wave 8. The
import stays import bitHumanKit — only the package: field changes. Do not pin
below 0.8.1 (older versions are unsupported).
Auth: export BITHUMAN_API_KEY (Apple convention) or set
VoiceChatConfig.apiKey. Get a secret at Developer → API
Keys.
Two API surfaces in bitHumanKit
Both ship in the same package.
VoiceChat— high-level voice agent with built-in speech recognition, LLM, and TTS. Fastest way to ship a talking on-device assistant.Bithuman.createRuntime+EssenceRuntime.frames()— low-level streaming: push your own PCM, pullCGImageframes. Use when you bring your own audio (WebRTC, custom TTS).
Low-level streaming (bring your own audio)
import bitHumanKit
let imxURL = Bundle.main.url(forResource: "avatar", withExtension: "imx")!
let runtime = try await Bithuman.createRuntime(modelPath: imxURL)
guard case .essence(let essence) = runtime else { return }
// Drain frames at 25 fps → your SwiftUI view.
Task {
for await frame in essence.frames() {
if let frame { renderer.present(frame) }
}
}
// Push 16 kHz mono PCM as it arrives.
try await essence.pushAudio(pcmChunk)
This is the Apple expression of the audio-streaming push/drain loop.
High-level voice agent (audio-only, unmetered)
import SwiftUI
import bitHumanKit
@MainActor
final class Lifecycle: ObservableObject {
@Published var status = "booting…"
private var chat: VoiceChat?
func start() async {
var config = VoiceChatConfig()
config.localeIdentifier = "en-US"
config.systemPrompt = "You are a helpful assistant. One sentence per turn."
config.voice = .preset("Aiden")
do {
let chat = VoiceChat(config: config)
try await chat.start()
self.chat = chat
status = "live — talk to me"
} catch { status = "error: \(error.localizedDescription)" }
}
}
Say “hello” — it transcribes, thinks, and replies via TTS. Fully offline, no API
key for audio-only. To add the lip-synced avatar, set
config.avatar = AvatarConfig(modelPath:portraitPath:) and host an
AvatarRendererView — return the same renderer instance from both
makeXxxView and updateXxxView (SwiftUI rebuilds the parent many times per
second and the renderer must persist).
The Bithuman libessence binding (rolling out)
The newer binding exposes the libessence streaming surface directly — the same
Fixture / Runtime shape as the Python and Kotlin SDKs. Install via SwiftPM
against the bithuman-sdk repo and import Bithuman:
dependencies: [
.package(url: "https://github.com/bithuman-product/bithuman-sdk", from: "1.16.0")
],
targets: [
.target(name: "MyApp",
dependencies: [.product(name: "Bithuman", package: "bithuman-sdk")])
]
Push PCM whenever it arrives; pull a composed BGR frame whenever you want one:
import Bithuman
let fixture = try Fixture(path: modelURL.path) // load weights once
let runtime = try Runtime(fixture: fixture) // cheap per session
var bgr = [UInt8](repeating: 0, count: fixture.info.bgrFrameByteCount)
// Feed any amount of 16 kHz mono Float32 PCM (incremental, O(n_new)).
try runtime.pushAudio(audioPCM: pcm16kMonoFloat32)
// Drain however many ticks have become available.
while runtime.ticksAvailable > 0 {
_ = try bgr.withUnsafeMutableBufferPointer {
try runtime.pullFrame(frameIdxHint: -1, frameOut: $0)
}
// Hand `bgr` to SwiftUI / UIKit / AppKit.
}
// At end-of-utterance / when the conversation pauses:
try runtime.resetStream()
For multi-conversation hosting, share a single Fixture (~344 MB) across many
lightweight Runtimes (~36 MB each) — about 5.6× memory efficiency at N=10
concurrent streams on M5. A Runtime is not internally synchronized; pin
push/pull to one thread or wrap in your own mutex.
let fixture = try Fixture(path: modelURL.path)
let convoA = try Runtime(fixture: fixture)
let convoB = try Runtime(fixture: fixture)
The binding targets Swift 6 (strict-concurrency clean), swift-tools-version 6.0,
and wraps libessence ABI v6. BithumanInfo.libraryVersion /
BithumanInfo.abiVersion report the linked engine version.
Permissions + entitlements
Info.plist (all platforms):
<key>NSMicrophoneUsageDescription</key><string>Talk to your assistant.</string>
<key>NSSpeechRecognitionUsageDescription</key><string>Recognise what you say.</string>
Without these, chat.start() fails silently (the OS denies and remembers).
Sandboxed Mac apps also need com.apple.security.device.audio-input in
.entitlements.
Warning The iOS increased-memory entitlement is mandatory. Without it, iOS kills your app mid-conversation (~30 s into a turn) when memory exceeds the default ~3 GB ceiling. Request approval before development — Apple takes 1–3 business days.
<key>com.apple.developer.kernel.increased-memory-limit</key><true/> <key>com.apple.developer.kernel.extended-virtual-addressing</key><true/>Request at developer.apple.com → Account → Membership → Request Additional Capabilities.
Audio-only keyless mode
The high-level VoiceChat agent runs fully on-device and needs no API key for
audio-only use — STT, LLM, and TTS all run locally and audio-only mode is
unmetered. You only need a key (and the billing heartbeat fires) once you add the
lip-synced avatar.
Hardware floor
The SDK gates this at runtime — under-spec devices get a polite refusal screen,
not a half-loaded engine. Use HardwareCheck.evaluate() to branch your SwiftUI
root and show your own UnsupportedDeviceView for .unsupported(reason).
| Essence | Expression | |
|---|---|---|
| macOS | M3+, macOS 26 | M3+, macOS 26 |
| iPadOS | iPad Pro M4+, iPadOS 26 | iPad Pro M4+, 16 GB, iPadOS 26 |
| iPhone | iPhone 16 Pro+ (A18 Pro) | Not supported — use Essence |
Requires Xcode 26+ (older Xcodes reject the Swift 6 concurrency syntax).
Expression on Apple Silicon auto-spawns a bithuman-expression-daemon
subprocess; on unsupported hardware it raises ExpressionModelNotSupported — not
a crash. See models.
Performance
Measured on an M5 MacBook Pro (libessence 1.16.0, single conversation):
| Metric | Value |
|---|---|
| Per-tick mean | 1.43 ms |
| Per-tick p99 | 1.51 ms |
| Sustained (tight loop) | 698 FPS |
| Cold start | ~290 ms |
| Peak RSS | ~84 MB |
| Wrapper overhead vs raw libessence | +1.7 % |
Comfortable headroom over the 25 FPS / 40 ms tick budget.
Troubleshooting
chat.start() fails silently
Missing Info.plist privacy strings — the OS denies mic / speech and caches the
denial for the session.
App killed ~30 s into a conversation (iOS)
Missing the increased-memory-limit entitlement. See the warning above — it must be approved by Apple before it takes effect.
Avatar disappears on re-render
Return the same AvatarRendererView instance from both makeXxxView and
updateXxxView. SwiftUI rebuilds the parent constantly; a fresh renderer each
time means a vanishing avatar.
Under-spec device shows a refusal screen
Working as intended. Branch on HardwareCheck.evaluate().
See also
- SDK overview — which SDK to pick
- LiveKit (Apple) — connect a native app to a cloud-hosted avatar
- Models — Essence vs Expression
- CLI — no-code Mac terminal tool, same engine