Swift / iOS — Hello, avatar
Boot a real-time, lip-synced on-device bitHuman voice agent on iPhone or iPad with SwiftUI and bitHumanKit.
Prerequisites
- A bitHuman API key, exposed to the app as BITHUMAN_API_KEY (Apple convention). Get one at Developer → API Keys; see Authentication.
- Xcode 26+ on a Mac, plus an Apple Developer account. Add the SwiftPM package:
.package(url: "https://github.com/bithuman-product/bithuman-sdk-public.git", from: "0.8.1")
- Device floor (real hardware; the Simulator can't run on-device inference): iPhone 16 Pro or later (A18 Pro+), or iPad Pro M4 or later, on iOS / iPadOS 26+. Earlier devices are refused at launch by HardwareCheck.evaluate().
- Apple-approved memory entitlements. Without them, iOS terminates the app mid-conversation. Request both before you start (Apple takes 1–3 business days): com.apple.developer.kernel.increased-memory-limit and com.apple.developer.kernel.extended-virtual-addressing.
Note: Swift / Apple is GA. The published SwiftPM package is bitHumanKit, a self-contained XCFramework with zero transitive dependencies. The iOS example below drives an Expression voice agent; Essence is the lighter option for lower-end devices (see Models).
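For a project that manages dependencies through its own Package.swift, the dependency line from the prerequisites might sit in a manifest like the sketch below. The package and target name MyAvatarApp is hypothetical, and the product name bitHumanKit is an assumption taken from the module name used in this guide; check the SDK's own manifest for the exact product name.

```swift
// swift-tools-version:5.9
// Package.swift: a hypothetical app package that depends on bitHumanKit.
import PackageDescription

let package = Package(
    name: "MyAvatarApp",                 // hypothetical name, not from the SDK
    platforms: [.iOS("26.0")],           // matches the iOS / iPadOS 26+ floor
    dependencies: [
        .package(url: "https://github.com/bithuman-product/bithuman-sdk-public.git",
                 from: "0.8.1")
    ],
    targets: [
        .executableTarget(
            name: "MyAvatarApp",
            dependencies: [
                // Assumption: the library product is called "bitHumanKit",
                // the same name as the module imported in the example code.
                .product(name: "bitHumanKit", package: "bithuman-sdk-public")
            ]
        )
    ]
)
```

In an Xcode app project, the same URL and version rule can instead be entered through File → Add Package Dependencies.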
Run it
- Open the example folder in Xcode (File → Open → select the folder containing Package.swift):
git clone https://github.com/bithuman-product/bithuman-sdk-public.git
open bithuman-sdk-public/Examples/swift/ios-avatar/Package.swift
- Set the API key in the scheme: Product → Scheme → Edit Scheme → Run → Arguments → Environment Variables, add BITHUMAN_API_KEY. Never hardcode it.
- Select a physical iPhone 16 Pro or iPad Pro M4+, then Build and Run.
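Because the key arrives through the scheme's environment, it can help to fail fast when it is missing rather than start a session without one. The helper below is a minimal sketch of that check; loadAPIKey and APIKeyError are hypothetical names for illustration and are not part of bitHumanKit.

```swift
import Foundation

/// Hypothetical error type for this sketch; not part of bitHumanKit.
enum APIKeyError: Error, Equatable {
    case missing
}

/// Returns the trimmed BITHUMAN_API_KEY from the given environment,
/// or throws APIKeyError.missing if it is absent or blank.
func loadAPIKey(
    from environment: [String: String] = ProcessInfo.processInfo.environment
) throws -> String {
    let raw = environment["BITHUMAN_API_KEY"] ?? ""
    let key = raw.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !key.isEmpty else { throw APIKeyError.missing }
    return key
}
```

Scheme environment variables are only present when the app is launched from Xcode, so a build started from the Home Screen would hit the missing-key path in this sketch.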
What you’ll see
On first launch the app downloads the Expression weights (~1.6 GB, cached), warms the model, then shows a live circular avatar that says “live — talk to me”. Speak and the avatar answers and lip-syncs the reply at 25 fps, fully on-device with sub-200 ms latency. Under-spec devices instead show an “unsupported device” screen.
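To put the two numbers above side by side, the arithmetic below converts the 25 fps render rate into a per-frame interval and compares it with the 200 ms latency bound. It is a back-of-the-envelope sketch only; how the SDK actually measures latency is not specified here.

```swift
// Figures quoted above: 25 fps rendering and a 200 ms latency bound.
let framesPerSecond = 25.0
let latencyBudgetMs = 200.0

// Time between consecutive avatar frames, in milliseconds.
let frameIntervalMs = 1000.0 / framesPerSecond

// Number of whole frame intervals that fit inside the latency bound.
let framesWithinBudget = Int(latencyBudgetMs / frameIntervalMs)

print("frame interval: \(frameIntervalMs) ms, frames within budget: \(framesWithinBudget)")
```

Each frame covers 40 ms, so a reply that starts within the stated bound begins no more than five frame intervals after the trigger.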
Full code
The minimal shape: a HardwareCheck gate, then a VoiceChat that boots the avatar. The full app (Sources/IOSAvatarApp.swift) adds the render-host wiring and lifecycle phases.
// IOSAvatarApp.swift: iOS voice agent with a lip-synced avatar
import SwiftUI
import UIKit
import bitHumanKit

@main
struct IOSAvatarApp: App {
    var body: some Scene {
        WindowGroup {
            switch HardwareCheck.evaluate() {
            case .supported: AvatarRootView()
            case .unsupported(let reason): UnsupportedDeviceView(reason: reason)
            }
        }
    }
}

@MainActor
final class AvatarLifecycle: ObservableObject {
    @Published var phase: Phase = .idle
    @Published private(set) var renderer: AvatarRendererView?
    private var chat: VoiceChat?

    enum Phase: Equatable { case idle, warming, live, error(String) }

    func start() async {
        do {
            // 1. Download / verify the Expression weights (~1.6 GB, cached).
            let weights = try await ExpressionWeights.ensureAvailable { _ in }
            phase = .warming

            // 2. Configure a voice chat with an avatar.
            let agent = AgentCatalog.defaultAgent
            var config = VoiceChatConfig()
            config.systemPrompt = agent.systemPrompt
            config.avatar = AvatarConfig(modelPath: weights,
                                         portraitPath: AgentCatalog.thumbnailURL(for: agent)!)
            config.apiKey = ProcessInfo.processInfo.environment["BITHUMAN_API_KEY"]

            // 3. Start it and render frames into a view.
            let chat = VoiceChat(config: config)
            try await chat.start()
            self.chat = chat
            self.renderer = AvatarRendererView(frame: .zero,
                                               idleFrame: chat.initialIdleFrame,
                                               clipMode: .circle)
            self.phase = .live
        } catch {
            phase = .error(error.localizedDescription)
        }
    }
}
Full source: GitHub
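The Phase enum in the code above is what a view would switch on to decide what to show. As a rough sketch of that mapping, the standalone snippet below redeclares the same four cases and turns each into a status string. Only the live text comes from this guide; the other strings and the statusText(for:) function are hypothetical, and the real app may present these states differently.

```swift
// Mirrors AvatarLifecycle.Phase from the example so the snippet stands alone.
enum Phase: Equatable { case idle, warming, live, error(String) }

/// Hypothetical helper: maps a lifecycle phase to a short status line.
func statusText(for phase: Phase) -> String {
    switch phase {
    case .idle:
        return "starting"                 // assumed wording
    case .warming:
        return "warming up the model"     // assumed wording
    case .live:
        return "live — talk to me"        // the text shown in the example app
    case .error(let message):
        return "error: \(message)"        // assumed wording
    }
}
```

Keeping the mapping in a pure function like this makes it straightforward to unit test without a device, which matters here because the avatar itself cannot run in the Simulator.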
Next steps
- Swift SDK — full walkthrough: lifecycle, entitlements, device matrix.
- macos-voice example — offline macOS voice agent: no avatar, no API key.
- Models — Essence vs Expression, which to ship.