Documentation Index
Fetch the complete documentation index at: https://docs.bithuman.ai/llms.txt
Use this file to discover all available pages before exploring further.
Ship a working on-device voice agent in your Mac/iPad/iPhone app in under 10 minutes. All inference runs locally; the only network traffic is the first-launch weights download and, in avatar mode, the once-per-minute metering heartbeat described in Step 4.
Want a working app right now? Clone bithuman-apps — Mac (swift run BithumanMac), iPad, and iPhone reference apps that consume the SDK out of the box. Same code as this quickstart, just pre-wired. Walk through this page if you’re integrating the SDK into an existing project; clone the reference app if you want to read finished code.
Prerequisites
- Xcode 26+ — older Xcodes don’t recognise some Swift 6 concurrency syntax bitHumanKit uses.
- An Apple Silicon Mac (M3+) for development — the SDK builds for both swift build and Xcode targets, but compilation requires Apple Silicon.
- An Apple Developer account if you’re targeting iOS / iPadOS on real devices (free is fine for sideloaded builds; paid is required for App Store + the memory entitlements).
- A target device that meets the hardware floor — under-spec devices are refused at runtime via HardwareCheck.evaluate().
If you’ve never used Swift Package Manager
You have two on-ramps:
- You already have an Xcode project / app. Skip to Step 1: Add the package. You’ll add bitHumanKit as a Swift Package dependency through Xcode’s UI.
- You’re starting from scratch. Open Xcode → File → New → Project → pick “App” (macOS, iOS, or iPadOS) → name it → create. Once the project is open, follow Step 1 below.
If you’re integrating into an existing project that uses a top-level Package.swift instead of an Xcode project (e.g. a command-line tool), the Step 1 snippet below shows what to add to your dependencies and targets arrays.
1. Add the package
In an Xcode project: File → Add Package Dependencies →
paste the URL → click “Add Package”:
https://github.com/bithuman-product/bithuman-kit-public.git
When Xcode asks which products to add, pick bitHumanKit and
attach it to your app’s target.
In a Package.swift (SPM-only project):
dependencies: [
    .package(url: "https://github.com/bithuman-product/bithuman-kit-public.git",
             from: "0.8.1")
],
targets: [
    .target(
        name: "MyApp",
        dependencies: [
            .product(name: "bitHumanKit", package: "bithuman-kit-public")
        ]
    )
]
2. Permissions and entitlements
Add these keys to your app target’s Info.plist:
<key>NSMicrophoneUsageDescription</key>
<string>Talk to your on-device assistant.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>Recognise what you say so the assistant can respond.</string>
Without these keys, chat.start() fails silently: when iOS / macOS goes to prompt the user, the OS denies the request outright and remembers the denial for the rest of the session.
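If you want to surface a denial in your own UI rather than discover it via a dead chat.start(), you can pre-flight both permissions yourself. This is a minimal sketch using only standard Apple APIs (AVFoundation + Speech); nothing here is part of bitHumanKit, and the OS prompts during chat.start() either way if you skip it.
import AVFoundation
import Speech

// Request microphone + speech-recognition access up front so a denial
// can be handled in your own UI instead of failing inside chat.start().
func preflightPermissions() async -> Bool {
    let micGranted = await AVCaptureDevice.requestAccess(for: .audio)
    let speechGranted = await withCheckedContinuation { continuation in
        SFSpeechRecognizer.requestAuthorization { status in
            continuation.resume(returning: status == .authorized)
        }
    }
    return micGranted && speechGranted
}
Call it before lifecycle.start() (Step 3) and route the user to Settings when it returns false.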
macOS-only — sandboxed apps
If your Mac app is sandboxed (default for App Store distribution),
also add to your .entitlements:
<key>com.apple.security.device.audio-input</key>
<true/>
Direct DMG / Homebrew distribution doesn’t need the entitlement —
just the Info.plist keys above.
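For reference, a sandboxed Mac app’s .entitlements then looks something like the sketch below (com.apple.security.app-sandbox is the key Xcode sets when the App Sandbox capability is enabled; keep whatever other entitlements your app already declares).
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.security.app-sandbox</key>
    <true/>
    <key>com.apple.security.device.audio-input</key>
    <true/>
</dict>
</plist>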
iOS-only — memory entitlements (REQUIRED)
⚠️ Without these your app will be terminated by iOS
mid-conversation when memory exceeds the default ~3 GB
ceiling (~30 s into a live turn). Request approval BEFORE you
start development — Apple takes 1–3 business days.
<key>com.apple.developer.kernel.increased-memory-limit</key>
<true/>
<key>com.apple.developer.kernel.extended-virtual-addressing</key>
<true/>
Request approval at developer.apple.com → Account → Membership
→ Request Additional Capabilities. The provisioning profile
updates automatically once Apple replies via email.
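The entitlements raise the ceiling but don’t make pressure visible. While testing, a small diagnostic like the sketch below (plain UIKit, nothing SDK-specific) helps you correlate memory warnings with live turns before iOS gets as far as terminating the app.
#if canImport(UIKit)
import UIKit

// Development-time diagnostic: log memory warnings so you can see
// pressure building during a conversation instead of only finding
// the termination afterwards.
final class MemoryPressureLogger {
    private var observer: NSObjectProtocol?

    func start() {
        observer = NotificationCenter.default.addObserver(
            forName: UIApplication.didReceiveMemoryWarningNotification,
            object: nil,
            queue: .main
        ) { _ in
            print("⚠️ memory warning received mid-session")
        }
    }

    deinit {
        if let observer { NotificationCenter.default.removeObserver(observer) }
    }
}
#endif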
3. Boot a voice chat (audio-only)
import SwiftUI
import bitHumanKit

@main
struct MyApp: App {
    @StateObject private var lifecycle = MyLifecycle()

    var body: some Scene {
        WindowGroup {
            ContentView(lifecycle: lifecycle)
                .task { await lifecycle.start() }
        }
    }
}

@MainActor
final class MyLifecycle: ObservableObject {
    @Published var status = "booting…"
    private var chat: VoiceChat?

    func start() async {
        var config = VoiceChatConfig()
        config.localeIdentifier = "en-US"
        config.systemPrompt = "You are a helpful assistant. One sentence per turn."
        config.voice = .preset("Aiden")

        do {
            let chat = VoiceChat(config: config)
            try await chat.start()
            self.chat = chat
            status = "live — talk to me"
        } catch {
            status = "error: \(error.localizedDescription)"
        }
    }
}

struct ContentView: View {
    @ObservedObject var lifecycle: MyLifecycle

    var body: some View {
        Text(lifecycle.status).font(.title)
    }
}
Run it. Say “hello.” The bot transcribes, thinks, replies via
TTS through the speakers. That’s the entire integration for an
audio-only voice agent.
4. Add the lip-synced avatar (video mode)
💳 Avatar mode is metered. The pipeline charges 2 credits
per active minute against your bitHuman developer account, via
a 1-request-per-minute heartbeat to api.bithuman.ai. Get an
API key first: https://www.bithuman.ai → Developer → API Keys.
Then either set VoiceChatConfig.apiKey or export
BITHUMAN_API_KEY before chat.start(). Audio-only mode
(Step 3 above) doesn’t require a key and isn’t metered.
The avatar pipeline downloads ~1.6 GB of weights on first launch
(sha256 verified + cached for next time). On a slow network this
takes a few minutes — wire up a progress callback or your app
will look frozen.
import Foundation
import SwiftUI
import bitHumanKit

@MainActor
final class AvatarLifecycle: ObservableObject {
    @Published var phase: BootPhase = .idle
    @Published private(set) var coordinator: AvatarCoordinator?
    @Published private(set) var renderer: AvatarRendererView?

    private var chat: VoiceChat?
    private var pump: FramePump?

    enum BootPhase: Equatable {
        case idle
        case downloading(Double) // 0…1
        case warming
        case live
        case error(String)
    }

    func start() async {
        do {
            // 1. Download/verify the universal weights bundle.
            //    Surface progress to the UI — first launch can take
            //    minutes on a slow network and a silent hang is a
            //    reliable way to lose users.
            phase = .downloading(0)
            let weights = try await ExpressionWeights.ensureAvailable { event in
                Task { @MainActor in
                    if case .downloading(let f, _, _, _, _) = event {
                        self.phase = .downloading(f)
                    }
                }
            }
            phase = .warming

            // 2. Pick a bundled agent for the first run. Swap later
            //    via coordinator.applyAgent(_:) or the AgentPickerView.
            let agent = AgentCatalog.defaultAgent
            let portrait = AgentCatalog.thumbnailURL(for: agent)!

            // 3. Configure + boot. The apiKey here is required for
            //    the avatar pipeline; it's resolved from
            //    BITHUMAN_API_KEY automatically if you leave the
            //    field empty, but explicit-in-config is clearer.
            var config = VoiceChatConfig()
            config.systemPrompt = agent.systemPrompt
            config.avatar = AvatarConfig(modelPath: weights, portraitPath: portrait)
            config.apiKey = ProcessInfo.processInfo.environment["BITHUMAN_API_KEY"]

            let chat = VoiceChat(config: config)
            try await chat.start() // throws .missingAPIKey / .authenticationFailed
            await chat.setVoicePreset(agent.voicePreset)

            // 4. Bind the coordinator + render stack.
            guard let bh = chat.bithuman else {
                phase = .error("avatar engine failed to initialise")
                return
            }
            let coord = AvatarCoordinator(chat: chat)
            coord.bindToOrchestrator()
            coord.prewarmPortraitURL = portrait
            coord.currentAgentCode = agent.code

            let renderer = AvatarRendererView(
                frame: .zero, idleFrame: chat.initialIdleFrame, clipMode: .circle)
            let pump = FramePump(
                bithuman: bh, chat: chat, window: renderer, coordinator: coord)
            coord.framePump = pump
            chat.onBargeIn = { [weak pump] in pump?.buffer.flushSpeech() }

            // 5. Hold strong refs so SwiftUI doesn't deinit them on
            //    re-render — this is the most common first-try bug.
            self.chat = chat
            self.pump = pump
            self.renderer = renderer
            self.coordinator = coord
            self.phase = .live
        } catch {
            phase = .error(error.localizedDescription)
        }
    }
}
Host the renderer in your SwiftUI tree. The representables below are explicit struct types because NSViewRepresentable / UIViewRepresentable are protocols — they don’t take a closure. Returning the same stored instance from makeNSView / makeUIView (and leaving the update methods empty) is essential: SwiftUI may rebuild the parent view many times per second, but the renderer must persist or the avatar disappears.
#if canImport(AppKit)
import AppKit

struct AvatarHost: NSViewRepresentable {
    let view: AvatarRendererView
    func makeNSView(context: Context) -> AvatarRendererView { view }
    func updateNSView(_ nsView: AvatarRendererView, context: Context) {}
}
#elseif canImport(UIKit)
import UIKit

struct AvatarHost: UIViewRepresentable {
    let view: AvatarRendererView
    func makeUIView(context: Context) -> AvatarRendererView { view }
    func updateUIView(_ uiView: AvatarRendererView, context: Context) {}
}
#endif

// In your SwiftUI tree:
if let renderer = lifecycle.renderer {
    AvatarHost(view: renderer)
        .frame(width: 280, height: 280)
        .clipShape(Circle())
}
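To tie it together, here’s a minimal sketch of a screen that drives AvatarLifecycle and switches on the BootPhase defined above. Everything besides the types from this page (AvatarLifecycle, AvatarHost, AvatarRendererView) is plain SwiftUI; the view name and layout are just illustrative.
import SwiftUI

struct AvatarScreen: View {
    @StateObject private var lifecycle = AvatarLifecycle()

    var body: some View {
        Group {
            switch lifecycle.phase {
            case .idle, .warming:
                ProgressView("Warming up…")
            case .downloading(let fraction):
                ProgressView("Downloading weights…", value: fraction)
                    .padding()
            case .live:
                if let renderer = lifecycle.renderer {
                    AvatarHost(view: renderer)
                        .frame(width: 280, height: 280)
                        .clipShape(Circle())
                }
            case .error(let message):
                Text(message).foregroundStyle(.red)
            }
        }
        .task { await lifecycle.start() }
    }
}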
5. Add the hardware gate (iOS)
@main
struct MyApp: App {
    var body: some Scene {
        WindowGroup {
            switch HardwareCheck.evaluate() {
            case .supported:
                ContentView()
            case .unsupported(let reason):
                UnsupportedDeviceView(reason: reason)
            }
        }
    }
}
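UnsupportedDeviceView above is whatever your app shows on refused hardware; it isn’t defined in this quickstart. If you don’t already have one, a stand-in can be as small as the sketch below (it assumes the associated reason is, or can be converted to, a String; check the actual type HardwareCheck.evaluate() returns).
import SwiftUI

struct UnsupportedDeviceView: View {
    let reason: String   // assumption: adjust to the real reason type

    var body: some View {
        VStack(spacing: 12) {
            Image(systemName: "exclamationmark.triangle")
                .font(.largeTitle)
            Text("This device can’t run the on-device agent.")
                .font(.headline)
            Text(reason)
                .font(.subheadline)
                .foregroundStyle(.secondary)
        }
        .padding()
    }
}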
What just happened
You’ve integrated a real-time conversational AI that runs entirely on the device. Speech recognition, language model, voice synthesis, animated face — all of the inference happens locally; the only network traffic is the first-launch weights download and, in avatar mode, the per-minute metering heartbeat.
Next
- Reference apps — Mac, iPad, and iPhone source-available apps that consume the SDK. Drag-drop face swap, PiP, Stage Manager widget, all wired up.
- Platform-specific guides — entitlements, distribution, and host-specific patterns for macOS and iOS / iPadOS.
- CLI — brew install bithuman-cli for a no-code Mac terminal app built on the same SDK.
- Troubleshooting — every error pattern with the fix.