

Looking for the on-device Expression avatar (custom face, 25 FPS, no GPU)? See Expression on macOS. This page covers the local voice stack (STT/TTS/VAD) paired with an Essence avatar.
Preview — private distribution. The bithuman-voice wheel referenced below is not currently published on PyPI. Reach out on Discord or through bitHuman support to request access to the voice pipeline.
Full privacy — speech never leaves your Mac.

Quick Start

1. Requirements

  • macOS 13+ (Apple Silicon recommended)
  • Microphone permissions
2. Install voice service

Request the bithuman-voice wheel via the channels above, then:
pip install bithuman_voice-1.3.2-py3-none-any.whl
3. Start voice service

bithuman-voice serve --port 8091
macOS will ask for Speech permissions — approve this.
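Before moving on, you can confirm the voice service is actually listening. A minimal sketch (the `port_open` helper is illustrative, not part of the bitHuman SDK; 8091 matches the `serve` command above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP listener accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if port_open("127.0.0.1", 8091):
        print("bithuman-voice is up on port 8091")
    else:
        print("bithuman-voice is not reachable; is `bithuman-voice serve` running?")
```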
4. Install dependencies

pip install --upgrade bithuman livekit-agents openai livekit-plugins-silero
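A quick way to verify the installs resolved is to probe for their import names. Note the dotted module names below are assumptions inferred from the pip package names, not confirmed by this page:

```python
import importlib.util

def installed(module: str) -> bool:
    """True if the module can be found on the current Python path."""
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is absent.
        return False

if __name__ == "__main__":
    # Import names assumed from the pip package names above.
    for mod in ("bithuman", "livekit.agents", "openai", "livekit.plugins.silero"):
        print(f"{mod}: {'ok' if installed(mod) else 'MISSING'}")
```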
5. Set environment

export BITHUMAN_API_SECRET="your_secret"
export BITHUMAN_MODEL_PATH="/path/to/model.imx"
export LIVEKIT_API_KEY="your_livekit_key"
export LIVEKIT_API_SECRET="your_livekit_secret"
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export OPENAI_API_KEY="your_openai_key"  # Needed only for the cloud LLM (the AI brain)
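The agent needs all of these at startup, so a small preflight check can save a confusing traceback. A sketch with the variable list copied from the exports above (the `missing_vars` helper is illustrative):

```python
import os

REQUIRED = [
    "BITHUMAN_API_SECRET",
    "BITHUMAN_MODEL_PATH",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
    "OPENAI_API_KEY",  # needed only for the cloud LLM
]

def missing_vars(env: dict, required=REQUIRED) -> list:
    """Return the required keys that are absent or empty in env."""
    return [k for k in required if not env.get(k)]

if __name__ == "__main__":
    gaps = missing_vars(os.environ)
    if gaps:
        print("Missing environment variables:", ", ".join(gaps))
    else:
        print("Environment looks complete.")
```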
6. Run agent

git clone https://github.com/bithuman-product/bithuman-examples.git
cd bithuman-examples/integrations/macos-offline
python agent.py dev
View source code on GitHub

What It Does

Stays on your Mac:
  • Speech-to-text (Apple Speech Framework)
  • Text-to-speech (Apple Voice Synthesis)
  • Avatar animation (bitHuman)
  • Voice activity detection
Uses internet:
  • Only AI conversation (OpenAI LLM)
Privacy benefits:
  • Voice patterns never leave your device
  • Apple’s hardware-accelerated speech processing
  • Full control over your data

Make it 100% Private

For 100% local operation with no internet required, use the complete Docker setup: Complete macOS Offline Example.

What you get:
  • Apple Speech Recognition — Local STT
  • Apple Voices/Siri — Local TTS
  • Ollama LLM — Local language models (Llama 3.2)
  • bitHuman Avatar — Real-time facial animation
  • LiveKit + Web UI — Complete conversation interface
  • Zero Internet Dependency
git clone https://github.com/bithuman-product/bithuman-examples.git
cd bithuman-examples/integrations/macos-offline

# Request the bithuman-voice wheel via Discord / bitHuman support, then:
pip install bithuman_voice-1.3.2-py3-none-any.whl
bithuman-voice serve --port 8000

ollama run llama3.2:1b
docker compose up
# Access at http://localhost:4202
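To confirm the Llama model actually downloaded, you can query Ollama's tags endpoint (`GET /api/tags` on its default port 11434 returns a JSON list of installed models). A sketch assuming that documented response shape and the default port:

```python
import json
import urllib.request

def model_names(payload: dict) -> list:
    """Extract model names from an Ollama /api/tags response."""
    return [m.get("name", "") for m in payload.get("models", [])]

def local_models(base_url: str = "http://localhost:11434") -> list:
    """Fetch the list of models installed in a local Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
        return model_names(json.load(resp))

if __name__ == "__main__":
    try:
        names = local_models()
        print("llama3.2 present:", any(n.startswith("llama3.2") for n in names))
    except OSError:
        print("Ollama not reachable on localhost:11434")
```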
Enterprise Offline Mode: Contact bitHuman for offline tokens to eliminate all internet requirements for authentication and metering.

Common Issues

  • Voice service won't start: check microphone permissions and enable "Speech Recognition" in Privacy & Security
  • No speech recognition: restart the bithuman-voice service and test with built-in dictation
  • Permission errors: run the voice service from Terminal (not an IDE)

Performance

Recommended specs:
  • M2+ Mac (M4 ideal)
  • 16GB+ RAM
  • macOS 13+

Next Steps

  • Raspberry Pi: edge deployment on IoT devices
  • AI Conversation: a simpler cloud-based setup