Browser rendering

Move avatar rendering out of your server and into the user’s tab. The browser computes the mel spectrogram, runs the audio encoder with ONNX Runtime Web, and composites frames in WASM, so the video never leaves the machine.

ONNX in the user’s tab

rendering_mode=browser and rendering_mode=avatar are two modes, toggled by a URL parameter, on every bitHuman agent landing page. Both move the avatar rendering pipeline (mel spectrogram → ONNX audio encoder → KNN cluster lookup → frame composite) out of your server and into the user’s browser, using ONNX Runtime Web on a WebAssembly backend. In browser mode the agent worker keeps the brain (STT/LLM/TTS) and no server GPU is spent on the avatar. Avatar mode drops the agent worker and LiveKit entirely and drives the avatar from the user’s microphone.

In production since February 2026. No install and no SDK call: add a URL parameter to an existing agent landing page.

# Browser-side rendering, agent brain still cloud:
https://agent.viewer.bithuman.ai/<AGENT_CODE>?rendering_mode=browser

# Pure client-side puppet (mic-driven, no LiveKit, no agent worker):
https://agent.viewer.bithuman.ai/?rendering_mode=avatar&model_url=<IMX_URL>


When you’d reach for it

  • Server video egress is the bottleneck. Cloud rendering publishes H.264 video over LiveKit; browser rendering publishes only the agent’s TTS audio. Server-to-browser bandwidth drops from 0.5–2 Mbps to 32–64 kbps, typically ~10–20× less.
  • You’re paying for avatar GPU on the server. Browser mode skips the avatar worker dispatch entirely — the agent-worker pipeline runs STT/LLM/TTS only.
  • Privacy. The rendered video never leaves the user’s machine. Useful for kiosks, healthcare, education.
  • Offline / cached. After the first load the IMX is cached in IndexedDB, so later sessions need no network for the avatar. In browser mode the brain still does; avatar mode has no brain and runs fully offline once the IMX is cached.
  • Cross-device parity. The same WASM pipeline runs in Safari / Chrome / Firefox on macOS, Windows, Linux, iOS, and Android. No per-platform native build.

The three rendering modes

| Mode | Where the avatar runs | Where the brain runs | LiveKit | Lip-sync audio source |
|---|---|---|---|---|
| cloud (default) | Server (publishes H.264 video) | Server | Yes (video + audio tracks) | Agent TTS, consumed on the server |
| browser | Browser (ONNX WASM, 25 FPS canvas) | Server | Yes (audio track only) | Agent TTS over the LiveKit audio track |
| avatar | Browser (ONNX WASM, 25 FPS canvas) | None (pure puppet) | No | User’s microphone (getUserMedia) |

cloud is the production default. The new modes are opt-in via URL parameter — your existing deployments are unchanged.

Activate it

It’s a URL parameter on the agent landing page. Replace AGENT_CODE with your code from bithuman.ai → Developer:

https://agent.viewer.bithuman.ai/<AGENT_CODE>?rendering_mode=browser

For avatar mode (no agent worker, no LiveKit), pass the IMX model URL directly:

https://agent.viewer.bithuman.ai/?rendering_mode=avatar&model_url=https://your-storage/avatar.imx
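If you generate these links from your own code, a small helper keeps the query string correct. The sketch below is hypothetical: landingUrl, RenderingMode, and LandingOptions are illustrative names, not part of any bitHuman SDK. It assumes the landing page reads model_url with a standard query-string parser, so a percent-encoded value (for example a signed IMX URL that has its own ?…&… query) decodes back to the original.

```typescript
// Hypothetical helper, not a bitHuman API: builds agent landing-page URLs.
type RenderingMode = "cloud" | "browser" | "avatar";

interface LandingOptions {
  agentCode?: string; // from bithuman.ai -> Developer (cloud and browser modes)
  modelUrl?: string; // IMX model URL (avatar mode only)
}

const LANDING_BASE = "https://agent.viewer.bithuman.ai/";

function landingUrl(mode: RenderingMode, opts: LandingOptions): string {
  if (mode === "avatar") {
    // Avatar mode: no agent code and no agent worker; the IMX URL goes in directly.
    if (!opts.modelUrl) throw new Error("avatar mode needs modelUrl");
    const url = new URL(LANDING_BASE);
    url.searchParams.set("rendering_mode", "avatar");
    // searchParams.set percent-encodes the value, so '?' and '&' inside
    // the IMX URL cannot be mistaken for landing-page parameters.
    url.searchParams.set("model_url", opts.modelUrl);
    return url.toString();
  }
  if (!opts.agentCode) throw new Error(`${mode} mode needs agentCode`);
  const url = new URL(encodeURIComponent(opts.agentCode), LANDING_BASE);
  // cloud is the default, so it needs no rendering_mode parameter.
  if (mode === "browser") url.searchParams.set("rendering_mode", "browser");
  return url.toString();
}
```

For example, landingUrl("browser", { agentCode: "A1B2" }) yields https://agent.viewer.bithuman.ai/A1B2?rendering_mode=browser.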

The browser downloads the IMX (~50–200 MB, per-agent), the ONNX audio encoder (2.7 MB, shared across all agents), then runs the lip-sync pipeline at 25 FPS on a <canvas>.

What the browser does

MediaStreamTrack (TTS or mic)
  -> AudioContext + AudioWorklet (16 kHz, 640-sample chunks)
  -> Mel spectrogram (80 bins x 16 frames, Bluestein FFT)
  -> ONNX audio encoder (WASM, 512-D embedding)
  -> KNN cluster lookup (183 clusters, L2 distance)
  -> Frame composite (base frame + mouth patch, alpha-blended)
  -> <canvas> @ 25 FPS
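The first stage feeds the rest of the pipeline fixed 640-sample chunks: at 16 kHz that is 40 ms of audio, exactly one video frame at 25 FPS. An AudioWorklet’s process() callback receives audio in 128-sample render quanta, so the samples must be re-blocked. The sketch below is an assumption about how that could be done, not bitHuman’s code; ChunkAccumulator is a hypothetical name, and it presumes the AudioContext already runs at 16 kHz (for example new AudioContext({ sampleRate: 16000 })).

```typescript
const SAMPLE_RATE = 16_000; // Hz, as in the pipeline above
const FPS = 25;
const CHUNK = SAMPLE_RATE / FPS; // 640 samples = 40 ms = one video frame

// Hypothetical re-blocker: accepts blocks of any size (e.g. 128-sample
// render quanta from process()) and emits whole 640-sample chunks.
class ChunkAccumulator {
  private readonly buf = new Float32Array(CHUNK);
  private filled = 0;
  private readonly onChunk: (chunk: Float32Array) => void;

  constructor(onChunk: (chunk: Float32Array) => void) {
    this.onChunk = onChunk;
  }

  push(samples: Float32Array): void {
    let offset = 0;
    while (offset < samples.length) {
      // Copy as many samples as still fit in the current chunk.
      const n = Math.min(CHUNK - this.filled, samples.length - offset);
      this.buf.set(samples.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === CHUNK) {
        // Emit a copy so buf can be reused for the next chunk.
        this.onChunk(this.buf.slice());
        this.filled = 0;
      }
    }
  }
}
```

Because 640 is not a multiple of 128, chunk boundaries fall in the middle of a quantum (every fifth quantum completes a chunk exactly), which is why the loop copies partial blocks.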

The pipeline is bit-compatible with libessence on the server — same .imx file, same cluster centroids, same encoder weights. The browser just runs the inference loop in WASM.
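The last two stages of the diagram are plain array arithmetic. A minimal sketch follows, assuming the centroids are packed row-major into one Float32Array and both images are RGBA byte arrays such as ImageData.data; the real .imx layout isn’t documented here, and nearestCluster and compositePatch are hypothetical names.

```typescript
// Nearest centroid by squared L2 distance. Skipping the square root is safe
// because it is monotonic and does not change which centroid is closest.
// Assumed layout: centroids.length === numClusters * dim (e.g. 183 * 512).
function nearestCluster(embedding: Float32Array, centroids: Float32Array, dim: number): number {
  const k = centroids.length / dim;
  let best = -1;
  let bestDist = Infinity;
  for (let c = 0; c < k; c++) {
    const base = c * dim;
    let d = 0;
    for (let i = 0; i < dim; i++) {
      const diff = embedding[i] - centroids[base + i];
      d += diff * diff;
    }
    if (d < bestDist) {
      bestDist = d;
      best = c;
    }
  }
  return best;
}

// Alpha-blend an RGBA mouth patch onto an RGBA base frame at (x, y), in place:
// out = a * patch + (1 - a) * base per colour channel, with a = patch alpha / 255.
function compositePatch(
  frame: Uint8ClampedArray, frameWidth: number,
  patch: Uint8ClampedArray, patchWidth: number, patchHeight: number,
  x: number, y: number,
): void {
  for (let py = 0; py < patchHeight; py++) {
    for (let px = 0; px < patchWidth; px++) {
      const p = (py * patchWidth + px) * 4;
      const f = ((y + py) * frameWidth + (x + px)) * 4;
      const a = patch[p + 3] / 255;
      for (let ch = 0; ch < 3; ch++) {
        // Uint8ClampedArray rounds and clamps the blended value to 0..255.
        frame[f + ch] = a * patch[p + ch] + (1 - a) * frame[f + ch];
      }
    }
  }
}
```

On a 2D canvas, drawImage already performs this source-over blend onto an opaque frame; the explicit loop only makes the arithmetic visible.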

Latency budget

At 25 FPS each frame has a 40 ms budget. Typical per-stage timings on a modest laptop:

| Stage | Typical time |
|---|---|
| Mel FFT | 5–10 ms |
| ONNX encoder (WASM) | 10–20 ms |
| KNN lookup + composite | 5–10 ms |

In browser mode, network latency on the LiveKit audio track comes on top of this (typically 50–150 ms one-way to the nearest LiveKit edge). In avatar mode there’s no network in the loop at all once the IMX is cached.
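Summing the table’s ranges shows how much headroom the 40 ms frame budget leaves. This is only arithmetic on the numbers above; the stage keys in the code are illustrative names.

```typescript
// At 25 FPS every frame must finish within 1000 / 25 = 40 ms.
const FRAME_BUDGET_MS = 1000 / 25;

// [min, max] milliseconds, copied from the "Typical time" column.
type MsRange = readonly [min: number, max: number];

const stageTimes: Record<string, MsRange> = {
  melFft: [5, 10],
  onnxEncoderWasm: [10, 20],
  knnLookupAndComposite: [5, 10],
};

// Sum per-stage ranges into a whole-frame range.
function frameTime(stages: Record<string, MsRange>): MsRange {
  let min = 0;
  let max = 0;
  for (const [lo, hi] of Object.values(stages)) {
    min += lo;
    max += hi;
  }
  return [min, max];
}

const [best, worst] = frameTime(stageTimes); // 20 ms and 40 ms
const headroomMs = FRAME_BUDGET_MS - worst; // 0 ms at the top of every range
```

At the low end the three stages use half the budget; at the top of every range they consume all 40 ms, so a slower device would start dropping frames. Network latency in browser mode delays when audio arrives but is not spent once per frame, so it does not eat into this compute budget.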

Browser requirements

  • SharedArrayBuffer for ONNX Runtime Web multi-threading. Your page needs both Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: credentialless (or require-corp) headers. The bitHuman-hosted landing page is already configured.
  • WebAssembly + AudioWorklet — all current Safari, Chrome, Firefox, Edge.
  • IndexedDB — ~50–200 MB of free quota for the IMX cache (per-agent).
  • requestVideoFrameCallback for smooth frame pacing (optional — the pipeline falls back to requestAnimationFrame on older Safari).
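If you self-host a page that runs this pipeline, the headers from the first bullet and a matching runtime check might look like the sketch below. ISOLATION_HEADERS, applyIsolationHeaders, and wasmThreadCount are hypothetical names, and the cap of four threads is an arbitrary assumption; the header values and globalThis.crossOriginIsolated are standard web platform features.

```typescript
// Response headers that make a page cross-origin isolated. Browsers only
// expose SharedArrayBuffer (needed for multi-threaded WASM) on such pages.
const ISOLATION_HEADERS: Readonly<Record<string, string>> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  // "require-corp" also works but is stricter: every cross-origin
  // subresource must then opt in via CORP or CORS.
  "Cross-Origin-Embedder-Policy": "credentialless",
};

// Works with anything exposing setHeader, e.g. Node's http.ServerResponse.
function applyIsolationHeaders(res: { setHeader(name: string, value: string): unknown }): void {
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
}

// Pick a WASM thread count: a single thread unless the page is isolated.
// The cap of 4 is an illustrative assumption, not a bitHuman setting.
function wasmThreadCount(hardwareConcurrency: number): number {
  const g = globalThis as unknown as { crossOriginIsolated?: boolean };
  const isolated = g.crossOriginIsolated === true && typeof SharedArrayBuffer !== "undefined";
  return isolated ? Math.max(1, Math.min(4, hardwareConcurrency)) : 1;
}
```

In the browser you would call wasmThreadCount(navigator.hardwareConcurrency) and pass the result to ONNX Runtime Web through ort.env.wasm.numThreads before creating an inference session.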

Cloud vs browser vs avatar — side by side

| | cloud | browser | avatar |
|---|---|---|---|
| Server avatar GPU | yes | no | none |
| Server brain (STT/LLM/TTS) | yes | yes | none |
| LiveKit subscription | video + audio | audio only | none |
| Browser ML work | none | mel + ONNX + composite | mel + ONNX + composite |
| Server → browser bandwidth | 0.5–2 Mbps (video) | 32–64 kbps (audio) | 0 |
| Offline-capable | no | partial (brain still needs net) | yes (post-cache) |
| Setup | cloud session | append ?rendering_mode=browser | append ?rendering_mode=avatar&model_url=… |
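To turn the bandwidth row into data volume, multiply by 60 seconds and divide by 8 bits per byte. The helper name is hypothetical; units are decimal (1 MB = 1000 kB), matching kbps and Mbps.

```typescript
// kbps -> megabytes per minute (decimal units: 1 MB = 1000 kB).
function mbPerMinute(kbps: number): number {
  const kilobitsPerMinute = kbps * 60;
  return kilobitsPerMinute / 8 / 1000; // bits -> bytes, then kB -> MB
}

// Endpoints of the "Server → browser bandwidth" row.
const cloudVideo = [500, 2000].map(mbPerMinute); // [3.75, 15]
const browserAudio = [32, 64].map(mbPerMinute); // [0.24, 0.48]
```

So a cloud session streams roughly 3.75–15 MB of video per minute to each viewer, while browser mode’s audio track is 0.24–0.48 MB per minute.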


A standalone JS/TS SDK is coming

The browser pipeline is currently distributed via the hosted agent-landing page only. A standalone JS/TS SDK that wraps the same pipeline for embedding in your own React / Vue / vanilla app is in Preview — track or comment in Discord.

Where to go next