Media Upload Guide

Learn how to prepare and upload media for optimal avatar generation results.

Image Upload

Perfect for: Facial likeness and character appearance

Requirements

Requirement | Value
File Size | Less than 10 MB
Characters | One person only
Position | Centered in frame
Orientation | Front-facing
Expression | Calm and gentle
Quality | High resolution, well-lit

Best Practices

  • Good lighting — avoid shadows on face
  • Clear focus — sharp, not blurry
  • Solo shots — no other people visible
  • Neutral expression — avoid extreme emotions
  • Professional quality — passport-style photos work well
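
The file size limit above can be checked locally before uploading. The following is a minimal sketch, not part of any official SDK: the function names are hypothetical, and it assumes "10 MB" means 10,000,000 bytes (the stricter reading, since the guide does not say whether the limit is decimal or binary).

```python
import os

# Assumption: "10 MB" is read as 10,000,000 bytes, the stricter interpretation.
MAX_IMAGE_BYTES = 10 * 1000 * 1000


def image_size_ok(size_bytes: int) -> bool:
    """Return True when the size is non-zero and strictly below the limit."""
    return 0 < size_bytes < MAX_IMAGE_BYTES


def check_image_file(path: str) -> bool:
    """Read the file size from disk and apply the limit (hypothetical helper)."""
    return image_size_ok(os.path.getsize(path))
```

This only covers file size. Framing, lighting, and the number of people in the shot still need a visual check.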

Video Upload

Perfect for: Movement patterns and dynamic expressions

Requirements

Requirement | Value
Duration | Less than 30 seconds
Characters | One person only
Position | Centered in frame
Movement | Minimal distracting movement
Quality | High resolution, stable footage

Best Practices

  • Stable camera — use tripod if possible
  • Consistent framing — keep character centered
  • Subtle movements — gentle head movements, natural blinking
  • Good lighting — consistent throughout video
  • Audio optional — focus on visual quality
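
To confirm a clip is under the 30-second limit before uploading, one option is to read its duration with the `ffprobe` command-line tool that ships with FFmpeg. This is a sketch under the assumption that `ffprobe` is installed and on the PATH. The helper names are hypothetical and are not part of the upload service.

```python
import subprocess

MAX_VIDEO_SECONDS = 30.0  # "Less than 30 seconds" from the requirements table


def video_duration_ok(seconds: float) -> bool:
    """Return True when the duration is positive and strictly under the limit."""
    return 0 < seconds < MAX_VIDEO_SECONDS


def probe_duration_seconds(path: str) -> float:
    """Ask ffprobe for the container duration in seconds.

    Assumes ffprobe (part of FFmpeg) is installed locally.
    """
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return float(result.stdout.strip())
```

A typical call would be `video_duration_ok(probe_duration_seconds("clip.mp4"))`, where `clip.mp4` is a placeholder file name.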

Voice Upload

Perfect for: Voice cloning and personalized speech patterns

Requirements

Requirement | Value
Duration | Less than 1 minute
Quality | Clear voice, no background noise
Format | MP3, WAV, or M4A
Content | Natural speech in your target language

Best Practices

  • Record in a quiet environment
  • Use a good quality microphone
  • Speak naturally and clearly
  • Avoid music or sound effects
  • Include varied sentences for better voice modeling
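
The format and duration requirements can be pre-checked with the Python standard library. The sketch below uses hypothetical helper names. It checks the extension against the three listed formats and measures duration for WAV files only, because the standard `wave` module cannot read MP3 or M4A.

```python
import os
import wave

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}  # formats listed in the table
MAX_VOICE_SECONDS = 60.0  # "Less than 1 minute"


def voice_format_ok(path: str) -> bool:
    """Check the file extension, ignoring case."""
    return os.path.splitext(path)[1].lower() in ALLOWED_EXTENSIONS


def wav_duration_seconds(path: str) -> float:
    """Compute the duration of a WAV file as frame count / frame rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def voice_duration_ok(seconds: float) -> bool:
    """Return True when the duration is positive and strictly under the limit."""
    return 0 < seconds < MAX_VOICE_SECONDS
```

An extension check does not prove the file contents match the format, so treat it as a quick sanity check rather than validation.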

Media Priority System

Different uploads influence and override one another as follows.

Key Priority Rules

  1. Video > Image — Video always overwrites image for likeness
  2. Image = Auto-Prompt — Images auto-generate persona, making manual prompts optional
  3. Voice — When uploaded, replaces auto-generated voice
  4. Prompt — Required only when no image/video provided
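
The four rules above can be expressed as a small decision function. This is an illustrative sketch only: the `Uploads` type, the `resolve_sources` name, and the returned labels are all hypothetical. Two behaviors are assumptions the guide does not state outright: that a supplied prompt takes precedence over the auto-generated persona, and that a video without a prompt also gets an auto-generated persona.

```python
from dataclasses import dataclass


@dataclass
class Uploads:
    """Which inputs the user supplied (hypothetical structure)."""
    has_image: bool = False
    has_video: bool = False
    has_voice: bool = False
    has_prompt: bool = False


def resolve_sources(u: Uploads) -> dict:
    """Return which input drives likeness, voice, and persona."""
    # Rule 1: video overrides image for likeness.
    if u.has_video:
        likeness = "video"
    elif u.has_image:
        likeness = "image"
    elif u.has_prompt:
        likeness = "prompt"
    else:
        # Rule 4: a prompt is required when no image or video is provided.
        raise ValueError("a prompt is required when no image or video is provided")

    # Rule 3: an uploaded voice replaces the auto-generated one.
    voice = "uploaded" if u.has_voice else "auto"

    # Rule 2: images auto-generate a persona, so a prompt is optional.
    # Assumption: a supplied prompt wins, and video-only input is also auto.
    persona = "prompt" if u.has_prompt else "auto"

    return {"likeness": likeness, "voice": voice, "persona": persona}
```

For example, `resolve_sources(Uploads(has_image=True, has_voice=True))` returns the image for likeness, the uploaded voice, and an auto-generated persona.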

Upload Combinations

Combination | What Happens
Prompt Only | Generates likeness, voice, and movement from text description
Image Only | Uses image for likeness, auto-generates persona and voice
Voice + Image | Image for likeness, voice for speech patterns
Video + Voice + Prompt | Full character control: video for likeness, voice for speech, prompt for personality

Best Practices

Start simple. Upload an image for instant results, or use prompts for creative characters. You can always add voice or refine later.

Recommended Approaches:
  • Prompts Only — Good for creative/fictional characters
  • Image Only — Instant avatar from photo (no prompt needed)
  • Image + Voice — Realistic character recreation

Common Issues and Fixes:

Issue | Fix
Poor lighting in images/videos | Use photo editing to improve lighting
Background noise in audio | Record audio in quiet spaces
Multiple people in frame | Crop images to show only target person
Excessive movement in videos | Keep movements subtle and natural