🎬🧠 AI Avatar Agentic Flow — Voice Clone + SadTalker (20s MP4)

Upload a short (~30s) video to clone the voice, then generate a 20-second talking avatar with lip-sync and head movements.

Features:

  • Automatic voice reference detection using VAD
  • Voice cloning with TTS fallbacks (XTTS v2 → edge-tts)
  • Animated avatar with SadTalker
  • 20-second output with audio kept in sync with the video
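
The TTS fallback order listed above (XTTS v2 → edge-tts) can be pictured as a simple "try each backend in turn" loop. This is only a minimal sketch, not the app's actual code: `synthesize_with_fallback`, `fake_xtts` and `fake_edge_tts` are hypothetical names, and the two stand-in functions simulate a failing and a working backend rather than calling the real libraries.

```python
from typing import Callable, Iterable, Tuple

# A backend is any callable that turns text into raw audio bytes (assumption).
Synth = Callable[[str], bytes]


def synthesize_with_fallback(
    text: str, backends: Iterable[Tuple[str, Synth]]
) -> Tuple[str, bytes]:
    """Try each (name, backend) pair in order; return the first non-empty result."""
    errors = []
    for name, synth in backends:
        try:
            audio = synth(text)
        except Exception as exc:  # any backend failure triggers the next fallback
            errors.append(f"{name}: {exc}")
            continue
        if audio:
            return name, audio
        errors.append(f"{name}: empty output")
    raise RuntimeError("all TTS backends failed: " + "; ".join(errors))


# Hypothetical stand-ins for the real backends, for illustration only.
def fake_xtts(text: str) -> bytes:
    raise RuntimeError("model not available")


def fake_edge_tts(text: str) -> bytes:
    return text.encode("utf-8")


used, audio = synthesize_with_fallback(
    "Hello there", [("xtts_v2", fake_xtts), ("edge-tts", fake_edge_tts)]
)
print(used)
```

Collecting the error messages instead of discarding them makes it easier to see why the preferred backend was skipped when only the fallback voice is heard.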

🧰 Tips & Troubleshooting

  • Processing Time: First run may take longer due to model downloads
  • Audio Length: Output is enforced to exactly 20 seconds
  • Voice Reference: Auto-finds ~6s speech chunk using Silero VAD
  • Language Support: XTTS v2 supports multiple languages
  • Fallbacks: Script generation and TTS have multiple fallback options
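
Two of the tips above, the fixed 20-second length and the ~6s voice reference, can be illustrated with a short sketch. It is an assumption-laden example rather than the app's implementation: `fit_to_duration` and `pick_reference_window` are hypothetical helpers, audio is treated as a plain list of mono samples, and the speech segments are assumed to have already been converted to (start, end) pairs in seconds from the VAD output.

```python
from typing import List, Optional, Sequence, Tuple


def fit_to_duration(
    samples: List[float], sample_rate: int, seconds: float = 20.0
) -> List[float]:
    """Trim or zero-pad mono samples so the result lasts exactly `seconds`."""
    target = int(round(sample_rate * seconds))
    if len(samples) >= target:
        return samples[:target]
    # Pad the tail with silence when the synthesized speech is too short.
    return samples + [0.0] * (target - len(samples))


def pick_reference_window(
    segments: Sequence[Tuple[float, float]], target: float = 6.0
) -> Optional[Tuple[float, float]]:
    """Return up to `target` seconds taken from the start of the longest speech segment."""
    if not segments:
        return None
    start, end = max(segments, key=lambda seg: seg[1] - seg[0])
    return (start, min(end, start + target))


# Example: three detected speech segments (in seconds); the longest one wins.
window = pick_reference_window([(0.0, 2.0), (3.0, 12.5), (14.0, 16.0)])
print(window)

# Example: 3 samples at 4 Hz padded out to 2 seconds (8 samples).
padded = fit_to_duration([0.5, 0.5, 0.5], sample_rate=4, seconds=2.0)
print(len(padded))
```

Taking the reference from a single continuous segment avoids stitching together speech from different parts of the video; if no segment reaches the target length, the sketch simply returns the shorter one.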