Skip to content

A versatile video localization tool that provides dubbing and audio synchronization with real-time pitch, accent, and emotional tone adjustments across multiple languages. ๐Ÿ—ฃ๏ธ

License

Notifications You must be signed in to change notification settings

palak-463/VoiceSyncPro

Repository files navigation

๐ŸŽ™๏ธ VoiceSyncPro

VoiceSyncPro is an innovative platform that transforms how video content is localized. It automates the translation of video audio, synchronizes it seamlessly with visuals, and produces polished, ready-to-use multilingual videos.

โœจ Features:
๐Ÿ–ฅ๏ธ Input & Pre-processing: Splits audio from video for independent processing.
๐Ÿ“ Advanced Transcription & Text Cleaning: Transcribes audio into text with over 95% accuracy, using cutting-edge machine learning models like Pyannote.
๐ŸŒ Context-Aware Translation: Translates transcripts into target languages while preserving semantic meaning and cultural nuances.
๐ŸŽต Audio Feature Analysis: Extracts pitch and tone for generating natural, non-robotic speech.
๐Ÿ‘„ Lip Sync & Audiovisual Synchronization: Synchronizes translated audio with lip movements and video timing for realistic playback.
๐ŸŽญ Expression Modeling: Generates corresponding facial expressions to match audio stress and intonation.
๐Ÿ”— Seamless Integration: Combines all elements into a fully synchronized and localized video output.

๐Ÿš€ Technologies Used:

Category Technology
Machine Learning PyTorch, Google Translate API, DeepL API
Video Processing FFmpeg, Pyannote Diarization System
Frontend Wagner
Backend Gradio, Python

๐Ÿ› ๏ธ System Architecture:
VoiceSyncPro uses a modular architecture to ensure flexibility and high efficiency.

  • Input & Pre-processing: Prepares video/audio for transcription and translation.
  • Transcription Module: Converts spoken dialogue into text with speaker segmentation.
  • Translation Module: Provides accurate, context-aware translations.
  • Audio Feature Extraction: Maintains original pitch and tone in synthesized audio.
  • Lip Sync & Synchronization: Aligns translated audio perfectly with visuals.
  • Expression Modeling: Generates realistic visual expressions from audio stress.
  • Final Output Generation: Produces the localized video, ready for distribution.

๐ŸŒŸ Key Highlights:

  • Speech-to-text transcription achieves 95%+ accuracy even in complex scenarios.
  • Processes 10-minute videos in just 15โ€“30 minutes on average.
  • Handles large file sizes and concurrent requests seamlessly.
  • Simplifies content localization, allowing users to focus on creativity.

๐Ÿ“Š User Benefits:
๐ŸŽฅ Content Creators: Localize videos for a global audience effortlessly.
๐Ÿซ Educators: Overcome language barriers to reach diverse learners.
๐Ÿ“ˆ Businesses: Reduce costs and time associated with manual video translation.

๐Ÿ‘ฉโ€๐Ÿ’ป How to Use:
Drop your video file into the interface.
Select the language for translation.
Receive the synchronized, localized video in minutes.

About

A versatile video localization tool that provides dubbing and audio synchronization with real-time pitch, accent, and emotional tone adjustments across multiple languages. ๐Ÿ—ฃ๏ธ

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published