
Python code that lets you give a voice and a face to any AI model


AnirudhG07/LipSyncing-Project


Giving AI a Face and Voice

LipSyncing-Project


This project develops a voice and lip syncing for text, to see whether an AI model can be given these extra features. It uses plain Python rather than neural networks or machine learning.

The end result: "Just speak to your laptop, and NOW AI HAS A FACE AND A VOICE for its output response!"

The project has three main parts:

    1. Text-to-lip mouthing, i.e. giving the AI a face
    2. A voice behind it
    3. An AI behind it


FLOWCHART OF FUNCTIONING:

YOUR_VOICE / TEXT (input) ---> AI MODEL response text ---> LipSyncing + Voice ---> AI Response with Audio + Face
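The flowchart can be read as three pluggable stages. The sketch below is a hypothetical illustration, not the repository's code: `run_pipeline` and the three stage callables are assumed names, and the toy lambdas stand in for real speech input, an AI model, and the lip-sync plus voice output.

```python
from typing import Callable

# Hypothetical stage signatures (assumptions, not the project's real API).
GetInput = Callable[[], str]    # returns the user's text, typed or transcribed
Model = Callable[[str], str]    # maps a prompt to an AI response text
Render = Callable[[str], None]  # lip-syncs and voices the response


def run_pipeline(get_input: GetInput, model: Model, render: Render) -> str:
    """Run one turn: input -> AI response text -> lip sync + voice."""
    prompt = get_input()
    response = model(prompt)
    render(response)
    return response


# Toy stand-ins so the sketch runs without any AI model or audio library.
if __name__ == "__main__":
    run_pipeline(
        get_input=lambda: "hello world",
        model=lambda text: f"You said {text}",
        render=lambda text: print(f"[face + voice] {text}"),
    )
```

Keeping the stages as plain callables means any text-generation model can be swapped in without touching the lip-sync side.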

STEP 1: LipSyncing (LS)

I use freely available PNG images of a man's face. Depending on the input text, I use Python's METAPHONE library to create a rough phonetic form, turning the input sentence into a meta_word or meta_sentence (my own terms). Each character is then read in turn and the matching images are displayed in quick succession, so the mouth appears to move.

For example:

'Hello world' translates to 'helo vorlt'
'helo' is displayed as:


h -> e -> l -> o
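A minimal sketch of the character-to-image idea, under stated assumptions: `toy_meta_word` is a hypothetical simplifier written only to reproduce the 'hello world' example above; it is NOT the Metaphone algorithm and not the project's exact rules. The image file names (`h.png`, `rest.png`, ...) are also hypothetical.

```python
import re
import time
from typing import Callable


def toy_meta_word(text: str) -> str:
    """Hypothetical simplifier matching the 'hello world' example (not Metaphone)."""
    words = []
    for word in text.lower().split():
        word = re.sub(r"(.)\1+", r"\1", word)  # collapse doubled letters: hello -> helo
        word = word.replace("w", "v")          # map w to v: world -> vorld
        word = re.sub(r"d$", "t", word)        # word-final d to t: vorld -> vorlt
        words.append(word)
    return " ".join(words)


def frame_sequence(meta: str, rest_frame: str = "rest.png") -> list[str]:
    """Map each character to a mouth image; spaces show a resting mouth."""
    return [rest_frame if ch == " " else f"{ch}.png" for ch in meta]


def play(frames: list[str], show: Callable[[str], None], delay_s: float = 0.08) -> None:
    """Show frames one after another with a fixed delay between them."""
    for frame in frames:
        show(frame)
        time.sleep(delay_s)
```

For example, `play(frame_sequence(toy_meta_word("Hello world")), print)` prints `h.png`, `e.png`, `l.png`, `o.png`, `rest.png`, and so on; in the real project `show` would draw the image instead of printing its name.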

An attempt has also been made to add a fade between image transitions.
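A fade between two mouth images is usually a linear blend: each in-between frame is `(1 - alpha) * A + alpha * B` for alpha stepping from 0 towards 1. The sketch below assumes frames are flat lists of grayscale pixel values so it runs without an imaging library; `crossfade` is a hypothetical name, and the project's own fade code may differ.

```python
def crossfade(frame_a: list[float], frame_b: list[float], steps: int) -> list[list[float]]:
    """Return `steps` intermediate frames blending frame_a into frame_b.

    alpha takes the values 1/(steps+1), 2/(steps+1), ..., steps/(steps+1),
    so neither endpoint frame is repeated.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same size")
    blended = []
    for i in range(1, steps + 1):
        alpha = i / (steps + 1)
        blended.append([(1 - alpha) * a + alpha * b for a, b in zip(frame_a, frame_b)])
    return blended
```

With real PNGs, Pillow's `Image.blend(im1, im2, alpha)` computes the same per-pixel blend (both images must share size and mode). More fade steps look smoother but take longer, which matters when the mouth has to keep pace with the voice.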

STEP 2: Voice

Three different voice libraries, pyttsx3, gTTS and Whisper AI (from OpenAI), are used to produce the voice simultaneously with the mouth animation.
The voice and the image projection run independently, so for each model the voice speed has to be adjusted manually; the appropriate settings are noted in # comments in the code. You choose the model, and the image video and audio are then output simultaneously from the preset data, creating the effect of a speaking man.
Text without punctuation has given almost 90% perfect voice and image_video fluency. Text with punctuation may sometimes create discrepancies because of the uneven voice output of the models used.
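Because voice and images run independently, one way to pick the manual speed setting is to estimate how long the speech will last and spread the mouth frames over that time. This is a sketch under loud assumptions: it assumes the TTS engine speaks at a constant words-per-minute rate (pyttsx3 exposes such a `rate` property, 200 wpm by default), and `frame_delay` and `speak_and_animate` are hypothetical helpers, not the project's code.

```python
import threading
import time
from typing import Callable, Sequence


def frame_delay(text: str, words_per_minute: float, n_frames: int) -> float:
    """Seconds per mouth frame so the animation lasts about as long as the speech.

    Assumes a constant speaking rate; punctuation pauses are ignored, which is
    one reason punctuated text drifts out of sync.
    """
    n_words = len(text.split())
    speech_seconds = n_words * 60.0 / words_per_minute
    return speech_seconds / max(n_frames, 1)


def speak_and_animate(speak: Callable[[], None], show: Callable[[str], None],
                      frames: Sequence[str], delay_s: float) -> None:
    """Start the voice on a background thread and play frames on this thread."""
    voice = threading.Thread(target=speak)
    voice.start()
    for frame in frames:
        show(frame)
        time.sleep(delay_s)
    voice.join()  # wait for the speech to finish before returning
```

Here `speak` would wrap whichever TTS call is in use. Since each library paces words differently, the estimated rate still has to be tuned per library, which matches the manual adjustment described above.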

STEP 3: AI Model Application

AI models can now use this model to have a face with a voice! Just input your question (or prompt) to your AI (preferably a text-generation model), and it will produce the required output with a voice and a face mouthing it. As this project develops, you may see ChatGPT or other AIs with a face and voice! One example model made and pushed to the repository is a Sentiment Analysis AI.
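As an illustration of what "an AI behind it" needs to provide, the sketch below is a toy keyword-count sentiment model that returns a short reply text for the face and voice to speak. It is a hypothetical stand-in, not the repository's Sentiment Analysis AI; the word lists and reply wording are assumptions. Replies are kept free of punctuation, since unpunctuated text stays in sync best.

```python
# Hypothetical keyword lists; a real model would be trained or far larger.
POSITIVE = {"good", "great", "happy", "love", "excellent"}
NEGATIVE = {"bad", "sad", "hate", "terrible", "awful"}


def toy_sentiment_reply(prompt: str) -> str:
    """Classify the prompt by keyword counts and reply without punctuation."""
    words = [w.strip(".,!?;:").lower() for w in prompt.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "that sounds positive"
    if score < 0:
        return "that sounds negative"
    return "that sounds neutral"
```

Any function with this shape (prompt text in, response text out) can sit between the speech input and the lip-sync stage.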


Extra Perk: Voice to Text (STT)

Just speak to your laptop: Python speech recognition converts your speech to text, and the converted text is then passed on as in STEP 3!
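A minimal speech-to-text sketch using the SpeechRecognition package, assuming it is installed together with PyAudio for microphone access; the project's own STT code may be set up differently. `listen_for_prompt` is a hypothetical name. The import sits inside the function so the module still loads where the package is missing.

```python
def listen_for_prompt() -> str:
    """Record one phrase from the default microphone and return it as text."""
    import speech_recognition as sr  # deferred import: only needed when listening

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate to background noise
        audio = recognizer.listen(source)            # record until a pause
    try:
        # Google's free web speech API; needs an internet connection.
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""  # speech was unintelligible
    except sr.RequestError as err:
        raise RuntimeError(f"speech service unavailable: {err}") from err
```

Calling `listen_for_prompt()` returns the transcript, which can then be handed to the AI model exactly like typed input.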


Scope of improvement

  • The audio libraries used are variable and hard to control, since they run independently. Punctuation in particular causes uneven pauses that break the flow, and some words take an exceptionally long time to speak. For example, pyttsx3 drags out the prefix 'un', saying unnnnbiased or unnnable, while my images simply run through the u-n-a-b-l-e PNGs.

  • Since voice and image projection run independently, they sometimes drift apart when a word is spoken differently from its meta_word, and long texts may go out of sync entirely.

  • This is why I have not been able to properly connect it to gpt-3.5-turbo or gpt-4: the text they produce contains a good amount of punctuation. And you cannot simply strip the punctuation to make the speech perfect, because the voice would lose its natural flow.

  • Whisper AI is especially hard to use because it has a long, uneven pause (varying with sentence length) before it starts outputting any voice, so it is not a very promising model to use.
