This project uses Long Short-Term Memory (LSTM) networks to predict the emotion conveyed by a line of speech. The model was trained on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The dataset contains 7,356 files from 24 professional actors (12 female, 12 male) vocalizing two lexically matched statements in a neutral North American accent. The speech recordings cover calm, happy, sad, angry, fearful, surprised, and disgusted emotions.
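
The sketch below outlines how such a pipeline can be put together: MFCC features extracted with librosa feeding a stacked Keras LSTM classifier. The function names, layer sizes, and the 40-coefficient / 8-class configuration are illustrative assumptions, not necessarily the exact settings used in this project.

```python
# Minimal sketch: MFCC feature extraction + Keras LSTM classifier.
# extract_mfcc, n_mfcc=40, and n_classes=8 are assumptions for illustration.
import numpy as np
import librosa
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

def extract_mfcc(path, n_mfcc=40):
    """Load an audio file and return a (frames, n_mfcc) MFCC sequence."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T  # time-major: one feature vector per frame for the LSTM

def build_model(timesteps, n_features, n_classes=8):
    """Stacked LSTM over MFCC frames with a softmax over emotion labels."""
    model = Sequential([
        LSTM(128, return_sequences=True, input_shape=(timesteps, n_features)),
        Dropout(0.3),
        LSTM(64),
        Dropout(0.3),
        Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```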
This project also includes an interactive web application, built with Streamlit in Python, that takes an audio file as input and outputs the predicted emotion along with a message indicating whether the speaker sounds calm or in danger.
https://cate865-speech-emotion-recognition-app-o6nzvx.streamlit.app/
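
Below is a minimal sketch of how such a Streamlit front end could be wired to the trained model. The saved model file name (`lstm_emotion.h5`), the emotion label list, and the grouping of emotions into the "danger" message are assumptions for illustration; the deployed app's exact implementation may differ.

```python
# Minimal Streamlit sketch: upload a WAV file, extract MFCCs, predict an
# emotion, and show a calm/danger message. Model path and label order are
# hypothetical.
import numpy as np
import librosa
import streamlit as st
from tensorflow.keras.models import load_model

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]
DANGER = {"angry", "fearful", "disgust"}  # assumed mapping to the "danger" message

st.title("Speech Emotion Recognition")
uploaded = st.file_uploader("Upload an audio file", type=["wav"])

if uploaded is not None:
    st.audio(uploaded)
    # Extract an MFCC sequence from the uploaded audio.
    signal, sr = librosa.load(uploaded, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40).T
    model = load_model("lstm_emotion.h5")  # hypothetical saved model file
    probs = model.predict(mfcc[np.newaxis, ...])  # batch of one sequence
    emotion = EMOTIONS[int(np.argmax(probs))]
    st.write(f"Predicted emotion: **{emotion}**")
    if emotion in DANGER:
        st.error("Warning: the speaker may be in distress or danger.")
    else:
        st.success("The speaker sounds calm.")
```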