-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libfvad classifies any noice as a human voice. #23
Comments
That's my experience, too. |
me too |
For me it classifies Music as Human voice. Can anyone confirm whether it is trained to detect Music as Human Voice or not? |
I have done some looking into this, and in my opinion these problems are due to the nature of the WebRTC Voice Activity Detection algorithm. It does online estimation that attempts to separate "background" (slowly changing) from "foreground" (rapidly changing). This is done using a Gaussian Mixture Model over 6 frequency sub-bands, with coefficients set to prefer speech bands. Conceptually, is an energy-based VAD with adaptive threshold. So in practice, it acts more like a novelty detector - any (short) changes to the acoustical signal is considered a likely candidate to be "speech". This means that it is good for:
And that it is not good for:
So if those things are needed, one would need a more advanced algorithm. For example, a model trained on large dataets to separate speech from other sounds. This can possibly be done as a second stage after this VAD. |
Any noice or intense sound is classified as a human voice.
The text was updated successfully, but these errors were encountered: