The two values per frame are actually the speech and non-speech probabilities.
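If the two columns are the speech and non-speech probabilities, then the single curve in the tutorial figure is just one column of the 2-D array. The sketch below is a minimal illustration on plain nested lists. Several things are assumptions rather than facts from this thread: that column 0 is non-speech and column 1 is speech, and that the scores might be emitted as log-probabilities (hence the optional `log_scale` flag). `speech_scores` and `rows_sum_to_one` are hypothetical helpers, not pyannote functions.

```python
import math

# ASSUMPTION: column 0 = non-speech, column 1 = speech.
# Check the model's actual class ordering before relying on this.
NON_SPEECH, SPEECH = 0, 1


def speech_scores(scores, log_scale=False):
    """Extract the 1-D speech curve from a (num_frames, 2) score matrix."""
    column = [row[SPEECH] for row in scores]
    if log_scale:
        # ASSUMPTION: if the model emits log-probabilities, map back to [0, 1].
        column = [math.exp(v) for v in column]
    return column


def rows_sum_to_one(scores, log_scale=False, tol=1e-6):
    """Sanity check: two-class probabilities should sum to 1 on every frame."""
    for row in scores:
        values = [math.exp(v) for v in row] if log_scale else row
        if abs(sum(values) - 1.0) > tol:
            return False
    return True


# Hypothetical 3-frame example.
scores = [[0.9, 0.1], [0.3, 0.7], [0.05, 0.95]]
print(speech_scores(scores))    # [0.1, 0.7, 0.95]
print(rows_sum_to_one(scores))  # True
```

If every row sums to 1, the second column carries no information beyond the first. That would explain why a single curve is enough to plot the scores.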
Hi. Thank you very much for developing such a nice SAD tool.
I have a question about the SAD data structure in pyannote-audio.
My understanding is that the speech activity detection scores are one-dimensional, as shown in the following graph:
https://raw.githubusercontent.com/pyannote/pyannote-audio/master/tutorials/pretrained/model/segmentation.png
However, with the following code (as on the tutorial page https://github.com/pyannote/pyannote-audio/tree/master/tutorials/pretrained/model),
sad_scores = sad(test_file)
I noticed that sad_scores holds a 2-dimensional array, together with a pyannote.core.segment.SlidingWindow object.
I assume one of the two dimensions holds the SAD score values, but what do the values in the other dimension represent?
Are they the LSTM output and its hidden state h_t (score)?
Also, how does the number of score frames relate to the length of the waveform?
Thanks in advance for any answer.
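On the second question: in pyannote.core, a SlidingWindow is described by a `start`, a frame `duration`, and a `step` between consecutive frames, so the number of score frames follows from the signal duration rather than from the number of samples. The sketch below mimics that arithmetic with a hypothetical `Window` class. It is not the real `pyannote.core.SlidingWindow`, and the edge rounding pyannote applies may differ. The numbers in the example (16 kHz audio, 25 ms frames, 10 ms step) are illustrative assumptions; read the real values from the sliding window attached to `sad_scores`.

```python
import math
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical stand-in for a sliding window (not pyannote.core.SlidingWindow)."""
    start: float     # start time of the first frame, in seconds
    duration: float  # duration of each frame, in seconds
    step: float      # hop between consecutive frames, in seconds

    def frame_extent(self, i):
        """Time span (begin, end) in seconds covered by frame i."""
        begin = self.start + i * self.step
        return begin, begin + self.duration

    def num_frames(self, signal_duration):
        """Number of complete frames that fit in a signal of the given duration."""
        usable = signal_duration - self.start - self.duration
        if usable < 0:
            return 0
        # Small epsilon guards against floating-point error at exact multiples.
        return int(math.floor(usable / self.step + 1e-9)) + 1


def duration_from_samples(num_samples, sample_rate):
    """Convert a waveform length in samples to seconds."""
    return num_samples / sample_rate


# Illustrative values only: 10 s of 16 kHz audio, 25 ms frames, 10 ms step.
window = Window(start=0.0, duration=0.025, step=0.010)
seconds = duration_from_samples(160_000, 16_000)
print(seconds)                     # 10.0
print(window.num_frames(seconds))  # 998
```

With these assumed settings, the score array has roughly one row per `step` seconds of audio. It is far shorter than the waveform, which has one value per sample.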