Can you give more details about the sliding window size? #176
Unanswered
elephantpanda asked this question in Q&A
Replies: 1 comment
-
That is essentially correct, yes. The main audio embedding model (trained by Google) expects a time step of 80 ms (at 16 kHz), so in general that is what I recommend. You can go lower or higher than this, but performance may decrease the farther you get from 80 ms.
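To make the arithmetic concrete: 80 ms at 16 kHz works out to 1280 samples per step. Here is a rough sketch of how the windows could be sliced. It is not the library's actual code; the function names `hop_samples` and `sliding_windows` are made up for illustration, and the 12600-sample window length is taken from the question, not confirmed as the model's real input size.

```python
import numpy as np

SAMPLE_RATE = 16_000      # Hz, as stated in the reply
STEP_MS = 80              # recommended time step, as stated in the reply
WINDOW_SAMPLES = 12_600   # ASSUMPTION: window length reported in the question


def hop_samples(sample_rate=SAMPLE_RATE, step_ms=STEP_MS):
    """Convert the time step in milliseconds to a hop size in samples."""
    return sample_rate * step_ms // 1000


def sliding_windows(audio, window=WINDOW_SAMPLES, hop=None):
    """Return a 2-D array with one row per window, advancing `hop` samples each step.

    Hypothetical helper: only full windows are returned, trailing samples are dropped.
    """
    if hop is None:
        hop = hop_samples()
    starts = range(0, len(audio) - window + 1, hop)
    rows = [audio[s:s + window] for s in starts]
    if not rows:
        return np.empty((0, window), dtype=audio.dtype)
    return np.stack(rows)


if __name__ == "__main__":
    two_seconds = np.zeros(2 * SAMPLE_RATE, dtype=np.float32)
    print(hop_samples())                       # hop size in samples
    print(sliding_windows(two_seconds).shape)  # (number of windows, window length)
```

Each row would then go through the embedding model, and the resulting 96-dimensional vectors would be stacked in order before being passed to the word detector. In Unity the same loop translates directly: keep a ring buffer and advance the read position by the hop size on every step.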
-
Hi, I am trying to implement this in Unity.
What is the sliding window size, and how far does the window slide each step?
Presumably each window embedding outputs a vector of shape (1,1,1,1,96).
You then take several of these vectors and use a model to try to detect a word, with an input of shape something like (1,22,96).
Is that correct? I have found that an audio sample size of 12600 gives the correct size, but I am not sure how much the window increments each step.