Can you give more details about the sliding window size? #176

elephantpanda · 2024-05-30T16:00:56Z

Hi, I am trying to implement this in Unity.

I am trying to find out what the sliding window size is and how far the window slides at each step.

Presumably each window produces an embedding vector of shape (1,1,1,1,96).

Then you take several of these vectors and use a model to try to detect the word, with an input of something like (1,22,96).

Is that correct? I have found that a window of 12600 audio samples gives the correct output size, but I am not sure how far the window advances at each step.
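A minimal sketch of the pipeline described above, written in plain Python with only the standard library so the logic ports easily to C# in Unity. It assumes, as in the question, that each window yields a 96-value embedding and that the detector takes the 22 most recent embeddings as a (1, 22, 96) input. `EmbeddingBuffer`, `flatten`, and `shape` are hypothetical helpers, not part of any library.

```python
from collections import deque

EMBEDDING_DIM = 96    # values per window embedding, e.g. a (1,1,1,1,96) output
NUM_EMBEDDINGS = 22   # assumed number of stacked embeddings, from the (1,22,96) example


def flatten(values):
    """Recursively flatten a nested list such as a (1,1,1,1,96) tensor into a flat list."""
    if isinstance(values, (list, tuple)):
        out = []
        for v in values:
            out.extend(flatten(v))
        return out
    return [float(values)]


class EmbeddingBuffer:
    """Hypothetical rolling buffer holding the most recent window embeddings."""

    def __init__(self, num_embeddings=NUM_EMBEDDINGS, dim=EMBEDDING_DIM):
        self.dim = dim
        # A deque with maxlen drops the oldest embedding automatically when full
        self.embeddings = deque(maxlen=num_embeddings)

    def push(self, embedding):
        vec = flatten(embedding)
        if len(vec) != self.dim:
            raise ValueError(f"expected {self.dim} values, got {len(vec)}")
        self.embeddings.append(vec)

    def ready(self):
        # The detector can only run once the buffer is full
        return len(self.embeddings) == self.embeddings.maxlen

    def model_input(self):
        """Return the buffered embeddings, oldest first, as a (1, N, dim) nested list."""
        if not self.ready():
            raise RuntimeError("not enough embeddings buffered yet")
        return [list(self.embeddings)]


def shape(nested):
    """Shape of a regular nested list, e.g. (1, 22, 96)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


buf = EmbeddingBuffer()
for i in range(25):
    # Stand-in for one embedding-model output of shape (1,1,1,1,96)
    buf.push([[[[[float(i)] * EMBEDDING_DIM]]]])
x = buf.model_input()
print(shape(x))  # (1, 22, 96)
```

After 25 pushes only the newest 22 embeddings remain, so the detector always sees a fixed-length history that advances by one embedding per window step.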

dscripka · 2024-06-13T00:17:58Z · Maintainer

That is essentially correct, yes. The main audio embedding model (trained by Google) expects a time step of 80 ms (at 16 kHz), so in general that is what I recommend. You can go lower or higher than this, but performance may decrease the farther you move from 80 ms.
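To make the 80 ms recommendation concrete: at 16 kHz, 80 ms is 16000 × 80 / 1000 = 1280 samples, so each new window starts 1280 samples after the previous one. The sketch below is a minimal streaming framer under stated assumptions: the 12,600-sample window length is taken from the question rather than from the answer, and `SlidingWindow` and `ms_to_samples` are hypothetical helpers, not part of any library or of Unity.

```python
from collections import deque

SAMPLE_RATE = 16_000      # Hz, as stated in the answer
STEP_MS = 80              # recommended time step between windows
WINDOW_SAMPLES = 12_600   # assumption: window length reported in the question


def ms_to_samples(ms, sample_rate=SAMPLE_RATE):
    """Convert a duration in milliseconds to a whole number of samples (integer math)."""
    return sample_rate * ms // 1000


class SlidingWindow:
    """Hypothetical streaming framer: emits a full window every `hop` new samples."""

    def __init__(self, window=WINDOW_SAMPLES, hop=ms_to_samples(STEP_MS)):
        self.window = window
        self.hop = hop
        self.samples = deque(maxlen=window)  # keeps only the latest `window` samples
        self.pending = 0                     # new samples since the last emitted window

    def feed(self, chunk):
        """Add a chunk of audio samples; return the complete windows it produced."""
        windows = []
        for s in chunk:
            self.samples.append(s)
            self.pending += 1
            # Emit once the buffer is full and at least one hop of new audio has arrived
            if len(self.samples) == self.window and self.pending >= self.hop:
                windows.append(list(self.samples))
                self.pending = 0
        return windows


hop = ms_to_samples(STEP_MS)
print(hop)  # 1280
framer = SlidingWindow()
audio = list(range(13 * hop))  # stand-in for about 1 s of 16 kHz samples
windows = []
for start in range(0, len(audio), hop):
    windows.extend(framer.feed(audio[start:start + hop]))
print(len(windows), [w[0] for w in windows])  # 4 [0, 1280, 2560, 3840]
```

Because the audio values here are just sample indices, the first element of each window shows its start offset: consecutive windows overlap heavily and advance by exactly one 80 ms hop. Experimenting with other step sizes only means changing `STEP_MS`, although, per the answer, accuracy may drop the further the step is from 80 ms.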
