Any of the ε runs may be empty, and the 'a' or 'b' run may be just one box long, but at least one 'a' box and one 'b' box are required to get 'ab'.
The hint is that we only need to find 4 boxes: where 'a' begins, where 'a' ends, where 'b' begins, and where 'b' ends. Once 'a' ends we no longer have to worry about 'a', and the same goes for 'b'. Fill the rest of the boxes with 'ε'.
At first, I thought we could add 4 boxes (i.e., 2U) to T, pick 4 boxes to represent the start and end points of 'a' and 'b' in order, and then fill the rest with 'ε'. However, what if we choose 4 adjacent boxes? Then we cannot fill in any 'a' or 'b'. The trick is to add just 2 boxes (i.e., U) to T and then pick 4 boxes; the 1st and 3rd of the chosen boxes are the start points and themselves contain 'a' and 'b' respectively. This makes sure that whichever 4 boxes we choose, we always end up with at least one 'a' and one 'b'.
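So, the final answer is C(T+U, 2U), where T is the number of boxes and U is the number of characters (for 'ab', U = 2, giving C(T+2, 4)). Choose 2U boxes from T+U boxes, fill the U start boxes with the U characters respectively, and fill the rest with 'ε'.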
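One way to sanity-check the count is to brute-force it for small T. The sketch below is only an illustration, not part of the original derivation: the helper names `collapse` and `count_alignments` are made up here, and it assumes the usual CTC collapse rule (merge adjacent repeats, then drop ε) and a target with no repeated characters, such as 'ab'.

```python
from itertools import product
from math import comb

BLANK = "ε"

def collapse(alignment):
    """CTC-style collapse: merge adjacent repeats, then drop blanks."""
    merged = []
    prev = None
    for symbol in alignment:
        if symbol != prev:
            merged.append(symbol)
        prev = symbol
    return "".join(s for s in merged if s != BLANK)

def count_alignments(target, T):
    """Count length-T sequences over target's characters plus blank
    that collapse to target (exhaustive search, small T only)."""
    alphabet = sorted(set(target)) + [BLANK]
    return sum(
        1 for candidate in product(alphabet, repeat=T)
        if collapse(candidate) == target
    )

# Compare the brute-force count with C(T+U, 2U) for 'ab' (U = 2).
U = 2
for T in range(2, 8):
    print(T, count_alignments("ab", T), comb(T + U, 2 * U))
```

For each T the two printed numbers should agree. The search grows as 3^T, so this is only practical for a handful of boxes.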
What is the significance of this calculation?
The number of different alignments is what we would have to go through to find the most probable alignment for a given audio file. CTC aims to reduce this search space by using dynamic programming. I will try to explain that next time.
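To get a feel for how large this search becomes, here is a minimal sketch that evaluates the count for a few sizes. It assumes the count is C(T+U, 2U) for T boxes and U characters with no repeated characters in the target; the function name `num_alignments` and the (T, U) pairs are arbitrary choices made for illustration.

```python
from math import comb

def num_alignments(T, U):
    """Number of alignments of U distinct-neighbor characters over T boxes,
    assuming the closed form C(T+U, 2U)."""
    return comb(T + U, 2 * U)

# A few illustrative sizes: T boxes, U characters.
for T, U in [(10, 2), (50, 10), (100, 20)]:
    print(f"T={T}, U={U}: {num_alignments(T, U):,} alignments")
```

Even for modest T and U the count is far too large to enumerate one alignment at a time.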
If you have a simpler way of understanding the combination result, please share it. Thanks and good day! 🐸
This article presumes the reader has some basic knowledge of speech-to-text (STT).
When reading about CTC loss on distill, I encountered the following sentence:
[{...ε...} {...a...} {...ε...} {...b...} {...ε...}]