It looks to me like a single compression_ratio or avg_logprob that fails a threshold check causes the entire batch to have its temperature incremented and be re-run at the higher temperature.
As batch_size increases, I believe it becomes more likely that a single segment result with an out-of-bounds parameter will cause the entire batch to be re-evaluated at a higher temperature. With large enough batch sizes this may create a kind of ratchet, where temperature rapidly rises to 1.0 (or the max), since the higher temperatures may produce a worse compression_ratio or avg_logprob in some other segment of the batch.
I wonder if there's an efficient way to retain the good segments and only re-run the failed ones. The entire inference is currently re-run against the full batch, so this should be maximally inefficient right now. Is there any reason in principle the re-run couldn't use a smaller batch_size of e.g. just 1?
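To make the failure mode above concrete, here is a minimal sketch of the whole-batch fallback behaviour as I understand it. The names (`DecodingResult`, `decode_fn`, the default thresholds) are simplified stand-ins for illustration, not the actual API:

```python
# Illustrative sketch: if ANY result in the batch fails a threshold,
# the WHOLE batch is re-decoded at the next (higher) temperature.
# `DecodingResult` and `decode_fn` are hypothetical stand-ins.
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class DecodingResult:
    compression_ratio: float
    avg_logprob: float


def transcribe_with_fallback(
    batch: Sequence,
    decode_fn: Callable[[Sequence, float], List[DecodingResult]],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: float = 2.4,
    logprob_threshold: float = -1.0,
) -> List[DecodingResult]:
    results: List[DecodingResult] = []
    for t in temperatures:
        # One forward pass over the ENTIRE batch at temperature t.
        results = decode_fn(batch, t)
        any_failed = any(
            r.compression_ratio > compression_ratio_threshold
            or r.avg_logprob < logprob_threshold
            for r in results
        )
        if not any_failed:
            break  # every segment passed; keep these results
        # Otherwise loop again: the whole batch is re-run, including
        # segments that already passed the thresholds.
    return results
```

Note that the good segments' results are thrown away on every retry, which is the inefficiency in question, and the higher temperature applied to them is what can trigger the ratchet.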
We should be able to track which DecodingResults failed and which succeeded, and re-run only the failed segments. We could either re-run only the failed segments within transcribe_with_fallback, which would still block further pipeline execution for the rest of the batch, or we could track fallback and temperature on a per-audio basis outside of transcribe_with_fallback. The latter seems like it could be faster, but the former would be an easy enough first step.
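A rough sketch of the first option, re-running only the failed segments inside the fallback loop while keeping results that already passed. Again, `DecodingResult`, `decode_fn`, and the thresholds are illustrative assumptions, not the real implementation:

```python
# Hypothetical per-segment fallback: successful DecodingResults are kept,
# and only the failed audio segments are re-decoded at higher temperatures.
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class DecodingResult:
    compression_ratio: float
    avg_logprob: float


def transcribe_with_per_segment_fallback(
    batch: Sequence,
    decode_fn: Callable[[Sequence, float], List[DecodingResult]],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: float = 2.4,
    logprob_threshold: float = -1.0,
) -> List[Optional[DecodingResult]]:
    final: List[Optional[DecodingResult]] = [None] * len(batch)
    pending = list(range(len(batch)))  # indices still needing a retry
    for t in temperatures:
        # Decode only the segments that have not yet passed the thresholds.
        results = decode_fn([batch[i] for i in pending], t)
        still_failed = []
        for idx, r in zip(pending, results):
            final[idx] = r  # keep the latest attempt for this segment
            if (r.compression_ratio > compression_ratio_threshold
                    or r.avg_logprob < logprob_threshold):
                still_failed.append(idx)
        pending = still_failed
        if not pending:
            break  # every segment has an acceptable result
    return final
```

The retry batch shrinks as segments succeed, so later (higher-temperature) passes only touch the problem segments. This still blocks the rest of the pipeline until the last segment passes or temperatures are exhausted, which is why tracking fallback state per audio outside the function could be faster.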