Script for ASR inference on long files #2373
jbalam-nv commented Jun 18, 2021
- speech_to_text_buffered_infer.py supports inference on long audio files by running inference on smaller chunks of audio and then merging the tokens for the final transcription.
- This version supports only EncDecCTCModelBPE with AudioMelSpectrogramProcessor as the preprocessor.
- Feature normalization is done on each buffer using "per_feature" normalization. Future versions will add other methods for experimentation.
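The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names (split_into_buffers, normalize_per_feature, transcribe_long), the decode_fn callback, and the simplification that the decoder returns one output per input frame are all assumptions made for the example. A real CTC model subsamples frames and the script merges tokens rather than per-frame labels.

```python
import math


def split_into_buffers(features, chunk_len, context):
    """Yield (buffer, keep_start, keep_end) for each chunk of a long input.

    features: list of frames, each frame a list of floats (time-major).
    chunk_len: number of new frames consumed per step.
    context: extra frames of left/right context included in each buffer.
    keep_start/keep_end index the chunk's own frames inside the buffer.
    """
    n = len(features)
    for start in range(0, n, chunk_len):
        end = min(start + chunk_len, n)
        lo = max(0, start - context)
        hi = min(n, end + context)
        yield features[lo:hi], start - lo, end - lo


def normalize_per_feature(buffer, eps=1e-5):
    """Normalize each feature dimension to zero mean and unit variance,
    with statistics computed over the time axis of this buffer only."""
    num_frames = len(buffer)
    num_feats = len(buffer[0])
    means = [sum(f[d] for f in buffer) / num_frames for d in range(num_feats)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in buffer) / num_frames) + eps
        for d in range(num_feats)
    ]
    return [[(f[d] - means[d]) / stds[d] for d in range(num_feats)] for f in buffer]


def transcribe_long(features, decode_fn, chunk_len, context):
    """Decode each normalized buffer and keep only the outputs that belong
    to the chunk itself, discarding those produced for the context frames.

    decode_fn is a hypothetical callback returning one output per frame.
    """
    merged = []
    for buffer, keep_start, keep_end in split_into_buffers(features, chunk_len, context):
        outputs = decode_fn(normalize_per_feature(buffer))
        merged.extend(outputs[keep_start:keep_end])
    return merged
```

Normalizing per buffer means each chunk is scaled by its own statistics, so the decoder never needs statistics for the whole file, at the cost of slightly different scaling from one buffer to the next.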
Signed-off-by: jbalam-nv <[email protected]>
This pull request introduces 2 alerts when merging 4351694 into e070e04 - view on LGTM.com.
This pull request introduces 2 alerts when merging e00359a into e5bde15 - view on LGTM.com.
Signed-off-by: jbalam <[email protected]>
Overall looks great, minor comments
# Create a preprocessor to convert audio samples into raw features,
# Normalization will be done per buffer in frame_bufferer
# Do not normalize whatever the model's preprocessor setting is
preprocessor_cfg.normalize = "None"
String None?
Hack to get through without any normalization. None throws an error here
elif "fixed_mean" in normalize_type and "fixed_std" in normalize_type:
Ah ok. no_norm sounds like a good option
frame_overlap: duration of overlaps before and after current frame, seconds
offset: number of symbols to drop for smooth streaming
'''
self.ZERO_LEVEL_SPEC_DB_VAL = -16.635
Maybe some info can be added on how this was calculated?
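One plausible origin for the constant, which this thread does not confirm: if the log-mel preprocessor guards the logarithm by adding a small constant before taking the natural log, and that constant is 2**-24, then an all-zero signal produces log(2**-24), which is roughly -16.6355. The guard value and the function name below are assumptions for illustration.

```python
import math

# Assumption: the preprocessor adds this guard before taking the natural log.
LOG_ZERO_GUARD = 2 ** -24


def zero_signal_log_mel(guard=LOG_ZERO_GUARD):
    """Log-mel value of a mel bin with zero energy: log(0 + guard)."""
    return math.log(0.0 + guard)


# Agrees with the hard-coded -16.635 to within about 1e-3.
value = zero_signal_log_mel()
```

If that is the source, padding buffers with this value would be equivalent to padding with silence in the feature domain.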
* First version of script for buffered inference
  Signed-off-by: jbalam-nv <[email protected]>
* Cleaned up commented code and added comments
  Signed-off-by: jbalam-nv <[email protected]>
* More clean up and simplified the call to transcribe
  Signed-off-by: jbalam-nv <[email protected]>
* Removed unused variables
  Signed-off-by: jbalam <[email protected]>
* Style fix
  Signed-off-by: jbalam <[email protected]>
* Added a comment for zero_level_spec_db constant
  Signed-off-by: jbalam-nv <[email protected]>
* style fix
  Signed-off-by: jbalam <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
* First version of script for buffered inference
  Signed-off-by: jbalam-nv <[email protected]>
* Cleaned up commented code and added comments
  Signed-off-by: jbalam-nv <[email protected]>
* More clean up and simplified the call to transcribe
  Signed-off-by: jbalam-nv <[email protected]>
* Removed unused variables
  Signed-off-by: jbalam <[email protected]>
* Style fix
  Signed-off-by: jbalam <[email protected]>
* Added a comment for zero_level_spec_db constant
  Signed-off-by: jbalam-nv <[email protected]>
* style fix
  Signed-off-by: jbalam <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Paarth Neekhara <[email protected]>