
Script for ASR inference on long files #2373

Merged
merged 13 commits into from
Jul 14, 2021

Conversation

jbalam-nv
Collaborator

  • speech_to_text_buffered_infer.py supports inference on long audio files by running inference on smaller chunks of audio and then merging the tokens into the final transcription.
  • This version only supports EncDecCTCModelBPE with AudioToMelSpectrogramPreprocessor as the preprocessor.
  • Feature normalization is done on each buffer using "per_feature" normalization; future versions will add other methods for experimentation.
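The chunk-and-merge idea in the first bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's implementation: the helper names `make_buffers` and `merge_ctc_tokens` and the `tokens_per_overlap` parameter are hypothetical, the model is assumed to emit one greedy CTC token id per output step, and the tokens that fall in the left and right overlaps are simply dropped before the usual CTC collapse (merge repeats, remove blanks). `tokens_per_overlap` appears to play a role similar to the `offset` described in the docstring quoted later in the review.

```python
import math


def make_buffers(samples, sample_rate, frame_len, frame_overlap):
    """Split a signal into buffers laid out as [overlap | frame | overlap].

    frame_len and frame_overlap are in seconds (hypothetical names). The signal
    is zero-padded so the first and last frames also get full context.
    """
    frame = int(frame_len * sample_rate)
    overlap = int(frame_overlap * sample_rate)
    n_frames = math.ceil(len(samples) / frame)
    tail = n_frames * frame - len(samples)  # pad the last frame to full length
    padded = [0.0] * overlap + list(samples) + [0.0] * (tail + overlap)
    width = frame + 2 * overlap
    return [padded[i * frame : i * frame + width] for i in range(n_frames)]


def merge_ctc_tokens(per_buffer_ids, tokens_per_overlap, blank_id):
    """Keep each buffer's central-frame token ids, then apply greedy CTC collapse.

    Assumes tokens_per_overlap output steps on each side of every buffer
    correspond to its overlap regions (a simplifying assumption).
    """
    central = []
    for ids in per_buffer_ids:
        end = len(ids) - tokens_per_overlap  # explicit end so 0 keeps everything
        central.extend(ids[tokens_per_overlap:end])
    merged, prev = [], None
    for t in central:
        if t != prev and t != blank_id:  # drop repeated tokens and blanks
            merged.append(t)
        prev = t
    return merged


buffers = make_buffers(list(range(10)), sample_rate=1, frame_len=4, frame_overlap=2)
print([b[2:6] for b in buffers])  # the central frame of each 8-sample buffer
print(merge_ctc_tokens([[0, 5, 5, 5, 0, 0], [0, 0, 7, 0, 7, 0]], tokens_per_overlap=1, blank_id=0))  # -> [5, 7, 7]
```

Dropping the overlap tokens is what lets each buffer see context on both sides while every part of the audio contributes tokens exactly once.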

@lgtm-com

lgtm-com bot commented Jun 18, 2021

This pull request introduces 2 alerts when merging 4351694 into e070e04 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unused import

@lgtm-com

lgtm-com bot commented Jul 8, 2021

This pull request introduces 2 alerts when merging e00359a into e5bde15 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unused import

jbalam-nv closed this Jul 13, 2021
Signed-off-by: jbalam <[email protected]>
Collaborator

titu1994 left a comment


Overall looks great, minor comments

# Create a preprocessor to convert audio samples into raw features.
# Normalization will be done per buffer in frame_bufferer,
# so do not normalize, whatever the model's preprocessor setting is.
preprocessor_cfg.normalize = "None"
Collaborator


String None?

Collaborator Author


Hack to get through without any normalization. None throws an error at this line:

elif "fixed_mean" in normalize_type and "fixed_std" in normalize_type:

as we are looking for a key in a dict. We should probably clean up this normalize_batch function and add a "no_norm" option.

Collaborator


Ah ok. no_norm sounds like a good option
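A "no_norm" option could look roughly like the sketch below. It is a hypothetical helper written for illustration, not NeMo's normalize_batch: the function name `normalize_features`, the `[n_features][n_frames]` list layout, and the `eps` value are assumptions. Handling "no_norm" (and a real `None`) as an explicit branch removes the need for the string "None" workaround.

```python
import math


def normalize_features(features, normalize_type="per_feature", eps=1e-5):
    """Normalize a [n_features][n_frames] matrix (hypothetical helper).

    "per_feature": each feature row gets zero mean / unit std over time.
    "no_norm" or None: features are returned unchanged (as a copy).
    """
    if normalize_type in (None, "no_norm"):
        return [list(row) for row in features]
    if normalize_type == "per_feature":
        out = []
        for row in features:
            n = len(row)
            mean = sum(row) / n
            # Unbiased (n - 1) variance estimate; max() guards single-frame rows.
            var = sum((v - mean) ** 2 for v in row) / max(n - 1, 1)
            std = math.sqrt(var) + eps  # eps avoids division by zero on flat rows
            out.append([(v - mean) / std for v in row])
        return out
    raise ValueError(f"unsupported normalize_type: {normalize_type!r}")


feats = [[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]
print(normalize_features(feats, "per_feature"))
print(normalize_features(feats, "no_norm"))
```

Raising on unknown values, instead of silently falling through, makes a typo in the config fail loudly.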

frame_overlap: duration of overlaps before and after current frame, seconds
offset: number of symbols to drop for smooth streaming
'''
self.ZERO_LEVEL_SPEC_DB_VAL = -16.635
Collaborator


Maybe some info can be added on how this was calculated?
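One plausible derivation, offered as an assumption rather than the author's documented method: if the preprocessor takes the natural log of the mel energies with an additive zero guard of 2**-24 (our understanding of NeMo's preprocessor defaults), an all-zero input gives ln(2**-24) ≈ -16.6355 in every bin, which matches the constant. The same guard expressed as 10 * log10 would be about -72.2 dB, so despite the `_DB_` in the name, the value corresponds to a natural log.

```python
import math

# Assumption: log-mel features are computed as ln(mel_energy + LOG_ZERO_GUARD),
# with an additive guard of 2**-24 (our reading of NeMo's preprocessor defaults).
LOG_ZERO_GUARD = 2 ** -24


def zero_level_log_mel(guard=LOG_ZERO_GUARD):
    """Feature value of a silent frame: every mel energy is exactly 0."""
    mel_energy = 0.0
    return math.log(mel_energy + guard)


value = zero_level_log_mel()
print(f"{value:.4f}")  # -16.6355, compare with ZERO_LEVEL_SPEC_DB_VAL = -16.635
print(f"{10 * math.log10(LOG_ZERO_GUARD):.1f}")  # -72.2, the same guard in decibels
```

A constant like this is a natural fill value for padding in the feature domain: a padded frame then looks like digital silence, whereas a 0.0 in log space would correspond to unit energy.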

titu1994 merged commit 1d29828 into main Jul 14, 2021
fayejf pushed a commit that referenced this pull request Jul 16, 2021
* First version of script for buffered inference

Signed-off-by: jbalam-nv <[email protected]>

* Cleaned up commented code and added comments

Signed-off-by: jbalam-nv <[email protected]>

* More clean up and simplified the call to transcribe

Signed-off-by: jbalam-nv <[email protected]>

* Removed unused variables

Signed-off-by: jbalam <[email protected]>

* Style fix

Signed-off-by: jbalam <[email protected]>

* Added a comment for zero_level_spec_db constant

Signed-off-by: jbalam-nv <[email protected]>

* style fix

Signed-off-by: jbalam <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
titu1994 added a commit to titu1994/NeMo that referenced this pull request Jul 20, 2021
paarthneekhara pushed a commit to paarthneekhara/NeMo that referenced this pull request Sep 17, 2021
Signed-off-by: Paarth Neekhara <[email protected]>
blisc deleted the streaming_asr branch January 11, 2022 16:39
Labels
None yet
Projects
None yet

2 participants