Improvements implemented in the audio processing module #2390

Open
wants to merge 2 commits into
base: main


@mrfelpa mrfelpa commented Oct 15, 2024

  • The relative import caused errors when the script was executed directly, so I changed `from .utils import exact_div` to `from utils import exact_div`. I also implemented a function (get_hann_window) to avoid repeated Hann window calculations.
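The body of get_hann_window is not shown in this conversation, so the following is only a minimal sketch of the caching idea. It assumes the window depends solely on its length, and it uses the standard library (functools.lru_cache and math) rather than torch.hann_window so that it runs without dependencies; the signature is hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def get_hann_window(n_fft: int) -> tuple:
    """Return a periodic Hann window of length n_fft, computed once per size.

    Hypothetical stand-in for the PR's helper: the real code presumably
    returns a torch tensor, while this sketch returns a tuple of floats.
    """
    if n_fft <= 0:
        raise ValueError("n_fft must be a positive integer")
    # Periodic Hann window: w[n] = 0.5 - 0.5 * cos(2*pi*n / N)
    return tuple(
        0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)
    )


window_a = get_hann_window(400)
window_b = get_hann_window(400)  # second call is served from the cache
```

Because the result is cached per window length, repeated calls with the same n_fft return the same object instead of recomputing it. A tensor-based version would also need to account for the target device, which this sketch ignores.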

whisper/audio.py Outdated

log_spec = torch.clamp(mel_spec, min=1e-10).log10()

log_spec_normalized = (log_spec + 4.0) / 4.0

@Nisarg236 I did it separately to try to keep the code clearer, but yes, it is possible to combine the two.
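For reference, the two forms under discussion compute the same values. The sketch below is an illustration only: it works on plain Python floats instead of torch tensors, and the function names are hypothetical.

```python
import math


def log_mel_two_steps(mel_spec):
    """Clamp and take log10 first, then normalize in a separate step."""
    log_spec = [math.log10(max(v, 1e-10)) for v in mel_spec]
    return [(v + 4.0) / 4.0 for v in log_spec]


def log_mel_combined(mel_spec):
    """Same computation written as a single expression."""
    return [(math.log10(max(v, 1e-10)) + 4.0) / 4.0 for v in mel_spec]


sample = [0.0, 1e-4, 1.0, 100.0]
two_steps = log_mel_two_steps(sample)
combined = log_mel_combined(sample)
```

Both versions apply the same operations in the same order, so the choice between them is purely about readability.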

- I've updated the load_audio function to give better control over the ffmpeg process, which improves error handling and resource management. Specifically, ffmpeg output streams are now handled explicitly and decoding errors are caught.

- I also fixed a security concern in the load_audio function: direct shell command execution (which can introduce vulnerabilities) is now avoided. The ffmpeg command now uses -hide_banner to avoid displaying sensitive information.

- Additional input validation checks were incorporated to verify function arguments.
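The changes described in the three points above could look roughly like the sketch below. It is not the PR's actual diff: the helper names build_ffmpeg_cmd and load_audio_bytes and the specific validation rules are assumptions made for illustration. The command is passed to subprocess.run as an argument list, so no shell is involved.

```python
import subprocess


def build_ffmpeg_cmd(path: str, sr: int = 16000) -> list:
    """Validate the arguments and build the ffmpeg argument list.

    Hypothetical helper; the checks shown here are illustrative assumptions.
    """
    if not isinstance(path, str) or not path:
        raise ValueError("path must be a non-empty string")
    if isinstance(sr, bool) or not isinstance(sr, int) or sr <= 0:
        raise ValueError("sr must be a positive integer")
    return [
        "ffmpeg", "-hide_banner", "-nostdin",
        "-threads", "0",
        "-i", path,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le",
        "-ar", str(sr),
        "-",  # write raw PCM to stdout
    ]


def load_audio_bytes(path: str, sr: int = 16000) -> bytes:
    """Decode an audio file to raw 16-bit mono PCM bytes via ffmpeg."""
    cmd = build_ffmpeg_cmd(path, sr)
    try:
        # A list of arguments (no shell=True) avoids shell interpretation
        # of the file name; stdout and stderr are captured explicitly.
        result = subprocess.run(cmd, capture_output=True, check=True)
    except FileNotFoundError as exc:
        raise RuntimeError("ffmpeg executable not found on PATH") from exc
    except subprocess.CalledProcessError as exc:
        detail = exc.stderr.decode(errors="replace")
        raise RuntimeError(f"Failed to load audio: {detail}") from exc
    return result.stdout
```

Splitting command construction from execution keeps the validation testable without ffmpeg installed. Converting the returned bytes to a normalized float array is left out here.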