Optimization Suggestion for 'torchaudio.functional.merge_tokens' Function #3614
Comments
Thanks for the suggestion. A 7x improvement sounds great. Would you like to make a PR? If not, I will try to find some time to adopt this later. There are unit tests for this, which you can run; see audio/test/torchaudio_unittest/functional/functional_impl.py, lines 1232 to 1283 (at 0ff6d26).
@mthrok I will open a PR, thanks.
Optimizes merge_tokens method as discussed in #3614 Co-authored-by: Eyal Cohen <[email protected]>
@eyalcohen308 Thanks, it's been merged.
FYI I confirm that the optimization works for CPU and CUDA (elapsed times in seconds):

```python
import timeit

import torch
import torchaudio.functional as F


def test(device):
    d = torch.device(device)
    tokens = torch.randint(256, (200,), dtype=torch.int32, device=d)
    scores = torch.randn((200,), dtype=torch.float32, device=d)
    elapsed = timeit.timeit(lambda: F.merge_tokens(tokens, scores), number=1000)
    print(f"{device}: {elapsed}")


test("cpu")
test("cuda")
```
🚀 The feature
Optimize the torchaudio.functional.merge_tokens function to leverage PyTorch's tensor operations for improved efficiency and reduced runtime.
Proposed Implementation:
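(The original snippet is not reproduced here.) Below is a minimal sketch of the idea, pairing a loop-based reference with a vectorized version that finds run boundaries via torch.diff and torch.nonzero. The TokenSpan stand-in and the names merge_tokens_loop / merge_tokens_vectorized are illustrative assumptions, not necessarily the code that was proposed or merged.

```python
# Hedged sketch: a loop-based reference and a possible tensor-based rewrite
# of merge_tokens (collapse repeated CTC tokens, drop blanks, average scores).
# Names and details are illustrative, not the merged torchaudio code.
from dataclasses import dataclass
from typing import List

import torch
from torch import Tensor


@dataclass
class TokenSpan:
    """Stand-in for torchaudio.functional.TokenSpan: one merged token run."""
    token: int
    start: int
    end: int
    score: float


def merge_tokens_loop(tokens: Tensor, scores: Tensor, blank: int = 0) -> List[TokenSpan]:
    """Loop-based reference: walk the sequence and flush each run of repeats."""
    spans = []
    start = 0
    prev = None
    for i, t in enumerate(tokens.tolist() + [None]):  # sentinel flushes the last run
        if t != prev:
            if prev is not None and prev != blank:
                spans.append(TokenSpan(prev, start, i, scores[start:i].mean().item()))
            start = i
            prev = t
    return spans


def merge_tokens_vectorized(tokens: Tensor, scores: Tensor, blank: int = 0) -> List[TokenSpan]:
    """Tensor-based version: locate run boundaries with torch.diff instead of a Python loop."""
    boundary = torch.tensor([-1], dtype=tokens.dtype, device=tokens.device)
    # Non-zero entries of diff mark positions where a run of identical tokens starts,
    # plus a final boundary at the end of the sequence.
    diff = torch.diff(tokens, prepend=boundary, append=boundary)
    changes = torch.nonzero(diff != 0).squeeze(-1).tolist()
    token_list = tokens.tolist()
    return [
        TokenSpan(token_list[s], s, e, scores[s:e].mean().item())
        for s, e in zip(changes[:-1], changes[1:])
        if token_list[s] != blank
    ]
```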
Motivation, pitch
While working on some CTC token sequence operations, I noticed that the current implementation of the merge_tokens function could benefit from tensor operations, potentially offering significant runtime improvements. Specifically, the current loop-based approach can be replaced with vectorized PyTorch operations. My tests indicate that the proposed optimization runs approximately 7 times faster over 1000 runs while producing equivalent outputs.
Alternatives
Proposed tensor-based optimization: this approach leverages PyTorch's tensor operations, eliminating explicit Python loops and considerably reducing execution time.
Additional context
To validate the accuracy of the optimized function, a verification routine was written to compare the outputs of the original and proposed implementations:
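(The original verification snippet is not reproduced here.) The following is a hedged sketch of what such a check could look like: it compares two merge_tokens implementations span by span on random CTC-style inputs. The helper name check_equivalent is an assumption, and the example usage refers to the illustrative functions from the sketch above.

```python
# Hedged sketch of an equivalence check between two merge_tokens implementations.
# fn_a and fn_b are expected to return lists of spans with token/start/end/score
# attributes (e.g. merge_tokens_loop / merge_tokens_vectorized from the sketch above).
import torch


def check_equivalent(fn_a, fn_b, num_trials: int = 100, length: int = 200, vocab: int = 32) -> None:
    for _ in range(num_trials):
        # A small vocabulary makes repeated tokens and blanks (token 0) likely.
        tokens = torch.randint(vocab, (length,), dtype=torch.int32)
        scores = torch.randn(length)
        spans_a, spans_b = fn_a(tokens, scores), fn_b(tokens, scores)
        assert len(spans_a) == len(spans_b)
        for a, b in zip(spans_a, spans_b):
            assert (a.token, a.start, a.end) == (b.token, b.start, b.end)
            assert abs(a.score - b.score) < 1e-6
    print(f"Outputs matched on {num_trials} random inputs.")


# Example usage (assuming the functions from the sketch above are defined):
# check_equivalent(merge_tokens_loop, merge_tokens_vectorized)
```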
Benchmark code:
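(The issue's benchmark snippet is likewise not reproduced here.) A sketch along the lines of the timeit-based code in the comment above (1000 calls on a 200-token sequence) might look like this, again assuming the illustrative implementations from the earlier sketch.

```python
# Hedged sketch of a benchmark comparing a loop-based and a tensor-based
# merge_tokens over 1000 calls. Function names refer to the sketch above.
import timeit

import torch


def benchmark(fn, label: str, number: int = 1000) -> None:
    tokens = torch.randint(256, (200,), dtype=torch.int32)
    scores = torch.randn(200)
    elapsed = timeit.timeit(lambda: fn(tokens, scores), number=number)
    print(f"{label}: {elapsed:.3f} s for {number} calls")


# Example usage (assuming the functions from the sketch above are defined):
# benchmark(merge_tokens_loop, "loop-based")
# benchmark(merge_tokens_vectorized, "vectorized")
```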
Output:
Integrating this optimization would significantly improve both performance and maintainability, benefiting a wide range of users.