
Optimization Suggestion for 'torchaudio.functional.merge_tokens' Function #3614

Closed · eyalcohen308 opened this issue Sep 21, 2023 · 4 comments · Fixed by #3615

Comments

eyalcohen308 (Contributor) commented Sep 21, 2023

🚀 The feature

Optimize the merge_tokens function in torchaudio.functional to leverage PyTorch's tensor operations, improving efficiency and reducing runtime.

Proposed Implementation:

from typing import List

import torch
from torchaudio.functional import TokenSpan


def merge_tokens_optimized(tokens: torch.Tensor, scores: torch.Tensor, blank: int = 0) -> List[TokenSpan]:
    """Removes repeated tokens and blank tokens from the given CTC token sequence.

    Args:
        tokens (torch.Tensor): Alignment tokens (unbatched)
        scores (torch.Tensor): Alignment scores (unbatched)
        blank (int, optional): Blank token. Defaults to 0.

    Returns:
        List[TokenSpan]: list of TokenSpan
    """

    # Compute the difference between consecutive tokens. Prepending and appending a -1
    # (which cannot match any token) ensures the first and last tokens open/close a span.
    diff = torch.diff(
        tokens, prepend=torch.tensor([-1], device=tokens.device), append=torch.tensor([-1], device=tokens.device)
    )
    # Indices where the token value changes; consecutive pairs delimit candidate spans.
    # Spans starting at a blank token are filtered out in the comprehension below.
    change_points = torch.nonzero(diff != 0).squeeze().tolist()

    tokens = tokens.tolist()
    # Create a TokenSpan for each non-blank span, averaging the scores within it.
    spans = [
        TokenSpan(token=token, start=start, end=end, score=scores[start:end].mean().item())
        for start, end in zip(change_points[:-1], change_points[1:])
        if (token := tokens[start]) != blank
    ]
    return spans
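
To illustrate the change-point trick on a toy input (a sketch, not part of the proposed patch): for the sequence [1, 1, 0, 2] with blank = 0, the padded diff is nonzero exactly at span boundaries.

import torch

# Toy sequence: token 1 repeated twice, a blank (0), then token 2.
tokens = torch.tensor([1, 1, 0, 2])
pad = torch.tensor([-1])
diff = torch.diff(tokens, prepend=pad, append=pad)
print(diff)                                # tensor([ 2,  0, -1,  2, -3])
print(torch.nonzero(diff != 0).squeeze())  # tensor([0, 2, 3, 4])
# Consecutive change points give candidate spans (0, 2), (2, 3) and (3, 4);
# (2, 3) starts at the blank token and is dropped, leaving
# TokenSpan(token=1, start=0, end=2) and TokenSpan(token=2, start=3, end=4).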

Motivation, pitch

While working on some CTC token sequence operations, I noticed that the current implementation of the merge_tokens function could benefit from tensor operations, potentially offering significant runtime improvements. Specifically, the current loop-based approach can be streamlined by taking advantage of PyTorch's vectorized primitives. In my tests, the proposed optimization runs approximately 7 times faster over 1000 runs while producing equivalent outputs.

Alternatives

Proposed tensor-based optimization: this approach leverages PyTorch's tensor operations, eliminating explicit Python loops and considerably reducing execution time.

Additional context

To validate the accuracy of the optimized function, the following check compares the outputs of the original and proposed implementations:

from typing import List

import torchaudio.functional as F
from torchaudio.functional import TokenSpan

def token_span_outputs_equal(span_list_1: List[TokenSpan], span_list_2: List[TokenSpan]) -> bool:
    """Compares two lists of TokenSpan objects for equality."""
    
    if len(span_list_1) != len(span_list_2):
        return False

    for span_1, span_2 in zip(span_list_1, span_list_2):
        if (
            span_1.token != span_2.token or 
            span_1.start != span_2.start or 
            span_1.end != span_2.end or 
            abs(span_1.score - span_2.score) > 1e-6  # Allowing a small tolerance for floating-point comparisons
        ):
            return False

    return True
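
Note: aligned_tokens and alignment_scores are assumed to come from an earlier forced-alignment step and are not defined in this snippet. For a self-contained run, synthetic stand-ins of matching length (hypothetical, mirroring the benchmark setup further below) suffice:

import torch

# Hypothetical stand-ins for real alignment outputs; any token/score
# tensors of equal length exercise the equivalence check.
aligned_tokens = torch.randint(256, (200,), dtype=torch.int64)
alignment_scores = torch.randn((200,), dtype=torch.float32)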
output_original = F.merge_tokens(aligned_tokens, alignment_scores)
output_optimized = merge_tokens_optimized(aligned_tokens, alignment_scores)

assert token_span_outputs_equal(output_original, output_optimized), "Outputs are not equivalent!"

Benchmark code:

import timeit

pytorch_time = timeit.timeit(lambda: F.merge_tokens(aligned_tokens, alignment_scores), number=1000)
new_method_time = timeit.timeit(lambda: merge_tokens_optimized(aligned_tokens, alignment_scores), number=1000)

print(f"PyTorch method: {pytorch_time}")
print(f"New method: {new_method_time}")

Output:

PyTorch method: 7.622203521430492
New method: 1.0753349959850311

Integrating this optimization could bring about significant improvements in both performance and maintainability, benefiting a broad spectrum of users.

mthrok (Collaborator) commented Sep 21, 2023

Hi @eyalcohen308

Thanks for the suggestion. A 7x improvement sounds great. Would you like to make a PR? If not, I will try to find some time to adopt this later.

There are unit tests for this, which you can run with: cd test && pytest torchaudio_unittest/functional/functional_cpu_test.py -k test_merge_repeated_tokens

@parameterized.expand(
    [
        ([], [], []),
        ([F.TokenSpan(1, 0, 1, 1.0)], [1], [1.0]),
        ([F.TokenSpan(1, 0, 2, 0.5)], [1, 1], [0.4, 0.6]),
        ([F.TokenSpan(1, 0, 3, 0.6)], [1, 1, 1], [0.5, 0.6, 0.7]),
        ([F.TokenSpan(1, 0, 1, 0.8), F.TokenSpan(2, 1, 2, 0.9)], [1, 2], [0.8, 0.9]),
        ([F.TokenSpan(1, 0, 1, 1.0), F.TokenSpan(2, 1, 3, 0.5)], [1, 2, 2], [1.0, 0.4, 0.6]),
        ([F.TokenSpan(1, 0, 1, 0.8), F.TokenSpan(1, 2, 3, 1.0)], [1, 0, 1], [0.8, 0.9, 1.0]),
        ([F.TokenSpan(1, 0, 1, 0.8), F.TokenSpan(2, 2, 3, 1.0)], [1, 0, 2], [0.8, 0.9, 1.0]),
        ([F.TokenSpan(1, 0, 1, 1.0), F.TokenSpan(1, 2, 4, 0.5)], [1, 0, 1, 1], [1.0, 0.1, 0.4, 0.6]),
        ([F.TokenSpan(1, 0, 1, 1.0), F.TokenSpan(2, 2, 4, 0.5)], [1, 0, 2, 2], [1.0, 0.1, 0.4, 0.6]),
        ([F.TokenSpan(1, 0, 1, 1.0), F.TokenSpan(1, 3, 4, 0.4)], [1, 0, 0, 1], [1.0, 0.9, 0.7, 0.4]),
        ([F.TokenSpan(1, 0, 1, 1.0), F.TokenSpan(2, 3, 4, 0.4)], [1, 0, 0, 2], [1.0, 0.9, 0.7, 0.4]),
        ([F.TokenSpan(1, 0, 1, 1.0), F.TokenSpan(1, 3, 5, 0.5)], [1, 0, 0, 1, 1], [1.0, 0.9, 0.8, 0.6, 0.4]),
        ([F.TokenSpan(1, 0, 1, 1.0), F.TokenSpan(2, 3, 5, 0.5)], [1, 0, 0, 2, 2], [1.0, 0.9, 0.8, 0.6, 0.4]),
        ([F.TokenSpan(1, 0, 2, 0.9), F.TokenSpan(2, 2, 3, 0.5)], [1, 1, 2], [1.0, 0.8, 0.5]),
        ([F.TokenSpan(1, 0, 2, 0.9), F.TokenSpan(1, 3, 4, 0.7)], [1, 1, 0, 1], [1.0, 0.8, 0.1, 0.7]),
        ([F.TokenSpan(1, 0, 2, 0.9), F.TokenSpan(2, 3, 4, 0.7)], [1, 1, 0, 2], [1.0, 0.8, 0.1, 0.7]),
        ([F.TokenSpan(1, 0, 2, 0.9), F.TokenSpan(1, 3, 5, 0.4)], [1, 1, 0, 1, 1], [1.0, 0.8, 0.1, 0.5, 0.3]),
        ([F.TokenSpan(1, 0, 2, 0.9), F.TokenSpan(2, 3, 5, 0.4)], [1, 1, 0, 2, 2], [1.0, 0.8, 0.1, 0.5, 0.3]),
        ([F.TokenSpan(1, 0, 2, 0.9), F.TokenSpan(1, 4, 5, 0.3)], [1, 1, 0, 0, 1], [1.0, 0.8, 0.1, 0.5, 0.3]),
        ([F.TokenSpan(1, 0, 2, 0.9), F.TokenSpan(2, 4, 5, 0.3)], [1, 1, 0, 0, 2], [1.0, 0.8, 0.1, 0.5, 0.3]),
        (
            [F.TokenSpan(1, 0, 2, 0.9), F.TokenSpan(1, 4, 6, 0.2)],
            [1, 1, 0, 0, 1, 1],
            [1.0, 0.8, 0.6, 0.5, 0.3, 0.1],
        ),
        (
            [F.TokenSpan(1, 0, 2, 0.9), F.TokenSpan(2, 4, 6, 0.2)],
            [1, 1, 0, 0, 2, 2],
            [1.0, 0.8, 0.6, 0.5, 0.3, 0.1],
        ),
    ]
)
def test_merge_repeated_tokens(self, expected, tokens, scores):
    scores_ = torch.tensor(scores, dtype=torch.float32, device=self.device)
    tokens_ = torch.tensor(tokens, dtype=torch.int64, device=self.device)
    spans = F.merge_tokens(tokens_, scores_, blank=0)
    print(tokens_, scores_)
    self._assert_tokens(spans, expected)
    # Append blanks at the beginning and at the end.
    for num_prefix, num_suffix in itertools.product([0, 1, 2], repeat=2):
        tokens_ = ([0] * num_prefix) + tokens + ([0] * num_suffix)
        scores_ = ([0.1] * num_prefix) + scores + ([0.1] * num_suffix)
        tokens_ = torch.tensor(tokens_, dtype=torch.int64, device=self.device)
        scores_ = torch.tensor(scores_, dtype=torch.float32, device=self.device)
        expected_ = [F.TokenSpan(s.token, s.start + num_prefix, s.end + num_prefix, s.score) for s in expected]
        print(tokens_, scores_)
        spans = F.merge_tokens(tokens_, scores_, blank=0)
        self._assert_tokens(spans, expected_)

eyalcohen308 (Contributor, Author)
@mthrok I will open a PR, thanks

mthrok pushed a commit that referenced this issue Sep 21, 2023
Optimizes merge_tokens method as discussed in #3614 

Co-authored-by: Eyal Cohen <[email protected]>
mthrok (Collaborator) commented Sep 21, 2023

@eyalcohen308 Thanks, it's been merged. Note that we have release 2.1 scheduled in a couple of weeks, but the library code for the release branch is finalized, so this commit will not be part of 2.1; it will be part of 2.2 unless it's cherry-picked to 2.1.

mthrok (Collaborator) commented Sep 21, 2023

FYI, I confirmed that the optimization works on both CPU and CUDA:

Device   CPU    CUDA
old      4.31   16.61
new      1.89   4.49

(times in seconds)

Code:
import timeit

import torch
import torchaudio.functional as F


def test(device):
    d = torch.device(device)
    tokens = torch.randint(256, (200, ), dtype=torch.int32, device=d)
    scores = torch.randn((200, ), dtype=torch.float32, device=d)
    elapsed = timeit.timeit(lambda: F.merge_tokens(tokens, scores), number=1000)
    print(f"{device}: {elapsed}")


test("cpu")
test("cuda")
