
Memory leak in HF tokenizer when using truncation and optWithOverflowingTokens(false) #3316

Closed
lesters opened this issue Jul 10, 2024 · 1 comment · Fixed by #3317
Labels: bug

lesters commented Jul 10, 2024

We have observed a native memory leak when using ai.djl.huggingface.tokenizers.HuggingFaceTokenizer. With the default (false) setting of optWithOverflowingTokens, memory usage grows significantly over time when long strings are truncated to shorter token sequences; the effect is most pronounced when very long strings are truncated to very short sequences. When we set optWithOverflowingTokens to true, the memory growth does not occur.

Bisecting across releases, this behaviour first appears in version 0.27.0, and tracing that to the release notes, this PR looks like the likely culprit: #2957.

Specifically, these lines: https://github.com/deepjavalibrary/djl/pull/2957/files#diff-62d10f278a5a7644ce30deff638cf6ead21457bca60b9cc7430d115dd2fa2b38R533-R537

It seems that calling TokenizersLibrary.LIB.getOverflowing(encoding) creates clones of the native encoding, and those clones are only cleaned up when withOverflowingTokens is true: in that case toEncoding is called recursively on each overflowing handle, which eventually calls TokenizersLibrary.LIB.deleteEncoding(encoding); on the copy.

So when withOverflowingTokens is false, getOverflowing is still called, but this cleanup never happens and every truncated encode leaks its overflow clones. A sketch of a possible fix follows.
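
A minimal sketch of one possible fix, assuming the code shape in the diff linked above (the surrounding toEncoding body is elided, and the assumption that getOverflowing returns an array of native handles is based on the recursive toEncoding call described there): only ask the native side for the overflowing handles when they are actually wanted.

    // Hypothetical sketch only -- not the actual DJL patch.
    Encoding[] overflowing = null;
    if (withOverflowingTokens) {
        long[] overflowingHandles = TokenizersLibrary.LIB.getOverflowing(encoding);
        overflowing = new Encoding[overflowingHandles.length];
        for (int i = 0; i < overflowingHandles.length; i++) {
            // each recursive call frees its handle via deleteEncoding, as described above
            overflowing[i] = toEncoding(overflowingHandles[i], true);
        }
    }
    // With this guard the clones are never created when withOverflowingTokens
    // is false, so there is nothing left to leak.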

lesters commented Jul 10, 2024

Code to reproduce the problem:

import java.nio.file.Path;

// A long input that will be truncated down to max length 5, producing many
// overflowing encodings on the native side.
var input = "this will become a long string".repeat(256);

var tokenizer = ai.djl.huggingface.tokenizers.HuggingFaceTokenizer.builder()
        .optTokenizerPath(Path.of("src/test/models/huggingface/bert-base-uncased.json"))
        .optMaxLength(5)
        .optTruncation(true)
        .optWithOverflowingTokens(false)
        .build();

// Native (off-heap) memory grows on every call, even though the returned
// Encoding objects are unreachable and the JVM heap stays flat.
while (true) {
    tokenizer.encode(input);
}

Memory usage will increase very rapidly here.
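
Since the leak is in native memory allocated by the tokenizers library, JVM heap tools will not show it; on Linux you can watch the process's resident set size instead. A small illustrative helper (our addition, not part of the original report; the name rssKb is hypothetical):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Linux-only: read the VmRSS line from /proc/self/status, in kB.
    static long rssKb() throws IOException {
        for (String line : Files.readAllLines(Path.of("/proc/self/status"))) {
            if (line.startsWith("VmRSS:")) {
                return Long.parseLong(line.replaceAll("\\D+", ""));
            }
        }
        return -1;
    }

Printing rssKb() every few thousand iterations of the loop above makes the growth obvious while the JVM heap stays flat.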
