System Info
Ubuntu 20.04.2 LTS
transformers==4.40.1
Who can help?
@ArthurZucker
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
reason: this list comprehension takes a long time:
transformers/src/transformers/tokenization_utils_fast.py, line 173 at commit 2b9e252
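As a minimal sketch for reproducing the slowdown (the checkpoint name below is a placeholder; any model whose tokenizer carries many entries in `added_tokens_decoder` should show the effect), the initialization can be timed directly:

import time

from transformers import AutoTokenizer

# Placeholder checkpoint: substitute a tokenizer that ships many added tokens.
checkpoint = "model-with-many-added-tokens"

start = time.perf_counter()
tokenizer = AutoTokenizer.from_pretrained(checkpoint, use_fast=True)
print(f"Fast tokenizer initialization took {time.perf_counter() - start:.2f}s")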
Using the VS Code debugger, I found that when the list comprehension omits the comparison, as in the following, it is fast:

tokens_to_add = [
    token
    for index, token in sorted(added_tokens_decoder.items(), key=lambda x: x[0])
]

There is a comment saying:

# The following logic will be replace with a single add_tokens once a fix is pushed to tokenizers
# allows converting a slow -> fast, non-legacy: if the `tokenizer.json` does not have all the added tokens
# uses the information stored in `added_tokens_decoder`.
# this is costly for fast tokenizers as we re-compute the regex again. But not all tokens are added tokens

Any plan to replace this code?
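For illustration only, a sketch of the direction that comment seems to point at: collect everything from `added_tokens_decoder` up front and hand it to `add_tokens` in as few calls as possible, so the backend recomputes its added-tokens regex once per call rather than once per small batch. The `fast_tokenizer` name and the special/non-special split here are assumptions, not the actual planned fix:

# Hypothetical sketch, not the actual planned fix: batch the added tokens
# into single add_tokens calls so the regex is rebuilt as few times as possible.
sorted_tokens = [
    token for _, token in sorted(added_tokens_decoder.items(), key=lambda x: x[0])
]
special = [t for t in sorted_tokens if getattr(t, "special", False)]
non_special = [t for t in sorted_tokens if not getattr(t, "special", False)]
if special:
    fast_tokenizer.add_tokens(special, special_tokens=True)
if non_special:
    fast_tokenizer.add_tokens(non_special)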
Expected behavior
Faster tokenizer initialization