Description
The Python/Rust upstream transformers tokenizer save_pretrained function adds a new key at the model level in the tokenizer.json configuration, model.byte_fallback, which causes an exception when calling the native createTokenizerFromString. This may be related to bundling an older Rust version of the tokenizers library?
Expected Behavior
Able to load the tokenizer from a tokenizer.json file.
Error Message
Caused by: java.lang.RuntimeException:
data did not match any variant of untagged enum PreTokenizerWrapper at line 73 column 3
at ai.djl.huggingface.tokenizers.jni.TokenizersLibrary.createTokenizerFromString(Native Method)
How to Reproduce?
Install a recent version of the transformers library
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('intfloat/multilingual-e5-small')
tokenizer.save_pretrained("saved")
Attempt to load the saved tokenizer.json file with DJL 0.27.0 using HuggingFaceTokenizer.newInstance
Thank you for the swift reply! Yes, using pre-existing tokenizer files works great, but if people make any changes and save the tokenizer file, it breaks.
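Until the native library is updated, one possible workaround is to strip the offending key from the saved file before loading it. This is a minimal sketch, not an official fix: the strip_byte_fallback helper name is made up here, and removing model.byte_fallback assumes the model does not actually rely on byte fallback (it is false for this model).

```python
# Workaround sketch (hypothetical helper): remove the model.byte_fallback key
# that transformers' save_pretrained writes into tokenizer.json, since older
# native tokenizers versions reject unknown fields when deserializing.
import json

def strip_byte_fallback(path):
    """Remove model.byte_fallback from a tokenizer.json file in place."""
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Drop the key only if present; leave everything else untouched.
    config.get("model", {}).pop("byte_fallback", None)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, ensure_ascii=False)
```

After running this on saved/tokenizer.json, HuggingFaceTokenizer.newInstance should be able to parse the file again, assuming byte_fallback is the only newly added field.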