Tokenizer.save_pretrained fails when add_special_tokens=True|False #28472
Comments
It is indeed related to that PR, but it is also related to the fact that
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Could you please let me know how to work around this issue, or whether it has been addressed in any of the latest releases of the transformers library? It also seems to affect save_checkpoints in the Trainer of the SFTTrainer, causing it to fail.
Hey, a PR was opened, see #31233.
System Info
transformers-4.34
python-3.11
Who can help?
@ArthurZucker
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
The snippet:
- works with add_special_tokens= being present, absent, or True/False on 4.33 and below
- works when add_special_tokens= is not added to the list of tokenizer parameters on 4.34+
- fails when add_special_tokens= is present in parameters (with both True/False values) on 4.34+ with the following error: TypeError: Object of type method is not JSON serializable

The issue happens with any tokenizer, not only the Llama one. I can confirm it fails the same way on bert-base-uncased.
If you go to tokenization_utils_base and dump the tokenizer_config just before the json.dumps call, you can see that add_special_tokens has surprisingly become a method, not a bool.

My feeling is that the issue is related to PR #23909, which refactored a lot of tokenizer internals, so in the current version:
- add_special_tokens is a part of the kwargs passed to the tokenizer
- there is a method SpecialTokensMixin.add_special_tokens having the same name
- in json.dumps, the method is being serialized instead of the kwargs parameter.

Expected behavior
Not crashing with TypeError: Object of type method is not JSON serializable, as it was pre #23909 in 4.33.
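
For illustration, the name collision described in this report can be shown without transformers at all. The sketch below is not the library's code: SpecialTokensMixinSketch, TokenizerSketch, build_config and try_dump are hypothetical names invented here, and it assumes that the saved config is refreshed by looking up each init kwarg as an attribute of the tokenizer. Under that assumption, a kwarg named add_special_tokens resolves to the method of the same name, and json.dumps then rejects it.

```python
import json


class SpecialTokensMixinSketch:
    """Hypothetical stand-in for a mixin that defines add_special_tokens."""

    def add_special_tokens(self, tokens):
        # The body is irrelevant here; only the method's name matters.
        return len(tokens)


class TokenizerSketch(SpecialTokensMixinSketch):
    """Hypothetical tokenizer that remembers the kwargs it was built with."""

    def __init__(self, **kwargs):
        self.init_kwargs = dict(kwargs)

    def build_config(self):
        # Assumed buggy pattern: refresh every saved kwarg from the attribute
        # with the same name. For "add_special_tokens" the attribute lookup
        # finds the bound method, not the bool that was passed in.
        config = {}
        for key, value in self.init_kwargs.items():
            config[key] = getattr(self, key, value)
        return config


def try_dump(config):
    """Return the JSON string, or the TypeError message if dumping fails."""
    try:
        return json.dumps(config)
    except TypeError as exc:
        return f"TypeError: {exc}"


tok = TokenizerSketch(add_special_tokens=True, model_max_length=512)
print(try_dump(tok.init_kwargs))     # the raw kwargs serialize fine
print(try_dump(tok.build_config()))  # the refreshed config does not
```

The second call reports the same TypeError as in the report, because the value stored under add_special_tokens is a bound method rather than the bool passed by the caller.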