System Info
transformers version: 4.31
Who can help?
@ArthurZucker
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
This issue is related to this HuggingFace post on the official forum, hence the similar title. To my knowledge, no answer was given there as to whether this is the normal tokenizer behavior. I ran into the same problem as the original poster while tokenizing a sentence after adding new tokens: the tokens adjacent to the newly added ones are not produced with their preceding escape symbol (Ġ).
Expected behavior
The tokens told, Frodo, to and Lothlorien should each be preceded by a Ġ character, if I am not mistaken; for example, Ġtold instead of told.
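One way to check this expectation mechanically is sketched below. It is a plain-Python helper, not part of transformers: the function name `tokens_missing_marker`, the sample sentence and the token list are all made up for illustration, and the helper assumes ASCII text where each token (minus its Ġ prefix) appears verbatim in the sentence.

```python
SPACE_MARKER = "Ġ"  # printable stand-in that byte-level BPE uses for a leading space


def tokens_missing_marker(text, tokens, marker=SPACE_MARKER):
    """Return the tokens that follow a space in `text` but lack `marker`.

    Hypothetical helper: walks `text` and `tokens` in parallel and assumes
    each token's surface form occurs verbatim in `text` (ASCII input).
    """
    missing = []
    pos = 0
    for tok in tokens:
        # Surface form of the token without its leading-space marker.
        surface = tok[len(marker):] if tok.startswith(marker) else tok

        # Skip the spaces that separate this token from the previous one.
        had_space = False
        while pos < len(text) and text[pos] == " ":
            had_space = True
            pos += 1

        if not text.startswith(surface, pos):
            raise ValueError(f"token {tok!r} does not match text at offset {pos}")

        # A token that came after a space is expected to carry the marker.
        if had_space and not tok.startswith(marker):
            missing.append(tok)
        pos += len(surface)
    return missing


# Made-up token list, only to show the shape of the check.
sentence = "Gandalf told Frodo to go"
observed = ["Gandalf", "told", "ĠFrodo", "to", "Ġgo"]
print(tokens_missing_marker(sentence, observed))
```

In practice the `observed` list would come from `tokenizer.tokenize(sentence)` after the new tokens have been added, so that any token reported by the helper is one that lost its Ġ prefix.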