The Elements widget uses two different tokenizers: `nltk.tokenize.RegexpTokenizer.span_tokenize` in `tagger.py` and `spacy` in `util.py`. As a result, the tokens and the part-of-speech tags can fall out of alignment. The mismatch becomes visible when `strict=True` is added to the `zip()` calls in `tagger.py`. The two token sets should be harmonized.