I'm using spacy-affixes as part of the spaCy pipeline, as explained in the usage guide. It had been working properly until I tried the following sentence: "Sube el paro". When running nlp("Sube el paro.") I get the following error:
Traceback (most recent call last):
  File "/home/usuario/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-21-751769ff6949>", line 1, in <module>
    nlp("Sube el paro.")
  File "/home/usuario/.local/lib/python3.6/site-packages/spacy/language.py", line 435, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))
  File "/home/usuario/.local/lib/python3.6/site-packages/spacy_affixes/main.py", line 163, in __call__
    self.apply_rules(retokenizer, token, rule)
  File "/home/usuario/.local/lib/python3.6/site-packages/spacy_affixes/main.py", line 140, in apply_rules
    token, [*rule["affix_text"], token_sub], heads
  File "_retokenize.pyx", line 88, in spacy.tokens._retokenize.Retokenizer.split
ValueError: [E117] The newly split tokens must match the text of the original token. New orths: subSube. Old text: Sube.
From my tests, the bug happens with texts like:
nlp("Sube el paro.")
nlp("Sube")
nlp("Subir")
nlp("Subiendo")
But not with texts like:
nlp("sube el paro.")
nlp("sube")
nlp("Subasta")
nlp("Subimos")
Given the error message, the problem seems to be related to matching the prefix "sub" when the token starts with an uppercase letter.
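To make the E117 message easier to read, here is a minimal sketch in plain Python (no spaCy needed) of the check the error describes: the pieces passed to the retokenizer must concatenate back to the original token text. The function names below are hypothetical, and `naive_prefix_split` is only a guess at how a case-insensitive match combined with a case-sensitive substitution could produce the orths "sub" + "Sube". It is not the actual spacy-affixes code, and it does not model whatever extra conditions keep "Subasta" and "Subimos" from failing.

```python
import re


def split_matches_original(orths, original):
    """Invariant described by E117: the pieces must join back to the token text."""
    return "".join(orths) == original


def naive_prefix_split(text, prefix="sub"):
    """Hypothetical reconstruction of the failure, not the library's code.

    The prefix is detected case-insensitively, but removed case-sensitively,
    and the prefix text comes from the rule rather than from the token.
    """
    if re.match("^" + prefix, text, flags=re.IGNORECASE):
        # Case-sensitive substitution: "Sube" does not start with "sub",
        # so nothing is removed and the remainder is still "Sube".
        remainder = re.sub("^" + prefix, "", text)
        return [prefix, remainder]
    return [text]


def case_preserving_prefix_split(text, prefix="sub"):
    """Slice both pieces out of the original text so the casing is kept."""
    match = re.match("^" + prefix, text, flags=re.IGNORECASE)
    if match:
        return [text[:match.end()], text[match.end():]]
    return [text]


for word in ("Sube", "sube"):
    naive = naive_prefix_split(word)
    fixed = case_preserving_prefix_split(word)
    print(word, naive, split_matches_original(naive, word))
    print(word, fixed, split_matches_original(fixed, word))
```

Under this assumption, "Sube" yields the pieces "sub" and "Sube", which join to "subSube" as in the error, while "sube" splits cleanly. Taking both pieces from the original token text keeps the invariant regardless of casing.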
My configuration
Ubuntu 18.04.3 LTS
Python 3.6.9
spacy-affixes 0.1.4
spacy 2.2.3
In our experience, prefix splitting can cause more trouble than it is worth. We're looking at the problematic Freeling rule (^sub) to figure out a solution. In the meantime, you could try using only suffix rules (e.g., clitics) if that fits your scenario. We use something like this in other projects:
Thank you very much @versae for your workaround, it solved the problems mentioned. I'll keep an eye on any solutions you find on the Freeling rule issue.