error indexing a text document with languagetool-wikipedia index #364
Comments
That's strange, I copied that text to a plain text file and call |
I can reproduce the issue with the file you sent. I assume the error is in |
Removing some parts of the string stops the bug from appearing but I have no idea why. It seems really random.
Shorter than 265 chars: it works, longer: doesn't work.
But with 260 spaces it no longer gives a bug. Even 260 spaces with random characters at the end doesn't produce it.
… yet 260 hashtags does the trick.
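The length experiments above could be made repeatable with a small generator for probe files. This is only a sketch: the class name `ProbeFiles`, the file names, and the chosen lengths are hypothetical, and it only writes the UTF-8 input files; running the indexer on them is left to whatever command was used for the original file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes plain-text UTF-8 probe files of controlled length
// so the length threshold can be narrowed down by hand.
public class ProbeFiles {

    /** Builds a string consisting of {@code count} copies of {@code c}. */
    static String repeat(char c, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    /** Writes {@code text} as UTF-8 to {@code dir/name} and returns the file path. */
    static Path writeProbe(Path dir, String name, String text) throws IOException {
        Path file = dir.resolve(name);
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("indexer-probes");
        // Lengths around the values mentioned in the thread (255, 260, 265).
        for (int length : new int[] {250, 255, 256, 260, 265}) {
            writeProbe(dir, "hash-" + length + ".txt", repeat('#', length));
            writeProbe(dir, "space-" + length + ".txt", repeat(' ', length));
        }
        System.out.println("Probe files written to " + dir);
    }
}
```

Each generated file can then be indexed separately to see which lengths and characters trigger the exception.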
Should be fixed now.
Hm, I still get it.
You mean with exactly the same file, the one you sent me via email? I cannot reproduce with that.
No, another one, just sent you the sample over the email (it was from the same corpus).
Seems my last fix only moved the problem from texts with more than 255 chars to texts with more than 2*255 chars. However, I don't understand what exactly the problem is and I won't be able to spend more time on fixing it for the time being, sorry.
I guess then the easy workaround is to use the |
I don't remember, you'll need to check the source.
…e bug in the text indexer (gihub issue #364)
Seems to be fixed (roughly).
When indexing this sentence in a plain-text UTF-8 file:
Dwa dni później, 1 sierpnia, ministrowie finansów Wspólnoty Europejskiej nie mogąc już dłużej walczyć z rynkiem podjęli decyzję o rozszerzeniu z 2, 25 (w przypadku hiszpańskiej pesety i portugalskiego eskudo z 6).o 15 proc. granic wahań kursów w ramach ESW.
I get the following error:
Exception in thread "main" java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=0,lastStartOffset=253 for field 'field'
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:641)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:458)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1363)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1142)
    at org.languagetool.dev.index.Indexer.add(Indexer.java:173)
    at org.languagetool.dev.index.Indexer.indexText(Indexer.java:136)
    at org.languagetool.dev.index.Indexer.run(Indexer.java:109)
    at org.languagetool.dev.index.Indexer.main(Indexer.java:73)
    at org.languagetool.dev.wikipedia.Main.main(Main.java:54)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
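To make the three conditions in the exception message concrete, here is a simplified stand-in for that kind of check. It is not Lucene's actual code; the class `OffsetCheck` and its methods are hypothetical, and the token offsets in `main` are made up except for the triple taken from the message (`startOffset=0`, `endOffset=0`, `lastStartOffset=253`).

```java
// Simplified stand-in for the offset sanity check described in the exception message.
// Not Lucene's implementation; names are hypothetical.
public class OffsetCheck {

    /** True if the offsets satisfy the three conditions named in the message. */
    static boolean offsetsAreValid(int startOffset, int endOffset, int lastStartOffset) {
        return startOffset >= 0                 // must be non-negative
            && endOffset >= startOffset         // end must not precede start
            && startOffset >= lastStartOffset;  // offsets must not go backwards
    }

    /** Index of the first token whose offsets break the conditions, or -1 if all pass. */
    static int firstInvalidToken(int[][] offsets) {
        int lastStartOffset = 0;
        for (int i = 0; i < offsets.length; i++) {
            int start = offsets[i][0];
            int end = offsets[i][1];
            if (!offsetsAreValid(start, end, lastStartOffset)) {
                return i;
            }
            lastStartOffset = start;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Values from the exception: the new token starts at 0 after one that started at 253.
        System.out.println(offsetsAreValid(0, 0, 253)); // prints "false"

        // Made-up token stream whose third token restarts at offset 0.
        int[][] tokens = { {0, 3}, {253, 256}, {0, 0} };
        System.out.println(firstInvalidToken(tokens));  // prints "2"
    }
}
```

Read this way, the message says that some token was emitted with its offsets reset to 0 after an earlier token had already started at offset 253, which is at least consistent with the length threshold near 255 characters discussed in the comments.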