Why is there a drastic difference in search time between Lucene.NET in C# and Lucene in Java while the other statistics are roughly comparable? #333
Comments
This is interesting, but such a difference doesn't make sense. 10 seconds vs. 52 minutes (~3,120 seconds) is not reasonable in any way. I think you should re-test your program.
This certainly sounds very suspicious. We have an index with over a million documents, each with thousands of fields. If I do a free-text search (meaning it accesses all fields) starting and ending with a wildcard (eg.
Without seeing your test, there is not a lot I can tell you. You didn't even mention which versions of Lucene/Lucene.NET you are testing. Of course, the only apples-to-apples way to test this would be to run either:
Since Lucene 4.8.0 was designed to run on Java 6, you would also need a copy of a Java 6 runtime to run it on. And since Java 6 is no longer available for download from any official source, I strongly suspect you are not testing either of these exact versions on the version of Java it was designed to run on. Do note that we have recently set up benchmarks across each of the betas, and we have approximately doubled search performance since 4.8.0-beta00007, so if you are testing on an older beta you will definitely see worse performance. Of course, it is possible you have stumbled upon a severe bottleneck in a specific Analyzer, Tokenizer, Codec, Query, or other component, but again, without seeing the code there isn't much we can do. Could you post this POC somewhere in a form that can be run in both Java and .NET without too much extra configuration, along with some setup instructions to get it up and running?
Hi, the Lucene Java version is 8.6.0. Unfortunately, I am using multiple libraries for text extraction, which requires some additional configuration. What I can do is post the index creation and search code snippets that I used for Lucene.NET. Would that be enough? I won't be posting the Java version, as it's working fine and I do not need help with it, but do let me know if you need that too.
Considering that your request concerns a "difference" between the two, posting both is still relevant if anyone is to help you spot any notable difference between the two implementations. Other things that might prove interesting are:
That being said, 4.8 currently has our full focus, so you may ultimately be better off asking this on forums such as StackOverflow. But posting your code and any other relevant information you can think of, here and/or in a StackOverflow question, could let someone point out something that is not done in an optimal way. As I said above, we have over 4 times the documents and a huge number of fields, and I can't get anywhere near what you describe, even with the most evil queries I can imagine.
Since this is not really a fair comparison (4.x+ is a completely different design from 3.x), and as others pointed out there is probably something misconfigured or misused to produce results like the ones you are seeing, I am considering this matter closed. But as @jeme pointed out, you might be more successful getting help with the issue if you post some code. However, since this seems to be more of a usability issue than an actual bug, it doesn't belong here; please ask on StackOverflow or on the user mailing list.
I have implemented a Lucene POC in Java and .NET. The stats are roughly comparable except for the search time (the time required to get the matching docs). The Java application takes roughly 9-10 seconds, whereas the .NET one took 52 minutes. I have indexed 99,000 documents, comprising PDF, DOC, TXT, and other files. Indexing for both POCs was performed on the same files. Is this disparity in search time expected because the Java version is superior, or is there some error in my Lucene.NET code?
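When comparing two POCs like this, it may help to time only the search call itself and repeat it several times, so that one-off costs (JVM warm-up, first reads of the index files) do not dominate a single number. Below is a minimal, generic sketch in Java; it is not a Lucene API, and `runSearch` is a hypothetical stand-in for whatever query the POC actually executes.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

public class SearchTiming {

    // Runs `search` `warmups` times untimed, then `runs` times timed,
    // and returns the median wall-clock duration in milliseconds.
    static double medianMillis(Runnable search, int warmups, int runs) {
        if (runs <= 0) throw new IllegalArgumentException("runs must be positive");
        // Untimed warm-up runs so first-call costs do not skew the samples.
        for (int i = 0; i < warmups; i++) {
            search.run();
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            search.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        // Median of the sorted samples (mean of the two middle values when even).
        double median = (runs % 2 == 1)
                ? nanos[runs / 2]
                : (nanos[runs / 2 - 1] + nanos[runs / 2]) / 2.0;
        return median / TimeUnit.MILLISECONDS.toNanos(1);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real search; replace with the POC's query call.
        Runnable runSearch = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            if (sum < 0) System.out.println(sum); // keeps the loop from being optimized away
        };
        double ms = medianMillis(runSearch, 3, 5);
        System.out.printf("median search time: %.3f ms%n", ms);
    }
}
```

Timing the .NET side the same way (for example with `System.Diagnostics.Stopwatch` around only the search call, after a few warm-up runs) keeps the two figures comparable; this only measures the gap and does not by itself explain it.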