.Net 6 and 8 slower than .Net 472 #929

Open
1 task done
ttustonic opened this issue Mar 12, 2024 · 10 comments

@ttustonic

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

I have a fairly large index with a 5-gram field over long strings (tens to hundreds of characters).
The program searches this index and looks for matches using NGramPhraseQuery. I usually have to look for matches for a few hundred or a few thousand input queries.
I tested with .NET Framework 4.7.2, .NET 6, and .NET 8.
Results for .NET 6 and 8 are 2-4 times slower than .NET Framework 4.7.2 for the same input.
Have you seen this behaviour before, or do you have an idea what to look for?

Thanks, Tom

Expected Behavior

No response

Steps To Reproduce

No response

Exceptions (if any)

No response

Lucene.NET Version

4.8.0-beta00016

.NET Version

4.7.2 and 8.0.100

Operating System

Windows 10

Anything else?

No response

@jeme
Contributor

jeme commented Mar 12, 2024

I think it would be very helpful if you set up a code sample or, even better, something running under BenchmarkDotNet: https://benchmarkdotnet.org/articles/guides/getting-started.html.

At the very least, share the relevant pieces of your code. As it stands, the team has no idea how to begin reproducing this, so more information is likely needed.

@eladmarg
Contributor

This is very strange
Can you reproduce this in a test?

@ttustonic
Author

Thanks!
OK, I'll try to create a benchmark for this. It won't compare different methods, just the same code under different frameworks, but hopefully that will be OK. There might be some legal issues with sharing code and data, so I'll see about creating a test program. Also, the index is quite large (a few hundred megs).
Yes, I can reproduce this in my tests, running the same searches with .NET Framework, .NET 6, and .NET 8.
I recompiled J2N and Lucene to explicitly target .NET 8 and checked whether all the feature flags are used, but the result is still the same.

@jeme
Contributor

jeme commented Mar 13, 2024

Without knowing the details, perhaps https://github.com/bchavez/Bogus can be of some assistance to create test data.

@ttustonic
Author

Here are the benchmark results. I have also added a benchmark class. I had to rename the fields etc., but it's basically a simple search of a 5-gram field.
There are two methods: SearchWithFullMatchResult and SearchWithEmptyMatchResult.
Both search for about 36000 inputs and return results.
The full method looks up additional data for the hits found; the empty method just returns a list of dummy match results.

.NET 6 full is slightly better than 4.7.2, but the empty test is much slower. I also observed that the .NET 6 and 8 results are all over the place: sometimes the same method repeated 3 times in a row gives a 2x difference in execution time. This can also be seen in the results, where StdDev is quite large for .NET 6 and 8 (especially for the empty test).

This led us to believe that there might be a problem in the runtime or the garbage collector, so I ran the same benchmark using Server GC, and now the results are great with .NET 6 and even better with .NET 8.

So our immediate problem seems to be solved, but perhaps that's something for you to look into.
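
The thread does not show how Server GC was switched on for these runs, so the following is only a sketch of the standard runtime settings, not the configuration actually used. For an SDK-style project on .NET 6 or 8, the usual switch is the `ServerGarbageCollector` MSBuild property; for .NET Framework it is the `gcServer` element in `app.config`. Whether a BenchmarkDotNet child process picks these up depends on how the benchmark job is configured, which is an assumption to verify, for example by logging `System.Runtime.GCSettings.IsServerGC`.

```xml
<!-- .NET 6 / .NET 8: in the application's .csproj -->
<PropertyGroup>
  <ServerGarbageCollector>true</ServerGarbageCollector>
</PropertyGroup>
```

```xml
<!-- .NET Framework: in app.config -->
<configuration>
  <runtime>
    <gcServer enabled="true" />
  </runtime>
</configuration>
```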

As for the test index, I can't share it publicly (besides, it's quite large: the test case is 900MB), but perhaps there's a way to share it privately.

Thanks, Tom

Workstation GC

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4046/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
  [Host]     : .NET Framework 4.8.1 (4.8.9181.0), X64 RyuJIT VectorSize=256
  DefaultJob : .NET Framework 4.8.1 (4.8.9181.0), X64 RyuJIT VectorSize=256

| Method | Mean | Error | StdDev |
|---|---:|---:|---:|
| SearchWithFullMatchResult | 47.05 s | 0.235 s | 0.220 s |
| SearchWithEmptyMatchResult | 10.52 s | 0.061 s | 0.054 s |

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4046/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200
  [Host]     : .NET 6.0.27 (6.0.2724.6912), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.27 (6.0.2724.6912), X64 RyuJIT AVX2

| Method | Mean | Error | StdDev |
|---|---:|---:|---:|
| SearchWithFullMatchResult | 45.26 s | 0.664 s | 0.588 s |
| SearchWithEmptyMatchResult | 34.49 s | 0.686 s | 1.108 s |

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4046/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200
  [Host]     : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2

| Method | Mean | Error | StdDev |
|---|---:|---:|---:|
| SearchWithFullMatchResult | 54.08 s | 0.776 s | 0.648 s |
| SearchWithEmptyMatchResult | 45.05 s | 0.739 s | 1.011 s |

Server GC

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4170/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
  [Host]     : .NET Framework 4.8.1 (4.8.9181.0), X64 RyuJIT VectorSize=256
  DefaultJob : .NET Framework 4.8.1 (4.8.9181.0), X64 RyuJIT VectorSize=256

| Method | Mean | Error | StdDev |
|---|---:|---:|---:|
| SearchWithFullMatchResult | 47.72 s | 0.159 s | 0.133 s |
| SearchWithEmptyMatchResult | 11.19 s | 0.159 s | 0.141 s |

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4170/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200
  [Host]     : .NET 6.0.28 (6.0.2824.12007), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.28 (6.0.2824.12007), X64 RyuJIT AVX2


| Method | Mean | Error | StdDev |
|---|---:|---:|---:|
| SearchWithFullMatchResult | 24.183 s | 0.3353 s | 0.3137 s |
| SearchWithEmptyMatchResult | 8.680 s | 0.0447 s | 0.0418 s |

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4170/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200
  [Host]     : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2

| Method | Mean | Error | StdDev |
|---|---:|---:|---:|
| SearchWithFullMatchResult | 17.113 s | 0.1021 s | 0.0955 s |
| SearchWithEmptyMatchResult | 6.735 s | 0.0773 s | 0.0723 s |
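
To put the tables above in relative terms, here is a small Python sketch that divides each SearchWithEmptyMatchResult mean by the .NET Framework mean from the same GC mode. The numbers are copied from the tables; the dictionary keys and the helper name are made up for this illustration.

```python
# Mean times in seconds for SearchWithEmptyMatchResult, copied from the tables above.
# "netfx" is the .NET Framework run; the key names are illustrative only.
workstation = {"netfx": 10.52, "net6": 34.49, "net8": 45.05}
server = {"netfx": 11.19, "net6": 8.680, "net8": 6.735}


def relative_to_framework(means):
    """Return each runtime's mean divided by the .NET Framework mean."""
    base = means["netfx"]
    return {name: value / base for name, value in means.items() if name != "netfx"}


if __name__ == "__main__":
    for label, means in (("Workstation GC", workstation), ("Server GC", server)):
        for runtime, ratio in relative_to_framework(means).items():
            print(f"{label}: {runtime} takes {ratio:.2f}x the .NET Framework time")
```

Under Workstation GC the empty test takes roughly 3.3x (.NET 6) and 4.3x (.NET 8) the .NET Framework time, which matches the 2-4x slowdown reported at the top of the issue; under Server GC both ratios drop below 1.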

@NightOwl888
Contributor

Great, this gets us a starting point. But we will need more than the target framework to narrow this down and have a useful place to start (after all, we are maintaining over 600,000 lines of code).

One thing that would be helpful to know is whether you are benchmarking using the released components, and if not, which tag(s) or branch(es) you are compiling from source with. We have gone several hundred commits beyond the last release, so this may be relevant.

But now that you have created some benchmarks, would you consider contributing them back to us by forking the branch for #310? It is difficult for us to tell which components in the codebase are extremely common and which are extremely rare. The best we came up with was to benchmark the code in the demos, but I suspect this is still a far cry from a common real world combination of components that we should be using to determine whether a change we make is for the better or worse in terms of performance.

We still haven't seen any code, and that is the crucial bit. You don't technically need to provide an index, just the code to generate one and the code you are using to search that exhibits the behavior you are seeing. The data to generate the index can be completely based on a random seed so it generates the same data every time it is run. That way you don't have to share any real production data, and you don't have to provide a large index to us, just the code to generate the same index every time it runs.

While @jeme suggested a 3rd party package to generate test data with, do note that RandomizedTesting.Generators has several methods for generating simple or realistic strings. It is also important to base the implementation off of J2N.Randomizer instead of System.Random because .NET doesn't provide an implementation that guarantees the same random values on every target framework and operating system.
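
As a language-neutral illustration of the seed-based idea, here is a minimal Python sketch that generates the same protein-like strings on every run and splits them into 5-grams. It is not Lucene.NET code: in .NET the advice above is to use J2N.Randomizer and the RandomizedTesting.Generators helpers, and the function names, length bounds, and alphabet below are assumptions made for this example.

```python
import random

# One-letter codes of the 20 standard amino acids (assumed alphabet for the test data).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_sequences(seed, count, min_len=50, max_len=300):
    """Generate `count` pseudo-random protein-like strings from a fixed seed."""
    rng = random.Random(seed)  # private generator, independent of global random state
    return [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(rng.randint(min_len, max_len)))
        for _ in range(count)
    ]


def ngrams(text, n=5):
    """Return the overlapping character n-grams of `text` (empty if too short)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    sequences = make_sequences(seed=42, count=3)
    for sequence in sequences:
        print(len(sequence), ngrams(sequence)[:3])
```

Because the generator is seeded, two runs with the same seed on the same Python version produce identical data, so only the generator code needs to be shared, not the index itself.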

@ttustonic
Author

Oops, it seems I forgot to attach the benchmark code and full results. I have replaced the field names and removed the input, but you can see what I'm measuring.
The benchmark uses the Lucene.Net NuGet package 4.8.0-beta00016.
As for the data, I'll be able to provide the index that I used for testing and make a real test program. The data I'm indexing are protein sequences, indexed by 5-grams. This test index is about 500 megs zipped.
The indexing code is a bit complicated because I have providers for many different protein sources, so I'd have to extract just the code for this test data source. So if you can accept a 500 meg index, that would be easier for me; if not, I'll make a small indexing program.
The source for the test data is
https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
Thanks, Tom

BenchmarkLucene.zip
benchmarkRes.zip

@paulirwin
Contributor

We need to evaluate if this is still true with our beta 17 release.

@paulirwin
Contributor

I just re-read the results posted above, and it looks like Server GC fixes the issues. Perhaps we should have something in our docs about that. I still would be interested in seeing how beta17 fares. @ttustonic If you wouldn't mind keeping your eyes peeled for the beta 17 release in a few days, and running it again once that is released, that would be appreciated. Thanks!

@ttustonic
Author

Here are the results. I repeated the tests with version 00016 and ran a new benchmark with 00017.

Version 00016 (repeat)
Workstation GC


BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.5011/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
  [Host]               : .NET Framework 4.8.1 (4.8.9277.0), X64 RyuJIT VectorSize=256
  .NET 6.0             : .NET 6.0.35 (6.0.3524.45918), X64 RyuJIT AVX2
  .NET 8.0             : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2
  .NET Framework 4.7.2 : .NET Framework 4.8.1 (4.8.9277.0), X64 RyuJIT VectorSize=256


| Method | Job | Runtime | Mean | Error | StdDev | Median |
|---|---|---|---:|---:|---:|---:|
| SearchWithFullMatchResult | .NET 6.0 | .NET 6.0 | 47.23 s | 0.839 s | 0.784 s | 47.21 s |
| SearchWithEmptyMatchResult | .NET 6.0 | .NET 6.0 | 35.39 s | 0.705 s | 1.307 s | 35.98 s |
| SearchWithFullMatchResult | .NET 8.0 | .NET 8.0 | 60.89 s | 1.180 s | 1.211 s | 60.79 s |
| SearchWithEmptyMatchResult | .NET 8.0 | .NET 8.0 | 53.63 s | 1.067 s | 2.180 s | 54.05 s |
| SearchWithFullMatchResult | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 46.04 s | 0.172 s | 0.152 s | 46.09 s |
| SearchWithEmptyMatchResult | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 10.58 s | 0.094 s | 0.083 s | 10.60 s |

Server GC


BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.5011/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.400
  [Host]               : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2
  .NET 6.0             : .NET 6.0.35 (6.0.3524.45918), X64 RyuJIT AVX2
  .NET 8.0             : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2
  .NET Framework 4.7.2 : .NET Framework 4.8.1 (4.8.9277.0), X64 RyuJIT VectorSize=256


| Method | Job | Runtime | Mean | Error | StdDev |
|---|---|---|---:|---:|---:|
| SearchWithFullMatchResult | .NET 6.0 | .NET 6.0 | 23.911 s | 0.1193 s | 0.1116 s |
| SearchWithEmptyMatchResult | .NET 6.0 | .NET 6.0 | 8.936 s | 0.0930 s | 0.0870 s |
| SearchWithFullMatchResult | .NET 8.0 | .NET 8.0 | 16.681 s | 0.1385 s | 0.1156 s |
| SearchWithEmptyMatchResult | .NET 8.0 | .NET 8.0 | 7.151 s | 0.1428 s | 0.1336 s |
| SearchWithFullMatchResult | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 45.861 s | 0.3823 s | 0.3576 s |
| SearchWithEmptyMatchResult | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 10.845 s | 0.0958 s | 0.0896 s |

Version 00017
Workstation GC


BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.5011/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.400
  [Host]               : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2
  .NET 6.0             : .NET 6.0.35 (6.0.3524.45918), X64 RyuJIT AVX2
  .NET 8.0             : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2
  .NET Framework 4.7.2 : .NET Framework 4.8.1 (4.8.9277.0), X64 RyuJIT VectorSize=256


| Method | Job | Runtime | Mean | Error | StdDev |
|---|---|---|---:|---:|---:|
| SearchWithFullMatchResult | .NET 6.0 | .NET 6.0 | 48.35 s | 0.620 s | 0.550 s |
| SearchWithEmptyMatchResult | .NET 6.0 | .NET 6.0 | 36.33 s | 0.722 s | 1.373 s |
| SearchWithFullMatchResult | .NET 8.0 | .NET 8.0 | 63.31 s | 1.095 s | 0.971 s |
| SearchWithEmptyMatchResult | .NET 8.0 | .NET 8.0 | 55.18 s | 1.102 s | 2.662 s |
| SearchWithFullMatchResult | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 47.37 s | 0.131 s | 0.116 s |
| SearchWithEmptyMatchResult | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 11.30 s | 0.069 s | 0.064 s |

Server GC


BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.5011/22H2/2022Update)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.400
  [Host]               : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2
  .NET 6.0             : .NET 6.0.35 (6.0.3524.45918), X64 RyuJIT AVX2
  .NET 8.0             : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2
  .NET Framework 4.7.2 : .NET Framework 4.8.1 (4.8.9277.0), X64 RyuJIT VectorSize=256


| Method | Job | Runtime | Mean | Error | StdDev |
|---|---|---|---:|---:|---:|
| SearchWithFullMatchResult | .NET 6.0 | .NET 6.0 | 24.193 s | 0.0857 s | 0.0760 s |
| SearchWithEmptyMatchResult | .NET 6.0 | .NET 6.0 | 9.195 s | 0.1838 s | 0.1719 s |
| SearchWithFullMatchResult | .NET 8.0 | .NET 8.0 | 17.075 s | 0.0988 s | 0.0876 s |
| SearchWithEmptyMatchResult | .NET 8.0 | .NET 8.0 | 7.116 s | 0.1073 s | 0.1004 s |
| SearchWithFullMatchResult | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 45.745 s | 0.3381 s | 0.2997 s |
| SearchWithEmptyMatchResult | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 10.605 s | 0.0473 s | 0.0419 s |

It seems that with the newer .NET 8.0 runtime (8.0.8 vs. 8.0.2), the Workstation GC results with the old version 00016 got worse. Version 00017 also seems to be a bit slower than 00016 under Workstation GC.

benchmark-00016.zip

benchmark-00017.zip
