Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of Lemire's nearly divisionless method #79790

Merged
merged 32 commits into from
Feb 18, 2023

Conversation

mla-alm
Copy link
Contributor

@mla-alm mla-alm commented Dec 17, 2022

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Dec 17, 2022
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@dnfadmin
Copy link

dnfadmin commented Dec 17, 2022

CLA assistant check
All CLA requirements met.

@ghost
Copy link

ghost commented Dec 17, 2022

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

#75395

Author: mla-alm
Assignees: -
Labels:

area-System.Runtime, community-contribution

Milestone: -

@danmoseley
Copy link
Member

Could you please add benchmark(s) to dotnet/performance so we protect your work when it goes in? Or is coverage good enough already?

@mla-alm
Copy link
Contributor Author

mla-alm commented Dec 21, 2022

Which numbers or intervals to pick for benchmarks decides which algorithm is "optimal". IMO it could be interesting to include the edge cases:
Next(int.MinValue, 1)
NextInt64(long.MinValue, 1)
However these would actually point towards an implementation like swiftlang/swift@87b3f60.
For this implementation I guess the benchmarks are sufficient.

@stephentoub
Copy link
Member

/benchmark microbenchmarks aspnet-perf-win runtime --variable filter=System.Tests.Perf_Random*

@pr-benchmarks
Copy link

pr-benchmarks bot commented Jan 31, 2023

Benchmark started for microbenchmarks on aspnet-perf-win with runtime and arguments --variable filter=System.Tests.Perf_Random*. Logs: link

@pr-benchmarks
Copy link

pr-benchmarks bot commented Jan 31, 2023

microbenchmarks - aspnet-perf-win

| benchmark               | mean (microbenchmarks.base) | mean (microbenchmarks.pr) | ratio | allocated (microbenchmarks.base) | allocated (microbenchmarks.pr) | ratio |
| ----------------------- | --------------------------- | ------------------------- | ----- | -------------------------------- | ------------------------------ | ----- |
| ctor                    |                    80.83 ns |                  81.42 ns |  1.01 |                             72 B |                           72 B |  1.00 |
| ctor_seeded             |                    291.5 ns |                  292.1 ns |  1.00 |                            304 B |                          304 B |  1.00 |
| Next                    |                    7.939 ns |                  8.179 ns |  1.03 |                              0 B |                            0 B |       |
| Next_int                |                    8.980 ns |                  9.050 ns |  1.01 |                              0 B |                            0 B |       |
| Next_int_int            |                    9.059 ns |                  9.905 ns |  1.09 |                              0 B |                            0 B |       |
| Next_int_int_unseeded   |                    7.311 ns |                  2.102 ns |  0.29 |                              0 B |                            0 B |       |
| Next_int_unseeded       |                    7.443 ns |                  2.261 ns |  0.30 |                              0 B |                            0 B |       |
| Next_long               |                    37.09 ns |                  35.41 ns |  0.95 |                              0 B |                            0 B |       |
| Next_long_long          |                    37.65 ns |                  39.01 ns |  1.04 |                              0 B |                            0 B |       |
| Next_long_long_unseeded |                    8.304 ns |                  2.659 ns |  0.32 |                              0 B |                            0 B |       |
| Next_long_unseeded      |                    4.268 ns |                  2.549 ns |  0.60 |                              0 B |                            0 B |       |
| Next_unseeded           |                    1.844 ns |                  1.746 ns |  0.95 |                              0 B |                            0 B |       |
| NextBytes               |                      5.3 μs |                    5.6 μs |  1.05 |                              0 B |                            0 B |       |
| NextBytes_span          |                      5.5 μs |                    5.6 μs |  1.01 |                              0 B |                            0 B |       |
| NextBytes_span_unseeded |                    119.1 ns |                  118.8 ns |  1.00 |                              0 B |                            0 B |       |
| NextBytes_unseeded      |                    119.1 ns |                  119.0 ns |  1.00 |                              0 B |                            0 B |       |
| NextDouble              |                    7.980 ns |                  7.980 ns |  1.00 |                              0 B |                            0 B |       |
| NextDouble_unseeded     |                    1.991 ns |                  1.961 ns |  0.98 |                              0 B |                            0 B |       |
| NextSingle              |                    7.995 ns |                  7.999 ns |  1.00 |                              0 B |                            0 B |       |
| NextSingle_unseeded     |                    2.000 ns |                  1.977 ns |  0.99 |                              0 B |                            0 B |       |

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation and 64-bit perf look good. I'm building locally for 32-bit to check that.

@stephentoub
Copy link
Member

stephentoub commented Jan 31, 2023

This Xoshiro128 implementation is only used on 32-bit. I saw some measurements were done, but I believe those were on 64-bit... have we measured how this performs on 32-bit?

True, I have only tested on 64-bit.

On 32-bit, I get numbers like this:

Method Toolchain Mean Error StdDev Median Min Max Ratio
Next_int_unseeded \main_x86\corerun.exe 13.130 ns 0.4286 ns 0.4936 ns 12.949 ns 12.475 ns 14.104 ns 1.00
Next_int_unseeded \pr_x86\corerun.exe 5.612 ns 0.0956 ns 0.0847 ns 5.602 ns 5.489 ns 5.813 ns 0.43
Next_int_int_unseeded \main_x86\corerun.exe 12.036 ns 0.1935 ns 0.1616 ns 11.969 ns 11.864 ns 12.342 ns 1.00
Next_int_int_unseeded \pr_x86\corerun.exe 5.296 ns 0.0605 ns 0.0565 ns 5.288 ns 5.208 ns 5.423 ns 0.44
Next_long_unseeded \main_x86\corerun.exe 11.874 ns 0.2669 ns 0.2496 ns 11.850 ns 11.462 ns 12.305 ns 1.00
Next_long_unseeded \pr_x86\corerun.exe 14.409 ns 0.1976 ns 0.1650 ns 14.360 ns 14.121 ns 14.758 ns 1.22
Next_long_long_unseeded \main_x86\corerun.exe 15.087 ns 0.2617 ns 0.2320 ns 15.085 ns 14.771 ns 15.666 ns 1.00
Next_long_long_unseeded \pr_x86\corerun.exe 18.001 ns 0.2120 ns 0.1983 ns 17.967 ns 17.711 ns 18.399 ns 1.19

So, at least according to these benchmarks, this PR represents a potential regression on 32-bit, in particular for NextInt64.

Can you measure on 32-bit as well, @mla-alm?

@mla-alm
Copy link
Contributor Author

mla-alm commented Jan 31, 2023

I am running on Linux, so cannot measure on 32-bit.

public long Next_long_long_unseeded() => _randomUnseeded.NextInt64(100, 10000);
So this benchmark for the 32-bit is in reality only handling the case of an int on the 'main' test.

Maybe it is better to be conservative and leave the implementation (at least the NextInt64) as is in Xoshiro128?

@jeffhandley
Copy link
Member

Maybe it is better to be conservative and leave the implementation (at least the NextInt64) as is in Xoshiro128?

@stephentoub What do you think about that idea?

@stephentoub
Copy link
Member

Yeah, let's leave the Xoshiro128's NextInt64 unmodified for now, other than a comment indicating why it's not getting the same treatment as the others.

@mla-alm
Copy link
Contributor Author

mla-alm commented Feb 16, 2023

Xoshiro128's NextInt64 is as it was before. And comment added.

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@danmoseley
Copy link
Member

@mla-alm Thank you! Perhaps you'd be interested in another contribution somewhere?

@EgorBo
Copy link
Member

EgorBo commented Feb 21, 2023

Improvements on x64 dotnet/perf-autofiling-issues#13151

@mla-alm
Copy link
Contributor Author

mla-alm commented Mar 6, 2023

Hi @danmoseley.
Sure I might. Do you have any suggestions?
Otherwise I might look around, when I find the time.

@mla-alm mla-alm deleted the mla-alm/lemire branch March 19, 2023 05:52
@EgorBo
Copy link
Member

EgorBo commented Mar 28, 2023

@danmoseley
Copy link
Member

mla-alm

See your perf win confirmed above!

In terms of what to take on next -- any issue that's help wanted (or possibly others)

https://github.com/dotnet/runtime/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22
https://github.com/dotnet/aspnetcore/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22

if you have a particular area you're interested in, you could ask them

https://github.com/dotnet/runtime/blob/main/docs/area-owners.md

or ask here or in discussions.

we are always happy to have new regular contributors.

@ghost ghost locked as resolved and limited conversation to collaborators Apr 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Runtime community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants