Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf -47%] PerfLabTests.BlockCopyPerf.CallBlockCopy #37808

Closed
DrewScoggins opened this issue Jun 12, 2020 · 13 comments
Closed

[Perf -47%] PerfLabTests.BlockCopyPerf.CallBlockCopy #37808

DrewScoggins opened this issue Jun 12, 2020 · 13 comments
Labels
arch-x64 area-System.Buffers os-linux Linux OS (any supported distro) tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Milestone

Comments

@DrewScoggins
Copy link
Member

Run Information

Architecture x64
OS Windows 10.0.18362
Changes diff

Regressions in PerfLabTests.BlockCopyPerf

Benchmark Baseline Test Test/Base Modality Baseline Outlier
CallBlockCopy 5.03 ns 7.39 ns 1.47 False

graph
Historical Data in Reporting System

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'PerfLabTests.BlockCopyPerf*';

Histogram

PerfLabTests.BlockCopyPerf.CallBlockCopy(numElements: 0)

[4.738 ; 5.406) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[5.406 ; 6.074) | 
[6.074 ; 6.607) | 
[6.607 ; 7.275) | @@@@@@@@@@@@
[7.275 ; 7.979) | @@@@@@@

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. Please help me learn by adding exactly one area label.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Jun 12, 2020
@DrewScoggins
Copy link
Member Author

From @adamsitnik

Looks like a very simple regression. I would expect the BlockCopy to return very quickly for 0 elements.

This needs an investigation. We should also consider removing this test case - I am not sure if testing BlockCopy for 0 elements adds value.

@DrewScoggins DrewScoggins added the tenet-performance-benchmarks Issue from performance benchmark label Jun 12, 2020
@EgorBo
Copy link
Member

EgorBo commented Jun 12, 2020

Both BlockCopy and InitBlockUnaligned should be intrinsics in case of constant (small) size argument - so we can optimize it to no-op or unroll.

@jkotas
Copy link
Member

jkotas commented Jun 12, 2020

BlockCopy comes with fairly non-trivial argument validation. You would need to hardcode all that into the JIT...

@EgorBo
Copy link
Member

EgorBo commented Jun 12, 2020

BlockCopy comes with fairly non-trivial argument validation. You would need to hardcode all that into the JIT...

We can start from Buffer.MemoryCopy
e.g.

Buffer.MemoryCopy(srcPtr, dstPtr, 8, 8);

to

*(long*)dstPtr = *(long*)strPtr
// or several (unrolled) lines

@jkotas
Copy link
Member

jkotas commented Jun 12, 2020

I would focus on Span.CopyTo/TryCopyTo and tell everybody to use that if they want this sort of optimizations. Buffer.MemoryCopy is rarely used and legacy, not worth spending target effort on.

@ghost
Copy link

ghost commented Jun 23, 2020

Tagging subscribers to this area: @tannergooding
Notify danmosemsft if you want to be subscribed.

@GrabYourPitchforks
Copy link
Member

I am not sure if testing BlockCopy for 0 elements adds value.

I agree with this. It's useful as a unit test to make sure we don't error out, but as a perf test I don't think it's all that useful. I'd expect the vast majority use case to pass non-0 arguments to this method. And per Jan's earlier comment, we should really be encouraging use of strongly-typed APIs for optimal throughput.

Closing as won't fix. If this ends up being a problem please reopen.

@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Jul 6, 2020
@DrewScoggins
Copy link
Member Author

DrewScoggins commented Jul 10, 2020

Run Information

Architecture x64
OS ubuntu 18.04
Changes diff

Regressions in PerfLabTests.BlockCopyPerf

Benchmark Baseline Test Test/Base Modality Baseline Outlier
CallBlockCopy 2.65 ns 6.75 ns 2.54 Bimodal False
CallBlockCopy 13.28 ns 23.06 ns 1.74 Bimodal False

Related Issue on x64 Windows

[Perf -43%] PerfLabTests.BlockCopyPerf.CallBlockCopy

Related Issue on x86 Windows

[Perf -74%] PerfLabTests.BlockCopyPerf.CallBlockCopy

graph
graph
Historical Data in Reporting System

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'PerfLabTests.BlockCopyPerf*';

Histogram

PerfLabTests.BlockCopyPerf.CallBlockCopy(numElements: 0)

[2.111 ;  2.873) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[2.873 ;  3.636) | 
[3.636 ;  4.399) | 
[4.399 ;  4.801) | 
[4.801 ;  5.563) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[5.563 ;  5.865) | 
[5.865 ;  6.549) | @@@@@@@@@@@@@@@@@@@@@
[6.549 ;  7.312) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[7.312 ;  7.924) | @@@@@@@@@@@@@@@
[7.924 ;  8.686) | 
[8.686 ;  9.425) | 
[9.425 ; 10.188) | @

PerfLabTests.BlockCopyPerf.CallBlockCopy(numElements: 1000)

[13.195 ; 15.231) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[15.231 ; 17.393) | @@@@
[17.393 ; 19.429) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[19.429 ; 21.180) | @@@
[21.180 ; 21.995) | 
[21.995 ; 24.031) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[24.031 ; 24.844) | @
[24.844 ; 27.444) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[27.444 ; 29.479) | @@
[29.479 ; 31.515) | 
[31.515 ; 33.732) | 
[33.732 ; 35.803) | @@

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@DrewScoggins DrewScoggins reopened this Jul 10, 2020
@DrewScoggins
Copy link
Member Author

Reopening this because on Ubuntu x64 we are seeing regressions in the 1000 element case.

@adamsitnik
Copy link
Member

I've sent a PR to remove the 0 elements benchmark case: dotnet/performance#1465

@adamsitnik
Copy link
Member

We know that BlockCopy is bimodal, but I currently don't have enough data to close the issue. I am going to update it later this week when I get reports from others

PerfLabTests.BlockCopyPerf.CallBlockCopy(numElements: 1000)

Conclusion Base Diff Base/Diff Modality Operating System Bit Processor Name Base Runtime Diff Runtime
Slower 42.11 99.48 0.42 Windows 10.0.18363.1016 Arm Microsoft SQ1 3.0 GHz .NET Core 3.1.6 5.0.100-rc.1.20413.9
Slower 23.34 24.66 0.95 Windows 10.0.18363.959 X64 Intel Xeon CPU E5-1650 v4 3.60GHz .NET Core 3.1.6 5.0.100-rc.1.20404.3
Faster 31.98 25.41 1.26 Windows 10.0.18363.959 X64 Intel Xeon CPU E5-1650 v4 3.60GHz .NET Core 3.1.6 5.0.100-rc.1.20418.3
Slower 27.67 36.89 0.75 Windows 10.0.19041.450 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell) .NET Core 3.1.6 5.0.100-rc.1.20413.9
Slower 16.66 38.78 0.43 ubuntu 18.04 X64 Intel Xeon CPU E5-1650 v4 3.60GHz .NET Core 3.1.6 5.0.100-rc.1.20403.23
Slower 19.30 28.95 0.67 macOS Mojave 10.14.5 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell) .NET Core 3.1.6 5.0.100-rc.1.20404.2

PerfLabTests.BlockCopyPerf.CallBlockCopy(numElements: 100)

Conclusion Base Diff Base/Diff Modality Operating System Bit Processor Name Base Runtime Diff Runtime
Slower 13.95 23.09 0.60 Windows 10.0.18363.1016 Arm Microsoft SQ1 3.0 GHz .NET Core 3.1.6 5.0.100-rc.1.20413.9
Same 7.75 8.12 0.95 Windows 10.0.18363.959 X64 Intel Xeon CPU E5-1650 v4 3.60GHz .NET Core 3.1.6 5.0.100-rc.1.20404.3
Slower 8.05 8.59 0.94 Windows 10.0.18363.959 X64 Intel Xeon CPU E5-1650 v4 3.60GHz .NET Core 3.1.6 5.0.100-rc.1.20418.3
Same 9.35 9.45 0.99 Windows 10.0.19041.450 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell) .NET Core 3.1.6 5.0.100-rc.1.20413.9
Slower 7.56 8.70 0.87 bimodal ubuntu 18.04 X64 Intel Xeon CPU E5-1650 v4 3.60GHz .NET Core 3.1.6 5.0.100-rc.1.20403.23
Slower 8.75 9.76 0.90 macOS Mojave 10.14.5 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell) .NET Core 3.1.6 5.0.100-rc.1.20404.2

PerfLabTests.BlockCopyPerf.CallBlockCopy(numElements: 10)

Conclusion Base Diff Base/Diff Modality Operating System Bit Processor Name Base Runtime Diff Runtime
Slower 9.91 11.03 0.90 Windows 10.0.18363.1016 Arm Microsoft SQ1 3.0 GHz .NET Core 3.1.6 5.0.100-rc.1.20413.9
Same 6.01 6.31 0.95 Windows 10.0.18363.959 X64 Intel Xeon CPU E5-1650 v4 3.60GHz .NET Core 3.1.6 5.0.100-rc.1.20404.3
Same 6.29 6.57 0.96 Windows 10.0.18363.959 X64 Intel Xeon CPU E5-1650 v4 3.60GHz .NET Core 3.1.6 5.0.100-rc.1.20418.3
Same 7.26 7.61 0.95 Windows 10.0.19041.450 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell) .NET Core 3.1.6 5.0.100-rc.1.20413.9
Slower 5.42 7.35 0.74 ubuntu 18.04 X64 Intel Xeon CPU E5-1650 v4 3.60GHz .NET Core 3.1.6 5.0.100-rc.1.20403.23
Slower 6.32 7.72 0.82 macOS Mojave 10.14.5 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell) .NET Core 3.1.6 5.0.100-rc.1.20404.2

@adamsitnik
Copy link
Member

Based on the full historical data we can assume that this benchmark is simply unstable:

obraz

Closing.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-x64 area-System.Buffers os-linux Linux OS (any supported distro) tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Projects
None yet
Development

No branches or pull requests

8 participants