x64 vs ARM64 Microbenchmarks Performance Study Report #67339

adamsitnik · 2022-03-30T13:24:26Z

Recently @kunalspathak asked me if I could produce a report similar to #66848 for x64 vs arm64 comparison.

I took .NET 7 Preview2 results provided by @AndyAyersMS, @kunalspathak and myself for #66848, hacked the tool a little bit (it was not designed to compare different architecture results) and compared x64 vs arm64 using the following configs:

my 4-year-old MacBook Pro (x64): macOS Monterey 12.2.1, Intel Core i7-5557U CPU 3.10GHz (Broadwell), 1 CPU, 4 logical and 2 physical cores vs @AndyAyersMS's M1 Max (arm64): macOS Monterey 12.2.1, Apple M1 Max 2.40GHz, 1 CPU, 10 logical and 10 physical cores
@kunalspathak's Windows 10 (10.0.20348.587) machine: Intel Xeon Platinum 8272CL CPU 2.60GHz, 2 CPU, 104 logical and 52 physical cores vs @kunalspathak's Windows 11 (10.0.25058.1000) ARM64 machine with lots of cores

Of course it was not an apples-to-apples comparison, just the best thing we could do right now.

Full public results (without absolute values, as I don't have the permission to share them) can be found here.
Internal MS results (with absolute values) can be found here. If you don't have the access please ping me on Teams.

As usual, I've focused on the benchmarks that take longer to execute on arm64 compared to x64. If you are interested in benchmarks that take less time to execute, you need to read the report linked above in reverse order.

Benchmarks:

@kunalspathak

System.Numerics.Tests.Perf_BitOperations.PopCount_ulong is 5-8 times slower (most likely due to lack of vectorization). PopCount_uint is slower only on Windows.

@tannergooding @GrabYourPitchforks

A lot of Base64Encode benchmarks, like System.Buffers.Text.Tests.Base64Tests.Base64Encode(NumberOfBytes: 1000), are 6 to 16 times slower (see Optimize System.Buffers for arm64 using cross-platform intrinsics #35033).

@stephentoub @kouvel

Some RentReturnArrayPoolTests benchmarks are up to a few times slower, but these are multi-threaded and very often multimodal benchmarks (see Faster thread local statics #63619).
System.Threading.Tests.Perf_Timer.AsynchronousContention is 2-3 times slower.

@wfurt @MihaZupan

A lot of SocketSendReceivePerfTest benchmarks, like System.Net.WebSockets.Tests.SocketSendReceivePerfTest.ReceiveSend, are 2 times slower.

@dotnet/area-system-drawing

System.Drawing.Tests.Perf_Image_Load.Image_FromStream_NoValidation benchmarks are a few times slower on Windows. Only the NoValidation benchmarks seem to run slower.

@stephentoub

A few RegularExpressions benchmarks, like System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "(?i)Sher[a-z]+|Hol[a-z]+", Options: Compiled), are 40-50% slower. This pattern uses IndexOfAny("HOho") to find the next possible match location. It has a 256-bit vectorization path on x64 but only 128-bit on ARM64.

@jkotas @AndyAyersMS

PerfLabTests.LowLevelPerf.GenericClassGenericStaticField benchmark can be from 16% to 3 times slower. Same goes for PerfLabTests.LowLevelPerf.GenericClassGenericStaticMethod.

@dotnet/jit-contrib

System.Security.Cryptography.Tests.Perf_Hashing.Sha1 is 17-55% slower (potentially differences in the platform's SHA1 implementation, which backs SHA1.ComputeHash).
System.IO.Tests.Perf_StreamWriter.WriteString(writeLength: 100) is 21-46% slower.
System.Text.Json.Serialization.Tests.WriteJson<BinaryData>.SerializeToStream benchmark can be from 16% to 4 times slower (see Optimize System.Buffers for arm64 using cross-platform intrinsics #35033).
SIMD.ConsoleMandel benchmarks are 40% slower (see Double Vector128 for SpanHelpers.IndexOf(byte,byte,int) on ARM64 #66993).
Burgers.Test3 is 12-59% slower (see Double Vector128 for SpanHelpers.IndexOf(byte,byte,int) on ARM64 #66993).
A lot of System.Collections.Contains benchmarks are 2-3 times slower (most likely due to lack of vectorization). Same goes for System.Memory.Span<Char>.IndexOfValue, System.Memory.Span<Char>.Fill, System.Memory.Span<Int32>.StartsWith, System.Memory.Span<Byte>.IndexOfAnyTwoValues and System.Memory.ReadOnlySpan.IndexOfString(Ordinal) (see Double Vector128 for SpanHelpers.IndexOf(byte,byte,int) on ARM64 #66993).
A lot of SequenceCompareTo benchmarks are from 30% to 4 times slower (see Double Vector128 for SpanHelpers.IndexOf(byte,byte,int) on ARM64 #66993).

@tannergooding

System.MathBenchmarks.Double.Exp and System.MathBenchmarks.Single.Exp are 35% slower (see Optimize jump stubs on arm64 #62302).

@dotnet/area-system-globalization

System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: ja) benchmark can be from 20% to 7 times slower (it's most likely an ICU problem; see Initializing the "ja" culture takes 200ms when using ICU #31273).
Various Perf_Interlocked benchmarks are slower, but this is expected due to memory model differences.
Various Perf_Process.Start benchmarks are slower, but only on macOS so it's most likely a macOS issue.


ghost · 2022-03-30T13:24:32Z

Tagging subscribers to this area: @dotnet/area-meta
See info in area-owners.md if you want to be subscribed.

Issue Details

Benchmarks:

A lot of Base64Encode benchmarks like System.Buffers.Text.Tests.Base64Tests.Base64Encode(NumberOfBytes: 1000) are 6 to 16 times slower (most likely due to lack of vectorization). @tannergooding @GrabYourPitchforks is it expected?
System.Numerics.Tests.Perf_BitOperations.PopCount_ulong is 5-8 times slower (most likely due to lack of vectorization). PopCount_uint is slower only on Windows. @kunalspathak is this expected?
Some RentReturnArrayPoolTests benchmarks are up to a few times slower, but these are multi-threaded and very often multimodal benchmarks. @stephentoub @kouvel is it expected?
System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: ja) benchmark can be from 20% to 7 times slower (it's most likely an ICU problem). @dotnet/area-system-globalization is it expected?
A lot of System.Collections.Contains benchmarks are 2-3 times slower (most likely due to lack of vectorization). Same goes for System.Memory.Span<Char>.IndexOfValue, System.Memory.Span<Char>.Fill, System.Memory.Span<Int32>.StartsWith, System.Memory.Span<Byte>.IndexOfAnyTwoValues and System.Memory.ReadOnlySpan.IndexOfString(Ordinal). @tannergooding @EgorBo is it expected?
A lot of SequenceCompareTo benchmarks are from 30% to 4 times slower (most likely due to lack of vectorization). @tannergooding @EgorBo is it expected?
System.Text.Json.Serialization.Tests.WriteJson<BinaryData>.SerializeToStream benchmark can be from 16% to 4 times slower. @dotnet/jit-contrib is this expected?
System.Threading.Tests.Perf_Timer.AsynchronousContention is 2-3 times slower. @stephentoub @kouvel is it expected?
A lot of SocketSendReceivePerfTest benchmarks like System.Net.WebSockets.Tests.SocketSendReceivePerfTest.ReceiveSend are 2 times slower. @wfurt @MihaZupan is it expected?
System.Drawing.Tests.Perf_Image_Load.Image_FromStream_NoValidation benchmarks are a few times slower on Windows. @dotnet/area-system-drawing is it expected? Only the NoValidation benchmarks seem to run slower.
PerfLabTests.LowLevelPerf.GenericClassGenericStaticField benchmark can be from 16% to 3 times slower. Same goes for PerfLabTests.LowLevelPerf.GenericClassGenericStaticMethod. @jkotas @AndyAyersMS is it expected?
A few RegularExpressions benchmarks like System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "(?i)Sher[a-z]+|Hol[a-z]+", Options: Compiled) are 40-50% slower (most likely it's using a method that has not been vectorized). @stephentoub is it expected?
Burgers.Test3 is 12-59% slower (most likely it's using a method that has not been vectorized). @dotnet/jit-contrib is it expected?
System.Security.Cryptography.Tests.Perf_Hashing.Sha1 is 17-55% slower (most likely due to lack of vectorization). @dotnet/jit-contrib is it expected?
SIMD.ConsoleMandel benchmarks are 40% slower (most likely due to lack of vectorization). @dotnet/jit-contrib is it expected?
System.IO.Tests.Perf_StreamWriter.WriteString(writeLength: 100) is 21-46% slower (most likely due to lack of vectorization). @dotnet/jit-contrib is it expected?
System.MathBenchmarks.Double.Exp and System.MathBenchmarks.Single.Exp are 35% slower. @tannergooding is it expected?
Various Perf_Interlocked benchmarks are slower, but this is expected due to memory model differences.
Various Perf_Process.Start benchmarks are slower, but only on macOS so it's most likely a macOS issue.

Author:	adamsitnik
Assignees:	-
Labels:	`area-Meta`, `tenet-performance`, `tracking`
Milestone:	-

EgorBo · 2022-03-30T14:14:08Z

Nice! I did a similar report last week and shared it at our perf meeting last Monday.

> A lot of Base64Encode benchmarks like System.Buffers.Text.Tests.Base64Tests.Base64Encode(NumberOfBytes: 1000) are 6 to 16 times slower (most likely due to lack of vectorization). @tannergooding @GrabYourPitchforks is it expected?

Base64 (for UTF-8) is only vectorized for x64; there is an issue for arm64, #35033 (I think we wanted to assign it to someone to ramp up).

> System.Numerics.Tests.Perf_BitOperations.PopCount_ulong is 5-8 times slower (most likely due to lack of vectorization).

It is properly accelerated (I compared it with __builtin_popcnt in LLVM). The problem is that popcnt is vector-only on arm64, so we have some overhead for packing/extracting: 5 instructions vs 1 on x64.
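For a native point of comparison, a 64-bit population count in C might look like the sketch below. It assumes GCC or Clang, whose `__builtin_popcountll` is the 64-bit sibling of the builtin mentioned above; both function names are made up for illustration. When the target enables popcnt, x64 compilers typically lower the builtin to a single instruction, while arm64 compilers generally move the value into a vector register, count there, and move the result back, which is the packing/extracting overhead described above. The second function is a portable bit-twiddling fallback, included only so the results can be cross-checked.

```c
#include <stdint.h>

/* Hypothetical helper: thin wrapper over the GCC/Clang builtin. */
int popcount64_builtin(uint64_t x) {
    return __builtin_popcountll(x);
}

/* Portable fallback (classic SWAR bit counting), for comparison only. */
int popcount64_swar(uint64_t x) {
    /* Count bits in each 2-bit group. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* Sum neighbouring 2-bit counts into 4-bit groups. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Sum neighbouring 4-bit counts into bytes. */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    /* Add all eight byte counts together; the total lands in the top byte. */
    return (int)((x * 0x0101010101010101ULL) >> 56);
}
```

Both functions return 8 for the input 0xFF. Comparing the disassembly of the first one for an x64 and an arm64 target is one way to see the instruction-count difference.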

> Some RentReturnArrayPoolTests benchmarks are up to a few times slower

My guess is that Rent-Return is most likely bottlenecked on TLS access speed; it can be improved with #63619 if arm64 has special registers for that.

> A lot of System.Collections.Contains benchmarks are 2-3 times slower (most likely due to lack of vectorization).

> A lot of SequenceCompareTo benchmarks are from 30% to 4 times slower (most likely due to lack of vectorization).

That is expected due to the lack of Vector256, I believe. I proposed adding dual-Vector128 for arm64 in #66993.

> Burgers.Test3 is 12-59% slower (most likely it's using a method that has not been vectorized)

> SIMD.ConsoleMandel benchmarks are 40% slower

Same here: it uses Vector<T>, so it's Vector256 on x64 vs Vector128 on arm64.

> Various Perf_Interlocked benchmarks are slower, but this is expected due to memory model differences.

Correct, the codegen for interlocked ops is completely fine on both arm64 8.0 and 8.1 (Atomics).

> System.MathBenchmarks.Double.Exp and System.MathBenchmarks.Single.Exp are 35% slower.

If the arm64 machine was an M1, then it's the jump-stubs issue; see #62302 (comment).

> PerfLabTests.LowLevelPerf.GenericClassGenericStaticField benchmark can be from 16% to 3 times slower. Same goes for PerfLabTests.LowLevelPerf.GenericClassGenericStaticMethod. @jkotas @AndyAyersMS is it expected?

My guess is that it's because we don't use relocs on arm64 and have to compose the full 64-bit address using several instructions to access a static field. E.g.:

static int field;

void IncrementField() => field++;

X64:

       FF05C6CC4200         inc      dword ptr [(reloc 0x7ffeb73eac3c)]

arm64:

        D2958780          movz    x0, #0xac3c
        F2B6E760          movk    x0, #0xb73b LSL #16
        F2CFFFC0          movk    x0, #0x7ffe LSL #32
        B9400001          ldr     w1, [x0]
        11000421          add     w1, w1, #1
        B9000001          str     w1, [x0]

Overall, I have a feeling that we might get a very nice boost for many benchmarks/GC if we integrate PGO for native code (VM/GC)

vcsjones · 2022-03-30T14:56:34Z

> System.Security.Cryptography.Tests.Perf_Hashing.Sha1 is 17-55% slower (most likely due to lack of vectorization). jit-contrib is it expected?

SHA1.ComputeHash is backed by the platform's SHA1 implementation (OpenSSL, CNG, SecurityTransforms) and doesn't do any vectorization itself anywhere. It's possible that the platform the tests were run on does not have an optimized ARM64 implementation of SHA1.

danmoseley · 2022-03-30T15:11:41Z

> Nice! I did a similar report last week and shared it at our perf meeting last Monday

@EgorBo that data seems like something you could share on a gist for everyone? (Or perhaps just the scenarios with unusual ratios)

danmoseley · 2022-03-30T15:49:27Z

The System.Drawing ones may just be a difference in Windows GDI+ performance since it's largely a wrapper.

AndyAyersMS · 2022-03-30T15:58:09Z

> PerfLabTests.LowLevelPerf.GenericClassGenericStaticField benchmark can be from 16% to 3 times slower. Same goes for PerfLabTests.LowLevelPerf.GenericClassGenericStaticMethod. @jkotas @AndyAyersMS is it expected?

> My guess is that it's because we don't use relocs on arm64 and have to compose the full 64-bit address using several instructions to access a static field.

https://github.com/dotnet/performance/blob/d7dac8a7ca12a28d099192f8a901cf8e30361384/src/benchmarks/micro/runtime/perflab/LowLevelPerf.cs#L320-L325

Access for generic statics (for shared generics at least, maybe for all?) can be more complicated -- the address must be looked up in runtime data structures. Worth investigating.

tarekgh · 2022-03-30T18:20:06Z

> System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: ja) benchmark can be from 20% to 7 times slower (it's most likely an ICU problem).

Most likely it is because of ICU. We already have issue #31273 tracking that. I don't know, though, why it is even slower on ARM64.

danmoseley · 2022-03-30T18:31:19Z

> Access for generic statics (for shared generics at least, maybe for all?) can be more complicated -- the address must be looked up in runtime data structures. Worth investigating.

@EgorBo perhaps you could open an issue and update the top post?

EgorBo · 2022-03-30T18:56:04Z

> @EgorBo perhaps you could open an issue and update the top post?

> Access for generic statics (for shared generics at least, maybe for all?) can be more complicated -- the address must be looked up in runtime data structures. Worth investigating.

Right, but that doesn't look to be the case here since it's not shared.

> @EgorBo that data seems like something you could share on a gist for everyone?

Sure, let me see how to export an excel sheet to gist 😄

ericstj · 2022-03-30T20:31:50Z

> The System.Drawing ones may just be a difference in Windows GDI+ performance since it's largely a wrapper.

There is a lot of interop in this scenario. It could be differences in interop or in the performance of this callback (runtime/src/libraries/System.Drawing.Common/src/System/Drawing/Internal/GPStream.COMWrappers.cs, line 29 in 3ae8739):

public unsafe Interop.HRESULT CopyTo(IntPtr pstm, ulong cb, ulong* pcbRead, ulong* pcbWritten)

We could compare to the performance of a load that doesn't use a stream, and thus would be more of a GDI+ baseline. cc @eerhardt

danmoseley · 2022-03-30T20:33:03Z

@jkoritzinsky for that interop possibility. Jeremy anything notable in the interop here - any potentially relevant known issue on Arm64?

EgorBo · 2022-03-30T22:08:10Z

> System.Text.Json.Serialization.Tests.WriteJson<BinaryData>.SerializeToStream benchmark can be from 16% to 4 times slower.

This one serializes an array of bytes, so it spends most of the time encoding data into Base64. So it's the same as #35033.

jkoritzinsky · 2022-03-30T22:10:42Z

> for that interop possibility. Jeremy anything notable in the interop here - any potentially relevant known issue on Arm64?

We don't have any notable differences (or even any differences I can think of) in the portion of interop used there for ARM64 vs x64. I definitely wouldn't be amazed at all if some portion of GDI+ is better optimized for x64 and we're just seeing that here. @dotnet/interop-contrib if anyone else on the interop team has any issues that come to mind.

danmoseley · 2022-03-30T22:55:10Z

For the regex ones -- do we know we have vectorization gaps that are specific to Arm64 in any areas like -- StartsWith, IndexOf, IndexOfAny - @EgorBo ? (For char, not byte)

stephentoub · 2022-03-31T01:53:09Z

> A few RegularExpressions benchmarks like System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "(?i)Sher[a-z]+|Hol[a-z]+", Options: Compiled) are 40-50% slower (most likely it's using a method that has not been vectorized).

> For the regex ones -- do we know we have vectorization gaps that are specific to Arm64 in any areas like -- StartsWith, IndexOf, IndexOfAny - @EgorBo ? (For char, not byte)

The cited pattern will use IndexOfAny("HOho") to find the next possible match location. It has a 256-bit vectorization path on x64 but only 128-bit on ARM64.
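To make the width difference concrete, the sketch below gives a scalar reference for a four-value "index of any" search over 16-bit characters, plus the arithmetic for how many elements fit in one vector and how many loop iterations a vectorized scan would need. It is an illustrative model only, not the .NET implementation, and all three function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: index of the first element equal to any of the four
   values, or -1 if there is none. */
ptrdiff_t index_of_any4(const uint16_t *text, size_t length,
                        const uint16_t values[4]) {
    for (size_t i = 0; i < length; i++) {
        for (size_t v = 0; v < 4; v++) {
            if (text[i] == values[v]) {
                return (ptrdiff_t)i;
            }
        }
    }
    return -1;
}

/* Number of elements of the given size that fit in one vector register. */
size_t lanes_per_vector(size_t vector_bits, size_t element_bytes) {
    return vector_bits / (8 * element_bytes);
}

/* Loop iterations a vectorized scan needs to cover `length` elements,
   rounded up to whole vectors. */
size_t vector_iterations(size_t length, size_t lanes) {
    return (length + lanes - 1) / lanes;
}
```

For 2-byte chars, a 128-bit vector holds 8 elements and a 256-bit vector holds 16, so scanning 1000 chars without a match takes 125 iterations at 128 bits versus 63 at 256 bits. That is roughly the factor-of-two headroom the wider x64 path has when the next match is far away.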

danmoseley · 2022-03-31T17:39:25Z

@EgorBo is that IndexOfAny(char, char..) work part of #66993 ?

EgorBo · 2022-03-31T18:34:12Z

> @EgorBo is that IndexOfAny(char, char..) work part of #66993 ?

It is, but I'm starting to think that we won't be able to properly lower Vector256 to double Vector128s in the JIT, so I wonder if we should do that at the C#/IL level instead (e.g. with source generators) if we really want to. Some say that these APIs mostly work with small data, and that cases where we need to open a 0.5 MB book and find a word in it are rare.

tannergooding · 2022-03-31T18:48:35Z

I really don't think it's worth focusing on or investing in that.

Like you mentioned, doing it in the JIT is somewhat problematic because you have to take Vector256<T> which is a user-defined non HVA struct (not equivalent to struct Hva256<T> { Vector128<T> _lower; Vector128<T> _upper; }) and then decompose it into 2x efficient 128-bit operations.

Decomposition here isn't necessarily trivial and has questionable perf throughput for various operations leading users to a potential pit of failure, particularly when running on low-power devices (may negatively impact Mobile).

We could do some clever things here and other various optimizations to make it work nicely (including treating it as an HVA), but it's not a small amount of work.

On top of that, it won't really "close" the gap. The places where doing 2x 128-bit ops on ARM64 helps are likely the same places where doing 2x 256-bit ops on x64 would provide similar gains.

We simply shouldn't be trying to compare 128-bit Arm64 vs 256-bit x64, just like we shouldn't compare 256-bit x64 to 512-bit x64 (or 128-bit x64 to 256-bit x64); nor should we try to compare ARM SVE (if/when we get that support) against x64.

Instead, when doing x64 vs Arm64 comparisons, we should compare 128-bit Arm64 to 128-bit x64. The "simplest" way to do that here is generally COMPlus_EnableAVX2=0, but ideally we'd just have a way to force 128-bit code paths without disabling any ISAs.

danmoseley · 2022-03-31T20:39:41Z

> Some say that these APIs mostly work with small data, and that cases where we need to open a 0.5 MB book and find a word in it are rare.

I don't think you can assume this given they're critical to regex matching. @stephentoub @joperezr may have a better sense of typical regex text lengths (of course it also depends on how common hits are)

> We simply shouldn't be trying to compare 128-bit Arm64 vs 256-bit x64

Comparing across hardware is inevitably bogus -- I thought the purpose of this exercise was to look for unusual ratios that might suggest room for targeted improvement by whatever means. Just sounds like there may not be a means, in this case.

EgorBo · 2022-03-31T20:48:45Z

> On top of that, it won't really "close" the gap. The places where doing 2x 128-bit ops on ARM64 helps are likely the same places where doing 2x 256-bit ops on x64 would provide similar gains.

I support your point. However, I think SpanHelpers methods are core performance primitives (just like memset and memcpy) in many things, especially IndexOf, IndexOfAny and SequenceEqual. I've seen these 3 in a lot of profiles in different apps (but I've not measured the average input size they worked on), so they might deserve to have a 2x256 path or even 4x256. That's what native compilers do when you ask them to unroll a loop on e.g. Skylake; they will even do 2*(4*256) per iteration. Although, in order to close the gap here for arm64 we need SVE2 😄

We can add JIT support here, e.g. JIT will be responsible to replace SpanHelpers.IndexOf with a call to a heavily optimized pipelined version if inputs are usually big (PGO)

EgorBo · 2022-03-31T20:59:57Z

https://godbolt.org/z/MxhGPPvaj

here I wrote a simple loop to add 2 to all elements in an array of integers.

arm64 with all ISAs available - two SVE2 vectors
arm64 for Apple-M1 - two Vector128 operations
x64 Skylake - 2 groups of 4 Vector256 operations

I didn't even use -O3 here 😐
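The exact source behind the godbolt link is not reproduced in this thread. A loop of the kind described might look like the following sketch, where the function name and the 32-bit element type are assumptions. The code is plain scalar C; any vectorization and unrolling, such as the per-target widths listed above, is left entirely to the compiler and depends on the optimization level and target selected.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds 2 to every element of the array in place.
   Written as a scalar loop; the compiler decides whether and how
   widely to vectorize and unroll it for the selected target. */
void add_two(int32_t *data, size_t length) {
    for (size_t i = 0; i < length; i++) {
        data[i] += 2;
    }
}
```

Compiling the same function for different targets and comparing the generated loop bodies is how the vector width and unroll factor per iteration can be observed.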

tannergooding · 2022-03-31T23:07:45Z

> I support your point. However, I think SpanHelpers methods are core performance primitives (just like memset and memcpy) in many things, especially IndexOf, IndexOfAny and SequenceEqual. I've seen these 3 in a lot of profiles in different apps (but I've not measured the average input size they worked on), so they might deserve to have a 2x256 path or even 4x256. That's what native compilers do when you ask them to unroll a loop on e.g. Skylake; they will even do 2*(4*256) per iteration. Although, in order to close the gap here for arm64 we need SVE2 😄

Right. My point is that we shouldn't drive the work solely based on closing some non-representative Arm64 vs x64 perf gap, because that will be impossible given the two sets of hardware we have (particularly if we actually try and do our best for each platform).

If it is perf critical, we should be hand tuning this to fit our needs for all the relevant platforms. If that includes manually unrolling and pipelining, then that's fine (assuming numbers across the hardware we care about show the respective gains).

danmoseley · 2022-04-01T01:10:55Z

These APIs are perf critical (certainly for 'char', if it matters) -- if we think it's feasible at reasonable cost to make them significantly faster on this architecture by whatever means, can we get an issue open for that?

EgorBo · 2022-04-01T09:20:30Z

> These APIs are perf critical (certainly for 'char', if it matters) -- if we think it's feasible at reasonable cost to make them significantly faster on this architecture by whatever means, can we get an issue open for that?

Sure, but I'd love to mine some data first for some apps, 1st parties, benchmarks to understand typical inputs better

adamsitnik added the area-Meta, tenet-performance and tracking labels on Mar 30, 2022

dotnet-issue-labeler bot added the untriaged label on Mar 30, 2022

adamsitnik removed the untriaged label on Mar 30, 2022

adamsitnik mentioned this issue on Apr 1, 2022: Improving ARM64 Performance in .NET 7.0 #64820 (Closed, 32 tasks)

danmoseley mentioned this issue on Apr 5, 2022: .Net 6. Running processes as separate tasks increases the execution time drastically #67506 (Open)

jeffhandley added this to the 7.0.0 milestone on Apr 9, 2022

kunalspathak mentioned this issue on May 20, 2022: Optimize System.Buffers for arm64 using cross-platform intrinsics #35033 (Closed, 2 tasks)

jeffhandley modified the milestones: 7.0.0, Future on Aug 11, 2022
