Remove mono specific SpanHelpers #79215

BrzVlad · 2022-12-05T09:03:34Z

#73768 made various changes that hurt span performance on mono. Those changes were later reverted on mono by restoring the old code in #75917.

This PR removes the mono-specific code and addresses the performance problems through small tweaks to the non-vectorized code in SpanHelpers, as well as a couple of new optimizations in the mono interpreter.

ghost · 2022-12-05T09:03:43Z

Tagging subscribers to this area: @BrzVlad
See info in area-owners.md if you want to be subscribed.

BrzVlad · 2022-12-08T09:17:37Z

@vargaz Could you take another look at this?

This would replace code like
```
load
b.neq
add
ret
load
b.neq
add
ret
load
....
```
with
```
load
b.eq
load
b.eq
load
...
```
This makes the code more compact in the hot loop, reduces overall code size and thus improves performance. This pattern is widely used and it was also used before with Span lookups.

Consider, for example, the following pattern, commonly used with conditional branches:
```
br.s [nil <- nil], BB0
...
ceq0.i4 [32 <- 40],
br.s [nil <- nil], BB1
BB0: ldc.i4.0 [32 <- nil],
BB1: brfalse.i4.s [nil <- 32], BB_EXIT
BB2: ldstr [56 <- nil], 2
```
This commit reorders this code to look like:
```
br.s [nil <- nil], BB0
...
ceq0.i4 [32 <- 40],
brfalse.i4.s [nil <- 32], BB_EXIT
br.s [nil <- nil], BB2
BB0: ldc.i4.0 [32 <- nil],
BB1: brfalse.i4.s [nil <- 32], BB_EXIT
BB2: ldstr [56 <- nil], 2
```
This means we will have duplicated brfalse instructions, but every basic block reaching the conditional branch will have information about the condition. For example, ceq0.i4 + brfalse is equivalent to brtrue, and ldc.i4.0 + brfalse is equivalent to an unconditional branch. After other optimizations are later applied to the bblock graph, like removal, merging and propagation of targets, the resulting code in this example would look like:
```
br.s [nil <- nil], BB_EXIT
...
brtrue.i4.s [nil <- 40], BB_EXIT
BB2: ldstr [56 <- nil], 2
```
This is a great simplification over the original code.
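To illustrate why duplicating the branch pays off, here is a minimal Python sketch of the folding step once the branch sits next to the instruction that defines its condition. It is not the interpreter's code: `fold_branch` and the tuple encoding of instructions are hypothetical and exist only for this example.

```python
def fold_branch(defn, branch):
    """Fold a condition definition into the branch that consumes it.

    defn:   (opcode, operand) for the instruction defining the condition var.
            Hypothetical encoding: ("ceq0", src_var) or ("ldc", constant).
    branch: ("brfalse" | "brtrue", target)
    Returns (opcode, src_var_or_None, target_or_None), or None when nothing
    is known about the condition (the optimization is skipped).
    """
    def_op, operand = defn
    br_op, target = branch
    if def_op == "ceq0":
        # cond = (src == 0), so testing cond is the inverse test on src.
        flipped = "brtrue" if br_op == "brfalse" else "brfalse"
        return (flipped, operand, target)
    if def_op == "ldc":
        # The condition is a known constant: the branch is always or never taken.
        taken = (operand == 0) if br_op == "brfalse" else (operand != 0)
        return ("br", None, target) if taken else ("nop", None, None)
    return None


# The two cases from the example above.
from_ceq0 = fold_branch(("ceq0", 40), ("brfalse", "BB_EXIT"))
from_ldc0 = fold_branch(("ldc", 0), ("brfalse", "BB_EXIT"))
```

Here `from_ceq0` corresponds to the `brtrue.i4.s [nil <- 40], BB_EXIT` in the final listing, and `from_ldc0` is the unconditional branch that lets the leading `br.s` be retargeted to `BB_EXIT`.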

radekdoulik · 2022-12-19T10:58:48Z

Nice! I see interpreter improvements in the SpanHelper bytes measurements. The chars results are mixed: improvements in Firefox, some slowdowns in Chrome. Overall it is a great improvement.

https://radekdoulik.github.io/WasmPerformanceMeasurements/?startDate=2022-12-05T10%3A43%3A18.830Z&endDate=2022-12-19T10%3A43%3A18.830Z&tasks=&flavors=2%2C3

It also improved Index of chars with aot/simd. I think that's because the current code uses S.R.I.Vector128, where we have better coverage, whereas the older code used S.N.Vector.

https://radekdoulik.github.io/WasmPerformanceMeasurements/?startDate=2022-12-05T10%3A43%3A18.830Z&endDate=2022-12-19T10%3A43%3A18.830Z&tasks=&flavors=4%2C8

/cc @lewing

MihaZupan · 2022-12-19T15:10:11Z

@BrzVlad can you please take a look at the changes in #78861 (they only affect platforms with Sse2/AdvSimd support). Should those be updated to better match what was done in this PR? I'm trying to avoid accidentally undoing the improvements you made here.

BrzVlad · 2022-12-22T10:24:03Z

@MihaZupan I don't think there are any problems between these 2 PRs. As long as your code is guarded everywhere with IsSupported / IsHardwareAccelerated, there should be no change for the mono interpreter.

BrzVlad added the area-Codegen-Interpreter-mono label Dec 5, 2022

BrzVlad requested a review from vargaz as a code owner December 5, 2022 09:03

ghost assigned BrzVlad Dec 5, 2022

BrzVlad mentioned this pull request Dec 5, 2022

Remove mono specific SpanHelpers #78015

Closed

jkotas reviewed Dec 5, 2022

src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.cs

BrzVlad force-pushed the fix-interp-index-of-regression2 branch from 91fa0cf to 7ce055f Compare December 5, 2022 17:54

This was referenced Dec 5, 2022

SyndicationFeed_Write_RSS_Atom test failing with "IOException : The file '/tmp/' already exists." #78454

Closed

iOS & tvOS legs are failing to AOT System.Net.Http.Json #79279

Closed

BrzVlad force-pushed the fix-interp-index-of-regression2 branch from 7ce055f to 374442d Compare December 8, 2022 20:40

This was referenced Dec 9, 2022

Tracking issue for CI build timeouts #76454

Closed

Precondition failure: File has not had execution verified #79439

Closed

BrzVlad added 13 commits December 12, 2022 19:26

Revert "[Mono] Restore old code to solve the recent SpanHelpers regre…

c3cbacd

…ssions (dotnet#75917)" This reverts commit 254844a.

[mono][interp] Replace compare + brfalse/brtrue with single condition…

c7d817f

…al branch

[mono][interp] Dump in/out links for bblocks during verbose logging

300afdb

[mono][interp] Improve detection of dead bblocks

e9f50e2

Previously, we marked bblocks as dead if their in_count was 0. This is not enough, however, since it doesn't account for loops: bblocks in an unreachable cycle still branch into each other, so their in_count never drops to 0. We now do a full traversal of the bblock graph to detect unreachable bblocks.
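As a rough illustration of the difference, the Python sketch below compares an in_count style check with a full traversal on a small, made-up control flow graph. The function names and the dict-based CFG are assumptions for this example, not the interpreter's data structures.

```python
from collections import deque


def dead_by_in_count(successors, entry):
    """in_count style check: a block is dead only if nothing branches into it."""
    in_count = {bb: 0 for bb in successors}
    for succs in successors.values():
        for succ in succs:
            in_count[succ] = in_count.get(succ, 0) + 1
    return {bb for bb, count in in_count.items() if count == 0 and bb != entry}


def reachable_blocks(successors, entry):
    """Full traversal: collect every block reachable from the entry block."""
    seen = {entry}
    work = deque([entry])
    while work:
        bb = work.popleft()
        for succ in successors.get(bb, ()):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen


# Hypothetical CFG: BB2 and BB3 form a loop that no reachable block enters.
cfg = {
    "BB0": ["BB1"],
    "BB1": [],
    "BB2": ["BB3"],
    "BB3": ["BB2"],
}
dead_old = dead_by_in_count(cfg, "BB0")
dead_new = set(cfg) - reachable_blocks(cfg, "BB0")
```

With this graph the in_count check finds nothing to remove, while the traversal reports both blocks of the orphaned loop as unreachable.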

[mono][interp] Don't optimize out bblocks that are tiering patchpoint…

83b2d05

… targets Even though they can become unreachable in the current method, they can still be called when the unoptimized method gets tiered at this point. Add an assert to prevent such issues in the future.

[mono][interp] Make bblock reordering more conservative

fe8288a

If we are unlikely to gain anything from propagating the condition (if we don't have information about any of the condition operand vars), simply avoid the optimization.

[mono][interp] Add basic removal of unused defines

65feed0

If we store to a var, and that var is redefined before the end of the basic block without being used in between, then we can clear the original store.
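A minimal Python sketch of this idea, using a backward scan over one basic block. The `(dest, opcode, sources)` tuples are a hypothetical encoding, and the sketch assumes the cleared instructions have no side effects; it is not taken from the interpreter.

```python
def remove_unused_defines(block):
    """Drop stores that are overwritten later in the block without being read.

    block: list of (dest_var_or_None, opcode, [source_vars]).
    """
    overwritten = set()  # vars whose next access (going forward) is a redefinition
    kept = []
    for dest, op, srcs in reversed(block):
        if dest is not None and dest in overwritten:
            continue  # this value is never read before being redefined
        if dest is not None:
            overwritten.add(dest)
        # Sources are read by this instruction, so earlier stores to them are live.
        overwritten.difference_update(srcs)
        kept.append((dest, op, srcs))
    kept.reverse()
    return kept


block = [
    ("a", "ldc 1", []),        # unused: 'a' is redefined below before any read
    ("b", "ldc 2", []),
    ("a", "mov", ["b"]),
    ("c", "add", ["a", "b"]),
]
optimized = remove_unused_defines(block)
```

Nothing is assumed about vars at the end of the block, so the last store to each var is always kept.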

[mono][interp] Clear unused defines of local only vars

3f7bd3d

We detect whether a var's value never escapes the bblock in which it is defined. We mark such vars and clear unused definitions of those vars in other bblocks.
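One possible reading of this, sketched in Python: a var is "local only" if every read of it is preceded by a definition in the same bblock, so a definition that is not read before the end of its bblock can be cleared. The encoding and function names are assumptions for illustration, and side-effect-free instructions are assumed.

```python
def local_only_vars(blocks):
    """Vars whose every read is preceded by a definition in the same block."""
    seen_vars, escaping = set(), set()
    for body in blocks.values():
        defined_here = set()
        for dest, _op, srcs in body:
            for src in srcs:
                seen_vars.add(src)
                if src not in defined_here:
                    escaping.add(src)  # reads a value produced in another block
            if dest is not None:
                seen_vars.add(dest)
                defined_here.add(dest)
    return seen_vars - escaping


def clear_unused_defines(blocks):
    """Backward scan per block; local-only vars count as unread at block end."""
    local_only = local_only_vars(blocks)
    result = {}
    for name, body in blocks.items():
        unread = set(local_only)
        kept = []
        for dest, op, srcs in reversed(body):
            if dest is not None and dest in unread:
                continue  # nobody reads this definition
            if dest is not None:
                unread.add(dest)
            unread.difference_update(srcs)
            kept.append((dest, op, srcs))
        result[name] = kept[::-1]
    return result


# Hypothetical method: 't' is a temporary that never crosses a block boundary.
blocks = {
    "BB0": [("t", "ldc 1", []), ("x", "mov", ["t"]), ("t", "ldc 5", [])],
    "BB1": [("t", "ldc 2", []), ("y", "add", ["x", "t"]), (None, "ret", ["y"])],
}
cleaned = clear_unused_defines(blocks)
```

In this example the trailing `t = ldc 5` in BB0 is cleared, because no other bblock can observe it, while `x` is left alone since BB1 reads it.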

[mono][interp] Propagate target branches

7b41739

If a bblock contains only an unconditional br, then all bblocks branching into it can branch to the target directly instead.
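The retargeting can be sketched in Python as follows. The block and instruction encoding is made up for this example, and the guard against cycles of empty branches is an assumption about what a robust version would need.

```python
def propagate_targets(blocks):
    """Retarget branches that point at a block holding only an unconditional br.

    blocks: name -> list of (opcode, target_or_None).
    """
    def final_target(name):
        seen = set()
        while name not in seen:
            seen.add(name)
            body = blocks.get(name, [])
            if len(body) == 1 and body[0][0] == "br":
                name = body[0][1]  # skip over the forwarding block
            else:
                break
        return name

    return {
        name: [(op, final_target(tgt) if tgt is not None else None)
               for op, tgt in body]
        for name, body in blocks.items()
    }


blocks = {
    "BB0": [("brtrue", "BB1"), ("br", "BB2")],
    "BB1": [("br", "BB3")],                  # only forwards to BB3
    "BB2": [("nop", None), ("br", "BB1")],
    "BB3": [("ret", None)],
}
retargeted = propagate_targets(blocks)
```

After this step nothing branches into BB1 any more, so a dead bblock pass can remove it.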

[mono][interp] Add super instruction for (var + ct1) * ct2

3558b6d

This pattern is used in low-level unsafe code when using (var + ct1) as an index into an array, where ct2 is the size of an array element. Also fix the display of two shorts when dumping instructions.
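A small Python sketch of what such a fusion could look like as a peephole pass. The opcode names `add_imm`, `mul_imm` and `add_mul_imm` are hypothetical stand-ins, not the interpreter's actual opcodes, and the caller is assumed to know which temporaries are read only once.

```python
def fuse_add_mul(code, single_use):
    """Fuse 'tmp = src + ct1; dst = tmp * ct2' into one instruction.

    code: list of (opcode, dst, src, imm...) tuples.
    single_use: vars known to be read only by the instruction that follows.
    """
    out = []
    i = 0
    while i < len(code):
        ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if (nxt is not None and ins[0] == "add_imm" and nxt[0] == "mul_imm"
                and nxt[2] == ins[1] and ins[1] in single_use):
            out.append(("add_mul_imm", nxt[1], ins[2], ins[3], nxt[3]))
            i += 2
        else:
            out.append(ins)
            i += 1
    return out


def run(code, env):
    """Tiny evaluator used only to check that fusion preserves the result."""
    env = dict(env)
    for ins in code:
        op, dst, src = ins[0], ins[1], ins[2]
        if op == "add_imm":
            env[dst] = env[src] + ins[3]
        elif op == "mul_imm":
            env[dst] = env[src] * ins[3]
        elif op == "add_mul_imm":
            env[dst] = (env[src] + ins[3]) * ins[4]
        else:
            raise ValueError(f"unknown opcode {op}")
    return env


# Byte offset of element (i + 3) in an array of 4-byte elements.
code = [("add_imm", "t", "i", 3), ("mul_imm", "off", "t", 4)]
fused = fuse_add_mul(code, single_use={"t"})
```

The fused form dispatches one instruction instead of two and no longer needs the temporary.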

[mono][interp] Add new ldind super instruction

32b8723

These new instructions can apply addition and multiplication by a constant to the offset var.
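To make the effect concrete, here is a hedged Python model of a plain indirect load next to one that folds the offset arithmetic into the load itself. The function names and the little-endian 4-byte layout are assumptions chosen for this example only.

```python
import struct


def ldind_i4(mem, base, offset):
    """Plain indirect load: the byte offset was computed by earlier instructions."""
    return struct.unpack_from("<i", mem, base + offset)[0]


def ldind_offset_add_mul_i4(mem, base, var, ct1, ct2):
    """Hypothetical fused load: computes (var + ct1) * ct2 as part of the load."""
    return struct.unpack_from("<i", mem, base + (var + ct1) * ct2)[0]


# Five 4-byte integers laid out contiguously, standing in for an int array.
mem = struct.pack("<5i", 10, 20, 30, 40, 50)
i = 1
separate = ldind_i4(mem, 0, (i + 2) * 4)
fused = ldind_offset_add_mul_i4(mem, 0, i, 2, 4)
```

Both calls read element `i + 2`; the fused variant just avoids materializing the intermediate offset in a separate var.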

BrzVlad force-pushed the fix-interp-index-of-regression2 branch from 374442d to 32b8723 Compare December 12, 2022 17:27

build-analysis bot mentioned this pull request Dec 13, 2022

[wasm] Library tests failing during linking for AOT - SIGKILL #79569

Closed

vargaz approved these changes Dec 16, 2022

BrzVlad merged commit 200a90a into dotnet:main Dec 18, 2022

MihaZupan mentioned this pull request Dec 19, 2022

Remove Mono SpanHelpers workaround #79821

Merged

This was referenced Dec 20, 2022

[Perf] Linux/x64: 271 Improvements on 12/18/2022 12:43:31 PM dotnet/perf-autofiling-issues#10989

Closed

[mono][interpreter] Fix performance of new span helpers #76326

Closed

radical mentioned this pull request Jan 4, 2023

[wasm] Regression in System.Text.Json tests with AOT #80179

Closed

BrzVlad mentioned this pull request Jan 13, 2023

Mono interpreter performance optimizations #47520

Open

24 tasks

ghost locked as resolved and limited conversation to collaborators Jan 21, 2023
