Add IndexOfAnyValues.Contains #78996

MihaZupan · 2022-11-29T21:34:04Z

I also moved all the IndexOfAnyValues-specific tests into their own file (probably should have done that from the get-go).

Benchmarks are of the form of

public int SomeContains()
{
    int sum = 0;
    foreach (char c in VeryLongInput)
    {
        if (SomeContainsCheck(c)) sum++;
    }
    return sum;
}

For 6 values when compared to a string:

Method	Length	Mean	Error
Contains	100000	76.31 us	0.227 us
ContainsString	100000	371.73 us	1.701 us

For the values "ab" which go through the range contains for IndexOfAnyValues:

Method	Length	Mean	Error
Contains	100000	58.15 us	1.138 us
ContainsString	100000	359.89 us	3.287 us
ContainsCharIsBetween	100000	51.15 us	1.020 us

dotnet-issue-labeler · 2022-11-29T21:34:11Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

ghost · 2022-11-29T21:34:11Z

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

Implements #78722

I also moved all the IndexOfAnyValues-specific tests into their own file (probably should have done that from the get-go).

Benchmarks are of the form of

public int SomeContains()
{
    int sum = 0;
    foreach (char c in VeryLongInput)
    {
        if (SomeContainsCheck(c)) sum++;
    }
    return sum;
}

For 6 values when compared to a string:

Method	Length	Mean	Error
Contains	100000	76.31 us	0.227 us
ContainsString	100000	371.73 us	1.701 us

For the values "ab" which go through the range contains for IndexOfAnyValues:

Method	Length	Mean	Error
Contains	100000	58.15 us	1.138 us
ContainsString	100000	359.89 us	3.287 us
ContainsCharIsBetween	100000	51.15 us	1.020 us

Author:	MihaZupan
Assignees:	-
Labels:	`area-System.Memory`
Milestone:	8.0.0

stephentoub · 2022-11-29T22:46:27Z

src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/IndexOfAny2Values.cs

@@ -19,6 +20,10 @@ public IndexOfAny2Values(ReadOnlySpan<T> values)

        internal override T[] GetValues() => new[] { _e0, _e1 };

+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        internal override bool ContainsCore(T value) =>
+            value == _e0 || value == _e1;


Just curious what the perf looks like if this is made branchless with an | instead of an ||. I don't know if that's a good tradeoff or not given typical usage, e.g. how likely it is the first check will succeed or fail.

Method Length Mean Error Ratio

2 || 100000 87.79 us 0.290 us 1.00

2 | 100000 87.81 us 0.369 us 1.00

3 || 100000 110.8 us 0.66 us 1.00

3 | 100000 109.7 us 0.48 us 0.99

4 || 100000 109.9 us 0.52 us 1.00

4 | 100000 181.2 us 0.53 us 1.65

Codegen for (| is on the right)

2 values: https://www.diffchecker.com/3ytgU658

4 values: https://www.diffchecker.com/LfsR8ge8

These are all for cases where the condition is never hit

We could also experiment with the sort of optimizations like your ([x, x + 32)) in the future here

These are all for cases where the condition is never hit

Yeah, it'll depend on the data and how likely the branch predictor is to get it right. If the condition is never hit, the branch predictor is basically always going to be right, both clauses will always execute, and they should effectively be identical, which is what your data shows (at least for 2 and 3... I'm surprised 4 falls off a cliff).

If I instead change the data to alternate between matching the first char and not matching anything, the data looks different, at least on my machine:

using BenchmarkDotNet.Attributes; using BenchmarkDotNet.Running; [MemoryDiagnoser] public partial class Program { static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args); private char _c1, _c2; private char[] _values = new char[100000]; [GlobalSetup] public void Setup() { for (int i = 0; i < _values.Length; i++) { _values[i] = (char)(i % 2); } _c1 = (char)0; _c2 = 'c'; } [Benchmark(Baseline = true)] public int Count_Logical() { int count = 0; foreach (char c in _values) { if ((_c1 == c) || (_c2 == c)) count++; } return count; } [Benchmark] public int Count_Bitwise() { int count = 0; foreach (char c in _values) { if ((_c1 == c) | (_c2 == c)) count++; } return count; } }

Method Mean Error StdDev Ratio

Count_Logical 118.12 us 0.337 us 0.315 us 1.00

Count_Bitwise 77.48 us 0.423 us 0.353 us 0.66

Interesting... do you think it's worth changing to | (at least for 2 & 3) in that case?

Let me rerun these with randomized inputs...

I'm fine with either.

Meh, this can go either way. I'll leave it as-is for now.

Where main is || and pr is |:

Method Toolchain Length MatchChance Mean Error Ratio

Contains_FirstMatch main 100000 30 291.3 us 1.46 us 1.00

Contains_FirstMatch pr 100000 30 271.0 us 1.82 us 0.93

Contains_SecondMatch main 100000 30 272.6 us 1.19 us 1.00

Contains_SecondMatch pr 100000 30 271.3 us 0.86 us 1.00

Contains_FirstMatch main 100000 40 339.2 us 1.13 us 1.00

Contains_FirstMatch pr 100000 40 336.2 us 3.16 us 0.99

Contains_SecondMatch main 100000 40 335.4 us 1.05 us 1.00

Contains_SecondMatch pr 100000 40 334.2 us 1.33 us 1.00

Contains_FirstMatch main 100000 50 348.5 us 1.25 us 1.00

Contains_FirstMatch pr 100000 50 366.6 us 1.57 us 1.05

Contains_SecondMatch main 100000 50 371.9 us 1.40 us 1.00

Contains_SecondMatch pr 100000 50 365.8 us 2.05 us 0.98

Contains_FirstMatch main 100000 60 319.1 us 0.61 us 1.00

Contains_FirstMatch pr 100000 60 349.2 us 6.46 us 1.10

Contains_SecondMatch main 100000 60 341.4 us 1.89 us 1.00

Contains_SecondMatch pr 100000 60 338.9 us 4.85 us 0.99

Contains_FirstMatch main 100000 70 280.0 us 5.47 us 1.00

Contains_FirstMatch pr 100000 70 283.6 us 5.27 us 1.01

Contains_SecondMatch main 100000 70 282.4 us 5.39 us 1.00

Contains_SecondMatch pr 100000 70 281.2 us 5.57 us 1.00

src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/IndexOfAny4Values.cs

stephentoub

I didn't review the tests and assume they only moved and didn't change at all. Otherwise, LGTM.

MihaZupan · 2022-11-29T23:03:12Z

I didn't review the tests and assume they only moved and didn't change at all.

Except for the new contains-specific ones, they're unchanged.

stephentoub · 2022-11-30T01:31:20Z

Except for the new contains-specific ones, they're unchanged.

New test LGTM

Add IndexOfAnyValues.Contains

4ac5920

MihaZupan added the area-System.Memory label Nov 29, 2022

MihaZupan added this to the 8.0.0 milestone Nov 29, 2022

MihaZupan requested a review from stephentoub November 29, 2022 21:34

dotnet-issue-labeler bot added the new-api-needs-documentation label Nov 29, 2022

ghost assigned MihaZupan Nov 29, 2022

stephentoub reviewed Nov 29, 2022

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/IndexOfAny4Values.cs Outdated Show resolved Hide resolved

stephentoub approved these changes Nov 29, 2022

View reviewed changes

Unsafe => unsafe

013fa36

MihaZupan merged commit b85e152 into dotnet:main Nov 30, 2022

build-analysis bot mentioned this pull request Nov 30, 2022

CI build failure: Build MacCatalyst x64 Release AllSubsets_Mono - XcodeBuildApp task failed unexpectedly - Could not find System.Runtime.Tests.app #78778

Closed

ghost locked as resolved and limited conversation to collaborators Dec 30, 2022

jeffhandley added the blog-candidate Completed PRs that are candidate topics for blog post coverage label Mar 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add IndexOfAnyValues.Contains #78996

Add IndexOfAnyValues.Contains #78996

MihaZupan commented Nov 29, 2022 •

edited

Loading

dotnet-issue-labeler bot commented Nov 29, 2022

ghost commented Nov 29, 2022

stephentoub Nov 29, 2022

MihaZupan Nov 29, 2022

MihaZupan Nov 29, 2022

MihaZupan Nov 30, 2022

stephentoub Nov 30, 2022

MihaZupan Nov 30, 2022 •

edited

Loading

stephentoub Nov 30, 2022

MihaZupan Nov 30, 2022

stephentoub left a comment

MihaZupan commented Nov 29, 2022 •

edited

Loading

stephentoub commented Nov 30, 2022

Method	Length	Mean	Error	Ratio
2 \|\|	100000	87.79 us	0.290 us	1.00
2 \|	100000	87.81 us	0.369 us	1.00

3 \|\|	100000	110.8 us	0.66 us	1.00
3 \|	100000	109.7 us	0.48 us	0.99

4 \|\|	100000	109.9 us	0.52 us	1.00
4 \|	100000	181.2 us	0.53 us	1.65

Method	Mean	Error	StdDev	Ratio
Count_Logical	118.12 us	0.337 us	0.315 us	1.00
Count_Bitwise	77.48 us	0.423 us	0.353 us	0.66

Method	Toolchain	Length	MatchChance	Mean	Error	Ratio
Contains_FirstMatch	main	100000	30	291.3 us	1.46 us	1.00
Contains_FirstMatch	pr	100000	30	271.0 us	1.82 us	0.93

Contains_SecondMatch	main	100000	30	272.6 us	1.19 us	1.00
Contains_SecondMatch	pr	100000	30	271.3 us	0.86 us	1.00

Contains_FirstMatch	main	100000	40	339.2 us	1.13 us	1.00
Contains_FirstMatch	pr	100000	40	336.2 us	3.16 us	0.99

Contains_SecondMatch	main	100000	40	335.4 us	1.05 us	1.00
Contains_SecondMatch	pr	100000	40	334.2 us	1.33 us	1.00

Contains_FirstMatch	main	100000	50	348.5 us	1.25 us	1.00
Contains_FirstMatch	pr	100000	50	366.6 us	1.57 us	1.05

Contains_SecondMatch	main	100000	50	371.9 us	1.40 us	1.00
Contains_SecondMatch	pr	100000	50	365.8 us	2.05 us	0.98

Contains_FirstMatch	main	100000	60	319.1 us	0.61 us	1.00
Contains_FirstMatch	pr	100000	60	349.2 us	6.46 us	1.10

Contains_SecondMatch	main	100000	60	341.4 us	1.89 us	1.00
Contains_SecondMatch	pr	100000	60	338.9 us	4.85 us	0.99

Contains_FirstMatch	main	100000	70	280.0 us	5.47 us	1.00
Contains_FirstMatch	pr	100000	70	283.6 us	5.27 us	1.01

Contains_SecondMatch	main	100000	70	282.4 us	5.39 us	1.00
Contains_SecondMatch	pr	100000	70	281.2 us	5.57 us	1.00

Add IndexOfAnyValues.Contains #78996

Add IndexOfAnyValues.Contains #78996

Conversation

MihaZupan commented Nov 29, 2022 • edited Loading

dotnet-issue-labeler bot commented Nov 29, 2022

ghost commented Nov 29, 2022

stephentoub Nov 29, 2022

Choose a reason for hiding this comment

MihaZupan Nov 29, 2022

Choose a reason for hiding this comment

MihaZupan Nov 29, 2022

Choose a reason for hiding this comment

MihaZupan Nov 30, 2022

Choose a reason for hiding this comment

stephentoub Nov 30, 2022

Choose a reason for hiding this comment

MihaZupan Nov 30, 2022 • edited Loading

Choose a reason for hiding this comment

stephentoub Nov 30, 2022

Choose a reason for hiding this comment

MihaZupan Nov 30, 2022

Choose a reason for hiding this comment

stephentoub left a comment

Choose a reason for hiding this comment

MihaZupan commented Nov 29, 2022 • edited Loading

stephentoub commented Nov 30, 2022

MihaZupan commented Nov 29, 2022 •

edited

Loading

MihaZupan Nov 30, 2022 •

edited

Loading

MihaZupan commented Nov 29, 2022 •

edited

Loading