Add IndexOfAnyValues.Contains #78996

Merged (2 commits) on Nov 30, 2022

Member

@MihaZupan MihaZupan commented Nov 29, 2022

Closes #78722

I also moved all the IndexOfAnyValues-specific tests into their own file (probably should have done that from the get-go).

Benchmarks are of the form:

public int SomeContains()
{
    int sum = 0;
    foreach (char c in VeryLongInput)
    {
        if (SomeContainsCheck(c)) sum++;
    }
    return sum;
}

For 6 values, compared with the equivalent check against a string:

Method          Length  Mean       Error
Contains        100000  76.31 us   0.227 us
ContainsString  100000  371.73 us  1.701 us

For the values "ab", which take the range-based contains path in IndexOfAnyValues:

Method                 Length  Mean       Error
Contains               100000  58.15 us   1.138 us
ContainsString         100000  359.89 us  3.287 us
ContainsCharIsBetween  100000  51.15 us   1.020 us
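
For readers who want to reproduce the shape of these comparisons, here is a minimal sketch of the three kinds of check. It is an illustration, not the benchmark code used above. It uses `SearchValues`, the name under which `IndexOfAnyValues` shipped in .NET 8, and it assumes that ContainsString means `string.Contains(char)` and that ContainsCharIsBetween means `char.IsBetween`; the class and method names are hypothetical.

```csharp
using System;
using System.Buffers;

public static class ContainsSketch
{
    // Assumption: SearchValues<char> stands in for the IndexOfAnyValues<char> of this PR.
    private static readonly SearchValues<char> s_sixValues = SearchValues.Create("abcdef");
    private const string SixValuesString = "abcdef";

    // Counts input chars found in the precomputed value set.
    public static int CountWithSearchValues(ReadOnlySpan<char> input)
    {
        int sum = 0;
        foreach (char c in input)
        {
            if (s_sixValues.Contains(c)) sum++;
        }
        return sum;
    }

    // Counts input chars found by searching the string of values each time.
    public static int CountWithString(ReadOnlySpan<char> input)
    {
        int sum = 0;
        foreach (char c in input)
        {
            if (SixValuesString.Contains(c)) sum++;
        }
        return sum;
    }

    // Counts input chars in the inclusive range 'a'..'b' (the "ab" case).
    public static int CountWithIsBetween(ReadOnlySpan<char> input)
    {
        int sum = 0;
        foreach (char c in input)
        {
            if (char.IsBetween(c, 'a', 'b')) sum++;
        }
        return sum;
    }
}
```

Calling `ContainsSketch.CountWithSearchValues("abcxyz")` and `ContainsSketch.CountWithString("abcxyz")` should give the same count, since both test membership in the same six values.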

@dotnet-issue-labeler

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost

ghost commented Nov 29, 2022

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

Implements #78722

Author: MihaZupan
Assignees: -
Labels: area-System.Memory

Milestone: 8.0.0

@ghost ghost assigned MihaZupan Nov 29, 2022
@@ -19,6 +20,10 @@ public IndexOfAny2Values(ReadOnlySpan<T> values)

internal override T[] GetValues() => new[] { _e0, _e1 };

[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal override bool ContainsCore(T value) =>
    value == _e0 || value == _e1;
Member

Just curious what the perf looks like if this is made branchless with an | instead of an ||. I don't know if that's a good tradeoff or not given typical usage, e.g. how likely it is the first check will succeed or fail.
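
As a small illustration of the two forms being compared (a sketch with hypothetical names, not the code in this PR): for side-effect-free comparisons both return the same result, and the only difference is whether the second comparison may be skipped.

```csharp
public static class OrSketch
{
    // Short-circuit form: the second comparison is evaluated only when the first is false.
    public static bool ContainsLogical(char value, char e0, char e1) =>
        value == e0 || value == e1;

    // Non-short-circuit form: both comparisons are always evaluated, then OR-ed together.
    public static bool ContainsBitwise(char value, char e0, char e1) =>
        (value == e0) | (value == e1);
}
```

Whether the second form actually compiles to branch-free machine code depends on the JIT, so it needs to be measured rather than assumed.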

Member Author

Method  Length  Mean      Error     Ratio
2 ||    100000  87.79 us  0.290 us  1.00
2 |     100000  87.81 us  0.369 us  1.00
3 ||    100000  110.8 us  0.66 us   1.00
3 |     100000  109.7 us  0.48 us   0.99
4 ||    100000  109.9 us  0.52 us   1.00
4 |     100000  181.2 us  0.53 us   1.65

Codegen for (| is on the right): [image not preserved]

Member Author

These are all for cases where the condition is never hit.

Member Author

We could also experiment here in the future with optimizations like your [x, x + 32) one.
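
The thread does not spell out what the [x, x + 32) optimization is. One possible reading, offered only as an assumption, is a bitmask test for value sets that all fit inside a 32-wide window starting at the lowest value x. The sketch below shows that idea; `WindowMaskSketch`, `BuildMask`, and `Contains` are hypothetical names, not runtime APIs.

```csharp
using System;

public static class WindowMaskSketch
{
    // Builds a mask with bit (v - lowest) set for each value v.
    // Assumes every value lies in [lowest, lowest + 32); throws otherwise.
    public static uint BuildMask(ReadOnlySpan<char> values, char lowest)
    {
        uint mask = 0;
        foreach (char v in values)
        {
            int offset = v - lowest;
            if ((uint)offset >= 32)
            {
                throw new ArgumentException("Value outside the 32-wide window.", nameof(values));
            }
            mask |= 1u << offset;
        }
        return mask;
    }

    // One subtraction, one range check, and one bit test, regardless of how many values are in the set.
    public static bool Contains(char c, char lowest, uint mask)
    {
        // Chars below 'lowest' wrap to a large unsigned offset and fail the range check.
        uint offset = (uint)(c - lowest);
        return offset < 32 && ((mask >> (int)offset) & 1) != 0;
    }
}
```

For example, the vowels "aeiou" span 'a' to 'u', which fits in the window, so a mask built with `BuildMask("aeiou", 'a')` can answer `Contains` for any char with the same amount of work as for a single value.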

Member

> These are all for cases where the condition is never hit.

Yeah, it'll depend on the data and how likely the branch predictor is to get it right. If the condition is never hit, the branch predictor is basically always going to be right, both clauses will always execute, and they should effectively be identical, which is what your data shows (at least for 2 and 3... I'm surprised 4 falls off a cliff).

If I instead change the data to alternate between matching the first char and not matching anything, the data looks different, at least on my machine:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public partial class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    private char _c1, _c2;

    private char[] _values = new char[100000];

    [GlobalSetup]
    public void Setup()
    {
        for (int i = 0; i < _values.Length; i++)
        {
            _values[i] = (char)(i % 2);
        }
        _c1 = (char)0;
        _c2 = 'c';
    }

    [Benchmark(Baseline = true)]
    public int Count_Logical()
    {
        int count = 0;
        foreach (char c in _values)
        {
            if ((_c1 == c) || (_c2 == c)) count++;
        }
        return count;
    }

    [Benchmark]
    public int Count_Bitwise()
    {
        int count = 0;
        foreach (char c in _values)
        {
            if ((_c1 == c) | (_c2 == c)) count++;
        }
        return count;
    }
}

Method         Mean       Error     StdDev    Ratio
Count_Logical  118.12 us  0.337 us  0.315 us  1.00
Count_Bitwise  77.48 us   0.423 us  0.353 us  0.66

Member Author

@MihaZupan MihaZupan Nov 30, 2022

Interesting... do you think it's worth changing to | (at least for 2 & 3) in that case?

Let me rerun these with randomized inputs...

Member

I'm fine with either.

Member Author

Meh, this can go either way. I'll leave it as-is for now.

Where main is || and pr is |:

Method                Toolchain  Length  MatchChance  Mean      Error    Ratio
Contains_FirstMatch   main       100000  30           291.3 us  1.46 us  1.00
Contains_FirstMatch   pr         100000  30           271.0 us  1.82 us  0.93
Contains_SecondMatch  main       100000  30           272.6 us  1.19 us  1.00
Contains_SecondMatch  pr         100000  30           271.3 us  0.86 us  1.00
Contains_FirstMatch   main       100000  40           339.2 us  1.13 us  1.00
Contains_FirstMatch   pr         100000  40           336.2 us  3.16 us  0.99
Contains_SecondMatch  main       100000  40           335.4 us  1.05 us  1.00
Contains_SecondMatch  pr         100000  40           334.2 us  1.33 us  1.00
Contains_FirstMatch   main       100000  50           348.5 us  1.25 us  1.00
Contains_FirstMatch   pr         100000  50           366.6 us  1.57 us  1.05
Contains_SecondMatch  main       100000  50           371.9 us  1.40 us  1.00
Contains_SecondMatch  pr         100000  50           365.8 us  2.05 us  0.98
Contains_FirstMatch   main       100000  60           319.1 us  0.61 us  1.00
Contains_FirstMatch   pr         100000  60           349.2 us  6.46 us  1.10
Contains_SecondMatch  main       100000  60           341.4 us  1.89 us  1.00
Contains_SecondMatch  pr         100000  60           338.9 us  4.85 us  0.99
Contains_FirstMatch   main       100000  70           280.0 us  5.47 us  1.00
Contains_FirstMatch   pr         100000  70           283.6 us  5.27 us  1.01
Contains_SecondMatch  main       100000  70           282.4 us  5.39 us  1.00
Contains_SecondMatch  pr         100000  70           281.2 us  5.57 us  1.00

Member

@stephentoub stephentoub left a comment

I didn't review the tests and assume they only moved and didn't change at all. Otherwise, LGTM.

@MihaZupan
Member Author

MihaZupan commented Nov 29, 2022

> I didn't review the tests and assume they only moved and didn't change at all.

Except for the new contains-specific ones, they're unchanged.

@stephentoub
Member

> Except for the new contains-specific ones, they're unchanged.

New test LGTM

@MihaZupan MihaZupan merged commit b85e152 into dotnet:main Nov 30, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Dec 30, 2022
@jeffhandley jeffhandley added the blog-candidate Completed PRs that are candidate topics for blog post coverage label Mar 14, 2023
Labels
area-System.Memory blog-candidate Completed PRs that are candidate topics for blog post coverage new-api-needs-documentation
Development

Successfully merging this pull request may close these issues.

[API Proposal]: IndexOfAnyValues<T>.Contains(T)
3 participants