Specialize Contains for Iterators in LINQ #112684

Merged
merged 4 commits into from
Feb 20, 2025

stephentoub
Member

It appears that Contains ends up being reasonably common after a series of LINQ operations, whether explicitly at the call site or because a method that returns an enumerable uses LINQ internally and then the call site does Contains.

We can optimize Contains for a number of operators, just as we can for First/Last. In some cases we can skip the operator completely: Contains on a Shuffle or OrderBy is no different from Contains on the underlying source. In other cases we can optimize by processing the source directly: a Contains on a Concat can do a Contains on each source, which in turn can pick up vectorized implementations if the individual sources support them. Some of the operators already provided Contains implementations as part of implementing IList, and this change just makes those implementations accessible. In the remaining cases, new overrides of a new virtual Contains on Iterator are added.
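The shape of the change described above can be sketched roughly as follows. This is only an illustration, not the code in the PR: `IteratorSketch<T>`, `ConcatSketch<T>`, and `OrderBySketch<T, TKey>` are hypothetical stand-ins for LINQ's internal iterator types. The base class has a virtual `Contains` that enumerates, and operators override it when they can answer from the source directly.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

int[] first = { 1, 2, 3 };
int[] second = { 4, 5, 6 };

Console.WriteLine(new ConcatSketch<int>(first, second).Contains(5));        // True
Console.WriteLine(new OrderBySketch<int, int>(first, x => -x).Contains(2)); // True
Console.WriteLine(new OrderBySketch<int, int>(first, x => -x).Contains(9)); // False

// Hypothetical stand-in for LINQ's internal Iterator<T> base class.
abstract class IteratorSketch<T> : IEnumerable<T>
{
    public abstract IEnumerator<T> GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Default behavior: enumerate the operator's output and compare each element.
    public virtual bool Contains(T value)
    {
        foreach (T item in this)
        {
            if (EqualityComparer<T>.Default.Equals(item, value)) return true;
        }
        return false;
    }
}

// Concat: ask each source directly, so an array or list source can use its own Contains.
sealed class ConcatSketch<T> : IteratorSketch<T>
{
    private readonly IEnumerable<T> _first, _second;

    public ConcatSketch(IEnumerable<T> first, IEnumerable<T> second)
    {
        _first = first;
        _second = second;
    }

    public override IEnumerator<T> GetEnumerator()
    {
        foreach (T item in _first) yield return item;
        foreach (T item in _second) yield return item;
    }

    public override bool Contains(T value) => _first.Contains(value) || _second.Contains(value);
}

// OrderBy: ordering never changes membership, so the sort can be skipped entirely.
sealed class OrderBySketch<T, TKey> : IteratorSketch<T>
{
    private readonly IEnumerable<T> _source;
    private readonly Func<T, TKey> _keySelector;

    public OrderBySketch(IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        _source = source;
        _keySelector = keySelector;
    }

    public override IEnumerator<T> GetEnumerator() => _source.OrderBy(_keySelector).GetEnumerator();

    public override bool Contains(T value) => _source.Contains(value);
}
```

In this sketch the instance `Contains` is chosen over the `Enumerable.Contains` extension method at the call site, which is how the override gets picked up without any change to the caller.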

| Method | Toolchain | Mean |
| --- | --- | ---: |
| AppendContains | \main\corerun.exe | 3,292.31 ns |
| AppendContains | \pr\corerun.exe | 102.05 ns |
| ConcatContains | \main\corerun.exe | 2,699.63 ns |
| ConcatContains | \pr\corerun.exe | 104.02 ns |
| DefaultIfEmptyContains | \main\corerun.exe | 90.02 ns |
| DefaultIfEmptyContains | \pr\corerun.exe | 69.29 ns |
| OrderByContains | \main\corerun.exe | 18,030.09 ns |
| OrderByContains | \pr\corerun.exe | 104.50 ns |
| ReverseContains | \main\corerun.exe | 494.79 ns |
| ReverseContains | \pr\corerun.exe | 99.41 ns |
| WhereSelectContains | \main\corerun.exe | 2,022.71 ns |
| WhereSelectContains | \pr\corerun.exe | 290.14 ns |
| ShuffleTakeContains | \main\corerun.exe | 4,439.82 ns |
| ShuffleTakeContains | \pr\corerun.exe | 129.11 ns |
| UnionContains | \main\corerun.exe | 17,385.06 ns |
| UnionContains | \pr\corerun.exe | 103.67 ns |

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

[MemoryDiagnoser(false)]
public class Tests
{
    private int[] _source = Enumerable.Range(0, 1000).ToArray();

    [Benchmark]
    public bool AppendContains() => _source.Append(100).Contains(999);

    [Benchmark]
    public bool ConcatContains() => _source.Concat(_source).Contains(999);

    [Benchmark]
    public bool DefaultIfEmptyContains() => _source.DefaultIfEmpty(42).Contains(999);

    [Benchmark]
    public bool OrderByContains() => _source.OrderBy(x => x).Contains(999);

    [Benchmark]
    public bool ReverseContains() => _source.Reverse().Contains(999);

    [Benchmark]
    public bool WhereSelectContains() => _source.Where(x => true).Select(x => x).Contains(999);

    [Benchmark]
    public bool ShuffleTakeContains() => _source.Shuffle().Take(5).Contains(999);

    [Benchmark]
    public bool UnionContains() => _source.Union(_source).Contains(999);
}

Copilot bot review requested due to automatic review settings, February 19, 2025 05:25
Contributor

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

PR Overview

This PR specializes and optimizes the Contains implementations for various LINQ iterators and operators (such as Append, Concat, Shuffle, DefaultIfEmpty, OfType, etc.) to improve performance by avoiding unnecessary enumeration and applying direct algorithms.

  • Optimizes Contains in iterators and speed-optimized LINQ operators.
  • Introduces specialized and overridden Contains methods in multiple files.
  • Enhances test coverage with new test cases to verify operator behavior.

Changes

| File | Description |
| --- | --- |
| src/libraries/System.Linq/tests/ContainsTests.cs | Adds tests covering Contains via various LINQ operations. |
| src/libraries/System.Linq/src/System/Linq/Shuffle.SpeedOpt.cs | Specializes and overrides Contains for Shuffle iterators with hypergeometric probability sampling. |
| src/libraries/System.Linq/src/System/Linq/DefaultIfEmpty.SpeedOpt.cs | Implements optimized Contains for DefaultIfEmpty. |
| src/libraries/System.Linq/src/System/Linq/Concat.SpeedOpt.cs | Adds Contains override for concatenated sequences. |
| src/libraries/System.Linq/src/System/Linq/OfType.SpeedOpt.cs | Adds Contains implementation for OfType iterators. |
| src/libraries/System.Linq/src/System/Linq/Where.SpeedOpt.cs | Specializes Contains for Where and WhereSelect iterators. |
| src/libraries/System.Linq/src/System/Linq/AppendPrepend.SpeedOpt.cs | Implements Contains for Append/Prepend operators. |
| src/libraries/System.Linq/src/System/Linq/Union.SpeedOpt.cs | Provides a Contains override that iterates over unioned sequences. |
| src/libraries/System.Linq/src/System/Linq/Select.SpeedOpt.cs | Introduces Contains methods for various Select iterators. |
| Other files | Update Contains in Reverse, Cast, SelectMany, Contains, Distinct, Iterator and SkipTake iterators for optimization. |

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.

Member

lewing commented Feb 19, 2025

/azp run runtime

Azure Pipelines successfully started running 1 pipeline(s).

stephentoub merged commit 59414b5 into dotnet:main Feb 20, 2025
83 of 86 checks passed
stephentoub deleted the iteratorconcat branch February 20, 2025 15:14
@@ -16,6 +16,8 @@ private sealed partial class DistinctIterator<TSource>
public override int GetCount(bool onlyIfCheap) => onlyIfCheap ? -1 : new HashSet<TSource>(_source, _comparer).Count;

public override TSource? TryGetFirst(out bool found) => _source.TryGetFirst(out found);

public override bool Contains(TSource value) => _source.Contains(value);
Contributor

DistinctIterator may have a custom comparer.

Member Author

Yup, thanks. I'd convinced myself it doesn't matter, but it obviously does.
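A small example of why the comparer matters here, using only public LINQ APIs (the input values are made up for illustration). With a case-insensitive comparer, Distinct keeps only the first element of each equivalence class, so an element can be present in the source and absent from the distinct sequence. Forwarding Contains to `_source` would then give the wrong answer.

```csharp
using System;
using System.Linq;

string[] source = { "a", "A" };

// Distinct with a case-insensitive comparer keeps only the first of "a"/"A".
var distinct = source.Distinct(StringComparer.OrdinalIgnoreCase);

Console.WriteLine(string.Join(",", distinct)); // a
Console.WriteLine(distinct.Contains("A"));     // False: "A" was dropped as a duplicate of "a"
Console.WriteLine(source.Contains("A"));       // True: the source still has it
```

When Distinct uses the default comparer the two answers always agree, which is presumably the case where forwarding to the source remains safe.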
