Specialize Contains for Iterators in LINQ #112684
It appears that Contains is reasonably common at the end of a chain of LINQ operations, whether it's called explicitly at the call site or because a method that returns an enumerable uses LINQ internally and the caller then calls Contains on the result.

We can optimize Contains for a number of operators, just as we already do for First/Last. In some cases we can skip the operator entirely: Contains on a Shuffle or OrderBy is no different from Contains on the underlying source. In other cases we can optimize by processing the source directly: Contains on a Concat can call Contains on each source, which in turn can pick up vectorized implementations if those individual sources support them. Some of the operators already provided Contains implementations as part of implementing IList, and this change just makes those implementations accessible. In other cases, the operators gain overrides of a new virtual Contains method on Iterator.
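The two cases above can be illustrated with a short C# sketch. `ConcatContains` is a hypothetical helper, not the runtime's ConcatIterator; it only shows the idea of asking each input for Contains directly, so an input that is a collection (such as an array) can answer with its own Contains implementation. The OrderBy check relies only on the fact that ordering never adds or removes elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not the runtime's ConcatIterator): asks each input
// directly instead of enumerating the concatenated sequence element by element.
static bool ConcatContains<T>(IEnumerable<T> first, IEnumerable<T> second, T value) =>
    first.Contains(value) || second.Contains(value);

int[] left = { 5, 3, 8 };
int[] right = { 1, 9 };

// Ordering never adds or removes elements, so membership is unchanged.
bool viaOrderBy = left.OrderBy(x => x).Contains(8);
bool viaSource = left.Contains(8);

// Concat membership is the union of the inputs' memberships.
bool viaHelper = ConcatContains(left, right, 9);
bool viaConcat = left.Concat(right).Contains(9);

Console.WriteLine($"{viaOrderBy} {viaSource} {viaHelper} {viaConcat}"); // True True True True
```

Short-circuiting with `||` means the second input is never consulted when the first already contains the value.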
Tagging subscribers to this area: @dotnet/area-system-linq
PR Overview
This PR specializes and optimizes the Contains implementations for various LINQ iterators and operators (such as Append, Concat, Shuffle, DefaultIfEmpty, OfType, etc.) to improve performance by avoiding unnecessary enumeration and applying direct algorithms.
- Optimizes Contains in iterators and speed-optimized LINQ operators.
- Introduces specialized and overridden Contains methods in multiple files.
- Enhances test coverage with new test cases to verify operator behavior.
Changes
File | Description |
---|---|
src/libraries/System.Linq/tests/ContainsTests.cs | Adds tests covering Contains via various LINQ operations. |
src/libraries/System.Linq/src/System/Linq/Shuffle.SpeedOpt.cs | Specializes and overrides Contains for Shuffle iterators with hypergeometric probability sampling. |
src/libraries/System.Linq/src/System/Linq/DefaultIfEmpty.SpeedOpt.cs | Implements optimized Contains for DefaultIfEmpty. |
src/libraries/System.Linq/src/System/Linq/Concat.SpeedOpt.cs | Adds Contains override for concatenated sequences. |
src/libraries/System.Linq/src/System/Linq/OfType.SpeedOpt.cs | Adds Contains implementation for OfType iterators. |
src/libraries/System.Linq/src/System/Linq/Where.SpeedOpt.cs | Specializes Contains for Where and WhereSelect iterators. |
src/libraries/System.Linq/src/System/Linq/AppendPrepend.SpeedOpt.cs | Implements Contains for Append/Prepend operators. |
src/libraries/System.Linq/src/System/Linq/Union.SpeedOpt.cs | Provides a Contains override that iterates over unioned sequences. |
src/libraries/System.Linq/src/System/Linq/Select.SpeedOpt.cs | Introduces Contains methods for various Select iterators. |
Other files | Updates Contains in Reverse, Cast, SelectMany, Contains, Distinct, Iterator, and SkipTake for optimization. |
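As a rough sketch of how Contains on Append and DefaultIfEmpty can be answered from the operator's inputs instead of running the operator, assuming default equality as Enumerable.Contains uses. `AppendContains` and `DefaultIfEmptyContains` are hypothetical helpers, not the code in AppendPrepend.SpeedOpt.cs or DefaultIfEmpty.SpeedOpt.cs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the appended item is checked directly, and the rest
// is delegated to the source's own Contains.
static bool AppendContains<T>(IEnumerable<T> source, T appended, T value) =>
    EqualityComparer<T>.Default.Equals(appended, value) || source.Contains(value);

// Hypothetical helper: a nonempty source passes through DefaultIfEmpty
// unchanged, while an empty one yields only defaultValue.
// Note: Any() followed by Contains() may enumerate the source twice.
static bool DefaultIfEmptyContains<T>(IEnumerable<T> source, T defaultValue, T value) =>
    source.Any()
        ? source.Contains(value)
        : EqualityComparer<T>.Default.Equals(defaultValue, value);

int[] numbers = { 1, 2, 3 };
int[] empty = Array.Empty<int>();

Console.WriteLine(AppendContains(numbers, 4, 4));         // True
Console.WriteLine(DefaultIfEmptyContains(empty, 7, 7));   // True
Console.WriteLine(DefaultIfEmptyContains(numbers, 0, 0)); // False
```

The same shape applies to Prepend, with the extra item checked before the source rather than after.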
Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.
Co-authored-by: Copilot <[email protected]>
/azp run runtime
Azure Pipelines successfully started running 1 pipeline(s).
@@ -16,6 +16,8 @@ private sealed partial class DistinctIterator<TSource>
public override int GetCount(bool onlyIfCheap) => onlyIfCheap ? -1 : new HashSet<TSource>(_source, _comparer).Count;

public override TSource? TryGetFirst(out bool found) => _source.TryGetFirst(out found);

public override bool Contains(TSource value) => _source.Contains(value);
DistinctIterator may have a custom comparer.
Yup, thanks. I'd convinced myself it doesn't matter, but it obviously does.
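A small C# repro of why the comparer matters: Distinct with a custom comparer can drop elements that the default comparer used by Contains still considers different, so forwarding to `_source.Contains` can return the wrong answer. Without a custom comparer, forwarding is safe, since Distinct then only drops elements equal to one it has already yielded. The input strings are illustrative.

```csharp
using System;
using System.Linq;

string[] source = { "a", "A" };

// With a case-insensitive comparer, Distinct keeps the first "a" and drops "A".
var distinct = source.Distinct(StringComparer.OrdinalIgnoreCase);

// Enumerable.Contains uses the default (case-sensitive) comparer.
bool onResult = distinct.Contains("A"); // correct answer: "A" is not in the result
bool onSource = source.Contains("A");   // what forwarding to the source would return

Console.WriteLine($"{onResult} {onSource}"); // False True
```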