Exploration: What would it take to achieve cost free Linq #2482
Replies: 57 comments 38 replies
-
I definitely think that turning LINQ into a zero-cost abstraction would be amazing, and it's worth investigating how to get there. However, most of the proposed solution is about avoiding heap allocations for temporary objects, and that is already being worked on in CoreCLR (https://github.com/dotnet/coreclr/issues/20253); that version does not require any language support or unwieldy, deeply nested generic types. Another necessary feature is devirtualization, which is also being worked on (https://github.com/dotnet/coreclr/issues/9908). So my question is: isn't a better approach to invest more into CoreCLR? I think doing that could in theory achieve all the benefits of this proposal with none of its drawbacks. It's even better, in that it would improve the performance of all code, not just code that has been specifically tuned. Though I believe getting CoreCLR there would require a significant amount of work, possibly much more than implementing this proposal.
-
@svick However, I'm a bit worried by a comment @HaloFour said he heard Brian Goetz make: that JVM escape analysis is a failed feature. Given how much HotSpot has invested in stack-allocating objects, I think relying on the JIT to do all the work for us may be overly optimistic.
-
If that's the case, then I think it would be very valuable to understand what made it fail. Maybe there's a way the CLR can avoid making the same mistakes? Or maybe it fails on the code patterns commonly encountered in Java, but would work fine for LINQ?
-
It was a comment he made at a recent conference where he was discussing Java evolution/futures, including Valhalla, which seeks to add proper value types (or "inline" types) to the JVM and the language. He didn't go into any detail and I didn't think to ask the question. The videos for the conference haven't been posted yet. It would be awesome if the CoreCLR team could implement escape analysis in such a way that idiomatic LINQ would require zero allocations.
-
I see two other potential enablers, but both are runtime-related:
-
Since invoking a lambda/delegate is effectively the same thing as invoking a virtual method (you treat the delegate's
-
@erozenfeld
-
If you take a look at https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Select.cs,
-
@YairHalberstadt That does make it harder, but I don't think it makes it impossible. For example, if the input is statically known to be
-
@svick
-
Currently we expose stuff like
-
Are you sure? I think it should be possible to allocate a different number of bytes from the stack in different branches. And even if that were true, that's just the current implementation; if that implementation prevented optimizations, it could be changed.
I think it could still be possible. For example, if all the possible returned types inherited from
-
It is
-
@svick Maybe I don't know what devirtualization is, but I understand it as getting rid of the vtable lookup. Double inlining means quite literally inlining the code. Of course, it works better with eager evaluation, as generator functions cause syntactic diabetes and can't really be inlined. Here's what Kotlin does:

```kotlin
public inline fun <T, R> Array<out T>.map(transform: (T) -> R): List<R> {
    return mapTo(ArrayList<R>(size), transform)
}

public inline fun <T, R, C : MutableCollection<in R>> Array<out T>.mapTo(destination: C, transform: (T) -> R): C {
    for (item in this)
        destination.add(transform(item))
    return destination
}
```

So when you write

```kotlin
var result = source.map { it * 2 }
```

the compiler actually rewrites it into something like:

```kotlin
val tmp = ArrayList<Int>(source.size)
for (item in source) {
    tmp.add(item * 2)
}
var result = tmp
```

Of course, as C# has no idea what's inside
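For comparison, if the C# compiler performed the same rewrite on a typical Linq chain (purely hypothetical; it does not do this today), the result would be along these lines:

```csharp
// Hypothetical Kotlin-style inline expansion of
//   var result = ints.Where(x => (x & 1) == 0).Select(x => x * 2).Sum();
// with no enumerator, delegate, or list allocations:
var total = 0;
foreach (var x in ints)
{
    if ((x & 1) == 0)
        total += x * 2;
}
var result = total;
```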
-
Some thoughts: Even if not specifically for LINQ, I think a way to do allocationless `Func<>` lambdas is a fantastic feature that will enable more JIT-time optimization, the likes of which we see in C++.

I like this LINQ focus a lot, though. Not just due to the changes outlined here, but because turning LINQ from a convenience feature into a performance feature has great ramifications down the road: other optimizations that we've thought of but decided not to do (ordered enumerables and merge joins, etc.) suddenly become more worthwhile.

If we could make this happen for standard LINQ automatically, without any special syntax changes, it would be a killer feature. Otherwise it is still a useful feature, but maybe not immediately useful enough to be part of corefx. If we could make it happen for sub-queries, e.g. seamlessly slip in and out of IEnumerable and allocationless code depending on what overloads exist, it would be even better. This would provide compatibility with all the existing code out there.
-
What do you mean by "statically compiled"? I'm not sure what it means here, or what relationship it has to delegates not being allocated on the heap.
-
Statically compiled means the compiler would rewrite your delegate variable into a function, as if it had been written that way to begin with. Let's say we have this delegate declaration:

And you have this somewhere in your code:

Static compilation would rewrite that into:

No runtime allocation.
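(The three snippets above are missing. Judging from the reply below, which uses the names `IntReturnInt` and `someInt`, they were presumably along these lines — a reconstruction, not the original code:)

```csharp
// Hypothetical reconstruction of the missing snippets.
delegate int IntReturnInt(int x);      // the delegate declaration

// somewhere in the code:
IntReturnInt f = x => x * 2;
var result = f(someInt);

// the imagined "static compilation" rewrite — no delegate object at all:
static int ActualFunction(int x) => x * 2;
var result2 = ActualFunction(someInt);
```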
-
No. What gets generated here is more like:

```csharp
static IntReturnInt s_cachedLambda = new IntReturnInt(actual_function);
...
var result = s_cachedLambda(someInt);
```

As to what the runtime does here, you'd have to check with it. It's potentially possible it could attempt to optimize this further. Note: none of this really has to do with 'static compilation'. There are dynamic languages/compilers that will perform the above optimizations, and there are static languages/compilers that will not.
-
Then you could just say .NET doesn't have static compilation.
-
I think this is a matter of you using terms in a way that makes sense to you, but doesn't follow the normal nomenclature here. Regardless, this tangent doesn't seem to be going anywhere. If you have further questions you might want to consider taking them to our gitter.im/dotnet/csharplang channel, or discord.gg/csharp (#roslyn channel).
-
It does compile the lambda to a method, but it then needs a delegate instance pointing to that method, which is allocated at runtime. I recommend playing around with https://sharplab.io to get a feel for how the compiler works.
-
NOTE No 2: Hmmm. This message is becoming quite overloaded... Anyway, now I'm just confused (read the previous NOTE below), because I have created a second test which follows a Select().Where().Max() pattern, and ValueLinq is running in half (!) the time of the hand-coded version. I'm assuming I have done something stupid, but otherwise I just need someone who can analyse assembly to help me understand. Anyway, this will be the last additional edit in this comment; from now on I'll document things where they should be, over at the ValueLinq repo.

NOTE: The code mentioned here has now been pushed up to the repo here. I did find an error with the hand-coded version (it was doing some additional casting between ints and doubles) which has been rectified. This meant that the baseline was slower than it should have been; hence the ValueLinq version mentioned here was not ~10% slower, but rather ~100% slower. This is still very good (really!), but obviously a bit of a disappointment. What's the old saying? Don't count your chickens before they hatch! Benchmark.net results here.

OK, I'm getting a little in front of myself in that what I have here isn't in my repo yet (but I'll give it a little scrub and have it up by this weekend), but I'm somewhat excited by progress and thought I would share my results at this current stage... Anyway, I'm taking a crack at writing a value-based Linq, which you can peruse here. (There is some precedent here and here), but I'm approaching it somewhat differently, merging in some of the ideas that I explored with a previous Linq implementation here (boy, what a glutton for punishment, writing two versions of Linq!!!)

Anyway, my goal is that it should be (almost) as simple as changing from one `using` to another. But, as an extension to the "standard" Linq functionality, I have been playing a bit with some value-type based pure functions (the ones I'm using here lifted from @reegeek's StructLinq) in order to test how close we can get to matching straight code. So let's start with that:

```csharp
[Benchmark]
public double Flatout()
{
    var total = 0.0;
    foreach (var x in _ints)
    {
        if ((x & 1) == 0)
            total += x * 2;
    }
    return total;
}
```

OK; pretty simple. The Linq transform of this is:

```csharp
[Benchmark(Baseline = true)]
public double Linq() =>
    _ints
    .Where(x => (x & 1) == 0)
    .Select(x => x * 2)
    .Sum();
```

And the Cistern.ValueLinq version looks exactly the same:

```csharp
[Benchmark]
public double CisternValueLinq_normal() =>
    _ints
    .Where(x => (x & 1) == 0)
    .Select(x => x * 2)
    .Sum();
```

Now:

```csharp
[Benchmark]
public double Flatout_cast_to_array()
{
    var total = 0.0;
    foreach (var x in (int[])_ints)
    {
        if ((x & 1) == 0)
            total += x * 2;
    }
    return total;
}
```

OK, now we have a version which uses the value-type structs to perform the actions. These are pure functions. They could copy some state around, but as per discussions above and in other forums this can lead to some perverse outcomes. Anyway, let's show them:

```csharp
struct DoubleAnInt : IFunc<int, int> { public int Invoke(int t) => t * 2; }
struct FilterEvenInts : IFunc<int, bool> { public bool Invoke(int t) => (t & 1) == 0; }
```

And then we can use them in my new Linq as follows:

```csharp
[Benchmark]
public double CisternValueLinq_struct() =>
    _ints
    .Where(new FilterEvenInts())              // ug, sugar please
    .Select(new DoubleAnInt(), default(int)) // ug, sugar please + better type inference...
    .Sum();
```

Which once again touches on the type inference issue that has been mentioned in the preceding messages. Now, the following doesn't exist, but I'm imagining you could have an additional syntax, something like the following, where a >=> would create a value-type IFunc and allow a layer of type inference to do some magic. But hey, one can but dream:

And finally a "nothing up my sleeve" version that seamlessly switches between the value-type representation and the `IEnumerable<int>` one:

```csharp
[Benchmark]
public double CisternValueLinq_struct_nothing_up_my_sleve()
{
    IEnumerable<int> collection = GetCollection();
    IEnumerable<int> withWhere = AddWhere(collection);
    IEnumerable<int> andSelect = AddSelect(withWhere);
    var result = andSelect.Sum();
    return result;

    IEnumerable<int> GetCollection() => _ints;
    IEnumerable<int> AddWhere(IEnumerable<int> stuff) => stuff.Where(new FilterEvenInts());
    IEnumerable<int> AddSelect(IEnumerable<int> stuff) => stuff.Select(new DoubleAnInt(), default(int));
}
```

So now the drum-roll for how this runs!

Some comments:

So what's the catch? Well, this thing is generics heavy. I mean really heavy. So you're going to have a longer JIT compilation phase, as well as code bloat. Anyway, I'll polish up what I've got so you can run it for yourselves (if you don't believe me :-P). But I'm tired and I've got some real work to do.
-
This would be a great feature to have. The most convenient way from the user's side is to change from using delegates to using interfaces, and to have the compiler generate a struct from the lambda expression implementing that interface. In that case, I think it is related to some extent to those proposals for anonymous-type (or local-type) declarations, for example #4301 (comment).
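A sketch of that idea (hypothetical: `IFunc`, the struct name, and `CountWhere` are illustrative, not an existing API) — the compiler would lower a lambda argument into a struct implementing the interface, and the library method would be constrained on it:

```csharp
using System;

public interface IFunc<in T, out TResult>
{
    TResult Invoke(T arg);
}

// What the compiler might generate for the lambda `x => (x & 1) == 0`
// (a real generated name would be unspeakable):
struct GeneratedEvenFilter : IFunc<int, bool>
{
    public bool Invoke(int x) => (x & 1) == 0;
}

public static class StructLinq
{
    // Constraining TPredicate to a struct lets the JIT specialize and
    // devirtualize Invoke; no delegate is allocated.
    public static int CountWhere<T, TPredicate>(ReadOnlySpan<T> source, TPredicate predicate)
        where TPredicate : struct, IFunc<T, bool>
    {
        var count = 0;
        foreach (var item in source)
            if (predicate.Invoke(item))
                count++;
        return count;
    }
}
```

A call site would then read something like `StructLinq.CountWhere<int, GeneratedEvenFilter>(data, default)`.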
-
I think this would benefit from escape analysis. See dotnet/runtime#11192
-
Another thought: instead of creating a Shape (the `SFunc` from the original post), allow delegates themselves as generic constraints:

```csharp
void InvokeAction<A>(A a) where A : Action => a();

InvokeAction(() => Console.WriteLine("From inside lambda"));
```
-
I feel a Kotlin-like approach is very necessary. Recently CoreLib removed a bunch of usages of Linq to remove the dependency on the System.Linq assembly in some workloads. If Linq could be inlined, it could be used in such scenarios again.
-
It feels like this could almost be achieved via source generators, now that those have been released 🙂
-
The original LINQ preview compiler could inline the LINQ queries if they were created and used inside a foreach statement, creating no lambdas or query objects. The originating collection might still allocate an enumerator though if it was not one of the simple types like array or list. We decided not to go with this feature in the actual release.
-
As I see it there are two problems with linq today:

I think solving (1) first is the way to go, and this is the best path I can see for this to move forward:

This would get us everything we need except allocating delegates on the stack, and enumerating a linq expression should have equivalent performance to constructing a

By having a generic type, it should be possible to avoid the allocation entirely.

This is an implementation detail of

**Why existential types?**

I do not think the ergonomics of types with all these additional generic parameters are good enough to ship. Also, they are really just an implementation detail. Existential types allow us to pass this important information to the JIT compiler without asking the user to see it everywhere.

**runtime / C# catch-22**

We currently have the problem that existential types are not a priority for the language design team right now. Part of that is because Shapes/Roles needs more investigation, but the other part is because there is no point in implementing existential types unless the runtime thinks they can actually consume them. It's unclear to me what kinds of breaks the runtime team is willing to accept on

**Delegates**

This would not help with display class allocation, but as you point out in the original post that is a much thornier issue. If you return the enumerable then a display class must be allocated. It would be very odd to tell someone that this is not allowed:

```csharp
public static IEnumerable<T> GetAnnotatedNodes<T>(this SyntaxNode node, SyntaxAnnotation syntaxAnnotation) where T : SyntaxNode
    => node.GetAnnotatedNodesAndTokens(syntaxAnnotation).Select(static n => n.AsNode()).OfType<T>();
```

However, for cases where the delegate does not escape the function, you can use local functions to avoid an allocation:

```csharp
var tokenMap = pairs
    .GroupBy(GetSyntaxNode, GetSyntaxAnnotation).ToDictionary(GetKey, GetAnnotations);
return root.ReplaceNodes(tokenMap.Keys, (o, n) => o.WithAdditionalAnnotations(tokenMap[o]));

static SyntaxNode GetSyntaxNode(Tuple<SyntaxNode, SyntaxAnnotation> pair) => pair.Item1;
static SyntaxAnnotation GetSyntaxAnnotation(Tuple<SyntaxNode, SyntaxAnnotation> pair) => pair.Item2;
static SyntaxNode GetKey(IGrouping<SyntaxNode, SyntaxAnnotation> group) => group.Key;
static SyntaxAnnotation[] GetAnnotations(IGrouping<SyntaxNode, SyntaxAnnotation> group) => group.ToArray();
```

Considering that you can use

**What can you do about it?**

Well dear reader, benchmarking would be very helpful. All of these changes can be done today, just with ugly interface declarations. If folks share the performance improvements they see in their codebases with a change like this, it would be great evidence that this needs to be prioritized.
-
Realistically, Roslyn could simply port/adopt/merge https://github.com/dubiousconst282/DistIL and finally step away from its policy of almost never optimizing IL. Roslyn has sufficient knowledge to unroll, inline, and otherwise lower many, many LINQ forms by doing even a simple form of escape analysis (LINQ consumed in place? It can be completely optimized away). The performance benefits this would bring are significant, and it would enable no-compromise terse syntax, making C# competitive with newer languages that have zero/low-cost compile-time-checked iterator expressions.
-
Introduction
Rust and C++ are languages which attempt to provide cost-free abstractions. This has two meanings:

C# does not attempt to do either of these, but instead tries to make sure that the value of any abstraction justifies its costs, and that those costs are kept to a minimum. Unfortunately this can mean that some of C#'s features are best avoided in high-performance scenarios.
One example of such a feature is Linq, one of C#'s 'killer features'.
In many (the vast majority of?) cases a Linq query is consumed immediately, and could in theory be converted to an equivalent foreach loop. Of the remainder, it is very common for the query not to escape the method in which it was declared.
For example, the query sketched below is equivalent to the loop that follows.
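(The original code snippets are missing here; a representative pair, consistent with the benchmarks quoted elsewhere in this thread — the method names are mine — might be:)

```csharp
using System.Linq;

static double SumDoubledEvens(int[] ints) =>
    ints.Where(x => (x & 1) == 0)
        .Select(x => x * 2)
        .Sum();
```

and the equivalent hand-written loop:

```csharp
static double SumDoubledEvensLoop(int[] ints)
{
    double total = 0;
    foreach (var x in ints)
    {
        if ((x & 1) == 0)
            total += x * 2;
    }
    return total;
}
```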
However, there are a number of costs associated with the Linq version.

As a result, a number of projects (such as Roslyn) ban Linq anywhere performance is critical.
The aim here is to discuss how these costs might be mitigated, and what language features might be necessary.
This is not a feature request or a proposal. It's merely the beginnings of an investigation into how a number of disparate language features could come together to take C# into avenues which are currently impossible. Perhaps it will inform C#'s long-term roadmap, perhaps not, but I've found it interesting to explore nonetheless, and I hope you will too.
Avoiding allocating the enumerator, and avoiding virtual dispatch.
Firstly we add the following interface:
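(The interface declaration itself is missing; given the name `IEnumerable<T, TEnumerator>` used later in this post, it was presumably along these lines — a reconstruction:)

```csharp
using System.Collections.Generic;

// Reconstruction: an enumerable that exposes its enumerator's concrete type,
// so a struct enumerator is never boxed or virtually dispatched.
public interface IEnumerable<T, TEnumerator> where TEnumerator : IEnumerator<T>
{
    TEnumerator GetEnumerator();
}
```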
Secondly we change the signature of Select to this:
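(This snippet is also missing. Based on the surrounding description — the enumerable, enumerator, and element types all threaded through as generic parameters — it presumably looked roughly like the following; `SelectEnumerable` is an assumed name, and the method would live in a static class:)

```csharp
public static SelectEnumerable<TSource, TResult, TEnumerable, TEnumerator> Select<TSource, TResult, TEnumerable, TEnumerator>(
    this TEnumerable source, Func<TSource, TResult> selector)
    where TEnumerable : struct, IEnumerable<TSource, TEnumerator>
    where TEnumerator : struct, IEnumerator<TSource>
    => new SelectEnumerable<TSource, TResult, TEnumerable, TEnumerator>(source, selector);
```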
Full code is available on SharpLab.
Since the CLR is guaranteed not to box structs used as type parameters, if other Linq methods are similarly defined, the methods will all accept struct enumerables, get struct enumerators, and return struct enumerables. This means that no enumerables or enumerators will need to be allocated. Also, the JIT should devirtualise all these methods. In theory, given a powerful enough JIT, the generated machine code for iterating over the collections should be as efficient as if it had been directly hand-written (not including the costs associated with the lambda). See here for an example of the generated asm, and what the equivalent would look like if it were hand-written.
Now, this signature looks utterly hideous, but consumers should ideally never need to worry about it, thanks to type inference and type parameter inference. Unfortunately #741 would have to be implemented first.
The producer will have to work with these hideous signatures, but with a little practice this isn't so hard, and as it's only library writers who will have to, this is less problematic. I do have some ideas for using something akin to Rust's `impl Trait` to make things significantly easier for those library writers. Once the ideas are more fully formed I may write them up.
There are, however, three pain points that the consumer will have to deal with:

1. In order to use this with a non-struct enumerable, you would have to create a struct enumerable wrapper around it.
2. No existing enumerables implement the IEnumerable<T, TEnumerator> interface.
3. If a single type parameter can't be inferred you would have to write all of them, even if #741 (improved overload inference from generic type parameters/constraints) were implemented.
The first and second problems could be solved using shapes/extension everything.
The third problem can be solved through partial type inference, for which a championed proposal already exists: #1349.
Avoiding allocating the lambda and the display class.
The first step to avoiding allocating the display class is obviously to make it a struct. The problem with this is that when passing the struct to Select a copy is made, meaning that if the lambda mutates any locals the effect will not be seen outside the lambda.
We could pass the struct by ref, but then we wouldn't be able to store the struct by ref on the returned SelectEnumerator.
What we could do, however, is make the display class a ref struct. Then, just like a Span can store a ref through the ByReference JIT intrinsic, the display class would store the locals by ref using ByReference. The compiler already knows how to make sure a stackalloc'd Span is never returned from the method it is declared in, and it would similarly make sure the display class is never returned from the method it is declared in.
Since the display class is a ref struct, the returned enumerable from Select has to be a ref struct, in order to store display classes. This means it can't implement an interface. However we've already discussed above how we could replace IEnumerable with the shape SEnumerable<T, TEnumerator>, and return structs, so this should not be a problem.
We should create the following shape:
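(The shape declaration is missing; in the hypothetical shapes syntax from the shapes proposal it would presumably be roughly:)

```csharp
// Hypothetical shapes syntax — not valid C# today.
public shape SFunc<in T, out TResult>
{
    TResult Invoke(T arg);
}
```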
We then change the signature of Select to:
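(Also missing; presumably the delegate parameter becomes a generic parameter constrained to the shape, along these lines — again with `SelectEnumerable` and `SEnumerable` as assumed names:)

```csharp
// Hypothetical: TFunc replaces Func<TSource, TResult>, so a compiler-generated
// (ref) struct can be passed with no delegate allocation.
public static SelectEnumerable<TSource, TResult, TEnumerable, TEnumerator, TFunc> Select<TSource, TResult, TEnumerable, TEnumerator, TFunc>(
    this TEnumerable source, TFunc selector)
    where TEnumerable : struct, SEnumerable<TSource, TEnumerator>
    where TEnumerator : struct, IEnumerator<TSource>
    where TFunc : SFunc<TSource, TResult>
    => new SelectEnumerable<TSource, TResult, TEnumerable, TEnumerator, TFunc>(source, selector);
```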
And introduce the following conversion from a lambda to an SFunc:
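(The conversion is missing too. The idea, per the text above, is that the compiler lowers the lambda and its captured locals into a ref struct implementing SFunc, with the locals stored by reference — a hypothetical sketch:)

```csharp
// Hypothetical compiler output for a lambda `x => x * min` capturing a local `min`.
// (Written with C# 11 ref fields for illustration; the original text imagines
// the ByReference intrinsic instead.)
ref struct GeneratedSelector // : SFunc<int, int> — would implement the shape
{
    private ref int _min;                        // captured local, stored by ref
    public GeneratedSelector(ref int min) => _min = ref min;
    public int Invoke(int x) => x * _min;
}
```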
This leaves us with a fully allocation free Select statement, which in theory could be reduced by a powerful enough JITter to almost exactly the same code as we would have coded ourselves.
Issues
Ref structs cannot be used as type arguments. This is because ref structs cannot be stored on the heap, and we can't guarantee how a type parameter is used.
I would suggest adding a new constraint, `ref struct`. A type parameter with this constraint must be treated as if it were a ref struct (i.e. it cannot be put on the heap). This should allow any type, including both ref structs and normal structs, to be used as a type argument for that parameter. Note that whereas normal constraints decrease the number of types that can be used as type arguments, this constraint increases it. This has been discussed already at #1148.
I don't know the best way to deal with this. Exposing ByReference publicly as an unspeakable type might be considered. However, this would let anyone write what amounts to unsafe IL without needing permission to run unsafe code. I imagine someone from the runtime could shed more light on this.
The compiler would have to prevent this, as doing so would be unsafe: the Params struct would no longer exist, so we would be storing a ref to essentially random memory. The compiler already knows how to do this for a stackalloc'd Span, so it should not be tricky to implement. However, it does somewhat limit the number of places this cost-free Linq can be used.
Like everything, this can be solved with another layer of indirection, by implementing a sort of copy-on-write (COW), but one which updates all references to point to the copy rather than the original.
Essentially we would store locals in a Params struct, and then store a reference to that on the stack. We only ever access locals through that reference. We then store a reference to that reference in the display class. We can then box the underlying Params struct when necessary, and update the ref to that struct so that it points to the boxed struct, rather than the original. Since all access to the Params struct was through that reference, everything which had access to the original Params struct, now automatically accesses the boxed version. Now all our ref structs can safely be replaced with normal struct alternatives, since they are storing refs to items on the heap, not the stack.
This might sound confusing, and I would like to show you the code for it. However, C# has no way of even writing this concept right now, so I won't bother, as it would only be confusing. I believe IL should be capable of this, possibly with the addition of a couple of runtime helpers. The most difficult bit is that we end up storing interior pointers on the heap, which may adversely impact garbage collection performance. In theory though, since they will only ever point to a boxed struct, the GC could learn to recognise this scenario and optimise for it.
Instead I will show you what it will look like to the consumer:
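(The consumer-side example is missing; presumably it reads just like ordinary Linq — a hypothetical sketch, where `source` stands in for whatever collection is being queried:)

```csharp
// Hypothetical consumer code: looks like today's Linq, but compiles down to
// struct enumerables, a ref struct display class, and zero heap allocations.
int min = 5;
var total = source
    .Where(x => x > min)   // `min` captured by ref in a ref struct, not a heap display class
    .Select(x => x * 2)
    .Sum();
```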
Whether all this is worth it is a different question. I think the most sensible approach is to say that if you need to heap-allocate your captured locals, you should use ordinary Linq.
Summary and Conclusion
In this proposal we looked at what it would take to create an allocation-free, virtual-method-call-free, fully inlinable version of Linq.
We came up with what I would consider to be a pretty workable proposal, at least for the consumer of the library. As I said, I do have some other ideas about making life easier for the library writer.
Let's list the language and runtime changes that would be required to make this all work:
Championed proposals already exist for 1, 2 and 5, and all seem reasonably likely to happen, given enough resources. 6 is also easy enough to do once all the other parts are in place.
The trickiest parts of this are probably 3 and 4. They allow for the creation of allocation free function types which capture variables.
Is it all worth it?
Linq is extremely well established, and is used across the vast majority of C# codebases.
This proposal seeks to create an alternative to Linq which is trickier to use and more limited in general. Its only advantage is its improved performance. That's a tough sell...
Most C# projects don't require a level of performance that makes this tradeoff worthwhile. As such, an alloc-free Linq library would probably be provided as a separate NuGet package, rather than be included in .NET Standard.
Is it worth making so many language changes for such a niche feature?
To that I will make the same arguments as were made for Span. Most people will never have to use an alloc free Linq library. But everyone will use libraries that do make use of this.
That argument isn't as powerful here, since Span allowed for writing code that previously just could not be written, whereas this only allows for the simplification of code which is currently perfectly writable.
However, allocation-free function types would allow for writing code which currently cannot be written, or at least not without significant pain. For example, there are many cases where Roslyn would benefit from using generic visitors which take a func, but at the moment it hand-crafts each visitor to avoid allocating the func. Even when the overhead of using a func is considered worth it, great pains are taken to make sure the function does not capture any locals, and instead all locals are passed in as parameters. For an example, take a look at the VisitTypeSymbol extension method.

Once allocation-free function types exist, having Linq make use of them is a much smaller task.
So here are the questions I would be interested in people answering: