ValueEnumerables (fast to code and run) #974
Comments
That's a lot of public-facing voodoo which would require very explicit specification and deterministic generation. I certainly don't like that |
The Jit elides |
What is the reason of splitting state into |
Internalized the state and added IDisposable |
I wish that all small enumerators could fit inside stack so even linq would work on them without making garbage, and without needing to define a new syntax. |
Need to describe what happens when you have many of these. for example: public enumerable Skip(int count) ...
public enumerable Take(int count) ... There will need to be two named public structs in this type. How are they named? How do you add more enumerable methods to the class without changing the names of the existing enumerables. Remember that doing so would be a binary breaking change to people who have already compiled against you. One way to handle this (best syntax tbd) would be to allow the signature to specify explicitly what the name of the nested type would be (with Enumerator) being a fairly sound default given the patterns in the BCL. So this would be something like: public enumerable(SkipEnumerator) Skip(int count) ...
public enumerable(TakeEnumerator) Take(int count) ... |
Another option might be something like: public struct SkipEnumerator Skip(int count) ...
public struct TakeEnumerator Take(int count) ... Here you are effectively saying "this returns an instance of the type 'struct SkipEnumerator', which the compiler will then fill in the implementation for itself". The downside here is that this syntax might be fairly confusing, as we're combining (intentionally) the syntax for declaring a struct and for declaring a method. |
📝 I've been thinking about this a bunch recently. Not to the point of knowing how to implement it, but enough to know it would be a valuable addition. |
It will definitely be valuable - have a look at these two hand-written examples that I've written just this week: https://github.com/Microsoft/msbuild/pull/2586/files#diff-4a6b19dd618716c35ef73124a3cf2893, https://github.com/Microsoft/msbuild/pull/2577/files#diff-e099cf962d6056d4357dcdc015b8725f. The second was less boilerplate because I avoided implementing the interfaces. |
:-/ Breaks down when trying to write Linq extensions:
public static partial class Enumerable
{
public static int Count<TSource, TEnumerable, TEnumerator>(this TEnumerable source)
where TEnumerable : struct, IValueEnumerable<TSource, TEnumerator>
where TEnumerator : struct, IValueEnumerator<TSource>
{
int count = 0;
foreach (var _ in source)
{
checked
{
count++;
}
}
return count;
}
}
var list = new List<int>();
list.Count();
|
Stripping it back one level doesn't help either 😢
public static int Count<TSource, TEnumerator>(this TEnumerator source)
where TEnumerator : struct, IValueEnumerator<TSource>
{
Any ideas on a pattern that might auto-infer (but also remain struct based) |
If we did this, i think we'd have to improve type inference here. Note that that's something that is already being looked at due to the work around type-classes/shapes. i.e. the current type-classes/shapes proposals depend on this cute "pass two structs along" approach as well. So ensuring the language can properly figure this out without users having to provide all the types is definitely something that needs to happen. |
I've talked with @MadsTorgersen about this a lot in the past. The implementation turns out to not be a problem. The main problem is really how is this surfaced to users in a comprehensible and non painful manner. Really, all the compiler needs to know is:
|
This was exactly my concern on Twitter. I agree that the name should not be inferred. Coming up with good syntax is hard, but the name should be explicit in source code somehow.
This is not the only problem with using the method name as the type name. Consider overloads. |
K, may not be an insurmountable problem then; have added to questions section.
Parameter collisions as well public enumerable Skip(int count) ...
public enumerable Skip(long count) ...
public enumerable Take(int count) ...
public enumerable Take(long count) ... How about By default; when no name collisions public enumerable Skip(int count) ...
public enumerable Take(int count) ... Name is inferred as public SkipEnumerable Skip(int count) ... // And SkipEnumerator
public TakeEnumerable Take(int count) ... // And TakeEnumerator When type with same name defined; or same name with multiple signatures - error
With generic type syntax? // Defined type
public struct SkipEnumerable {}
// Or parameter collision
public enumerable<SkipEnumerableInt, SkipEnumeratorInt> Skip(int count) ...
public enumerable<SkipEnumerableLong, SkipEnumeratorLong> Skip(long count) ... Hopefully would be an advanced use and uncommon? |
The element type should also be explicit, not inferred from yield returns. This is a public contract and I want to see clearly in code review when it is changed. |
It would always be inferred as public SkipEnumerable Skip(int count) ... // And SkipEnumerator To make the common case easy. Any name collisions would require it to be explicitly specified e.g. public enumerable<SkipEnumerableInt, SkipEnumeratorInt> Skip(int count) So would show up in a code review/diff? |
@benaadams I think he means that there is nothing in the signature stating what the type of the actual elements are that are returned. i.e. today you see |
Ahhh... hmm... So maybe generic for type, and valuetuple style for name collisions? |
No collision public enumerable<T> Skip(int count) ...
public enumerable<T> Take(int count) ... Collisions public enumerable<T>(SkipEnumerableInt, SkipEnumeratorInt) Skip(int count) ...
public enumerable<T>(SkipEnumerableLong, SkipEnumeratorLong) Skip(long count) ... or public enumerable<T> Skip(int count) ...
public enumerable<T>(SkipEnumerableLong, SkipEnumeratorLong) Skip(long count) ... As you were suggesting earlier? |
One thing that could be done would be to only supply the name of the Enumerable. The Enumerator could always be given a well-known name inside of that. So, for example, we could have something like: public struct SkipEnumerable<int> Skip(int count) This would produce a nested type "SkipEnumerable", with a nested type inside of that always called "Enumerator". This would remove the need to name the enumerator, and there would never be a collision problem. |
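For comparison, here is roughly what that shape looks like when written by hand in today's C#. This is only a sketch: `Numbers`, `SkipEnumerable` and `SumAfterSkip` are hypothetical names chosen for illustration, and nothing here claims to be what the compiler would actually generate. It follows the same nesting as `List<T>.Enumerator` in the BCL, and `foreach` binds to it by pattern, without any interface.

```csharp
using System;

public sealed class Numbers
{
    private readonly int[] _items;

    public Numbers(params int[] items) => _items = items;

    // Hand-written stand-in for what `Skip` might produce under the proposal (assumption).
    public SkipEnumerable Skip(int count) => new SkipEnumerable(_items, count);

    public readonly struct SkipEnumerable
    {
        private readonly int[] _items;
        private readonly int _count;

        internal SkipEnumerable(int[] items, int count)
        {
            _items = items;
            _count = count;
        }

        // foreach finds this method by pattern, so no interface cast or boxing is involved.
        public Enumerator GetEnumerator() => new Enumerator(_items, _count);

        // Always named "Enumerator", so only the outer type name needs to be chosen.
        public struct Enumerator
        {
            private readonly int[] _items;
            private int _index;

            internal Enumerator(int[] items, int start)
            {
                _items = items;
                _index = Math.Max(start, 0) - 1; // MoveNext advances before Current is read
            }

            public int Current => _items[_index];

            public bool MoveNext() => ++_index < _items.Length;
        }
    }
}

public static class Program
{
    public static int SumAfterSkip(Numbers numbers, int count)
    {
        int sum = 0;
        foreach (int value in numbers.Skip(count))
        {
            sum += value;
        }
        return sum;
    }

    public static void Main()
    {
        var numbers = new Numbers(1, 2, 3, 4, 5);
        Console.WriteLine(SumAfterSkip(numbers, 2)); // sums 3, 4 and 5
    }
}
```

Adding a second method such as `Take` would add a second outer struct with its own nested `Enumerator`, so existing type names stay stable for already-compiled callers.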
Try the operator syntax: public struct int enumerable Skip(int count); |
the reason i like |
Another possibility: public SkipEnumerable<int> Skip(int count) struct {
} or possibly: public value SkipEnumerable<int> Skip(int count) {
} As 'value' is already a contextual keyword. This would help, as "public struct X" could read to lots of people as if you're declaring the struct right there. |
@davkean Still needs the name of the type being created though. |
Would So if someone used a rubbish name; it may be confusing what its doing from the signature and it doesn't immediately explain why you can start yielding in the method public struct MyName<int> Skip(int count) {
public MyName<int> Skip(int count) struct {
public value MyName<int> Skip(int count) { Also suggests that Typed public enumerable<T> SkipEnumerable Skip(int count)
public enumerable<T> MyName Skip(long count) Pass-through drops the modifier (as async and Task); but keeps the type public SkipEnumerable SkipTen(int count) => Skip(count: 10); |
Updated proposal based on feedback @CyrusNajmabadi do the changes address 1, 2 & 3 in #974 (comment)? |
A method modifier telling IDE/consumer whether a method is lazy enumerable or not would be cool. I am kind of tired of juggling |
I'm liking the direction, but I'm now finding it hard to read the distinctions between "class-level" and "method-level". Ditto for the syntactic difference between implementing GetEnumerator() and GetValueEnumerator() with an iterator. Today, an iterator can return either Example: public class Sequence<T> : IValueEnumerable<T, Sequence<T>.Enumerator>
{
...
public enumerator<T> Enumerator GetValueEnumerator()
{
foreach (T t in _items)
yield return t;
}
public enumerable<T> Span Slice(int start, int length)
{
for (int i = start; i < start + length; i++)
yield return _items[i];
}
public Span Slice(int start) => Slice(start, _items.Length - start);
} The programmer would have to declare that they implement IValueEnumerable just like IEnumerable. I consider that to be desirable. Again, I want to see the public contract when reading code. |
Overall really like the direction. Some small items to think about
|
TIL this works IEnumerable<int> Enumerable()
{
yield return 1;
}
IEnumerator<int> Enumerator()
{
yield return 1;
} Though you can only void Test()
{
foreach (var i in Enumerable())
{ }
// Error
foreach (var i in Enumerator())
{ }
} |
@benaadams What would this proposal look like if it was focused instead on |
wouldn't it make sense to define this as a "shape/trait/concept" so it would be struct all the way? |
public static int Count<TSource, TEnumerator>(this TEnumerator source)
where TEnumerator : struct, IValueEnumerator<TSource>
{
... however, don't see why Then Value Linq, |
@sharwell interesting :) I think the |
Partially works better public static class Iterator
{
// Can be inferred
public static void ForEach<TValueEnumerator, TSource>(this TValueEnumerator enumerator, Action<TSource> action)
where TValueEnumerator : struct, IValueEnumerator<TSource>
{
var current = enumerator.TryGetNext(out bool success);
while (success)
{
action(current);
current = enumerator.TryGetNext(out success);
}
}
// Can't be inferred
public static int Count<TValueEnumerator, TSource>(this TValueEnumerator enumerator)
where TValueEnumerator : struct, IValueEnumerator<TSource>
{
int count = 0;
enumerator.TryGetNext(out bool success);
while (success)
{
count++;
enumerator.TryGetNext(out success);
}
return count;
}
// Can be inferred, but unneeded extra param
public static int Count<TValueEnumerator, TSource>(this TValueEnumerator enumerator, TSource _)
where TValueEnumerator : struct, IValueEnumerator<TSource>
{
int count = 0;
enumerator.TryGetNext(out bool success);
while (success)
{
count++;
enumerator.TryGetNext(out success);
}
return count;
}
}
class Program
{
void Main()
{
var list = new List<int>();
// Works
list.GetValueEnumerator().ForEach((int i) => Console.WriteLine(i));
// The type arguments for method
// 'Iterator.Count<TValueEnumerator, TSource>(TValueEnumerator)'
// cannot be inferred from the usage.
// Try specifying the type arguments explicitly
list.GetValueEnumerator().Count();
// Works
int _ = 0;
list.GetValueEnumerator().Count(_);
}
} But not quite; but the proposal is simpler - writing it up |
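The failing `Count` call above is the usual C# inference limit: `TSource` appears only in the constraint, and constraints are not used to infer type arguments. As the compiler message suggests, spelling the type arguments out does compile. A minimal sketch, assuming a hypothetical `IValueEnumerator<T>` whose shape matches how `TryGetNext` is used in this thread, and a hand-written `ArrayValueEnumerator<T>` standing in for `list.GetValueEnumerator()`:

```csharp
using System;

// Hypothetical interface shape (assumption), mirroring the TryGetNext calls above.
public interface IValueEnumerator<T>
{
    T TryGetNext(out bool success);
}

// Hypothetical struct enumerator over an array, written by hand for this sketch.
public struct ArrayValueEnumerator<T> : IValueEnumerator<T>
{
    private readonly T[] _items;
    private int _index;

    public ArrayValueEnumerator(T[] items)
    {
        _items = items;
        _index = 0;
    }

    public T TryGetNext(out bool success)
    {
        success = _index < _items.Length;
        return success ? _items[_index++] : default(T);
    }
}

public static class Iterator
{
    public static int Count<TValueEnumerator, TSource>(this TValueEnumerator enumerator)
        where TValueEnumerator : struct, IValueEnumerator<TSource>
    {
        int count = 0;
        enumerator.TryGetNext(out bool success);
        while (success)
        {
            count++;
            enumerator.TryGetNext(out success);
        }
        return count;
    }
}

public static class Program
{
    public static int CountExplicit(int[] items)
    {
        var enumerator = new ArrayValueEnumerator<int>(items);

        // enumerator.Count();  // does not compile: TSource cannot be inferred

        // Both type arguments given explicitly; the call stays struct-based and non-boxing.
        return enumerator.Count<ArrayValueEnumerator<int>, int>();
    }

    public static void Main()
    {
        Console.WriteLine(CountExplicit(new[] { 10, 20, 30 }));
    }
}
```

It works, but nobody would want to write those type arguments at every call site, which is why improved inference keeps coming up in this thread.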
Fyi, work has started on stack-allocated objects. dotnet/coreclr#20251 |
Still problematic to penetrate through 2 levels of interfaces (IEnumerable -> IEnumerator) and convert explicitly shared generic to value generics? |
It may be worth bringing up. |
Motivation
It's fairly convoluted to add a non-allocating struct enumerator to a class, and yield iterators, which have a simpler syntax, allocate and also don't work as a class-level struct enumerable.
Related:
System.Linq is a wonderful feature; however, it also allocates for all the IEnumerables, so it would be desirable to find a solution that supports a non-allocating, struct-based Linq, or Value Linq.
Background
Given a common or garden List class
Adding an indexer is fairly straightforward using the this[] property, and it would be good to have a similar ease of use for an enumerator that is also non-allocating for yield enumerators.
Inspired by @davkean's Twitter conversation on the verboseness of enumerators, @jaredpar's Rethinking IEnumerable and Immutable's IStrongEnumerable,
as well as @nguerrera's call to action that 140 chars was too small to convey a design.
Proposal
Contract
Contextual keyword
New method_modifier enumerable<> to specify that the method is a generically typed ValueEnumerable.
Used with a member_name it is a struct-based iterator and works with yield.
Used without a member_name it is a struct-based class-level enumerable and works with yield.
Usage
Class-Level Iterator
public enumerable<T> StructTypeName()
Convention:
public enumerable<T> ValueEnumerable()
Change to iterator Stack sample to utilize new fast enumerator
The type name of the Enumerator is part of the method_header declaration so user is at liberty to define it as
Method Iterator
public enumerable<T> StructTypeName MethodName()
Convention:
public enumerable<T> MethodNameEnumerable MethodName()
Used as the return type of a method it is a method iterator and works with yield.
The type name of the Enumerator is part of the method_header declaration so user is at liberty to define it as
Pass-through Iterator
They should also be able to be pass-through chained without generating another enumerable type. The class enumerable will have to re-specify enumerable<> to identify the method; a method-to-method pass-through will not:
As the class-level enumerable needs to refer to the Enumerator subtype of the Enumerable, a simpler inferred-type pass-through syntax will be allowed:
Method Iterator pass-through is a simple method mapping
Code-generation
Class interfaces
Using the class-level enumerator will automatically implement the IValueEnumerable interface on the class, as well as IEnumerable<T> if not already implemented.
Class-level enumerator
Example code for the above class enumerator that the compiler could generate
Class-level enumerator Compatibility/Interop
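A rough sketch of what a shared interop adapter could look like in today's C#. Everything here is an assumption made for illustration: the `IValueEnumerator<T>` shape (including `IDisposable`), the `EnumeratorAdapter<TEnumerator, T>` signature, and the `Bag<T>` and `ArrayValueEnumerator<T>` types. It is not the code the compiler would be specified to emit.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical interface shape (assumption).
public interface IValueEnumerator<T> : IDisposable
{
    T TryGetNext(out bool success);
}

// Hypothetical struct enumerator over an array.
public struct ArrayValueEnumerator<T> : IValueEnumerator<T>
{
    private readonly T[] _items;
    private int _index;

    public ArrayValueEnumerator(T[] items)
    {
        _items = items;
        _index = 0;
    }

    public T TryGetNext(out bool success)
    {
        success = _index < _items.Length;
        return success ? _items[_index++] : default(T);
    }

    public void Dispose() { }
}

// One adapter class reused for every struct enumerator: it allocates once, on the
// interface path only, and forwards MoveNext/Current to TryGetNext.
public sealed class EnumeratorAdapter<TEnumerator, T> : IEnumerator<T>
    where TEnumerator : struct, IValueEnumerator<T>
{
    private TEnumerator _inner; // not readonly: TryGetNext mutates the struct in place
    private T _current = default(T);

    public EnumeratorAdapter(TEnumerator inner) => _inner = inner;

    public T Current => _current;

    object IEnumerator.Current => _current;

    public bool MoveNext()
    {
        _current = _inner.TryGetNext(out bool success);
        return success;
    }

    public void Reset() => throw new NotSupportedException();

    public void Dispose() => _inner.Dispose();
}

public sealed class Bag<T> : IEnumerable<T>
{
    private readonly T[] _items;

    public Bag(params T[] items) => _items = items;

    // Struct path: no allocation.
    public ArrayValueEnumerator<T> GetValueEnumerator() => new ArrayValueEnumerator<T>(_items);

    // Interface path: wraps the struct enumerator in the shared adapter.
    public IEnumerator<T> GetEnumerator() =>
        new EnumeratorAdapter<ArrayValueEnumerator<T>, T>(GetValueEnumerator());

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class Program
{
    public static int SumViaInterface(IEnumerable<int> source)
    {
        int sum = 0;
        foreach (int value in source)
        {
            sum += value;
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumViaInterface(new Bag<int>(1, 2, 3)));
    }
}
```

Existing consumers of `IEnumerable<T>` keep working through the adapter, while code that knows about the struct enumerator can avoid the allocation.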
If IEnumerator<T> was not previously defined on the class, so the compiler added it, it would also generate an adapter. Example code for the generated code:
GetEnumerator/IEnumerable interop
With a common EnumeratorAdapter shared for all IValueEnumerators
Method iterator
Pass-through enumerator
They should return the called type
Already defined error if both type-inferred and type-specified class-level iterator
Consuming the Enumerable
foreach will bind to GetValueEnumerator in preference to GetEnumerator if available, with usage as now:
Code-generation
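As a hedged illustration of the kind of lowering this implies, the sketch below writes out by hand what a `foreach` over a value enumerable might expand to. The `IValueEnumerator<T>` shape (with `IDisposable`), `RangeEnumerable` and `RangeEnumerator` are assumptions for the example, not part of any shipped API.

```csharp
using System;

// Hypothetical interface shape (assumption).
public interface IValueEnumerator<T> : IDisposable
{
    T TryGetNext(out bool success);
}

// Hypothetical struct enumerator yielding start, start + 1, ... up to but excluding end.
public struct RangeEnumerator : IValueEnumerator<int>
{
    private int _next;
    private readonly int _end;

    public RangeEnumerator(int start, int end)
    {
        _next = start;
        _end = end;
    }

    public int TryGetNext(out bool success)
    {
        success = _next < _end;
        return success ? _next++ : 0;
    }

    public void Dispose() { }
}

public readonly struct RangeEnumerable
{
    private readonly int _start;
    private readonly int _end;

    public RangeEnumerable(int start, int end)
    {
        _start = start;
        _end = end;
    }

    public RangeEnumerator GetValueEnumerator() => new RangeEnumerator(_start, _end);
}

public static class Program
{
    // Hand-written equivalent of: foreach (int item in source) { sum += item; }
    public static int SumLowered(RangeEnumerable source)
    {
        int sum = 0;
        RangeEnumerator enumerator = source.GetValueEnumerator();
        try
        {
            int item = enumerator.TryGetNext(out bool success);
            while (success)
            {
                sum += item; // the loop body goes here
                item = enumerator.TryGetNext(out success);
            }
        }
        finally
        {
            enumerator.Dispose(); // called directly on the struct local, so no boxing
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumLowered(new RangeEnumerable(1, 4))); // sums 1, 2 and 3
    }
}
```

Compared with the `MoveNext`/`Current` pattern, each iteration makes a single call on the enumerator.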
Questions
Type inference
Linq style extensions (also see #974 (comment))
When used
Will error with
Value Linq
Overload preference, struct generic to be preferred over interface extensions e.g.
Non-boxing
Preferred over cast to interface (that will eventually box)
Return vs out
Return T or return bool?
@jaredpar mentions on twitter
e.g.
vs
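To make the two candidate shapes concrete, here is a small sketch with both declared side by side. The interface names `IReturnStyleEnumerator<T>` and `IOutStyleEnumerator<T>` and the `Countdown` struct are hypothetical, chosen only for this comparison. One difference that bears on the covariance question: in C# a type parameter marked `out` cannot be used as the type of an `out` parameter, so only the return-`T` shape can be declared covariant.

```csharp
using System;

// Shape 1: return the element, report success through the out parameter.
// T appears only as a return type, so `out T` (covariance) is allowed.
public interface IReturnStyleEnumerator<out T>
{
    T TryGetNext(out bool success);
}

// Shape 2: the familiar Try-pattern. T is used as an out parameter type,
// which C# does not accept for a covariant type parameter, so T stays invariant.
public interface IOutStyleEnumerator<T>
{
    bool TryGetNext(out T value);
}

// Hypothetical enumerator counting down from a start value to 1, implementing both shapes.
public struct Countdown : IReturnStyleEnumerator<int>, IOutStyleEnumerator<int>
{
    private int _remaining;

    public Countdown(int from) => _remaining = from;

    public int TryGetNext(out bool success)
    {
        success = _remaining > 0;
        return success ? _remaining-- : 0;
    }

    public bool TryGetNext(out int value)
    {
        if (_remaining > 0)
        {
            value = _remaining--;
            return true;
        }
        value = 0;
        return false;
    }
}

public static class Program
{
    public static int SumReturnStyle(Countdown enumerator)
    {
        int sum = 0;
        int item = enumerator.TryGetNext(out bool success);
        while (success)
        {
            sum += item;
            item = enumerator.TryGetNext(out success);
        }
        return sum;
    }

    public static int SumOutStyle(Countdown enumerator)
    {
        int sum = 0;
        while (enumerator.TryGetNext(out int item))
        {
            sum += item;
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumReturnStyle(new Countdown(3))); // sums 3, 2 and 1
        Console.WriteLine(SumOutStyle(new Countdown(3)));    // same elements, other shape
    }
}
```

The `bool`-returning shape gives the tidier consuming loop; the `T`-returning shape keeps the door open for a covariant interface, which only matters for reference-type elements consumed through the interface.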
Covariance
I said fast right 😉 Though open question...
/cc @JonHanna for thoughts on Value Linq use