Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IEnumerable should have a extension for creating fixed size chunks #27449

Closed
inputfalken opened this issue Sep 22, 2018 · 21 comments · Fixed by #47965
Closed

IEnumerable should have a extension for creating fixed size chunks #27449

inputfalken opened this issue Sep 22, 2018 · 21 comments · Fixed by #47965
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Linq help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@inputfalken
Copy link
Contributor

inputfalken commented Sep 22, 2018

Background and Motivation

There's currently no method available to create chunks from an IEnumerable<T>. It's a popular functionality as this comment shows.
I believe most projects in need of this functionality have their own extension to IEnumerable<T> and the goal of this proposal is to have System.Linq offer this functionality

Proposed API

namespace System.Linq
{
    public static class Enumerable
    {
+        public static IEnumerable<T[]> Chunk(this IEnumerable<T> source, int size);
    }

    public static class Queryable
    {
+        public static IQueryable<T[]> Chunk(this IQueryable<T> source, int size);
    }
}

Usage Examples

int[] numbers = new int[]{1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9, 10, 11, 12}
IEnumerable<int[]> numberChunks = numbers.Chunk(5) // {[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12]}

Alternative Designs

Another way to create chunks would be to specify the amount of chunks you would like to have from an IEnumerable<T> but I think it would be more desirable to have it as a separate method.
like the sample below.

// Where size determines the split amount.
public static IEnumerable<T[]> SplitBy(this IEnumerable<T> source, int size)
{
    ...
}

int[] numbers = new int[]{1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9, 10}
IEnumerable<int[]> numberChunks = numbers.SplitBy(2) // {[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]}

Risks

This could potentially be a breaking change for users who have their own extension methods due to naming collisions.

@benaadams
Copy link
Member

GroupBy?

@svick
Copy link
Contributor

svick commented Sep 22, 2018

@inputfalken If you want to add a new API to .Net Core, you should follow the API review process. That means not opening a PR before the API is accepted and also including a speclet in this issue.

@inputfalken
Copy link
Contributor Author

@benaadams Maybe this is redundant.

But I think it would be nice to have the option to easily create fixed size sub-sequences by just providing an integer.

@svick
Copy link
Contributor

svick commented Sep 22, 2018

Also note that there already was a proposal for this in the past, which was closed: https://github.com/dotnet/corefx/issues/2146.

@inputfalken
Copy link
Contributor Author

@svick Oh i didn't see that one 😐, should I remove the PR then?

@svick
Copy link
Contributor

svick commented Sep 22, 2018

@inputfalken Yes, I think the PR should be closed for now.

@inputfalken inputfalken changed the title IEnumerable should have a extension for creating chunks IEnumerable should have a extension for creating fixed size chunks Sep 22, 2018
@karelz
Copy link
Member

karelz commented Oct 1, 2018

Is there anything left here or is it just duplicate of #14753?

@svick
Copy link
Contributor

svick commented Oct 1, 2018

@karelz That part of https://github.com/dotnet/corefx/issues/2146 was closed by the question "Is this really common?", with no response given to the people (including me) who think it is common. So maybe its closing could be reconsidered?

(Though that issue is also a mix of unrelated proposals. Keeping this issue open instead might be the better choice.)

@karelz
Copy link
Member

karelz commented Oct 1, 2018

Alright, I let the new area owners to comment hen they get to it during their incomming triage - @cston @333fred (cc @jaredpar)

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Sep 2, 2020
@eiriktsarpalis
Copy link
Member

Chunking is a useful feature which we should consider adding to System.Linq. My preferred signature would be the following:

public static class Enumerable
{
     public static IEnumerable<T[]> ChunkBy(this IEnumerable<T> source, int size);
}

F# core already has a corresponding chunkBySize function.

@eiriktsarpalis
Copy link
Member

Revisit for .NET 6?

We could take a look. @inputfalken would you be able to update the original post to match the API review template?

@inputfalken
Copy link
Contributor Author

Revisit for .NET 6?

We could take a look. @inputfalken would you be able to update the original post to match the API review template?

@eiriktsarpalis

@eiriktsarpalis eiriktsarpalis added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Dec 3, 2020
@eiriktsarpalis eiriktsarpalis self-assigned this Dec 3, 2020
@MgSam
Copy link

MgSam commented Jan 21, 2021

I think Chunk is a better name than ChunkBy. Especially if the proposed LINQ *By methods get accepted. Those all take a function, whereas this just accepts an integer size.

@inputfalken
Copy link
Contributor Author

inputfalken commented Jan 25, 2021

I think Chunk is a better name than ChunkBy. Especially if the proposed LINQ *By methods get accepted. Those all take a function, whereas this just accepts an integer size.

Sounds fine to me, it would be unified with the Take method.

@R2D221
Copy link

R2D221 commented Jan 25, 2021

MoreLINQ includes this method named as Batch:
https://github.com/morelinq/MoreLINQ/blob/master/MoreLinq/Batch.cs

@terrajobst terrajobst added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jan 26, 2021
@terrajobst
Copy link
Member

terrajobst commented Jan 26, 2021

Video

  • Looks good as proposed, but: does this need to be on IQuerable<T>?
namespace System.Linq
{
    public static class Enumerable
    {
        public static IEnumerable<T[]> Chunk(this IEnumerable<T> source, int size);
    }
}

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Jan 28, 2021

This should have a queryable counterpart, I've updated the proposed API.

@eiriktsarpalis eiriktsarpalis added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-needs-work API needs work before it is approved, it is NOT ready for implementation labels Jan 28, 2021
@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jan 28, 2021
@terrajobst
Copy link
Member

terrajobst commented Jan 28, 2021

Video

  • Looks good as proposed.
  • Note: Every time we add a member to Enumerable that returns IEnumerable<T> we need to add a corresponding member on Queryable that returns IQueryable<T> (and we do have a test for that)
namespace System.Linq
{
    public static class Enumerable
    {
        public static IEnumerable<T[]> Chunk(this IEnumerable<T> source, int size);
    }
    public static class Queryable
    {
        public static IQueryable<T[]> Chunk(this IQueryable<T> source, int size);
    }
}

@eiriktsarpalis eiriktsarpalis modified the milestones: Future, 6.0.0 Jan 28, 2021
@eiriktsarpalis eiriktsarpalis removed their assignment Jan 28, 2021
@eiriktsarpalis eiriktsarpalis added the help wanted [up-for-grabs] Good issue for external contributors label Jan 28, 2021
@inputfalken
Copy link
Contributor Author

@eiriktsarpalis I would like to grab this if possible. 🙂

Would this be the phase where a pull request is created?

@eiriktsarpalis
Copy link
Member

@inputfalken absolutely, happy to accept PRs for this work ☺️

inputfalken added a commit to inputfalken/runtime that referenced this issue Feb 6, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 6, 2021
eiriktsarpalis pushed a commit that referenced this issue Feb 10, 2021
* Add System.Linq Chunk extension method

Fix #27449

* Use explicit types instead of type inference

* Seperate inner and outer loop

* Rename parameter maxSize to size

* Add missing license header

* Remove Chunk.SpeedOpt.cs

* Add tests to verify Chunk works after mutations

* Add/remove before getting enumerator in tests

* Test content of chunk method for IQueryable<T>
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Feb 10, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Mar 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Linq help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.