Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename Iterators.partition #3284

Closed
jariji opened this issue Feb 5, 2023 · 4 comments
Closed

Rename Iterators.partition #3284

jariji opened this issue Feb 5, 2023 · 4 comments
Labels
Milestone

Comments

@jariji
Copy link
Contributor

jariji commented Feb 5, 2023

#3212 adds Iterators.partition for DataFrames.

Splitting a df into SubDataFrames sounds like a useful functionality, but Iterators.partition is supposed to split iterables, and that is not what DataFrame is, so using this function is a pun. I propose partitionrows or something.

@bkamins bkamins added this to the 1.5 milestone Feb 5, 2023
@bkamins
Copy link
Member

bkamins commented Feb 5, 2023

The reason why we call it Iterators.partition is twofold:

  • first to make it clear that we produce Base.Iterators.PartitionIterator object - this is relevant e.g. in Arrow.jl
  • Iterators.partition does not require an iterator, but a collection. By convention data frame is considered a collection of rows (although this collection is not iterable nor indexable, but supports many other functions that support collections like push!, append!, deleteat! etc.)

In summary - given the conventions we use I prefer to keep things as they are now (otherwise we would need to rename all the functions that assume that data frame is collection of rows).

@nalimilan, @quinnj - what do you think

@quinnj
Copy link
Member

quinnj commented Feb 6, 2023

I agree that I think it's fine as-is; it doesn't seem to me to be too bad of a pun to define partition on DataFrame like this; as @bkamins said, even though a DataFrame itself isn't iterable (on purpose, to avoid confusion between iterating rows or columns), it's definitely viewed as a collection, either of rows or columns.

@nalimilan
Copy link
Member

Yes, we generally use the convention that functions operating on rows do not have the "rows" suffix (e.g. deleteat!), but functions operating on columns have the "columns" suffix.

@bkamins
Copy link
Member

bkamins commented Feb 8, 2023

OK - so I am closing this issue and we leave things as they are. Thank you for discussion as it is useful to confirm the design.

@bkamins bkamins closed this as completed Feb 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants