RFC: Add the group_by and group_by_mut methods to slice #2477
I don't like the naming. GROUP BY in SQL and GroupBy in C# work quite differently from what you propose: they map each item to a key and then group all items that have the same key. Something like |
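To make the contrast concrete, here is a minimal sketch. `group_by_key` and `group_runs` are hypothetical helper names, not existing or proposed std APIs. The first collects every item with the same key into one group, SQL/C#-style; the second only merges neighbouring items that satisfy a pairwise predicate, which is what this RFC describes. On `[1, 1, 2, 1]` the first gives two groups and the second gives three.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: SQL/C#-style grouping. Every item with the same key
/// ends up in one group, wherever it appears in the input.
fn group_by_key<T, K, F>(items: &[T], key: F) -> BTreeMap<K, Vec<T>>
where
    T: Copy,
    K: Ord,
    F: Fn(&T) -> K,
{
    let mut groups: BTreeMap<K, Vec<T>> = BTreeMap::new();
    for item in items {
        groups.entry(key(item)).or_default().push(*item);
    }
    groups
}

/// Hypothetical helper: run-based grouping as described in the RFC. A new
/// group starts whenever two neighbouring items fail the predicate.
fn group_runs<T, F>(items: &[T], mut pred: F) -> Vec<&[T]>
where
    F: FnMut(&T, &T) -> bool,
{
    let mut runs = Vec::new();
    let mut start = 0;
    for i in 1..items.len() {
        if !pred(&items[i - 1], &items[i]) {
            runs.push(&items[start..i]);
            start = i;
        }
    }
    if !items.is_empty() {
        runs.push(&items[start..]);
    }
    runs
}
```

The key-based version has to own or copy items into a map; the run-based one only hands out sub-slices of the input.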
I guess, but this is exactly what groupBy does in Haskell, so the name should stay. |
@CodesInChaos the function you describe has the same behavior as groupWith in Haskell. |
I don't like the "My language is more important than yours" tone of this exchange. That said, from my perspective, a name like Assuming that |
This seems pretty similar to the existing I can imagine some cases where you iterate over a list and you can skip some work if some predicate hasn't changed. But in that case, it seems clearer to me to write it as a single loop with that part of the behavior written out with a |
I don't think that's it. There's a good reason to favor a Haskell-based name in the case of iterator methods, functions, and related things, because Rust already uses Haskell-based naming in such cases. For example: There's also precedent from I also think it's a telling and descriptive name. You are grouping elements by where the predicate applies. (Yes, a split happens every time the predicate doesn't match, so the order matters, but if you sort first, it does not. The minor difference can be cleared up in documentation.)
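A sketch of that remark about sorting, using two hypothetical helpers: `runs_by` (run-based grouping with the RFC's semantics) and `sorted_groups`. After sorting, equal elements are contiguous, so each run is exactly the SQL-style group for that value; on unsorted input the same value can appear in several runs.

```rust
/// Hypothetical helper: splits `items` into maximal runs in which every
/// adjacent pair satisfies `same`.
fn runs_by<T, F>(items: &[T], mut same: F) -> Vec<&[T]>
where
    F: FnMut(&T, &T) -> bool,
{
    let mut out = Vec::new();
    let mut rest = items;
    while !rest.is_empty() {
        // Run length: 1 plus the number of adjacent pairs that satisfy the
        // predicate before the first one that does not.
        let len = 1 + rest.windows(2).take_while(|w| same(&w[0], &w[1])).count();
        let (run, tail) = rest.split_at(len);
        out.push(run);
        rest = tail;
    }
    out
}

/// Hypothetical helper: sorting first brings equal values together, so the
/// runs become "all items with the same value" groups.
fn sorted_groups<T: Ord>(items: &mut [T]) -> Vec<&[T]> {
    items.sort();
    runs_by(items, |a, b| a == b)
}
```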
Grouping is a pretty common operation; think of
Generally speaking, at least to me, iterator-style composition makes data flow clearer :) |
A related but different function in the D language: And in Python: So what semantics do we prefer? |
@leonardo-m It seems to me that the semantics of D and Python can be recovered from the proposed semantics in this RFC (which is the same as the semantics in the itertools crate) so I would say that the proposed solution is more general. |
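A sketch of that claim for Python's `itertools.groupby`, which groups consecutive items whose keys are equal: comparing the keys of neighbours turns a key function into the pairwise predicate. `group_runs` and `group_consecutive_by_key` are hypothetical names used only for illustration.

```rust
/// Hypothetical stand-in for the proposed method: maximal runs in which every
/// adjacent pair satisfies `pred`.
fn group_runs<T, F>(items: &[T], mut pred: F) -> Vec<&[T]>
where
    F: FnMut(&T, &T) -> bool,
{
    let mut runs = Vec::new();
    let mut rest = items;
    while !rest.is_empty() {
        // End of the current run: the first index whose pair with its
        // predecessor fails the predicate, or the end of the slice.
        let end = (1..rest.len())
            .find(|&i| !pred(&rest[i - 1], &rest[i]))
            .unwrap_or(rest.len());
        runs.push(&rest[..end]);
        rest = &rest[end..];
    }
    runs
}

/// Hypothetical helper, Python `itertools.groupby`-style: consecutive items
/// with equal keys form one group, reported together with that key.
fn group_consecutive_by_key<'a, T, K, F>(items: &'a [T], mut key: F) -> Vec<(K, &'a [T])>
where
    K: PartialEq,
    F: FnMut(&T) -> K,
{
    group_runs(items, |a, b| key(a) == key(b))
        .into_iter()
        // Every run is non-empty, so run[0] always exists.
        .map(|run| (key(&run[0]), run))
        .collect()
}
```

The reverse direction does not work in general: a predicate such as "b is a + 1" cannot be rewritten as equality of some per-item key, which is why the binary form is the more general one.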
To me, it's not even sufficiently clear what the RFC is proposing. The name The reference implementation of Another problem with providing an implementation and no specification is that I have no idea what the guarantees/requirements are for the predicate. Maybe this is a silly question, but am I guaranteed that it gets invoked with (a[0], a[1]) then (a[1], a[2]) and so on in that order exactly once each, or is the implementation allowed to do things like (a[0], a[1]) then (a[0], a[2]) and so on? I assume everyone's assuming the former, but since there isn't a single example or test of a predicate other than equality ( Which is all a really long way of saying I don't think we can properly bikeshed the name quite yet. In particular, I think we need to see some examples of more interesting predicates (I can't come up with any) to figure out whether "groups", "runs" or "splitting" is the least misleading term for what's going on. |
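Two things a sketch can show here, assuming the pairwise, left-to-right call order described in the RFC text (`group_runs` is a hypothetical stand-in, not the proposed method): a predicate that is not an equality, `|a, b| b - a == 1`, which cuts a sorted list of integers into ranges of consecutive numbers, and a recording predicate that makes the order of calls visible.

```rust
/// Hypothetical stand-in for the proposed method. `pred` is called exactly
/// once per adjacent pair, left to right: (items[0], items[1]),
/// (items[1], items[2]), and so on.
fn group_runs<T, F>(items: &[T], mut pred: F) -> Vec<&[T]>
where
    F: FnMut(&T, &T) -> bool,
{
    let mut runs = Vec::new();
    let mut start = 0;
    for i in 1..items.len() {
        if !pred(&items[i - 1], &items[i]) {
            // The pair (i - 1, i) fails: close the current run before i.
            runs.push(&items[start..i]);
            start = i;
        }
    }
    if !items.is_empty() {
        runs.push(&items[start..]);
    }
    runs
}
```

With this implementation `[1, 2, 3, 7, 8, 10]` and `|a, b| b - a == 1` yield `[1, 2, 3]`, `[7, 8]` and `[10]`, and grouping `[1, 1, 2]` by `==` calls the predicate on `(1, 1)` and then `(1, 2)`, never on a non-adjacent pair.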
Have a look at the few tests I wrote in my allocation version, here. |
@Ixrec I would expect the behavior to be equivalent to https://docs.rs/itertools/0.7.8/itertools/trait.Itertools.html#method.group_by except that you get a slice as the element of the returned iterator instead of an @phaazon What's the prior art for @Kerollmops Small library additions such as the one proposed in this RFC have historically been accepted with a PR against |
@Centril About
|
@Centril Interesting, the itertools method expects an What other use cases does this proposal hope to support by generalizing to a |
(Also, not related to the actual discussion, this is more meta about the commit: @Kerollmops, you seem to have pushed this commit with a company email address. I think you should check twice and possibly rebase with your OSS identity – your commit is also not verified). |
@Ixrec I would provide the following hierarchy:
The latter two can be implemented in terms of the first one. I would not use the name EDIT: The use case for the most general version is when you have some notion of equality other than the natural one ( |
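For example, with a notion of equality other than `==` (a sketch; `group_runs` is again a hypothetical stand-in for the proposed method), ASCII case-insensitive comparison groups neighbouring words that differ only in case. A key-based API would typically express this by building a lowercase copy of every word.

```rust
/// Hypothetical stand-in for the proposed method: maximal runs in which every
/// adjacent pair satisfies `pred`.
fn group_runs<T, F>(items: &[T], mut pred: F) -> Vec<&[T]>
where
    F: FnMut(&T, &T) -> bool,
{
    let mut runs = Vec::new();
    let mut rest = items;
    while !rest.is_empty() {
        // Extend the run while each element still "equals" its predecessor.
        let mut len = 1;
        while len < rest.len() && pred(&rest[len - 1], &rest[len]) {
            len += 1;
        }
        let (run, tail) = rest.split_at(len);
        runs.push(run);
        rest = tail;
    }
    runs
}
```

Called as `group_runs(&words, |a, b| a.eq_ignore_ascii_case(b))` on `["apple", "Apple", "banana", "BANANA", "apple"]`, it yields three groups; the final `"apple"` stays separate because only neighbours are compared.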
I think it would be confusing to have |
Yeah, split is absolutely not the verb, even if group is also not the verb (though I think group is the best verb here). |
Ruby calls this chunk — https://ruby-doc.org/core-2.5.1/Enumerable.html#method-i-chunk Of course, that’s already taken. Would it be too confusing if it were
Now... that’s probably not a huge issue, because (in Rust :) nobody will accidentally treat the result of group_by as a HashMap? |
It seems like I'm the only one that finds it unintuitive for |
@Ixrec - I also find For fun, I asked someone nearby who isn't familiar with any of the languages in question, and they felt pretty strongly that Now, there's already |
I'd go with the (That is, assuming the "contiguous runs" interpretation which would preserve lazy evaluation of the iterator. Otherwise, perhaps something in the vein of |
Actually Remember that The only question left is one you already raised: assuming the first slice starts with (a[0], a[1]), do we keep checking the next element against the start of the slice or against the latest element of the slice? |
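A sketch of the two options, with hypothetical helpers `runs_vs_previous` (compare each element with its predecessor, matching the pairs `slice[0]`/`slice[1]`, then `slice[1]`/`slice[2]` listed in the RFC text) and `runs_vs_first` (compare each element with the first element of the current run). They agree whenever the predicate is an equivalence relation such as `==`, and differ for a predicate like "differs by at most 1".

```rust
/// Hypothetical helper: compares each element with its immediate predecessor.
fn runs_vs_previous<T, F>(items: &[T], mut pred: F) -> Vec<&[T]>
where
    F: FnMut(&T, &T) -> bool,
{
    let mut runs = Vec::new();
    let mut start = 0;
    for i in 1..items.len() {
        if !pred(&items[i - 1], &items[i]) {
            runs.push(&items[start..i]);
            start = i;
        }
    }
    if !items.is_empty() {
        runs.push(&items[start..]);
    }
    runs
}

/// Hypothetical helper: compares each element with the first element of the
/// run it would join.
fn runs_vs_first<T, F>(items: &[T], mut pred: F) -> Vec<&[T]>
where
    F: FnMut(&T, &T) -> bool,
{
    let mut runs = Vec::new();
    let mut start = 0;
    for i in 1..items.len() {
        if !pred(&items[start], &items[i]) {
            runs.push(&items[start..i]);
            start = i;
        }
    }
    if !items.is_empty() {
        runs.push(&items[start..]);
    }
    runs
}
```

On `[1, 2, 3, 4]` with `|a, b| (b - a).abs() <= 1`, the predecessor version keeps a single run, while the first-element version splits it into `[1, 2]` and `[3, 4]`.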
Ok, a few interesting alternatives from the thesaurus:
|
I'd go with Aside from having no precedent I'm aware of, it's too focused on process rather than effect. "Snipping" is what you do to accomplish a goal like "trimming" or "splitting" rather than a goal in itself and says nothing about which snipped pieces, if any, will be retained or discarded. (i.e., are you "snipping [something] off" or "snipping [things] apart"?) Therefore, it's neither as intuitive as is ideal when looking through a list of methods with a goal in mind, nor obvious about what it does when looking at a use of it in code. It also generally has a "too informal to fit in with the other terminology" feel to it. |
The motivation here reads to me like "this makes it easier to do what this does", which I don't find persuasive. Why is it common to have this problem, to have the input in exactly the form needed for this proposed implementation, and to not just want the eager @Ixrec You're definitely not alone; I would absolutely expect "group by" to have the relational algebra meaning as well. |
Well, this works in a no_alloc situation, for one. |
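To illustrate the no-allocation point, a minimal sketch of a run iterator (`Runs` is a hypothetical name, not the `GroupBy` type from the PR). It stores only the unvisited tail of the slice and the predicate, and every item it yields is a sub-slice of the input, so iterating never touches the heap; it also relies only on items that exist in `core`.

```rust
/// Hypothetical allocation-free iterator over maximal runs of a slice.
pub struct Runs<'a, T, F> {
    rest: &'a [T],
    pred: F,
}

impl<'a, T, F> Runs<'a, T, F>
where
    F: FnMut(&T, &T) -> bool,
{
    pub fn new(slice: &'a [T], pred: F) -> Self {
        Runs { rest: slice, pred }
    }
}

impl<'a, T, F> Iterator for Runs<'a, T, F>
where
    F: FnMut(&T, &T) -> bool,
{
    type Item = &'a [T];

    fn next(&mut self) -> Option<Self::Item> {
        // Copy the shared slice reference out so the yielded run keeps lifetime 'a.
        let rest = self.rest;
        if rest.is_empty() {
            return None;
        }
        // Walk forward while neighbouring elements satisfy the predicate.
        let mut len = 1;
        while len < rest.len() && (self.pred)(&rest[len - 1], &rest[len]) {
            len += 1;
        }
        let (run, tail) = rest.split_at(len);
        self.rest = tail;
        Some(run)
    }
}
```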
I suppose
|
Arguably, these should be called You can almost implement
It doesn't quite work, however, since |
I think something like this can correctly emulate the current
```rust
pub fn split<F>(&self, pred: F) -> impl Iterator<Item=&[T]>
where
    F: FnMut(&T) -> bool,
{
    self.group_by(|a, _| pred(a))
        .enumerate()
        .map(|(i, slice)| {
            match i {
                0 => slice,
                _ => slice[1..],
            }
        })
}
```
EDIT: this code ^ is wrong! But this is not really the subject here. The real discussion is: do we really want to add this method to the standard library? And I did not understand what you mean with the |
I think our discussion of deriving As I said, it's unfortunate AFAIK, you'd need the lifetime so the return can borrow both
In principle, the new |
Apologies about the immediate bike-shedding. In terms of the actual functionality, I've found the corresponding functionality pretty useful in Ruby-land for segmenting (or chunking, or grouping, if you will :) line-based output into logical chunks more easily. It's not a great day when you're stuck parsing the output of some random command line tool, but So, it's been useful for me in situations where sorting beforehand doesn't make sense. |
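A sketch of that kind of use, assuming command output in which each record starts with an unindented line followed by indented detail lines (the sample text and the `chunk_output` helper are invented for illustration). It is the same run-splitting idea with a predicate that only looks at the next line, and no sorting is involved.

```rust
/// Hypothetical helper: splits `lines` into records, where each record is an
/// unindented header line followed by its indented continuation lines.
/// Equivalent to run-grouping with the predicate `|_, next| next.starts_with(' ')`.
fn chunk_output<'a, 'b>(lines: &'a [&'b str]) -> Vec<&'a [&'b str]> {
    let mut chunks = Vec::new();
    let mut start = 0;
    for i in 1..lines.len() {
        // An unindented line begins a new logical record.
        if !lines[i].starts_with(' ') {
            chunks.push(&lines[start..i]);
            start = i;
        }
    }
    if !lines.is_empty() {
        chunks.push(&lines[start..]);
    }
    chunks
}
```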
I think the immediate bikeshedding is a sign that people don't have any larger issues with the proposal... |
Ok, so the only blocking thing about this RFC seems to be the chosen name ( @uberjay proposes It seems that the only way I can make this RFC move forward is by changing the name from I am not convinced about this renaming, as @mark-i-m says:
So I propose to tag this RFC as |
Totally agree -- it'd be great to have this, regardless of what it's named! I've run across a couple times where it would have come in handy over the past couple months, even! |
As the RFC seems to be stuck, I will ping someone from the Library team to make a decision. |
I propose to rename the method Note that the The same pattern can be found on other types like |
Nominating for discussion in the next libs team meeting, to get the process un-stuck after the various bikeshed-painting. :) |
@joshtriplett When does the next libs team meeting occur? |
For those interested, I have made a temporary library, which I will not publish to crates.io, that provides a workaround until this RFC/PR is merged. It compiles on stable Rust; the version that will be merged will use unstable functions to improve performance and clarity (i.e. offset_from).
```rust
let slice = &[1, 1, 1, 3, 3, 2, 2, 2];
let mut iter = GroupBy::new(slice, |a, b| a == b);
assert_eq!(iter.next(), Some(&[1, 1, 1][..]));
assert_eq!(iter.next(), Some(&[3, 3][..]));
assert_eq!(iter.next(), Some(&[2, 2, 2][..]));
assert_eq!(iter.next(), None);
```
|
Sorry we dropped the ball on this one. We've actually just recently restarted Libs meetings, so now, two years later we can say that for API additions like this we actually don't typically need an RFC and will happily land an unstable implementation. That wasn't widely understood by the wider Rust org at the time though. If anybody would like to resubmit rust-lang/rust#51606 we'll be happy to land this API as unstable. Sorry for leaving this for so long and thank you for putting all this together @Kerollmops 🙇 |
Thank you @KodrAus for your time. I understand that it is very hard to follow all of these PRs and RFCs, so don't worry: I created my own library that supports much more than just sequential groups (i.e. linear, binary, and exponential group search). |
The return of the GroupBy and GroupByMut iterators on slice
According to rust-lang/rfcs#2477 (comment), I am opening this PR again. This time I implemented it in safe Rust only, so it is much easier to read and completely safe.
This PR proposes to add two new methods to the slice, `group_by` and `group_by_mut`. These two methods provide a way to iterate over non-overlapping sub-slices of a base slice that are separated by the predicate given by the user (e.g. `PartialEq::eq`, `|a, b| a.abs() < b.abs()`).
```rust
let slice = &[1, 1, 1, 3, 3, 2, 2, 2];
let mut iter = slice.group_by(|a, b| a == b);
assert_eq!(iter.next(), Some(&[1, 1, 1][..]));
assert_eq!(iter.next(), Some(&[3, 3][..]));
assert_eq!(iter.next(), Some(&[2, 2, 2][..]));
assert_eq!(iter.next(), None);
```
[An RFC](rust-lang/rfcs#2477) was opened 2 years ago but wasn't necessary.
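The description only demonstrates `group_by`, so here is a minimal sketch of how a mutable counterpart can be written in safe Rust (`RunsMut` is a hypothetical name, not the type from the PR). The trick is `std::mem::take`, which moves the remaining `&mut [T]` out of the iterator so that `split_at_mut` can hand out a run that lives for the whole borrow.

```rust
/// Hypothetical mutable run iterator: yields disjoint `&mut [T]` runs.
pub struct RunsMut<'a, T, F> {
    rest: &'a mut [T],
    pred: F,
}

impl<'a, T, F> RunsMut<'a, T, F>
where
    F: FnMut(&T, &T) -> bool,
{
    pub fn new(slice: &'a mut [T], pred: F) -> Self {
        RunsMut { rest: slice, pred }
    }
}

impl<'a, T, F> Iterator for RunsMut<'a, T, F>
where
    F: FnMut(&T, &T) -> bool,
{
    type Item = &'a mut [T];

    fn next(&mut self) -> Option<Self::Item> {
        if self.rest.is_empty() {
            return None;
        }
        // Measure the run using only shared access to the elements.
        let mut len = 1;
        while len < self.rest.len() && (self.pred)(&self.rest[len - 1], &self.rest[len]) {
            len += 1;
        }
        // Leave an empty slice behind, split the owned borrow, keep the tail.
        let rest = std::mem::take(&mut self.rest);
        let (run, tail) = rest.split_at_mut(len);
        self.rest = tail;
        Some(run)
    }
}
```

Because the runs are disjoint, a caller can rewrite each one in place, for example replacing every element of a run with the run's length.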
It looks like people are discussing two related but different functions in this issue:
|
Would |
This is now being tracked in rust-lang/rust#80552 @CodesInChaos I've captured your concern about naming in that issue. We can continue discussion over there! Thanks everybody who contributed here. |
This RFC proposes to add two new methods to the slice, `group_by` and `group_by_mut`. These will provide a way to iterate over non-overlapping sub-slices of a base slice that are separated by the predicate given by the user (e.g. `PartialEq::eq`, `|a, b| a < b`). The predicate is called on two consecutive elements: first on `slice[0]` and `slice[1]`, then on `slice[1]` and `slice[2]`, and so on.
Pending Pull Request
Work around temporary library
Rendered