Add par_extend, partition, unzip, and unzip_into #326
@nikomatsakis In the wiki you said that |
The nightly failures are rust-lang/rust#41479. It does work with |
src/iter/unzip.rs
fn opt_len(&mut self) -> Option<usize> {
    self.base.opt_len()
}
This one got me thinking. It's fine to leak the source-iterator's index-ness through opt_len() here, and it should result in the optimization we want. However, with an eye on moving that mechanism to specialization, I can't actually implement a true IndexedParallelIterator for this!
I could implement IndexedParallelIterator::drive(), but not with_producer() because of the iterator requirement in Producer. We need to stay in "push" mode here to feed values into multiple consumers, but once we flip to an iterator we're left pulling. And that means we can't support zip et al. on this -- which I wouldn't expect in a par_extend() implementation anyway, but with specialization one could encounter that.
Maybe this is a distinction which deserves to split up the traits again. An IndexedParallelIterator that can drive(), enumerate(), etc., and a subset ProducableParallelIterator that supports with_producer(), zip(), etc.
Ugh.
(Incidentally, this "push vs. pull" is basically why Iterator::unzip requires Extend instead of FromIterator.)
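The "push vs. pull" remark can be illustrated with a small sequential sketch using only std. The helper unzip_by_pushing is hypothetical and written for this illustration; it is not Rayon's or std's implementation. It shows why Extend fits: each pair is split once and pushed into two sinks that already exist, whereas FromIterator would require handing each container its own iterator to pull from.

```rust
/// Hypothetical helper (not part of std or Rayon): unzip by *pushing* each
/// half of a pair into its own sink. `Extend` accepts items incrementally
/// into an existing container, which is what makes this possible.
fn unzip_by_pushing<I, A, B, CA, CB>(pairs: I) -> (CA, CB)
where
    I: IntoIterator<Item = (A, B)>,
    CA: Default + Extend<A>,
    CB: Default + Extend<B>,
{
    let mut left = CA::default();
    let mut right = CB::default();
    for (a, b) in pairs {
        // `Option<T>` is a one-item iterator, so this pushes a single value.
        left.extend(Some(a));
        right.extend(Some(b));
    }
    (left, right)
}

/// The std method has the same shape: both targets are `Default + Extend`.
fn std_unzip_demo() -> (Vec<u32>, String) {
    vec![(1u32, 'a'), (2, 'b')].into_iter().unzip()
}
```

Both functions accept two different container types for the two halves, since each side only needs Default and Extend.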
Interesting. It's also worth considering whether using specialization is worth it versus the current opt_len() trick.
There's also |
I merged that common functionality, generalized |
Added |
This is awesome. I had plumb forgot that we still had a few things unchecked (and that they included such generally useful things as unzip() and partition()!). We should open issues for the remainder (which just appears to be scan() and cycle() -- scan() may just not make sense, cycle() seems achievable though).
src/iter/unzip.rs
/// Unzip an `IndexedParallelIterator` into two arbitrary `Consumer`s.
pub fn unzip_indexed<I, A, B, CA, CB>(pi: I, left: CA, right: CB)
Are these intended to be Rayon public APIs? (I'd appreciate a comment if not.)
src/iter/collect/mod.rs
@@ -39,6 +40,26 @@ fn special_extend<I, T>(pi: I, len: usize, v: &mut Vec<T>)
    collect.complete();
}

/// Unzips the results of the exact iterator into the specified vectors.
pub fn unzip_into<I, A, B>(mut pi: I, left: &mut Vec<A>, right: &mut Vec<B>)
Are these intended to be Rayon public APIs? (I'd appreciate a comment if not.)
src/iter/partition.rs
{
    let mut result = None;
    {
        // Now it's time to find the consumer for non-matching items
Clever trick, I will say. I wonder if we can use some internal traits + generics to "package up" this pattern rather than repeating it. It would probably make things less clear, although it would provide a good place to document what is going on.
src/iter/unzip.rs
@@ -1,15 +1,24 @@
use super::internal::*;
use super::*;

trait UnzipOp<T>: Sync {
I think perhaps you were a step ahead of me. Good place for comments, though!
src/iter/mod.rs
    where C: Default + ParallelExtend<Self::Item>,
fn partition<A, B, P>(self, predicate: P) -> (A, B)
    where A: Default + ParallelExtend<Self::Item>,
          B: Default + ParallelExtend<Self::Item>,
Yay! My only hesitation is that it means that transitioning to Rayon may require more type annotations. E.g., with seq iterators, people can do:
let (x, y): (Vec<_>, _) = iter.partition();
Maybe we should add another method for this case? (We could then propose the same method for seq iterators, I suppose.) But I don't know what would be a good name.
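The annotation concern can be seen in a sequential sketch using only std. Iterator::partition returns (B, B), so naming one side fixes both. The function partition_two below is hypothetical: it mimics the two-type signature shown in the diff, with std's Extend standing in for ParallelExtend, and with two independent type parameters each side has to be pinned down by the caller.

```rust
use std::collections::BTreeSet;

/// std: both halves share one type parameter, so `(Vec<_>, _)` is enough.
fn std_partition() -> (Vec<u32>, Vec<u32>) {
    let (even, odd): (Vec<_>, _) = (0..6u32).partition(|&n| n % 2 == 0);
    (even, odd)
}

/// Hypothetical sequential analogue of the two-type `partition` signature:
/// `A` and `B` are independent, so inference needs both to be determined.
fn partition_two<I, A, B, P>(items: I, mut predicate: P) -> (A, B)
where
    I: IntoIterator,
    A: Default + Extend<I::Item>,
    B: Default + Extend<I::Item>,
    P: FnMut(&I::Item) -> bool,
{
    let mut matched = A::default();
    let mut rest = B::default();
    for item in items {
        if predicate(&item) {
            matched.extend(Some(item));
        } else {
            rest.extend(Some(item));
        }
    }
    (matched, rest)
}

/// The payoff of two type parameters: different collections for each side.
fn two_type_demo() -> (Vec<u32>, BTreeSet<u32>) {
    partition_two(vec![3u32, 1, 4, 1, 5, 9, 2, 6], |&n| n % 2 == 0)
}
```

With partition_two, writing (Vec<_>, _) at the call site would leave the second container type undetermined, which is the compile-time ambiguity under discussion.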
src/iter/mod.rs
/// Partitions and maps the items of a parallel iterator into a pair of
/// arbitrary `ParallelExtend` containers. `Either::Left` items go into
/// the first container, and `Either::Right` items go into the second.
fn partition_map<A, B, P, L, R>(self, predicate: P) -> (A, B)
This is meant for parity with itertools, I guess? Makes sense. I was wondering if we should link directly to itertools, so we can use the same Either type, but it's problematic that it is at only 0.6 I suppose, and it seems a bit silly to add an itertools dependency just for Either.
(cc @bluss -- as an aside, any reason itertools can't be 1.0?)
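As a rough sequential sketch of what partition_map does, the following uses only std. The Either enum here is a local stand-in defined for this example, not the type from any crate, and the free function partition_map is a hypothetical analogue of the method in the diff, with Extend in place of ParallelExtend.

```rust
/// Local stand-in for the `Either` type discussed above (an assumption made
/// for this sketch, not an import from any crate).
enum Either<L, R> {
    Left(L),
    Right(R),
}

/// Hypothetical sequential analogue of `partition_map`: the closure maps
/// each item to `Left` or `Right`, and the payload goes to the matching sink.
fn partition_map<I, A, B, P, L, R>(items: I, mut predicate: P) -> (A, B)
where
    I: IntoIterator,
    A: Default + Extend<L>,
    B: Default + Extend<R>,
    P: FnMut(I::Item) -> Either<L, R>,
{
    let mut left = A::default();
    let mut right = B::default();
    for item in items {
        match predicate(item) {
            Either::Left(l) => left.extend(Some(l)),
            Either::Right(r) => right.extend(Some(r)),
        }
    }
    (left, right)
}

/// Example: split inputs into parsed integers and the strings that failed.
fn parse_demo() -> (Vec<i32>, Vec<String>) {
    partition_map(vec!["1", "x", "-7"], |s: &str| match s.parse::<i32>() {
        Ok(n) => Either::Left(n),
        Err(_) => Either::Right(s.to_string()),
    })
}
```

Unlike plain partition, the two sides can carry different item types (here i32 and String), which is why the signature has the extra L and R parameters.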
So the only real question that came up in my review is whether to have (I have definitely needed two distinct collection types in the past and been annoyed that I can't have it.) |
Yeah, I'm ok with the possible ambiguity, since it would only bite at compile time, not a runtime surprise, and it's simple to correct. I'll add more comments where requested.
r=me, after rebase
de19391
to
82ca764
Sorry for the very late reply 😄. Itertools is working towards 1.0, it doesn't feel very far off.
That said we don't have a "tracking issue" for it. One reason might be that if std adopts anything from itertools using the same method name, we have a problem.
ParallelExtend mirrors std::iter::Extend, with one method par_extend. It works basically like FromParallelIterator, except it adds to an existing container instead of creating a new one. In fact, most of the FromParallelIterators we implement now just create an empty container and then par_extend it.
ParallelIterator::partition mirrors Iterator::partition, separating the iterator into two ParallelExtend containers based on the user's predicate function.
ParallelIterator::unzip mirrors Iterator::unzip, separating an iterator over (A, B) into two ParallelExtend containers, one with just A and the other with just B.
IndexedParallelIterator::unzip_into is analogous to collect_into, writing directly into the target vectors.