Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it necessary to collect non-linear collections into Vec? #358

Closed
torkleyy opened this issue Jun 8, 2017 · 3 comments
Closed

Is it necessary to collect non-linear collections into Vec? #358

torkleyy opened this issue Jun 8, 2017 · 3 comments

Comments

@torkleyy
Copy link
Contributor

torkleyy commented Jun 8, 2017

My issue is mainly referring to this file.

I'm asking myself if it'd be safe for collections which only modify the element requested, nothing else to be shared mutably with an UnsafeCell. I'm thinking about something like this:

/// Marker trait which guarantees multiple
/// mutable borrows with distinct elements don't conflict in any way.
unsafe trait Distinct {}

struct SuperUnsafe<T> {
    inner: &'UnsafeCell<T>,
}

unsafe impl<T> Send for SuperUnsafe<T> where T: Distinct {}

And then accessing distinct elements from multiple threads.

Is this UB or is there another reason for example the HashMap gets collected into a Vec?

@cuviper
Copy link
Member

cuviper commented Jun 8, 2017

Note that if you're just trying to modify the map, par_iter_mut() does leave the actual items in place. We only collect the (&K, &mut V) references into a vector, and then iterate that in parallel.

The reason we use a vector is more about how to split those collections between threads. We just don't have any access to their internals to manage the kind of random access we need, to say e.g. this thread processes the first chunk of the map, this thread the second chunk, etc. We have no visibility into how the map is laid out at all.

Generally, the only views we have of arbitrary collections are their public iterators. We should someday have a way to parallelize arbitrary iterators (see #46 and #312), but I suspect that will require a lot more synchronization than "native" collection support could. We'll have to compare benchmarks using that Iterator adapter versus the current way of collecting a vector up-front for hash-maps etc., and if it's worth it we can switch their implementation.

I did some experiments with forking a new hash map here: rayon-hash. It's a snapshot of the libstd HashMap and HashSet, but with local access I can add functions to split references to the underlying raw table for separate thread access. The results from the simple benchmark are really nice, but I'm not sure how to properly proceed with this. Maybe I should publish that as a standalone crate and see if it gets any traction.

@torkleyy
Copy link
Contributor Author

torkleyy commented Jun 9, 2017

We just don't have any access to their internals to manage the kind of random access we need

Ah, makes sense. From what you're saying, I'm assuming that it is indeed correct, right? The reason I read the source code was that while reviewing a PR, I wasn't sure if this is correct at all. I wanted to look at other crazes, and rayon happened to be the first one that came into my mind. And given that we have an additional (hierarchical) bit set for the keys, it should work for Specs (a crate which is boosted by this great library!).

Thanks @cuviper!

@torkleyy
Copy link
Contributor Author

torkleyy commented Jun 9, 2017

Thanks to the great support on Gitter, my question could be answered. Closing.

@torkleyy torkleyy closed this as completed Jun 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants