
Meta Issue: Support for parallelized/blocked algorithms #89

Closed
8 tasks done
kernelmachine opened this issue Feb 26, 2016 · 22 comments

@kernelmachine

kernelmachine commented Feb 26, 2016

What are your thoughts on implementing something similar to http://dask.pydata.org/en/latest/ on top of ndarrays? I suspect parallelized computations on submatrices should be pretty natural to do in the Rust framework, and it seems you've already created sub-array view functions. Do you agree?

(Community Edits below)


Actionable sub-issues:

@kernelmachine kernelmachine changed the title Support for threaded/blocked algorithms Support for parallelized/blocked algorithms Feb 26, 2016
@bluss
Member

bluss commented Feb 26, 2016

The goal is absolutely to be able to support a project like that. Iterators already provide chunking in inner_iter, outer_iter, axis_iter, axis_chunks_iter and their mut counterparts. We also want to add just a few more split_at-like interfaces to support easy chunking like this.
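As a minimal sketch of that kind of chunked iteration (assuming a recent ndarray; the chunk size of 4 rows and the per-block sum are arbitrary illustrations, not something from this comment):

use ndarray::{Array2, Axis};

fn main() {
    let a = Array2::<f64>::ones((10, 10));
    // Visit blocks of up to 4 rows and reduce each block independently;
    // the mut counterparts give the same chunking over mutable views.
    let block_sums: Vec<f64> = a
        .axis_chunks_iter(Axis(0), 4)
        .map(|block| block.sum())
        .collect();
    println!("{:?}", block_sums);
}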

@bluss
Member

bluss commented Feb 26, 2016

Integrating with https://github.com/nikomatsakis/rayon would be pretty exciting too.

@kernelmachine
Author

Yup, that's exactly my thought! Would love to work on this, if you're interested in collaborating.


@kernelmachine
Author

Also on the subject of integrations, I've been writing a crate that wraps Lapack/BLAS with high level, easy to use functions, inspired by the hmatrix library in Haskell. Focus is on compile-time, descriptive error checking, enumerated matrix types, and an easy interface. I wrote my own (simple) matrix representation for the project, but it actually seems way better to build the crate on top of ndarray.

How actively are you working on the BLAS integration I see in the docs? Would love to exchange notes.

@bluss
Member

bluss commented Feb 26, 2016

Not very actively, but it's the thing I must solve now. Not sure if ndarray wants to continue with rblas or use more raw BLAS bindings.

One problem is specialization, i.e. how to dispatch to BLAS for element types f32 and f64 while still supporting other array element types. Rust will gain specialization down the line, but as things look now, we can do some dispatch using Any instead. Which is fine; it just adds that Any bound.

@bluss
Member

bluss commented Feb 26, 2016

Note that Any allows static (compile time) dispatch on the element type.
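A minimal sketch of that Any-based dispatch pattern (illustrative names and operation, not ndarray's actual code): a generic routine that routes f64 data to an optimized path, a BLAS call in ndarray's case, and everything else to a generic fallback.

use std::any::{Any, TypeId};
use std::ops::Mul;

// Generic fallback that works for any element type.
fn scale_fallback<A: Copy + Mul<Output = A>>(data: &mut [A], factor: A) {
    for x in data.iter_mut() {
        *x = *x * factor;
    }
}

// Stand-in for an optimized path (a BLAS scal call, say).
fn scale_f64_optimized(data: &mut [f64], factor: f64) {
    for x in data.iter_mut() {
        *x *= factor;
    }
}

// The TypeId comparison is a constant for each concrete A, so the branch
// is resolved after monomorphization; the Any bound (which implies
// 'static) is what makes TypeId::of::<A>() available.
pub fn scale<A: Copy + Mul<Output = A> + Any>(data: &mut [A], factor: A) {
    if TypeId::of::<A>() == TypeId::of::<f64>() {
        // We just checked that A is f64, so this reinterpretation is sound.
        let data_f64 = unsafe {
            std::slice::from_raw_parts_mut(data.as_mut_ptr() as *mut f64, data.len())
        };
        let factor_f64 = *(&factor as &dyn Any).downcast_ref::<f64>().unwrap();
        scale_f64_optimized(data_f64, factor_f64);
    } else {
        scale_fallback(data, factor);
    }
}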

@bluss
Member

bluss commented Feb 26, 2016

As a high-level library, ndarray has the strain that comes from supporting a much more general data layout than BLAS does. So we must always have both the optimized code and the fallback code present for everything.

@bluss
Member

bluss commented Feb 28, 2016

More splitting coming up #94

@kernelmachine
Author

Awesome. I'll look into rayon integration via these split_at functions.

Yeah, regarding the BLAS float issue, the Any bound was my solution as well. When initializing the matrix I just tried to cast any ints to floats, and returned an error otherwise.

@bluss
Member

bluss commented Mar 16, 2016

Can you make this issue more concrete? Ndarray will not aim to develop or host a project that is similar to Dask, but we can make sure it can be built with ndarray.

More low-level methods have been exposed since this issue was reported (see the 0.4.2 release).

Maybe more concrete issues can be filed for missing functionality.

@kernelmachine
Author

Sure. I think this issue comes down to an integration between ndarray and rayon. We should be able to apply basic parallelized computations on an array of subviews, and aggregate/reduce. This interface could be generic, or we could focus on a few specialized computations, like elementwise operations or selections.

@bluss
Member

bluss commented Mar 20, 2016

Yeah.

Here's a very basic experiment with that (only elementwise ops)

https://github.com/bluss/rust-ndarray-experimental/blob/master/src/lib.rs

  1. One important thing is of course to split along whichever axis has the greatest stride.
  2. There was a significant discovery here related to the just-merged unstable specialization feature. You can seamlessly special-case the thread-safe vs. non-thread-safe situation and use rayon only when the operation is thread safe (Sync/Send as appropriate)!
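For reference, a minimal sketch of that split_at plus rayon::join recursion, reduced to elementwise ops on an f64 view. It always splits along Axis(0) for brevity (rather than the axis with the greatest stride), and the name par_apply_inplace and the 1024-element threshold are illustrative only:

use ndarray::{ArrayViewMut2, Axis};

fn par_apply_inplace<F>(mut view: ArrayViewMut2<'_, f64>, f: &F)
where
    F: Fn(&mut f64) + Sync,
{
    if view.len() <= 1024 || view.len_of(Axis(0)) < 2 {
        // Small enough (or no longer splittable along this axis): run sequentially.
        view.map_inplace(f);
    } else {
        // Halve the view and process both halves on the rayon thread pool.
        let mid = view.len_of(Axis(0)) / 2;
        let (left, right) = view.split_at(Axis(0), mid);
        rayon::join(|| par_apply_inplace(left, f), || par_apply_inplace(right, f));
    }
}

fn main() {
    let mut a = ndarray::Array2::<f64>::ones((2048, 64));
    par_apply_inplace(a.view_mut(), &|x| *x = x.exp());
    assert!((a[[0, 0]] - 1f64.exp()).abs() < 1e-12);
}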

@bluss
Member

bluss commented Dec 14, 2016

We need to break this down into specific sub-issues so that we can get each piece done in turn.

I'm editing the first comment of this issue. This is a good thing: it means that both you @pegasos1 and I can edit the same task list.

@bluss bluss changed the title Support for parallelized/blocked algorithms Meta Issue: Support for parallelized/blocked algorithms Dec 14, 2016
@kernelmachine
Author

We just need to implement the parallel iterator trait, right? Beyond tests and stuff, what else is there?

@bluss
Member

bluss commented Dec 23, 2016

Parallel map is a bit tricky (the Array::map(f) -> Array case), but I have a work in progress for that.

@bluss
Member

bluss commented Dec 23, 2016

There's also the question of interface. You have championed the parallel wrapper for array types before, I think.

With parallel wrappers it could be something like:

use ndarray::parallel::par;
par(&mut array).map_inplace(|x| *x = x.exp());

or parallel array view types

array.par_view_mut().map_inplace(|x| *x = x.exp());

We could use wrapper/builder types for the closure instead:

use ndarray::parallel::par;
array.map_inplace(par(|x| *x = x.exp()));

or separate methods:

array.par_map_inplace(|x| *x = x.exp());

What is possible with specialization is to transparently parallelize regular Array::map_inplace calls, but that is too magical; I don't think we want that.
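For what it's worth, the separate-methods flavor is what recent ndarray versions expose behind the rayon crate feature (par_map_inplace and friends), so a usage sketch of that option looks like this (the exp closure is just an illustration):

use ndarray::Array2;

fn main() {
    let mut array = Array2::<f64>::ones((1024, 1024));
    // Parallel elementwise update over the whole array.
    array.par_map_inplace(|x| *x = x.exp());
}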

@iduartgomez

On a more general note, are there any plans to eventually provide opt-in GPU computation? Maybe using https://github.com/arrayfire/arrayfire-rust ?

@bluss
Copy link
Member

bluss commented Mar 5, 2017

There is no explicit plan one way or the other.

Ndarray's design (explicit views, direct access to data) dictates that it's an in-memory data structure, so it could only integrate with GPU computation by allowing conversion to a more restricted format (like ArrayFire), or by implementing operations using such a conversion before and after.

@frjnn

frjnn commented May 13, 2021

Parallel Iter for AxisChunksIter

and

Parallel support for Array::map -> Array

should be checked off. @bluss @jturner314

@bluss
Member

bluss commented May 13, 2021

I guess everything here is done as of current master. Zip::par_map_collect could be sufficient to satisfy the Array::map item, do you agree @frjnn?
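A minimal sketch of how Zip::par_map_collect covers the parallel Array::map -> Array case (ndarray with the rayon feature enabled; the doubling closure is just an illustration):

use ndarray::{Array2, Zip};

fn main() {
    let a = Array2::<f64>::ones((512, 512));
    // Build a new array by mapping every element in parallel.
    let doubled: Array2<f64> = Zip::from(&a).par_map_collect(|&x| x * 2.0);
    assert_eq!(doubled[[0, 0]], 2.0);
}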

@frjnn

frjnn commented May 13, 2021

I agree

@bluss
Member

bluss commented May 13, 2021

All the actionable points have been completed, so we can celebrate by closing. However, I think there is a lot more to do if we are to begin approaching the original appeal of the issue text, and a new issue is welcome for that. 🙂

@bluss bluss closed this as completed May 13, 2021