Add lane sampling to ndarray-rand #724

LukeMathWalker · 2019-09-22T20:25:17Z

While working on rust-ndarray/ndarray-examples#2 I realised that a sampling method could actually fit quite nicely in ndarray-rand.

I'd like to relax the Copy bound on A, I need to look deeper into select's implementation.

bluss · 2019-09-24T18:50:21Z

Relaxing the Copy bound on select is a bigger task, I think I'd like to help (but feel free to work on it if you get to it). I consider it a bigger task, since it entails handling copying partial arrays and being able to tear down and drop the partial array if something panics during this operation. Definitely something it's about time to handle.

The choice to use Copy to implement something that's fast and safe was a good one at the time.

jturner314 · 2019-09-24T20:37:03Z

Relaxing the Copy bound on select

Fwiw, nditer can do this. See this example; just replace x * x with x.clone(). Right now, this is implemented using external iteration (since it relies on Vec's from_iter implementation), but I'll probably change it to internal iteration in the future. I really need to finish up nditer and officially release it...

The key idea is to collect the data into a Vec in whatever order you're iterating over it, and then convert that into an array with the correct shape and strides. nditer attempts to choose the best iteration order. However, the simplest (but somewhat slow) implementation of select would be something like this:

use ndarray::prelude::*;
use ndarray::RemoveAxis;

fn select<A, D>(arr: ArrayView<'_, A, D>, axis: Axis, indices: &[usize]) -> Array<A, D>
where
    A: Clone,
    D: Dimension + RemoveAxis,
{
    let data = indices
        .iter()
        .flat_map(|&index| arr.index_axis(axis, index).iter())
        .cloned()
        .collect();

    // TODO: Check that new length doesn't overflow `isize`.
    let mut dim = arr.raw_dim();
    dim[axis.index()] = indices.len();

    // TODO: Compute strides. `axis` is the outermost axis (largest stride),
    // and the remaining axes are in standard order.
    let strides = unimplemented!();

    Array::from_vec_dim_stride_unchecked(dim, strides, data)
}

LukeMathWalker · 2019-09-25T07:22:37Z

Relaxing the Copy bound on select is a bigger task, I think I'd like to help (but feel free to work on it if you get to it). I consider it a bigger task, since it entails handling copying partial arrays and being able to tear down and drop the partial array if something panics during this operation. Definitely something it's about time to handle.

The choice to use Copy to implement something that's fast and safe was a good one at the time.

I'll keep this mind, I was planning to have a look this weekend most likely.
I would suggest to evaluate/merge this independently of the select overhaul, then it's just a matter of dropping the bound here when that lands.

ndarray-rand/src/lib.rs

ndarray-rand/Cargo.toml

bluss · 2019-09-25T16:38:29Z

Sorry for the unstructured critique, but how do we decide if it's best with separate methods for independent sampling and sampling with replacement, or to have them in one? Looks like a logical addition anyhow.

LukeMathWalker · 2019-09-25T20:46:12Z

Sorry for the unstructured critique, but how do we decide if it's best with separate methods for independent sampling and sampling with replacement, or to have them in one? Looks like a logical addition anyhow.

I don't have a strong preference in either direction.
If we make them separate, I can see we will end up with quite verbose names.
If we fold them in a single one (as in this PR), then there is an option that you have to specify every time. But maybe it's good, but you get to think about it?

jturner314

Regarding the "with/without replacement" question --

It's worth noting that there is a similar method in rand: SliceRandom::choose_multiple. It might be beneficial to use similar naming (axis_choose_multiple, for example). If we did that, we'd want to match the behavior of rand (samping without replacement) and add a separate method for sampling with replacement (although I'm not sure what we'd call it).

If we handle "with/without replacement" as a parameter, IMO we should make that parameter an enum instead of a bool for additional clarity, e.g.

enum Choose {
    WithReplacement,
    WithoutReplacement,
}

ndarray-rand/src/lib.rs

LukeMathWalker · 2019-09-28T11:42:50Z

I lean towards a single method with a parameter - I changed it from a bool to an enum (SamplingStrategy).

I have added a quickcheck feature-flag to provide an Arbitrary implementation for it (something I'd like to do for ndarray as well at a certain point) and I realized we were not running ndarray-rand tests in CI - hence I changed the script to make sure we do.

LukeMathWalker · 2019-10-06T13:11:46Z

I am planning to start working on another convenience function for ndarray-rand (shuffle_axis, similar to SliceRandom::shuffle from rand) - I am going to need the changes I have already done in this PR (adding an extra type parameter to RandomExt, fixing CI for ndarray-rand, etc.).

What is missing for this PR to be merged? Should I branch from here or should I branch from master? Just conscious of wasting time in merge conflicts 😅

bluss · 2019-10-06T13:46:46Z

This needs rebase/fixup :) Otherwise it looks good — I won't have much opinions on ndarray-rand. So I'd say, branch off from this branch, but not in the state it is now.

quickcheck as a public dependency is not something I'd like to have for ndarray.

Add `quickcheck` feature flag to `ndarray-rand` to get `Arbitrary` implementation for `ndarray-rand` types. Update quickcheck dependency in ndarray. Fix CI scripts to run ndarray-rand's tests.

LukeMathWalker · 2019-10-06T14:03:44Z

I was going for a squash and merge, but rebasing sounds good as well :)

quickcheck as a public dependency is not something I'd like to have for ndarray.

Could you elaborate?
My idea/proposal is mostly a result of the orphan rule - as far as I understand it, it makes it impossible to provide an Arbitrary implementation for ndarray outside of the ndarray crate (or quickcheck itself).
quickcheck seems to have become the de facto standard for property-based testing in Rust right now and I don't feel that a feature-flagged quickcheck dependency would be much different from our current feature-flagged serde dependency for serialization.

bluss · 2019-10-06T14:52:57Z

I can't elaborate all the way, but if we had quickcheck as a pub dep, then rand is a pub dep of ndarray, and ndarray-rand would be pointless, because the separation would be gone. And sure, we have talked about rand as pub dep as a possibility through dependency rename.

Another facet is that serde is 1.0 and has been for a while, and it's a more important feature and a feature that doesn't have alternatives.

LukeMathWalker added ndarray-rand enhancement labels Sep 22, 2019

bluss reviewed Sep 25, 2019

View reviewed changes

ndarray-rand/src/lib.rs Outdated Show resolved Hide resolved

bluss reviewed Sep 25, 2019

View reviewed changes

ndarray-rand/Cargo.toml Outdated Show resolved Hide resolved

jturner314 reviewed Sep 25, 2019

View reviewed changes

ndarray-rand/src/lib.rs Outdated Show resolved Hide resolved

bluss added the breaking-change label Sep 29, 2019

Add sample_axis and sample_axis_using to ndarray-rand.

0c9a2e3

Add `quickcheck` feature flag to `ndarray-rand` to get `Arbitrary` implementation for `ndarray-rand` types. Update quickcheck dependency in ndarray. Fix CI scripts to run ndarray-rand's tests.

LukeMathWalker force-pushed the random-convenient-functions branch from 0583689 to 0c9a2e3 Compare October 6, 2019 13:55

bluss merged commit ad7efff into rust-ndarray:master Oct 6, 2019

LukeMathWalker deleted the random-convenient-functions branch October 6, 2019 15:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add lane sampling to ndarray-rand #724

Add lane sampling to ndarray-rand #724

LukeMathWalker commented Sep 22, 2019

bluss commented Sep 24, 2019 •

edited

Loading

jturner314 commented Sep 24, 2019 •

edited

Loading

LukeMathWalker commented Sep 25, 2019

bluss commented Sep 25, 2019

LukeMathWalker commented Sep 25, 2019

jturner314 left a comment

LukeMathWalker commented Sep 28, 2019

LukeMathWalker commented Oct 6, 2019

bluss commented Oct 6, 2019

LukeMathWalker commented Oct 6, 2019

bluss commented Oct 6, 2019

Add lane sampling to ndarray-rand #724

Add lane sampling to ndarray-rand #724

Conversation

LukeMathWalker commented Sep 22, 2019

bluss commented Sep 24, 2019 • edited Loading

jturner314 commented Sep 24, 2019 • edited Loading

LukeMathWalker commented Sep 25, 2019

bluss commented Sep 25, 2019

LukeMathWalker commented Sep 25, 2019

jturner314 left a comment

Choose a reason for hiding this comment

LukeMathWalker commented Sep 28, 2019

LukeMathWalker commented Oct 6, 2019

bluss commented Oct 6, 2019

LukeMathWalker commented Oct 6, 2019

bluss commented Oct 6, 2019

bluss commented Sep 24, 2019 •

edited

Loading

jturner314 commented Sep 24, 2019 •

edited

Loading