Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lane sampling to ndarray-rand #724

Merged

Conversation

LukeMathWalker
Copy link
Member

While working on rust-ndarray/ndarray-examples#2 I realised that a sampling method could actually fit quite nicely in ndarray-rand.

I'd like to relax the Copy bound on A, I need to look deeper into select's implementation.

@bluss
Copy link
Member

bluss commented Sep 24, 2019

Relaxing the Copy bound on select is a bigger task, I think I'd like to help (but feel free to work on it if you get to it). I consider it a bigger task, since it entails handling copying partial arrays and being able to tear down and drop the partial array if something panics during this operation. Definitely something it's about time to handle.

The choice to use Copy to implement something that's fast and safe was a good one at the time.

@jturner314
Copy link
Member

jturner314 commented Sep 24, 2019

Relaxing the Copy bound on select

Fwiw, nditer can do this. See this example; just replace x * x with x.clone(). Right now, this is implemented using external iteration (since it relies on Vec's from_iter implementation), but I'll probably change it to internal iteration in the future. I really need to finish up nditer and officially release it...

The key idea is to collect the data into a Vec in whatever order you're iterating over it, and then convert that into an array with the correct shape and strides. nditer attempts to choose the best iteration order. However, the simplest (but somewhat slow) implementation of select would be something like this:

use ndarray::prelude::*;
use ndarray::RemoveAxis;

fn select<A, D>(arr: ArrayView<'_, A, D>, axis: Axis, indices: &[usize]) -> Array<A, D>
where
    A: Clone,
    D: Dimension + RemoveAxis,
{
    let data = indices
        .iter()
        .flat_map(|&index| arr.index_axis(axis, index).iter())
        .cloned()
        .collect();

    // TODO: Check that new length doesn't overflow `isize`.
    let mut dim = arr.raw_dim();
    dim[axis.index()] = indices.len();

    // TODO: Compute strides. `axis` is the outermost axis (largest stride),
    // and the remaining axes are in standard order.
    let strides = unimplemented!();

    Array::from_vec_dim_stride_unchecked(dim, strides, data)
}

@LukeMathWalker
Copy link
Member Author

Relaxing the Copy bound on select is a bigger task, I think I'd like to help (but feel free to work on it if you get to it). I consider it a bigger task, since it entails handling copying partial arrays and being able to tear down and drop the partial array if something panics during this operation. Definitely something it's about time to handle.

The choice to use Copy to implement something that's fast and safe was a good one at the time.

I'll keep this mind, I was planning to have a look this weekend most likely.
I would suggest to evaluate/merge this independently of the select overhaul, then it's just a matter of dropping the bound here when that lands.

ndarray-rand/src/lib.rs Outdated Show resolved Hide resolved
ndarray-rand/Cargo.toml Outdated Show resolved Hide resolved
@bluss
Copy link
Member

bluss commented Sep 25, 2019

Sorry for the unstructured critique, but how do we decide if it's best with separate methods for independent sampling and sampling with replacement, or to have them in one? Looks like a logical addition anyhow.

@LukeMathWalker
Copy link
Member Author

Sorry for the unstructured critique, but how do we decide if it's best with separate methods for independent sampling and sampling with replacement, or to have them in one? Looks like a logical addition anyhow.

I don't have a strong preference in either direction.
If we make them separate, I can see we will end up with quite verbose names.
If we fold them in a single one (as in this PR), then there is an option that you have to specify every time. But maybe it's good, but you get to think about it?

Copy link
Member

@jturner314 jturner314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the "with/without replacement" question --

It's worth noting that there is a similar method in rand: SliceRandom::choose_multiple. It might be beneficial to use similar naming (axis_choose_multiple, for example). If we did that, we'd want to match the behavior of rand (samping without replacement) and add a separate method for sampling with replacement (although I'm not sure what we'd call it).

If we handle "with/without replacement" as a parameter, IMO we should make that parameter an enum instead of a bool for additional clarity, e.g.

enum Choose {
    WithReplacement,
    WithoutReplacement,
}

ndarray-rand/src/lib.rs Outdated Show resolved Hide resolved
@LukeMathWalker
Copy link
Member Author

I lean towards a single method with a parameter - I changed it from a bool to an enum (SamplingStrategy).

I have added a quickcheck feature-flag to provide an Arbitrary implementation for it (something I'd like to do for ndarray as well at a certain point) and I realized we were not running ndarray-rand tests in CI - hence I changed the script to make sure we do.

@LukeMathWalker
Copy link
Member Author

I am planning to start working on another convenience function for ndarray-rand (shuffle_axis, similar to SliceRandom::shuffle from rand) - I am going to need the changes I have already done in this PR (adding an extra type parameter to RandomExt, fixing CI for ndarray-rand, etc.).

What is missing for this PR to be merged? Should I branch from here or should I branch from master? Just conscious of wasting time in merge conflicts 😅

@bluss
Copy link
Member

bluss commented Oct 6, 2019

This needs rebase/fixup :) Otherwise it looks good — I won't have much opinions on ndarray-rand. So I'd say, branch off from this branch, but not in the state it is now.

quickcheck as a public dependency is not something I'd like to have for ndarray.

Add `quickcheck` feature flag to `ndarray-rand` to get `Arbitrary`
implementation for `ndarray-rand` types.
Update quickcheck dependency in ndarray.
Fix CI scripts to run ndarray-rand's tests.
@LukeMathWalker LukeMathWalker force-pushed the random-convenient-functions branch from 0583689 to 0c9a2e3 Compare October 6, 2019 13:55
@LukeMathWalker
Copy link
Member Author

I was going for a squash and merge, but rebasing sounds good as well :)

quickcheck as a public dependency is not something I'd like to have for ndarray.

Could you elaborate?
My idea/proposal is mostly a result of the orphan rule - as far as I understand it, it makes it impossible to provide an Arbitrary implementation for ndarray outside of the ndarray crate (or quickcheck itself).
quickcheck seems to have become the de facto standard for property-based testing in Rust right now and I don't feel that a feature-flagged quickcheck dependency would be much different from our current feature-flagged serde dependency for serialization.

@bluss
Copy link
Member

bluss commented Oct 6, 2019

I can't elaborate all the way, but if we had quickcheck as a pub dep, then rand is a pub dep of ndarray, and ndarray-rand would be pointless, because the separation would be gone. And sure, we have talked about rand as pub dep as a possibility through dependency rename.

Another facet is that serde is 1.0 and has been for a while, and it's a more important feature and a feature that doesn't have alternatives.

@bluss bluss merged commit ad7efff into rust-ndarray:master Oct 6, 2019
@LukeMathWalker LukeMathWalker deleted the random-convenient-functions branch October 6, 2019 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants