-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add raw array pointer types #496
Conversation
unsafe fn clone_with_ptr(&self, ptr: *mut Self::Elem) -> (Self, *mut Self::Elem) { | ||
(*self, ptr) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can be without this trait, and just use DataClone? Clone
is the only consumer of the trait
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In current ndarray
, DataClone
has a Data
bound. If we want to be able to clone ArrayPtr
/ArrayPtrMut
, this is too restrictive given only the current implementation of Clone
for ArrayBase
. Possible alternatives to the approach in this PR:
-
Don't add a
DataRawClone
trait; instead, implementClone
individually forArrayPtr
andArrayPtrMut
. That would be inconsistent and would prevent code with just aDataClone
bound from being able to cloneArrayPtr
andArrayPtrMut
.With a breaking change, though, we could get rid of
DataClone
entirely and just implementClone
individually forArcArray
,Array
,ArrayView
,ArrayPtr
, andArrayPtrMut
. This actually seems pretty nice, but it does make writing generic code more difficult. (In user code, the bound to allowarray.clone()
would have to beArrayBase<S, D>: Clone
instead ofS: DataRawClone
.) -
Add the
DataRawClone
trait and remove theDataClone
trait. That would work but would be a breaking change.
Fwiw, DataClone
does have a small amount of utility as the combination of Data
and DataRawClone
, but I agree it does seem like unnecessary complication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(2) I think a breaking change is ok. We have lots of them queued? We should get on to merging them as soon as 0.12.1 is released (and I can do the releasing if you want).
I'd hope it has very little impact on users, but maybe I underestimate how much these representation traits are used?
I'd use the name DataClone
for the new trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use the name
DataClone
for the new trait.
The only thing that I don't like about this is consistency of naming with the other data traits, especially DataMut
vs. DataRawMut
. I was naming traits without a Data
bound DataRaw*
and traits with a Data
bound Data*
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. We still should use only one trait if we don't need 2, we could use the name DataRawClone and let DataClone be a deprecated reexport of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataClone could be a trait alias of DataRawClone + Data? That's a new Rust feature we can transition to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woah, I had no idea trait aliases were a thing. They look very convenient.
I've added a commit deprecating DataClone
. We can switch DataClone
to a trait alias once they're available on stable.
208b4df
to
d2daef9
Compare
I like the new names. It also puts the Raw types a bit outside of the list of important types with |
Yeah, it's nice they're sorted near the end alphabetically. Edit: I also like the method name |
In this PR, I've created a separate |
Seems to not matter, separate works. |
Otherwise, `do_collapse_axis` should really be an `unsafe` function.
src/data_traits.rs
Outdated
/// Unsafe because, `ptr` must point inside the current storage. | ||
unsafe fn clone_with_ptr(&self, ptr: *mut Self::Elem) -> (Self, *mut Self::Elem); | ||
#[deprecated(note="use `Data + DataRawClone` instead", since="0.12.1")] | ||
pub trait DataClone : Data + DataRawClone {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard to decide, but it seems best to me to still keep DataClone
under its current name, and let the breaking change instead be, that DataClone does not require Data
. The code using DataClone on github that I've seen (random sample) mostly uses DataClone + Data<Elem=X>
as bound anyway.
Ideally, we should be here. These should be internal implementation traits. Users should have better ways of being general over our arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why you lean towards dropping the Data
bound from DataClone
instead of adding a DataRawClone
trait (and keeping DataClone: Data + DataRawClone
if desired). The name DataClone
is a little shorter than DataRawClone
, which is nice for code that needs to specify a DataClone
/DataRawClone
bound, but I don't see any other advantages to this approach. If the issue is primarily the length of the name, we could call the new trait RawClone
instead of DataRawClone
.
Fwiw, I'd typically write DataClone<Elem = X>
instead of DataClone + Data<Elem = X>
. This does come up more often with DataMut
(example below), but I could conceivably see it happening with DataClone
too.
extern crate ndarray;
use ndarray::prelude::*;
use ndarray::{array, Data, DataMut};
fn foo<S1, S2>(a: &ArrayBase<S1, Ix2>, b: &mut ArrayBase<S2, Ix2>)
where
S1: Data<Elem = f32>,
S2: DataMut<Elem = f32>, // I like this better than Data<Elem = f32> + DataMut
{
*b += a;
}
fn main() {
let a = array![[1., 2.], [3., 4.]];
let mut b = array![[5., 6.], [7., 8.]];
foo(&a, &mut b);
println!("{}", b);
}
I'm against dropping the Data
bound from DataClone
for two reasons:
- The biggest issue is inconsistency with the names of other traits: all the other traits named
Data*
have aData
bound, while theDataRaw*
traits don't. - If users don't notice the change in the changelog and update their code, it would lead to safety issues for user code relying on
DataClone
implyingData
. For example, this could happen if existing user code specifies aDataClone<Elem = A>
bound on an array and dereferences pointers to elements in the array. The consequences of this could be pretty bad, but this coming up in practice does seem unlikely.
We could resolve reason 1 and maintain consistency of naming by dropping the Data
bound from DataMut
, DataOwned
, and DataShared
too. Then, we'd have something like this:
/// Array representation trait.
///
/// For an array that meets the invariants of the `ArrayBase` type. This trait
/// does not imply any ownership or lifetime; pointers to elements in the array
/// may not be safe to dereference.
///
/// ***Note:*** `RawData` is not an extension interface at this point.
/// Traits in Rust can serve many different roles. This trait is public because
/// it is used as a bound on public methods.
pub unsafe trait RawData : Sized {
/// The array element type.
type Elem;
#[doc(hidden)]
// This method is only used for debugging
fn _data_slice(&self) -> Option<&[Self::Elem]>;
private_decl!{}
}
/// Array representation trait.
///
/// For an array with writable elements.
///
/// ***Internal trait, see `RawData`.***
pub unsafe trait DataMut : RawData {
/// If possible, ensures that the array has unique access to its data.
///
/// If `Self` provides safe mutable access to array elements, then it
/// **must** panic or ensure that the data is unique.
#[doc(hidden)]
fn try_ensure_unique<D>(&mut ArrayBase<Self, D>)
where Self: Sized,
D: Dimension;
/// If possible, returns whether the array has unique access to its data.
///
/// If `Self` provides safe mutable access to array elements, then it
/// **must** return `Some(_)`.
#[doc(hidden)]
fn try_is_unique(&mut self) -> Option<bool>;
}
/// Array representation trait.
///
/// A representation that is a unique or shared owner of its data.
///
/// ***Internal trait, see `Data`.***
pub unsafe trait DataOwned : RawData {}
/// Array representation trait.
///
/// A representation that is a lightweight view.
///
/// ***Internal trait, see `Data`.***
pub unsafe trait DataShared : Clone + RawData + DataClone { }
/// Array representation trait.
///
/// An array representation that can be cloned.
///
/// ***Internal trait, see `RawData`.***
pub unsafe trait DataClone : RawData {
#[doc(hidden)]
/// Unsafe because, `ptr` must point inside the current storage.
unsafe fn clone_with_ptr(&self, ptr: *mut Self::Elem) -> (Self, *mut Self::Elem);
#[doc(hidden)]
unsafe fn clone_from_with_ptr(&mut self, other: &Self, ptr: *mut Self::Elem) -> *mut Self::Elem {
let (data, ptr) = other.clone_with_ptr(ptr);
*self = data;
ptr
}
}
/// Array representation trait.
///
/// For an array with elements that can be accessed with safe code.
///
/// ***Internal trait, see `RawData`.***
//
// # For implementers
//
// If you implement both the `Data` and `DataMut` traits, you are guaranteeing
// that the `DataMut::try_ensure_unique` implementation always panics or
// ensures that the data is unique. You are also guaranteeing that
// `try_is_unique` always returns `Some(_)`.
pub unsafe trait Data : RawData {
#[doc(hidden)]
fn new(elements: Vec<Self::Elem>) -> Self
where
Self: DataOwned;
/// Converts the data representation to a shared (copy on write)
/// representation, without any copying.
#[doc(hidden)]
fn into_shared(self) -> OwnedRcRepr<Self::Elem>
where
Self: DataOwned;
/// Converts the array to a uniquely owned array, cloning elements if necessary.
#[doc(hidden)]
fn into_owned<D>(self_: ArrayBase<Self, D>) -> ArrayBase<OwnedRepr<Self::Elem>, D>
where
Self::Elem: Clone,
D: Dimension;
/// Ensures that the array has unique access to its data.
#[doc(hidden)]
#[inline]
fn ensure_unique<D>(self_: &mut ArrayBase<Self, D>)
where
Self: DataMut + Sized,
D: Dimension
{
Self::try_ensure_unique(self_)
}
/// Returns whether the array has unique access to its data.
#[doc(hidden)]
#[inline]
fn is_unique(&mut self) -> bool
where
Self: DataMut,
{
self.try_is_unique().unwrap()
}
}
There are two disadvantages to this: First, the safety issue (reason 2 above) is more likely to occur with more important traits like DataMut
missing the Data
bound. Second, this requires explicitly writing a Data
bound in more cases (e.g. DataMut + Data<Elem = A>
instead of DataMut<Elem = A>
) which isn't as convenient.
These should be internal implementation traits. Users should have better ways of being general over our arrays.
Treating them as internal would be fine, but leaving them public doesn't seem too bad. If we treat them as internal, we'd need to add ArrayMut
, ArrayOwned
, etc., traits implemented on ArrayBase
to allow users to operate in a generic way over array types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, you're right :)
After reading about the stacked borrows model and trying to catch up with what the unsafe guidelines people have been working on, this seems even more important. For example down the line, let's have an NdProducer that produces raw pointer elements, that should allow us (I think?) to get around problems like accessing uninitialized elements through a mutable reference, if that becomes an issue. I hope that unsafe code guidelines will have some room for pragmatism and adapting to existing unsafe code practices. |
I'd like to rename
Yeah, we should be able to refactor a lot of functionality in terms of For uninitialized data specifically, I'm looking forward to RFC 1892, but for the time being, raw views provide a safe way to work with uninitialized data. We could even have a "raw" owned array type for use with uninitialized data until RFC 1892 is implemented, but I'd prefer to just wait for RFC 1892. |
New names sound good! |
We'll replace `DataClone` with a trait alias once that feature becomes available on stable.
💯 |
The motivation for this is #388 and another project I'm working on where I'd like to be able to properly deal with non-
Copy
uninitialized data.This PR introduces two new traits,
DataRaw
andDataRawMut
, which are less restrictive thanData
andDataMut
. It also introducesArrayPtr
andArrayPtrMut
which are similar toArrayView
andArrayViewMut
but don't carry lifetime information. I believe all the changes are backwards compatible.ArrayPtr
andArrayPtrMut
are more restrictive than normal raw pointers in Rust: they require pointers to all elements in the array to be within bounds or one byte past the end of a single allocated object. The reason for this restriction is to ensure that.offset()
operations on the array's pointer can be performed safely. If we lifted this restriction, then then following methods would have to be unsafe forArrayPtr
andArrayPtrMut
:.slice_move()
.slice_inplace()
.slice_axis_inplace()
.subview_inplace()
.into_subview()
.invert_axis()
.split_at()
While we could make these methods safe for normal arrays and
unsafe
for array pointers by implementing them separately forArrayBase<S: Data, D: Dimension>
,ArrayPtr<A, D: Dimension>
, andArrayPtrMut<A, D: Dimension>
, doing so requires some duplication and hurts code organization. Additionally, as a user, I'd prefer for most methods onArrayPtr
/ArrayPtrMut
to be safe for convenience, and in all of my realistic use-cases, myArrayPtr
/ArrayPtrMut
instances will point to valid allocations.@bluss Will you please review this? This PR deals with more unsafe code and potential for undefined behavior than I'm used to.
Edit: I've now added a
DataRawClone
trait to allow cloningArrayPtr
andArrayPtrMut
. (I couldn't just implementDataClone
forPtrRepr<*const A>
andPtrRepr<*mut A>
becauseDataClone: Data
.)