Implement `hstack` and `vstack` #1198
base: main
Conversation
The implementation uses a trait to fold over tuples, summing the dimensions in one direction and checking for equality in the other, and then uses `fixed_{rows,columns}_mut` if the dimensions are static, or `{rows,columns}_mut` if the dimensions are dynamic, together with `copy_from` to construct the output matrix.
Partially addresses #1146.

Thanks, matrix concatenation/stacking is sorely lacking in `nalgebra`!
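To make the fold concrete, here is a minimal runtime sketch of the idea described above: summing the dimensions in one direction while checking equality in the other. The names are hypothetical stand-ins, not the PR's actual type-level machinery (which does this with static dimensions where possible).

```rust
// Hypothetical illustration of a vstack-style shape fold over tuples:
// rows are summed, columns must agree across all blocks.
trait StackShape {
    /// Returns (total_rows, common_cols), panicking on a column mismatch.
    fn vstack_shape(&self) -> (usize, usize);
}

impl StackShape for ((usize, usize),) {
    fn vstack_shape(&self) -> (usize, usize) {
        self.0
    }
}

impl StackShape for ((usize, usize), (usize, usize)) {
    fn vstack_shape(&self) -> (usize, usize) {
        let ((r1, c1), (r2, c2)) = *self;
        assert_eq!(c1, c2, "column counts must match when stacking vertically");
        (r1 + r2, c1)
    }
}

fn main() {
    // Stacking a 2x3 block on top of a 4x3 block yields a 6x3 matrix.
    let shape = ((2, 3), (4, 3)).vstack_shape();
    assert_eq!(shape, (6, 3));
}
```

In the PR itself these impls are generated for tuples up to a fixed arity, and the dimensions are types (`Const<N>` or `Dyn`) rather than runtime `usize` values.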
I don't see a reason not to merge both.
…const parameters to the end of declarations to support the rustc version used by the CUDA test suite.
This isn't articulated anywhere, but I have been thinking about stacking/concatenation lately, and I was anyway kind of starting to prefer the proc-macro approach. I think you make a good point about compile-times, so I decided to quickly test it: I compiled the current versions of both approaches.

Here are the results without the macro crate: [benchmark table omitted]. Here are the results including it: [benchmark table omitted]. We see that the build times are almost the same. I must also admit that I actually prefer the proc macro approach. Based on all of this, I would, at least on a purely technical level, prefer to continue with the WIP macro.
A small note on my compile-time measurements: looking at the results again, it's clear that there's quite a bit of parallel compilation of small crates going on, which is why the compile times are so similar. I have a 12-core Ryzen 3900X, so I suspect the numbers might look very different single-threaded or on, say, a dual-core laptop. (This doesn't change my opinion, but I wanted to be more transparent/clear about the measurements I made.)
Since you refer to them as "…": the `VisitTuple` impls look like

```rust
impl<A, Func: Visitor<A>> VisitTuple<Func> for (A,) { /* ... */ }
impl<B, A, Func: Visitor<B>> VisitTuple<Func> for (B, A) { /* ... */ }
impl<C, B, A, Func: Visitor<C>> VisitTuple<Func> for (C, B, A) { /* ... */ }
```

and so on, currently up to 8-tuples (these could be written manually if it would be clearer without macros).

If it's desired for one of these systems to be defined in terms of the other, I could implement lazy versions of `HStack`/`VStack` along the lines of

```rust
impl<T, R, C, X> Block<T> for VCat<X> where
    X: Copy +
        VisitTuple<VStackShapeInit, Output=VStackShape<R, C>> +
        VisitTuple<VCatBlocks<...>, Output=<...>>
{
    type Rows = R;
    type Cols = C;
    fn shape(&self) -> (Self::Rows, Self::Cols) {
        let shape = <X as VisitTuple<_>>::visit(VStackShapeInit, *self);
        (shape.r, shape.c)
    }
    fn populate<S>(&self, m: &mut Matrix<T, Self::Rows, Self::Cols, S>) {
        /* probably implementable directly in terms of `VStack` instead of a new `VCatBlocks` if
        pub struct VStack<T, R, C, S, R2> {
            out: Matrix<T, R, C, S>,
            current_row: R2,
        }
        is replaced with
        pub struct VStack<'a, T, R, C, S, R2> {
            out: &'a mut Matrix<T, R, C, S>,
            current_row: R2,
        }
        */
    }
}
```

This would make it so that

```rust
VCat((
    HCat((&a, &b, &c)),
    HCat((&d, &e, &f)),
    HCat((&g, &h, &i)),
)).build()
```

builds the output matrix without intermediate allocations, where

```rust
impl<T> Block<T> {
    fn build(&self) -> MatrixMN<T, Self::Rows, Self::Cols> {
        let mut m = allocate_block_output(self);
        self.populate(&mut m);
        m
    }
}
```

and `allocate_block_output` is a normal function that can also be used by the strict versions of `hstack`/`vstack`.
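For readers unfamiliar with the pattern, here is a self-contained sketch of how such a visitor fold works. The definitions are simplified stand-ins for the PR's `Visitor`/`VisitTuple` traits (the real versions thread dimension types through), but the chaining mechanism is the same: visiting the first element produces a new visitor state, which is then applied to the rest of the tuple.

```rust
// Simplified stand-ins for the PR's Visitor/VisitTuple machinery.
trait Visitor<A> {
    type Output;
    fn visit(self, elem: A) -> Self::Output;
}

trait VisitTuple<Func> {
    type Output;
    fn visit(func: Func, tuple: Self) -> Self::Output;
}

impl<A, Func: Visitor<A>> VisitTuple<Func> for (A,) {
    type Output = Func::Output;
    fn visit(func: Func, (a,): Self) -> Self::Output {
        func.visit(a)
    }
}

impl<B, A, Func: Visitor<B>> VisitTuple<Func> for (B, A)
where
    Func::Output: Visitor<A>,
{
    type Output = <Func::Output as Visitor<A>>::Output;
    fn visit(func: Func, (b, a): Self) -> Self::Output {
        // Fold: visit the head, then apply the resulting state to the tail.
        func.visit(b).visit(a)
    }
}

/// Example visitor state: a running sum of row counts (a hypothetical
/// runtime analogue of the PR's shape accumulation).
struct RowSum(usize);

impl Visitor<usize> for RowSum {
    type Output = RowSum;
    fn visit(self, rows: usize) -> RowSum {
        RowSum(self.0 + rows)
    }
}

fn main() {
    // Folding RowSum across (2, 4) accumulates 2 + 4 = 6 total rows.
    let RowSum(total) = <(usize, usize) as VisitTuple<RowSum>>::visit(RowSum(0), (2, 4));
    assert_eq!(total, 6);
}
```

The design choice here is that the visitor is consumed and returned at each step, which is what later allows the mutable-output version to thread `&mut Matrix` through the fold.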
…`HStack`/`VStack` to facilitate implementing lazy stacking.
Update: after having added the necessary lifetime parameters to `VStack`, the lazy impl looks like

```rust
impl<T, R, C, X> Block<T> for VCat<X> where
    X: Copy +
        VisitTuple<VStackShapeInit, Output=VStackShape<R, C>>
{
    type Rows = R;
    type Cols = C;
    fn shape(&self) -> (Self::Rows, Self::Cols) {
        let shape = <X as VisitTuple<_>>::visit(VStackShapeInit, *self);
        (shape.r, shape.c)
    }
    fn populate<S>(&self, m: &mut Matrix<T, Self::Rows, Self::Cols, S>) where
        X: for<'a> VisitTuple<VStack<'a, T, R, C, S, Const<0>>, Output=VStack<'a, T, R, C, S, R>>
    {
        let vstack_visitor = VStack { out: m, current_row: Const::<0> };
        let _ = <X as VisitTuple<_>>::visit(vstack_visitor, *self);
    }
}
```
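As a runtime analogue of the visitor above (hypothetical code, using `Vec<Vec<i32>>` in place of `Matrix` and a plain `usize` in place of `Const`), the mutable-reference version of the visitor carries the output buffer and the current row offset through the fold, copying each block in as it is visited:

```rust
// Hypothetical runtime stand-in for the PR's VStack visitor: it holds a
// mutable borrow of the output and the row offset reached so far.
struct VStack<'a> {
    out: &'a mut Vec<Vec<i32>>, // dense output; all rows have equal width
    current_row: usize,
}

impl<'a> VStack<'a> {
    /// Copy one block into the output starting at `current_row`,
    /// returning the advanced visitor state (as the tuple fold would).
    fn visit(mut self, block: &[Vec<i32>]) -> VStack<'a> {
        for (i, row) in block.iter().enumerate() {
            self.out[self.current_row + i] = row.clone();
        }
        self.current_row += block.len();
        self
    }
}

fn main() {
    let mut out = vec![vec![0; 2]; 3];
    let top = vec![vec![1, 2]];
    let bottom = vec![vec![3, 4], vec![5, 6]];

    let v = VStack { out: &mut out, current_row: 0 };
    let v = v.visit(&top);
    let _ = v.visit(&bottom); // drop the visitor to release the borrow

    assert_eq!(out, vec![vec![1, 2], vec![3, 4], vec![5, 6]]);
}
```

Passing the visitor by value and returning it at each step is what makes the lifetime bookkeeping work: the `&mut` output borrow lives inside the visitor for the whole fold, which is exactly what the `for<'a>` bound in the impl above expresses at the type level.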
@aweinstock314: thanks for clarifying, indeed I hadn't realized that. I'm impressed by what you're proposing here. At the same time, I feel that there are many reasons to prefer the more general macro-based approach from #1080.

Although merging both could be an option, I'm still personally reluctant to have two incompatible ways of stacking matrices in `nalgebra`. Overall, I'm not personally swayed by the compile-time argument, but I'd be happy for others to chime in here with some perspectives.
…intermediate allocations when building a matrix with a mix of horizontal and vertical stacking.
I've implemented the lazy versions of `HStack`/`VStack`, so a mix of horizontal and vertical stacking now compiles to code that builds the output matrix without intermediate allocations.
One other possible advantage of the trait-based approach is that, since it has access to the types (which, if I'm understanding correctly, proc macros/`syn` don't), it's possible to get better type inference. The current implementation only propagates sizes outwards (i.e. it determines the size of the output matrix if the sizes of the input matrices are known), but it should be possible to support inferring at most one unknown-width matrix per hstack (and correspondingly, one unknown-height matrix per vstack) when the output width is known, by specifying, for each tuple entry, that the sum of the preceding entries' widths plus the current entry's width is equal to the output's width minus the sum of the succeeding entries' widths.

Could you elaborate on what makes you suspect that the trait approach is less maintainable/extensible than the proc macro approach? I feel that, while it is more verbose when expressed as trait bounds, the underlying logic (i.e. walking through the matrices to be concatenated, keeping track of the current position in the output matrix, and copying into slices of the output matrix) is essentially the same.
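The inference rule sketched here can be illustrated at runtime. The helper below is hypothetical (in the PR this constraint would be expressed at the type level, so the compiler does the solving), but it shows the same arithmetic: with at most one unknown block width and a known output width, the unknown is determined by subtraction.

```rust
/// Hypothetical helper: given the widths of the blocks in an hstack
/// (`None` = unknown, at most one allowed) and the known output width,
/// solve for the unknown width. Returns None if over/under-constrained.
fn infer_unknown_width(widths: &[Option<usize>], output_width: usize) -> Option<Vec<usize>> {
    let known: usize = widths.iter().flatten().sum();
    let unknowns = widths.iter().filter(|w| w.is_none()).count();
    match unknowns {
        // Fully known: just check consistency with the output width.
        0 if known == output_width => Some(widths.iter().map(|w| w.unwrap()).collect()),
        // One unknown: it must make up the remaining width.
        1 if known <= output_width => Some(
            widths
                .iter()
                .map(|w| w.unwrap_or(output_width - known))
                .collect(),
        ),
        _ => None,
    }
}

fn main() {
    // 3 + ? + 2 = 10  =>  ? = 5
    assert_eq!(
        infer_unknown_width(&[Some(3), None, Some(2)], 10),
        Some(vec![3, 5, 2])
    );
    // Two unknowns cannot be inferred from one equation.
    assert_eq!(infer_unknown_width(&[None, None], 10), None);
}
```

The type-level analogue would state, for each tuple entry, that the preceding widths plus the current width equal the output width minus the succeeding widths, letting the compiler's unification solve for the single unknown.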
I just completed a new round of reviews for #1080. My personal position is still that we should only have one way of stacking/concatenating matrices in `nalgebra`.

I think there are several facets here. Generally, I think working with non-trivial generic code of this kind is more demanding to maintain and extend than the equivalent proc macro. I think it would be useful to have @sebcrozet's opinion here, if he has the time to chime in.