Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[on hold] Switch convert_2q_block_matrix.rs to use matmul from faer and an explicit version of kron #12193

Closed
wants to merge 20 commits into from
Closed
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
72 changes: 53 additions & 19 deletions crates/accelerate/src/convert_2q_block_matrix.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,11 +15,16 @@ use pyo3::wrap_pyfunction;
use pyo3::Python;

use num_complex::Complex64;
use numpy::ndarray::linalg::kron;
use numpy::ndarray::{aview2, Array2, ArrayView2};
use numpy::{IntoPyArray, PyArray2, PyReadonlyArray2};
use smallvec::SmallVec;

use faer::modules::core::mul::matmul;
use faer::perm::swap_rows;
use faer::prelude::*;
use faer::{Mat, Parallelism};
use faer_ext::{IntoFaerComplex, IntoNdarrayComplex};

static ONE_QUBIT_IDENTITY: [[Complex64; 2]; 2] = [
[Complex64::new(1., 0.), Complex64::new(0., 0.)],
[Complex64::new(0., 0.), Complex64::new(1., 0.)],
Expand All @@ -32,32 +37,61 @@ pub fn blocks_to_matrix(
py: Python,
op_list: Vec<(PyReadonlyArray2<Complex64>, SmallVec<[u8; 2]>)>,
) -> PyResult<Py<PyArray2<Complex64>>> {
let identity = aview2(&ONE_QUBIT_IDENTITY);
let input_matrix = op_list[0].0.as_array();
let mut matrix: Array2<Complex64> = match op_list[0].1.as_slice() {
[0] => kron(&identity, &input_matrix),
[1] => kron(&input_matrix, &identity),
let identity = aview2(&ONE_QUBIT_IDENTITY).into_faer_complex();
let input_matrix = op_list[0].0.as_array().into_faer_complex();

let mut matrix = match op_list[0].1.as_slice() {
[0] => identity.kron(input_matrix.as_ref()),
[1] => input_matrix.kron(identity.as_ref()),
arnaucasau marked this conversation as resolved.
Show resolved Hide resolved
[0, 1] => input_matrix.to_owned(),
[1, 0] => change_basis(input_matrix),
[] => Array2::eye(4),
[1, 0] => change_basis_faer(input_matrix),
[] => Mat::<c64>::identity(4, 4),
_ => unreachable!(),
};
for (op_matrix, q_list) in op_list.into_iter().skip(1) {
let op_matrix = op_matrix.as_array();
let op_matrix = op_matrix.as_array().into_faer_complex();

let result = match q_list.as_slice() {
[0] => Some(kron(&identity, &op_matrix)),
[1] => Some(kron(&op_matrix, &identity)),
[1, 0] => Some(change_basis(op_matrix)),
[] => Some(Array2::eye(4)),
_ => None,
};
matrix = match result {
Some(result) => result.dot(&matrix),
None => op_matrix.dot(&matrix),
[0] => identity.kron(op_matrix.as_ref()),
[1] => op_matrix.kron(identity.as_ref()),
[1, 0] => change_basis_faer(op_matrix),
[] => Mat::<c64>::identity(4, 4),
_ => op_matrix.to_owned(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The op_matrix.to_owned() here is a bit weird to me. It feels like an unnecessary copy as we don't actually need an owned copy of the array from python here as it's only used in the matmul call below. The previous logic was a bit ugly but was done that way to avoid copying with to_owned(). The other option would be to maybe try something like:

            [0] => identity.kron(op_matrix.as_ref()).as_ref(),
            [1] => op_matrix.kron(identity.as_ref()).as_ref(),
            [1, 0] => change_basis_faer(op_matrix).as_ref(),
            [] => Mat::<c64>::identity(4, 4).as_ref(),
            _ => op_matrix.to_owned(),

and change result to be a MatRef<c64> instead of of a Mat<c64>. Although I'm not sure if the compiler would be happy with this (there might be a lifetime issue doing this. The other advantage is if result becomes a MatRef we can create the 4x4 identity matrix as a static and avoid the allocation (although in practice I'm not sure this ever gets used so it probably doesn't matter).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried the code you suggested, and as you said, the compiler is not happy about it. The error message is "temporary value is freed at the end of this statement" referring to each case of the match where we use as_ref().

Eric and I have been trying to fix it, but the only alternative we saw to avoid copying with to_owned() is to recover the previous logic, making the result matrix an Option<Mat<c64>> and "duplicating" the matmul call using the unwrapped value of the Some or op_matrix in case of None. It's not as readable as now, but performance-wise should be better.

Here is the git diff of the change. What do you think about doing something like this again? I suspect we will see this case in two_qubit_decompose.rs too.

         let op_matrix = op_matrix.as_array().into_faer_complex();
 
         let result = match q_list.as_slice() {
-            [0] => identity.kron(op_matrix.as_ref()),
-            [1] => op_matrix.kron(identity.as_ref()),
-            [1, 0] => change_basis_faer(op_matrix),
-            [] => Mat::<c64>::identity(4, 4),
-            _ => op_matrix.to_owned(),
+            [0] => Some(identity.kron(op_matrix.as_ref())),
+            [1] => Some(op_matrix.kron(identity.as_ref())),
+            [1, 0] => Some(change_basis_faer(op_matrix)),
+            [] => Some(Mat::<c64>::identity(4, 4)),
+            _ => None,
         };
 
         let aux = matrix.clone();
-        matmul(
-            matrix.as_mut(),
-            result.as_ref(),
-            aux.as_ref(),
-            None,
-            c64::new(1., 0.),
-            Parallelism::None,
-        );
+        match result {
+            Some(x) => {
+                matmul(
+                    matrix.as_mut(),
+                    x.as_ref(),
+                    aux.as_ref(),
+                    None,
+                    c64::new(1., 0.),
+                    Parallelism::None,
+                );
+            },
+            None => {
+                matmul(
+                    matrix.as_mut(),
+                    op_matrix,
+                    aux.as_ref(),
+                    None,
+                    c64::new(1., 0.),
+                    Parallelism::None,
+                );
+            }
+        };
     }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this is what we were trying to do:

let result = match q_list.as_slice() {
  [0] => Some(identity.kron(op_matrix.as_ref())),
  [1] => Some(op_matrix.kron(identity.as_ref())),
  [1, 0] => Some(change_basis_faer(op_matrix)),
  [] => Some(Mat::<c64>::identity(4, 4)),
   _ => None,
};
let arg1 = match result {
   // Go from Mat to MatRef. This failed because the lifetime of x
  Some(x) => x.as_ref(),
  // op_matrix is already MatRef
  None => op_matrix
}

Using result.as_ref().unwrap_or(op_matrix) does not work. result.as_ref results in an Option<&Mat> rather than Option<MatRef> and they are not the same thing.

Copy link
Member

@mtreinish mtreinish Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think lets use this pattern, it's ugly but it lets us avoid a copy

Copy link
Member

@jakelishman jakelishman Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using result.as_ref().unwrap_op(op_matrix) does not work.

Should this not be result.as_ref().map(|x| x.as_ref()).unwrap_op(op_matrix)?

Fwiw, match result is wrong because it's consuming the result, so there's nothing for the reference to refer to. The match needs to be on result.as_ref() to keep the underlying matrix alive, and then it's no problem if you get an &Mat - you can just call Mat::as_ref on one of those, no trouble - that's just the regular API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be result.as_ref().map(|x| x.as_ref()).unwrap_op(op_matrix)?

Yes, that works! Much better than duplicating the matmul call, thanks Jake!

};

let aux = matrix.clone();
arnaucasau marked this conversation as resolved.
Show resolved Hide resolved
matmul(
matrix.as_mut(),
mtreinish marked this conversation as resolved.
Show resolved Hide resolved
result.as_ref(),
aux.as_ref(),
None,
c64::new(1., 0.),
Parallelism::None,
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the faer docs, it looks like you may be able to just do:

let res = result * matrix;

https://docs.rs/faer/latest/faer/#basic-usage

It obviously doesn't give us the same level as control as calling faer::modules::core::mul::matmul; but I wonder if there is a performance difference with it, as I think it would be a little cleaner looking than calling matmul like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think it looks cleaner too.

I wonder if there is a performance difference with it

To test it, I ran the same tests as in the issue, and it seems like we lose performance if we don't call faer::modules::core::mul::matmul;

% change avg (asterisk -> matmul) asterisk (min) asterisk (max) asterisk (avg) matmul (min) matmul (max) matmul (avg)
2x2 92.80% 524ns 651ns 547.67ns 38ns 70ns 39.41ns
4x4 78.19% 659ns 1.00µs 689.28ns 145ns 217ns 150.34ns

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's quite the difference, I wasn't expecting that. Then yeah lets leave it as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wait what exactly were you timing here? Was it just the time to call matmul or did you include the clone() time? The bare matmul call by itself is definitely gonna be faster, but you still need to allocate the result space which you are in effect doing with the clone() call above. The multiplication operator just does that implicitly so it's not a manual operation.

Copy link
Contributor Author

@arnaucasau arnaucasau Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah, I was only timing the matmul call in general, without using the same matrix as the destination either. I rewrote the test including the clone().

So for the matmul call, I'm timing:

let aux = dst.clone();
matmul(dst.as_mut(), a.as_ref(), aux.as_ref(), Some(ALPHA), BETA, Parallelism::None);

For the other case:

dst = &dst + &a * &dst;

ALPHA and BETA are 1 for the first case.

% change avg (asterisk -> matmul) asterisk (min) asterisk (max) asterisk (avg) matmul (min) matmul (max) matmul (avg)
2x2 56.61% 589ns 868ns 632.74ns 263ns 432ns 274.56ns
4x4 44.37% 717ns 938ns 763.04ns 397ns 558ns 424.51ns

}
Ok(matrix.into_pyarray_bound(py).unbind())

Ok(matrix
.as_ref()
.into_ndarray_complex()
.to_owned()
.into_pyarray_bound(py)
.unbind())
}

/// Switches the order of qubits in a two qubit operation.
/// This function will substitue `change_basis` once the
/// `two_qubit_decompose.rs` uses Mat<c64> instead of ArrayView2
#[inline]
pub fn change_basis_faer(matrix: MatRef<c64>) -> Mat<c64> {
let mut trans_matrix: Mat<c64> = matrix.transpose().to_owned();
let (row1, row2) = trans_matrix.as_mut().two_rows_mut(1, 2);
swap_rows(row1, row2);

trans_matrix = trans_matrix.transpose().to_owned();
let (row1, row2) = trans_matrix.as_mut().two_rows_mut(1, 2);
swap_rows(row1, row2);

trans_matrix
}

/// Switches the order of qubits in a two qubit operation.
Expand Down
Loading