Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: scipy.sparse._csc.csc_matrix → Eigen::SparseMatrix appears to incur a copy #781

Open
alecjacobson opened this issue Nov 8, 2024 · 3 comments · May be fixed by #782
Open

[BUG]: scipy.sparse._csc.csc_matrix → Eigen::SparseMatrix appears to incur a copy #781

alecjacobson opened this issue Nov 8, 2024 · 3 comments · May be fixed by #782

Comments

@alecjacobson
Copy link

Problem description

The docs write that Eigen::SparseMatrix maps to scipy.sparse.csc/csr, but in my minimal example seems to show that it copies. I hope I'm mistaken and this is not actually a bug, in that case apologies I didn't make this a discussion question.

pybind11's documentation has a note that seems to imply that pass by reference is not supported for Eigen::SparseMatrix so I wonder if #126 inherited that.

If there's no current intention to support passing SparseMatrices efficiently, perhaps this is simply a documentation bug. In that case, I wonder if I could help try to add support. My understanding was that numpyeigen passed without a copy (I will try to verify).

Reproducible example code

#include <nanobind/nanobind.h>
#include <nanobind/ndarray.h>
#include <nanobind/eigen/dense.h>
#include <nanobind/eigen/sparse.h>

namespace nb = nanobind;

nb::object foo(const Eigen::SparseMatrix<double>& X)
{
  return nb::none();
}

// Bind the wrapper to the Python module
NB_MODULE(my_module, m) 
{
  m.def("foo", &foo, nb::arg("X"));
}


-------------------------------
import numpy as np
import my_module
import time
import scipy.sparse
# timing
import time

def rand_sparse(n,density):
    n_features = n
    n_samples = n
    rng1 = np.random.RandomState(42)
    rng2 = np.random.RandomState(43)

    nnz = int(n_samples*n_features*density)

    row = rng1.randint(n_samples, size=nnz)
    cols = rng2.randint(n_features, size=nnz)
    data = rng1.rand(nnz)

    S = scipy.sparse.csc_matrix((data, (row, cols)), shape=(n_samples, n_features))
    return S

for p in range(2,8):
    n = 10**p
    S = rand_sparse(n,1.0/(n))
    my_module.foo(S)
    start = time.time()
    for i in range(10):
        my_module.foo(S)
    end = time.time()
    t = (end - start)/10
    print("%10d %f" % (n,t))
@alecjacobson
Copy link
Author

The code outputs:

       100 0.000002
      1000 0.000006
     10000 0.000081
    100000 0.003803
   1000000 0.010209
  10000000 0.090195

which clearly shows linear dependence on the number of non-zeros.

@alecjacobson
Copy link
Author

I also confirmed that the analogous example appears to incur a copy for pybind11.

@alecjacobson
Copy link
Author

I also confirmed that numpyeigen avoids the copy
https://github.com/alecjacobson/numpyeigen-example-project/

@alecjacobson alecjacobson linked a pull request Nov 8, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant