
Speed and memory improvements in linalg #303

Merged: 6 commits merged into main from sparse on Mar 22, 2023
Conversation

maximelucas
Collaborator

  • made incidence_matrix() cleaner and faster (about 4x)
  • made the output consistent between sparse and non-sparse for edge cases (empty Hypergraph, no edges of order d)
  • added tests
  • made multiorder_laplacian() "truly sparse": all internal variables are sparse if sparse=True. This fixes #301 (multiorder laplacian not truly sparse).

As a result, multiorder_laplacian() is also faster.
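A common way to get that kind of speedup is to collect all node-edge memberships first and build the sparse matrix in a single constructor call, rather than filling it entry by entry. Below is a minimal sketch of that idea, not the actual diff: incidence_matrix_sketch is a hypothetical helper, and it assumes xgi's H.edges.members(dtype=dict) accessor.

```python
import numpy as np
from scipy.sparse import csr_array

def incidence_matrix_sketch(H):
    """Hypothetical sketch: node-by-edge incidence matrix built in one pass."""
    node_index = {n: i for i, n in enumerate(H.nodes)}
    edge_index = {e: j for j, e in enumerate(H.edges)}

    # Collect one (row, col) pair per node-edge membership, then hand
    # everything to the sparse constructor in a single call.
    rows, cols = [], []
    for e, members in H.edges.members(dtype=dict).items():
        for n in members:
            rows.append(node_index[n])
            cols.append(edge_index[e])

    data = np.ones(len(rows))
    return csr_array((data, (rows, cols)), shape=(H.num_nodes, H.num_edges))
```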

Timings for the email-eu dataset:

%timeit LL = xgi.multiorder_laplacian(H, orders, weights, sparse=False)
# --> before: 9.5 s, after: 9.5 s
%timeit LL = xgi.multiorder_laplacian(H, orders, weights, sparse=True)
# --> before: 9.5 s, after: 1.1 s

That's roughly a 9x speedup in the sparse case, and it is also more memory efficient (all internals are sparse).
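For anyone who wants to reproduce the numbers, a rough setup could look like the sketch below. The dataset name passed to load_xgi_data and the choice of orders and weights are assumptions on my part; the exact benchmark inputs aren't shown here.

```python
import numpy as np
import xgi

# Assumed setup: email-eu from xgi-data, uniform weight per interaction order.
H = xgi.load_xgi_data("email-eu")
orders = np.array(xgi.unique_edge_sizes(H)) - 1  # order d = edge size - 1
weights = np.ones(len(orders))

# %timeit this line with sparse=True vs sparse=False to compare.
L = xgi.multiorder_laplacian(H, orders, weights, sparse=True)
```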

@codecov

codecov bot commented Mar 22, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.89% 🎉

Comparison is base (6a5fb60) 83.41% compared to head (53e4818) 84.31%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #303      +/-   ##
==========================================
+ Coverage   83.41%   84.31%   +0.89%     
==========================================
  Files          35       35              
  Lines        2732     2747      +15     
==========================================
+ Hits         2279     2316      +37     
+ Misses        453      431      -22     
Impacted Files                    Coverage Δ
xgi/algorithms/centrality.py      63.04% <ø> (ø)
xgi/convert.py                    91.26% <100.00%> (ø)
xgi/linalg/matrix.py              94.24% <100.00%> (+0.53%) ⬆️

... and 3 files with indirect coverage changes



@nwlandry
Collaborator

Thanks so much - this is awesome! One quick question which isn't essential for this PR is why the A.setdiag(0) statement in adjacency_matrix() throws a warning "Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient." when the order keyword is specified. I tried figuring it out without any luck. Good to merge!

@nwlandry
Collaborator

One more thing - can you run isort and black?

@maximelucas
Collaborator Author

why the A.setdiag(0) statement in adjacency_matrix() throws a warning "Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient." when the order keyword is specified.

From what I've understood, "changing the sparsity structure" here just means we are changing entries of A (the diagonal entries). csr_array is not the ideal type of sparse array for this and can be slow, so scipy is letting us know; lil and coo are better for modifying entries. In our case I don't think it's a big problem, since we're just modifying a few values once. It may be worth checking in the future, though.
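For reference, the pattern scipy is hinting at looks like this (a generic scipy illustration, not the code in this PR): do the edits in LIL format, which is designed for incremental changes, and convert back to CSR afterwards.

```python
import numpy as np
from scipy.sparse import csr_array, lil_array

# Generic illustration: a small CSR adjacency matrix whose diagonal we want to zero.
A = csr_array(np.array([[1.0, 1.0, 0.0],
                        [1.0, 1.0, 1.0],
                        [0.0, 1.0, 1.0]]))

A_lil = lil_array(A)   # LIL is meant for incremental edits, so this is cheap
A_lil.setdiag(0)       # modify the diagonal without the efficiency warning
A = A_lil.tocsr()      # convert back to CSR for fast arithmetic
```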

@maximelucas
Collaborator Author

Btw, I have an 18k-node hypergraph, and the cluster used to take hours and crash trying to compute the Laplacian. It now takes 5 s on my laptop 👀

@nwlandry
Collaborator

Btw, I have an 18k-node hypergraph, and the cluster used to take hours and crash trying to compute the Laplacian. It now takes 5 s on my laptop 👀

🤯

@maximelucas maximelucas merged commit 5efe433 into main Mar 22, 2023
@maximelucas maximelucas deleted the sparse branch March 22, 2023 21:27
@leotrs
Collaborator

leotrs commented Mar 23, 2023

Btw, I have an 18k-node hypergraph, and the cluster used to take hours and crash trying to compute the Laplacian. It now takes 5 s on my laptop 👀

This would be nice to add as a regression test.
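A minimal sketch of what such a test could check is below; the toy hypergraph, orders, and weights are placeholders, and the tests actually added in this PR may differ.

```python
import numpy as np
import xgi
from scipy.sparse import issparse

def test_multiorder_laplacian_sparse_matches_dense():
    # Toy hypergraph with edges of order 1 and 2.
    H = xgi.Hypergraph([[1, 2], [2, 3, 4], [1, 3]])
    orders = np.array([1, 2])
    weights = [1, 1]

    L_sparse = xgi.multiorder_laplacian(H, orders, weights, sparse=True)
    L_dense = xgi.multiorder_laplacian(H, orders, weights, sparse=False)

    # The sparse path should return an actual scipy sparse container...
    assert issparse(L_sparse)
    # ...and agree numerically with the dense path.
    assert np.allclose(L_sparse.todense(), L_dense)
```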
