Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python/c++] COO to CSX conversion optimization #3304

Merged
merged 49 commits into from
Nov 19, 2024
Merged

Conversation

bkmartinjr
Copy link
Member

@bkmartinjr bkmartinjr commented Nov 7, 2024

Issue and/or context:

Using scipy.sparse.spmatrix for conversion from SOMA Arrow tables to SciPy csr_matrix is single-threaded and requires contiguous COO input arrays. This PR:

  • does the conversion from COO -> CSC/CSR without requiring the extra copy to concat chunked arrays (roughly halving the memory used for this operation).
  • parallelizes the conversion - reduces wall-clock time (at the cost of extra total CPU time)

All API/modules are for internal use. There is no external API surface change.

Changes:

  • C++ package containing low-level operations. Includes vendoring "span" and "parallel_for" from TileDB repo
  • Python module and pybind-ings containing "friendlier" API for internal python code
  • Wired into soma.io.to_anndata and the blockwise() reader
  • C++ compile flags for Python bindings: added -O3 -mavx2 to setup.py
  • Modify R/Python GHA workflow to match latest build practices and remove legacy steps

Later work will wire into additional code paths.

Issue: #3345 [sc-59595]

Copy link

codecov bot commented Nov 7, 2024

Codecov Report

Attention: Patch coverage is 98.40000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 85.52%. Comparing base (ddc72c1) to head (ba3f69e).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3304      +/-   ##
==========================================
+ Coverage   85.14%   85.52%   +0.37%     
==========================================
  Files          53       54       +1     
  Lines        5568     5686     +118     
==========================================
+ Hits         4741     4863     +122     
+ Misses        827      823       -4     
Flag Coverage Δ
python 85.52% <98.40%> (+0.37%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 85.52% <98.40%> (+0.37%) ⬆️
libtiledbsoma ∅ <ø> (∅)
---- 🚨 Try these New Features:

Copy link
Member

@johnkerl johnkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm partway through but need to be AFK. Poking the submit button on where I am now, as a checkpoint.

apis/python/setup.py Show resolved Hide resolved
apis/python/setup.py Outdated Show resolved Hide resolved
apis/python/src/tiledbsoma/_fastercsx.py Outdated Show resolved Hide resolved
apis/python/src/tiledbsoma/_fastercsx.py Outdated Show resolved Hide resolved
apis/python/src/tiledbsoma/_fastercsx.py Outdated Show resolved Hide resolved
apis/python/src/tiledbsoma/_fastercsx.py Outdated Show resolved Hide resolved
apis/python/src/tiledbsoma/_fastercsx.py Show resolved Hide resolved
@bkmartinjr bkmartinjr marked this pull request as ready for review November 15, 2024 21:17
@bkmartinjr bkmartinjr requested a review from nguyenv November 15, 2024 21:17
@johnkerl johnkerl changed the title COO to CSX conversion optimization [python/c++] COO to CSX conversion optimization Nov 15, 2024
apis/python/src/tiledbsoma/fastercsx.cc Show resolved Hide resolved
apis/python/src/tiledbsoma/fastercsx.cc Outdated Show resolved Hide resolved
apis/python/src/tiledbsoma/fastercsx.cc Outdated Show resolved Hide resolved
apis/python/src/tiledbsoma/fastercsx.cc Outdated Show resolved Hide resolved
apis/python/src/tiledbsoma/fastercsx.cc Outdated Show resolved Hide resolved
libtiledbsoma/src/utils/fastercsx.h Outdated Show resolved Hide resolved
libtiledbsoma/src/utils/fastercsx.h Outdated Show resolved Hide resolved
libtiledbsoma/src/utils/fastercsx.h Outdated Show resolved Hide resolved
libtiledbsoma/src/utils/fastercsx.h Outdated Show resolved Hide resolved
@@ -0,0 +1,406 @@
/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is simply a copy from core of known-good code -- if so I won't even peek at it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct - vendored, as noted in the PR description

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my less-than-thorough read of the PR description!

Copy link
Member

@johnkerl johnkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #3328 (review) for narrative.

Thanks @bkmartinjr !

And as on #3328 -- cc @nguyenv for more :eyes.

libtiledbsoma/src/utils/fastercsx.h Outdated Show resolved Hide resolved
libtiledbsoma/src/utils/fastercsx.h Outdated Show resolved Hide resolved
Copy link
Member

@nguyenv nguyenv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Some minor suggestions

apis/python/src/tiledbsoma/_fastercsx.py Outdated Show resolved Hide resolved
libtiledbsoma/src/utils/fastercsx.h Show resolved Hide resolved
libtiledbsoma/src/utils/fastercsx.h Outdated Show resolved Hide resolved
apis/python/src/tiledbsoma/_fastercsx.py Show resolved Hide resolved
apis/python/src/tiledbsoma/_read_iters.py Outdated Show resolved Hide resolved
@bkmartinjr bkmartinjr merged commit d4e5a42 into main Nov 19, 2024
25 checks passed
@bkmartinjr bkmartinjr deleted the bkmartinjr/fastercsx branch November 19, 2024 00:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants