[c++, python] fastercsx: improved duplicate coordinate handling #3468

Merged
bkmartinjr merged 8 commits into main from bkmartinjr/fastercsx-dup-handling
Dec 19, 2024


@bkmartinjr bkmartinjr commented Dec 18, 2024

Issue and/or context:

The fastercsx module generates a malformed SciPy sparse matrix under one specific condition that is not usually seen with SOMA data: the matrix contains duplicate coordinates. This PR detects that condition and correctly initializes the SciPy sparse matrix. Tested on scipy==1.15.0rc1 for good measure.
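
To illustrate why duplicate coordinates matter, here is a minimal pure-Python sketch of compressing COO triples into CSR arrays. It is not the fastercsx implementation; the function names `coo_to_csr`, `has_duplicates`, and `sum_duplicates` are hypothetical and exist only for this example. When the input contains a repeated (row, column) pair, the compressed row holds the same column index twice, so the result is not in canonical form until the duplicates are detected and summed.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Compress COO triples into CSR arrays, keeping duplicates as separate entries."""
    # Count entries per row, then prefix-sum the counts into the index pointer.
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]
    # Order entries by (row, column) so each row's column indices are sorted.
    order = sorted(range(len(rows)), key=lambda k: (rows[k], cols[k]))
    indices = [cols[k] for k in order]
    data = [vals[k] for k in order]
    return indptr, indices, data


def has_duplicates(indptr, indices):
    """Return True if any row repeats a column index (indices sorted within rows)."""
    for r in range(len(indptr) - 1):
        row = indices[indptr[r]:indptr[r + 1]]
        if any(a == b for a, b in zip(row, row[1:])):
            return True
    return False


def sum_duplicates(indptr, indices, data):
    """Merge adjacent entries that share a (row, column) pair by summing their values."""
    new_indptr, new_indices, new_data = [0], [], []
    for r in range(len(indptr) - 1):
        row_start = len(new_indices)
        for k in range(indptr[r], indptr[r + 1]):
            if len(new_indices) > row_start and new_indices[-1] == indices[k]:
                new_data[-1] += data[k]
            else:
                new_indices.append(indices[k])
                new_data.append(data[k])
        new_indptr.append(len(new_indices))
    return new_indptr, new_indices, new_data


# The coordinate (0, 1) appears twice in this input.
indptr, indices, data = coo_to_csr(2, [0, 0, 1, 0], [1, 1, 0, 2], [1.0, 2.0, 3.0, 4.0])
print(has_duplicates(indptr, indices))        # True
print(sum_duplicates(indptr, indices, data))  # ([0, 2, 3], [1, 2, 0], [3.0, 4.0, 3.0])
```

In this sketch, a consumer that assumes canonical CSR would have to either be told that duplicates are present or receive the summed arrays.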

Changes:

  • correct handling of duplicate coordinates
  • remove a small bit of redundant dead code (the from_pjd method)
  • fix sc-61043: incorrect handling of a multi-chunk coordinate when the first chunk is empty
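
As a rough illustration of the multi-chunk case in the last item, the sketch below builds a CSR index pointer from row coordinates that arrive in several chunks, any of which may be empty. This is an assumption-laden example rather than the code changed in this PR; `indptr_from_chunks` is a hypothetical helper. The point is only that no step should depend on the first chunk having at least one element.

```python
def indptr_from_chunks(n_rows, row_chunks):
    """Build a CSR index pointer from chunked row coordinates; chunks may be empty."""
    indptr = [0] * (n_rows + 1)
    for chunk in row_chunks:
        # An empty chunk simply contributes nothing, wherever it appears.
        for r in chunk:
            if not 0 <= r < n_rows:
                raise ValueError(f"row coordinate {r} out of range")
            indptr[r + 1] += 1
    # Prefix-sum the per-row counts into offsets.
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]
    return indptr


# The first chunk is empty; the result matches the single-chunk equivalent.
chunked = indptr_from_chunks(3, [[], [0, 2], [], [2, 1]])
single = indptr_from_chunks(3, [[0, 2, 2, 1]])
print(chunked)            # [0, 1, 2, 4]
print(chunked == single)  # True
```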


codecov bot commented Dec 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.32%. Comparing base (5ddab04) to head (87cf2d3).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3468      +/-   ##
==========================================
+ Coverage   86.27%   86.32%   +0.04%     
==========================================
  Files          55       55              
  Lines        6339     6338       -1     
==========================================
+ Hits         5469     5471       +2     
+ Misses        870      867       -3     
Flag Coverage Δ
python 86.32% <100.00%> (+0.04%) ⬆️


Components Coverage Δ
python_api 86.32% <100.00%> (+0.04%) ⬆️
libtiledbsoma ∅ <ø> (∅)

@bkmartinjr bkmartinjr marked this pull request as ready for review December 18, 2024 22:46
@bkmartinjr bkmartinjr requested a review from johnkerl December 18, 2024 22:47
@johnkerl johnkerl changed the title [C++, python] fastercsx: improved duplicate coordinate handling [c++, python] fastercsx: improved duplicate coordinate handling Dec 18, 2024

@johnkerl johnkerl left a comment


🚢

apis/python/src/tiledbsoma/_fastercsx.py (resolved)
apis/python/src/tiledbsoma/_fastercsx.py (resolved)
apis/python/src/tiledbsoma/_fastercsx.py (outdated, resolved)
@bkmartinjr bkmartinjr marked this pull request as draft December 19, 2024 00:31
@bkmartinjr bkmartinjr marked this pull request as ready for review December 19, 2024 16:24
@bkmartinjr bkmartinjr marked this pull request as draft December 19, 2024 16:55
@bkmartinjr bkmartinjr marked this pull request as ready for review December 19, 2024 18:29
@bkmartinjr bkmartinjr merged commit e9e04e2 into main Dec 19, 2024
25 checks passed
@bkmartinjr bkmartinjr deleted the bkmartinjr/fastercsx-dup-handling branch December 19, 2024 18:32
github-actions bot pushed a commit that referenced this pull request Dec 19, 2024
)

* improved duplicate coordinate handling

* invert sense of dup flag

* fix mishandled empty first fragment in compress_coo

* pr fb

* fix race
johnkerl pushed a commit that referenced this pull request Dec 19, 2024
) (#3485)

* improved duplicate coordinate handling

* invert sense of dup flag

* fix mishandled empty first fragment in compress_coo

* pr fb

* fix race

Co-authored-by: Bruce Martin <[email protected]>
2 participants