Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify how obs and var names handled in ZarrAnnData #5

Merged
merged 16 commits into from
Nov 5, 2024

Conversation

Artur-man
Copy link
Collaborator

@Artur-man Artur-man commented Nov 1, 2024

This PR is similar to the one in scverse#171 and includes other changes in scverse/anndataR@main too.

Some tests are currently still failing but starting this PR to check em out! @keller-mark also feel free to review!

UPDATE: We commited some test and _index fixes.

rcannood and others added 15 commits June 27, 2024 21:29
* re-enable matrices with NAs tests in X and layers

* one more

* Ensure that matrices are never written as nullables
Take care when using reticulate for testing: before writing, convert NA to NaN

* Fix Windows writing NA error

* Remove commented code since no longer needed

* remove commented code (no longer needed)

---------

Co-authored-by: Louise Deconinck <[email protected]>
* Update write_h5ad_categorical

* fix styling

* Update write_h5ad_categorical

* Adjust H5AD categorical write test

* Add write_h5ad_attributes function

Replace repeated code in individual writers

* ignore cyclomatic complexity warning for `write_h5ad_element` warning

* formatting changes

* in write_h5ad_attributes, allow file to be an open hdf5 file

* Add write_h5ad_boolean_attribute()

* Add write_h5ad_boolean_array()

Helper function for writing ENUM boolean arrays

* Remove compression argument from write_h5ad_boolean_array

Don't think it is possible to write compressed data using the workaround
and ENUM format should be fairly space efficient anyway

* Correctly read categorical levels from H5AD

Fixes array when there are more levels than values

* Fix writing scalar H5AD attributes

Correctly check the is_scalra argument

* add lintr exceptions

* fix nolint

---------

Co-authored-by: Robrecht Cannoodt <[email protected]>
* port rownames-related changes from scverse#166 and scverse#169

* run styler

* fix test

* style

* style

* fix docs

* fix documentation

* simplify helper functions

* simplify test

* add more documentation to AnnData

* fix docs
* Update write_h5ad_categorical

* fix styling

* Update write_h5ad_categorical

* Adjust H5AD categorical write test

* Add write_h5ad_attributes function

Replace repeated code in individual writers

* ignore cyclomatic complexity warning for `write_h5ad_element` warning

* formatting changes

* in write_h5ad_attributes, allow file to be an open hdf5 file

* wip

* wip

* substitute mentions of rhdf5 with hdf5r

* strip obs_names and var_names from framework

* update

* fix tests and finalize

* remove mentions of obs_names and var_names in the constructor

* make sure filenames are always unique

* add mode to various functions

* manually close anndatas in tests (where needed)

* only close when pointer is valid

* move match

* use $close() instead of $close_all()

* switch to different branch

* simplify test

* gc afterclosing the adata in write_h5ad

* guess the dtype and the space

* update docs

* use hhoeflin's remote

* bugfix in hdf5r has been released

* update: nevermind, the fix wasn't included in the release yet

* minor fixes

* bump version number

* remove remotes

* remove references to rhdf5

* fix attributes

* style

* fix write h5ad helpers

* fix unit tests

* fix linting issues

* move hdf5 helpers

* reuse existing functionality

* add test (this seems to have been fixed at some point)

* improve guessing of dtype when storing a logical vector

* fix styling

* reenable more tests

---------

Co-authored-by: Luke Zappia <[email protected]>
* Tidy user interface

Co-authored-by: Luke Zappia <[email protected]>

* Update docs

Co-authored-by: Luke Zappia <[email protected]>

* update docs

* fix linting issues

---------

Co-authored-by: Luke Zappia <[email protected]>
@Artur-man
Copy link
Collaborator Author

I think I understand why you commented out this test, since read_*_mapping returns lists in different orders right :D

# test_that("reading mappings is same for h5ad and zarr", {
# mapping_h5ad <- read_h5ad_mapping(file, "uns")
# mapping_zarr <- read_zarr_mapping(store, "uns")
# expect_equal(mapping_h5ad, mapping_zarr)
# })

@Artur-man
Copy link
Collaborator Author

I would say one way to go around this problem is the define additional name indexing for zarr stores, to capture the initial ordering of groups/arrays within h5?

@Artur-man
Copy link
Collaborator Author

Artur-man commented Nov 2, 2024

I fixed a lot of tests and indexing bugs, all tests are now working and I have even replaced example.zarr with one that works great ... @keller-mark I would also love your input!

@keller-mark
Copy link
Owner

Thank you very much for the improvements/ bug fixes and getting it up to date with the original repo. All of the changes look good to me and the tests are passing locally, I will merge this and then open a PR into the original scverse repo.

@keller-mark keller-mark merged commit 4a7d4c2 into keller-mark:keller-mark/zarr Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants