-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[r] Add initial support for ragged array writing for Seurat v5 #2523
[r] Add initial support for ragged array writing for Seurat v5 #2523
Conversation
b21a806
to
49e4edf
Compare
49e4edf
to
3f5bca1
Compare
3f5bca1
to
c5a48a3
Compare
6461bc9
to
b692361
Compare
cb7147e
to
86a7be1
Compare
86a7be1
to
f80acdc
Compare
Seurat v5 adds support for ragged arrays, where not every `X` layer has exactly the same cells and features. To handle this, ragged `X` layers need to be re-indexed and re-shaped on ingestion to resize down to only the data present Modified SOMA methods: - `SOMAExperimentAxisQuery$to_seurat()` and `SOMAExperimentAxisQuery$to_seurat_assay()`: now read in as v5 assays New SOMA methods: - `SOMAExperimentAxisQuery$private$.to_seurat_assay_v5()`: helper method to read in ragged and non-ragged arrays into a v5 assay; note this method only handles expression layers, all other assay-level information is handled by parent `$to_seurat_assay()` to share code with v3 assay outgestion Requires #2523 and #3007 [SC-52261](https://app.shortcut.com/tiledb-inc/story/52261/)
db6e539
to
9d0357a
Compare
apis/r/R/write_seurat.R
Outdated
fmat <- methods::slot(x, name = 'features') | ||
|
||
# Write `X` matrices | ||
for (lyr in SeuratObject::Layers(x)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think spelling out lyr
as layer
is a reasonable ask
also I think it's a layer_name
not a layer
-- given key = lyr
below
apis/r/R/write_seurat.R
Outdated
|
||
# Write `X` matrices | ||
for (lyr in SeuratObject::Layers(x)) { | ||
ldat <- SeuratObject::LayerData(x, layer = lyr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise layer_data
expect_identical(setdiff(ms$var$attrnames(), "var_id"), names(rna[[]])) | ||
expect_s3_class(ms$X, "SOMACollection") | ||
expect_identical(ms$X$names(), SeuratObject::Layers(rna)) | ||
fmat <- methods::slot(rna, name = "features") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above with spelling out fmat
, cmat
, lyr
-- just a few keystrokes for you -- single-time spend of those seconds! -- with added clarity for every reader forever after
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments are non-blocking
ed0f50e
to
c33ca40
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally this is great! I really like the new _hint()
helpers and the tests appear very comprehensive.
My main ask is to address the significant code duplication between the existing write_soma.Assay()
method and the new write_soma.Assay5()
method. Both methods share similar logic for creating measurements, writing X matrices, handling feature-level metadata, etc.
This will make the code more maintainable, easier to update, and less likely to create bugs by making changes in one method but not the other.
apis/r/R/write_seurat.R
Outdated
parents <- unique(sys.parents()) | ||
idx <- which(vapply_lgl( | ||
parents, | ||
FUN = function(i) identical(sys.function(i), write_soma.Seurat) | ||
)) | ||
shape <- if (length(idx) == 1L) { | ||
get("shape", envir = sys.frame(parents[idx])) | ||
} else { | ||
NULL | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accessing variables from parent frames seems fragile. Could we just add a shape
argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought about that, but I didn't want to expose shape
in write_soma()
; this logic is needed for resume-mode #2405, and more specifically for the tests. This functionality isn't used otherwise, which is why I didn't want to expose shape
for write_soma()
a2c77d9
to
dcf4d8b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates!
Seurat v5 adds support for ragged arrays, where not every `X` layer has exactly the same cells and features. To handle these ragged arrays on ingestion, re-indexing the soma join IDs is necessary to pad the `X` layer to the full domain of the SOMA measurement Implemented SOMA methods: - `write_soma.Assay5()`: write a Seurat v5 assay to a SOMA measurement. When writing `X` layers, if a layer is ragged: - cast layer to `TsparseMatrix` for COO representation - re-index Seurat's character IDs to SOMA join IDs - re-index COO coordinates to SOMA join IDs - write array using `SOMASparseNDArray$private$.write_coo_dataframe()` Notes: - This PR does not implement alternate matrix (eg. DelayedArray, BPCells) ingestion
Expand tests
[ci skip] Co-authored-by: John Kerl <[email protected]>
[ci skip]
function, reduce code duplication
Bump develop version
f42044b
to
b947cf0
Compare
Seurat v5 adds support for ragged arrays, where not every `X` layer has exactly the same cells and features. To handle this, ragged `X` layers need to be re-indexed and re-shaped on ingestion to resize down to only the data present Modified SOMA methods: - `SOMAExperimentAxisQuery$to_seurat()` and `SOMAExperimentAxisQuery$to_seurat_assay()`: now read in as v5 assays New SOMA methods: - `SOMAExperimentAxisQuery$private$.to_seurat_assay_v5()`: helper method to read in ragged and non-ragged arrays into a v5 assay; note this method only handles expression layers, all other assay-level information is handled by parent `$to_seurat_assay()` to share code with v3 assay outgestion Requires #2523 and #3007 [SC-52261](https://app.shortcut.com/tiledb-inc/story/52261/)
Seurat v5 adds support for ragged arrays, where not every `X` layer has exactly the same cells and features. To handle this, ragged `X` layers need to be re-indexed and re-shaped on ingestion to resize down to only the data present Modified SOMA methods: - `SOMAExperimentAxisQuery$to_seurat()` and `SOMAExperimentAxisQuery$to_seurat_assay()`: now read in as v5 assays New SOMA methods: - `SOMAExperimentAxisQuery$private$.to_seurat_assay_v5()`: helper method to read in ragged and non-ragged arrays into a v5 assay; note this method only handles expression layers, all other assay-level information is handled by parent `$to_seurat_assay()` to share code with v3 assay outgestion Requires #2523 and #3007 [SC-52261](https://app.shortcut.com/tiledb-inc/story/52261/)
Seurat v5 adds support for ragged arrays, where not every
X
layer has exactly the same cells and features. To handle these ragged arrays on ingestion, re-indexing the soma join IDs is necessary to pad theX
layer to the full domain of the SOMA measurementImplemented SOMA methods:
write_soma.Assay5()
: write a Seurat v5 assay to a SOMA measurement. When writingX
layers, if a layer is ragged:TsparseMatrix
for COO representationSOMASparseNDArray$private$.write_coo_dataframe()
Notes:
SC-46644
resolves #2658