Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[r] Add initial support for ragged array writing for Seurat v5 #2523

Conversation

mojaveazure
Copy link
Member

@mojaveazure mojaveazure commented May 7, 2024

Seurat v5 adds support for ragged arrays, where not every X layer has exactly the same cells and features. To handle these ragged arrays on ingestion, re-indexing the soma join IDs is necessary to pad the X layer to the full domain of the SOMA measurement

Implemented SOMA methods:

  • write_soma.Assay5(): write a Seurat v5 assay to a SOMA measurement. When writing X layers, if a layer is ragged:
    • cast layer to TsparseMatrix for COO representation
    • re-index Seurat's character IDs to SOMA join IDs
    • re-index COO coordinates to SOMA join IDs
    • write array using SOMASparseNDArray$private$.write_coo_dataframe()

Notes:

  • This PR does not implement alternate matrix (eg. DelayedArray, BPCells) ingestion

SC-46644

resolves #2658

@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from b21a806 to 49e4edf Compare May 30, 2024 21:01
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from 49e4edf to 3f5bca1 Compare July 15, 2024 21:57
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from 3f5bca1 to c5a48a3 Compare August 1, 2024 19:08
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch 2 times, most recently from 6461bc9 to b692361 Compare August 14, 2024 20:53
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch 3 times, most recently from cb7147e to 86a7be1 Compare September 9, 2024 17:43
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from 86a7be1 to f80acdc Compare September 16, 2024 14:41
mojaveazure added a commit that referenced this pull request Sep 17, 2024
Seurat v5 adds support for ragged arrays, where not every `X` layer has
exactly the same cells and features. To handle this, ragged `X` layers
need to be re-indexed and re-shaped on ingestion to resize down to only
the data present

Modified SOMA methods:
 - `SOMAExperimentAxisQuery$to_seurat()` and
   `SOMAExperimentAxisQuery$to_seurat_assay()`: now read in as v5 assays

New SOMA methods:
 - `SOMAExperimentAxisQuery$private$.to_seurat_assay_v5()`: helper
   method to read in ragged and non-ragged arrays into a v5 assay; note
   this method only handles expression layers, all other assay-level
   information is handled by parent `$to_seurat_assay()` to share code
   with v3 assay outgestion

Requires #2523 and #3007

[SC-52261](https://app.shortcut.com/tiledb-inc/story/52261/)
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from db6e539 to 9d0357a Compare September 17, 2024 18:02
@mojaveazure mojaveazure marked this pull request as ready for review September 17, 2024 18:02
@johnkerl johnkerl removed the request for review from eddelbuettel October 21, 2024 16:08
apis/r/R/utils.R Show resolved Hide resolved
apis/r/R/utils.R Show resolved Hide resolved
fmat <- methods::slot(x, name = 'features')

# Write `X` matrices
for (lyr in SeuratObject::Layers(x)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think spelling out lyr as layer is a reasonable ask

also I think it's a layer_name not a layer -- given key = lyr below


# Write `X` matrices
for (lyr in SeuratObject::Layers(x)) {
ldat <- SeuratObject::LayerData(x, layer = lyr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise layer_data

apis/r/R/write_seurat.R Outdated Show resolved Hide resolved
apis/r/R/write_seurat.R Outdated Show resolved Hide resolved
apis/r/R/write_seurat.R Outdated Show resolved Hide resolved
apis/r/R/write_seurat.R Outdated Show resolved Hide resolved
expect_identical(setdiff(ms$var$attrnames(), "var_id"), names(rna[[]]))
expect_s3_class(ms$X, "SOMACollection")
expect_identical(ms$X$names(), SeuratObject::Layers(rna))
fmat <- methods::slot(rna, name = "features")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above with spelling out fmat, cmat, lyr -- just a few keystrokes for you -- single-time spend of those seconds! -- with added clarity for every reader forever after

apis/r/R/write_seurat.R Show resolved Hide resolved
Copy link
Member

@johnkerl johnkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are non-blocking

apis/r/R/write_seurat.R Outdated Show resolved Hide resolved
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from ed0f50e to c33ca40 Compare October 24, 2024 18:31
Copy link
Member

@aaronwolen aaronwolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally this is great! I really like the new _hint() helpers and the tests appear very comprehensive.

My main ask is to address the significant code duplication between the existing write_soma.Assay() method and the new write_soma.Assay5() method. Both methods share similar logic for creating measurements, writing X matrices, handling feature-level metadata, etc.

This will make the code more maintainable, easier to update, and less likely to create bugs by making changes in one method but not the other.

apis/r/R/write_seurat.R Show resolved Hide resolved
Comment on lines 267 to 276
parents <- unique(sys.parents())
idx <- which(vapply_lgl(
parents,
FUN = function(i) identical(sys.function(i), write_soma.Seurat)
))
shape <- if (length(idx) == 1L) {
get("shape", envir = sys.frame(parents[idx]))
} else {
NULL
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessing variables from parent frames seems fragile. Could we just add a shape argument?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought about that, but I didn't want to expose shape in write_soma(); this logic is needed for resume-mode #2405, and more specifically for the tests. This functionality isn't used otherwise, which is why I didn't want to expose shape for write_soma()

mojaveazure added a commit that referenced this pull request Oct 25, 2024
mojaveazure added a commit that referenced this pull request Oct 31, 2024
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from a2c77d9 to dcf4d8b Compare October 31, 2024 20:07
Copy link
Member

@aaronwolen aaronwolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates!

Seurat v5 adds support for ragged arrays, where not every `X` layer has
exactly the same cells and features. To handle these ragged arrays on
ingestion, re-indexing the soma join IDs is necessary to pad the `X`
layer to the full domain of the SOMA measurement

Implemented SOMA methods:
- `write_soma.Assay5()`: write a Seurat v5 assay to a SOMA measurement.
  When writing `X` layers, if a layer is ragged:
  - cast layer to `TsparseMatrix` for COO representation
  - re-index Seurat's character IDs to SOMA join IDs
  - re-index COO coordinates to SOMA join IDs
  - write array using `SOMASparseNDArray$private$.write_coo_dataframe()`

Notes:
 - This PR does not implement alternate matrix (eg. DelayedArray,
   BPCells) ingestion
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from f42044b to b947cf0 Compare November 4, 2024 15:56
@mojaveazure mojaveazure merged commit 54933f8 into main Nov 4, 2024
14 checks passed
@mojaveazure mojaveazure deleted the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch November 4, 2024 16:21
mojaveazure added a commit that referenced this pull request Nov 5, 2024
Seurat v5 adds support for ragged arrays, where not every `X` layer has
exactly the same cells and features. To handle this, ragged `X` layers
need to be re-indexed and re-shaped on ingestion to resize down to only
the data present

Modified SOMA methods:
 - `SOMAExperimentAxisQuery$to_seurat()` and
   `SOMAExperimentAxisQuery$to_seurat_assay()`: now read in as v5 assays

New SOMA methods:
 - `SOMAExperimentAxisQuery$private$.to_seurat_assay_v5()`: helper
   method to read in ragged and non-ragged arrays into a v5 assay; note
   this method only handles expression layers, all other assay-level
   information is handled by parent `$to_seurat_assay()` to share code
   with v3 assay outgestion

Requires #2523 and #3007

[SC-52261](https://app.shortcut.com/tiledb-inc/story/52261/)
mojaveazure added a commit that referenced this pull request Jan 16, 2025
Seurat v5 adds support for ragged arrays, where not every `X` layer has
exactly the same cells and features. To handle this, ragged `X` layers
need to be re-indexed and re-shaped on ingestion to resize down to only
the data present

Modified SOMA methods:
 - `SOMAExperimentAxisQuery$to_seurat()` and
   `SOMAExperimentAxisQuery$to_seurat_assay()`: now read in as v5 assays

New SOMA methods:
 - `SOMAExperimentAxisQuery$private$.to_seurat_assay_v5()`: helper
   method to read in ragged and non-ragged arrays into a v5 assay; note
   this method only handles expression layers, all other assay-level
   information is handled by parent `$to_seurat_assay()` to share code
   with v3 assay outgestion

Requires #2523 and #3007

[SC-52261](https://app.shortcut.com/tiledb-inc/story/52261/)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[r] Add support for ragged arrays in write_soma()
3 participants