[r] Add initial support for ragged array writing for Seurat v5 #2523

mojaveazure · 2024-05-07T19:45:04Z

Seurat v5 adds support for ragged arrays, where not every X layer has exactly the same cells and features. To handle these ragged arrays on ingestion, the SOMA join IDs must be re-indexed so that each X layer is padded to the full domain of the SOMA measurement.

Implemented SOMA methods:

write_soma.Assay5(): write a Seurat v5 assay to a SOMA measurement. When writing X layers, if a layer is ragged:
- cast layer to TsparseMatrix for COO representation
- re-index Seurat's character IDs to SOMA join IDs
- re-index COO coordinates to SOMA join IDs
- write array using SOMASparseNDArray$private$.write_coo_dataframe()
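
The ragged-layer steps listed above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the code from this PR: the cell and feature names are made up, join IDs are assumed to be zero-based positions in the full domain, and cells are assumed to land on `soma_dim_0` with features on `soma_dim_1`. The final write step is omitted.

```r
library(Matrix)

# Full domain of the measurement (hypothetical IDs, for illustration only)
all_cells <- c("cell1", "cell2", "cell3", "cell4")
all_features <- c("geneA", "geneB", "geneC")

# A ragged layer (features x cells) covering only part of the domain
layer <- Matrix::sparseMatrix(
  i = c(1, 2, 2),
  j = c(1, 1, 2),
  x = c(5, 3, 7),
  dims = c(2, 2),
  dimnames = list(c("geneA", "geneC"), c("cell2", "cell4"))
)

# 1. Cast to triplet (COO) form; the `i` and `j` slots are zero-based
coo <- as(layer, "TsparseMatrix")

# 2. Map the layer's character IDs to zero-based join IDs in the full domain
#    (assumption: a join ID is the position in the full domain minus one)
feature_ids <- match(rownames(coo), all_features) - 1L
cell_ids <- match(colnames(coo), all_cells) - 1L

# 3. Re-index the COO coordinates from layer-local positions to join IDs
coo_df <- data.frame(
  soma_dim_0 = cell_ids[coo@j + 1L],
  soma_dim_1 = feature_ids[coo@i + 1L],
  soma_data = coo@x
)
```

The layer covers only two of the three features and two of the four cells, yet the resulting coordinates index into the full domain. Real SOMA join IDs are 64-bit integers; plain R integers are used here to keep the sketch short.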

Notes:

This PR does not implement ingestion of alternate matrix types (e.g. DelayedArray, BPCells).

SC-46644

resolves #2658

Seurat v5 adds support for ragged arrays, where not every `X` layer has exactly the same cells and features. To handle this, ragged `X` layers need to be re-indexed and re-shaped on ingestion to resize down to only the data present.

Modified SOMA methods:

- `SOMAExperimentAxisQuery$to_seurat()` and `SOMAExperimentAxisQuery$to_seurat_assay()`: now read in as v5 assays

New SOMA methods:

- `SOMAExperimentAxisQuery$private$.to_seurat_assay_v5()`: helper method to read in ragged and non-ragged arrays into a v5 assay; note this method only handles expression layers, all other assay-level information is handled by parent `$to_seurat_assay()` to share code with v3 assay outgestion

Requires #2523 and #3007

[SC-52261](https://app.shortcut.com/tiledb-inc/story/52261/)
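
The "resize down to only the data present" step on the read side can be sketched as the inverse of the write-side re-indexing. This is a rough illustration under assumptions, not the method described above: the IDs are hypothetical, join IDs are taken to be zero-based positions in the full domain, and which cells and features are "present" is inferred from the stored coordinates alone. That inference would drop all-zero rows and columns, so a real implementation likely needs extra metadata to recover the exact layer shape.

```r
library(Matrix)

# Full domain of the measurement (hypothetical IDs, for illustration only)
all_cells <- c("cell1", "cell2", "cell3", "cell4")
all_features <- c("geneA", "geneB", "geneC")

# COO data as it might come back from a padded X layer
# (assumption: soma_dim_0 indexes cells, soma_dim_1 indexes features)
coo_df <- data.frame(
  soma_dim_0 = c(1L, 1L, 3L),
  soma_dim_1 = c(0L, 2L, 2L),
  soma_data = c(5, 3, 7)
)

# Join IDs that actually carry data in this layer
present_cells <- sort(unique(coo_df$soma_dim_0))
present_features <- sort(unique(coo_df$soma_dim_1))

# Re-index join IDs to one-based positions within the smaller layer and
# rebuild a features x cells sparse matrix of the reduced shape
layer <- Matrix::sparseMatrix(
  i = match(coo_df$soma_dim_1, present_features),
  j = match(coo_df$soma_dim_0, present_cells),
  x = coo_df$soma_data,
  dims = c(length(present_features), length(present_cells)),
  dimnames = list(
    all_features[present_features + 1L],
    all_cells[present_cells + 1L]
  )
)
```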

johnkerl · 2024-10-23T14:11:27Z

apis/r/R/write_seurat.R

+  fmat <- methods::slot(x, name = 'features')
+
+  # Write `X` matrices
+  for (lyr in SeuratObject::Layers(x)) {


I think spelling out `lyr` as `layer` is a reasonable ask.

Also, I think it's a `layer_name`, not a `layer`, given `key = lyr` below.

johnkerl · 2024-10-23T14:11:39Z

apis/r/R/write_seurat.R

+
+  # Write `X` matrices
+  for (lyr in SeuratObject::Layers(x)) {
+    ldat <- SeuratObject::LayerData(x, layer = lyr)


Likewise `layer_data`.

johnkerl · 2024-10-23T14:18:23Z

apis/r/tests/testthat/test-SeuratIngest.R

+  expect_identical(setdiff(ms$var$attrnames(), "var_id"), names(rna[[]]))
+  expect_s3_class(ms$X, "SOMACollection")
+  expect_identical(ms$X$names(), SeuratObject::Layers(rna))
+  fmat <- methods::slot(rna, name = "features")


Same as above with spelling out `fmat`, `cmat`, and `lyr`: just a few keystrokes for you (a single-time spend of those seconds!) with added clarity for every reader forever after.

johnkerl

My comments are non-blocking

aaronwolen

Functionally this is great! I really like the new _hint() helpers and the tests appear very comprehensive.

My main ask is to address the significant code duplication between the existing write_soma.Assay() method and the new write_soma.Assay5() method. Both methods share similar logic for creating measurements, writing X matrices, handling feature-level metadata, etc.

This will make the code more maintainable, easier to update, and less likely to create bugs by making changes in one method but not the other.
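
One common way to remove this kind of duplication in S3 code is to move the shared logic into an internal helper that both methods call. The sketch below uses a hypothetical generic (`write_demo()`), hypothetical classes, and a hypothetical helper (`.write_demo_common()`); it only illustrates the pattern and is not the refactoring that landed in this PR.

```r
# Hypothetical generic standing in for write_soma()
write_demo <- function(x, ...) UseMethod("write_demo")

# Shared helper holding the logic common to both methods
# (creating the measurement, writing layers, and so on)
.write_demo_common <- function(x, layer_names) {
  list(
    measurement = x$name,
    layers = layer_names
  )
}

# Method for a hypothetical older assay class with a fixed set of layers
write_demo.old_assay <- function(x, ...) {
  .write_demo_common(x, layer_names = c("counts", "data"))
}

# Method for a hypothetical newer assay class with arbitrary layers
write_demo.new_assay <- function(x, ...) {
  .write_demo_common(x, layer_names = names(x$layers))
}

old <- structure(list(name = "RNA"), class = "old_assay")
new <- structure(
  list(name = "RNA", layers = list(counts.1 = NULL, counts.2 = NULL)),
  class = "new_assay"
)
old_result <- write_demo(old)
new_result <- write_demo(new)
```

Each method keeps only what is specific to its class (here, how layer names are found), so a fix to the shared path applies to both.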

aaronwolen · 2024-10-24T22:24:13Z

apis/r/R/write_seurat.R

+  parents <- unique(sys.parents())
+  idx <- which(vapply_lgl(
+    parents,
+    FUN = function(i) identical(sys.function(i), write_soma.Seurat)
+  ))
+  shape <- if (length(idx) == 1L) {
+    get("shape", envir = sys.frame(parents[idx]))
+  } else {
+    NULL
+  }


Accessing variables from parent frames seems fragile. Could we just add a shape argument?

I had thought about that, but I didn't want to expose `shape` in `write_soma()`. This logic is needed for resume mode (#2405), and more specifically for the tests. It isn't used otherwise, which is why I didn't want to add a `shape` argument to `write_soma()`.
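
For readers unfamiliar with the pattern under discussion, here is a self-contained sketch of looking up a variable in a specific caller's frame. `outer_writer()` and `inner_writer()` are hypothetical stand-ins for `write_soma.Seurat()` and the assay writer, and base `vapply()` replaces the package's `vapply_lgl()` helper; the lookup logic otherwise mirrors the diff above.

```r
# Hypothetical stand-in for the outer function that owns `shape`
outer_writer <- function(shape = NULL) {
  inner_writer()
}

# Hypothetical stand-in for the inner function that needs `shape`
inner_writer <- function() {
  # Frame numbers of the parents of every active call
  parents <- unique(sys.parents())
  # Find the frame, if any, that is running outer_writer()
  idx <- which(vapply(
    parents,
    FUN = function(i) identical(sys.function(i), outer_writer),
    FUN.VALUE = logical(1L)
  ))
  if (length(idx) == 1L) {
    # Pull `shape` out of that caller's frame
    get("shape", envir = sys.frame(parents[idx]))
  } else {
    # Not called from outer_writer(): no shape available
    NULL
  }
}

via_outer <- outer_writer(shape = c(10L, 5L))
direct <- inner_writer()
```

Called through `outer_writer()`, the inner function recovers `shape` without it being passed as an argument; called directly, it falls back to `NULL`. The fragility raised in the review is that this depends on the shape of the call stack and on the caller's local variable name, neither of which is part of either function's signature.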

aaronwolen

Thanks for the updates!

mojaveazure added the r-api label May 7, 2024

mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from b21a806 to 49e4edf Compare May 30, 2024 21:01

mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from 49e4edf to 3f5bca1 Compare July 15, 2024 21:57

mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from 3f5bca1 to c5a48a3 Compare August 1, 2024 19:08

mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch 2 times, most recently from 6461bc9 to b692361 Compare August 14, 2024 20:53

mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch 3 times, most recently from cb7147e to 86a7be1 Compare September 9, 2024 17:43

mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from 86a7be1 to f80acdc Compare September 16, 2024 14:41

mojaveazure mentioned this pull request Sep 17, 2024

[r] [WIP] Add support for reading v5 assays from an axis query #3008

Open

mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from db6e539 to 9d0357a Compare September 17, 2024 18:02

mojaveazure marked this pull request as ready for review September 17, 2024 18:02

mojaveazure requested review from eddelbuettel, aaronwolen and johnkerl September 17, 2024 18:02

johnkerl removed the request for review from eddelbuettel October 21, 2024 16:08

johnkerl requested changes Oct 23, 2024

johnkerl approved these changes Oct 24, 2024

mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from ed0f50e to c33ca40 Compare October 24, 2024 18:31

aaronwolen reviewed Oct 24, 2024

mojaveazure added a commit that referenced this pull request Oct 25, 2024

Add comment explaining option usage

21bd333

PR feedback #2523 (comment)

mojaveazure added a commit that referenced this pull request Oct 31, 2024

Add comment explaining option usage

7292e8b

PR feedback #2523 (comment)

mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from a2c77d9 to dcf4d8b Compare October 31, 2024 20:07

aaronwolen approved these changes Nov 4, 2024

mojaveazure added 4 commits November 4, 2024 10:51

Add metadata hints for v5 writing

4e7eae9

Expand tests

Add new helpers to generate metadata for v5 ingestion

3870fe4

Update tests and docs

acfd400

mojaveazure and others added 9 commits November 4, 2024 10:51

Add support for ragged assays

9e0c84f

Apply suggestions from code review

7c48ae1

[ci skip] Co-authored-by: John Kerl <[email protected]>

Adjust more names

db8b96a

[ci skip]

Use SOMASparseNDArray$.write_coordinates() instead of private methods

8486f0d

Add internal docs for .type_hint()

e0db339

Add comment explaining option usage

babf129

PR feedback #2523 (comment)

Factor out write_soma.Assay() and write_soma.Assay5() into combined function, reduce code duplication

12a2397

Code review feedback

76ad268

Update changelog

b947cf0

Bump develop version

mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from f42044b to b947cf0 Compare November 4, 2024 15:56

mojaveazure merged commit 54933f8 into main Nov 4, 2024
14 checks passed

mojaveazure deleted the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch November 4, 2024 16:21
