Prepare Monocle for CellRanger 3.0 #243
Merged
10X released a new version of CellRanger that changes the output format of its matrices. They also deprecated the Rkit R package, so it can no longer help users load their data into Monocle. This pull request makes Monocle compatible with the new version and spares users from downloading a separate R package by moving the data-loading functionality of Rkit into Monocle, updated to handle CellRanger 3.0 data.
In particular, the following changes were made to the output file format in 3.0:
Text File Formats - To save disk space, the sparse matrix and barcode text files are now gzipped. Because R automatically detects and correctly reads gzipped files, no changes were needed beyond appending the .gz suffix when necessary. Additionally, to accommodate "multimodal" datasets, genes.tsv is replaced by features.tsv. This file contains an additional column describing the type of feature referred to in that row of the matrix.
Feature Data - CellRanger now supports Feature Barcoding data (e.g. CRISPR/Antibody/Dextramer) in addition to standard Gene Expression data.
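As a rough illustration of the two points above, the following base R sketch writes a tiny gzipped features.tsv and reads it back without any explicit decompression step. The feature IDs, gene names, and column names are made up for the example, and this is not the code added by the pull request.

```r
# Write a tiny, made-up features.tsv.gz into a temporary directory.
tmp <- tempfile("features_demo_")
dir.create(tmp)
features_path <- file.path(tmp, "features.tsv.gz")

con <- gzfile(features_path, "w")
writeLines(c(
  "ENSG00000000001\tGENE1\tGene Expression",
  "ENSG00000000002\tGENE2\tGene Expression",
  "CD3_demo\tCD3\tAntibody Capture"
), con)
close(con)

# read.delim() opens the path with file(), which detects gzip compression
# when reading, so the .gz file is read like a plain text file.
# The column names are illustrative; the file itself has no header row.
features <- read.delim(features_path, header = FALSE,
                       stringsAsFactors = FALSE,
                       col.names = c("id", "name", "feature_type"))

# The third column gives the feature type; keep only gene expression rows.
gene_features <- features[features$feature_type == "Gene Expression", ]
print(gene_features)
```

Filtering on the third column like this is one simple way to separate standard Gene Expression rows from Feature Barcoding rows in a multimodal dataset.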
To replace the functionality of Rkit, this pull request adds a new function to monocle called `load_cellranger_data`. It behaves similarly to the old Rkit function `load_cellranger_matrix`, with a few important distinctions (the small name change avoids confusion while hinting at the strong similarity).
1 - It directly returns a `CellDataSet` object. Rather than having the user convert after the fact, it loads the data directly into this object.
2 - It ignores Feature Barcoding data. As this is a new feature in CellRanger 3.0, it is not loaded into monocle for now.
3 - It transparently handles v2.0 vs v3.0 data. Although the formats are different, the function detects the version used and loads the data appropriately.
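One plausible way to tell the two layouts apart is to check which feature file is present in the matrix directory. The sketch below is an assumption for illustration only: `detect_matrix_layout` and `matrix_file_names` are hypothetical helpers, and the check used inside `load_cellranger_data` may differ.

```r
# Hypothetical helper: guess the CellRanger layout from the files present.
detect_matrix_layout <- function(matrix_dir) {
  files <- list.files(matrix_dir)
  if ("features.tsv.gz" %in% files) {
    "v3"
  } else if ("genes.tsv" %in% files) {
    "v2"
  } else {
    stop("No features.tsv.gz or genes.tsv found in ", matrix_dir)
  }
}

# Hypothetical helper: file names to read for each layout.
matrix_file_names <- function(layout) {
  switch(layout,
    v3 = c(features = "features.tsv.gz",
           barcodes = "barcodes.tsv.gz",
           matrix   = "matrix.mtx.gz"),
    v2 = c(features = "genes.tsv",
           barcodes = "barcodes.tsv",
           matrix   = "matrix.mtx"),
    stop("Unknown layout: ", layout)
  )
}

# Build two empty mock directories to exercise the helpers.
v2_dir <- tempfile("v2_")
dir.create(v2_dir)
file.create(file.path(v2_dir, matrix_file_names("v2")))

v3_dir <- tempfile("v3_")
dir.create(v3_dir)
file.create(file.path(v3_dir, matrix_file_names("v3")))

print(detect_matrix_layout(v2_dir))
print(detect_matrix_layout(v3_dir))
```

Keying the decision on file names keeps the caller's interface identical for both versions, which matches the "transparent" behavior described in point 3.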
Very small test data files and associated tests were added to verify the expected behavior.