Prepare Monocle for CellRanger 3.0 #243

Merged: 1 commit, Jan 4, 2019

evolvedmicrobe
Contributor

10X released a new version of CellRanger that changed the output format of its matrices. They also deprecated the cellrangerRkit (Rkit) R package, so it can no longer help users load their data into Monocle. This pull request makes Monocle compatible with the new version and spares users from installing a separate R package by moving Rkit's data-loading functionality into Monocle, updated to handle CellRanger 3.0 data.

In particular, the following changes were made to the output file format in 3.0:

**Text File Formats** - To save disk space, the sparse matrix and barcode text files are now gzipped. Because R automatically detects and correctly reads gzipped files, the only change needed was appending the `.gz` suffix to file names where necessary. Additionally, to accommodate experiments with "multimodal" datasets, `genes.tsv` is replaced by `features.tsv`. This file contains an additional column describing the type of feature referred to in each row of the matrix.

**Feature Data** - CellRanger now supports feature barcoding data (e.g. CRISPR/Antibody/Dextramer) in addition to standard Gene Expression data.
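As a rough illustration of the two format changes above (not the code added in this pull request), the sketch below writes a tiny, made-up `features.tsv.gz` and reads it back with base R. The column names `id`, `name`, and `feature_type`, the example rows, and the "Antibody Capture" label are assumptions chosen for the example; only the presence of a third feature-type column comes from the description above.

```r
# Write a tiny, made-up features.tsv.gz so the example is self-contained.
tmp_dir <- tempfile("cr3_example_")
dir.create(tmp_dir)
features_path <- file.path(tmp_dir, "features.tsv.gz")

con <- gzfile(features_path, "w")
writeLines(c(
  "ENSG00000000001\tGENE_A\tGene Expression",
  "ENSG00000000002\tGENE_B\tGene Expression",
  "CD3_example\tCD3\tAntibody Capture"
), con)
close(con)

# read.delim() opens the path with file(), which detects gzip compression
# on read, so no explicit gzfile() call is needed here.
# Column names are illustrative; the file itself has no header row.
features <- read.delim(features_path, header = FALSE,
                       stringsAsFactors = FALSE,
                       col.names = c("id", "name", "feature_type"))

# Keep only the standard gene expression rows and drop the
# feature barcoding rows.
gene_rows <- features$feature_type == "Gene Expression"
gene_features <- features[gene_rows, ]
print(gene_features)
```

The same logical index (`gene_rows`) could also be used to subset the rows of the count matrix so that the features and the matrix stay aligned.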

To replace the functionality of Rkit, this pull request adds a new function to `monocle` called `load_cellranger_data`. It behaves similarly to the old Rkit function `load_cellranger_matrix`, with a few important distinctions (the name was changed slightly to avoid confusion while still hinting at the strong similarity).

1 - **It directly returns a `CellDataSet` object.** Rather than having the user convert the result after the fact, it loads the data directly into a `CellDataSet`.

2 - **It ignores Feature Barcoding data.** As this is a new feature in CellRanger 3.0, it is not loaded into monocle for now.

3 - **It transparently handles v2.0 vs v3.0 data.**  Although the formats are different, the function detects the version used and loads the data appropriately.
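One way the version detection in point 3 could work is to look at which feature file is present in the matrix directory. The sketch below is only an illustration under that assumption and is not taken from this pull request: `detect_matrix_format` and `matrix_file_paths` are hypothetical helper names, and it assumes v3 output uses `features.tsv.gz` while v2 output uses an uncompressed `genes.tsv`.

```r
# Hypothetical helper: infer the matrix format from the files present.
detect_matrix_format <- function(matrix_dir) {
  if (file.exists(file.path(matrix_dir, "features.tsv.gz"))) {
    "v3"
  } else if (file.exists(file.path(matrix_dir, "genes.tsv"))) {
    "v2"
  } else {
    stop("No features.tsv.gz or genes.tsv found in ", matrix_dir)
  }
}

# Hypothetical helper: build the expected file paths for the detected
# format, appending the .gz suffix only for v3 output.
matrix_file_paths <- function(matrix_dir) {
  fmt <- detect_matrix_format(matrix_dir)
  suffix <- if (fmt == "v3") ".gz" else ""
  feature_file <- if (fmt == "v3") "features.tsv" else "genes.tsv"
  list(
    format   = fmt,
    matrix   = file.path(matrix_dir, paste0("matrix.mtx", suffix)),
    barcodes = file.path(matrix_dir, paste0("barcodes.tsv", suffix)),
    features = file.path(matrix_dir, paste0(feature_file, suffix))
  )
}

# Demonstrate with two empty placeholder directories.
v2_dir <- tempfile("v2_")
dir.create(v2_dir)
invisible(file.create(file.path(v2_dir, "genes.tsv")))

v3_dir <- tempfile("v3_")
dir.create(v3_dir)
invisible(file.create(file.path(v3_dir, "features.tsv.gz")))

v2_paths <- matrix_file_paths(v2_dir)
v3_paths <- matrix_file_paths(v3_dir)
print(basename(unlist(v2_paths[-1])))
print(basename(unlist(v3_paths[-1])))
```

Keying the decision on the feature file keeps the caller's interface identical for both versions, which is the behavior the point above describes.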

Very small test data files and associated tests were added to verify the expected behavior.

@ctrapnell merged commit 7df1050 into cole-trapnell-lab:master on Jan 4, 2019