
Single Cell and Spatial Data

Orr Ashenberg edited this page Nov 30, 2021 · 5 revisions

The data is stored with the Human Tumor Atlas Network's Data Coordinating Center (HTAN DCC) in their HTAN Data Portal. The different types and levels of RNA-Seq data are described here.

During processing at the Broad Institute, each tumor received its own set of Terra workspaces (https://terra.bio/). Raw sequencing data (bcl files) remained on the Broad premises, where they are ultimately archived. The Terra workspaces stored 3’ scRNA-seq (10x) and full-length transcript scRNA-seq (SMART-seq2) data. The raw data and processed data for each tumor were kept in two separate workspaces (Google Cloud Storage buckets), which allowed us to grant access separately to user groups that need only the raw data (fastq workspace) or only the processed data (counts and results workspace).

In the raw data workspace, we stored the raw fastq files in a folder named fastq_tumor. The fastq files were generated from the bcl files coming off the sequencers. In the processed data workspace, we stored the gene count matrices in a folder named counts_tumor. The count matrices were generated by aligning the fastq files to a reference genome and quantifying each gene’s abundance within each cell. The associated alignment BAM files and QC outputs were also stored. In the same processed data workspace, we also stored data processing QC outputs and cell clustering analyses in a folder named results_tumor. Within both the raw data and processed data workspaces, data from each sample was stored in its own subfolder.
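The folder layout described above can be sketched as a small path builder. This is only an illustration, not the pipeline's actual code: the bucket names are hypothetical placeholders, and it assumes that `tumor` in `fastq_tumor`, `counts_tumor`, and `results_tumor` stands for a tumor identifier and that each sample's subfolder is named after the sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TumorWorkspaces:
    """Hypothetical pair of bucket names for one tumor (placeholders only)."""
    raw_bucket: str        # bucket behind the raw data (fastq) workspace
    processed_bucket: str  # bucket behind the processed data workspace


def sample_paths(ws: TumorWorkspaces, tumor: str, sample: str) -> dict:
    """Build the per-sample folder for each data type.

    Assumes the folder suffix is the tumor identifier and that each
    sample's subfolder is named after the sample.
    """
    return {
        # Raw fastq files live in the raw data workspace.
        "fastq": f"gs://{ws.raw_bucket}/fastq_{tumor}/{sample}/",
        # Count matrices, BAM files, and QC outputs live in the processed workspace.
        "counts": f"gs://{ws.processed_bucket}/counts_{tumor}/{sample}/",
        # Processing QC and clustering analyses share the processed workspace.
        "results": f"gs://{ws.processed_bucket}/results_{tumor}/{sample}/",
    }


if __name__ == "__main__":
    # Placeholder bucket, tumor, and sample names for illustration.
    ws = TumorWorkspaces(raw_bucket="example-raw", processed_bucket="example-processed")
    for kind, path in sample_paths(ws, tumor="TUMOR01", sample="SAMPLE01").items():
        print(kind, path)
```

Keeping the two bucket names separate in the data structure mirrors the access split: a group granted only the processed data workspace never needs to resolve a path in the raw bucket.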

sampletracking_example.csv samplesheet_cellranger_example.csv make_terra_multipleflowcells_example.py cellranger_mfastq_local_6.0.2.sh

input_cellbender_example.json

input_cumulus_example.json samplesheet_cumulus_example.csv

input_cumulus_cellbender_example.json samplesheet_cellbender_cumulus_example.csv