File Definitions
Several files may be needed, depending on the analysis. These files, along with the files output by inferCNV, are described here.
InferCNV is compatible with both Smart-seq2 and 10x single-cell transcriptome data, and presumably with other methods (not tested). The counts matrix can be generated using any conventional single-cell transcriptome quantification pipeline, yielding a matrix of genes (rows) vs. cells (columns) containing assigned read counts.
The format might look like so:
| | MGH54_P16_F12 | MGH54_P12_C10 | MGH54_P11_C11 | MGH54_P15_D06 | MGH54_P16_A03 | ... |
---|---|---|---|---|---|---|
A2M | 0 | 0 | 0 | 0 | 0 | ... |
A4GALT | 0 | 0 | 0 | 0 | 0 | ... |
AAAS | 0 | 37 | 30 | 21 | 0 | ... |
AACS | 0 | 0 | 0 | 0 | 2 | ... |
AADAT | 0 | 0 | 0 | 0 | 0 | ... |
... | ... | ... | ... | ... | ... | ... |
The matrix can be provided as a tab-delimited file. Sparse matrices are also supported; see Running-InferCNV.
The sample annotation file defines the different cell types and, optionally, indicates how the cells should be grouped according to sample (i.e., patient). The format is simply two tab-delimited columns, with no column header.
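To make the tab-delimited layout concrete, here is a minimal sketch of a reader written with the Python standard library only. It is not part of inferCNV; the function name `read_counts_matrix` is hypothetical. It assumes the header row lists the cell names, with or without a leading field above the gene column, and that counts are integers.

```python
import io

def read_counts_matrix(handle):
    """Parse a tab-delimited genes-by-cells matrix.

    Returns (cells, counts), where counts maps gene name -> list of ints.
    Hypothetical helper for illustration; not an inferCNV function.
    """
    lines = [ln.rstrip("\r\n").split("\t") for ln in handle if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header row and at least one gene row")
    header, rows = lines[0], lines[1:]
    # Assumption: the header may or may not include a field for the gene
    # column. If it is as long as a data row, drop that first field.
    cells = header[1:] if len(header) == len(rows[0]) else header
    counts = {}
    for row in rows:
        gene, values = row[0], row[1:]
        if len(values) != len(cells):
            raise ValueError(f"{gene}: {len(values)} values for {len(cells)} cells")
        counts[gene] = [int(v) for v in values]  # assumes integer read counts
    return cells, counts

# A three-cell excerpt of the example matrix shown above.
example = (
    "MGH54_P16_F12\tMGH54_P12_C10\tMGH54_P11_C11\n"
    "A2M\t0\t0\t0\n"
    "AAAS\t0\t37\t30\n"
)
cells, counts = read_counts_matrix(io.StringIO(example))
print(cells)
print(counts["AAAS"])
```

A reader like this is handy mainly as a sanity check that every gene row has one value per cell before the file is handed to inferCNV.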
MGH54_P2_C12 Microglia/Macrophage
MGH36_P6_F03 Microglia/Macrophage
MGH54_P16_F12 Oligodendrocytes (non-malignant)
MGH54_P12_C10 Oligodendrocytes (non-malignant)
MGH36_P1_B02 malignant_MGH36
MGH36_P1_H10 malignant_MGH36
The first column is the cell name, and the second column indicates the known cell type. If you have different types of known normal cells (e.g., immune cells, normal fibroblasts, etc.), you can indicate the cell type for each. Otherwise, you can group them all as 'normal'. If multiple 'normal' types are defined separately, the expression distribution for normal cells will be explored according to each normal cell grouping, as opposed to treating them all as a single normal group. They'll also be clustered and plotted in the heatmap according to normal cell grouping.
The sample (i.e., patient) information is encoded in the attribute name as "malignant_{patient}", which allows the tumor cells to be clustered and plotted according to sample (patient) in the heatmap.
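The two-column layout and the "malignant_{patient}" naming convention can be illustrated with a short standard-library sketch. The helper names `read_annotations` and `patient_of` are hypothetical and are not part of inferCNV. The sketch only shows how the labels partition the cells.

```python
import io
from collections import defaultdict

def read_annotations(handle):
    """Return {cell_name: group_label} from a headerless two-column TSV."""
    annotations = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"expected 2 tab-delimited columns: {line!r}")
        cell, label = fields
        annotations[cell] = label
    return annotations

def patient_of(label, prefix="malignant_"):
    """Return the patient encoded in 'malignant_{patient}', else None."""
    return label[len(prefix):] if label.startswith(prefix) else None

example = (
    "MGH54_P2_C12\tMicroglia/Macrophage\n"
    "MGH54_P16_F12\tOligodendrocytes (non-malignant)\n"
    "MGH36_P1_B02\tmalignant_MGH36\n"
    "MGH36_P1_H10\tmalignant_MGH36\n"
)
annotations = read_annotations(io.StringIO(example))

# Group cell names by label, as the heatmap grouping is described above.
groups = defaultdict(list)
for cell, label in annotations.items():
    groups[label].append(cell)

for label, members in groups.items():
    print(label, patient_of(label), members)
```

Labels without the `malignant_` prefix, such as the two normal cell types here, yield no patient and simply form their own groups.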
Only those cells listed in the sample annotations file will be analyzed by inferCNV. This is useful when your cells of interest are a subset of the total counts matrix, because you do not need to create a new matrix containing only that subset.
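InferCNV performs this selection itself, so no preprocessing is required. The sketch below, using a hypothetical `subset_to_annotated` helper, only previews which matrix columns would be kept and flags annotated cells that are missing from the matrix, which usually points to a naming mismatch.

```python
def subset_to_annotated(cells, counts, annotated_cells):
    """Keep only the matrix columns whose cell name is annotated.

    cells: list of column (cell) names; counts: {gene: [values per cell]}.
    Returns (kept_cells, kept_counts, missing), where missing lists the
    annotated cells that are absent from the matrix.
    """
    wanted = set(annotated_cells)
    keep = [i for i, cell in enumerate(cells) if cell in wanted]
    kept_cells = [cells[i] for i in keep]
    kept_counts = {gene: [vals[i] for i in keep] for gene, vals in counts.items()}
    missing = sorted(wanted - set(cells))
    return kept_cells, kept_counts, missing

cells = ["MGH54_P16_F12", "MGH54_P12_C10", "MGH54_P11_C11"]
counts = {"AAAS": [0, 37, 30], "AACS": [0, 0, 0]}
annotated = ["MGH54_P16_F12", "MGH54_P12_C10", "MGH36_P1_B02"]

kept_cells, kept_counts, missing = subset_to_annotated(cells, counts, annotated)
print(kept_cells)
print(kept_counts["AAAS"])
print(missing)
```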
The gene ordering file provides the chromosomal location for each gene. The format is tab-delimited with no column header, and each line gives the gene name, chromosome, and gene span (start and end positions):
WASH7P chr1 14363 29806
LINC00115 chr1 761586 762902
NOC2L chr1 879584 894689
MIR200A chr1 1103243 1103332
SDF4 chr1 1152288 1167411
UBE2J2 chr1 1189289 1209265
Every gene in the counts matrix to be analyzed should have the corresponding gene name and location info provided in this gene ordering file.
Note, only those genes that exist in both the counts matrix and the gene ordering file will be included in the inferCNV analysis.
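Because genes missing from either file are left out of the analysis, it can help to check the overlap beforehand. The following standard-library sketch uses hypothetical helpers `read_gene_order` and `compare_genes`; it assumes exactly four tab-delimited fields per line, as in the example above.

```python
import io

def read_gene_order(handle):
    """Return {gene: (chromosome, start, end)} from a headerless 4-column TSV."""
    order = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"expected 4 tab-delimited columns: {line!r}")
        gene, chrom, start, end = fields
        order[gene] = (chrom, int(start), int(end))
    return order

def compare_genes(matrix_genes, gene_order):
    """Split matrix genes into those with and without position information."""
    matrix_genes = list(matrix_genes)
    shared = [g for g in matrix_genes if g in gene_order]
    dropped = [g for g in matrix_genes if g not in gene_order]
    return shared, dropped

example = (
    "WASH7P\tchr1\t14363\t29806\n"
    "NOC2L\tchr1\t879584\t894689\n"
    "SDF4\tchr1\t1152288\t1167411\n"
)
gene_order = read_gene_order(io.StringIO(example))

# Gene names as they would appear in the first column of the counts matrix.
shared, dropped = compare_genes(["A2M", "NOC2L", "SDF4"], gene_order)
print(shared)
print(dropped)
```

A long `dropped` list often indicates that the two files use different gene identifiers, for example gene symbols in one and Ensembl IDs in the other.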
Some genomic position files have been generated from common references and made available at TrinityCTAT.
If you need to construct your own custom genomic positions file, see instructions for creating a genomic position file.