File Definitions
Several files may be needed, depending on the analysis. These input files, as well as the files output by inferCNV, are described here.
(REQUIRED, this is the data matrix)
- The input data matrix is expected to be log(TPM+1). If your data is TPM data, use the --transform command line argument and the data will be transformed.
- The file should be tab delimited.
- The matrix is expected to be genes (rows) by cells (columns), with both the genes and the cells labeled.
- Gene names in the expression matrix should match gene names in the genomic positions file.
- Example: see example_expression.txt in the example directory of the download.
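The requirement that gene names in the matrix match the genomic positions file can be checked before a run. The following is a minimal sketch, not part of inferCNV: the function names are hypothetical, and it assumes the positions file has no header line and that the matrix header either has a leading label cell or starts directly with the first cell name.

```python
import csv
import io


def read_matrix_labels(handle):
    """Return (cell_names, gene_names) from a tab-delimited genes-by-cells matrix."""
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    header, data = rows[0], rows[1:]
    # Assumption: if the header is one field shorter than the data rows,
    # it holds only cell names; otherwise its first field labels the gene column.
    if data and len(header) == len(data[0]) - 1:
        cells = header
    else:
        cells = header[1:]
    genes = [row[0] for row in data]
    return cells, genes


def genes_missing_positions(matrix_handle, positions_handle):
    """List matrix genes (in matrix order) that are absent from the positions file."""
    _, genes = read_matrix_labels(matrix_handle)
    positioned = {row[0] for row in csv.reader(positions_handle, delimiter="\t") if row}
    return [gene for gene in genes if gene not in positioned]


# Tiny in-memory example; real use would pass open file handles instead.
example_matrix = io.StringIO("gene\tcell1\tcell2\nA\t1.0\t2.0\nB\t0.0\t3.5\n")
example_positions = io.StringIO("A\tchr1\t1\t100\n")
print(genes_missing_positions(example_matrix, example_positions))
```

Any gene reported here has no genomic position, so it is worth resolving the mismatch before running the analysis.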
(Optional, contains which genes are viewed and their order)
- This is a tab delimited file of 4 columns (gene name, contig/chr, start position, stop position).
- Gene name should match the expression matrix row labels.
- This is used to order the expression data in genomic order.
- Contigs/chr will be ordered by first appearance in this file.
- Example Position File
- Some Genomic Position Files have been generated from common references and made available at TrinityCTAT.
- To generate a Genomic Positions file from a GTF file please use the gtf_to_position_file.py script provided in the src directory.
# By default, gene_id is used as the name of each feature.
python ./src/gtf_to_position_file.py your_reference.gtf your_gen_pos.txt
# You can change which GTF attribute key is used; here transcript_id is used.
python ./src/gtf_to_position_file.py --attribute_name transcript_id your_reference.gtf your_gen_pos.txt
(This command should work in both Python 2.X and 3.X environments).
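To illustrate how a four-column positions file determines gene order, here is a small sketch. It is not inferCNV's own code: `genomic_order` is a hypothetical helper, the file is assumed to have no header line, and sorting by start position within each contig is an assumption. Only the ordering of contigs by first appearance is stated above.

```python
import csv
import io


def genomic_order(positions_handle):
    """Return gene names ordered by contig (first appearance), then start position."""
    contig_rank = {}  # contig name -> index of its first appearance in the file
    records = []
    for row in csv.reader(positions_handle, delimiter="\t"):
        if not row:
            continue
        gene, contig, start, stop = row[0], row[1], int(row[2]), int(row[3])
        contig_rank.setdefault(contig, len(contig_rank))
        records.append((contig_rank[contig], start, stop, gene))
    # Assumption: within a contig, genes are ordered by ascending start position.
    records.sort()
    return [gene for _, _, _, gene in records]


# chr2 appears first in the file, so its genes come before those on chr1.
example = io.StringIO("g3\tchr2\t50\t90\ng1\tchr1\t500\t900\ng2\tchr2\t10\t40\n")
print(genomic_order(example))  # ['g2', 'g3', 'g1']
```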
(Optional, useful when working with controls/reference files)
- This is a simple text file with the names of the cells that should be treated as references or controls.
- Cell names should be identical to the cell names in the Expression Matrix.
- Cell names should be comma delimited and can be on an arbitrary number of lines.
- Example References File
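Because the references file is comma delimited and may span any number of lines, reading it takes only a few lines of code. The sketch below uses hypothetical helper names and assumes that whitespace around names and empty entries (for example from a trailing comma) can be ignored. The second function flags reference names that do not appear among the matrix cell names.

```python
import io


def read_reference_cells(handle):
    """Collect cell names from a comma-delimited file spanning any number of lines."""
    cells = []
    for line in handle:
        for name in line.split(","):
            name = name.strip()
            if name:  # skip empty entries from trailing commas or blank lines
                cells.append(name)
    return cells


def unknown_references(reference_cells, matrix_cells):
    """Return reference names that are not column labels of the expression matrix."""
    known = set(matrix_cells)
    return [cell for cell in reference_cells if cell not in known]


example = io.StringIO("cell1,cell2,\ncell3\n\ncell4 , cell5\n")
refs = read_reference_cells(example)
print(refs)
print(unknown_references(refs, ["cell1", "cell2", "cell3", "cell4"]))
```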
A directory of output files is generated per run. This output directory is located alongside the output PDF and shares its name (excluding the extension). Several files are provided in the directory to enable further analysis.
Please let us know if there are other files that would be helpful as you explore your results!
This is the expression matrix after all data manipulation except the last transform for data visualization. The last step of preparing data for visualization allows one to bound measurements (using the --vis_bound_threshold argument). Although helpful in making visualization more vivid in the presence of outliers, this may not be as appropriate for additional analysis. The matrix before this bounding is given here.
All observations and associated measurements as shown in the visualization.
(Optional, only generated when reference cells are indicated)
All references and associated measurements as shown in the visualization.
If groups of observations are generated (for instance, by the --obs_groups argument), the names of the samples (cells) in each group are recorded in separate files. The file names indicate the cluster group and the method by which the observations were clustered. "General" indicates clustering using all genomic positions; a contig name indicates clustering by that contig alone (see --obs_cluster_contig).
If groups of observations are generated (for instance, by the --obs_groups argument), the sample (cell) name, cluster membership, and color (as shown in the figure) are recorded here.
A Newick-format output of the observation matrix dendrogram, so that the dendrogram can be reconstructed.