
File Definitions

Timothy Tickle edited this page Jun 24, 2016 · 27 revisions

Depending on the analysis, several input files may be needed. They are described here.

Expression Matrix

(REQUIRED, this is the data matrix)

  • The input data matrix is expected to be log(TPM+1). If your data are untransformed TPM values, pass the --transform command line argument and the data will be transformed for you.
  • The file should be tab delimited.
  • The matrix is expected to be genes (rows) by cells (columns), with both genes and cells labeled.
  • Gene names in the expression matrix should match gene names in the genomic positions file.
  • Example: see example_expression.txt in the example directory of the download.
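
As a rough illustration of the layout described above, the following sketch parses a tab-delimited genes-by-cells matrix and optionally applies log(TPM+1). It is not the tool's own loader: the function name read_expression_matrix, the handling of the header row, and the choice of log base are all assumptions made for this example (this page only says "log").

```python
import math


def read_expression_matrix(lines, transform=False, base=math.e):
    """Parse a tab-delimited genes-by-cells matrix from an iterable of lines.

    Returns (cell_names, {gene_name: [values, ...]}).
    Hypothetical helper for illustration only.
    """
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    header, data = rows[0], rows[1:]
    # Assumption: the header either carries a placeholder field above the
    # gene column or omits it (as R's write.table does); handle both cases.
    cells = header[1:] if data and len(header) == len(data[0]) else header
    matrix = {}
    for fields in data:
        gene = fields[0]
        values = [float(v) for v in fields[1:]]
        if len(values) != len(cells):
            raise ValueError(
                "row %r has %d values, expected %d" % (gene, len(values), len(cells))
            )
        if transform:
            # log(TPM + 1); the log base is an assumption of this sketch.
            values = [math.log(v + 1.0, base) for v in values]
        matrix[gene] = values
    return cells, matrix
```

Calling it with transform=True mirrors what the --transform argument is described as doing; for data that are already log(TPM+1), leave transform at its default.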

Genomic Position Files

(Optional, specifies which genes are viewed and their order)

  • This is a tab delimited file of 4 columns (gene name, contig/chr, start position, stop position).
  • Gene name should match the expression matrix row labels.
  • This is used to order the expression data in genomic order.
  • Contigs/chr will be ordered by first appearance in this file.
  • Example Position File
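
To make the ordering rule above concrete, here is a minimal sketch that reads 4-column position lines and returns gene names in genomic order. The function name genomic_order is hypothetical. Contigs are ranked by first appearance, as stated above; sorting genes by start position within a contig is an assumption of this sketch, not something this page specifies.

```python
def genomic_order(position_lines):
    """Return gene names in genomic order from 4-column, tab-delimited lines.

    Each line: gene name, contig/chr, start position, stop position.
    Hypothetical helper for illustration only.
    """
    contig_rank = {}  # contig -> index of its first appearance in the file
    records = []
    for line in position_lines:
        if not line.strip():
            continue
        gene, contig, start, stop = line.rstrip("\r\n").split("\t")
        contig_rank.setdefault(contig, len(contig_rank))
        records.append((contig_rank[contig], int(start), int(stop), gene))
    # Contigs keep first-appearance order; ordering by start within a contig
    # is an assumption of this sketch.
    records.sort()
    return [gene for _, _, _, gene in records]
```

The returned list can then be used to reorder (and, if desired, subset) the rows of the expression matrix, since gene names are expected to match between the two files.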

Making Genomic Position File

  • Some Genomic Position Files have been generated from common references and made available at TrinityCTAT.
  • To generate a Genomic Positions file from a GTF file, use the gtf_to_position_file.py script provided in the src directory:
python ./src/gtf_to_position_file.py your_reference.gtf your_gen_pos.txt

(This command should work in both Python 2.X and 3.X environments.)
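
For readers curious what such a conversion involves, the sketch below collapses GTF records into one (gene, contig, start, stop) tuple per gene. It is not the logic of gtf_to_position_file.py, which is not shown on this page; the function name gtf_to_positions, the use of the gene_id attribute, and taking the minimum start and maximum stop across a gene's records are all assumptions made for illustration.

```python
import re


def gtf_to_positions(gtf_lines, attribute="gene_id"):
    """Collapse GTF records into one (gene, contig, start, stop) per gene.

    Hypothetical sketch; the bundled script may behave differently.
    """
    pattern = re.compile(re.escape(attribute) + r' "([^"]+)"')
    spans = {}  # gene -> [contig, min start, max stop]; insertion order kept
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GTF record
        contig, start, stop = fields[0], int(fields[3]), int(fields[4])
        match = pattern.search(fields[8])
        if not match:
            continue  # record lacks the requested attribute
        gene = match.group(1)
        if gene not in spans:
            spans[gene] = [contig, start, stop]
        else:
            spans[gene][1] = min(spans[gene][1], start)
            spans[gene][2] = max(spans[gene][2], stop)
    return [(gene, c, s, e) for gene, (c, s, e) in spans.items()]
```

Writing each tuple as a tab-joined line would give a file in the 4-column layout described under Genomic Position Files.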

References File

(Optional, useful when working with control/reference cells)

  • This is a simple text file with the names of the cells that should be treated as references or controls.
  • Cell names should be identical to the cell names in the Expression Matrix.
  • Cell names should be comma delimited and may span any number of lines.
  • Example References File
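
The format above can be read with a few lines of code. This is a minimal sketch, not the tool's own parser: the function name read_reference_cells and the optional known_cells check are hypothetical and exist only to illustrate the rule that reference names must match the Expression Matrix cell names.

```python
def read_reference_cells(lines, known_cells=None):
    """Collect comma-delimited cell names spread over any number of lines.

    If known_cells is given (e.g. the expression matrix column labels),
    raise ValueError for any reference name that is not among them.
    Hypothetical helper for illustration only.
    """
    names = []
    for line in lines:
        # Split on commas and drop empty entries and surrounding whitespace.
        names.extend(n.strip() for n in line.split(",") if n.strip())
    if known_cells is not None:
        known = set(known_cells)
        missing = [n for n in names if n not in known]
        if missing:
            raise ValueError(
                "reference cells not in expression matrix: " + ", ".join(missing)
            )
    return names
```

Passing the matrix's cell labels as known_cells catches misspelled reference names before an analysis is started.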