Comparative, Rapid Analysis of Metagenomes

CRAM

CRAM is a tool developed to analyze data produced from shotgun metagenome sequencing. CRAM was created for people who are familiar with UNIX and command-line interfaces and enables scientists to modify their analysis approach to fit their experiment.

The goals of CRAM are to:

  • Provide a fully open source solution for metagenome analysis.
  • Create a codebase that promotes contribution by the community.
  • Allow for analysis of large datasets on commodity hardware.

The default pipeline takes an assemble, annotate and quantify approach to metagenome analysis. The resulting quantitative data allows samples to be compared across other variables such as time and space. If this does not suit you, you can easily craft your own approach.

Oh yeah, CRAM and the tools that it is built from are 100% free and open source because science with black boxes is not science.

What is CRAM?

CRAM takes the following approach to metagenome assembly, annotation and analysis:

  1. Quality control (Trimming)
  2. De novo assembly (Velvet)
  3. Open Reading Frame prediction (Prodigal)
  4. ORF annotation (PHMMER/BLAST) using Subsystems (SEED)
  5. ORF coverage detection (SMALT) leading to quantitative measurement of metabolic potential.
  6. Quantitative measurement of community composition by comparison of 16S rRNA genes to the Ribosomal Database Project (RDP) database.

The end result is a matrix containing subsystems and their coverage in the metagenome. Samples can be standardized (by dividing by the total number of reads) and compared. Combined with metadata, this can lead to an understanding of the relationship between a sample's metabolic potential and environmental factors.
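
The standardization step can be illustrated with a minimal sketch. It assumes the coverage matrix has already been loaded into a nested dictionary and that the total read count per sample is known; the function name standardize, the data layout and the example values are hypothetical and are not part of CRAM itself.

```python
def standardize(coverage, total_reads):
    """Divide each sample's subsystem coverage by that sample's total read count."""
    standardized = {}
    for sample, subsystems in coverage.items():
        n_reads = total_reads[sample]
        if n_reads <= 0:
            raise ValueError("total read count for %r must be positive" % sample)
        # float() keeps the division exact on Python 2.7 as well
        standardized[sample] = {
            name: value / float(n_reads) for name, value in subsystems.items()
        }
    return standardized


# Hypothetical example values, not real CRAM output.
coverage = {
    "sample_a": {"Photosynthesis": 200.0, "Nitrogen metabolism": 50.0},
    "sample_b": {"Photosynthesis": 100.0, "Nitrogen metabolism": 100.0},
}
total_reads = {"sample_a": 1000000, "sample_b": 500000}

standardized = standardize(coverage, total_reads)
```

In this made-up example, sample_a has twice the raw Photosynthesis coverage of sample_b but also twice as many reads, so the two standardized values come out the same. That is the point of standardizing before comparing samples.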

Installation instructions

  1. You need a Python 2.7.x release (version 2.7 or later, but earlier than 3.0). To check your version of Python, type:

     $ python --version
    
  2. Download and extract from here.

  3. cd into the CRAM directory and type:

     $ python setup.py install
    

    You should see a bunch of output followed by "installation complete". This means that CRAM is installed. You will also need to download the tools and databases used by CRAM, which you can do by typing make in the same directory. This step takes a while because the databases are quite large.

  4. CRAM should now be installed. Check by typing:

     $ metacram
    

    You should see:

     ** MetaCRAM **        
         Pipelines:
             Type metacram <name> <directory> to create a new project.
             simple
             illumina
    

    The metacram command is used to create metagenome projects. To create a project:

     $ metacram <name of pipeline> <directory for project>
    

    There are two pipelines that come with CRAM out of the box: illumina and simple. The illumina pipeline is for paired-end reads; the simple pipeline is for single-end reads generated by any platform, as long as the reads are in fasta, fastq or qseq format.

    Once you have created the project directory with the metacram command, cd into that directory and make a directory called data/.

     $ cd metagenome_projects
     
     $ metacram simple new_project
     
     $ cd new_project
     
     $ mkdir data
     
     $ cp my_raw_reads.fastq data/
    

    For paired-end reads, your read files must be called *_left.qseq and *_right.qseq. (The extension doesn't matter as long as it's fastq or qseq.) Only the _left and _right parts matter: they tell CRAM how the reads are oriented.

    Now that your reads are in the data/ directory, invoke the pipeline by typing:

     $ python simple.py # for the simple pipeline
    

    Or

     $ python illumina.py # for the illumina pipeline
    

    You should see things beginning to happen (directories being made, reads trimmed, assemblies assembled). If, at any point, the pipeline crashes or you stop it, you can resume it by invoking the script again. CRAM will pick up where it left off.
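
Before starting the illumina pipeline, it can help to confirm that the files in data/ follow the _left / _right naming convention described above. The following is a minimal sketch of such a check, not CRAM's own logic. The function find_read_pairs, the regular expression and the example file names are all hypothetical, and the sketch assumes the _left or _right tag sits directly before a fastq or qseq extension.

```python
import os
import re

# Assumption: the mate tag sits directly before the extension, e.g. run1_left.fastq.
MATE_PATTERN = re.compile(r"^(?P<stem>.+)_(?P<mate>left|right)\.(?P<ext>fastq|qseq)$")


def find_read_pairs(filenames):
    """Group file names into (left, right) pairs keyed by their shared stem."""
    mates = {}
    for name in filenames:
        match = MATE_PATTERN.match(os.path.basename(name))
        if match is None:
            continue  # not a paired-end read file under this naming assumption
        mates.setdefault(match.group("stem"), {})[match.group("mate")] = name

    pairs = {}
    unpaired = []
    for stem, found in sorted(mates.items()):
        if "left" in found and "right" in found:
            pairs[stem] = (found["left"], found["right"])
        else:
            # one mate is missing, so report the file that has no partner
            unpaired.extend(sorted(found.values()))
    return pairs, unpaired


# Hypothetical file names for illustration only.
pairs, unpaired = find_read_pairs(
    ["data/run1_left.qseq", "data/run1_right.qseq", "data/run2_left.fastq", "data/notes.txt"]
)
```

In a real project the list of names could come from os.listdir("data"). Any entry reported in unpaired points to a read file whose mate is missing or misnamed.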

LICENSE

BSD (see LICENSE.txt)
