hyattpd edited this page Aug 11, 2014 · 31 revisions

Prodigal is a protein-coding gene prediction software tool for bacterial and archaeal genomes. The acronym stands for PROkaryotic DYnamic Programming Genefinding ALgorithm. Dictionary.com provides several definitions of the word "prodigal". The one the authors wish to invoke is:

3. lavishly abundant; profuse: nature's prodigal resources.

and not the more common meanings, such as wastefulness or the "Prodigal Son".

History

Prodigal was developed jointly by Oak Ridge National Laboratory and the University of Tennessee-Knoxville in 2007 under the auspices of the Department of Energy Joint Genome Institute. The first paper was published in BMC Bioinformatics in 2010. Since then, Prodigal has become one of the most popular microbial gene prediction algorithms in the world. As of August 2014, the publication had been cited more than 600 times. The software has been downloaded thousands of times and is in use in over 50 countries. The National Center for Biotechnology Information includes Prodigal gene predictions on its FTP site for all bacterial and archaeal genomes.

What does Prodigal do?

  • Predicts protein-coding genes: Prodigal provides fast, accurate protein-coding gene predictions in GFF3, GenBank, or Sequin table format.
  • Handles draft genomes and metagenomes: Prodigal runs smoothly on finished genomes, draft genomes, and metagenomes.
  • Runs quickly: Prodigal analyzes the E. coli K-12 genome in 10 seconds on a modern MacBook Pro.
  • Runs unsupervised: Prodigal is an unsupervised machine learning algorithm. It does not need to be provided with any training data, and instead automatically learns the properties of the genome from the sequence itself, including genetic code, RBS motif usage, start codon usage, and coding statistics.
  • Handles gaps, scaffolds, and partial genes: Users can specify how Prodigal should treat gaps, with numerous options for allowing or forbidding genes to run into or span them.
  • Identifies translation initiation sites: Prodigal predicts the correct translation initiation site for most genes, and can output information about every potential start site in the genome, including confidence score, RBS motif, and much more.
  • Outputs detailed summary statistics for each genome: Prodigal makes available many statistics for each genome, including contig length, gene length, GC content, GC skew, RBS motifs used, and start and stop codon usage.
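Two of the per-genome statistics listed above, GC content and GC skew, have standard definitions. The following is a minimal Python sketch of those definitions, not Prodigal's code. It assumes GC content is computed over unambiguous A/C/G/T bases only, so gap characters such as N are ignored, and that GC skew is (G - C) / (G + C). The function names `gc_content` and `gc_skew` are hypothetical, and Prodigal's own treatment of ambiguous bases may differ.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases (assumption: N and other
    ambiguity codes are excluded from the denominator)."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # no unambiguous bases, e.g. an all-N gap
    return (s.count("G") + s.count("C")) / acgt


def gc_skew(seq: str) -> float:
    """GC skew, defined here as (G - C) / (G + C); 0.0 if there is no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if g + c else 0.0


if __name__ == "__main__":
    print(gc_content("GGGCAT"), gc_skew("GGGCAT"))
```

For example, `gc_content("GGGCAT")` returns about 0.667 (4 of 6 bases) and `gc_skew("GGGCAT")` returns 0.5.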

What doesn't Prodigal do?

  • Predict RNA genes: For the time being, Prodigal does not predict RNA genes, although we haven't ruled out adding this capability in a future version.
  • Handle genes with introns: Introns are rare enough in prokaryotic protein-coding genes that Prodigal does not attempt to find them.
  • Functionally annotate genes: Prodigal does not provide functional annotations for the genes it predicts.
  • Deal with frameshifts: Prodigal contains no logic to handle insertions or deletions, so these types of sequencing errors will degrade its gene predictions.
  • Predict viral genes: The authors have not tested Prodigal on viruses. Anonymous mode would likely work in such cases, but Prodigal contains no special rules or routines for viral genomes.

Prodigal 2.x: Older versions of Prodigal do not contain special rules for handling gaps and scaffolds, do not automatically distinguish between genetic codes 11 and 4, and do not provide summary statistics for each genome.
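As background on why distinguishing genetic codes 11 and 4 matters: under translation table 11 the stop codons are TAA, TAG, and TGA, while table 4 (used by Mycoplasma and Spiroplasma) reads TGA as tryptophan, leaving only TAA and TAG as stops. If a code 4 genome is read with code 11, in-frame TGA codons break its genes apart. The Python sketch below is a toy illustration of that effect, not Prodigal's actual decision procedure: it measures how much of the forward strand lies in long stop-free stretches under each table. The function `long_orf_coverage` and its 100-codon default cutoff are hypothetical choices made for illustration.

```python
# Stop-codon sets for the two NCBI translation tables discussed above.
STOPS = {
    11: {"TAA", "TAG", "TGA"},  # bacterial, archaeal, and plant plastid code
    4: {"TAA", "TAG"},          # Mycoplasma/Spiroplasma code: TGA encodes Trp
}


def long_orf_coverage(seq: str, table: int, min_codons: int = 100) -> float:
    """Fraction of codon positions (forward strand, all three frames) lying in
    stop-free runs of at least min_codons codons under the given table."""
    s = seq.upper()
    stops = STOPS[table]
    covered = 0
    total = 0
    for frame in range(3):
        run = 0  # length of the current stop-free run, in codons
        for i in range(frame, len(s) - 2, 3):
            total += 1
            if s[i:i + 3] in stops:
                if run >= min_codons:
                    covered += run
                run = 0
            else:
                run += 1
        if run >= min_codons:  # count a long run that reaches the sequence end
            covered += run
    return covered / total if total else 0.0


if __name__ == "__main__":
    seq = "GCTTGAGCT"
    print(long_orf_coverage(seq, 11, min_codons=2), long_orf_coverage(seq, 4, min_codons=2))
```

Because dropping TGA from the stop set can only lengthen stop-free runs, coverage under table 4 is never lower than under table 11. Any practical rule would therefore need a threshold on how large the increase must be before switching codes; this sketch does not attempt to reproduce the rule Prodigal uses.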

License

Prodigal is open source and freely available under the GPL.