This repository is based on computations performed on the FDA HIVE platform.
The High-performance Integrated Virtual Environment (HIVE) is a modern, robust software suite that provides infrastructure for next-generation sequencing (NGS) data analysis. It was co-developed by the Food and Drug Administration (FDA) and George Washington University. HIVE provides distributed data retrieval, archival capabilities, and a computational environment designed for manipulating NGS data.
We use a two-step pipeline for this metagenomic analysis: CensuScope followed by HIVE-hexagon. CensuScope is a census-based tool that randomly samples a user-defined number of reads and aligns them with BLAST against a reference database. Our reference database is a filtered version of the NCBI Nucleotide database (NTdb) from which all sequences lacking a clear taxonomic lineage have been removed. All artificial sequences have also been removed, either by our automated filter or manually whenever an artificial sequence is identified during post-analysis processing. The sequences identified by CensuScope are then used as references for HIVE-hexagon alignments. HIVE-hexagon is a k-mer-based aligner that is more sensitive and faster than current standard alignment algorithms, and it reduces computational cost, memory requirements, and processing time. For a full description of these methods, see "Baseline human gut microbiota profile in healthy people and standard reporting template".
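The sampling-and-census idea behind the first step can be illustrated with a small Python sketch. This is a toy under stated assumptions, not CensuScope, BLAST, or HIVE-hexagon: a shared-k-mer vote stands in for a BLAST best hit, the reference set is a hypothetical in-memory dict mapping a name to `(sequence, lineage)`, and all function names (`filter_references`, `build_index`, `best_hit`, `census`) are hypothetical. References without a lineage are dropped before indexing, loosely mirroring the database filtering described above.

```python
import random
from collections import Counter


def filter_references(refs):
    """Drop references with no taxonomic lineage.

    Assumption: refs is a hypothetical dict of name -> (sequence, lineage),
    where lineage is None or "" when unknown.
    """
    return {name: seq for name, (seq, lineage) in refs.items() if lineage}


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(refs, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = {}
    for name, seq in refs.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index


def best_hit(read, index, k):
    """Toy stand-in for a BLAST best hit (NOT how BLAST works):
    return the reference sharing the most k-mers with the read, or None."""
    votes = Counter()
    for km in kmers(read, k):
        for name in index.get(km, ()):
            votes[name] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def census(reads, refs, sample_size, k=8, seed=0):
    """Randomly sample up to sample_size reads and tally their best hits.

    Reads with no shared k-mer are counted as "unassigned".
    """
    rng = random.Random(seed)
    sample = rng.sample(reads, min(sample_size, len(reads)))
    index = build_index(filter_references(refs), k)
    counts = Counter()
    for read in sample:
        hit = best_hit(read, index, k)
        counts[hit if hit is not None else "unassigned"] += 1
    return counts
```

The resulting tally approximates which references are present in the sample; in the pipeline described above, those hits would then serve as the reference set for the more thorough second-step alignment. Sampling a fixed number of reads keeps the first step cheap regardless of total run size, at the cost of possibly missing very low-abundance organisms.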