PyPore is a Python toolbox for fast and accurate quality control, conversion and alignment of nanopore sequencing data in their raw format (Fast5). We developed PyPore as a command-line tool composed of three modules (seqstats, fastqgen and alignment), each provided with a set of specific options. PyPore also provides an interactive result viewer, based on the plotly library, that lets the user zoom and pan the result summary to get information about a specific experimental point.
- HDF5
- python 2.7
- biopython
- numpy
- h5py
- plotly
- python_dateutil
- ntpath
- pysam
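As a quick way to see which of the Python dependencies above are already available, the following minimal sketch tries to import each one. It is not part of PyPore; the helper name `missing_modules` is hypothetical, and the import names assume the usual mapping (biopython is imported as `Bio`, python_dateutil as `dateutil`).

```python
import importlib

# Import names for the Python dependencies listed above. Some packages are
# imported under a different name than the one used to install them
# (biopython -> Bio, python_dateutil -> dateutil).
REQUIRED_MODULES = ["Bio", "numpy", "h5py", "plotly", "dateutil", "ntpath", "pysam"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All Python dependencies are importable.")
```

Run it with the same interpreter that will be used for PyPore, since each Python installation has its own set of packages.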
Before proceeding with PyPore installation, check for HDF5 dependencies.
- To check whether the HDF5 library is already present, type:
h5cc -showconfig
- If you are on an OS X system equipped with the HomeBrew package manager, check the list of installed packages by typing:
brew list
- If missing, install HDF5 through the HomeBrew Science "tap":
brew tap homebrew/science
brew install hdf5
- Alternatively, if you use a Python distribution such as Anaconda or Miniconda, HDF5 can be installed from the command line (on any OS) with:
conda install -c anaconda hdf5
- For Linux or other Unix distributions, the HDF5 library can be found in the libhdf5-dev package. Make sure that you have the development headers, as they are usually not installed by default.
- For Windows users, the HDF5 library installer can be downloaded from here.
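The `h5cc -showconfig` check above can also be scripted. The sketch below only tests whether the `h5cc` wrapper is on the current `PATH`; the function name is hypothetical, and it assumes Python 3.3 or later for `shutil.which`, so it is meant as a standalone check rather than something to run under the Python 2.7 interpreter PyPore requires. A negative result does not prove that HDF5 is absent (for example, a conda environment may not be activated).

```python
import shutil


def hdf5_wrapper_found(tool="h5cc"):
    """Return True if `tool` is found on the current PATH (hypothetical helper)."""
    return shutil.which(tool) is not None


if __name__ == "__main__":
    if hdf5_wrapper_found():
        print("h5cc found: run 'h5cc -showconfig' to inspect the HDF5 build.")
    else:
        print("h5cc not found on PATH: HDF5 may need to be installed.")
```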
- Clone the PyPore repository:
- PyPore
git clone --single-branch -b master https://github.com/rsemeraro/PyPore
- PyPore with test data (170 MB)
git clone --single-branch -b Benchmark https://github.com/rsemeraro/PyPore.git
- Install as root:
cd PyPore
python setup.py install
PyPore consists of the following three modules:
- seqstats provides an interface to explore the information in a dataset of Fast5 files (single-read or multi-read) and, optionally, to convert and gather them into FastQ data. The basic syntax, for a set of single-read Fast5 files, is:
pypore seqstats -i Files/Folder -l sample_label
Alternatively, by passing the --multi_read_fast5/-m argument, it is possible to run seqstats on a multi-read Fast5 dataset:
pypore seqstats -i Files/Folder -l sample_label --threads_number 8 --multi_read_fast5 yes
To use seqstats with Albacore outputs (FastQ files and summary file), an Albacore summary file is required (--albacore_summary/-a). In Albacore mode, the seqstats input (-i) becomes the Albacore FastQ directory.
pypore seqstats -i FastQFiles/Folder -l sample_label -a /path/to/summary_file.txt --threads_number 8
With the --fastq/-fq and --threads_number/-n options, it is possible to enable FastQ generation and to use multiple processors to speed up the analysis.
pypore seqstats -i Files/Folder -l sample_label --threads_number 8 --fastq yes
To use seqstats with the test data, go to the PyPore folder and type:
pypore seqstats -i test_folder/test_dataset -l my_test -fq yes -n 3
To see all options, type:
pypore seqstats -h
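When seqstats is launched from a script (for example, once per sample), it can help to assemble the command line programmatically. The following is a minimal sketch that uses only the options shown above; the function `build_seqstats_command` and its parameter names are hypothetical, and it does not check which option combinations PyPore actually accepts.

```python
def build_seqstats_command(input_path, label, threads=None, fastq=False,
                           multi_read=False, albacore_summary=None):
    """Assemble a `pypore seqstats` argument list (hypothetical helper)."""
    cmd = ["pypore", "seqstats", "-i", input_path, "-l", label]
    if albacore_summary is not None:
        # Albacore mode: input_path is then the Albacore FastQ directory.
        cmd += ["-a", albacore_summary]
    if multi_read:
        cmd += ["--multi_read_fast5", "yes"]
    if threads is not None:
        cmd += ["--threads_number", str(threads)]
    if fastq:
        cmd += ["--fastq", "yes"]
    return cmd


if __name__ == "__main__":
    # Mirrors the test-data example above, using the long option names.
    command = build_seqstats_command("test_folder/test_dataset", "my_test",
                                     threads=3, fastq=True)
    print(" ".join(command))
```

The returned list can be passed directly to `subprocess.call`, which avoids shell quoting problems with paths that contain spaces.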
Interactive Summaries
Outputs generated by seqstats are:
- sequencing_summary.html
- pore_activity_map.html
- fastqgen is a faster alternative to seqstats for FastQ generation, allowing the user to convert data without spending time on repeated parsing. The basic syntax is:
pypore fastqgen -i Files/Folder -l sample_label
With the --threads_number/-n option, it is possible to use multiple processors to speed up the conversion.
pypore fastqgen -i Files/Folder -l sample_label -n 8
To see all options, type:
pypore fastqgen -h
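After a conversion, a quick sanity check of the resulting FastQ file can be useful. The sketch below is independent of PyPore: it assumes the standard four-line FastQ record layout (header, sequence, separator, qualities) with no line wrapping, and the function name and the file name in the usage comment are hypothetical.

```python
def fastq_read_lengths(path):
    """Return the read lengths of a FastQ file with 4-line records (hypothetical helper)."""
    lengths = []
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            # In a 4-line record, the sequence is the second line.
            if line_number % 4 == 1:
                lengths.append(len(line.strip()))
    return lengths


# Example usage (the file name is a placeholder, not a documented PyPore output):
# lengths = fastq_read_lengths("sample_label.fastq")
# print("reads: %d, total bases: %d" % (len(lengths), sum(lengths)))
```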
- The last feature of our tool consists of an alignment module, based on three state-of-the-art long-read aligners, that can generate an interactive summary of the results. The basic syntax is:
pypore alignment -i input_1.fastq input_2.fastq -r reference.fasta -l sample_label
As input you can pass one or more FastQ files. Optionally, it is possible to obtain an HTML summary file with the --alignment_stats/-s argument, and/or to customize the list of aligners, composed of minimap2 (m), bwa (b) and ngmlr (n), by removing some of them or changing their execution order with --aligner/-a.
pypore alignment -i input_1.fastq -r reference.fasta -l sample_label -a b m n -s yes
To see all options, type:
pypore alignment -h
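The one-letter codes accepted by --aligner/-a are easy to mistype in scripts. As an illustration only, the sketch below maps the codes described above to aligner names and rejects unknown codes before a command is built; `ALIGNER_CODES` and `resolve_aligners` are hypothetical names, and how PyPore itself validates this argument is not documented here.

```python
# One-letter codes and the aligners they stand for, as described above.
ALIGNER_CODES = {"m": "minimap2", "b": "bwa", "n": "ngmlr"}


def resolve_aligners(codes):
    """Translate codes such as ["b", "m", "n"] into aligner names, keeping their order."""
    names = []
    for code in codes:
        if code not in ALIGNER_CODES:
            raise ValueError("Unknown aligner code: %r" % code)
        names.append(ALIGNER_CODES[code])
    return names


if __name__ == "__main__":
    # Same order as in the example command: -a b m n
    print(resolve_aligners("b m n".split()))
```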
This program has been developed by Roberto Semeraro, Department of Experimental and Clinical Medicine, University of Florence.