A python tool box for fast and accurate quality control, conversion and alignment of nanopore sequencing data

PyPore

PyPore is a Python toolbox for fast and accurate quality control, conversion and alignment of nanopore sequencing data in its raw format (Fast5). It is a command-line tool composed of three modules (seqstats, fastqgen and alignment), each with its own set of options. PyPore also produces interactive result summaries, built with the plotly library, that let users zoom and pan to inspect the information for a specific experimental point.

Requirements

  • HDF5
  • python 2.7
    • biopython
    • numpy
    • h5py
    • plotly
    • python_dateutil
    • ntpath
    • pysam

For Unix/OS X users only

The Windows distribution ships with precompiled versions of samtools and minimap2.
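
Before installing, it can help to confirm that the Python requirements listed above are importable. The following is a minimal sketch, not part of PyPore: the function name `find_missing_modules` and the `REQUIRED_MODULES` list are hypothetical, and the list uses import names, which differ from the package names in two cases (biopython is imported as `Bio`, python_dateutil as `dateutil`).

```python
import importlib

# Import names for the Python requirements listed above (an assumption of
# this sketch). Two packages are imported under a different name than they
# are installed with: biopython -> Bio, python_dateutil -> dateutil.
REQUIRED_MODULES = ["Bio", "numpy", "h5py", "plotly", "dateutil", "ntpath", "pysam"]


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed (or one of its own imports failed).
            missing.append(name)
    return missing
```

Calling `find_missing_modules(REQUIRED_MODULES)` returns an empty list when every requirement can be imported; any name it returns still needs to be installed.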

Installation

Dependencies

Before proceeding with PyPore installation, check for HDF5 dependencies.

  1. To check whether the HDF5 library is already installed, type:
    h5cc -showconfig
    
  2. If you are on an OS X system equipped with the HomeBrew package manager, check the list of installed packages by typing:
    brew list
    
    • If missing, install HDF5 through the HomeBrew Science "tap":

      brew tap homebrew/science
      brew install hdf5
      
  3. Alternatively, if you use a Python distribution such as Anaconda or Miniconda, HDF5 can be installed on any OS from the command line:
    conda install -c anaconda hdf5
    
  4. For Linux or other Unix distributions, the HDF5 library can be found in the libhdf5-dev package. Make sure that you have the development headers, as they are usually not installed by default.
  5. For Windows users the HDF5 library installer can be downloaded from here.
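
The `h5cc -showconfig` check from step 1 can also be scripted. This is a minimal sketch, assuming that `h5cc` exits with status 0 when HDF5 is installed; the helper names `command_succeeds` and `hdf5_available` are hypothetical and not part of PyPore.

```python
import os
import subprocess


def command_succeeds(argv):
    """Return True if the command runs and exits with status 0."""
    try:
        with open(os.devnull, "w") as devnull:
            # Discard the command's output; only the exit status matters here.
            return subprocess.call(argv, stdout=devnull, stderr=devnull) == 0
    except OSError:
        # Raised when the executable cannot be found or started.
        return False


def hdf5_available():
    """Assumption: a working `h5cc -showconfig` means HDF5 is installed."""
    return command_succeeds(["h5cc", "-showconfig"])
```

If `hdf5_available()` returns False, install HDF5 with one of the methods listed above before installing PyPore.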

PyPore

  1. Clone the PyPore repository:
    • PyPore
      git clone --single-branch -b master https://github.com/rsemeraro/PyPore
      
    • PyPore with test data (170 MB)
      git clone --single-branch -b Benchmark https://github.com/rsemeraro/PyPore.git
      
  2. Install as root:
    cd PyPore
    python setup.py install
    

Usage

PyPore consists of the following three modules:

  • seqstats

    seqstats provides an interface for exploring the information in a dataset of Fast5 files (single-read or multi-read) and, optionally, for converting and gathering them into FastQ data. The basic syntax for a set of single-read Fast5 files is:

    pypore seqstats -i Files/Folder -l sample_label
    

    Alternatively, pass the --multi_read_fast5/-m argument to run seqstats on a multi-read Fast5 dataset:

    pypore seqstats -i Files/Folder -l sample_label --threads_number 8 --multi_read_fast5 yes
    

    To use seqstats with Albacore outputs (FastQ files and a summary file), an Albacore summary file is required (--albacore_summary/-a). In Albacore mode, the seqstats input (-i) becomes the Albacore FastQ directory.

    pypore seqstats -i FastQFiles/Folder -l sample_label -a /path/to/summary_file.txt --threads_number 8
    

    The --fastq/-fq option enables FastQ generation, and the --threads_number/-n option uses multiple processors to speed up the analysis.

    pypore seqstats -i Files/Folder -l sample_label --threads_number 8 --fastq yes
    

    To run seqstats on the test data, go to the PyPore folder and type:

    pypore seqstats -i test_folder/test_dataset -l my_test -fq yes -n 3
    

    To see all options, type:

    pypore seqstats -h
    

    Interactive Summaries

    seqstats generates two outputs: sequencing_summary.html and pore_activity_map.html.

  • fastqgen

    fastqgen is a faster alternative to seqstats for FastQ generation: it lets users convert data without spending time on multiple parsing passes. The basic syntax is:

    pypore fastqgen -i Files/Folder -l sample_label
    

    The --threads_number/-n option uses multiple processors to speed up conversion.

    pypore fastqgen -i Files/Folder -l sample_label -n 8
    

    To see all options, type:

    pypore fastqgen -h
    
  • alignment

    The last module is an alignment module, based on three state-of-the-art long-read aligners, that can generate an interactive summary of the results. The basic syntax is:

    pypore alignment -i input_1.fastq input_2.fastq -r reference.fasta -l sample_label
    

    The input can be a single FastQ file or several. Optionally, the --alignment_stats/-s argument produces an HTML summary file, and the --aligner/-a argument customizes the list of aligners, made up of minimap2 (m), bwa (b) and ngmlr (n), by removing some of them or changing their execution order.

    pypore alignment -i input_1.fastq -r reference.fasta -l sample_label -a b m n -s yes
    

    To see all options, type:

    pypore alignment -h
    

    Interactive Summary

    alignment_stats.html
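
When PyPore is driven from a script, the command lines shown in the Usage section can be assembled programmatically. The sketch below is illustrative only: the function names and their parameters are hypothetical, and it uses only the flags that appear in the examples above (run `pypore <module> -h` for the authoritative option list).

```python
def seqstats_command(input_path, label, threads=None, fastq=False,
                     multi_read=False, albacore_summary=None):
    """Build an argument list for `pypore seqstats` (hypothetical helper)."""
    argv = ["pypore", "seqstats", "-i", input_path, "-l", label]
    if albacore_summary is not None:
        # Albacore mode: input_path is then the Albacore FastQ directory.
        argv += ["--albacore_summary", albacore_summary]
    if threads is not None:
        argv += ["--threads_number", str(threads)]
    if multi_read:
        argv += ["--multi_read_fast5", "yes"]
    if fastq:
        argv += ["--fastq", "yes"]
    return argv


def fastqgen_command(input_path, label, threads=None):
    """Build an argument list for `pypore fastqgen` (hypothetical helper)."""
    argv = ["pypore", "fastqgen", "-i", input_path, "-l", label]
    if threads is not None:
        argv += ["--threads_number", str(threads)]
    return argv


def alignment_command(fastq_files, reference, label, aligners=None, stats=False):
    """Build an argument list for `pypore alignment` (hypothetical helper)."""
    argv = ["pypore", "alignment", "-i"] + list(fastq_files)
    argv += ["-r", reference, "-l", label]
    if aligners:
        # Aligner codes from the text above: m (minimap2), b (bwa), n (ngmlr).
        argv += ["--aligner"] + list(aligners)
    if stats:
        argv += ["--alignment_stats", "yes"]
    return argv
```

The returned lists can be passed to `subprocess.check_call`, for example `subprocess.check_call(fastqgen_command("Files/Folder", "sample_label", threads=8))`, which avoids shell quoting issues with paths that contain spaces.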

Contacts

This program was developed by Roberto Semeraro, Department of Experimental and Clinical Medicine, University of Florence.
