PALEOMIX v1.3.0

@MikkelSchubert MikkelSchubert released this 31 Aug 18:07
· 421 commits to master since this release

PALEOMIX v1.3.0 is a major maintenance release, with the goals of porting PALEOMIX to
Python 3 and preparing for further work to update and expand the pipeline. A number of
deprecated tools and options have been removed, as has support for very old versions of
tools used by the pipelines.

Existing makefiles are compatible with PALEOMIX 1.3.0 with a few notable exceptions:

  • BAM pipeline support for the GATK Indel Realigner has been removed, and the options
    'RealignedBAM' and 'RawBam' no longer have any effect. These are now simply ignored
    and a "raw" BAM is always produced. The Indel Realigner tool was removed from GATK
    as of GATK4 (released 2018) and continued support is not deemed worthwhile due to the
    minor benefit from running the Indel Realigner.
  • BAM pipeline support for generating PCR duplicate histogram files for use with PreSeq
    has been removed. The option is simply ignored.
  • BAM pipeline support for AdapterRemoval options --pcr1 and --pcr2 has been removed,
    as these options are long deprecated and will be removed from AdapterRemoval. Use the
    --adapter1 and --adapter2 options as described in the BAM pipeline documentation.
  • Phylo pipeline options for BCFTools must be updated to replace the option invoking
    the consensus caller ("-g") with "-c", or with "-m" for the multiallelic caller.
  • The Phylo pipeline genotyping methods 'Random Sampling' and 'Reference Sequence' are
    no longer supported.

Please open an issue if features or options important to your work have been removed.
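
To check an existing BAM pipeline makefile against the exceptions listed above, a plain-text scan can flag lines that still mention the affected options. This is a minimal sketch, not part of PALEOMIX: the helper name `find_removed_options` and the notes attached to each option are illustrative assumptions, and the '#' comment handling is deliberately naive. PALEOMIX itself prints warnings for deprecated options, so this is only useful for checking makefiles ahead of a run.

```python
# Options that the notes above describe as removed or ignored in v1.3.0.
# The attached notes paraphrase the release notes; they are not PALEOMIX output.
REMOVED_OPTIONS = {
    "RealignedBAM": "ignored; a raw BAM is always produced",
    "RawBam": "ignored; a raw BAM is always produced",
    "--pcr1": "removed; use --adapter1 instead",
    "--pcr2": "removed; use --adapter2 instead",
}


def find_removed_options(makefile_text):
    """Return (line number, option, note) for every removed option found.

    Hypothetical helper: matching is case-insensitive and purely textual,
    so it does not validate the makefile or understand its structure.
    """
    hits = []
    for lineno, line in enumerate(makefile_text.splitlines(), start=1):
        # Drop trailing comments (naive: ignores '#' inside quoted strings)
        content = line.split("#", 1)[0].lower()
        for option, note in REMOVED_OPTIONS.items():
            if option.lower() in content:
                hits.append((lineno, option, note))
    return hits
```

Calling `find_removed_options(open(path).read())` on a makefile returns an empty list when none of these options are present.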

Added

  • The BAM and Phylo pipelines print warnings when deprecated/removed options are used.
  • A log-file is automatically created if errors are encountered during run-time.

Changed

  • Converted project from Python 2.7 to Python 3.5+.
  • Removed internal copy of pyyaml and added dependency on ruamel.yaml.
  • Command-line output was changed to a simpler, log-like output using coloredlogs.
  • Bumped minimum version requirements for most tools used by the pipelines; minimum
    versions were largely informed by availability in Debian stretch.
  • Changed naming of BAM index files created by the BAM pipeline from 'filename.bai' to
    'filename.bam.bai' in order to match the behavior of standard tools (e.g. samtools).
  • The filenames of input FASTQ files are now used in the intermediate file-structure,
    with the goal of making the pipeline more robust to changes in input files.
  • The pipeline no longer fails if a command generates more files than expected;
    instead, this merely triggers a warning.
  • Moved PCR duplicate filtering and rescaling to 'Features' in BAM pipeline makefiles.
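
Downstream scripts that look for index files under the old 'filename.bai' name may need adjusting after the naming change described above. The following is a minimal sketch of renaming old-style index files to the 'filename.bam.bai' form; the helper names `new_index_name` and `rename_old_indexes` are hypothetical and not part of PALEOMIX, and the notes above do not say whether the pipeline regenerates indexes on its own.

```python
from pathlib import Path


def new_index_name(old_index):
    """Map 'filename.bai' to 'filename.bam.bai'; leave other names untouched."""
    path = Path(old_index)
    if path.name.endswith(".bam.bai") or path.suffix != ".bai":
        return path
    return path.with_name(path.stem + ".bam.bai")


def rename_old_indexes(directory, dry_run=True):
    """Rename old-style BAM indexes in 'directory'; return (old, new) pairs.

    With dry_run=True (the default) nothing is renamed; the pairs only
    show what would be done.
    """
    renamed = []
    for old in sorted(Path(directory).glob("*.bai")):
        new = new_index_name(old)
        # Skip files that already use the new name or would clobber one
        if new == old or new.exists():
            continue
        # Only touch indexes that sit next to their BAM file
        if not old.with_suffix(".bam").exists():
            continue
        if not dry_run:
            old.rename(new)
        renamed.append((old, new))
    return renamed
```

Running with `dry_run=True` first makes it possible to review the planned renames before any file is changed.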

Fixed

  • Fixed spurious warnings from pysam (htslib) when opening BAMs without index files.

Removed

  • Removed limited support for 32 bit systems.
  • Removed the 'cat' command.
  • Removed the 'duphist' command and the corresponding BAM pipeline feature.
  • Removed the 'ena' command.
  • Removed the 'sample_pileup' command.
  • Removed the 'retable' command. A more performant standalone version can be found at
    https://github.com/MikkelSchubert/retable
  • Removed the bam_pipeline 'remap' command.
  • Removed entry-points other than the 'paleomix' command; that is to say the
    standalone 'bam_pipeline', 'phylo_pipeline', etc. commands.
  • Removed data for the original publication of PALEOMIX. The instructions in that
    publication are outdated and cannot be carried out for current versions of PALEOMIX.
  • Removed support for configuration files with per-host sections. Files are now
    assumed to contain only one set of command-line options.
  • Removed --to-dot option for pipelines.
  • Removed keyboard shortcuts for modifying pipeline behavior during runtime.
  • Removed undocumented options from Zonkey.
  • Removed undocumented codeml support from the Phylo pipeline.
  • Removed 'Random Sampling' and 'Reference Sequence' genotyping methods.
  • Removed makefile metadata (filename, hash, mtime) from BAM pipeline summary reports.
  • Removed support for compressing intermediate FASTQ files using bzip2. Reads are now
    always compressed using gzip.
  • Removed ability to merge FASTQ files with the SplitLanesByFilenames option.
    Files are now always split, meaning that individual FASTQ files or pairs are mapped.
  • Removed support for indel realignment using GATK due to its removal from GATK.
  • Removed creation of FASTA sequence dictionaries as they were only needed by GATK.
  • Removed support for labels for BAM pipeline prefixes.
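
Scripts that still call the removed standalone entry-points need to go through the 'paleomix' command instead. The sketch below rewrites such command lines; the helper `modernize_command` is hypothetical, and it assumes that each legacy command is available as a 'paleomix' subcommand of the same name, which should be checked against the PALEOMIX documentation. Only the two commands named in the notes above are listed.

```python
import shlex

# Standalone commands named in the notes above as removed entry-points.
# ASSUMPTION: each is available as 'paleomix <same name>'.
LEGACY_COMMANDS = ("bam_pipeline", "phylo_pipeline")


def modernize_command(command_line):
    """Prefix legacy standalone commands with 'paleomix'.

    Command lines that do not start with a legacy command are returned
    with their arguments unchanged (re-quoted where necessary).
    """
    args = shlex.split(command_line)
    if args and args[0] in LEGACY_COMMANDS:
        args.insert(0, "paleomix")
    # shlex.join requires Python 3.8+, so quote and join manually
    return " ".join(shlex.quote(arg) for arg in args)
```

The arguments that follow the command name are passed through untouched, so any subcommands or options removed in this release still need to be updated by hand.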