Merge branch 'release/5.1.0'
keiranmraine committed Jun 20, 2020
2 parents d0bb91b + 4aaf081 commit a2bcb9d
Showing 18 changed files with 327 additions and 677 deletions.
1 change: 1 addition & 0 deletions .dockerignore
@@ -9,3 +9,4 @@
/docs.tar.gz
/setup.sh
/prerelease.sh
/blib
2 changes: 2 additions & 0 deletions .gitignore
@@ -15,6 +15,8 @@ examples/cgp_gnos_pull.ini
/c/c_tests/tests_log
/bin/diff_bams
/bin/reheadSQ
/bin/mismatchQc
/bin/mmFlagModifier
/c/c_tests/01_bam_stats_output_tests
/c/c_tests/02_bam_access_tests
/c/c_tests/03_bam_stats_calcs_tests
9 changes: 9 additions & 0 deletions CHANGES.md
@@ -1,5 +1,14 @@
# CHANGES

## 5.1.0

* Base image updated to Focal (Ubuntu 20.04).
* Majority of biobambam2 replaced with samtools functions.
* Reads undergo full collate when mapping from BAM/CRAM (bwa-mem2 prep).
* Duplicate marking `samtools markdup --mode` options exposed to `bwa_mem.pl`.
* Lanes mapped with earlier versions of PCAP-core cannot be merged without reprocessing to add the "mate score tag" via `samtools fixmate`.
* Scramble option for `bwa_mem.pl` deprecated; the relevant option for fast CRAM random access (`-seqslice`) is exposed instead.

## 5.0.5

* Add `noindex` commandline flag to `merge_or_mark.pl` for bammerge calls. Only permitted alongside `qnamesort`.
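
As a rough illustration of the 5.1.0 reprocessing note above, the sketch below prints (rather than runs) a samtools pipeline that adds the mate score tag before duplicate marking. It follows the collate, fixmate, sort, markdup workflow described in the samtools documentation. The helper name `remark_cmd` and the file names are hypothetical, and this is not necessarily how PCAP-core chains the tools internally.

```shell
#!/usr/bin/env bash
# Sketch only: prints a pipeline, it does not execute samtools.
# remark_cmd and all file names are hypothetical.
remark_cmd() {
  local in_bam=$1 out_bam=$2 mode=${3:-t}
  # collate groups read pairs together,
  # fixmate -m adds the ms (mate score) tag,
  # sort restores coordinate order,
  # markdup --mode selects the duplicate decision method (t or s).
  printf 'samtools collate -O %s | samtools fixmate -m - - | samtools sort - | samtools markdup --mode %s - %s\n' \
    "$in_bam" "$mode" "$out_bam"
}

remark_cmd old_lane.bam remarked.bam s
```

Printing the command keeps the sketch safe to run on a machine without samtools; piping the output to `bash` would be one way to execute it once the file names are real.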
65 changes: 36 additions & 29 deletions Dockerfile
@@ -1,38 +1,42 @@
FROM quay.io/wtsicgp/cgpbigwig:1.1.0 as builder
FROM quay.io/wtsicgp/cgpbigwig:1.3.0 as builder

USER root

ARG BBB2_URL="https://gitlab.com/german.tischler/biobambam2/uploads/178774a8ece96d2201fcd0b5249884c7/biobambam2-2.0.146-release-20191030105216-x86_64-linux-gnu.tar.xz"
# ALL tool versions used by opt-build.sh
# need to keep in sync with setup.sh

# newer gitlab versions do not work
ARG BBB2_URL="https://github.com/gt1/biobambam2/releases/download/2.0.87-release-20180301132713/biobambam2-2.0.87-release-20180301132713-x86_64-etch-linux-gnu.tar.gz"
ARG BWAMEM2_URL="https://github.com/bwa-mem2/bwa-mem2/releases/download/v2.0pre2/bwa-mem2-2.0pre2_x64-linux.tar.bz2"
ARG STADEN="https://iweb.dl.sourceforge.net/project/staden/staden/2.0.0b11/staden-2.0.0b11-2016-linux-x86_64.tar.gz"
ARG VER_BIODBHTS="3.01"
ARG VER_BWA="v0.7.17"
ARG VER_HTSLIB="1.9"
ARG VER_SAMTOOLS="1.9"
ARG VER_HTSLIB="1.10.2"
ARG VER_SAMTOOLS="1.10"

RUN apt-get -yq update
RUN apt-get install -yq --no-install-recommends\
apt-transport-https\
locales\
curl\
ca-certificates\
libperlio-gzip-perl\
make\
bzip2\
gcc\
psmisc\
time\
zlib1g-dev\
libbz2-dev\
liblzma-dev\
libcurl4-gnutls-dev\
libncurses5-dev\
nettle-dev\
libp11-kit-dev\
libtasn1-dev\
libgnutls-dev\
libgd-dev\
libdb-dev
RUN apt-get install -yq --no-install-recommends apt-transport-https
RUN apt-get install -yq --no-install-recommends locales
RUN apt-get install -yq --no-install-recommends curl
RUN apt-get install -yq --no-install-recommends ca-certificates
RUN apt-get install -yq --no-install-recommends libperlio-gzip-perl
RUN apt-get install -yq --no-install-recommends make
RUN apt-get install -yq --no-install-recommends bzip2
RUN apt-get install -yq --no-install-recommends gcc
RUN apt-get install -yq --no-install-recommends psmisc
RUN apt-get install -yq --no-install-recommends time
RUN apt-get install -yq --no-install-recommends zlib1g-dev
RUN apt-get install -yq --no-install-recommends libbz2-dev
RUN apt-get install -yq --no-install-recommends liblzma-dev
RUN apt-get install -yq --no-install-recommends libcurl4-gnutls-dev
RUN apt-get install -yq --no-install-recommends libncurses5-dev
RUN apt-get install -yq --no-install-recommends nettle-dev
RUN apt-get install -yq --no-install-recommends libp11-kit-dev
RUN apt-get install -yq --no-install-recommends libtasn1-dev
RUN apt-get install -yq --no-install-recommends libdb-dev
RUN apt-get install -yq --no-install-recommends libgnutls28-dev
RUN apt-get install -yq --no-install-recommends xz-utils
RUN apt-get install -yq --no-install-recommends libexpat1-dev

RUN locale-gen en_US.UTF-8
RUN update-locale LANG=en_US.UTF-8
@@ -54,11 +58,11 @@ RUN bash build/opt-build.sh $OPT
COPY . .
RUN bash build/opt-build-local.sh $OPT

FROM ubuntu:16.04
FROM ubuntu:20.04

LABEL maintainer="[email protected]"\
uk.ac.sanger.cgp="Cancer, Ageing and Somatic Mutation, Wellcome Sanger Institute" \
version="5.0.5" \
version="5.1.0" \
description="pcap-core"

ENV OPT /opt/wtsi-cgp
@@ -67,6 +71,7 @@ ENV PATH $OPT/bin:$PATH
ENV PERL5LIB $OPT/lib/perl5
ENV LD_LIBRARY_PATH $OPT/lib:$OPT/scramble/lib
ENV LC_ALL C
ENV GPERF_FOR_BWA /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4

RUN apt-get -yq update
RUN apt-get install -yq --no-install-recommends \
@@ -82,8 +87,10 @@ zlib1g \
liblzma5 \
libncurses5 \
p11-kit \
libcurl3 \
libcurl3-gnutls \
libcurl4 \
moreutils \
google-perftools \
unattended-upgrades && \
unattended-upgrade -d -v && \
apt-get remove -yq unattended-upgrades && \
21 changes: 6 additions & 15 deletions README.md
@@ -29,18 +29,12 @@ Available programs are described in the [wiki][wiki].

## Docker, Singularity and Dockstore

There are docker and dockstore.org wrappers for this project at [dockstore-cgpmap][dockstore-cgpmap].
There are dockstore.org CWL and wrappers for this project at [dockstore-cgpmap][dockstore-cgpmap].

The docker image is held on [quay.io][quay-io-cgpmap].

The CWL bindings of `dockstore-cgpmap` specifically target execution of the BWA mem mapping flow,
however all tools are contained in the image and can be used if you construct the relevant docker
commands.
The docker image is held on [quay.io][quay-io-pcap-core].

The docker image is known to work correctly after import into a singularity image.

See the [dockstore-cgpmap][dockstore-cgpmap] documentation for more detail.

## Dependencies/Install

Please be aware that this expects basic C compilation libraries and tools to be available, most are listed in `INSTALL`.
@@ -69,13 +63,10 @@ Please see the respective licence for each before use.
### Cutting the release

1. Update `lib/PCAP.pm` to the correct version.
2. Ensure upgrade path for new version number is added to `lib/PCAP.pm`.
2. Update `Dockerfile` to the correct version.
3. Update `CHANGES.md` to show major items.
4. Run `./prerelease.sh`
5. Check all tests and coverage reports are acceptable.
6. Commit the updated docs tree and updated module/version.
7. Push commits.
8. Use the GitHub tools to draft a release.
4. Push commits and verify with Sanger internal CI.
5. Use the GitHub tools to draft a release.

<!-- References -->

@@ -87,7 +78,7 @@ Please see the respective licence for each before use.
[cancerit_github]: https://github.com/cancerit
[old_repo]: https://github.com/ICGC-TCGA-PanCancer/PCAP-core
[dockstore-cgpmap]: https://github.com/cancerit/dockstore-cgpmap
[quay-io-cgpmap]: https://quay.io/repository/wtsicgp/dockstore-cgpmap
[quay-io-pcap-core]: https://quay.io/repository/wtsicgp/pcap-core

<!-- Travis -->
[travis-base]: https://travis-ci.org/cancerit/PCAP-core
46 changes: 38 additions & 8 deletions bin/bwa_mem.pl
@@ -52,15 +52,14 @@
my $options = setup();

my $threads = PCAP::Threaded->new($options->{'threads'});
&PCAP::Threaded::disable_out_err if(exists $options->{'index'});

# register processes
$threads->add_function('split', \&PCAP::Bwa::split_in);
$threads->add_function('bwamem', \&PCAP::Bwa::bwa_mem, exists $options->{'index'} ? 1 : $options->{'map_threads'});
$threads->add_function('split', \&PCAP::Bwa::split_in, split_threads($options));
$threads->add_function('bwamem', \&PCAP::Bwa::bwa_mem, exists $options->{'index'} ? 1 : $options->{'map_threads'});

PCAP::Bwa::mem_setup($options) if(!exists $options->{'process'} || $options->{'process'} eq 'setup');

$threads->run($options->{'max_split'}, 'split', $options) if(!exists $options->{'process'} || $options->{'process'} eq 'split');
$threads->run($options->{'max_split'}, 'split', $options) if(!exists $options->{'process'} || $options->{'process'} eq 'split');

if(!exists $options->{'process'} || $options->{'process'} eq 'bwamem') {
$options->{'max_index'} = PCAP::Bwa::mem_mapmax($options);
@@ -78,6 +77,24 @@
}
}

sub split_threads {
my $options = shift;
my $div = 1;
my $threads_per_split = 1;
if(exists $options->{index}) {
$div = 1;
$threads_per_split = $options->{threads};
}
elsif($options->{raw_files}->[0] =~ m/(bam|cram)$/) {
my $inputs = scalar @{$options->{raw_files}};
$threads_per_split = int ($options->{threads} / $inputs);
$threads_per_split = 1 if($threads_per_split < 1);
$div = $threads_per_split;
}
$options->{threads_per_split} = $threads_per_split; # so can be used later
return $div; # so can be used as return
}

sub cleanup {
my $options = shift;
my $tmpdir = $options->{'tmp'};
@@ -91,6 +108,8 @@ sub setup {
'mmqcfrac' => 0.05,
'threads' => 1,
'fragment' => 10,
'dupmode' => 't',
'seqslice' => 10000,
'csi' => undef,
);

@@ -116,6 +135,8 @@ sub setup {
'q|mmqc' => \$opts{'mmqc'},
'qf|mmqcfrac:f' => \$opts{'mmqcfrac'},
'bm2|bwamem2' => \$opts{'bwamem2'},
'd|dupmode:s' => \$opts{'dupmode'},
'ss|seqslice:i' => \$opts{'seqslice'},
) or pod2usage(2);

pod2usage(-verbose => 1, -exitval => 0) if(defined $opts{'h'});
@@ -145,10 +166,14 @@ sub setup {
die "ERROR: Please generate $opts{dict}, e.g.\n\t\$ samtools dict -a \$ASSEMBLY -s \$SPECIES $opts{reference} > $opts{dict}\n";
}

if(defined $opts{'scramble'}) {
die "ERROR: -scramble option is deprecated, please see -seqslice\n";
}

delete $opts{'process'} unless(defined $opts{'process'});
delete $opts{'index'} unless(defined $opts{'index'});
delete $opts{'bwa'} unless(defined $opts{'bwa'});
delete $opts{'scramble'} unless(defined $opts{'scramble'});
delete $opts{'scramble'};
delete $opts{'bwa_pl'} unless(defined $opts{'bwa_pl'});
delete $opts{'mmqc'} unless(defined $opts{'mmqc'});
delete $opts{'csi'} unless(defined $opts{'csi'});
@@ -220,11 +245,12 @@ =head1 SYNOPSIS
Optional parameters:
-bwamem2 -bm2 Use bwa-mem2 instead of bwa.
-fragment -f Split input into fragments of X million read pairs [10]
- only applies to fastq[.gz] input
-nomarkdup -n Don't mark duplicates [flag]
-csi Use CSI index instead of BAI for BAM files [flag].
-cram -c Output cram, see '-sc' [flag]
-scramble -sc Single quoted string of parameters to pass to Scramble when '-c' used
- '-I,-O' are used internally and should not be provided
-seqslice -ss seqs_per_slice for CRAM compression [samtools default: 10000]
-scramble -sc DEPRECATED
-bwa -b Single quoted string of additional parameters to pass to BWA
- '-t,-p,-R' are used internally and should not be provided.
- '-v' is set to 1 unless '-bwa' is set.
@@ -234,12 +260,15 @@ =head1 SYNOPSIS
-mmqc -q Mark reads as QCFAIL (0x200, 512) if mismatch rate exceeded [flag]
- Please see 'bwa_mem.pl -m'
-mmqcfrac -qf Mismatch fraction for -mmqc [0.05]
-dupmode -d see "samtools markdup -m" [t]
Targeted processing:
-process -p Only process this step then exit, optionally set -index
setup - checks and configure workspace (-index N/A)
split - split data by readgroup and chunk size (if applicable)
bwamem - only applicable if input is bam
mark - Run duplicate marking (-index N/A)
stats - Generates the *.bas file for the final BAM.
stats - Generates the *.bas file for the final BAM (-index N/A)
-index -i Optionally restrict '-p' to single job
bwamem - 1..<lane_count>
@@ -249,6 +278,7 @@ =head1 SYNOPSIS
https://github.com/gperftools/ (assuming number of cores not exceeded)
If available specify the path to 'gperftools/lib/libtcmalloc_minimal.so'.
- NOT APPLIED TO bwa-mem2
Falls back to environment variable GPERF_FOR_BWA when not set, or nothing.
Other:
-jobs -j For a parallel step report the number of jobs required
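
The thread arithmetic in the `split_threads` helper added to `bin/bwa_mem.pl` above can be checked with a small worked example. The sketch below restates the Perl logic in shell purely for illustration. The argument order, the way `-index` is represented (a non-empty fourth argument), and the output format are assumptions, not part of PCAP-core.

```shell
#!/usr/bin/env bash
# Hypothetical shell restatement of split_threads from bin/bwa_mem.pl.
# Args: total threads, number of inputs, first input file, optional index.
split_threads() {
  local threads=$1 inputs=$2 first_file=$3 index=${4:-}
  local div=1 per_split=1
  if [ -n "$index" ]; then
    # A single indexed job gets every thread.
    per_split=$threads
  elif [[ $first_file =~ (bam|cram)$ ]]; then
    # BAM/CRAM inputs share the threads evenly (integer division, minimum 1).
    per_split=$(( threads / inputs ))
    if [ "$per_split" -lt 1 ]; then
      per_split=1
    fi
    div=$per_split
  fi
  echo "div=$div threads_per_split=$per_split"
}

split_threads 8 3 lane1.bam        # div=2 threads_per_split=2
split_threads 2 3 lane1.cram       # div=1 threads_per_split=1
split_threads 8 2 reads_1.fq.gz    # div=1 threads_per_split=1
split_threads 8 3 lane1.bam 1      # div=1 threads_per_split=8
```

With 8 threads and 3 BAM inputs, each split job is given int(8 / 3) = 2 threads; fastq input falls through both branches and keeps the defaults of 1.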
32 changes: 20 additions & 12 deletions bin/merge_or_mark.pl
@@ -39,7 +39,6 @@
use PCAP::Bwa;
use version;

const my $COORD_SORT_ORDER => 'coordinate';
const my $QUERYNAME_SORT_ORDER => 'queryname';
const my @VALID_PROCESS => qw(setup mark stats);
const my %INDEX_FACTOR => ( 'setup' => 1,
Expand Down Expand Up @@ -72,7 +71,8 @@ sub setup {
my %opts = (
'threads' => 1,
'csi' => undef,
'sortorder' => $COORD_SORT_ORDER,
'dupmode' => 't',
'seqslice' => 10000,
);

GetOptions( 'h|help' => \$opts{'h'},
@@ -84,11 +84,13 @@ sub setup {
's|sample=s' => \$opts{'sample'},
'n|nomarkdup' => \$opts{'nomarkdup'},
'p|process=s' => \$opts{'process'},
'q|querynamesort' => \$opts{'qnamesort'},
'q|qnamesort' => \$opts{'qnamesort'},
'i|noindex' => \$opts{'noindex'},
'csi' => \$opts{'csi'},
'c|cram' => \$opts{'cram'},
'sc|scramble=s' => \$opts{'scramble'},
'd|dupmode:s' => \$opts{'dupmode'},
'ss|seqslice:i' => \$opts{'seqslice'},
) or pod2usage(2);

pod2usage(-verbose => 1, -exitval => 0) if(defined $opts{'h'});
@@ -113,17 +115,20 @@ sub setup {
die "ERROR: Please generate $opts{dict}, e.g.\n\t\$ samtools dict -a \$ASSEMBLY -s \$SPECIES $opts{reference} > $opts{dict}\n";
}

if(defined $opts{'scramble'}) {
die "ERROR: -scramble option is deprecated, please see -seqslice\n";
}

delete $opts{'process'} unless(defined $opts{'process'});
delete $opts{'index'} unless(defined $opts{'index'});
delete $opts{'scramble'} unless(defined $opts{'scramble'});
delete $opts{'scramble'};
delete $opts{'csi'} unless(defined $opts{'csi'});
if($opts{'qnamesort'} && !$opts{'nomarkdup'}){
die "ERROR: -qnamesort can only be used in conjunction with -nomarkdup\n";
}
if($opts{'noindex'} && !$opts{'qnamesort'}){
die "ERROR: -noindex can only be used in conjunction with -qnamesort\n";
}
$opts{'sortorder'} = $QUERYNAME_SORT_ORDER if($opts{'qnamesort'});

if($opts{'threads'} > 4) {
warn "Setting 'threads' to 4 as higher values are of limited value\n";
@@ -172,18 +177,19 @@ =head1 SYNOPSIS
-nomarkdup -n Don't mark duplicates [flag]
-qnamesort -q Use queryname sorting flag in bammerge rather than coordinate. [flag].
To be used in conjunction with -nomarkdup only
-noindex -i Don't attempt to index the merged file. Only available in conjunction with
-noindex -i Don't attempt to index the merged file. Only available in conjunction with
-qnamesort.
-csi Use CSI index instead of BAI for BAM files [flag].
-cram -c Output cram, see '-sc' [flag]
-scramble -sc Single quoted string of parameters to pass to Scramble when '-c' used
- '-I,-O' are used internally and should not be provided
-seqslice -ss seqs_per_slice for CRAM compression [samtools default: 10000]
-scramble -sc DEPRECATED
-dupmode -d see "samtools markdup -m" [t]
Targeted processing:
-process -p Only process this step then exit, optionally set -index
bwamem - only applicable if input is bam
mark - Run duplicate marking (-index N/A)
stats - Generates the *.bas file for the final BAM.
-process -p Only process this step then exit
setup - only applicable if input is bam
mark - Run duplicate marking
stats - Generates the *.bas file for the final BAM
Other:
-help -h Brief help message.
@@ -261,6 +267,8 @@ =head2 OPTIONAL parameters
=item B<-scramble>
DEPRECATED - see -seqslice
Single quoted string of parameters to pass to Scramble when '-c' used. Please see the Scramble
documentation for details.
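
For orientation on the `-seqslice` replacement: `seqs_per_slice` is a CRAM format option that samtools accepts through `--output-fmt-option`. The sketch below only prints such a command. The helper name `cram_cmd` and the file names are hypothetical, and how PCAP-core passes `-seqslice` through internally is not shown in this diff.

```shell
#!/usr/bin/env bash
# Sketch only: prints a samtools CRAM conversion command with an explicit
# seqs_per_slice value. cram_cmd and the file names are hypothetical.
cram_cmd() {
  local in_bam=$1 ref=$2 out_cram=$3 seqslice=${4:-10000}
  printf 'samtools view -C -T %s --output-fmt-option seqs_per_slice=%s -o %s %s\n' \
    "$ref" "$seqslice" "$out_cram" "$in_bam"
}

cram_cmd sample.bam genome.fa sample.cram 1000
```

Smaller slice sizes generally give finer-grained random access at some cost in compression, which is presumably why the option is exposed; the default of 10000 matches the value stated in the usage text.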