Enable GCS, S3, and libdeflate support for bcftools #1019

Merged (6 commits) on Aug 21, 2024

Contributor

@pettyalex pettyalex commented Jul 9, 2024

Enable AWS S3, GCS, and libdeflate support for bcftools by running ./configure before compiling.

This fixes #1018

If you want to merge this: I don't see a way to assign a new build number to an already published package, but I'd be glad to update it if such a mechanism exists.

I'd also be glad to add tests that test reading from AWS S3 or GCS storage directly to validate that these features are working.
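One quick way to see whether a given bcftools binary was linked against the optional libraries is to inspect its dynamic dependencies. The sketch below is only an illustration: `check_linked_libs` is a hypothetical helper name, and it assumes the binary links libdeflate and libcurl dynamically so that they show up in `ldd` output.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): report which optional libraries
# appear in ldd-style output read from stdin.
# Prints one "name: linked" or "name: missing" line per library.
check_linked_libs() {
  ldd_output=$(cat)
  for lib in libdeflate libcurl; do
    if printf '%s\n' "$ldd_output" | grep -q "$lib"; then
      echo "$lib: linked"
    else
      echo "$lib: missing"
    fi
  done
}

# Usage inside the image (assumes bcftools is on PATH):
#   ldd "$(command -v bcftools)" | check_linked_libs
```

A "missing" line only means the library is not dynamically linked; it does not by itself prove the feature is absent, so a functional test against real remote data is still the stronger check.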

Pull Request (PR) checklist:

  • Include a description of what is in this pull request in this message.
  • The dockerfile successfully builds to a test target for the user creating the PR. (e.g. docker build --tag samtools:1.15test --target test docker-builds/samtools/1.15)
  • Directory structure as name of the tool in lower case with special characters removed with a subdirectory of the version number (e.g. spades/3.12.0/Dockerfile)
    • (optional) All test files are located in same directory as the Dockerfile (e.g. shigatyper/2.0.1/test.sh)
  • Create a simple container-specific README.md in the same directory as the Dockerfile (e.g. spades/3.12.0/README.md)
    • If this README is longer than 30 lines, there is an explanation as to why more detail was needed
  • Dockerfile includes the recommended LABELS
  • Main README.md has been updated to include the tool and/or version of the dockerfile(s) in this PR
  • Program_Licenses.md contains the tool(s) used in this PR and has been updated for any missing
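The build check in the list above follows a fixed pattern, so it can be generated per tool and version. This is a minimal sketch: `print_build_test_cmd` is a hypothetical helper, and it assumes the `docker-builds/<tool>/<version>` layout and the `<version>test` tag suffix shown in the checklist's samtools example.

```shell
#!/bin/sh
# Hypothetical helper: print the test-target build command for a tool/version,
# following the pattern in the checklist example. It only prints the command;
# it does not run docker.
print_build_test_cmd() {
  tool=$1
  ver=$2
  echo "docker build --tag ${tool}:${ver}test --target test docker-builds/${tool}/${ver}"
}

# Usage:
#   print_build_test_cmd bcftools 1.20.c
#   eval "$(print_build_test_cmd bcftools 1.20.c)"   # to actually build
```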

Comment on lines 66 to 67
liblzma-dev \
libcurl4-gnutls-dev \
Contributor Author

Do these actually need to be -dev versions with headers in the runtime container? Do plugins somehow compile/link against them at runtime or can they just be liblzma libcurl4-gnutls?

Collaborator

Not sure about compilation & linking, but I think the correct runtime package name for Ubuntu Jammy is liblzma5: https://packages.ubuntu.com/search?suite=jammy&section=all&arch=any&keywords=liblzma&searchon=names

and libcurl4: https://packages.ubuntu.com/search?suite=jammy&section=all&arch=any&keywords=libcurl4&searchon=names

I believe when the samtools/bcftools/htslib dockerfiles were written we were following these instructions: https://github.com/samtools/samtools/blob/972c1889942a4f07d8f62e93330f723da919c271/INSTALL#L220

@kapsakcj
Collaborator

kapsakcj commented Jul 9, 2024

Could you please mark this PR as draft? The dockerfile doesn't build successfully yet (according to the GH Actions log) and I think it will require some edits prior to review from our team.

We would love to have additional tests for these features built into the dockerfile, preferably in the test stage of the dockerfile.

One last thought: it may be good to update the samtools and htslib dockerfiles as well, since I imagine they are also missing these features (I have not checked, though, so don't quote me). This can be done as part of this PR or separately.

@pettyalex pettyalex marked this pull request as draft July 10, 2024 01:30
pettyalex added 2 commits July 9, 2024 20:32
…dependencies at runtime. Update htslib to configure before building and link against libdeflate.
@Kincekara
Collaborator

@pettyalex Thank you for raising this issue and making a pull request. GCS/S3 and libdeflate support are important features that we missed while building version 1.20.
As a general principle, we avoid overwriting images we have already published because we don't want to break people's pipelines and validations. Another common practice here is "one tool, one PR": it is very easy to miss something in a crowded pull request. I personally check the build logs, in addition to the tests at the end, to catch silent errors.
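Checking build logs for silent errors can be partly automated. The following is a rough heuristic sketch, not a method used by this repository: `scan_build_log` is a hypothetical helper, the keyword list is an assumption, and `grep -w` (whole-word matching, available in GNU and BSD grep) is used so that words such as "strerror" do not trigger a match.

```shell
#!/bin/sh
# Hypothetical helper: scan a build log on stdin for suspicious whole words.
# Prints matching lines with their line numbers.
# Returns 0 if nothing suspicious was found, 1 otherwise.
scan_build_log() {
  if grep -n -i -w -E 'warning|error|not found|disabled'; then
    return 1
  fi
  return 0
}

# Usage (assumes the build output was saved to build.log):
#   docker build --progress=plain . 2>&1 | tee build.log
#   scan_build_log < build.log || echo "review the lines above"
```

A keyword scan like this will produce false positives and can miss problems, so it complements rather than replaces reading the configure summary and running functional tests.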

So, I will request a few changes from you:

  • Please create another folder and version (1.20.c) for bcftools. You can create separate PRs for samtools and htslib with similar naming, or just complete bcftools for now.

  • I have revisited the bcftools GitHub repository and rechecked the installation notes. I think this is a good chance to enable other features too. Please see my changes below.

  • Lastly and optionally, you can add your email and name to maintainer labels.

Any further tests, recommendations, and feedback will be appreciated.
Thank you,

# for easy upgrade later. ARG variables only persist during build time
ARG BCFTOOLS_VER="1.20"

FROM ubuntu:jammy as builder

# re-instantiate variable
ARG BCFTOOLS_VER

# install dependencies, cleanup apt garbage
RUN apt-get update && apt-get install --no-install-recommends -y \
  wget \
  ca-certificates \
  perl \
  bzip2 \
  autoconf \
  automake \
  make \
  gcc \
  zlib1g-dev \
  libbz2-dev \
  liblzma-dev \
  libcurl4-gnutls-dev \
  libssl-dev \
  libperl-dev \
  libgsl0-dev \
  libdeflate-dev \
  procps && \
  rm -rf /var/lib/apt/lists/* && apt-get autoclean


# download, compile, and install bcftools
RUN wget https://github.com/samtools/bcftools/releases/download/${BCFTOOLS_VER}/bcftools-${BCFTOOLS_VER}.tar.bz2 && \
  tar -xjf bcftools-${BCFTOOLS_VER}.tar.bz2 && \
  rm -v bcftools-${BCFTOOLS_VER}.tar.bz2 && \
  cd bcftools-${BCFTOOLS_VER} && \
  ./configure --enable-libgsl --enable-perl-filters &&\
  make && \
  make install && \
  make test 

### start of app stage ###
FROM ubuntu:jammy as app

# re-instantiate variable
ARG BCFTOOLS_VER

# putting the labels in
LABEL base.image="ubuntu:jammy"
LABEL dockerfile.version="1"
LABEL software="bcftools"
LABEL software.version="${BCFTOOLS_VER}"
LABEL description="Variant calling and manipulating files in the Variant Call Format (VCF) and its binary counterpart BCF"
LABEL website="https://github.com/samtools/bcftools"
LABEL license="https://github.com/samtools/bcftools/blob/develop/LICENSE"
LABEL maintainer="Erin Young"
LABEL maintainer.email="[email protected]"
LABEL maintainer2="Curtis Kapsak"
LABEL maintainer2.email="[email protected]"

# install dependencies required for running bcftools
# https://github.com/samtools/bcftools/blob/develop/INSTALL#L29
RUN apt-get update && apt-get install --no-install-recommends -y \
    perl \
    zlib1g \
    gsl-bin \
    bzip2 \
    liblzma5 \
    libcurl4-gnutls-dev \
    libdeflate0 \
    procps \
    && apt-get autoclean && rm -rf /var/lib/apt/lists/*

# copy in bcftools executables from builder stage
COPY --from=builder /usr/local/bin/* /usr/local/bin/
# copy in bcftools plugins from builder stage
COPY --from=builder /usr/local/libexec/bcftools/* /usr/local/libexec/bcftools/

# set locale settings for singularity compatibility
ENV LC_ALL=C

# set final working directory
WORKDIR /data

# default command is to pull up help options
CMD ["bcftools", "--help"]

### start of test stage ###
FROM app as test

# running --help and listing plugins
RUN bcftools --help && bcftools plugin -lv

# install wget (for downloading test files) and vcftools
RUN apt-get update && apt-get install -y wget vcftools

RUN echo "downloading test SC2 BAM and FASTA and running bcftools mpileup and bcftools call test commands..." && \
  wget -q https://raw.githubusercontent.com/artic-network/artic-ncov2019/master/primer_schemes/nCoV-2019/V4/SARS-CoV-2.reference.fasta && \
  wget -q https://raw.githubusercontent.com/StaPH-B/docker-builds/master/tests/SARS-CoV-2/SRR13957123.primertrim.sorted.bam && \
  bcftools mpileup -A -d 200 -B -Q 0 -f SARS-CoV-2.reference.fasta SRR13957123.primertrim.sorted.bam | \
  bcftools call -mv -Ov -o SRR13957123.vcf
  
RUN echo "testing plugins..." && \
  bcftools +counts SRR13957123.vcf

RUN echo "testing polysomy..." && \
  wget https://samtools.github.io/bcftools/howtos/cnv-calling/usage-example.tgz &&\
  tar -xvf usage-example.tgz &&\
  zcat test.fcr.gz | ./fcr-to-vcf -b bcftools -a map.tab.gz -o outdir/ &&\
  bcftools cnv -o cnv/ outdir/test.vcf.gz &&\
  bcftools polysomy -o psmy/ outdir/test.vcf.gz &&\
  head psmy/dist.dat

RUN echo "reading test data from Google Cloud to validate GCS support" && \
  bcftools head -h 20 gs://genomics-public-data/references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz

RUN echo "reading test data from S3 to validate AWS support" && \
 bcftools head -h 20 s3://human-pangenomics/T2T/CHM13/assemblies/variants/GATK_CHM13v2.0_Resource_Bundle/resources-broad-hg38-v0-1000G_phase1.snps.high_confidence.hg38.t2t-chm13-v2.0.vcf.gz
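The two remote-read checks at the end of the test stage can also be run by hand against a built image. Below is a minimal sketch: `check_remote_urls` and the `BCFTOOLS` override variable are hypothetical names, and it assumes the container has network access and that the target buckets allow anonymous reads.

```shell
#!/bin/sh
# Hypothetical helper: try to read the header of each remote VCF with
# "bcftools head" and report per-URL success or failure.
# BCFTOOLS may be overridden to point at a different binary.
BCFTOOLS=${BCFTOOLS:-bcftools}

check_remote_urls() {
  failures=0
  for url in "$@"; do
    if "$BCFTOOLS" head -h 20 "$url" >/dev/null 2>&1; then
      echo "ok   $url"
    else
      echo "FAIL $url"
      failures=$((failures + 1))
    fi
  done
  # The return status is the number of URLs that could not be read.
  return "$failures"
}

# Usage (URL taken from the test stage above):
#   check_remote_urls gs://genomics-public-data/references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz
```

Because these checks depend on external buckets staying public, a failure can reflect a network or bucket change rather than a missing feature.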

@pettyalex
Contributor Author

Thank you for the feedback!

About libcurl4-gnutls-dev vs libcurl3-gnutls: https://askubuntu.com/questions/469360/what-is-the-difference-between-libcurl3-and-libcurl4

libcurl3 is ABI-compatible with libcurl4, so the package name of the compiled runtime library was never incremented. That means libcurl3-gnutls is the correct runtime library for libcurl4-gnutls-dev, and if you look at the libcurl4-gnutls-dev package, it indeed pulls in libcurl3-gnutls.
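The relationship between a -dev package and its runtime library can be checked from the package metadata. This is a small sketch: `list_depends` is a hypothetical helper that filters the output of `apt-cache depends`, and it assumes it is run on an Ubuntu Jammy system with package lists already fetched via `apt-get update`.

```shell
#!/bin/sh
# Hypothetical helper: print the package names from the "Depends:" lines of
# `apt-cache depends` output read on stdin.
list_depends() {
  awk '$1 == "Depends:" { print $2 }'
}

# Usage on Ubuntu Jammy (after apt-get update):
#   apt-cache depends libcurl4-gnutls-dev | list_depends
```

Whatever runtime package appears in that list is the one to install in the app stage instead of the -dev package.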

@pettyalex pettyalex marked this pull request as ready for review July 18, 2024 20:16
@Kincekara
Collaborator

@pettyalex Thank you very much for the changes. This looks great!

I need one minor change, as you can see in the checklist: please add <li>[1.20.c](./bcftools/1.20.c/)</li> to line 120 of the main README.md, as shown below. If you enable "Allow edits from maintainers", I can make any further cosmetic changes myself if necessary.
I will merge and deploy this image. Thanks!

Before:

| [bcftools](https://hub.docker.com/r/staphb/bcftools/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bcftools)](https://hub.docker.com/r/staphb/bcftools) | <ul><li>[1.10.2](./bcftools/1.10.2/)</li><li>[1.11](./bcftools/1.11/)</li><li>[1.12](./bcftools/1.12/)</li><li>[1.13](./bcftools/1.13/)</li><li>[1.14](./bcftools/1.14/)</li><li>[1.15](./bcftools/1.15/)</li><li>[1.16](./bcftools/1.16/)</li><li>[1.17](./bcftools/1.17/)</li><li>[1.18](bcftools/1.18/)</li><li>[1.19](./bcftools/1.19/)</li><li>[1.20](./bcftools/1.20/)</li></ul> | https://github.com/samtools/bcftools |

After:

| [bcftools](https://hub.docker.com/r/staphb/bcftools/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/bcftools)](https://hub.docker.com/r/staphb/bcftools) | <ul><li>[1.10.2](./bcftools/1.10.2/)</li><li>[1.11](./bcftools/1.11/)</li><li>[1.12](./bcftools/1.12/)</li><li>[1.13](./bcftools/1.13/)</li><li>[1.14](./bcftools/1.14/)</li><li>[1.15](./bcftools/1.15/)</li><li>[1.16](./bcftools/1.16/)</li><li>[1.17](./bcftools/1.17/)</li><li>[1.18](bcftools/1.18/)</li><li>[1.19](./bcftools/1.19/)</li><li>[1.20](./bcftools/1.20/)</li><li>[1.20.c](./bcftools/1.20.c/)</li></ul> | https://github.com/samtools/bcftools | 
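The Before/After change above amounts to inserting one list item before the closing `</ul>` of the tool's row. A minimal sketch of that edit as a filter follows: `add_version_link` is a hypothetical helper, and it assumes each tool's row sits on a single line that contains `staphb/<tool>/` and exactly one `</ul>`.

```shell
#!/bin/sh
# Hypothetical helper: read README table rows on stdin and append a version
# link to the row for the given tool, just before its closing </ul>.
# Rows for other tools pass through unchanged.
add_version_link() {
  tool=$1
  ver=$2
  sed "/staphb\/${tool}\//s|</ul>|<li>[${ver}](./${tool}/${ver}/)</li></ul>|"
}

# Usage (writes to a new file rather than editing in place):
#   add_version_link bcftools 1.20.c < README.md > README.md.new
```

Writing to a separate file makes it easy to diff the result against the original before replacing it.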

@Kincekara Kincekara merged commit 55ee7e6 into StaPH-B:master Aug 21, 2024
2 checks passed
@Kincekara
Collaborator

@pettyalex Thank you for your contribution!
You can check the image deployment from here: https://github.com/StaPH-B/docker-builds/actions/runs/10493960570.
The image will be available on both Dockerhub and Quay.io.

Successfully merging this pull request may close these issues.

[Bug]: bcftools is built without GCS, S3, or libdeflate support