Commit: adding datasets version 16.35.0 (#1103)
* adding datasets version 16.35.0
* Update README.md

Fix merge issues

---------

Co-authored-by: Kutluhan Incekara <[email protected]>
Showing 3 changed files with 67 additions and 1 deletion.
Dockerfile (new file, hunk `@@ -0,0 +1,46 @@`):

```dockerfile
FROM ubuntu:jammy as app

ARG DATASETS_VER="16.35.0"

LABEL base.image="ubuntu:jammy"
LABEL dockerfile.version="1"
LABEL software="NCBI's datasets and dataformat"
LABEL software.version="${DATASETS_VER}"
LABEL description="Downloads biological sequence data from NCBI"
LABEL website="https://www.ncbi.nlm.nih.gov/datasets/docs/v1/"
LABEL license="https://github.com/ncbi/datasets/blob/master/pkgs/ncbi-datasets-cli/LICENSE.md"
LABEL maintainer="Erin Young"
LABEL maintainer.email="[email protected]"

# unzip isn't needed for datasets/dataformat, but it is often used after downloading files with datasets
RUN apt-get update && apt-get install -y --no-install-recommends \
  wget \
  ca-certificates \
  unzip && \
  apt-get autoclean && rm -rf /var/lib/apt/lists/*

WORKDIR /usr/local/bin

# install ncbi datasets tool (pre-compiled binary)
RUN wget https://github.com/ncbi/datasets/releases/download/v${DATASETS_VER}/linux-amd64.cli.package.zip && \
  unzip linux-amd64.cli.package.zip && \
  rm linux-amd64.cli.package.zip && \
  chmod +x dataformat datasets

ENV LC_ALL=C

WORKDIR /data

# datasets is generally datasets <subcommand> --help, but just typing in 'datasets' should bring up a help menu
CMD [ "datasets" ]

FROM app as test

RUN dataformat --help && datasets --help

# stolen from Curtis at https://github.com/StaPH-B/docker-builds/blob/master/pangolin/4.1.2/Dockerfile
RUN datasets download virus genome accession ON924087.1 --filename ON924087.1.zip && \
  unzip ON924087.1.zip && \
  rm ON924087.1.zip && \
  cp ncbi_dataset/data/genomic.fna ON924087.1.fna && \
  wc -c ON924087.1.fna
```
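The test stage ends with `wc -c ON924087.1.fna`, which only prints the file size: it fails if the file is missing, but an empty or non-FASTA download would still let the build pass. A stricter check could look like the sketch below. `check_fasta` is a hypothetical function, not part of the original Dockerfile, and it assumes the downloaded file is FASTA, so its first byte should be the `>` of a header line.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the original Dockerfile: return non-zero
# unless the file exists, is non-empty, and starts with a FASTA header ('>').
check_fasta() {
  local fasta="$1"
  # -s is true only for a file that exists and has a size greater than zero
  if [ ! -s "$fasta" ]; then
    echo "error: $fasta is missing or empty" >&2
    return 1
  fi
  # A FASTA file begins with the '>' of its first header line
  if [ "$(head -c 1 "$fasta")" != ">" ]; then
    echo "error: $fasta does not start with a FASTA header" >&2
    return 1
  fi
  echo "ok: $fasta"
}

# Example, run in the test stage's working directory:
# check_fasta ON924087.1.fna
```

Because `RUN` uses `/bin/sh` by default, the function would have to be saved as a script to be called from the test stage; the core of the check, `test -s ON924087.1.fna && head -c 1 ON924087.1.fna | grep -q '>'`, also works inline.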
README (new file, hunk `@@ -0,0 +1,20 @@`):

````markdown
# NCBI datasets and dataformat container

Main tools: [datasets](https://www.ncbi.nlm.nih.gov/datasets/docs/v1/download-and-install/#use-the-datasets-tool-to-download-biological-data) and [dataformat](https://www.ncbi.nlm.nih.gov/datasets/docs/v1/download-and-install/#use-the-dataformat-tool-to-convert-data-reports-to-other-formats)

Full documentation: [https://www.ncbi.nlm.nih.gov/datasets/docs/v1/how-tos/](https://www.ncbi.nlm.nih.gov/datasets/docs/v1/how-tos/)

> Use NCBI Datasets to gather metadata, download data packages, view reports and more

## Example Usage

```bash
# download the FASTA for ON924087.1 as a zip archive
datasets download virus genome accession ON924087.1 --filename ON924087.1.zip

# unzip the archive; the FASTA file will be at ncbi_dataset/data/genomic.fna
unzip ON924087.1.zip

# copy the file to a more descriptive name
cp ncbi_dataset/data/genomic.fna ncbi_dataset/data/ON924087.1.genomic.fna
```
````
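After the example steps, `ncbi_dataset/data/ON924087.1.genomic.fna` is an ordinary FASTA file. As a hedged sketch, a short awk script can report how many records it holds and the total sequence length, a quick way to confirm the download looks sensible. `fasta_summary` is a hypothetical name, not part of datasets or dataformat (dataformat converts JSON Lines data reports rather than FASTA), and the sketch assumes a plain, uncompressed FASTA file in which every header line starts with `>`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the datasets CLI: count the records in a
# FASTA file and the total number of sequence characters they contain.
fasta_summary() {
  awk '
    # Header lines start with ">" and mark a new record
    /^>/ { records++; next }
    # Sequence lines: drop spaces, tabs, and carriage returns before counting
    { gsub(/[ \t\r]/, ""); bases += length($0) }
    END { printf "records=%d bases=%d\n", records, bases }
  ' "$1"
}

# Example:
# fasta_summary ncbi_dataset/data/ON924087.1.genomic.fna
```

The function prints a single `records=N bases=M` line, which is easy to compare against the expected genome length or to log alongside the accession.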