Update docs #100

Merged
merged 14 commits into from
Nov 21, 2024
28 changes: 27 additions & 1 deletion .dockerignore
100644 → 100755
Original file line number Diff line number Diff line change
@@ -1,3 +1,29 @@
/shiny/*
/vignettes/*
/docs/*

# ignore file extensions
*.DS_Store
*.Rmd
*.html
*.md
*.Rhistory
*.gitignore
*.Rbuildignore
*.Rproj
DESCRIPTION

# ignore folders
**.git
**.github
**.Rproj.user
*docs
*vignettes
*man
*inst
*pkgdown

# ignore specific files
docker-compose.yml
docker-compose-*.yml
_pkgdown.yml
NAMESPACE
23 changes: 23 additions & 0 deletions .github/ISSUE_TEMPLATE/support-my-organism.md
@@ -0,0 +1,23 @@
---
name: Support my organism
about: I want SeqSender to support my organism.
title: "[ORGANISM : FEATURE / BUG]"
labels: help wanted
assignees: ''

---

**Have you attempted to upload your organism using SeqSender? If so, please list the organism affected. If not, please attempt using SeqSender with it first, as SeqSender currently supports a wide variety of organisms, databases, and submission options.**
Be sure to check the Submission Wizard in the documentation for all the available customizations for submitting your samples to repositories.

**Which databases are you uploading to? Are all of them affected? If not, list which ones are affected:**
BIOSAMPLE/SRA/GENBANK/GISAID

**Is the problem related to a metadata field, an additional file, available submission options, or the submission process itself?**
If the field is only an attribute, it can be added to any database even if not validated by SeqSender, by simply adding the correct column name with the database prefix.

**If possible, describe the solution you'd like to see, along with any additional details.**
A clear and concise description of what you think needs to change or be made available to resolve your issue.

**Error Logs**
Add any other context, logs, or screenshots related to the issue here.
52 changes: 52 additions & 0 deletions .github/workflows/DH_GHCR_upload.yml
@@ -0,0 +1,52 @@
name: Create and publish docker image to DockerHub and GitHub Container Repository

on:
  release:
    types: [published]

jobs:
  push_to_registry:
    runs-on: ubuntu-latest
    permissions:
      packages: write
      contents: read
      attestations: write
      id-token: write
    steps:
      - name: Check out the repo
        uses: actions/checkout@v4

      - name: Log into GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: |
            cdcgov/seqsender
            ghcr.io/${{ github.repository }}

      - name: Build and push Docker image
        id: push
        uses: docker/build-push-action@v6
        with:
          context: .
          file: ./Dockerfile
          push: true
          tags: |
            cdcgov/seqsender:${{ github.ref_name }}
            cdcgov/seqsender:latest
            ghcr.io/cdcgov/seqsender:${{ github.ref_name }}
            ghcr.io/cdcgov/seqsender:latest
          labels: "Genomic sequence pipeline to automate the process of generating necessary submission files and batch uploading them to public databases."
42 changes: 0 additions & 42 deletions .github/workflows/GHCR_docker.yml

This file was deleted.

3 changes: 1 addition & 2 deletions .gitignore
@@ -22,5 +22,4 @@ docker-compose-*.yaml
**/.Rproj.user
**/test_data/*
**/gisaid_cli/*
**/COV_TEST_DATA/*
**/FLU_TEST_DATA/*
**/*_TEST_DATA/*
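
Review note: the two organism-specific ignore rules collapse into one wildcard rule here. The sketch below approximates that wildcard with Python's `fnmatch` to show which directory names it would cover. It is an illustration only: `fnmatch` does not implement the full gitignore pattern rules, and `is_test_data_dir` and the `RSV_TEST_DATA` name are hypothetical and do not come from the repository.

```python
from fnmatch import fnmatchcase

# Directory-name part of the new ignore rule `**/*_TEST_DATA/*`.
PATTERN = "*_TEST_DATA"


def is_test_data_dir(dirname: str) -> bool:
    """Return True if a directory name matches the wildcard (case-sensitive)."""
    return fnmatchcase(dirname, PATTERN)


if __name__ == "__main__":
    # The first two names are the directories listed by the removed rules;
    # the third is a made-up example of a new organism folder.
    for name in ["COV_TEST_DATA", "FLU_TEST_DATA", "RSV_TEST_DATA", "test_data"]:
        print(name, is_test_data_dir(name))
```

Under this approximation, any future `<ORGANISM>_TEST_DATA` folder would be ignored without another `.gitignore` change, while the lowercase `test_data` folder still relies on its own existing rule.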
194 changes: 92 additions & 102 deletions Dockerfile
@@ -1,102 +1,92 @@
# Create an argument to pull a particular version of micromamba image
ARG micromamba_version
ARG micromamba_version=${micromamba_version:-1.5.3}

############# base image ##################
FROM --platform=$BUILDPLATFORM ubuntu:focal as base

# local apt mirror support
# start every stage with updated apt sources
ARG APT_MIRROR_NAME=
RUN if [ -n "$APT_MIRROR_NAME" ]; then sed -i.bak -E '/security/! s^https?://.+?/(debian|ubuntu)^http://'"$APT_MIRROR_NAME"'/\1^' /etc/apt/sources.list && grep '^deb' /etc/apt/sources.list; fi
RUN apt-get update --allow-releaseinfo-change --fix-missing

############# micromamba image ##################

FROM --platform=$BUILDPLATFORM mambaorg/micromamba:${micromamba_version} as micromamba
RUN echo "Getting micromamba image"

############# Build Stage: Final ##################

# Build the final image
FROM base as final

# if image defaults to a non-root user, then we may want to make the
# next 3 ARG commands match the values in our image.
ENV MAMBA_USER=$MAMBA_USER
ENV MAMBA_USER_ID=$MAMBA_USER_ID
ENV MAMBA_USER_GID=$MAMBA_USER_GID
ENV MAMBA_ROOT_PREFIX="/opt/conda"
ENV MAMBA_EXE="/bin/micromamba"

COPY --from=micromamba "$MAMBA_EXE" "$MAMBA_EXE"
COPY --from=micromamba /usr/local/bin/_activate_current_env.sh /usr/local/bin/_activate_current_env.sh
COPY --from=micromamba /usr/local/bin/_dockerfile_shell.sh /usr/local/bin/_dockerfile_shell.sh
COPY --from=micromamba /usr/local/bin/_entrypoint.sh /usr/local/bin/_entrypoint.sh
COPY --from=micromamba /usr/local/bin/_dockerfile_initialize_user_accounts.sh /usr/local/bin/_dockerfile_initialize_user_accounts.sh
COPY --from=micromamba /usr/local/bin/_dockerfile_setup_root_prefix.sh /usr/local/bin/_dockerfile_setup_root_prefix.sh

# Install system dependencies
ARG DEBIAN_FRONTEND=noninteractive

# Install system libraries of general use
RUN apt-get update --allow-releaseinfo-change --fix-missing \
&& apt-get install --no-install-recommends -y \
dos2unix \
ca-certificates \
&& apt clean autoclean \
&& apt autoremove --yes \
&& rm -rf /var/lib/{apt,dpkg,cache,log}/

# Create working directory
ENV WORKDIR=/data

# Set up volume directory
VOLUME ${WORKDIR}

# Set up working directory
WORKDIR ${WORKDIR}

# Allow permission to read and write to working directory
RUN chmod -R a+rwx ${WORKDIR}

# Create a program variable
ENV PROJECT_DIR=/seqsender

# Set up a volume directory
VOLUME ${PROJECT_DIR}

# Copy all files to project directory
COPY . ${PROJECT_DIR}

############ Set-up micromamba environment ##################

# Copy requirement files to program directory
COPY env.yaml "${PROJECT_DIR}/env.yaml"

# Set up environments
RUN micromamba install --yes --name base -f "${PROJECT_DIR}/env.yaml" \
&& micromamba clean --all --yes

############# Launch PROGRAM ##################

# Copy bash script to run PROGRAM to docker image
COPY seqsender-kickoff "${PROJECT_DIR}/seqsender-kickoff"

# Convert bash script from Windows style line endings to Unix-like control characters
RUN dos2unix "${PROJECT_DIR}/seqsender-kickoff"

# Allow permission to execute the bash script
RUN chmod a+x "${PROJECT_DIR}/seqsender-kickoff"

# Allow permission to read and write to program directory
RUN chmod a+rwx ${PROJECT_DIR}

# Export bash script to path
ENV PATH="$PATH:${PROJECT_DIR}"

# Activate conda environment
ENV PATH="$PATH:${MAMBA_ROOT_PREFIX}/bin"

# Execute the pipeline
ENTRYPOINT ["tail", "-f", "/dev/null"]
# Create an argument to pull a particular version of micromamba image
ARG micromamba_version
ARG micromamba_version=${micromamba_version:-1.5.3}

############# micromamba image ##################

FROM mambaorg/micromamba:${micromamba_version} as micromamba
RUN echo "Getting micromamba image"

############# base image ##################

FROM ubuntu:focal as base

# if image defaults to a non-root user, then we may want to make the
# next 3 ARG commands match the values in our image.
ENV MAMBA_USER=$MAMBA_USER
ENV MAMBA_USER_ID=$MAMBA_USER_ID
ENV MAMBA_USER_GID=$MAMBA_USER_GID
ENV MAMBA_ROOT_PREFIX="/opt/conda"
ENV MAMBA_EXE="/bin/micromamba"

COPY --from=micromamba "$MAMBA_EXE" "$MAMBA_EXE"
COPY --from=micromamba /usr/local/bin/_activate_current_env.sh /usr/local/bin/_activate_current_env.sh
COPY --from=micromamba /usr/local/bin/_dockerfile_shell.sh /usr/local/bin/_dockerfile_shell.sh
COPY --from=micromamba /usr/local/bin/_entrypoint.sh /usr/local/bin/_entrypoint.sh
COPY --from=micromamba /usr/local/bin/_dockerfile_initialize_user_accounts.sh /usr/local/bin/_dockerfile_initialize_user_accounts.sh
COPY --from=micromamba /usr/local/bin/_dockerfile_setup_root_prefix.sh /usr/local/bin/_dockerfile_setup_root_prefix.sh

# Install system dependencies
ARG DEBIAN_FRONTEND=noninteractive

# local apt mirror support
# start every stage with updated apt sources
ARG APT_MIRROR_NAME=
RUN if [ -n "$APT_MIRROR_NAME" ]; then sed -i.bak -E '/security/! s^https?://.+?/(debian|ubuntu)^http://'"$APT_MIRROR_NAME"'/\1^' /etc/apt/sources.list && grep '^deb' /etc/apt/sources.list; fi
RUN apt-get update --allow-releaseinfo-change --fix-missing \
&& apt-get install --no-install-recommends -y \
dos2unix \
ca-certificates \
&& apt clean autoclean \
&& apt autoremove --yes \
&& rm -rf /var/lib/{apt,dpkg,cache,log}/

# Create working directory
ENV WORKDIR=/data

# Set up volume directory
VOLUME ${WORKDIR}

# Set up working directory
WORKDIR ${WORKDIR}

# Allow permission to read and write to working directory
RUN chmod -R a+rwx ${WORKDIR}

# Create a program variable
ENV PROJECT_DIR=/seqsender

# Set up a volume directory
VOLUME ${PROJECT_DIR}

# Copy all files to project directory
COPY . ${PROJECT_DIR}

############ Set-up micromamba environment ##################

# Copy requirement files to program directory
COPY env.yaml "${PROJECT_DIR}/env.yaml"

# Set up environments
RUN micromamba install --yes --name base -f "${PROJECT_DIR}/env.yaml" \
&& micromamba clean --all --yes

############# Launch PROGRAM ##################

# Copy bash script to run PROGRAM to docker image
COPY seqsender-kickoff "${PROJECT_DIR}/seqsender-kickoff"

# Convert bash script from Windows style line endings to Unix-like control characters
RUN dos2unix "${PROJECT_DIR}/seqsender-kickoff"

# Allow permission to execute the bash script
RUN chmod a+x "${PROJECT_DIR}/seqsender-kickoff"

# Allow permission to read and write to program directory
RUN chmod a+rwx ${PROJECT_DIR}

# Export bash script to path
ENV PATH="$PATH:${PROJECT_DIR}"

# Activate conda environment
ENV PATH="$PATH:${MAMBA_ROOT_PREFIX}/bin"
2 changes: 1 addition & 1 deletion README.Rmd
@@ -26,7 +26,7 @@ github_pages_url <- description$GITHUB_PAGES

<p style="font-size: 16px;"><em>Public Database Submission Pipeline</em></p>

**Beta Version**: v1.2.1. This pipeline is currently in Beta testing, and issues could appear during submission. Please use it at your own risk. Feedback and suggestions are welcome!
**Beta Version**: v1.2.5. This pipeline is currently in Beta testing, and issues could appear during submission. Please use it at your own risk. Feedback and suggestions are welcome!

**General Disclaimer**: This repository was created for use by CDC programs to collaborate on public health related projects in support of the [CDC mission](https://www.cdc.gov/about/organization/mission.htm). GitHub is not hosted by the CDC, but is a third party website used by CDC and its partners to share information and collaborate on software. CDC use of GitHub does not imply an endorsement of any one particular service, product, or enterprise.

2 changes: 1 addition & 1 deletion README.md
@@ -11,7 +11,7 @@

</p>

**Beta Version**: 1.2.1. This pipeline is currently in Beta testing, and
**Beta Version**: 1.2.5. This pipeline is currently in Beta testing, and
issues could appear during submission. Please use it at your own risk.
Feedback and suggestions are welcome\!

12 changes: 11 additions & 1 deletion biosample_sra_handler.py
@@ -244,7 +244,15 @@ def process_biosample_sra_report(report_file: str, database: str, submission_dir
     if "Action" not in report_dict["SubmissionStatus"]:
         return submission_status, submission_id
     try:
-        for action_dict in report_dict["SubmissionStatus"]["Action"]:
+        # If only a single sample, convert into list for proper formatting
+        if isinstance(report_dict["SubmissionStatus"]["Action"], list):
+            action_list = report_dict["SubmissionStatus"]["Action"]
+        elif isinstance(report_dict["SubmissionStatus"]["Action"], dict):
+            action_list = [report_dict["SubmissionStatus"]["Action"]]
+        else:
+            print(f"Error: Unable to correctly process BioSample report at: {report_file}", file=sys.stderr)
+            return submission_status, submission_id
+        for action_dict in action_list:
             # Skip if incorrect database
             if "@target_db" not in action_dict or action_dict["@target_db"].lower() != database.lower():
                 continue
@@ -271,6 +279,8 @@ def process_biosample_sra_report(report_file: str, database: str, submission_dir
sample_info.append({sample_name_col:sample_name, f"{column_prefix}_status":action_dict["@status"], f"{column_prefix}_accession":accession, f"{column_prefix}_message":""})
except:
pass
if submission_status == "PROCESSED" and not sample_info:
print(f"Error: Unable to process {database} report.xml to retrieve accessions at: {report_file}", file=sys.stderr)
if sample_info:
update_df = pd.DataFrame(sample_info)
upload_log.update_submission_status_csv(submission_dir=submission_dir, update_database=database, update_df=update_df)
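
Review note: the first hunk of this file handles a report whose "Action" entry is a single mapping instead of a list, which per the added comment happens when only one sample is present. A minimal standalone sketch of that normalization follows. The helper name `as_action_list` and the sample values are hypothetical and not part of SeqSender, and the sketch returns `None` for an unexpected shape rather than printing an error as the diff does.

```python
from typing import Any, Dict, List, Optional


def as_action_list(action: Any) -> Optional[List[Dict[str, Any]]]:
    """Normalize a parsed "Action" entry into a list of dicts.

    Returns None when the value is neither a list nor a dict, so the
    caller can report the problem and stop processing.
    """
    if isinstance(action, list):
        return action
    if isinstance(action, dict):
        # A single sample arrives as a bare dict; wrap it so the
        # caller can loop over it like the multi-sample case.
        return [action]
    return None


if __name__ == "__main__":
    # Illustrative values only; real reports carry more keys.
    single = {"@target_db": "BioSample", "@status": "processed-ok"}
    multiple = [single, {"@target_db": "SRA", "@status": "processed-ok"}]

    for value in (single, multiple, "unexpected"):
        actions = as_action_list(value)
        if actions is None:
            print("could not interpret Action entry")
            continue
        for action_dict in actions:
            print(action_dict["@target_db"], action_dict["@status"])
```

Keeping the shape check in one place would let the loop body stay unchanged for both the single-sample and multi-sample cases.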