Merge branch 'master' of github.com:ctmrbio/stag-mwc
boulund committed Dec 6, 2022
2 parents c0d0d22 + eda858a commit 207f42a
Showing 50 changed files with 1,147 additions and 638 deletions.
105 changes: 0 additions & 105 deletions .circleci/config.yml

This file was deleted.

2 changes: 1 addition & 1 deletion .github/workflows/build_containers.yml
@@ -30,7 +30,7 @@ jobs:
steps:

- name: Check out code for the container builds
uses: actions/checkout@v2
uses: actions/checkout@v3

- name: Continue if Singularity Recipe exists
run: |
73 changes: 73 additions & 0 deletions .github/workflows/validate_syntax_and_dag.yml
@@ -0,0 +1,73 @@
name: Validate syntax and DAG
on:
  push:
    branches:
      - master
      - develop
  pull_request: [] # Do it on all PRs

jobs:
  validate-syntax-and-dag:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        docker_tag:
          - 'stable'
    container:
      image: snakemake/snakemake:${{ matrix.docker_tag }}

    name: Validate syntax and DAG
    steps:
      - name: Check out code
        uses: actions/checkout@v3

      - name: Create empty input files
        run: |
          ls
          mkdir -pv input
          touch input/test1_1.fq.gz input/test1_2.fq.gz
          touch input/test2_1.fq.gz input/test2_2.fq.gz
      - name: Create placeholder db files
        run: |
          ls
          mkdir -pv tmpdir
          mkdir -pv db/hg19
          mkdir -pv db/metaphlan
          touch db/hg19/taxo.k2d
          touch db/metaphlan/test.1.bt2
      - name: Modify config.yaml
        run: |
          ls
          sed -i 's/assess_depth: False/assess_depth: True/' config.yaml
          sed -i 's/sketch_compare: False/sketch_compare: True/' config.yaml
          sed -i 's/kaiju: False/kaiju: True/' config.yaml
          sed -i 's/kraken2: False/kraken2: True/' config.yaml
          sed -i 's/metaphlan: False/metaphlan: True/' config.yaml
          sed -i 's/humann: False/humann: True/' config.yaml
          sed -i 's/strainphlan: False/strainphlan: True/' config.yaml
          sed -i 's/groot: False/groot: True/' config.yaml
          sed -i 's/amrplusplus: False/amrplusplus: True/' config.yaml
          sed -i 's/assembly: False/assembly: True/' config.yaml
          sed -i 's/binning: False/binning: True/' config.yaml
          sed -i 's|db_path: \"\"|db_path: \"db/hg19\"|' config.yaml
          sed -i 's|db: \"\"|db: \"db\"|' config.yaml
          sed -i 's|bt2_db_dir: \"\"|bt2_db_dir: \"db/metaphlan\"|' config.yaml
          sed -i 's|bt2_index: \"\"|bt2_index: \"test\"|' config.yaml
          sed -i 's|_db: \"\"|_db: \"db\"|' config.yaml
          sed -i 's| index: \"\"| index: \"db\"|' config.yaml
          sed -i 's|kmer_distrib: \"\"|kmer_distrib: \"db\"|' config.yaml
          sed -i 's|tmpdir: \"/scratch\"|tmpdir: \"tmpdir\"|' config.yaml
          cat config.yaml
      - name: Run Snakemake
        run: |
          ls
          snakemake --dryrun
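The "Modify config.yaml" step of this workflow enables every optional module by plain text substitution before the dry run. A minimal Python sketch of the same idea follows, assuming the flags appear literally as `name: False` in the config text. The function name `enable_flags` and the sample config string are hypothetical and not part of StaG-mwc.

```python
import re
from typing import Iterable


def enable_flags(config_text: str, flags: Iterable[str]) -> str:
    """Flip `flag: False` to `flag: True` for each named flag.

    Like the unanchored sed patterns, this matches the text anywhere,
    so a flag name that is a suffix of another key would also match.
    Unlike sed without the `g` modifier, re.sub replaces every match.
    """
    for flag in flags:
        pattern = rf"{re.escape(flag)}: False"
        config_text = re.sub(pattern, f"{flag}: True", config_text)
    return config_text


# Hypothetical config excerpt, not the real config.yaml.
example_config = "kraken2: False\nmetaphlan: False\nkaiju: True\n"
enabled_config = enable_flags(example_config, ["kraken2", "metaphlan"])
print(enabled_config)
```

Editing the text rather than parsing the YAML keeps comments and key order in `config.yaml` untouched, which is presumably why the workflow uses `sed`.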
31 changes: 30 additions & 1 deletion CHANGELOG.md
@@ -14,14 +14,43 @@ committed to the master branch that does not trigger any of the aforementioned
situations.


## [0.5.1] Unreleased
## [0.5.1] 2022-12-06
### Added
- Produce Snakemake report in zip format instead of HTML due to the HTML report being
broken in the later versions of Snakemake.
- Add KrakenUniq as taxonomic profiler as an alternative with lower false
positive rate than Kraken2.
- Added samplesheet as alternative input file selection method. This also
enables providing custom sample names that are not based on patterns in input
filenames.
- Samplesheet can be used to specify remote input files from S3 or HTTP/HTTPS sources.
- Added `run_krona` setting for taxonomic profilers to make it possible to disable Krona
table and plot creation.

### Fixed
- Corrected typo in `host_removal` rule concerning `keep_kreport` config flag.
- Corrected typo in bowtie2 annotation counts output files leading to workflow
complaining about missing output files.
- Removed unintended stdout printouts from various helper scripts and some
MetaPhlAn related rules.
- Removed outdated mentions of MetaPhlAn2 in report.

### Changed
- Replaced CircleCI automatic testing workflow with one implemented with GitHub Actions.
- Updated MetaPhlAn to version 4.0.3.
- Updated HUMAnN to version 3.6.
- Modified area and MetaPhlAn heatmap plotting scripts to better deal
with MetaPhlAn 4 output formats.
- Updated the documentation to reflect recent changes in StaG.
- Updated KrakenTools to v1.2
- Updated `scripts/join_tables.py` to v1.1, which includes support for skipping
lines before the header.
- Improved automatic report generation code in main Snakefile to be more
robust. Now works well also when --use-singularity or --jobs are used
simultaneously with --report.

### Removed
- Removed old unmaintained DB download rules for groot, kaiju, kraken2.


## [0.5.0] 2021-11-18
18 changes: 10 additions & 8 deletions README.md
@@ -2,17 +2,16 @@

[![DOI](https://zenodo.org/badge/125840716.svg)](https://zenodo.org/badge/latestdoi/125840716)
[![Snakemake](https://img.shields.io/badge/snakemake-≥4.8.1-brightgreen.svg)](https://snakemake.bitbucket.io)
[![CircleCI](https://circleci.com/gh/ctmrbio/stag-mwc/tree/master.svg?style=svg)](https://circleci.com/gh/ctmrbio/stag-mwc/tree/master)

![StaG mwc logo](docs/source/img/stag_head_text.png "StaG mwc")

This repo contains the code for a Snakemake workflow of the StaG Metagenomic
Workflow Collaboration (mwc). Currently, the project focus is a barebones
metagenomics analysis workflow to produce primary output files from several
different metagenomic analysis tools.
This repo contains the StaG Metagenomic Workflow Collaboration (mwc) Snakemake
workflow. The project focuses on providing a metagenomics analysis workflow to
produce primary output files from several different metagenomic analysis tools.

Go to https://stag-mwc.readthedocs.org for the full documentation.


## Usage

### Step 0: Install conda and Snakemake
@@ -23,8 +22,9 @@ StaG-mwc. Most people would probably want to install
base environment. When running StaG with the `--use-conda` or
`--use-singularity` flags, all dependencies are managed automatically. If
using conda it will automatically install the required versions of all tools
required to run StaG-mwc. There is no need to combine the flags: the
Singularity images already contain all required dependencies.
required to run StaG-mwc. There is no need to combine the conda and singularity
flags: the Singularity images used by the workflow already contain all required
dependencies.

### Step 1: Clone workflow
To use StaG-mwc, you need a local copy of the workflow repository. Start by
@@ -40,7 +40,7 @@ cite the publications of the other tools used in your workflow.
Configure the workflow according to your needs by editing the file
`config.yaml`. The most common changes include setting the paths to input and
output folders, and configuring what steps of the workflow should be included
when running the workflow.
when running the workflow.

### Step 3: Execute workflow
Test your configuration by performing a dry-run via
@@ -65,6 +65,7 @@ Note that in all examples above, `--use-conda` can be replaced with
`--use-singularity` to run in Singularity containers instead of using a locally
installed conda. Read more about it under the Running section in the docs.


## Testing
A very basic continuous integration test is currently in place. It merely
validates the syntax by trying to let Snakemake build the dependency graph if
@@ -80,6 +81,7 @@ If you intend to modify or further develop this workflow, you are welcome to
fork this repository. Please consider sharing potential improvements via a pull
request.


## Citing
If you find StaG-mwc useful in your research, please cite the Zenodo DOI:
https://zenodo.org/badge/latestdoi/125840716
63 changes: 51 additions & 12 deletions Snakefile
@@ -8,18 +8,20 @@
# https://stag-mwc.readthedocs.org

from pathlib import Path
import copy
import subprocess
import textwrap

from snakemake.exceptions import WorkflowError
from snakemake.utils import min_version
min_version("5.5.4")

from rules.publications import publications
from scripts.common import UserMessages
from scripts.common import UserMessages, SampleSheet

user_messages = UserMessages()

stag_version = "0.5.0"
stag_version = "0.5.1"
singularity_branch_tag = "-master" # Replace with "-master" before publishing new version

configfile: "config.yaml"
@@ -33,7 +35,15 @@ TMPDIR = Path(config["tmpdir"])
DBDIR = Path(config["dbdir"])
all_outputs = []

SAMPLES = set(glob_wildcards(INPUTDIR/config["input_fn_pattern"]).sample)
if config["samplesheet"]:
    samplesheet = SampleSheet(config["samplesheet"], keep_local=config["keep_local"], endpoint_url=config["s3_endpoint_url"])
    SAMPLES = samplesheet.samples
    INPUT_read1 = lambda w: samplesheet.sample_info[w.sample]["read1"]
    INPUT_read2 = lambda w: samplesheet.sample_info[w.sample]["read2"]
else:
    SAMPLES = set(glob_wildcards(INPUTDIR/config["input_fn_pattern"]).sample)
    INPUT_read1 = INPUTDIR/config["input_fn_pattern"].format(sample="{sample}", readpair="1"),
    INPUT_read2 = INPUTDIR/config["input_fn_pattern"].format(sample="{sample}", readpair="2")

onstart:
print("\n".join([
@@ -46,9 +56,12 @@ onstart:
)

if len(SAMPLES) < 1:
    raise WorkflowError("Found no samples! Check input file pattern and path in config.yaml")
    raise WorkflowError("Found no samples! Check input file options in config.yaml")
else:
    print(f"Found the following samples in inputdir using input filename pattern '{config['input_fn_pattern']}':\n{SAMPLES}")
    if config["samplesheet"]:
        print(f"Found these samples in '{config['samplesheet']}':\n{SAMPLES}")
    else:
        print(f"Found these samples in '{config['inputdir']}' using input filename pattern '{config['input_fn_pattern']}':\n{SAMPLES}")


#############################
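The hunks above let rules look up input files per sample through `samplesheet.sample_info` instead of a filename pattern. The `SampleSheet` class itself lives in `scripts/common` and is not shown in this diff, so the following is only a minimal sketch of the lookup idea. The function `read_samplesheet`, the CSV column names (`sample_id`, `fastq_1`, `fastq_2`), and the example paths are assumptions for illustration, not the actual StaG-mwc format.

```python
import csv
import io
from types import SimpleNamespace


def read_samplesheet(handle):
    """Return {sample_id: {"read1": ..., "read2": ...}} from a CSV handle.

    The column names used here are hypothetical.
    """
    sample_info = {}
    for row in csv.DictReader(handle):
        sample_info[row["sample_id"]] = {
            "read1": row["fastq_1"],
            "read2": row["fastq_2"],
        }
    return sample_info


# Hypothetical samplesheet mixing local and remote inputs.
sheet = io.StringIO(
    "sample_id,fastq_1,fastq_2\n"
    "patientA,input/a_1.fq.gz,input/a_2.fq.gz\n"
    "patientB,https://example.org/b_1.fq.gz,https://example.org/b_2.fq.gz\n"
)
sample_info = read_samplesheet(sheet)
samples = set(sample_info)

# Input functions in the style of INPUT_read1/INPUT_read2: they receive
# a wildcards-like object and return the file registered for that sample.
input_read1 = lambda w: sample_info[w.sample]["read1"]
input_read2 = lambda w: sample_info[w.sample]["read2"]

# SimpleNamespace stands in for Snakemake's wildcards object.
wildcards = SimpleNamespace(sample="patientA")
print(input_read1(wildcards), input_read2(wildcards))
```

Because the sample name is a dictionary key rather than something extracted from the filename, sample names can be chosen freely, which matches the changelog note about custom sample names.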
@@ -69,6 +82,7 @@ include: "rules/naive/bbcountunique.smk"
#############################
include: "rules/taxonomic_profiling/kaiju.smk"
include: "rules/taxonomic_profiling/kraken2.smk"
include: "rules/taxonomic_profiling/krakenuniq.smk"
include: "rules/taxonomic_profiling/metaphlan.smk"

#############################
@@ -173,10 +187,35 @@ onsuccess:
Path("citations.rst").unlink()
Path("citations.rst").symlink_to(citation_filename)

shell("{snakemake_call} --unlock".format(snakemake_call=argv[0]))
shell("{snakemake_call} --report {report}-{datetime}.html".format(
    snakemake_call=argv[0],
    report=config["report"],
    datetime=report_datetime,
    )
)
unlock_call = copy.deepcopy(argv)
unlock_call.append("--unlock")

report_args = copy.deepcopy(argv)
report_args.extend(["--report", f"{config['report']}-{report_datetime}.zip"])

# Report generation doesn't work if --jobs
# or --use-singularity are specified,
# so we strip all args related to these from argv
# for report generation call
skip = False
report_call = []
for arg in report_args:
    if arg == "--use-singularity":
        continue
    if arg == "--singularity-args":
        skip = True
        continue
    if arg == "--singularity-prefix":
        skip = True
        continue
    if arg == "--jobs":
        skip = True
        continue
    if skip:
        skip = False
        continue
    report_call.append(arg)

subprocess.run(unlock_call)
subprocess.run(report_call)
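The argument filtering in the loop above can be restated as a small standalone function, which makes it easier to test outside of Snakemake. This is a sketch of the same idea rather than the code in the commit: the function name `strip_report_incompatible_args` and the example command line are hypothetical, and it checks the skip flag before the flag names, which can differ from the loop above when a value-taking flag is given without a value.

```python
# Flags dropped on their own, and flags dropped together with the
# value that follows them as a separate argv element.
FLAGS_WITHOUT_VALUE = {"--use-singularity"}
FLAGS_WITH_VALUE = {"--singularity-args", "--singularity-prefix", "--jobs"}


def strip_report_incompatible_args(argv):
    """Return a copy of argv without Singularity- and jobs-related args."""
    kept = []
    skip_next = False
    for arg in argv:
        if skip_next:
            # This element is the value of a flag dropped just before it.
            skip_next = False
            continue
        if arg in FLAGS_WITHOUT_VALUE:
            continue
        if arg in FLAGS_WITH_VALUE:
            skip_next = True
            continue
        kept.append(arg)
    return kept


# Hypothetical command line for illustration.
argv_example = [
    "snakemake", "--use-singularity",
    "--singularity-args", "-B /db",
    "--jobs", "8",
    "--report", "report.zip",
]
report_call = strip_report_incompatible_args(argv_example)
print(report_call)
```

Like the loop in the commit, this only recognises the exact spellings listed. Forms such as `-j 8` or `--jobs=8` would pass through unchanged, so they may need separate handling if they are used together with `--report`.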
