Releases: ctmrbio/stag-mwc
StaG v0.7.0
This StaG release is considered a major release because several main workflow sections have been removed: all antibiotic resistance gene tools and the assembly workflow. Users interested in AMR profiling should consider using the BBMap- or Bowtie2-based mapping workflows in StaG, or turn to another Snakemake or Nextflow workflow for that type of profiling (e.g. the latest AMR++ release). This release also adds a new feature: host removal using Bowtie2. It is intended for high-host microbiome samples in combination with Kraken2, because we have found that Kraken2 is very sensitive to host contamination left behind after Kraken2-based host removal. When performing taxonomic profiling with MetaPhlAn4, we find that the quality of host removal has no substantial impact on the quality of the taxonomic profiles, so in those cases users can continue to use Kraken2 for host removal to save time and compute resources. In addition, this release contains a critical bug fix for HUMAnN3, and all users should upgrade.
The previous StaG release, v0.6.1, was released two weeks ago. The v0.7.0 release modifies approximately 35 files, adding 381 lines of code and removing approximately 1472.
[0.7.0] 2023-06-13
Added
- Host removal: Bowtie2 is now available as an option for host removal.
Fixed
- HUMAnN3: Fixed a critical bug that caused the entire system-wide temporary directory to be emptied after a successful HUMAnN run.
- Singularity: From now on, all Singularity definition files should get version bumps in their description labels whenever the conda environments built inside them are updated. This reduces the risk of Singularity reusing old cached copies of images instead of downloading the latest version as intended.
Changed
- Preprocessing summary: Preprocessing summary script can now output a table of read counts regardless of which combination of read QC and host removal is used.
Removed
- AMR++, Groot: All tools for antibiotic resistance gene profiling have been removed entirely because they were out of date and had few active users. Users wanting to perform antibiotic resistance gene profiling are suggested to use the mapper modules with a suitable reference database or run the latest version of AMR++ separately.
- Assembly: The MetaWRAP-based assembly parts of the workflow have been removed entirely.
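The HUMAnN3 fix above concerns a run that emptied a shared, system-wide temporary directory. The release notes do not describe how the fix was implemented, so the following is only a minimal sketch of the general pattern, assuming cleanup should be confined to a private per-run subdirectory. The helper run_with_private_tmpdir is hypothetical and is not StaG or HUMAnN code.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_with_private_tmpdir(base_tmpdir: str, work: Callable[[Path], None]) -> None:
    """Run `work` in a fresh subdirectory of `base_tmpdir`, then delete only that subdirectory.

    Hypothetical helper, not StaG code: `base_tmpdir` stands in for a shared
    location such as /tmp or a per-job directory like Snakemake's resources.tmpdir.
    """
    # mkdtemp creates a uniquely named directory owned by this run only.
    private = Path(tempfile.mkdtemp(prefix="humann_", dir=base_tmpdir))
    try:
        work(private)
    finally:
        # Clean up only what this run created, never base_tmpdir itself,
        # even if `work` raised an exception.
        shutil.rmtree(private, ignore_errors=True)


if __name__ == "__main__":
    shared = tempfile.mkdtemp()
    Path(shared, "other_job_file.txt").write_text("must survive")

    def fake_humann_run(tmp: Path) -> None:
        (tmp / "intermediate.tsv").write_text("large intermediate output")

    run_with_private_tmpdir(shared, fake_humann_run)
    print(sorted(p.name for p in Path(shared).iterdir()))  # ['other_job_file.txt']
```

The key design choice is that the cleanup target is the directory the run itself created, so a shared parent directory can never be emptied by mistake.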
StaG v0.6.1
This StaG release is a maintenance release that fixes some things that did not work fully as intended in the previous release, most notably making sure that the KrakenUniq module works as intended, including when the quality control and host removal steps are bypassed. It's been approximately one and a half months since the last StaG release (v0.6.0 was released 2023-04-17), and this release contains 23 commits and affects 17 files in the repo. A total of 120 lines were added and 67 removed.
The most notable changes are:
Added
- BBMap: now outputs a sorted BAM file; added options keep_sam and keep_bam.
- Bowtie2: added option keep_bam.
Fixed
- KrakenUniq: the environment variable LC_ALL has been added to the Singularity image to prevent unnecessary warning messages about it being undefined.
- KrakenUniq: now able to run when host removal is skipped. This was solved by adding a krakenuniq_merge_reads rule that creates a temporary merged FASTA file of the input data, so that KrakenUniq is not given symlinks as input.
Changed
- MetaPhlAn: Updated to v4.0.6
- HUMAnN3: Updated to v3.7
- HUMAnN3: Changed the way the temporary directory is resolved; it now uses Snakemake's built-in resources.tmpdir. This should prevent HUMAnN from creating large temporary directories outside of Slurm job folders, where they cannot be automatically cleaned up if the Slurm job times out or fails before HUMAnN can clean up after itself.
- KrakenUniq: Concatenate reads with BBMap's fuse.sh with a padding of one N, instead of interleaving the paired inputs into a single FASTA, to avoid KrakenUniq treating paired reads independently.
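The difference between fusing and interleaving paired reads can be illustrated with a small sketch. It assumes the simplest reading of the change (one record per pair, built as R1 + one N + R2) and does not reproduce BBMap's fuse.sh; for example, whether fuse.sh reverse-complements R2 is not covered here. The functions fuse_pairs and interleave_pairs are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

FastaRecord = Tuple[str, str]  # (header, sequence)


def fuse_pairs(
    r1: Iterable[FastaRecord], r2: Iterable[FastaRecord], pad: str = "N"
) -> Iterator[FastaRecord]:
    """Yield one record per read pair: R1 + pad + R2 (hypothetical sketch, not fuse.sh)."""
    for rec1, rec2 in zip_longest(r1, r2):
        # Mates must come in matching order; unequal counts indicate broken pairing.
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        name, seq1 = rec1
        _, seq2 = rec2
        yield name, seq1 + pad + seq2


def interleave_pairs(
    r1: Iterable[FastaRecord], r2: Iterable[FastaRecord]
) -> Iterator[FastaRecord]:
    """For contrast: interleaving keeps each mate as a separate record."""
    for rec1, rec2 in zip(r1, r2):
        yield rec1
        yield rec2


if __name__ == "__main__":
    r1 = [("read1", "ACGT")]
    r2 = [("read1", "TTGA")]
    print(list(fuse_pairs(r1, r2)))        # [('read1', 'ACGTNTTGA')]
    print(list(interleave_pairs(r1, r2)))  # [('read1', 'ACGT'), ('read1', 'TTGA')]
```

A tool that classifies records one at a time sees two independent sequences per pair in the interleaved file, whereas the fused file gives it one sequence per pair, so both mates are classified together.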
Deprecated
- Kaiju, Kraken2, MetaPhlAn: area plots removed because they repeatedly led to failed runs in cached Singularity containers. The script still works as intended with newer matplotlib versions and will remain in the scripts folder for potential manual use if desired.
Removed
- Groot: Removed settings related to read length window as that feature was
removed in a previous StaG release.
StaG v0.6.0
This StaG release focuses on making the StaG repo conform to modern best practices for Snakemake workflows, and making sure that the KrakenUniq module works with Singularity. It's been approximately four months since the last StaG release (v0.5.1 was released 2022-12-06) and this release contains 60 commits and affects 122 files in the repo. A total of 2474 lines were added and 1118 removed.
Note that due to the new structure of the repo this is a potentially breaking change for some users.
The most notable changes are:
Added
- Added a new Slurm profile for use on CTMR Gandalf, also intended to be useful as a starting point for creating custom Slurm profiles.
- Added a README with basic instructions for how to configure the workflow.
- Added the ability to disable MetaPhlAn heatmap plots, which may be useful when processing very large numbers of samples.
- Added MetaPhlAn-style output tables for KrakenUniq.
- Added Krona plot output for KrakenUniq.
Fixed
- Fixed missing interactive Kaiju Krona plots for all samples in final report.
- Now reusing the metaphlan conda environment and biobakery container for running the bowtie2 mapping rule to ensure a consistent execution environment for bowtie2.
- KrakenUniq now works in Singularity, thanks to a new custom Singularity image.
- KrakenUniq rule is now correctly not rerun if the keep_kraken or keep_kreport settings are set to false when executing the workflow a second time.
- Kraken2 rule is now correctly not rerun if the keep_kraken or keep_kreport settings are set to false when executing the workflow a second time.
Changed
- Restructured repo to conform to modern Snakemake best practices. This also includes updates to documentation where needed.
- Hardcoded default thread values for all rules, used during local execution without a profile. These are intended to be overridden by a profile.
- Updated KrakenTools to its latest version (1.2), with a minor custom modification of kreport2mpa.py, changing the output column names to taxon_name and reads.
- Updated all Python packages in the main stag-mwc conda environment to their latest versions.
- Some minor scripts affected by Pandas and Matplotlib updates were modified to work with the latest versions of those libraries.
- Updated groot to 1.1.2, which brings many performance improvements but removes the built-in plotting functionality, so the groot module no longer produces any plots. Removed size window filtering with BBMap from the groot alignment rule, and renamed the groot config variable index to index_dir to better map to --indexDir used in the groot CLI.
Deprecated
- Older Slurm profiles for CTMR Gandalf and UPPMAX Rackham are now considered deprecated and will be removed in a future release.
- MetaWRAP support is considered deprecated and will eventually be replaced by another solution for metagenome assembly in a future release of StaG.
StaG v0.5.1
This StaG release focuses on including recent developments to MetaPhlAn (version 4) and HUMAnN (version 3.6). It's been almost a year since the last StaG release (v0.5.0 was released 2021-11-18) and this release marks the 200th pull request to the StaG project and contains almost 70 commits.
Among the most notable new features are:
- Support for samplesheet as input: allows decoupling sample names from filenames and allows files from several different locations in the local filesystem or remote sources such as HTTP(S) and S3.
- Updated MetaPhlAn to v4.0.3
- Updated HUMAnN to v3.6
- KrakenUniq module
- Ability to disable Krona plot generation
Added
- Produce Snakemake report in zip format instead of HTML due to the HTML report being broken in the later versions of Snakemake.
- Added KrakenUniq as a taxonomic profiler, an alternative with a lower false positive rate than Kraken2.
- Added samplesheet as an alternative input file selection method; this also enables providing custom sample names that are not based on patterns in input filenames.
- Samplesheet can be used to specify remote input files from S3 or HTTP/HTTPS sources.
- Added run_krona setting for taxonomic profilers to make it possible to disable Krona table and plot creation.
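The samplesheet input described above decouples sample names from file names. As a minimal sketch of that idea only, assuming a tab-separated file with hypothetical columns sample_id, read1 and read2 (StaG's actual samplesheet format is not given in these notes), a parser might map each sample name to its pair of read locations:

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def read_samplesheet(handle: TextIO) -> Dict[str, Tuple[str, str]]:
    """Map sample name -> (read1, read2) from a TSV samplesheet.

    The column names sample_id, read1 and read2 are hypothetical.
    """
    samples: Dict[str, Tuple[str, str]] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        name = row["sample_id"]
        # Sample names come from the sheet, so duplicates must be caught explicitly.
        if name in samples:
            raise ValueError(f"duplicate sample name: {name}")
        samples[name] = (row["read1"], row["read2"])
    return samples


if __name__ == "__main__":
    sheet = io.StringIO(
        "sample_id\tread1\tread2\n"
        "patient_A\t/data/run1/x_R1.fq.gz\t/data/run1/x_R2.fq.gz\n"
        "patient_B\ts3://bucket/y_1.fq.gz\ts3://bucket/y_2.fq.gz\n"
    )
    for sample, (fq1, fq2) in read_samplesheet(sheet).items():
        print(sample, fq1, fq2)
```

In this sketch remote locations such as S3 URLs are kept as plain strings; fetching them would be left to the workflow engine.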
Fixed
- Corrected typo in host_removal rule concerning keep_kreport config flag.
- Corrected typo in bowtie2 annotation counts output files that led to the workflow complaining about missing output files.
- Removed unintended stdout printouts from various helper scripts and some MetaPhlAn related rules.
- Removed outdated mentions of MetaPhlAn2 in report.
Changed
- Replaced CircleCI automatic testing workflow with one implemented with Github actions.
- Updated MetaPhlAn to version 4.0.3.
- Updated HUMAnN to version 3.6.
- Modified area and MetaPhlAn heatmap plotting scripts to better deal with MetaPhlAn 4 output formats.
- Updated the documentation to reflect recent changes in StaG.
- Updated KrakenTools to v1.2
- Updated scripts/join_tables.py to v1.1, which includes support for skipping lines before the header.
- Improved automatic report generation code in main Snakefile to be more robust. Now works well also when --use-singularity or --jobs are used simultaneously with --report.
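The join_tables.py update above adds support for skipping lines before the header, which matters for profiler outputs that begin with comment lines. The sketch below illustrates the idea with the standard library only; it is not the actual join_tables.py code, and the function names, the two-column input and the outer-join-with-zero-fill behaviour are assumptions.

```python
import io
from typing import Dict, List, TextIO


def read_table(handle: TextIO, skip_lines: int = 0) -> Dict[str, str]:
    """Read a two-column TSV (feature, value), skipping `skip_lines` lines before the header."""
    for _ in range(skip_lines):
        handle.readline()
    handle.readline()  # discard the header line itself
    table: Dict[str, str] = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        feature, value = line.split("\t")[:2]
        table[feature] = value
    return table


def join_tables(tables: Dict[str, Dict[str, str]], fill: str = "0") -> List[List[str]]:
    """Outer-join per-sample tables on feature name; missing values become `fill`."""
    samples = sorted(tables)
    features = sorted(set().union(*(t.keys() for t in tables.values())))
    rows = [["feature"] + samples]
    for feature in features:
        rows.append([feature] + [tables[s].get(feature, fill) for s in samples])
    return rows


if __name__ == "__main__":
    s1 = io.StringIO("# tool comment\nclade\tabundance\nk__Bacteria\t90.0\nk__Archaea\t10.0\n")
    s2 = io.StringIO("# tool comment\nclade\tabundance\nk__Bacteria\t100.0\n")
    joined = join_tables({"s1": read_table(s1, skip_lines=1), "s2": read_table(s2, skip_lines=1)})
    for row in joined:
        print("\t".join(row))
```

Skipping a fixed number of lines keeps the header detection explicit, which avoids guessing which comment line is the real header.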
Removed
- Removed old unmaintained DB download rules for groot, kaiju, kraken2.
Known issues
- KrakenUniq currently doesn't work with Singularity. The recommended workaround is to run KrakenUniq with --use-conda until it is fixed upstream.
StaG v0.5.0
This StaG release focuses on including recent developments to MetaPhlAn and HUMAnN (which are both now at version 3), and also includes support for StrainPhlAn. Further, this release improves the automated procedures concerning Singularity image creation and distribution by using Github Actions and distributing the images via the Github Container Registry (ghcr.io), which will make developing even more robust StaG versions easier for people involved in the development. Aside from the most recent changes to Singularity image creation and distribution, the majority of the commits in this new release have been in production at CTMR for 8 months.
Notable changes in this release include:
- Updating to MetaPhlAn3 and HUMAnN3, with StrainPhlAn3 support added
- New user-configurable feature to automatically remove some of the largest and most common intermediate files
- Updating fastp, Kaiju, Kraken2, BBMap and MultiQC to more recent versions
- General bug fixes and usability improvements
For a detailed list of changes refer to the CHANGELOG.
StaG v0.4.1
Finally, the long overdue release of StaG v0.4.1 has been merged to the master branch. Many of the new features in this version have been in use in CTMR's production environment for almost a year.
The main new features are:
- Created Singularity images for all conda environments. Run with --use-singularity (do not combine with --use-conda).
- Added custom reimplementation of AMRPlusPlus v2.0 which can be executed with either --use-singularity or --use-conda.
Read about all the details in the CHANGELOG.
Some known issues still exist in the assembly and binning steps. Some cache issues with Singularity when using metaphlan2 have also been reported.
StaG v0.4.0
First release in the 0.4.0 series
Changes
This version introduces a lot of changes, most notably the replacement of BBMap with Kraken2 as the host removal tool. This version also adds the basic skeleton for per-sample metagenome assembly and binning, based on metaWRAP. Note that these sections of the workflow are fairly untested as of now.
For a complete list of changes, please review CHANGELOG.md.
This release also adds continuous integration testing via CircleCI, but still only in a basic form.
There are a few known issues with this release. Please create new issues if you encounter anything strange.
StaG-mwc v0.3.0-beta
This is a beta release of StaG-mwc v0.3.0.
Several known issues exist, check or report issues in the issue tracker on Github.
There are known issues with the available rules for downloading reference databases.
Second public release
Use pathlib (#31)
- Use pathlib in Snakefile
- Add logdir config param. Get tired because Snakemake doesn't support Path objects as input or log files.
- Use pathlib for all paths. Add version printout to Snakefile
- Add info about pathlib use
- Add details on branching structure to CONTRIBUTING.md
- Bump docs version
First public release
v0.1.0-dev [docs] Fix typo.