diff --git a/CHANGELOG.md b/CHANGELOG.md
index 793e005802..f13a4cc7df 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,12 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## dev
 
-- [#1246](https://github.com/nf-core/sarek/pull/1246) - Back to dev
-
 ### Added
 
+- [#1246](https://github.com/nf-core/sarek/pull/1246) - Back to dev
+
 ### Changed
 
+- [#1248](https://github.com/nf-core/sarek/pull/1248) - Improve annotation-cache docs
+
 ### Fixed
 
 - [#1247](https://github.com/nf-core/sarek/pull/1247) - FIX: Result paths for full size test to be correctly displayed on the website
diff --git a/docs/usage.md b/docs/usage.md
index 63e338c3c6..34811c8b4d 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -840,17 +840,17 @@ For GATK.GRCh38 the links for each reference file and the corresponding processe
 
 ## How to customise SnpEff and VEP annotation
 
-SNPeff and VEP require a large resource of files known as a cache.
+SNPeff and VEP both require a large resource of files known as a cache.
 These are folders composed of multiple gigabytes of files which need to be available for the software to properly function.
 To use these, supply the parameters `--vep_cache` and/or `--snpeff_cache` with the locations to the root of the annotation cache folder for each tool.
 
 ### Specify the cache location
 
 Params `--snpeff_cache` and `--vep_cache` are used to specify the locations to the root of the annotation cache folder.
-The cache will be located within a subfolder with the path `${vep_species}/${vep_genome}_${vep_cache_version}` for VEP and `${snpeff_species}.${snpeff_version}` for SnpEff.
+The cache will be located within a subfolder with the path `${snpeff_species}.${snpeff_version}` for SnpEff and `${vep_species}/${vep_genome}_${vep_cache_version}` for VEP.
 If this directory is missing, Sarek will raise an error.
 
-For example this is a typical folder structure for GRCh38 and WBCel235, with SNPeff cache version 105 and VEP cache version 110:
+For example this is a typical folder structure for `GRCh38` and `WBCel235`, with SNPeff cache version 105 and VEP cache version 110:
 
 ```text
 /data/
@@ -872,20 +872,20 @@ Both SnpEff and VEP will figure out internally the path towards the specific cac
 By default all is specified in the [igenomes.config](https://github.com/nf-core/sarek/blob/master/conf/igenomes.config) file.
 Explanation can be found for all params in the documentation:
 
-- [snpeff_db](https://nf-co.re/sarek/latest/parameters#snpeff_db)
-- [snpeff_genome](https://nf-co.re/sarek/latest/parameters#snpeff_genome)
-- [vep_genome](https://nf-co.re/sarek/latest/parameters#vep_genome)
-- [vep_species](https://nf-co.re/sarek/latest/parameters#vep_species)
-- [vep_cache_version](https://nf-co.re/sarek/latest/parameters#vep_cache_version)
+- [snpeff_db](https://nf-co.re/sarek/parameters#snpeff_db)
+- [snpeff_genome](https://nf-co.re/sarek/parameters#snpeff_genome)
+- [vep_genome](https://nf-co.re/sarek/parameters#vep_genome)
+- [vep_species](https://nf-co.re/sarek/parameters#vep_species)
+- [vep_cache_version](https://nf-co.re/sarek/parameters#vep_cache_version)
 
 With the previous example of `GRCh38`, these are the values that were used for these params:
 
 ```bash
 snpeff_db = '105'
 snpeff_genome = 'GRCh38'
+vep_cache_version = '110'
 vep_genome = 'GRCh38'
 vep_species = 'homo_sapiens'
-vep_cache_version = '110'
 ```
 
 ### Usage recommendation with AWS iGenomes
@@ -931,11 +931,11 @@ nextflow run nf-core/sarek \
 These params can be specified in a config file or in a profile using the params scope, or even in a json or a yaml file using the `-params-file` nextflow option.
 
 Note: we recommend storing each annotation cache in a separate directory so each cache version is handled differently.
-This may mean you have many similar directories but will dramatically reduce the storage burden on machines running the VEP or snpEff process.
+This may mean you have many similar directories but will dramatically reduce the storage burden on machines running the SnpEff or VEP process.
 
 ### Use annotation-cache for SnpEff and VEP
 
-[Annotation-cache](https://github.com/annotation-cache) is an open AWS registry resource that stores a mirror of some cache files on AWS S3 which can be used with Sarek.
+[Annotation-cache](https://annotation-cache.github.io) is an open AWS registry resource that stores a mirror of some cache files on AWS S3 which can be used with Sarek.
 It contains some genome builds which can be found by checking the contents of the S3 bucket.
 
 SNPeff and VEP cache are stored at the following location on S3:
@@ -954,7 +954,9 @@ aws s3 --no-sign-request ls s3://annotation-cache/vep_cache/
 
 Since both Snpeff and VEP are internally figuring the path towards the specific cache version / species, `annotation-cache` is using an extra set of keys to specify the species and genome build.
 
-So if you are using this resource, please either use the `--use_annotation_cache_keys`, or point towards the specific species, genome and build matches the directory structure within the cache.
+So if you are using this resource, please either set `--use_annotation_cache_keys` to use the AWS annotation cache, or point towards your own cache folder structure matching the expected structure.
+
+Please refer to the [annotation-cache documentation](https://annotation-cache.github.io) for more details.
 
 ### Use Sarek to download cache and annotate in one go
 
diff --git a/nextflow_schema.json b/nextflow_schema.json
index 03ae8c15df..51850b6fa9 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -406,16 +406,14 @@
                     "fa_icon": "fas fa-file",
                     "default": "s3://annotation-cache/vep_cache/",
                     "description": "Path to VEP cache.",
-                    "help_text": "Path to VEP cache which should contain the relevant species, genome and build directories at the path ${vep_species}/${vep_genome}_${vep_cache_version}",
-                    "hidden": true
+                    "help_text": "Path to VEP cache which should contain the relevant species, genome and build directories at the path ${vep_species}/${vep_genome}_${vep_cache_version}"
                 },
                 "snpeff_cache": {
                     "type": "string",
                     "fa_icon": "fas fa-file",
                     "default": "s3://annotation-cache/snpeff_cache/",
                     "description": "Path to snpEff cache.",
-                    "help_text": "Path to snpEff cache which should contain the relevant genome and build directory in the path ${snpeff_species}.${snpeff_version}",
-                    "hidden": true
+                    "help_text": "Path to snpEff cache which should contain the relevant genome and build directory in the path ${snpeff_species}.${snpeff_version}"
                 },
                 "vep_include_fasta": {
                     "type": "boolean",
@@ -514,13 +512,12 @@
                     "default": "--everything --filter_common --per_gene --total_length --offline --format vcf",
                     "fa_icon": "fas fa-toolbox",
                     "description": "Add an extra custom argument to VEP.",
-                    "hidden": true,
                     "help_text": "Using this params you can add custom args to VEP."
                 },
                 "use_annotation_cache_keys": {
                     "type": "boolean",
                     "fa_icon": "fas fa-toolbox",
-                    "description": "Use annotation cache keys for snpeff_cache and vep_cache.",
+                    "description": "Use annotation cache keys for snpeff_cache and vep_cache.\nOnly when using annotation-cache or a similar structure.\nSee [here](https://annotation-cache.github.io/) for more information.",
                     "hidden": true
                 },
                 "outdir_cache": {
@@ -720,36 +717,31 @@
                     "type": "string",
                     "fa_icon": "fas fa-database",
                     "description": "snpEff DB version.",
-                    "help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nThis is used to specify the database to be use to annotate with.\nAlternatively databases' names can be listed with the `snpEff databases`.",
-                    "hidden": true
+                    "help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nThis is used to specify the database to be use to annotate with.\nAlternatively databases' names can be listed with the `snpEff databases`."
                 },
                 "snpeff_genome": {
                     "type": "string",
                     "fa_icon": "fas fa-microscope",
                     "description": "snpEff genome.",
-                    "help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nThis is used to specify the genome when using the container with pre-downloaded cache.",
-                    "hidden": true
+                    "help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nThis is used to specify the genome when using the container with pre-downloaded cache."
                 },
                 "vep_genome": {
                     "type": "string",
                     "fa_icon": "fas fa-microscope",
                     "description": "VEP genome.",
-                    "help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nThis is used to specify the genome when using the container with pre-downloaded cache.",
-                    "hidden": true
+                    "help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nThis is used to specify the genome when using the container with pre-downloaded cache."
                 },
                 "vep_species": {
                     "type": "string",
                     "fa_icon": "fas fa-microscope",
                     "description": "VEP species.",
-                    "help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nAlternatively species listed in Ensembl Genomes caches can be used.",
-                    "hidden": true
+                    "help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nAlternatively species listed in Ensembl Genomes caches can be used."
                 },
                 "vep_cache_version": {
                     "type": "number",
                     "fa_icon": "fas fa-tag",
                     "description": "VEP cache version.",
-                    "help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nAlternatively cache version can be use to specify the correct Ensembl Genomes version number as these differ from the concurrent Ensembl/VEP version numbers",
-                    "hidden": true
+                    "help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nAlternatively cache version can be use to specify the correct Ensembl Genomes version number as these differ from the concurrent Ensembl/VEP version numbers"
                 },
                 "save_reference": {
                     "type": "boolean",
diff --git a/workflows/sarek.nf b/workflows/sarek.nf
index 13a2d8d73b..fdf42f447d 100644
--- a/workflows/sarek.nf
+++ b/workflows/sarek.nf
@@ -287,7 +287,7 @@ if (params.tools && (params.tools.split(',').contains('ascat') || params.tools.s
 }
 
 if ((params.download_cache) && (params.snpeff_cache || params.vep_cache)) {
-    error("Please specify either `--download_cache` or `--snpeff_cache`, `--vep_cache`.\nhttps://nf-co.re/sarek/dev/usage#how-to-customise-snpeff-and-vep-annotation")
+    error("Please specify either `--download_cache` or `--snpeff_cache`, `--vep_cache`.\nhttps://nf-co.re/sarek/usage#how-to-customise-snpeff-and-vep-annotation")
 }
 
 /*
@@ -324,25 +324,45 @@ vep_species = params.vep_species ?: Channel.empty()
 
 // Initialize files channels based on params, not defined within the params.genomes[params.genome] scope
 if (params.snpeff_cache && params.tools && params.tools.contains("snpeff")) {
-    def snpeff_annotation_cache_key = params.use_annotation_cache_keys ? "${params.snpeff_genome}.${params.snpeff_db}/" : ""
+    if (params.snpeff_cache == "s3://annotation-cache/snpeff_cache") {
+        def snpeff_annotation_cache_key = "${params.snpeff_genome}.${params.snpeff_db}/"
+    } else {
+        def snpeff_annotation_cache_key = params.use_annotation_cache_keys ? "${params.snpeff_genome}.${params.snpeff_db}/" : ""
+    }
     def snpeff_cache_dir = "${snpeff_annotation_cache_key}${params.snpeff_genome}.${params.snpeff_db}"
     def snpeff_cache_path_full = file("$params.snpeff_cache/$snpeff_cache_dir", type: 'dir')
     if ( !snpeff_cache_path_full.exists() || !snpeff_cache_path_full.isDirectory() ) {
-        error("Files within --snpeff_cache invalid. Make sure there is a directory named ${snpeff_cache_dir} in ${params.snpeff_cache}.\nhttps://nf-co.re/sarek/dev/usage#how-to-customise-snpeff-and-vep-annotation")
+        if (params.snpeff_cache == "s3://annotation-cache/snpeff_cache") {
+            error("This path is not available within annotation-cache. Please check https://annotation-cache.github.io/ to create a request for it.")
+        } else {
+            error("Files within --snpeff_cache invalid. Make sure there is a directory named ${snpeff_cache_dir} in ${params.snpeff_cache}.\nhttps://nf-co.re/sarek/usage#how-to-customise-snpeff-and-vep-annotation")
+        }
     }
     snpeff_cache = Channel.fromPath(file("${params.snpeff_cache}/${snpeff_annotation_cache_key}"), checkIfExists: true).collect()
         .map{ cache -> [ [ id:"${params.snpeff_genome}.${params.snpeff_db}" ], cache ] }
-} else snpeff_cache = []
+    } else if (params.tools && params.tools.contains("snpeff") && !params.download_cache) {
+        error("No cache for SnpEff or automatic download of said cache has been detected.\nPlease refer to https://nf-co.re/sarek/docs/usage/#how-to-customise-snpeff-and-vep-annotation for more information.")
+    } else snpeff_cache = []
 
 if (params.vep_cache && params.tools && params.tools.contains("vep")) {
-    def vep_annotation_cache_key = params.use_annotation_cache_keys ? "${params.vep_cache_version}_${params.vep_genome}/" : ""
+    if (params.vep_cache == "s3://annotation-cache/vep_cache") {
+        def vep_annotation_cache_key = "${params.vep_cache_version}_${params.vep_genome}/"
+    } else {
+        def vep_annotation_cache_key = params.use_annotation_cache_keys ? "${params.vep_cache_version}_${params.vep_genome}/" : ""
+    }
     def vep_cache_dir = "${vep_annotation_cache_key}${params.vep_species}/${params.vep_cache_version}_${params.vep_genome}"
     def vep_cache_path_full = file("$params.vep_cache/$vep_cache_dir", type: 'dir')
     if ( !vep_cache_path_full.exists() || !vep_cache_path_full.isDirectory() ) {
-        error("Files within --vep_cache invalid. Make sure there is a directory named ${vep_cache_dir} in ${params.vep_cache}.\nhttps://nf-co.re/sarek/dev/usage#how-to-customise-snpeff-and-vep-annotation")
+        if (params.vep_cache == "s3://annotation-cache/vep_cache") {
+            error("This path is not available within annotation-cache. Please check https://annotation-cache.github.io/ to create a request for it.")
+        } else {
+            error("Files within --vep_cache invalid. Make sure there is a directory named ${vep_cache_dir} in ${params.vep_cache}.\nhttps://nf-co.re/sarek/usage#how-to-customise-snpeff-and-vep-annotation")
+        }
     }
     vep_cache = Channel.fromPath(file("${params.vep_cache}/${vep_annotation_cache_key}"), checkIfExists: true).collect()
-} else vep_cache = []
+    } else if (params.tools && params.tools.contains("vep") && !params.download_cache) {
+        error("No cache for VEP or automatic download of said cache has been detected.\nPlease refer to https://nf-co.re/sarek/docs/usage/#how-to-customise-snpeff-and-vep-annotation for more information.")
+    } else vep_cache = []
 
 vep_extra_files = []
 
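
The directory lookup performed by the `workflows/sarek.nf` hunk can be approximated in Python. This is a minimal sketch under stated assumptions, not the pipeline's code. The function and constant names are hypothetical. The key is computed once, outside any branch, so it stays in scope when the path is built. A trailing slash on the cache root is ignored when comparing against the annotation-cache bucket, which is an assumption: the schema defaults end in `/` while the comparison strings in the hunk do not.

```python
# Hypothetical constants: the bucket prefixes that appear in the patch,
# written without a trailing slash.
SNPEFF_BUCKET = "s3://annotation-cache/snpeff_cache"
VEP_BUCKET = "s3://annotation-cache/vep_cache"


def snpeff_cache_dir(cache_root, genome, db, use_keys=False):
    """Sub-directory expected under cache_root for SnpEff."""
    name = f"{genome}.{db}"
    # Assumption: tolerate a trailing slash on the configured root.
    on_bucket = cache_root.rstrip("/") == SNPEFF_BUCKET
    # The extra key level is only used for annotation-cache style layouts.
    key = f"{name}/" if (on_bucket or use_keys) else ""
    return f"{key}{name}"


def vep_cache_dir(cache_root, species, genome, cache_version, use_keys=False):
    """Sub-directory expected under cache_root for VEP."""
    name = f"{cache_version}_{genome}"
    on_bucket = cache_root.rstrip("/") == VEP_BUCKET
    key = f"{name}/" if (on_bucket or use_keys) else ""
    return f"{key}{species}/{name}"


if __name__ == "__main__":
    # Local cache: no extra key level.
    print(snpeff_cache_dir("/data/snpeff_cache", "GRCh38", "105"))
    print(vep_cache_dir("/data/vep_cache", "homo_sapiens", "GRCh38", 110))
    # Public bucket: the key is prepended to the same directory name.
    print(snpeff_cache_dir("s3://annotation-cache/snpeff_cache/", "GRCh38", "105"))
    print(vep_cache_dir("s3://annotation-cache/vep_cache/", "homo_sapiens", "GRCh38", 110))
```

With the `GRCh38` values from the docs hunk, a local root resolves to `GRCh38.105` and `homo_sapiens/110_GRCh38`, while the bucket root resolves to the same names nested under their key directory.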