This repository has been archived by the owner on Aug 23, 2024. It is now read-only.

Reference archives

AndyMenzies edited this page Apr 28, 2022 · 7 revisions

We support a limited number of pre-constructed reference sets.

Pre-generated reference sets

Human GRCh37

Core reference files for fragment-based analysis (2.x+):

ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/core_ref_GRCh37d5.tar.gz
ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/VAGrENT_ref_GRCh37d5_ensembl_75.tar.gz
ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/SNV_INDEL_ref_GRCh37d5-fragment.tar.gz
ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/CNV_SV_ref_GRCh37d5_brass6+.tar.gz

SNV/INDEL reference for pre-2.x releases:

ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/SNV_INDEL_ref_GRCh37d5.tar.gz

If you are using a release earlier than 2.1.x, you need the subclonal calling reference set to support cgpBattenberg. If it is not needed, you can omit it by specifying -skipbb:

ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/SUBCL_ref_GRCh37d5.tar.gz
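The GRCh37 version rules above can be captured in a small selection helper. This is a minimal sketch that covers 2.x+ releases only: it returns the four fragment-based archives and adds the subclonal (cgpBattenberg) set for releases before 2.1.0 unless -skipbb is in use. The function names and the skip_bb parameter are hypothetical, and pre-2.x setups are deliberately rejected rather than guessed at.

```python
from typing import List, Tuple

BASE = "ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/"

# Fragment-based archives listed for GRCh37 with 2.x+ releases.
CORE_2X = [
    "core_ref_GRCh37d5.tar.gz",
    "VAGrENT_ref_GRCh37d5_ensembl_75.tar.gz",
    "SNV_INDEL_ref_GRCh37d5-fragment.tar.gz",
    "CNV_SV_ref_GRCh37d5_brass6+.tar.gz",
]
SUBCL = "SUBCL_ref_GRCh37d5.tar.gz"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a numeric dotted release string such as '2.0.1' into (2, 0, 1)."""
    return tuple(int(part) for part in version.split("."))


def grch37_archives(version: str, skip_bb: bool = False) -> List[str]:
    """Hypothetical helper: GRCh37 archive URLs for a 2.x+ cgpwgs release."""
    v = parse_version(version)
    if v < (2,):
        # Pre-2.x releases use SNV_INDEL_ref_GRCh37d5.tar.gz; not modelled here.
        raise ValueError("pre-2.x releases are not covered by this sketch")
    names = list(CORE_2X)
    # Releases before 2.1.0 run cgpBattenberg and need the subclonal set,
    # unless it is switched off with -skipbb.
    if v < (2, 1) and not skip_bb:
        names.append(SUBCL)
    return [BASE + name for name in names]
```

For example, grch37_archives("2.0.1") yields five URLs, while grch37_archives("2.1.0") or grch37_archives("2.0.1", skip_bb=True) yields the four core ones.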

Human GRCh38

Please use release 2.1.0 or later when processing GRCh38. The "current" versions of the reference files can be found here:

ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/GRCh38_hla_decoy_ebv/CNV_SV_ref_GRCh38_hla_decoy_ebv_brass6+.tar.gz
ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/GRCh38_hla_decoy_ebv/core_ref_GRCh38_hla_decoy_ebv.tar.gz
ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/GRCh38_hla_decoy_ebv/qcGenotype_GRCh38_hla_decoy_ebv.tar.gz
ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/GRCh38_hla_decoy_ebv/SNV_INDEL_ref_GRCh38_hla_decoy_ebv-fragment.tar.gz
ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/GRCh38_hla_decoy_ebv/VAGrENT_ref_GRCh38_hla_decoy_ebv_ensembl_91.tar.gz
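To fetch the GRCh38 archives listed above in one go, a minimal sketch using only the Python standard library might look like this. urllib.request.urlretrieve accepts ftp:// URLs; the download_all helper and the destination directory name are hypothetical, and the sketch does not verify checksums or unpack anything.

```python
from pathlib import Path
from typing import Iterable, List
from urllib.parse import urlparse
from urllib.request import urlretrieve

GRCH38_BASE = "ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/GRCh38_hla_decoy_ebv/"
GRCH38_ARCHIVES = [
    "CNV_SV_ref_GRCh38_hla_decoy_ebv_brass6+.tar.gz",
    "core_ref_GRCh38_hla_decoy_ebv.tar.gz",
    "qcGenotype_GRCh38_hla_decoy_ebv.tar.gz",
    "SNV_INDEL_ref_GRCh38_hla_decoy_ebv-fragment.tar.gz",
    "VAGrENT_ref_GRCh38_hla_decoy_ebv_ensembl_91.tar.gz",
]


def download_all(urls: Iterable[str], dest: str) -> List[Path]:
    """Download each URL into dest, skipping archives that are already present."""
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = out_dir / Path(urlparse(url).path).name
        if not target.exists():
            # Write to a temporary name first so an interrupted transfer
            # is not mistaken for a finished archive on the next run.
            partial = target.with_name(target.name + ".part")
            urlretrieve(url, str(partial))
            partial.rename(target)
        saved.append(target)
    return saved
```

A call such as download_all([GRCH38_BASE + name for name in GRCH38_ARCHIVES], "ref_GRCh38") would place all five archives in ref_GRCh38/. Note that this always follows the "current" files rather than a dated release.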

Each release of the reference files is stored in a YYYYMM sub-folder. If you want consistent execution, and for further details, please see the README:

ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/GRCh38_hla_decoy_ebv/README.md
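For consistent execution you may prefer to choose a dated release explicitly instead of following "current". The sketch below lists candidate YYYYMM folders over anonymous FTP using Python's ftplib. It assumes, beyond what is stated here, that the dated folders sit directly under GRCh38_hla_decoy_ebv/; the helper names are hypothetical and the README remains the authoritative description of the layout.

```python
import re
from ftplib import FTP
from typing import Iterable, List

HOST = "ftp.sanger.ac.uk"
GRCH38_DIR = "/pub/cancer/dockstore/human/GRCh38_hla_decoy_ebv"
# Four-digit year followed by a valid two-digit month.
YYYYMM = re.compile(r"^\d{4}(0[1-9]|1[0-2])$")


def release_folders(names: Iterable[str]) -> List[str]:
    """Keep only YYYYMM-style entries, oldest first, without duplicates."""
    # Listings may return full paths, so compare on the last path component.
    found = {n.rstrip("/").split("/")[-1] for n in names}
    # Fixed-width YYYYMM strings sort chronologically as plain text.
    return sorted(n for n in found if YYYYMM.match(n))


def list_remote_releases() -> List[str]:
    """Anonymous FTP listing of the GRCh38 directory (assumed layout)."""
    with FTP(HOST) as ftp:
        ftp.login()
        return release_folders(ftp.nlst(GRCH38_DIR))
```

Recording the chosen folder name alongside your results makes it straightforward to rerun against exactly the same reference files later.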

Generating alternative reference sets

You can build your own reference bundles for different species and naming conventions. To do so, construct archives emulating those above (in all cases, the first element of the path within the archive is dropped).
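Because the first path element is dropped, a custom bundle in effect needs a single top-level directory wrapping its contents, much like unpacking with tar --strip-components=1. A minimal sketch with Python's tarfile module follows; the folder and file names in it are hypothetical placeholders, not the files any cgpwgs component actually requires.

```python
import tarfile
from pathlib import Path
from typing import List


def build_bundle(src_dir: str, archive: str, top: str) -> None:
    """Pack every file under src_dir beneath a single top-level folder `top`."""
    src = Path(src_dir)
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                rel = path.relative_to(src).as_posix()
                tar.add(str(path), arcname=f"{top}/{rel}")


def check_bundle(archive: str) -> List[str]:
    """Confirm one shared top-level folder; return paths with it dropped."""
    with tarfile.open(archive, "r:gz") as tar:
        names = [m.name for m in tar.getmembers() if m.isfile()]
    tops = {n.split("/", 1)[0] for n in names}
    if len(tops) != 1 or any("/" not in n for n in names):
        raise ValueError(f"expected one top-level folder, found {sorted(tops)}")
    return sorted(n.split("/", 1)[1] for n in names)
```

Running check_bundle on an existing archive from the lists above is one way to see which relative paths your own bundle should reproduce.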

cgpwgs is an aggregate container holding multiple algorithms, each with its own reference requirements that must be satisfied. We make use of many external resources, such as UCSC and Ensembl; if the species/build you want to use isn't supported by those resources, you will probably struggle to generate the required reference files.

  • cgpmap - while cgpmap is not part of cgpwgs, some of its reference files are shared by many of the algorithms
  • cgpCaVEManWrapper - the cgpCaVEManWrapper paper is linked from its GitHub wiki; see Support Protocol 2
  • cgpPindel - the cgpPindel paper is linked from its GitHub wiki; see Support Protocol 2
  • VAGrENT - VAGrENT has a reference generation script, detailed in its GitHub wiki
  • BRASS - BRASS reference generation is explained in its GitHub wiki
  • ascatNgs - ascatNgs reference generation is explained in its GitHub wiki