Although we recommend running Battenberg on WGS data to get accurate clonal and subclonal allele-specific copy-number alteration calls, ASCAT can still be used to get a fast ploidy/purity estimate. To this end, we pre-generated a set of loci, along with GC content and replication timing correction files.
Briefly, this list was derived from 1000 Genomes Project SNPs (hg19 and hg38) as follows:
- Biallelic SNPs with allele frequency higher than 0.35 and lower than 0.65 in any population were selected using BCFtools.
- Duplicated entries were removed using R.
- SNPs located in the ENCODE blacklisted regions were discarded.
- SNPs with noisy BAF (distant from 0/0.5/1) in normal samples (a.k.a. probloci, distributed as part of the Battenberg package) were discarded.
Since hg38 data for the non-PAR region of chrX is not available (as of September 2021), hg38 data for the whole chrX comes from a lift-over from hg19.
GC content and replication timing correction files were then generated using scripts provided in the LogRcorrection folder.
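As a rough illustration of the allele-frequency and de-duplication steps listed above, the sketch below applies the same thresholds with awk. It is not the pipeline used to build the files (that relied on BCFtools and R). The input layout (tab-separated, no header: chromosome, position, then one allele-frequency column per population), the function name `filter_loci`, and the choice to keep the first of several duplicated entries are all assumptions made for the example. The biallelic, blacklist and probloci filters are not covered.

```shell
# Keep loci whose allele frequency lies strictly between 0.35 and 0.65 in at
# least one population column, and print each chromosome/position pair once.
# Hypothetical input layout: chrom <TAB> pos <TAB> AF_pop1 <TAB> AF_pop2 ...
filter_loci() {
  awk -F '\t' -v OFS='\t' '
    {
      keep = 0
      # Columns 3..NF hold the per-population allele frequencies
      for (i = 3; i <= NF; i++) {
        if ($i + 0 > 0.35 && $i + 0 < 0.65) { keep = 1; break }
      }
      key = $1 ":" $2
      # Assumption: the first passing occurrence of a position is kept
      if (keep && !(key in seen)) {
        seen[key] = 1
        print $1, $2
      }
    }
  ' "$1"
}

# Usage (hypothetical file name):
# filter_loci per_population_af.tsv > candidate_loci.txt
```

The thresholds are strict, so a SNP with a frequency of exactly 0.35 or 0.65 in every population is dropped.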
Data availability:
- Loci files: hg19 & hg38 (unzip and set `loci.prefix="G1000_loci_hg19_chr"` in `ascat.prepareHTS`)
- Allele files: hg19 & hg38 (unzip and set `alleles.prefix="G1000_alleles_hg19_chr"` in `ascat.prepareHTS`)
- GC correction file: hg19 & hg38 (unzip and set `GCcontentfile="GC_G1000_hg19.txt"` in `ascat.correctLogR`)
- Replication timing correction file: hg19 & hg38 (unzip and set `replictimingfile="RT_G1000_hg19.txt"` in `ascat.correctLogR`)
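Before running ASCAT, it can help to confirm that every per-chromosome file is present after unzipping. The sketch below assumes that files are named `<prefix><chromosome>.txt` for chromosomes 1 to 22 and X, as the loci file names suggest; the same naming for the allele files is an assumption, and `check_prefix` is a hypothetical helper, not part of ASCAT.

```shell
# Report any missing or empty per-chromosome file for a given prefix.
# Returns 0 when all 23 files (1..22 and X) exist and are non-empty, 1 otherwise.
check_prefix() {
  local prefix="$1" missing=0 chr
  for chr in $(seq 1 22) X; do
    if [ ! -s "${prefix}${chr}.txt" ]; then
      echo "missing or empty: ${prefix}${chr}.txt" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, with the hg19 prefixes from the list above:
# check_prefix G1000_loci_hg19_chr && check_prefix G1000_alleles_hg19_chr
```

For hg38, replace the prefixes with whatever the unzipped hg38 archives contain.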
Please note that the loci files provided above are not 'chr'-based (chromosome names are '1', '2', '3', etc. and not 'chr1', 'chr2', 'chr3', etc.). If your BAMs are 'chr'-based, you will need to add 'chr' to the loci files (Bash: `for i in {1..22} X; do sed -i 's/^/chr/' G1000_loci_hg19_chr${i}.txt; done`). ASCAT internally removes 'chr', so the other files (allele, GC correction and RT correction) should not be modified, and `chrom_names` (in `ascat.prepareHTS`) should be `c(1:22,'X')`.
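The one-liner above prepends 'chr' unconditionally, so running it twice yields 'chrchr1'. A more defensive variant, sketched below, only touches non-empty lines that do not already start with 'chr'. The function names are hypothetical, and the `-i.bak` form is used on the assumption that it is accepted by both GNU and BSD sed.

```shell
# Add a 'chr' prefix to each non-empty line of a loci file, skipping lines
# that already start with 'chr', so the function is safe to re-run.
add_chr_prefix() {
  local file="$1"
  # '/^chr/!' limits the substitution to lines not yet prefixed;
  # 's/^./chr&/' needs at least one character, so empty lines are left alone.
  sed -i.bak '/^chr/!s/^./chr&/' "$file" && rm -f "${file}.bak"
}

# Apply the prefix to all per-chromosome loci files sharing a prefix.
prefix_all_loci() {
  local prefix="$1" chr
  for chr in $(seq 1 22) X; do
    add_chr_prefix "${prefix}${chr}.txt"
  done
}

# Usage:
# prefix_all_loci G1000_loci_hg19_chr
```

As noted above, only the loci files need this change; the allele, GC correction and RT correction files stay as downloaded.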