We developed a statistical method, termed "aggregate mutation spectrum distance" (AMSD), to detect loci that are associated with mutation spectrum variation in recombinant inbred lines (RILs) (Figure {@fig:distance-method}; Materials and Methods).
Our approach leverages the fact that mutator alleles often leave behind distinct and detectable impressions on the mutation spectrum, even if they increase the overall mutation rate by a relatively small amount.
Given a population of haplotypes, we assume that each has been genotyped at the same collection of biallelic loci and that each harbors de novo mutations which have been partitioned by
Using simulated data, we find that our method's power is primarily limited by the initial mutation rate of the
{#fig:distance-method width=7.5in}
We applied our aggregate mutation spectrum distance method to 117 BXDs (Materials and Methods) with a total of 65,552 de novo germline mutations [@PMID:35545679].
Using mutation data that were partitioned by 1-mer nucleotide context, we discovered a locus on chromosome 4 that was significantly associated with mutation spectrum variation (Figure {@fig:distance-results}a; maximum adjusted cosine distance of 1.20e-2 at marker ID rs27509845
; position 118.28 Mbp in GRCm38/mm10 coordinates; 90% bootstrap confidence interval from 114.79 - 118.75 Mbp).
{#fig:distance-results width=7.5in}
Using quantitative trait locus (QTL) mapping, we previously identified a nearly-identical locus on chromosome 4 that was significantly associated with the C>A germline mutation rate in the BXDs [@PMID:35545679].
This locus overlapped 21 protein-coding genes that were annotated by the Gene Ontology as being involved in "DNA repair," but only one of those genes contained nonsynonymous differences between the two parental strains: Mutyh.
Mutyh encodes a protein involved in the base-excision repair of 8-oxoguanine (8-oxoG), a DNA lesion caused by oxidative damage, and prevents the accumulation of C>A mutations [@PMID:28551381;@PMID:28127763;@PMID:17581577].
C>A germline mutation fractions are nearly 50% higher in BXDs that inherit D genotypes at marker ID rs27509845
(the marker at which we observed the highest adjusted cosine distance on chromosome 4) than in those that inherit B genotypes (Figure @fig:spectra-comparison) [@PMID:35545679].
After confirming that AMSD could recover the mutator locus overlapping Mutyh, we tested its ability to identify additional mutator loci in the BXDs.
To eliminate potential confounding of the mutation spectrum landscape by the large-effect mutator locus on chromosome 4, we performed AMSD scans that were conditional on the presence of either D or B alleles at rs27509845
.
We also hypothesized that such conditioning might reveal epistatic interactions between alleles at the chromosome 4 locus and mutator alleles elsewhere in the genome.
Specifically, we divided the BXDs into those with either D (n = 66) or B (n = 44) genotypes at rs27509845
(n = 7 BXDs were heterozygous) and ran an aggregate mutation spectrum distance scan using each group separately (Figure {@fig:distance-results}b-c).
We excluded the BXD68 RIL from these scans, since we previously found that BXD68 harbors a strain-private C>A mutator allele of even larger effect [@PMID:35545679].
Using the BXDs with D genotypes at rs27509845
, we identified a locus on chromosome 6 that was significantly associated with mutation spectrum variation (Figure {@fig:distance-results}b; maximum adjusted cosine distance of 3.69e-3 at marker rs46276051
; position 111.27 Mbp in GRCm38/mm10 coordinates; 90% bootstrap confidence interval from 95.01 - 114.02 Mbp).
This signal was specific to BXDs with D genotypes at the rs27509845
locus, as we did not observe any new mutator loci after performing an AMSD scan using BXDs with B genotypes at rs27509845
(Figure {@fig:distance-results}c).
The peak markers on chromosome 4 and 6 did not exhibit strong linkage disequilibrium (
We queried the region surrounding the top marker on chromosome 6 (+/- the 90% bootstrap confidence interval) and discovered 64 protein-coding genes, of which four were annotated with a Gene Ontology (GO) [@PMID:10802651;@PMID:33290552] term related to "DNA repair": Fancd2, Ogg1, Setmar, and Rad18. None of the remaining genes were annotated with a cellular function that would obviously contribute to a germline mutator phenotype; however, many of these GO annotations are imperfect and/or incomplete. Although we focus our analysis on DNA repair genes, it remains possible that other genes within the confidence interval could underlie the C>A mutator phenotype we identified in the BXDs.
Of the annotated DNA repair genes within the confidence interval, two harbored nonsynonymous differences between the parental C57BL/6J and DBA/2J strains (Table @tbl:nonsyn-diffs). Ogg1 encodes a key member of the base-excision repair response to oxidative DNA damage (a pathway that also includes Mutyh), and in mice Setmar encodes a SET domain-containing histone methyltransferase; both Ogg1 and Setmar are expressed in mouse gonadal cells. Because the bootstrap can exhibit poor coverage in QTL mapping studies [@PMID:16783000], we also scanned an interval +/- 5 Mbp from the peak AMSD marker on chromosome 6 for additional candidate genes. Although the choice of a 10 Mbp interval is somewhat arbitrary, the interval does contain a plausible candidate: Mbd4, a protein-coding gene involved in base excision repair that also harbors a non-synonymous difference between the BXD parental strains (Table @tbl:nonsyn-diffs).
Gene name | Ensembl transcript name | Nucleotide change | Amino acid change | Position in GRCm38/mm10 coordinates | PhyloP conservation score | SIFT prediction |
---|---|---|---|---|---|---|
Setmar | ENSMUST00000049246 | C>T | p.Leu103Phe | chr6:108,075,853 | 0.422 | 0.0 (intolerant/deleterious) |
Setmar | ENSMUST00000049246 | T>G | p.Ser273Arg | chr6:108,076,365 | -0.355 | 0.3 (tolerant/benign) |
Ogg1 | ENSMUST00000032406 | A>G | p.Thr95Ala | chr6:113,328,510 | -0.016 | 0.84 (tolerant/benign) |
Mbd4 | ENSMUST00000032469 | C>T | p.Asp129Asn | chr6:115,849,644 | 2.28 | 0.02 (intolerant/deleterious) |
Table: Nonsynonymous mutations in DNA repair genes near the chr6 peak {#tbl:nonsyn-diffs}
We also considered the possibility that expression quantitative trait loci (eQTLs), rather than nonsynonymous mutations, could contribute to the C>A mutator phenotype associated with the locus on chromosome 6. Using GeneNetwork [@PMID:27933521] we mapped eQTLs for the five aforementioned DNA repair genes (as well as Mbd4) in a number of tissues, though we did not have access to expression data from germline cells. Notably, D alleles near the cosine distance peak on chromosome 6 were significantly associated with decreased Ogg1 expression in kidney, liver, hippocampus, and gastrointestinal tissues (Table @tbl:eqtl-results). Although these cis-eQTLs are challenging to interpret (given their tissue specificity and our lack of access to germline expression data), the presence of strong-effect cis-eQTLs for Ogg1 suggests that the C>A mutator phenotype observed in the BXDs may be mediated by regulatory, rather than protein-altering, variants.
Finally, we queried a dataset of structural variants (SVs) identified via high-quality, long-read assembly of inbred laboratory mouse strains [@doi:10.1016/j.xgen.2023.100291] and found 176 large insertions or deletions (>100 bp) within the 90% bootstrap confidence interval around the cosine distance peak on chromosome 6; none overlapped the exonic sequences of protein-coding genes.
One protein-coding gene involved in DNA repair (Rad18) harbored an intronic deletion within the interval on chromosome 6 (chr6:112,629,618-112,636,619); however, additional experimental evidence will be needed to probe the functional impact of this structural variant.
Next, we more precisely characterized the effects of the chromosome 4 and 6 mutator alleles on mutation spectra in the BXDs.
To pinpoint the mutation type(s) that underlied the significant cosine distance peak on chromosome 6, we compared the aggregate counts of each 1-mer mutation type (plus CpG>TpG) on BXD haplotypes with D genotypes at rs27509845
and either D or B genotypes at rs46276051
.
We found that C>A mutations were significantly enriched on BXD haplotypes with D genotypes at the chromosome 6 mutator locus, relative to those with B genotypes (
We also used SigProfilerExtractor [@PMID:36388765] to assign the germline mutations in each BXD to single-base substitution (SBS) mutation signatures from the COSMIC catalog [@PMID:30371878]. Mutation signatures often reflect specific exogenous or endogenous sources of DNA damage, and the proportions of mutations attributable to particular SBS signatures can suggest a genetic or environmental etiology. The SBS1, SBS5, and SBS30 mutation signatures were active in nearly all BXDs, regardless of genotypes at the chromosome 4 and 6 mutator loci (Figure {@fig:spectra-comparison}c). However, the SBS18 signature, which is dominated by C>A mutations and likely reflects unrepaired DNA damage from reactive oxygen species, was almost exclusively active in mice with D alleles at the chromosome 4 locus; the highest SBS18 activity was observed in mice with D alleles at both mutator loci (Figure {@fig:spectra-comparison}c). SBS18 activity was lowest in mice with D alleles at the chromosome 6 mutator locus alone (Figure {@fig:spectra-comparison}c), further demonstrating that D alleles at this locus are not sufficient to cause a mutator phenotype.
To more formally test for statistical epistasis, we fit a generalized (Poisson) linear model predicting counts of C>A mutations in each BXD as a function of genotypes at rs27509845
and rs46276051
(the markers with the largest adjusted cosine distance at the two mutator loci); the model also accounted for differences in inbreeding duration and sequencing coverage between the BXDs (Materials and Methods).
A model that included an interaction term between genotypes at the two markers fit the data significantly better than a model including only additive effects (p = 7.92e-7; Materials and Methods), indicating that the combined effects of D genotypes at both loci exceeded the sum of marginal effects of D genotypes at either locus alone.
{#fig:spectra-comparison width=7.5in}
To explore the effects of the two mutator loci in other inbred laboratory mice, we also compared the germline mutation spectra of Sanger Mouse Genomes Project (MGP) strains [@PMID:21921910]. Dumont [@PMID:30753674] previously identified germline mutations that were private to each of the 29 MGP strains; these private variants likely represent recent de novo mutations (Figure {@fig:spectra-comparison-mgp}). Only two of the MGP strains possess D genotypes at both the chromosome 4 and chromosome 6 mutator loci: DBA/1J and DBA/2J. As before, we tested for epistasis in the MGP strains by fitting two linear models predicting C>A mutation counts as a function of genotypes at the two mutator loci. A model incorporating an interaction term did not fit the MGP data significantly better than a model with additive effects alone (p = 0.806), so we are unable to confirm the signal of epistasis; however, this may be due to the smaller number of MGP strains with de novo germline mutation data.
To determine whether the candidate mutator alleles on chromosome 6 were segregating in natural populations, we queried previously published sequencing data generated from 67 wild-derived mice [@PMID:27622383]. These data include three subspecies of Mus musculus, as well as the outgroup Mus spretus. We found that the Ogg1 D allele was segregating at an allele frequency of 0.259 in Mus musculus domesticus, the species from which C57BL/6J and DBA/2J derive the majority of their genomes [@PMID:17660819], and was fixed in Mus musculus musculus, Mus musculus castaneus, and the outgroup Mus spretus (Figure @fig:wild-afs). The Setmar p.Ser273Arg D allele was also present at an allele frequency of 0.37 in Mus musculus domesticus, while D alleles at the Setmar p.Leu103Phe variant were not observed in any wild Mus musculus domesticus animals. D alleles at the Mbd4 p.Asp129Asn variant were also absent from all wild mouse populations (Figure @fig:wild-afs).