Passenger Hotspots

Code to reproduce the figures and results from the manuscript "Passenger Hotspot Mutations in Cancer" (Hess et al. 2019, https://doi.org/10.1016/j.ccell.2019.08.002).

What's in this repo?

This repository contains:

- MATLAB notebooks to reproduce all of the analyses from the manuscript, in PHS/
- Functions specific to this manuscript that are invoked by the notebooks, in misc_functions/
- Code required to compile and run the log-normal-Poisson (LNP) regression model, in LNP/
- Code required for these analyses but not specific to this manuscript, in funcs/

What's not in this repo?

Any reference data used by these analyses, which total approximately 19 GB. These data are hosted in a Google Cloud Storage bucket (gs://getzlab-passengerhotspots). To download them, install gsutil and run the following command in the root directory of this repo:

gsutil -m cp -r gs://getzlab-passengerhotspots/* ref

Any dbGaP protected data, which will have to be obtained by individuals with appropriate authorization.

The code assumes that all external reference data reside in the ref/ directory and all external mutation data in the mutation_data/ directory. Protected data, when present, are clearly denoted in the code.
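Before starting MATLAB, it can be useful to confirm that the expected data directories are in place. The following is a minimal Python sketch, not part of this repository; the helper name `missing_data_dirs` is hypothetical, and only the two directory names come from the text above.

```python
from pathlib import Path

# Directory names taken from this README; the helper itself is hypothetical
# and is not shipped with the repository.
EXPECTED_DIRS = ("ref", "mutation_data")


def missing_data_dirs(repo_root="."):
    """Return the expected data directories that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for name in EXPECTED_DIRS:
        directory = root / name
        # Treat an empty directory the same as a missing one, since an
        # interrupted download can leave an empty ref/ behind.
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(name)
    return missing
```

Calling `missing_data_dirs()` from the root of this repo returns an empty list when both directories exist and contain at least one entry. It does not verify that the download is complete.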

How to run this code

All notebooks are intended to be run interactively in the MATLAB console. Start MATLAB (any version R2014b or newer should work) in the root directory of this repo; this is necessary for startup.m to properly add dependencies to the MATLAB path. The code has been tested only under 64-bit Linux; other architectures may work after recompiling the C/C++ .mex files.

What's in each notebook?

The notebooks are roughly sorted in the order in which they should be run. However, grouping similar code together takes precedence over having a perfectly linear notebook structure.

Sections of the code that produce figures and tables are clearly denoted in the code as, e.g., "Figure 3A", "Figure S5C", or "Table S1".

00_process_maf
- Take the raw mutation calls (in TSV format) and convert them to a format optimized for downstream analysis.
- Filter regions of low mappability; apply additional panel-of-normals filtering
- Split into subcohorts whose patients' tumors are dominated by specific mutational processes
- Generate Figure S4, plots of the trinucleotide context distributions for mutations in these subcohorts
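For readers unfamiliar with trinucleotide contexts, the following Python sketch illustrates how a single-nucleotide variant can be labeled by its flanking bases, using the `T(C->G)A` notation that appears later in this README. It is an illustration only and is not the MATLAB code in this notebook. The function names, the 0-based coordinate convention, and the folding of purine reference bases onto the pyrimidine strand are assumptions of the sketch.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def context_label(ref_seq, pos, alt):
    """Label the SNV at 0-based index `pos` as 5'(ref->alt)3'.

    Mutations at purine reference bases are reverse complemented so that
    the reference base is always C or T (an assumed convention).
    Returns None when no complete A/C/G/T trinucleotide is available.
    """
    if pos < 1:
        return None
    tri = ref_seq[pos - 1:pos + 2].upper()
    alt = alt.upper()
    if len(tri) != 3 or len(alt) != 1 or set(tri + alt) - set("ACGT"):
        return None
    if alt == tri[1]:
        return None  # not a substitution
    if tri[1] in "AG":
        tri, alt = revcomp(tri), revcomp(alt)
    return f"{tri[0]}({tri[1]}->{alt}){tri[2]}"


def context_spectrum(ref_seq, snvs):
    """Count context labels for an iterable of (pos, alt) pairs."""
    labels = (context_label(ref_seq, pos, alt) for pos, alt in snvs)
    return Counter(label for label in labels if label is not None)
```

For example, a C-to-G change at index 2 of `"TTCAG"` and a G-to-C change at index 1 of `"TGAA"` both receive the label `T(C->G)A`, because the second is folded onto the opposite strand.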
01_process_territory
- Calculate regions of low mappability
- Process regression covariates
- Tabulate sequence context territories
02_run_sig_algos
- Run each of the four significance methods on the mutation dataset
05_compute_LNP_posterior_predictives
- Compute p-values for LNP regression
- Generate Figure S5C, contrasting the effect of incorporating APOBEC3A substrate optimality (Buisson et al. 2019) on LNP p-values
10_sig_analysis_and_q_scatter
- Generate Figure S2A, scatterplots contrasting methods' p-values
12_histogram_figure
- Generate Figure 3A, the observed fraction of synonymous hotspots and expected fractions predicted by each model
- Generate Figure 3B, visualizing probabilities of specific driver/passenger mutations as predicted by each model
20_tabulate_effect_territories
- Tabulate joint sequence context/protein coding effect territories (e.g., number of T(C->G)A mutations that cause synonymous mutations)
21_global_effect_analysis
- Compute expected fraction of protein coding effects given a set of mutations, if those mutations were randomly distributed throughout the exome
- Generate Figure 1, contrasting the observed/expected fraction of protein coding effects for hotspot mutations significant by each model
- Generate Figure S1, showing the distribution of protein coding effects for each trinucleotide context
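One plausible reading of the "expected fraction" calculation above can be sketched in Python as follows. This is an illustration under stated assumptions, not the MATLAB implementation in this notebook: it assumes that, within each sequence context, a mutation is equally likely to land on any site of that context, so the expected effect distribution is the territory-weighted average over the observed contexts. The function name and both input layouts are hypothetical.

```python
def expected_effect_fractions(mutation_counts, territory):
    """Expected fraction of each coding effect under random placement.

    mutation_counts: {context: number of observed mutations}
    territory:       {context: {effect: number of possible sites}}
    Both layouts are assumptions made for this sketch.
    """
    total = sum(mutation_counts.values())
    if total == 0:
        raise ValueError("no mutations supplied")
    expected = {}
    for context, n_mut in mutation_counts.items():
        sites = territory[context]
        n_sites = sum(sites.values())
        if n_sites == 0:
            raise ValueError(f"context {context!r} has no territory")
        for effect, n_effect_sites in sites.items():
            # Each mutation in this context contributes the fraction of the
            # context's territory that produces this effect.
            share = n_mut * n_effect_sites / n_sites
            expected[effect] = expected.get(effect, 0.0) + share
    return {effect: value / total for effect, value in expected.items()}
```

With made-up inputs of 10 mutations in a context whose territory is 25% synonymous and 30 mutations in a context whose territory is 50% synonymous, the expected synonymous fraction is (10 × 0.25 + 30 × 0.5) / 40 = 0.4375.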
22_gene_dNdS_analysis
- Compute somatic dN/dS for each gene (for true/false positive truth sets)
- Generate Table S1
- Generate Table S3
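As background for the dN/dS calculation above, here is a minimal Python sketch of the textbook ratio: nonsynonymous mutations per nonsynonymous site divided by synonymous mutations per synonymous site. It is not the notebook's MATLAB code and makes no attempt to model context-dependent mutation rates, which a real analysis would likely need. The function name and arguments are hypothetical.

```python
def dnds(n_nonsyn, n_syn, sites_nonsyn, sites_syn):
    """Simple dN/dS ratio for one gene.

    n_nonsyn, n_syn:         observed mutation counts
    sites_nonsyn, sites_syn: number of possible mutations of each class
    """
    if sites_nonsyn <= 0 or sites_syn <= 0:
        raise ValueError("site counts must be positive")
    if n_syn == 0:
        raise ValueError("dN/dS is undefined with zero synonymous mutations")
    dn = n_nonsyn / sites_nonsyn  # nonsynonymous rate per site
    ds = n_syn / sites_syn        # synonymous rate per site
    return dn / ds
```

A ratio near 1 is consistent with neutral evolution, whereas a ratio well above 1 suggests positive selection for protein-altering mutations.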
23_ROC-FDR
- Generate Figure 2A, receiver operating characteristic (ROC) analysis of the four methods
- Generate Figure 2B, empirical FDR analysis of the methods
- Generate Figure S2B, ROC plots using an alternate true positive truth set
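The general idea behind an ROC analysis of p-values against a truth set can be sketched in Python as follows. This is a generic illustration, not the notebook's MATLAB code; the function names are hypothetical, and the sketch assumes each tested site carries one p-value and one boolean truth label.

```python
def roc_points(pvalues, is_true_positive):
    """Return (false positive rate, true positive rate) points.

    Sites are called significant in order of increasing p-value; tied
    p-values are admitted together, so they yield a single point.
    """
    pairs = sorted(zip(pvalues, is_true_positive))
    n_pos = sum(1 for _, truth in pairs if truth)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (p, truth) in enumerate(pairs):
        if truth:
            tp += 1
        else:
            fp += 1
        # Emit a point only at the end of a run of tied p-values.
        if i + 1 == len(pairs) or pairs[i + 1][0] != p:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A method that ranks every true positive ahead of every negative has an area of 1, while a method whose p-values carry no information has an area near 0.5.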
24_qq
- Generate Figure S2C, quantile-quantile plots of the four methods' p-values
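For reference, the coordinates of a standard p-value quantile-quantile plot can be computed as in the Python sketch below. It is a generic illustration rather than the notebook's MATLAB code: it assumes continuous p-values that are uniform under the null, uses i/(n+1) as the expected value of the i-th smallest p-value, and plots both axes on a -log10 scale. The function name is hypothetical.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-value pairs, smallest p first."""
    n = len(pvalues)
    observed = sorted(pvalues)
    if n == 0 or observed[0] <= 0 or observed[-1] > 1:
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    # Under a uniform null, the i-th smallest of n p-values has
    # expectation i / (n + 1).
    return [(-math.log10(i / (n + 1)), -math.log10(p))
            for i, p in enumerate(observed, start=1)]
```

Points that lie along the diagonal indicate well-calibrated p-values; points that rise above it at the high end indicate either true signal or inflation.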
30_hypermut
- Generate Figure S5A, contrasting properties of mutational processes in hypermutated vs. non-hypermutated samples
50_signature_analysis
- Generate Figure 4A, visualizing log-normal-Poisson posterior distributions for different mutational processes
- Generate Figure 4B, computing variance explained by each genomic covariate for different mutational processes
- Generate Figure S5B, computing the variance explained by adding XR-seq coverage as a covariate to the UV mutational process
