Genome-wide Rare Variant EnRichment Evaluation

daklab/gruyere


gruyere: Genome-wide Rare Variant EnRichment Evaluation

We developed gruyere, a genome-wide rare variant association test designed to identify trait-associated loci and functional annotations. This repository accompanies our recent preprint: "Leveraging functional annotations to map rare variants associated with Alzheimer’s disease with gruyere."

[Figure: gruyere model]

Installation

gruyere is written in Python. You can install gruyere and its required dependencies with the following commands:

git clone https://github.com/daklab/gruyere.git
cd gruyere
pip install -r requirements.txt
# or, alternatively, with conda:
conda create --name gruyere --file requirements.txt
conda activate gruyere

Overview: Inputs and Outputs

Model Inputs

  • G: Genotypes for N individuals and P variants [P x N]. The index should contain the name of the gene each variant maps to, optionally combined with a variant ID as "gene_variantID".
  • Z: Functional annotations for P variants and Q annotations [P x Q]. The index follows the same convention as G: the gene name, optionally combined with a variant ID as "gene_variantID".
  • XY: Individual-level data for N individuals: C covariates [N x C] plus a "Diagnosis" column holding the phenotype, which may be binary or continuous.
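As an illustration of the orientations listed above, this toy sketch builds G, Z and XY as pandas DataFrames: genotypes and annotations share "gene_variantID" row labels, and the individuals in the columns of G match the rows of XY. The variant IDs, the covariate and annotation column names (age, sex, annot_A, annot_B) and the helper gene_of are hypothetical, and the on-disk file format gruyere's loaders expect is not described here, so treat this only as a check of shape and alignment.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
individuals = [f"ind{i}" for i in range(6)]          # N = 6 (toy data)
variants = ["APOE_var1", "APOE_var2", "TREM2_var1"]  # P = 3, hypothetical "gene_variantID" labels

# G: genotypes, one row per variant, one column per individual [P x N].
G = pd.DataFrame(rng.integers(0, 3, size=(len(variants), len(individuals))),
                 index=variants, columns=individuals)

# Z: functional annotations, one row per variant [P x Q]; column names are hypothetical.
Z = pd.DataFrame(rng.random((len(variants), 2)),
                 index=variants, columns=["annot_A", "annot_B"])

# XY: covariates [N x C] plus the "Diagnosis" phenotype column, one row per individual.
XY = pd.DataFrame({"age": rng.integers(60, 90, size=len(individuals)),
                   "sex": rng.integers(0, 2, size=len(individuals)),
                   "Diagnosis": [0, 1, 0, 1, 1, 0]},
                  index=individuals)


def gene_of(label):
    """Hypothetical helper: gene name from a "gene_variantID" label.

    Assumes the gene name itself contains no underscore.
    """
    return label.split("_", 1)[0]


# Orientation and alignment checks.
assert list(G.index) == list(Z.index)     # same variants, same order
assert list(G.columns) == list(XY.index)  # same individuals, same order

# Number of variants per gene (groupby with a callable applies it to each index label).
print(G.groupby(gene_of).size().to_dict())
```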

Model Outputs (Joint analysis)

  • alpha.csv: Learned covariate weights by gene
  • tau.csv: Learned genome-wide annotation weights
  • wg.csv: Learned gene weights (mean and standard deviation)
  • losses.txt: Loss per epoch
  • train_performance.csv: AUC and accuracy of predictions by gene on the training set
  • test_performance.csv: AUC and accuracy of predictions by gene on the held-out test set (optional; only produced when a test set is used)
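One quick way to check whether the joint model has converged is to look at losses.txt. The sketch assumes the file holds one numeric loss per line (the exact format is not documented here), and read_losses, relative_change and the default window of 10 epochs are hypothetical choices rather than part of gruyere.

```python
from pathlib import Path


def read_losses(path):
    """Read losses.txt, assuming one plain numeric loss value per line."""
    return [float(tok) for tok in Path(path).read_text().split()]


def relative_change(losses, window=10):
    """Relative change in mean loss between the last two windows of `window` epochs.

    A value close to zero suggests the loss has plateaued.
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window epochs of losses")
    previous = sum(losses[-2 * window:-window]) / window
    latest = sum(losses[-window:]) / window
    if previous == 0:
        raise ZeroDivisionError("mean loss of the earlier window is zero")
    return abs(latest - previous) / abs(previous)
```

For example, `relative_change(read_losses("../example_outputs/losses.txt"))` would use the output path from the example configuration; what counts as "small enough" is a judgment call.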

Model Outputs (Per-gene analysis)

  • pvals_chr{chromosome}.csv: Gene p-values and coefficients for all genes on the chromosome
  • preds_chr{chromosome}.csv: Individual-level predictions for all genes on the chromosome
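Because the per-gene outputs are written one file per chromosome, a genome-wide table has to be assembled afterwards. A minimal pandas sketch, assuming the first CSV column is the gene index (the column layout is not documented here); collect_per_chromosome is a hypothetical helper that works for either the pvals or preds prefix.

```python
from pathlib import Path

import pandas as pd


def collect_per_chromosome(output_dir, prefix="pvals"):
    """Stack {prefix}_chr{chromosome}.csv files from output_dir into one DataFrame."""
    frames = []
    for path in sorted(Path(output_dir).glob(f"{prefix}_chr*.csv")):
        chrom = path.stem[len(f"{prefix}_chr"):]     # e.g. "1", "22"
        df = pd.read_csv(path, index_col=0)          # assumes the first column is the gene index
        df.insert(0, "chromosome", chrom)            # record which file each row came from
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no {prefix}_chr*.csv files in {output_dir}")
    return pd.concat(frames)
```

Files are read in lexicographic order (chr1, chr10, chr2, ...); sort the result by the chromosome column if a natural order matters.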

Example:

  • example_data/inputs.yaml contains example inputs:
---
output: '../example_outputs/' # Path where outputs are saved
XY: '../example_data/XY.csv' # File with covariates (X) and phenotypes (Y)
G: '../example_data/genotypes/' # Path to genotypes, per chromosome
Z: '../example_data/annotations/' # Path to annotations, per chromosome
epochs: 300 # Number of training epochs
n_samples: 50 # Number of times to sample the posterior to determine mean/standard deviation estimates
test_prop: 0.2 # Test set proportion
lr: 0.1 # Learning rate
genes: '../example_data/joint_analysis_genes.txt' # List of genes to perform joint analysis on (we use FST-significant genes)
simulate: False 
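To sanity-check a configuration before a long run, it can be loaded with PyYAML. This is a sketch under assumptions: this README does not say how gruyere parses the file, load_config is a hypothetical helper, and the required-key set is simply the keys in the example above (gruyere itself may treat some of them as optional).

```python
import yaml  # PyYAML

# Keys present in example_data/inputs.yaml; treating all of them as required is an assumption.
REQUIRED_KEYS = {"output", "XY", "G", "Z", "epochs", "n_samples",
                 "test_prop", "lr", "genes", "simulate"}


def load_config(path):
    """Load an inputs.yaml file and check the keys and the test-set proportion."""
    with open(path) as fh:
        cfg = yaml.safe_load(fh)
    if not isinstance(cfg, dict):
        raise ValueError(f"{path} does not contain a YAML mapping")
    missing = REQUIRED_KEYS - set(cfg)
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")
    if not 0 <= cfg["test_prop"] < 1:
        raise ValueError("test_prop must be in [0, 1)")
    return cfg
```

Note that safe_load parses `simulate: False` as the Python boolean False and ignores the trailing `#` comments.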

Scripts

  • models.py: contains the gruyere model class
  • data_class.py: processes input data and stores it as a dataclass object
  • load_data.py: functions to load input data
  • utils.py: utility functions
  • performance.py: calculates AUROC and accuracy on gruyere predictions
  • gruyere_joint.py: fits the joint gruyere model
  • gruyere_pergene.py: fits the per-gene gruyere regressions
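For reference, the two metrics reported in the performance files can be computed as follows. This is an independent NumPy sketch, not the code in performance.py: AUROC is computed through the Mann-Whitney formulation (the fraction of case/control pairs ranked correctly, with ties counted as one half), and accuracy uses a 0.5 threshold on predicted probabilities, which is an assumption.

```python
import numpy as np


def auroc(y_true, scores):
    """AUROC as the probability that a random case outscores a random control."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    # Compare every positive score with every negative score (O(n_pos * n_neg) memory).
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of individuals whose thresholded prediction matches the label."""
    y_true = np.asarray(y_true, dtype=bool)
    preds = np.asarray(probs, dtype=float) >= threshold
    return float((preds == y_true).mean())


labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, probs), accuracy(labels, probs))  # 0.75 0.75
```

The pairwise comparison is easy to verify but scales quadratically; for large cohorts a rank-based implementation is preferable.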

Run gruyere

python src/gruyere_joint.py example_data/inputs.yaml
python src/gruyere_pergene.py example_data/inputs.yaml $CHR # For each chromosome
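The per-gene script is run once per chromosome. A hypothetical Python driver for that loop, assuming the chromosome is passed as a bare number (e.g. 1 rather than chr1), that the scripts are invoked from the repository root as in the commands above, and that the data cover autosomes 1-22; pergene_commands and run_pergene are illustrative names, not part of gruyere.

```python
import subprocess
import sys


def pergene_commands(config_path, chromosomes):
    """Build one gruyere_pergene.py command per chromosome."""
    return [[sys.executable, "src/gruyere_pergene.py", config_path, str(chrom)]
            for chrom in chromosomes]


def run_pergene(config_path, chromosomes, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in pergene_commands(config_path, chromosomes):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first chromosome that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Restrict this range to the chromosomes actually present in the G and Z directories.
    run_pergene("example_data/inputs.yaml", range(1, 23), dry_run=True)
```

Since each chromosome is a separate invocation writing its own pvals_chr{chromosome}.csv and preds_chr{chromosome}.csv, the runs can also be submitted as separate cluster jobs instead of a sequential loop.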
