forked from calico/borzoi

RNA-seq prediction with deep convolutional neural networks.

othertea/borzoi

Borzoi - Predicting RNA-seq from DNA Sequence

This repository contains the code for the Borzoi models, convolutional neural networks trained to predict RNA-seq coverage at 32 bp resolution from 524 kb input sequences. The model is described in the following bioRxiv preprint:

https://www.biorxiv.org/content/10.1101/2023.08.30.555582v1
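
To make the resolution figures concrete, the sketch below works out how many 32 bp bins tile one input window and which bin a given position falls into. It assumes that "524kb" means 2**19 = 524,288 bp and ignores any cropping the model may apply to its output; the function names are illustrative and are not part of the borzoi code.

```python
# Assumption: "524kb" is read as 2**19 = 524,288 bp; the text only says 524kb.
SEQ_LEN = 2 ** 19
BIN_SIZE = 32  # bp per output bin, as stated in the description


def num_bins(seq_len: int = SEQ_LEN, bin_size: int = BIN_SIZE) -> int:
    """Number of whole bins tiling the input, ignoring any output cropping."""
    return seq_len // bin_size


def bin_index(offset: int, bin_size: int = BIN_SIZE) -> int:
    """Index of the bin containing a 0-based offset into the input window."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // bin_size


print(num_bins())         # 16384
print(bin_index(262144))  # 8192, the bin that starts at the window midpoint
```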

Borzoi was trained on a large set of RNA-seq experiments from ENCODE and GTEx, as well as re-processed versions of the original Enformer training data (including ChIP-seq and DNase data from ENCODE, ATAC-seq data from CATlas, and CAGE data from FANTOM5). The repository links to a list of the experiments used for training.

The repository contains example usage code (including jupyter notebooks for predicting and visualizing genetic variants) as well as links for downloading model weights, training data, QTL benchmark tasks, etc.

Contact drk (at) calicolabs.com or jlinder (at) calicolabs.com with questions about the model or data.

Installation

Borzoi depends on the baskerville repository, which can be installed by issuing the following commands:

git clone https://github.com/calico/baskerville.git
cd baskerville
pip install -e .

Next, install the borzoi repository by issuing the following commands:

git clone https://github.com/calico/borzoi.git
cd borzoi
pip install -e .

These repositories further depend on a number of Python packages, which are installed automatically with borzoi. See setup.cfg for a complete list, including the most important version dependencies.

Note: The example notebooks require jupyter, which can be installed with pip install notebook.
A new conda environment can be created with conda create -n borzoi_py39 python=3.9.
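
After both installs, a quick sanity check can confirm that the environment looks as expected. The sketch below is a minimal example, assuming that the importable package names match the repository names (`baskerville` and `borzoi`) and that Python 3.9 is the target version, as the conda command suggests. The `check_environment` helper is hypothetical and not part of either repository.

```python
import importlib.util
import sys


def check_environment(packages=("baskerville", "borzoi"), python=(3, 9)):
    """Report whether the interpreter version and packages match expectations."""
    report = {"python_ok": sys.version_info[:2] == tuple(python)}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'needs attention'}")
```

Run it inside the activated conda environment; any entry marked "needs attention" points at a missing package or an unexpected interpreter version.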

Model Availability

The model weights can be downloaded as .h5 files from the following URLs:

Borzoi V2 Cross-fold 0
Borzoi V2 Cross-fold 1
Borzoi V2 Cross-fold 2
Borzoi V2 Cross-fold 3
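
Since the weights are distributed as .h5 (HDF5) files, a cheap first check after downloading is to confirm that each file starts with the HDF5 format signature. The sketch below uses only the standard library; the file name in the usage note is a placeholder, and the check does not validate the file contents. It only catches cases such as an HTML error page saved under an .h5 name.

```python
from pathlib import Path

# The 8-byte signature that opens an HDF5 file with no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path) -> bool:
    """Return True if the file starts with the HDF5 format signature."""
    with Path(path).open("rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

For example, `looks_like_hdf5("fold0_weights.h5")` (a hypothetical local file name) should return `True` for a correctly downloaded weights file.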

Mini Borzoi Models

We have trained a collection of smaller model instances, either on various subsets of data modalities or on all data modalities but with architectural changes relative to the original architecture. For example, some models are trained only on RNA-seq data, while others are trained on DNase-, ATAC- and RNA-seq data. Similarly, some model instances are trained on human-only data, while others are trained on human and mouse data. The models are available at the URL below:

Mini Borzoi Model Collection

For example, here are the weights, targets, and parameter file of a model trained on K562 RNA-seq:

Borzoi K562 RNA-seq Fold 0
Borzoi K562 RNA-seq Fold 1
Borzoi K562 RNA-seq Targets
Borzoi K562 RNA-seq Parameters

Data Availability

The training data for Borzoi can be downloaded from the following URL:

Borzoi V2 Training Data

Note: This data bucket is very large and thus set to "Requester Pays".
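
"Requester Pays" means the transfer is billed to the person downloading, so the download command has to name a billing project. The sketch below builds such a command under the assumption that the bucket is hosted on Google Cloud Storage and accessed with `gsutil`; the text above does not state the provider. The bucket URL and project ID are placeholders and the helper name is hypothetical.

```python
import shlex


def requester_pays_copy_command(source_url, destination, billing_project):
    """Build a gsutil copy command that bills the transfer to billing_project."""
    if not source_url.startswith("gs://"):
        raise ValueError("expected a gs:// URL")
    # -u names the project billed for the request; -m runs the copy in parallel.
    return ["gsutil", "-m", "-u", billing_project, "cp", "-r", source_url, destination]


# Placeholder values: substitute the real bucket URL and your own project ID.
cmd = requester_pays_copy_command("gs://example-bucket/data", "./data", "my-project")
print(shlex.join(cmd))
```

Printing the command first, instead of running it, gives a chance to review what will be billed before starting a very large transfer.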

QTL Availability

The curated e-/s-/pa-/ipaQTL benchmarking data can be downloaded from the following URLs:

eQTL Data
sQTL Data
paQTL Data
ipaQTL Data

Example Notebooks

The following notebooks contain example code for predicting and interpreting genetic variants.

Notebook 1a: Interpret eQTL SNP (expression)
Notebook 1b: Interpret sQTL SNP (splicing)
Notebook 1c: Interpret paQTL SNP (polyadenylation)
Notebook 1d: Interpret ipaQTL SNP (splicing and polya)
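
The general idea behind scoring a SNP with a sequence model can be sketched as follows: predict coverage for the reference sequence, predict again with the alternate allele substituted in, and compare the two tracks. This is an illustrative sketch only, not the notebooks' actual procedure. `toy_predict` is a stand-in that counts G/C bases per bin in place of a trained model, and the log2 fold change of total coverage is one possible summary statistic chosen for illustration.

```python
import math


def apply_snp(sequence, offset, ref, alt):
    """Return the sequence with the alternate allele placed at a 0-based offset."""
    if sequence[offset] != ref:
        raise ValueError(
            f"expected {ref!r} at offset {offset}, found {sequence[offset]!r}"
        )
    return sequence[:offset] + alt + sequence[offset + 1:]


def log2_fold_change(ref_track, alt_track, pseudocount=1.0):
    """Compare total predicted coverage between the two alleles."""
    ref_total = sum(ref_track) + pseudocount
    alt_total = sum(alt_track) + pseudocount
    return math.log2(alt_total / ref_total)


def toy_predict(sequence, bin_size=4):
    """Stand-in for a trained model: counts G/C bases per bin. Not Borzoi."""
    return [
        sum(base in "GC" for base in sequence[start:start + bin_size])
        for start in range(0, len(sequence), bin_size)
    ]


ref_seq = "ACGTACGTACGT"
alt_seq = apply_snp(ref_seq, offset=4, ref="A", alt="G")
score = log2_fold_change(toy_predict(ref_seq), toy_predict(alt_seq))
print(alt_seq, round(score, 3))
```

Checking the reference allele before substituting guards against off-by-one coordinate errors, which are easy to make when converting between 1-based variant positions and 0-based string offsets.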
