Commit

Revamp manuscripts documentation
nsheff committed Jun 13, 2024
1 parent 20ae12f commit a98c3fc
Showing 7 changed files with 36 additions and 11 deletions.
@@ -4,4 +4,4 @@

This paper was our first publication showing how to build and evaluate region set embeddings using region-set2vec, based on word2vec.

-See: [train Region2Vec embeddings](../tutorials/region2vec.md)
+See: [train Region2Vec embeddings](../geniml/tutorials/region2vec.md)
@@ -10,5 +10,5 @@ As available genomic interval data increase in scale, we require fast systems to

This paper trained BEDspace models (using StarSpace with BED files). See these tutorials:

-- [How to use BEDSpace to jointly embed regions and metadata](../tutorials/bedspace.md)
+- [How to use BEDSpace to jointly embed regions and metadata](../geniml/tutorials/bedspace.md)

14 changes: 14 additions & 0 deletions docs/manuscripts/gu2021.md
@@ -0,0 +1,14 @@
+# Bedshift: perturbation of genomic interval sets
+
+Paper: [Manuscript at Genome Biology](https://doi.org/10.1186/s13059-021-02440-w)
+
+
+## Abstract
+
+Functional genomics experiments, like ChIP-Seq or ATAC-Seq, produce results that are summarized as a region set. There is no way to objectively evaluate the effectiveness of region set similarity metrics. We present Bedshift, a tool for perturbing BED files by randomly shifting, adding, and dropping regions from a reference file. The perturbed files can be used to benchmark similarity metrics, as well as for other applications. We highlight differences in behavior between metrics, such as that the Jaccard score is most sensitive to added or dropped regions, while coverage score is most sensitive to shifted regions.
+
+## Relevant tutorials
+
+Analysis from the paper is described in these tutorials:
+
+- [Randomizing BED files with BEDshift](../geniml/tutorials/bedshift.md)
@@ -8,3 +8,11 @@ Paper: [Manuscript at bioRxiv](http://dx.doi.org/10.1101/2023.08.01.551452)
**Motivation** Data from the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is now widely available. One major computational challenge is dealing with high dimensionality and inherent sparsity, which is typically addressed by producing lower-dimensional representations of single cells for downstream clustering tasks. Current approaches produce such individual cell embeddings directly through a one-step learning process. Here, we propose an alternative approach by building embedding models pre-trained on reference data. We argue that this provides a more flexible analysis workflow that also has computational performance advantages through transfer learning.

**Results** We implemented our approach in scEmbed, an unsupervised machine learning framework that learns low-dimensional embeddings of genomic regulatory regions to represent and analyze scATAC-seq data. scEmbed performs well in terms of clustering ability and has the key advantage of learning patterns of region co-occurrence that can be transferred to other, unseen datasets. Moreover, pre-trained models on reference data can be exploited to build fast and accurate cell-type annotation systems without the need for other data modalities. scEmbed is implemented in Python and it is available to download from GitHub. We also make our pre-trained models available on huggingface for public use.
+
+## Relevant tutorials
+
+Analysis from the paper is described in these tutorials:
+
+- [Train single-cell embeddings](../geniml/tutorials/train-scembed-model.md)
+- [Populate a vector store](../geniml/tutorials/load-qdrant-with-cell-embeddings.md)
+- [Predict cell-types using KNN](../geniml/tutorials/cell-type-annotation-with-knn.md)
@@ -17,11 +17,11 @@ This paper published 2 types of method: 1. Methods to *construct* a universe, an

You can construct a universe either on the command line, or using geniml as a library:

-- [Create consensus peaks with CLI](../tutorials/create-consensus-peaks.md)
-- [Create consensus peaks with Python](../code/create-consensus-peaks-python.md)
+- [Create consensus peaks with CLI](../geniml/tutorials/create-consensus-peaks.md)
+- [Create consensus peaks with Python](../geniml/code/create-consensus-peaks-python.md)

### 2. Evaluating a universe

The main methods are implemented in the `assess-universe` model with tutorial:

-- [Assess universe fit tutorial](../tutorials/assess-universe.md)
+- [Assess universe fit tutorial](../geniml/tutorials/assess-universe.md)
@@ -9,4 +9,6 @@ Representation learning models have become a mainstay of modern genomics. These

## Relevant tutorials

-To evaluate, refer to this tutorial: https://github.com/databio/region2vec_eval
+Analysis from the paper is described in these tutorials:
+
+- [How to evaluate embeddings](../geniml/tutorials/evaluation.md)
11 changes: 6 additions & 5 deletions mkdocs.yml
@@ -119,11 +119,12 @@ nav:
- How to cite:
- How to cite: citations.md
- Published manuscripts:
-- Gharavi et al. 2021: geniml/manuscripts/gharavi2021.md
-- Rymuza et al. 2024: geniml/manuscripts/rymuza2024.md
-- Gharavi et al. 2024: geniml/manuscripts/gharavi2024.md
-- LeRoy et al. 2024: geniml/manuscripts/leroy2024.md
-- Zheng et al. 2024: geniml/manuscripts/zheng2024.md
+- Gharavi et al. 2021: manuscripts/gharavi2021.md
+- Gu et al. 2021: manuscripts/gu2021.md
+- Rymuza et al. 2024: manuscripts/rymuza2024.md
+- Gharavi et al. 2024: manuscripts/gharavi2024.md
+- LeRoy et al. 2024: manuscripts/leroy2024.md
+- Zheng et al. 2024: manuscripts/zheng2024.md

autodoc:
jupyter: