
Commit

Merge branch 'release/v4.3.1'
ACEnglish committed Sep 9, 2024
2 parents 812bb3e + 0f02337 commit e6a49e7
Showing 64 changed files with 2,658 additions and 107 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -2,7 +2,7 @@
[![pylint](imgs/pylint.svg)](https://github.com/acenglish/truvari/actions/workflows/pylint.yml)
[![FuncTests](https://github.com/acenglish/truvari/actions/workflows/func_tests.yml/badge.svg?branch=develop&event=push)](https://github.com/acenglish/truvari/actions/workflows/func_tests.yml)
[![coverage](imgs/coverage.svg)](https://github.com/acenglish/truvari/actions/workflows/func_tests.yml)
[![develop](https://img.shields.io/github/commits-since/acenglish/truvari/v4.2.2)](https://github.com/ACEnglish/truvari/compare/v4.2.2...develop)
[![develop](https://img.shields.io/github/commits-since/acenglish/truvari/v4.3.0)](https://github.com/ACEnglish/truvari/compare/v4.3.0...develop)
[![Downloads](https://static.pepy.tech/badge/truvari)](https://pepy.tech/project/truvari)

![Logo](https://raw.githubusercontent.com/ACEnglish/truvari/develop/imgs/BoxScale1_DarkBG.png)
@@ -52,6 +52,7 @@ Use Truvari's comparison engine to consolidate redundant variants in a merged mu
- [segment](https://github.com/acenglish/truvari/wiki/segment) - Normalization of SVs into disjointed genomic regions
- [stratify](https://github.com/acenglish/truvari/wiki/stratify) - Count variants per-region in vcf
- [divide](https://github.com/ACEnglish/truvari/wiki/divide) - Divide a VCF into independent shards
- [ga4gh](https://github.com/ACEnglish/truvari/wiki/ga4gh) - Consolidate benchmarking result VCFs

## 🔎 More Information

30 changes: 30 additions & 0 deletions docs/v4.3.1/Citations.md
@@ -0,0 +1,30 @@
# Citing Truvari

English, A.C., Menon, V.K., Gibbs, R.A. et al. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol 23, 271 (2022). https://doi.org/10.1186/s13059-022-02840-6

# Citations

List of publications using Truvari. Most of these are just pulled from a [Google Scholar Search](https://scholar.google.com/scholar?q=truvari). Please post in the [show-and-tell](https://github.com/spiralgenetics/truvari/discussions/categories/show-and-tell) to have your publication added to the list.
* [A robust benchmark for detection of germline large deletions and insertions](https://www.nature.com/articles/s41587-020-0538-8)
* [Leveraging a WGS compression and indexing format with dynamic graph references to call structural variants](https://www.biorxiv.org/content/10.1101/2020.04.24.060202v1.abstract)
* [Duphold: scalable, depth-based annotation and curation of high-confidence structural variant calls](https://academic.oup.com/gigascience/article/8/4/giz040/5477467?login=true)
* [Parliament2: Accurate structural variant calling at scale](https://academic.oup.com/gigascience/article/9/12/giaa145/6042728)
* [Learning What a Good Structural Variant Looks Like](https://www.biorxiv.org/content/10.1101/2020.05.22.111260v1.full)
* [Long-read trio sequencing of individuals with unsolved intellectual disability](https://www.nature.com/articles/s41431-020-00770-0)
* [lra: A long read aligner for sequences and contigs](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009078)
* [Samplot: a platform for structural variant visual validation and automated filtering](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02380-5)
* [AsmMix: A pipeline for high quality diploid de novo assembly](https://www.biorxiv.org/content/10.1101/2021.01.15.426893v1.abstract)
* [Accurate chromosome-scale haplotype-resolved assembly of human genomes](https://www.nature.com/articles/s41587-020-0711-0)
* [Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome](https://www.nature.com/articles/s41587-019-0217-9)
* [NPSV: A simulation-driven approach to genotyping structural variants in whole-genome sequencing data](https://academic.oup.com/bioinformatics/article-abstract/37/11/1497/5466452)
* [SVIM-asm: structural variant detection from haploid and diploid genome assemblies](https://academic.oup.com/bioinformatics/article/36/22-23/5519/6042701?login=true)
* [Readfish enables targeted nanopore sequencing of gigabase-sized genomes](https://www.nature.com/articles/s41587-020-00746-x)
* [stLFRsv: A Germline Structural Variant Analysis Pipeline Using Co-barcoded Reads](https://internal-journal.frontiersin.org/articles/10.3389/fgene.2021.636239/full)
* [Long-read-based human genomic structural variation detection with cuteSV](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02107-y)
* [An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates](https://f1000research.com/articles/10-246)
* [Paragraph: a graph-based structural variant genotyper for short-read sequence data](https://link.springer.com/article/10.1186/s13059-019-1909-7)
* [Genome-wide investigation identifies a rare copy-number variant burden associated with human spina bifida](https://www.nature.com/articles/s41436-021-01126-9)
* [TT-Mars: Structural Variants Assessment Based on Haplotype-resolved Assemblies](https://www.biorxiv.org/content/10.1101/2021.09.27.462044v1.abstract)
* [An ensemble deep learning framework to refine large deletions in linked-reads](https://www.biorxiv.org/content/10.1101/2021.09.27.462057v1.abstract)
* [MAMnet: detecting and genotyping deletions and insertions based on long reads and a deep learning approach](https://academic.oup.com/bib/advance-article-abstract/doi/10.1093/bib/bbac195/6587170)
* [Automated filtering of genome-wide large deletions through an ensemble deep learning framework](https://www.sciencedirect.com/science/article/pii/S1046202322001712#b0110)
23 changes: 23 additions & 0 deletions docs/v4.3.1/Collapse-on-Regions-with-a-High‐Density-of-SVs.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
Some regions of the reference genome can give rise to a huge number of SVs. These regions are typically in telomeres or centromeres, especially on chm13. Furthermore, multi-sample VCFs that contain samples with a high genetic distance from the reference can also produce an unreasonable number of SVs. These 'high-density' regions can cause difficulties for `truvari collapse` when there are too many comparisons that need to be made.

If you find `truvari collapse` 'freezing' during processing where it is no longer writing variants to the output VCFs, you should investigate if there are regions which should be excluded from analysis. To do this, first run:

```
truvari anno chunks input.vcf.gz > counts.bed
```

The `counts.bed` will have the chromosome, start, and end position of each chunk in the VCF as well as three additional columns. The 4th column has the number of SVs inside the chunk, while the 5th and 6th hold comma-separated counts of the number of variants in 'sub-chunks'. These sub-chunk counts correspond to more advanced separation of the variants beyond just their window: the 5th column is after separating by svtype and svlen, and the 6th is after re-chunking the svtype/svlen separation by distance.

If you find spans in the `counts.bed` with a huge number of SVs, these are prime candidates for exclusion to speed up `truvari collapse`. To exclude them, you first subset the regions of interest and then use `bedtools` to create a bed file that will skip them.

```
# exclude regions with 30k or more SVs. You should test this threshold.
awk '$4 >= 30000' counts.bed > to_exclude.bed
bedtools complement -g genome.bed -i to_exclude.bed > to_analyze.bed
truvari collapse --bed to_analyze.bed -i input.vcf.gz
```
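
To make explicit what the complement step produces, here is a minimal Python sketch of the interval logic. It is an illustration only, not a replacement for `bedtools`; the function name `complement` and the in-memory inputs (a dict of chromosome sizes and a list of excluded intervals) are assumptions made for this example.

```python
# Sketch of what a "complement" of excluded regions looks like.
# `chrom_sizes` and `excluded` are hypothetical in-memory stand-ins for the
# genome file and to_exclude.bed used in the commands above.

def complement(chrom_sizes, excluded):
    """Return (chrom, start, end) intervals NOT covered by `excluded`."""
    by_chrom = {}
    for chrom, start, end in excluded:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, size in chrom_sizes.items():
        cursor = 0  # first position not yet accounted for
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > cursor:
                kept.append((chrom, cursor, start))
            # max() handles overlapping or nested excluded intervals
            cursor = max(cursor, end)
        if cursor < size:
            kept.append((chrom, cursor, size))
    return kept


if __name__ == "__main__":
    sizes = {"chr1": 1000, "chr2": 500}
    skip = [("chr1", 100, 200), ("chr1", 150, 300)]
    for region in complement(sizes, skip):
        print(*region, sep="\t")
```

Every chromosome with no excluded interval is kept whole, which is why the genome file has to list all chromosomes you still want analyzed.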

When considering what threshold you would like to use, just looking at the total SV count per chunk may not be sufficient, as the 'sub-chunks' may be smaller and will therefore run faster. There is also no guarantee that a region with high SV density will be slow. For example, if there are 100k SVs in a window but they could all collapse, it would only take O(N - 1) comparisons to perform the collapse. The problems arise when the 100k SVs contain few redundant SVs and therefore require O(N**2) comparisons.
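
The gap between those two cases can be estimated from the sub-chunk counts. The sketch below is a rough, hypothetical estimate and not how `truvari collapse` schedules its work: it assumes tab-separated `counts.bed` lines whose last column holds the comma-separated sub-chunk counts, and it treats each sub-chunk as needing between N - 1 comparisons (everything collapses) and N(N - 1)/2 comparisons (nothing collapses).

```python
# Rough comparison-count bounds for one line of `truvari anno chunks` output.
# Assumption: the last tab-separated column is a comma-separated list of
# sub-chunk sizes. The function name is hypothetical.

def comparison_bounds(bed_line):
    """Return (best_case, worst_case) comparison estimates for a chunk."""
    fields = bed_line.rstrip("\n").split("\t")
    sub_counts = [int(x) for x in fields[-1].split(",") if x]
    # Best case: each sub-chunk collapses into a single variant.
    best = sum(max(n - 1, 0) for n in sub_counts)
    # Worst case: nothing is redundant, so every pair gets compared.
    worst = sum(n * (n - 1) // 2 for n in sub_counts)
    return best, worst


if __name__ == "__main__":
    # Hypothetical chunk: 100k SVs that fall into two sub-chunks.
    line = "chr1\t0\t50000\t100000\t60000,40000\t60000,40000"
    best, worst = comparison_bounds(line)
    print(f"best case: {best:,} comparisons, worst case: {worst:,}")
```

Sorting chunks by the worst-case figure, rather than by total SV count, is one way to shortlist candidate regions before testing a threshold.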

A conservative workflow for figuring out which regions to exclude would be to run `truvari collapse` without a `--bed`, wait for it to 'freeze', and then check whether the last variant written out by `truvari collapse` is just before a high-density chunk in the `counts.bed`. That region could then be excluded, and collapsing could be repeated until it succeeds.
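
The lookup in that workflow can be sketched in a few lines of Python. This is a hypothetical helper, not part of Truvari: it assumes the chunks have already been read into `(chrom, start, end, sv_count)` tuples and that you know the chromosome and position of the last variant written before the freeze.

```python
# Find the dense chunk that follows the last variant written before a freeze.
# All names here are hypothetical; `min_svs` is a threshold you choose.

def read_chunks(lines):
    """Parse tab-separated chunk lines into (chrom, start, end, sv_count)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        yield fields[0], int(fields[1]), int(fields[2]), int(fields[3])


def next_dense_chunk(chunks, chrom, pos, min_svs):
    """Return the first chunk on `chrom` ending after `pos` with >= min_svs SVs."""
    candidates = [
        c for c in chunks
        if c[0] == chrom and c[2] > pos and c[3] >= min_svs
    ]
    return min(candidates, key=lambda c: c[1], default=None)


if __name__ == "__main__":
    bed = [
        "chr1\t0\t100\t5\t5\t5",
        "chr1\t200\t300\t50000\t50000\t50000",
        "chr1\t400\t500\t70000\t70000\t70000",
    ]
    print(next_dense_chunk(list(read_chunks(bed)), "chr1", 150, 30000))
```

The returned interval is the one to add to the exclusion bed before rerunning the collapse.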
90 changes: 90 additions & 0 deletions docs/v4.3.1/Development.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
# Truvari API
Many of the helper methods/objects are documented such that developers can reuse truvari in their own code. To see developer documentation, visit [readthedocs](https://truvari.readthedocs.io/en/latest/).

Documentation can also be seen using
```python
import truvari
help(truvari)
```

# docker

A Dockerfile exists to build an image of Truvari. To make a Docker image, clone the repository and run
```bash
docker build -t truvari .
```

You can then run Truvari through docker using
```bash
docker run -v `pwd`:/data -it truvari
```
Here, `pwd` can be replaced with whatever directory you'd like to mount in the container at the path `/data/`, which is the working directory for the Truvari run. You can provide parameters directly to the entry point.
```bash
docker run -v `pwd`:/data -it truvari anno svinfo -i example.vcf.gz
```

If you'd like to work interactively inside the docker container, for things like running the CI/CD scripts, override the entry point:
```bash
docker run -v `pwd`:/data --entrypoint /bin/bash -it truvari
```
You'll now be inside the container and can run FuncTests or run Truvari directly
```bash
bash repo_utils/truvari_ssshtests.sh
truvari anno svinfo -i example.vcf.gz
```

# CI/CD

Scripts that help ensure the tool's quality. Extra dependencies need to be installed in order to run Truvari's CI/CD scripts.

```bash
pip install pylint anybadge coverage
```

Check code formatting with
```bash
python repo_utils/pylint_maker.py
```
We use [autopep8](https://pypi.org/project/autopep8/) (via [vim-autopep8](https://github.com/tell-k/vim-autopep8)) for formatting.

Test the code and generate a coverage report with
```bash
bash repo_utils/truvari_ssshtests.sh
```

Truvari leverages GitHub Actions to perform these checks when new code is pushed to the repository. We've noticed that the actions sometimes hang through no fault of the code. If this happens, cancel and resubmit the job. Once FuncTests are successful, the workflow uploads an artifact of the `coverage html` report, which you can download to see a line-by-line accounting of test coverage.

# git flow

To organize the commits for the repository, we use [git-flow](https://danielkummer.github.io/git-flow-cheatsheet/). Therefore, `develop` is the default branch, the latest tagged release is on `master`, and new, in-development features are within `feature/<name>`.

When contributing to the code, be sure you're working off of develop and have run `git flow init`.

# versioning

Truvari uses [Semantic Versioning](https://semver.org/) and tries to stay compliant with [PEP440](https://peps.python.org/pep-0440/). As of v3.0.0, a single version is kept in the code under `truvari/__init__.__version__`. We try to keep the suffix `-dev` on the version in the develop branch. When cutting a new release, we may replace the suffix with `-rc` if we've built a release candidate that may need more testing/development. Once we've committed to a full release that will be pushed to PyPI, no suffix is placed on the version. If you install Truvari from the develop branch, the git repo hash is appended to the installed version, as well as '.uc' if there are uncommitted changes in the repo.
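
As a small illustration of the suffix convention, the following sketch classifies a version string by its suffix. The helper `release_stage` is hypothetical and not part of Truvari's API; it only assumes that the `-dev` or `-rc` suffix appears somewhere in the string, and makes no assumption about how an appended git hash is formatted.

```python
# Classify a Truvari-style version string by the suffix convention above.
# `release_stage` is a hypothetical helper written for this example.

def release_stage(version):
    """Return 'dev', 'rc', or 'release' based on the version suffix."""
    if "-dev" in version:
        return "dev"
    if "-rc" in version:
        return "rc"
    return "release"


if __name__ == "__main__":
    for example in ("3.1.0-dev", "4.3.1-rc", "4.3.1"):
        print(example, release_stage(example))
```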

# docs

The GitHub wiki serves the documentation most relevant to the `develop` branch. When cutting a new release, we freeze and version the wiki's documentation with the helper utility `docs/freeze_wiki.sh`.

# Creating a release
Follow these steps to create a release

0) Bump release version
1) Run tests locally
2) Update API Docs
3) Change Updates Wiki
4) Freeze the Wiki
5) Ensure all code is checked in
6) Do a [git-flow release](https://danielkummer.github.io/git-flow-cheatsheet/)
7) Use github action to make a testpypi release
8) Check test release
```bash
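python3 -m venv test_truvari
# activate the new environment so the test release is installed into it
source test_truvari/bin/activate
python3 -m pip install --index-url https://test.pypi.org/simple --extra-index-url https://pypi.org/simple/ truvari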
```
9) Use GitHub action to make a pypi release
10) Download release-tarball.zip from step #9’s action
11) Create release (include #9) from the tag
12) Check out develop, bump to the next dev version, and update the README 'commits since' badge
36 changes: 36 additions & 0 deletions docs/v4.3.1/Home.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
The wiki holds documentation most relevant for develop. For information on a specific version of Truvari, see [`docs/`](https://github.com/spiralgenetics/truvari/tree/develop/docs)

Citation:
English, A.C., Menon, V.K., Gibbs, R.A. et al. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol 23, 271 (2022). https://doi.org/10.1186/s13059-022-02840-6

# Before you start
VCFs aren't always created with a strong adherence to the format's specification.

Truvari expects input VCFs to be valid so that it will only output valid VCFs.

We've developed a separate tool that runs multiple validation programs and standard VCF parsing libraries in order to validate a VCF.

Run [this program](https://github.com/acenglish/usable_vcf) over any VCFs that are giving Truvari trouble.

Furthermore, Truvari expects 'resolved' SVs (e.g. DEL/INS) and will not interpret BND signals across SVTYPEs (e.g. combining two BND lines to match a DEL call). A brief description of Truvari bench methodology is linked below.

Finally, Truvari does not handle multi-allelic VCF entries and as of v4.0 will throw an error if multi-allelics are encountered. Please use `bcftools norm` to split multi-allelic entries.
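
Before running Truvari, it can help to check whether a VCF contains multi-allelic entries at all. The sketch below is a hypothetical pre-check, not part of Truvari: it relies only on the VCF convention that the fifth tab-separated column (ALT) lists multiple alleles separated by commas.

```python
# Report VCF records whose ALT column lists more than one allele.
# `multiallelic_records` is a hypothetical helper written for this example.

def multiallelic_records(vcf_lines):
    """Yield (chrom, pos, alt) for each multi-allelic record."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        alt = fields[4]
        if "," in alt:
            yield fields[0], int(fields[1]), alt


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tAT\t.\tPASS\t.",
        "chr1\t200\t.\tG\tGA,GAA\t.\tPASS\t.",
    ]
    for record in multiallelic_records(example):
        print(*record, sep="\t")
```

For a compressed VCF, the lines could be read with `gzip.open(path, "rt")`. Any records reported here are the ones `bcftools norm` would need to split.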

# Index

- [[Updates|Updates]]
- [[Installation|Installation]]
- Truvari Commands:
  - [[anno|anno]]
  - [[bench|bench]]
  - [[collapse|collapse]]
  - [[consistency|consistency]]
  - [[divide|divide]]
  - [[ga4gh|ga4gh]]
  - [[phab|phab]]
  - [[refine|refine]]
  - [[segment|segment]]
  - [[stratify|stratify]]
  - [[vcf2df|vcf2df]]
- [[Development|Development]]
- [[Citations|Citations]]
72 changes: 72 additions & 0 deletions docs/v4.3.1/Installation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
Recommended
===========
For stable versions of Truvari, use pip
```
python3 -m pip install truvari
```
Specific versions can be installed via
```
python3 -m pip install truvari==3.2.0
```
See [pypi](https://pypi.org/project/Truvari/#history) for a history of all distributed releases.

Manual Installation
===================
To build Truvari directly, clone the repository and switch to a specific tag.
```
git clone https://github.com/ACEnglish/truvari.git
cd truvari
git checkout tags/v3.0.0
python3 -m pip install .
```

To see a list of all available tags, run:
```
git tag -l
```

If you have an older clone of the repository and don't see the version you're looking for in tags, make sure to pull the latest changes:
```
git pull
git fetch --all --tags
```

Mamba / Conda
=============
NOTE!! There is a very old version of Truvari on bioconda that - for unknown reasons - supersedes the newer, supported versions. Users may need to specify to conda which release to build. See [this ticket](https://github.com/ACEnglish/truvari/issues/130#issuecomment-1196607866) for details.

Truvari releases are automatically deployed to bioconda.
Users can follow the instructions [here](https://mamba.readthedocs.io/en/latest/installation.html) to install mamba, a faster, conda-compatible package manager.

Create an environment with Truvari and its dependencies:
```
mamba create -c conda-forge -c bioconda -n truvari truvari
```

Alternatively, see the [conda page](https://anaconda.org/bioconda/truvari) for details
```
conda install -c bioconda truvari
```

Building from develop
=====================
The default branch is `develop`, which holds in-development changes. This is for developers or those wishing to try experimental features and is not recommended for production. Development is versioned higher than the most recent stable release with an added suffix (e.g. Current stable release is `3.0.0`, develop holds `3.1.0-dev`). If you'd like to install develop, repeat the steps above but without `git checkout tags/v3.0.0`. See [wiki](https://github.com/spiralgenetics/truvari/wiki/Development#git-flow) for details on how branching is handled.

Docker
======
See [Development](https://github.com/spiralgenetics/truvari/wiki/Development#docker) for details on building a docker container.

edlib error
===========
Some environments have a hard time installing edlib due to a possible cython bug ([source](https://bugs.launchpad.net/ubuntu/+source/cython/+bug/2006404)). You may see an error such as the following:
```
edlib.bycython.cpp:198:12: fatal error: longintrepr.h: No such file or directory
198 | #include "longintrepr.h"
```

One method to prevent the error message is to run the following commands ([source](https://github.com/Martinsos/edlib/issues/212))
```bash
python3 -m venv my_env
source my_env/bin/activate
python3 -m pip install --no-cache-dir cython setuptools wheel
EDLIB_USE_CYTHON=1 python3 -m pip install truvari --no-cache
```