
📊 Long-read SV Benchmark CMRG/GIAB

This is a reproducible benchmark of structural variant (SV) calling, evaluated against two truth sets:

  1. Challenging Medically Relevant Genes (CMRG). The truth set is described in detail in: Wagner et al., 2022. Curated variation benchmarks for challenging medically relevant autosomal genes. Nature Biotechnology.

  2. The Genome in a Bottle (GIAB) truth set v1.1.


Requirements

  • Linux machine
  • ~200 GB of disk space, 32 GB RAM, 4 cores
  • Nextflow


ONT Results:

| caller   | platform | benchmark | TP    | FP   | FN    | precision | recall | F1     | GT concordance |
|----------|----------|-----------|-------|------|-------|-----------|--------|--------|----------------|
| sniffles | ont      | CMRG      | 194   | 18   | 23    | 0.9151    | 0.894  | 0.9044 | 0.8763         |
| cuteSV   | ont      | CMRG      | 197   | 10   | 20    | 0.9517    | 0.9078 | 0.9292 | 0.8934         |
| severus  | ont      | CMRG      | 194   | 10   | 23    | 0.951     | 0.894  | 0.9216 | 0.5722         |
| delly    | ont      | CMRG      | 194   | 18   | 23    | 0.9151    | 0.894  | 0.9044 | 0.8866         |
| dysgu    | ont      | CMRG      | 203   | 16   | 14    | 0.9269    | 0.9355 | 0.9312 | 0.9015         |
| sniffles | ont      | GIAB      | 18382 | 1227 | 10164 | 0.9374    | 0.6439 | 0.7635 | -              |
| cuteSV   | ont      | GIAB      | 19101 | 1477 | 9738  | 0.9282    | 0.6623 | 0.7731 | -              |
| severus  | ont      | GIAB      | 17101 | 1170 | 10605 | 0.936     | 0.6172 | 0.7439 | -              |
| delly    | ont      | GIAB      | 17695 | 2516 | 9798  | 0.8755    | 0.6436 | 0.7419 | -              |
| dysgu    | ont      | GIAB      | 20370 | 1980 | 8176  | 0.9114    | 0.7136 | 0.8005 | -              |

PacBio Results:

| caller   | platform | benchmark | TP    | FP   | FN    | precision | recall | F1     | GT concordance |
|----------|----------|-----------|-------|------|-------|-----------|--------|--------|----------------|
| sniffles | pacbio   | CMRG      | 191   | 9    | 26    | 0.955     | 0.8802 | 0.9161 | 0.9162         |
| cuteSV   | pacbio   | CMRG      | 186   | 7    | 31    | 0.9637    | 0.8571 | 0.9073 | 0.8871         |
| severus  | pacbio   | CMRG      | 182   | 0    | 35    | 1         | 0.8387 | 0.9123 | 0.5495         |
| delly    | pacbio   | CMRG      | 184   | 9    | 33    | 0.9534    | 0.8479 | 0.8976 | 0.8859         |
| sawfish  | pacbio   | CMRG      | 205   | 0    | 12    | 1         | 0.9447 | 0.9716 | 0.961          |
| dysgu    | pacbio   | CMRG      | 199   | 3    | 18    | 0.9851    | 0.9171 | 0.9499 | 0.9296         |
| sniffles | pacbio   | GIAB      | 19380 | 935  | 9014  | 0.954     | 0.6825 | 0.7957 | -              |
| cuteSV   | pacbio   | GIAB      | 19338 | 1164 | 9208  | 0.9432    | 0.6774 | 0.7885 | -              |
| severus  | pacbio   | GIAB      | 16107 | 954  | 11882 | 0.9441    | 0.5755 | 0.7151 | -              |
| delly    | pacbio   | GIAB      | 17173 | 1561 | 10741 | 0.9167    | 0.6152 | 0.7363 | -              |
| sawfish  | pacbio   | GIAB      | 20273 | 1766 | 7511  | 0.9199    | 0.7297 | 0.8138 | -              |
| dysgu    | pacbio   | GIAB      | 20895 | 1197 | 7715  | 0.9458    | 0.7303 | 0.8242 | -              |

Reads were from Oxford Nanopore kit14 (~40X coverage) and PacBio Vega HiFi (~30X coverage). The SV callers tested were sniffles, cuteSV, severus, delly, sawfish (PacBio only) and dysgu.

Truvari v4.2 was used for benchmarking.

To repeat these results, run:

```
bash fetch_data.sh
nextflow run pipeline.nf
```

Notes

Truvari refine was used on the GIAB benchmark. For the CMRG benchmark, refine was not run, as most VCFs triggered errors at this step.
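
For context, refine post-processes a finished truvari bench output directory; a minimal sketch of the step (placeholder paths, assuming truvari's standard arguments, not the pipeline's actual invocation) is:

```
# Placeholder paths; refine re-evaluates matches in an existing bench output directory
truvari refine -f reference.fa bench_output/
```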

For Severus, a script was used to convert BND-notation deletions into symbolic DEL calls; see convert_severus.py for details.
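
The gist of such a conversion, sketched in Python, is below. This is a simplified illustration only, not the repository's actual logic; a robust version would also check breakend orientation and consume the mate records.

```python
# bnd_to_del_sketch.py - illustrative only; see convert_severus.py for the real logic.
# Reads a VCF on stdin, rewrites intra-chromosomal deletion-like BNDs as <DEL>.
import re
import sys

# Matches the mate coordinate inside a BND ALT such as N[chr2:321682[ or ]13:123]T
MATE_RE = re.compile(r"[\[\]]([^\[\]:]+):(\d+)[\[\]]")

for line in sys.stdin:
    if line.startswith("#"):
        sys.stdout.write(line)
        continue
    fields = line.rstrip("\n").split("\t")
    chrom, pos, alt, info = fields[0], int(fields[1]), fields[4], fields[7]
    m = MATE_RE.search(alt)
    # Same chromosome with the mate downstream: treat the pair as one deletion
    if "SVTYPE=BND" in info and m and m.group(1) == chrom and int(m.group(2)) > pos:
        end = int(m.group(2))
        fields[4] = "<DEL>"
        fields[7] = (info.replace("SVTYPE=BND", "SVTYPE=DEL")
                     + f";END={end};SVLEN={pos - end}")  # negative SVLEN for deletions
        line = "\t".join(fields) + "\n"
    sys.stdout.write(line)
```

Run as, e.g., `python bnd_to_del_sketch.py < severus.vcf > severus_del.vcf`.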

Truvari parameters were:

```
--passonly -r 1000 --dup-to-ins -p 0
```
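
As a rough sketch, these flags slot into a truvari bench call along the following lines (file names here are placeholders; the real invocation is generated by the pipeline):

```
# Placeholder file names; compare a caller's VCF against the truth set
truvari bench -b truthset.vcf.gz -c caller.vcf.gz \
    --passonly -r 1000 --dup-to-ins -p 0 \
    -o bench_output/
```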

To modify parameters, edit the nextflow.config file.

Contributing

Happy to accept PRs adding other callers to this benchmark, so long as:

  • The caller works with the input data (BAM and CRAM files)
  • The caller requires < 5 hours to process a sample and consumes < 32 GB of memory
  • The caller can be reliably installed without editing the source code