From 68dbce01e9b9c8e8ed57450a7bee45e2ee29b942 Mon Sep 17 00:00:00 2001
From: James Bonfield
Date: Thu, 12 Sep 2024 15:11:45 +0100
Subject: [PATCH] Fix on-the-fly indexing of VCF w.r.t virtual offsets.

When using bcftools view --write-index -o out.vcf.gz the virtual file
offsets can differ depending on whether we do a bgzf_tell before or
after a flush. Specifically it could point to the last byte in the
current BGZF block or the first byte in the next BGZF block.

Ultimately both of these resolve to the same physical location, but in
some situations the former may mean attempting to read zero bytes (the
remainder of the bgzf block). This has been known in the past to be
misinterpreted as an EOF. (See samtools/samtools#1861)

It also means the contents of the index produced by --write-index and
a separate bcftools index command can yield different results, albeit
both representing the same data.

The fix for the samtools / bcftools issue above (samtools/htslib#1672)
when multi-threading inadvertently recreated the bug when not
multi-threading.
---
 NEWS  | 4 ++++
 vcf.c | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/NEWS b/NEWS
index be8de6682..4a573d91d 100644
--- a/NEWS
+++ b/NEWS
@@ -89,6 +89,10 @@ Bug fixes
 * Fix small OSS-Fuzz reported issues with CRAM encoding and long CIGARS
   and/or illegal positions. (PR #1775, PR #1801, PR #1817)
 
+* Fix issues with on-the-fly indexing of VCF/BCF (bcftools --write-index)
+  when not using multiple threads. (PR #1837. Fixes samtools/bcftools#2267,
+  reported by Giulio Genovese)
+
 * Stricter limits on POS / MPOS / TLEN in sam_parse1(). This fixes a signed
   overflow reported by OSS-Fuzz and should help prevent other as-yet
   undetected bugs. (PR #1812)
diff --git a/vcf.c b/vcf.c
index 7ce306f92..105c7539d 100644
--- a/vcf.c
+++ b/vcf.c
@@ -4238,6 +4238,8 @@ int vcf_write(htsFile *fp, const bcf_hdr_t *h, bcf1_t *v)
     if ( fp->format.compression!=no_compression ) {
         if (bgzf_flush_try(fp->fp.bgzf, fp->line.l) < 0)
             return -1;
+        if (fp->idx && !fp->fp.bgzf->mt)
+            hts_idx_amend_last(fp->idx, bgzf_tell(fp->fp.bgzf));
         ret = bgzf_write(fp->fp.bgzf, fp->line.s, fp->line.l);
     } else {
         ret = hwrite(fp->fp.hfile, fp->line.s, fp->line.l);
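
Illustration (not part of the patch): the commit message's point that two
different virtual offsets can name the same physical location may be easier
to see with the arithmetic written out. A BGZF virtual offset packs the
compressed file offset of a block into the upper 48 bits and the position
within that block's uncompressed data into the lower 16 bits. The sketch
below is standalone and does not use htslib; demo_block, demo_voffset and
demo_normalise are hypothetical names invented for this example, and the
block sizes used with them are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one BGZF block, for illustration only. */
typedef struct {
    uint64_t coffset;  /* file offset of the block's first compressed byte */
    uint64_t csize;    /* compressed size of the block in bytes */
    uint32_t usize;    /* uncompressed payload length, below 65536 here */
} demo_block;

/* Pack a virtual offset: block address in the upper 48 bits, offset
 * within the uncompressed block in the lower 16 bits. */
static uint64_t demo_voffset(uint64_t coffset, uint32_t uoffset)
{
    assert(uoffset < 65536);
    return (coffset << 16) | (uint64_t)uoffset;
}

/* If voff points just past the last byte of block b (so zero bytes of b
 * remain to be read), return the equivalent offset at the start of the
 * following block. Any other offset is returned unchanged. */
static uint64_t demo_normalise(const demo_block *b, uint64_t voff)
{
    uint64_t coffset = voff >> 16;
    uint32_t uoffset = (uint32_t)(voff & 0xFFFF);

    if (coffset == b->coffset && uoffset == b->usize)
        return demo_voffset(b->coffset + b->csize, 0);
    return voff;
}
```

For a block at file offset 0 with 1000 compressed and 5000 uncompressed
bytes, demo_voffset(0, 5000) and demo_voffset(1000, 0) are different
integers that refer to the same place in the uncompressed stream. Only the
second form leaves data to read in the block it names, which is why the
patch re-records the offset after the flush in the non-threaded path.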