Invalid index is produced by --write-index and --threads #1985

Closed
lacek opened this issue Aug 31, 2023 · 0 comments · Fixed by samtools/htslib#1672

lacek commented Aug 31, 2023

Versions:

  • bcftools 1.18
  • Using htslib 1.18

Steps to reproduce:

wget -N ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/archive_2.0/2023/clinvar_20230819.vcf.gz
echo $(seq 1 22) X Y | awk -v RS=' ' '{print $1"\tchr"$1} END {print "MT\tchrM"}' > hg38_rename.txt
bcftools annotate --rename-chrs hg38_rename.txt --write-index --threads 1 -Oz -o clinvar_20230819.hg38.vcf.gz clinvar_20230819.vcf.gz
bcftools view -H clinvar_20230819.hg38.vcf.gz chrY | head

The following error is shown at this point:

[E::get_intv] Failed to parse TBX_VCF, was wrong -p [type] used?
The offending line was: "1627532;CLNDISDB=MedGen:CN517202;CLNDN=not_provided;CLNHGVS=NC_000001.11:g.931107C>T;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Likely_benign;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=SAMD11:148398;MC=SO:0001627|intron_variant;ORIGIN=1"
Error: BCF read error

Recreating the file without --threads 1 produces no error:

bcftools annotate --rename-chrs hg38_rename.txt --write-index -Oz -o clinvar_20230819.hg38.nothreads.vcf.gz clinvar_20230819.vcf.gz
bcftools view -H clinvar_20230819.hg38.nothreads.vcf.gz chrY | head

Regenerating the index with bcftools index also avoids the error:

bcftools index -f clinvar_20230819.hg38.vcf.gz
bcftools view -H clinvar_20230819.hg38.vcf.gz chrY | head

So the index file written by --write-index appears to be wrong whenever --threads is greater than 0.

daviesrob added a commit to daviesrob/htslib that referenced this issue Sep 18, 2023
* Switch from hts_idx_push() to bgzf_idx_push() for on-the-fly
  indexing of BCF and VCF.bgz files.  The latter function is
  needed to record the correct offsets when using multi-threaded
  BGZF compression.

  Fixes samtools/bcftools#1985

* Only allow indexing of BGZF-compressed files.  It's necessary to
  enforce this as on-the-fly indexing assumes that the file
  pointer is in htsFile::fp.bgzf, but uncompressed VCF uses
  htsFile::fp.hfile.
whitwham pushed a commit to samtools/htslib that referenced this issue Sep 22, 2023