diff --git a/.gitignore b/.gitignore
index 4cecece..75ba058 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,3 +2,7 @@
 data/
 *.bcf
 src/duphold
+*.bam*
+*.cram*
+syndip/
+*.vcf*
diff --git a/README.md b/README.md
index cbfb9f1..b5da78e 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ single sample with:
 + **DHBFC**: fold-change for the variant depth *relative to bins in the genome with similar GC-content*.
 + **DHD**: rapid change in depth at one of the break-points (1 for higher (DUP). 0 for no or conflicting changes. -1 for drop (DUP), 2 or -2 for both break points)
 
-If a SNP/Indel VCF is given, `duphold` will annotate each DEL/DUP call with:
+If a SNP/Indel VCF/BCF is given, `duphold` will annotate each DEL/DUP call with (see below for more detail on what it does):
 
 + **DHET**: counts of SNP heterozygotes in the SV supporting: [0] a normal heterozygote, [1] a triploid heterozygote.
             for a DUP, we expect most hets to have an allele balance closer to 0.33 or 0.67 than to 0.5. A good heterozygous
@@ -29,6 +29,20 @@ If a SNP/Indel VCF is given, `duphold` will annotate each DEL/DUP call with:
 
 It also adds **GCF** to the INFO field indicating the fraction of G or C bases in the variant.
 
+
+## SNP/Indel annotation
+
+**NOTE** it is strongly recommended to use BCF for the `--snp` argument as otherwise VCF parsing will be a bottleneck.
+
++ A DEL call with many HETs is unlikely to be valid.
++ A DUP call that has many HETs that have a 0.5 allele balance is unlikely to be valid.
+
+When the user specifies a `--snp` VCF, `duphold` finds the appropriate sample in that file and extracts high (> 20) quality, bi-allelic
+SNP calls. For each chromosome, it will store a minimal (low-memory representation) in a sorted data-structure for fast access. It will
+then query this data structure for each SV and count the number of heterozygotes supporting a diploid HET (allele balance close to 0.5)
+or a triploid HET (allele balance close to 0.33 or 0.67) into `DHET`. It will store the number of Hom-Ref, Hom-Alt, Unnkown calls in
+`DHHU`.
+
 ## Performance
 
 ### Speed
@@ -48,10 +62,13 @@ coming soon.
 ## Usage
 
 ```
-duphold -t 4 -v $svvcf -b $cram -f $fasta -o $output.bcf
-duphold --threads 4 --vcf $svvcf --bam $cram --fasta $fasta --output $output.bcf
+duphold -s $gatk_vcf -t 4 -v $svvcf -b $cram -f $fasta -o $output.bcf
+duphold --snp $gatk_bcf --threads 4 --vcf $svvcf --bam $cram --fasta $fasta --output $output.bcf
 ```
 
+`--snp` can be a multi-sample VCF/BCF. `duphold` will be much faster with a BCF, especially if
+the snp/indel file contains many (>20 or so) samples.
+
 the threads are decompression threads so increasing up to about 4 works.
 
 ## Examples
diff --git a/nim.cfg b/nim.cfg
index b81c897..0c5211f 100644
--- a/nim.cfg
+++ b/nim.cfg
@@ -1 +1,2 @@
 threads:on
+--passC:"-flto"
diff --git a/src/duphold.nim b/src/duphold.nim
index 2ceac60..75e9429 100644
--- a/src/duphold.nim
+++ b/src/duphold.nim
@@ -665,7 +665,7 @@ Options:
   -v --vcf <path>           path to sorted SV VCF/BCF
   -b --bam <path>           path to indexed BAM/CRAM
   -f --fasta <path>         indexed fasta reference.
-  -s --snp <path>           optional path to snp/indel VCF with which to annotate SVs.
+  -s --snp <path>           optional path to snp/indel VCF/BCF with which to annotate SVs. BCF is highly recommended as it's much faster to parse.
   -t --threads <int>        number of decompression threads. [default: 4]
   -o --output <string>      output VCF/BCF (default is VCF to stdout) [default: -]
   -d --drop                 drop all samples from a multi-sample --vcf *except* the sample in --bam. useful for parallelization by sample followed by merge.