ntEdit Secondary Bloom filter
A secondary Bloom filter of kmers to exclude can be provided to ntEdit v1.3+ with the -e option of the ntedit binary.
This option is useful when running ntEdit in SNV mode, where it effectively takes a "slice" of robust (non-error and non-repeated) kmers.
For example (ntedit options):
-f draft from A (pseudo-haploid)
-r Bloom filter from A (diploid): non-error kmers
-e Bloom filter from A (diploid): repeat kmers
-s 1 (SNV mode)
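The "slice" in this setup keeps a kmer only if it is in the -r filter and absent from the -e filter. The sketch below illustrates that rule with a toy Bloom filter. It is an assumption-laden illustration, not ntEdit code: the class `ToyBloomFilter` and the function `is_robust` are hypothetical, the filter uses SHA-256 rather than the ntHash-based filters ntHits builds, reverse complements are ignored, and 5-mers are used only for readability.

```python
import hashlib
from typing import Iterator


class ToyBloomFilter:
    """Hypothetical stand-in for an ntHits Bloom filter; uses SHA-256, not ntHash."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer: str) -> Iterator[int]:
        # Derive one bit position per hash function by salting the kmer with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, kmer: str) -> None:
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer: str) -> bool:
        # Can return a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer))


def is_robust(kmer: str, solid_bf: ToyBloomFilter, repeat_bf: ToyBloomFilter) -> bool:
    """Keep a kmer only if it is in the -r filter and absent from the -e filter."""
    return kmer in solid_bf and kmer not in repeat_bf


solid = ToyBloomFilter()    # plays the role of -r: non-error kmers from A
repeats = ToyBloomFilter()  # plays the role of -e: repeat kmers from A
for kmer in ("ACGTA", "CGTAC", "GTACG"):
    solid.add(kmer)
repeats.add("CGTAC")        # pretend this kmer is repetitive

print([k for k in ("ACGTA", "CGTAC", "GTACG", "TTTTT") if is_robust(k, solid, repeats)])
```

Because both filters are probabilistic, a false positive in the -e filter can discard a genuine kmer, which is one reason a stricter repeat cutoff trades sensitivity for precision.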
When computing a repeat kmer Bloom filter with ntHits, we recommend first running ntCard and plotting the kmer coverage histogram to identify the repeat cutoff value. A stricter repeat cutoff can achieve higher precision, at the risk of reduced sensitivity.
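As a rough aid for reading the plot, the sketch below locates the error trough and the main coverage peak in a histogram that has already been loaded into a `{coverage: kmer_count}` dict (parsing ntCard's output file is not shown). The functions `coverage_landmarks` and `suggest_repeat_cutoff`, and especially the "multiplier x main peak" rule with its default of 2.0, are hypothetical heuristics for illustration only. They are not an ntEdit, ntCard, or ntHits recommendation, and the cutoff should still be chosen by inspecting the plotted histogram.

```python
def coverage_landmarks(hist: dict[int, int]) -> tuple[int, int]:
    """Return (error_trough, main_peak) from a {coverage: kmer_count} histogram."""
    covs = sorted(hist)
    trough = covs[0]
    # Walk right while counts keep falling; the last falling point is the trough.
    for prev, cur in zip(covs, covs[1:]):
        if hist[cur] > hist[prev]:
            break
        trough = cur
    above = [c for c in covs if c > trough]
    if not above:
        raise ValueError("no coverage peak found above the error trough")
    # The most frequent coverage beyond the trough is taken as the main peak.
    peak = max(above, key=lambda c: hist[c])
    return trough, peak


def suggest_repeat_cutoff(hist: dict[int, int], multiplier: float = 2.0) -> int:
    """Hypothetical heuristic: treat kmers above multiplier x main peak as repeats."""
    _, peak = coverage_landmarks(hist)
    return round(multiplier * peak)


# Synthetic histogram (not real data): error spike at low coverage, peak near 8x.
example_hist = {1: 5000, 2: 1200, 3: 300, 4: 150, 5: 220, 6: 400, 7: 650, 8: 800,
                9: 640, 10: 380, 11: 190, 12: 120, 13: 110, 14: 130, 15: 100, 16: 90}
print(coverage_landmarks(example_hist), suggest_repeat_cutoff(example_hist))
```

For diploid data the histogram may show separate heterozygous and homozygous peaks, so which peak the heuristic picks should be checked against the plot before any cutoff is used.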
Alternate kmers that pass ntEdit's presence verification stage are discarded if they are also present in the secondary Bloom filter. This option can also be used to map homozygous variation between species or individuals. For example, with the following setup:
-f draft from A (pseudo-haploid)
-r Bloom filter from B (different individual or species, diploid): non-error kmers
-e Bloom filter from A (diploid): non-error kmers
-s 0
ntEdit maps the sites that differ (homozygous variants) between B and A.
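The two-stage check described above can be sketched as follows: build the kmers that overlap a candidate substitution, keep those found in the -r filter (presence verification), then drop any that are also in the -e filter. This is a simplified illustration under stated assumptions. Python sets stand in for Bloom filters, reverse complements are ignored, k is tiny for readability, and the names `alternate_kmers` and `screen_alternate` are hypothetical. ntEdit's real verification applies its own thresholds and kmer subsets, which are not modelled here.

```python
def alternate_kmers(seq: str, pos: int, alt: str, k: int) -> list[str]:
    """Return every k-mer overlapping seq[pos] after substituting alt there."""
    edited = seq[:pos] + alt + seq[pos + 1:]
    first = max(0, pos - k + 1)    # leftmost start that still covers pos
    last = min(pos, len(seq) - k)  # rightmost start that still fits in the sequence
    return [edited[i:i + k] for i in range(first, last + 1)]


def screen_alternate(seq: str, pos: int, alt: str, k: int,
                     primary: set[str], secondary: set[str]) -> tuple[int, int, int]:
    """Count alternate kmers: total, present in -r, and kept after the -e check."""
    kmers = alternate_kmers(seq, pos, alt, k)
    verified = [km for km in kmers if km in primary]       # presence verification (-r)
    kept = [km for km in verified if km not in secondary]  # secondary exclusion (-e)
    return len(kmers), len(verified), len(kept)


draft_a = "ACGTACGTAC"  # toy draft from A
k = 4
nonerror_a = {draft_a[i:i + k] for i in range(len(draft_a) - k + 1)}  # stands in for -e
nonerror_b = set(alternate_kmers(draft_a, 5, "T", k))  # B carries T at position 5 (-r)

print(alternate_kmers(draft_a, 5, "T", k))
print(screen_alternate(draft_a, 5, "T", k, nonerror_b, nonerror_a))
```

Here all four alternate kmers are supported by B and none occurs in A, so the site is reported as a difference between B and A. If A also carried those kmers, the secondary filter would remove them and the site would not be mapped.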
*These are examples only; other experimental setups are possible.
When using the secondary Bloom filter, we recommend setting the jump parameter (-j) to 1, or using one of the following compatible j/k combinations:
-j 1: any k value
-j 2 (odd k values): k31, k33, k35, k37, k39, k41, k43, k45, k47, k49, k51, k53, k55, k57, k59, k61
-j 3: k31, k34, k37, k40, k43, k46, k49, k52, k55, k58, k61
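The compatible k values listed for -j 2 and -j 3 follow a simple pattern: within k31 to k61, they are exactly the values for which (k - 1) is divisible by j. The sketch below regenerates the list from that rule. The rule is an inference that fits the listed values, not a documented ntEdit formula, the k31 to k61 bounds are taken from the list itself, and `compatible_k` is a hypothetical helper.

```python
def compatible_k(j: int, k_min: int = 31, k_max: int = 61) -> list[int]:
    """k values in [k_min, k_max] for which (k - 1) is divisible by the jump j."""
    return [k for k in range(k_min, k_max + 1) if (k - 1) % j == 0]


for j in (2, 3):
    print(f"-j {j}:", ", ".join(f"k{k}" for k in compatible_k(j)))
```

For -j 1 the rule admits every k, consistent with the recommendation that any k may be used.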
Of the tested settings, -j 3 gives the fastest ntEdit runs. Higher j values would leave too few kmers in the kmer subset and have not been tested.
Users should perform their own benchmarks.
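To see why a larger jump is faster but thins out the evidence, assume that -j makes ntEdit query every j-th of the k kmers overlapping a base, at offsets 0, j, 2j, and so on. Under that assumption the sketch below counts the kmers queried for k31 and shows that the last overlapping kmer (offset k - 1) is only reached when (k - 1) is divisible by j, matching the compatible values listed above. The helper `checked_offsets` is hypothetical and this is a model of the idea, not ntEdit's implementation.

```python
def checked_offsets(k: int, j: int) -> list[int]:
    """Offsets (0 .. k-1) of the overlapping kmers queried when stepping by j."""
    return list(range(0, k, j))


for j in (1, 2, 3):
    offsets = checked_offsets(31, j)
    last_checked = offsets[-1] == 31 - 1
    # k31: -j 1 -> 31 kmers, -j 2 -> 16 kmers, -j 3 -> 11 kmers; all reach offset 30
    print(f"k31, -j {j}: {len(offsets)} kmers queried, last overlapping kmer checked: {last_checked}")

print(checked_offsets(32, 3)[-1])  # 30: offset 31 is skipped, so k32 is not listed for -j 3
```

Going from -j 1 to -j 3 cuts the queries per base roughly threefold, which is consistent with faster runs; pushing j higher would leave only a handful of kmers per base.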