Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug where bin number could overflow when looking for max_off #1595

Merged
merged 1 commit into from
Apr 5, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 38 additions & 24 deletions hts.c
Original file line number Diff line number Diff line change
Expand Up @@ -3145,17 +3145,24 @@ hts_itr_t *hts_itr_query(const hts_idx_t *idx, int tid, hts_pos_t beg, hts_pos_t
}

// compute max_off: a virtual offset from a bin to the right of end
bin = hts_bin_first(idx->n_lvls) + ((end-1) >> idx->min_shift) + 1;
if (bin >= idx->n_bins) bin = 0;
while (1) {
// search for an extant bin by moving right, but moving up to the
// parent whenever we get to a first child (which also covers falling
// off the RHS, which wraps around and immediately goes up to bin 0)
while (bin % 8 == 1) bin = hts_bin_parent(bin);
if (bin == 0) { max_off = (uint64_t)-1; break; }
k = kh_get(bin, bidx, bin);
if (k != kh_end(bidx) && kh_val(bidx, k).n > 0) { max_off = kh_val(bidx, k).list[0].u; break; }
bin++;
// First check if end lies within the range of the index (it won't
// if it's HTS_POS_MAX)
if (end < 1LL << (idx->min_shift + 3 * idx->n_lvls)) {
Copy link
Contributor

@jkbonfield jkbonfield Apr 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a useful shortcut I think, but it looks like using uint64_t for bin would have made the old code work too, as hts_bin_first is then huge and bin gets reset to 0, falling out of the while loop with max_off = UINT64_MAX.

Otherwise we're still using a negative bin below, eg:

            for (i = n_off = 0; i < iter->bins.n; ++i)
                if ((k = kh_get(bin, bidx, iter->bins.a[i])) != kh_end(bidx))
                    n_off += kh_value(bidx, k).n;

Edit: argh! That blasted obscure templating code... ignore me. "bin" isn't a variable here, but a namespace shadowing a variable name. Sigh.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I now see how to get a positive legal bin and still cause mayhem, so agreed your solution of checking for the maximum value is definitely the best way of handling it.

$ tabix -m10 -f /tmp/_.bed.gz;./tabix /tmp/_.bed.gz a

That takes 10 sec to query my 5 entry bed file. (Admittedly with debugging on)

bin = hts_bin_first(idx->n_lvls) + ((end-1) >> idx->min_shift) + 1;
if (bin >= idx->n_bins) bin = 0;
while (1) {
// search for an extant bin by moving right, but moving up to the
// parent whenever we get to a first child (which also covers falling
// off the RHS, which wraps around and immediately goes up to bin 0)
while (bin % 8 == 1) bin = hts_bin_parent(bin);
if (bin == 0) { max_off = UINT64_MAX; break; }
k = kh_get(bin, bidx, bin);
if (k != kh_end(bidx) && kh_val(bidx, k).n > 0) { max_off = kh_val(bidx, k).list[0].u; break; }
bin++;
}
} else {
// Searching to end of reference
max_off = UINT64_MAX;
}

// retrieve bins
Expand Down Expand Up @@ -3314,20 +3321,27 @@ int hts_itr_multi_bam(const hts_idx_t *idx, hts_itr_t *iter)
}

// compute max_off: a virtual offset from a bin to the right of end
bin = hts_bin_first(idx->n_lvls) + ((end-1) >> idx->min_shift) + 1;
if (bin >= idx->n_bins) bin = 0;
while (1) {
// search for an extant bin by moving right, but moving up to the
// parent whenever we get to a first child (which also covers falling
// off the RHS, which wraps around and immediately goes up to bin 0)
while (bin % 8 == 1) bin = hts_bin_parent(bin);
if (bin == 0) { max_off = (uint64_t)-1; break; }
k = kh_get(bin, bidx, bin);
if (k != kh_end(bidx) && kh_val(bidx, k).n > 0) {
max_off = kh_val(bidx, k).list[0].u;
break;
// First check if end lies within the range of the index (it
// won't if it's HTS_POS_MAX)
if (end < 1LL << (idx->min_shift + 3 * idx->n_lvls)) {
bin = hts_bin_first(idx->n_lvls) + ((end-1) >> idx->min_shift) + 1;
if (bin >= idx->n_bins) bin = 0;
while (1) {
// search for an extant bin by moving right, but moving up to the
// parent whenever we get to a first child (which also covers falling
// off the RHS, which wraps around and immediately goes up to bin 0)
while (bin % 8 == 1) bin = hts_bin_parent(bin);
if (bin == 0) { max_off = UINT64_MAX; break; }
k = kh_get(bin, bidx, bin);
if (k != kh_end(bidx) && kh_val(bidx, k).n > 0) {
max_off = kh_val(bidx, k).list[0].u;
break;
}
bin++;
}
bin++;
} else {
// Searching to end of reference
max_off = UINT64_MAX;
}

//convert coordinates to file offsets
Expand Down