Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inconsistent results from repeated calls to union!(s::BitSet, r::UnitRange{Int}) #45574

Closed
dmbates opened this issue Jun 3, 2022 · 3 comments · Fixed by #45578
Closed

inconsistent results from repeated calls to union!(s::BitSet, r::UnitRange{Int}) #45574

dmbates opened this issue Jun 3, 2022 · 3 comments · Fixed by #45578

Comments

@dmbates
Copy link
Member

dmbates commented Jun 3, 2022

As part of a larger task I need to count the number of elements in the union of a set of intervals. A sample set of intervals from genomic sequences and a function to evaluate the overlap size is in
bugreport.txt

I get inconsistent results

ulia> overlapsz(rngs)
8833

julia> overlapsz(rngs)
10413

julia> overlapsz(rngs)
8663

julia> versioninfo()
Julia Version 1.8.0-rc1
Commit 6368fdc656 (2022-05-27 18:33 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, tigerlake)
  Threads: 8 on 8 virtual cores

I suspect that somewhere in the union!(s::BitSet, r::AbstractUnitRange{<:Integer}) method starting at line 126 of bitset.jl the bits vector of s is being extended without zeroing the extending part.

@dmbates
Copy link
Member Author

dmbates commented Jun 4, 2022

I can narrow it down a bit further with the script

using DataFrames

tup(bs::BitSet) = (; blen = length(bs.bits), offset = bs.offset, slen = length(bs))
ranges = [46644306:46644488, 46644343:46644488, 46648318:46648619, 46648458:46648538,]
function summarize(rng)
  bs = BitSet();
  result = [tup(bs)];
  for r in rng
    union!(bs, r)
    push!(result, tup(bs))
  end
  DataFrame(result)
end

summarize(ranges)
summarize(ranges)

If you run summarize(ranges) multiple times you can see that after the extension of the bits vector at the union! with the third element of ranges the length of the BitSet is off

julia> summarize(ranges)
5×3 DataFrame
 Row │ blen   offset                slen  
     │ Int64  Int64                 Int64 
─────┼────────────────────────────────────
   10  -1152921504606846976      0
   24                728817    183
   34                728817    183
   468                728817   1222
   568                728817   1222

julia> summarize(ranges)
5×3 DataFrame
 Row │ blen   offset                slen  
     │ Int64  Int64                 Int64 
─────┼────────────────────────────────────
   10  -1152921504606846976      0
   24                728817    183
   34                728817    183
   468                728817   1170
   568                728817   1170

@dmbates
Copy link
Member Author

dmbates commented Jun 4, 2022

I seem to take a while to get around to a minimal example but this seems clear enough.

julia> length(union!(BitSet(46644306:46644488), 46648318:46648619))
1994

julia> length(union!(BitSet(46644306:46644488), 46648318:46648619))
532

julia> length(union!(BitSet(46644306:46644488), 46648318:46648619))
516

julia> length(union!(BitSet(46644306:46644488), 46648318:46648619))
550

@dmbates
Copy link
Member Author

dmbates commented Jun 4, 2022

The union! of two non-overlapping BitSets properly zeros the elements of the bits vector between the two non-zero segments.

julia> union!(BitSet(0:255), BitSet(512:599)).bits
10-element Vector{UInt64}:
 0xffffffffffffffff
 0xffffffffffffffff
 0xffffffffffffffff
 0xffffffffffffffff
 0x0000000000000000
 0x0000000000000000
 0x0000000000000000
 0x0000000000000000
 0xffffffffffffffff
 0x0000000000ffffff

But a union! of a BitSet and UnitRange that do not overlap doesn't zero the intermediate section of the bits vector.

julia> union!(BitSet(0:255), 512:599).bits
10-element Vector{UInt64}:
 0xffffffffffffffff
 0xffffffffffffffff
 0xffffffffffffffff
 0xffffffffffffffff
 0x00007f8b0f95c850
 0x00007f8b0f95c880
 0x00007f8b0f95c8b0
 0x00007f8b0f95c8e0
 0xffffffffffffffff
 0x0000000000ffffff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant