hashing BigInts is slow #8727
Comments
Yup. I knew that was going to be trouble some day. The rest of BigInt hashing should be ok, actually.
This particular problem can be solved without fixing ndigits0z, since we only need to know how many bits the value has, but it would be good to actually just fix that function.
Why a library that contains the world's most advanced big integer algorithms cannot accurately compute the number of digits in a number is simply beyond me.
That one was too hard.
It says that base 2 is exact, so perhaps it's worth having a specific
Just checking for bases that are powers of two inside of ndigits0z might be good enough.
`sizeinbase` from gmp is exact for powers of two, so the checks are not needed.
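As a rough illustration of the two suggestions above (trust GMP for power-of-two bases, correct the result otherwise), here is a minimal sketch. `ndigits_sketch` is a hypothetical name, and `Base.GMP.MPZ.sizeinbase` is assumed to be the internal, unexported wrapper around `mpz_sizeinbase` found in recent Julia versions; this is not presented as the fix that was actually merged.

```julia
# Hypothetical helper, not the actual Base implementation.
# Assumes Base.GMP.MPZ.sizeinbase(x, b) wraps GMP's mpz_sizeinbase
# (internal API; GMP accepts bases from 2 to 62).
function ndigits_sketch(x::BigInt, b::Integer)
    iszero(x) && return 0
    n = Base.GMP.MPZ.sizeinbase(x, b)
    # For power-of-two bases GMP's answer is exact, so return it as is.
    ispow2(b) && return n
    # Otherwise GMP may report one digit too many: check against b^(n-1).
    return abs(x) < big(b)^(n - 1) ? n - 1 : n
end

# Bit count of a BigInt, which is all the hashing code needs.
bitcount(x::BigInt) = ndigits_sketch(x, 2)
```

With this sketch, `bitcount(big(2)^100)` should give 101, and `ndigits_sketch(big(48), 7)` should give 2, since 48 is below 7^2 = 49.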
Oh sorry, I cooked an easy PR for this without seeing the updated thread here.
Here's some investigation of when GMP is off by one:

```julia
julia> using StatsBase

julia> nd(x::BigInt, b::Integer=10) =
           int(ccall((:__gmpz_sizeinbase,:libgmp), Culong, (Ptr{BigInt}, Int32), &x, b))
nd (generic function with 2 methods)

julia> map(x->factor(x+1), cumsum(rle([ nd(big(n),7)-ndigits(big(n),7) for n=1:2^17-1 ])[2]))
13-element Array{Dict{Int64,Int64},1}:
 Dict(2=>2)
 Dict(7=>1)
 Dict(2=>5)
 Dict(7=>2)
 Dict(2=>8)
 Dict(7=>3)
 Dict(2=>11)
 Dict(7=>4)
 Dict(2=>14)
 Dict(7=>5)
 Dict(2=>16)
 Dict(7=>6)
 Dict(2=>17)

julia> map(x->factor(x+1), cumsum(rle([ nd(big(n),6)-ndigits(big(n),6) for n=1:2^17-1 ])[2]))
13-element Array{Dict{Int64,Int64},1}:
 Dict(2=>2)
 Dict(2=>1,3=>1)
 Dict(2=>5)
 Dict(2=>2,3=>2)
 Dict(2=>7)
 Dict(2=>3,3=>3)
 Dict(2=>10)
 Dict(2=>4,3=>4)
 Dict(2=>12)
 Dict(2=>5,3=>5)
 Dict(2=>15)
 Dict(2=>6,3=>6)
 Dict(2=>17)
```

This is largely for my own record since it's probably not super-clear to anyone else what this is showing. Each entry is the factorization of the first `n` of a new run, where a run is a stretch of consecutive `n` with the same difference between GMP's answer and the true digit count. In short, it looks like the answer is off by one from a power of two up to the next power of the base, which makes a lot of sense. The question is how to figure out efficiently when this is the case.
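One way to see why the pattern makes sense is to model the estimate as a function of the bit length alone. The sketch below assumes the estimate is `floor(bits * log_b(2)) + 1`. That is a guess at the behaviour consistent with the output above, not GMP's actual code, and `estimate` and `boundaries` are hypothetical names. It uses plain `Int` values and current `ndigits(n, base=b)` keyword syntax.

```julia
# Assumed model: the digit estimate depends only on the bit length of n.
estimate(n::Integer, b::Integer) = floor(Int, ndigits(n, base=2) * log(b, 2)) + 1

# Return every n in 1:nmax at which the error (estimate minus true digit
# count) changes, i.e. the first n of each new run.
function boundaries(b::Integer, nmax::Integer)
    starts = Int[]
    prev = 0                      # the estimate is exact for n = 1
    for n in 1:nmax
        d = estimate(n, b) - ndigits(n, base=b)
        d != prev && push!(starts, n)
        prev = d
    end
    return starts
end

boundaries(7, 2^17 - 1)
```

Under this model the run boundaries for base 7 alternate between powers of 2 and powers of 7 (4, 7, 32, 49, 256, 343, ...). The estimate jumps as soon as the bit length grows, but the true digit count only catches up at the next power of the base.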
Backported in b1fc473
Thanks, @ivarne.
As reported here, using BigInts as keys to a Dict is slow. The culprit is hashing, specifically this line, which gets called from here. I don't know enough about BigInts to propose a better way to hash them, but I strongly suspect that the current approach is not The Answer.