Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better sizehint for primes #16333

Closed
wants to merge 2 commits into from
Closed

Better sizehint for primes #16333

wants to merge 2 commits into from

Conversation

mschauer
Copy link
Contributor

@mschauer mschauer commented May 12, 2016

Better sizehint for primes(lo,hi).

Before it failed for 1 << lo < hi as the sizehint! actually consumed hi / log(hi) memory independent of lo, but the needed size is better approximated by hi / log(hi) - lo/log(lo). Add a test which test primes and friends on short intervals with 1 << lo.

@@ -2253,6 +2253,9 @@ for T in [Int,BigInt], n = [1:1000;1000000]
@test s[k] == primesmask(k, k)[1]
end
end
let i = rand(1:2^40)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2^40 overflows on 32 bit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes, and I swear I did not just try to make a point here - #5573

@mschauer mschauer force-pushed the primes branch 2 times, most recently from 7f07a50 to 85904a1 Compare May 12, 2016 18:07
@@ -83,7 +83,7 @@ function primes(lo::Int, hi::Int)
lo ≤ 3 ≤ hi && push!(list, 3)
lo ≤ 5 ≤ hi && push!(list, 5)
hi < 7 && return list
sizehint!(list, floor(Int, hi / log(hi)))
sizehint!(list, floor(Int, max(hi,0) / log(max(hi,2)) - max(lo,0) / log(max(lo,2))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you don't need max here except for max(lo,0) and max(lo,2) as hi will be greater or equal than 7 at this point.

Copy link
Contributor

@pabloferz pabloferz May 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can clean this up a bit by writing lo = max(2, lo) after the lo ≤ hi check. And then just write hi / log(hi) - lo / log(lo) for the sizehint

@mschauer mschauer force-pushed the primes branch 2 times, most recently from 8a5d887 to 4ec97a0 Compare May 13, 2016 09:28
@mschauer
Copy link
Contributor Author

Thanks for the feedback. Not a big surprise, but for example now the batched infinite prime iterator
primeiter = Base.flatten(primes(1000000*(i-1)+1,1000000*i) for i in countfrom(1))
actually works

julia-before> @time first(drop(primeiter, 10^9))
 93.870623 seconds (2.00 G allocations: 81.113 TB, 10.60% gc time)
22801763513
julia-after> @time first(drop(primeiter, 10^9))
 88.758689 seconds (2.00 G allocations: 87.218 GB, 3.50% gc time)
22801763513

For comparison,

julia> @time primes(22801763513)[end];
114.720337 seconds (10 allocations: 13.415 GB, 0.15% gc time)

takes a bit longer.

@pabloferz
Copy link
Contributor

We could use better bound than n/log(n), but if the increase in time for big numbers is not that bad this PR might be a good compromise.

@mschauer
Copy link
Contributor Author

It's only slightly too big (with relative error of 5% for n=10^9 for example), which is good. That is better than choosing something too small but much closer (Li(n) for example).

@pabloferz
Copy link
Contributor

A much better bound for n>3 is n/(log(n)-1.12). Se here https://projecteuclid.org/download/pdf_1/euclid.rmjm/1181070157

@pabloferz
Copy link
Contributor

But that will only work for n > 8 as it is a decreasing function before that point.

@pabloferz
Copy link
Contributor

pabloferz commented May 13, 2016

Ok, as we have

x / log(x) < π(x) < x / (log(x) - 1.12)

then

hi / log(hi) - lo / (log(lo) - 1.12) < π_hi - π_lo < hi / (log(hi) - 1.12) - lo / log(lo)

so it would probably be better to have (though it would use more memory than you have now):

sizehint!(list, floor(Int, hi / (log(hi) - 1.12) - lo / log(lo))

I guess is just a matter of experimenting a bit.

@mschauer
Copy link
Contributor Author

Do we have π(x) < x / (log(x) - 1.12) ?

@pabloferz
Copy link
Contributor

See equation 10 from the reference a pasted above

@mschauer
Copy link
Contributor Author

hi / (log(hi) - 1.12) - lo / log(lo) is much too big if hi-lo is relatively small. I'll take
hi / (log(hi) - 1.12) - lo / (log(lo) - 1.12) to make it a tighter upper bound in the standard setting primes(n). That also shaves another 10 percent of the memory consumption in the example.

@pabloferz
Copy link
Contributor

Just beware of lo / (log(lo) - 1.12) when lo < 4 you should use lo / (log(lo) - (lo > 3 ? 1.12 : 0) or something like that

@mschauer
Copy link
Contributor Author

mschauer commented May 13, 2016

I think we can take that one. That should work fine in practice, as sizehint!() rounds to the next multiple of a resizing factor and does not need much precision, but only good overall estimates which give enough space to accommodate the array.

@pabloferz
Copy link
Contributor

LGTM.

@mschauer
Copy link
Contributor Author

@ararslan Do you want to have a look? I think this is good to go.

@ararslan
Copy link
Member

ararslan commented May 17, 2016

@mschauer Funny you should ask me, I'm just some guy. 😜

Looks good to me (and it appears the AppVeyor failure was just a timeout). 👍

@ararslan
Copy link
Member

Want to try closing and reopening the PR to see if we can get AppVeyor to cooperate?

@mschauer
Copy link
Contributor Author

Thanks to the unknown helping hand triggering AppVeyor.

@mschauer
Copy link
Contributor Author

#16357

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants