Better sizehint for primes #16333

mschauer · 2016-05-12T17:03:19Z

Better sizehint for primes(lo,hi).

Before, it failed for 1 ≪ lo < hi because the sizehint! call actually consumed hi / log(hi) memory independent of lo, but the needed size is better approximated by hi / log(hi) - lo / log(lo). Add a test that exercises primes and friends on short intervals with 1 ≪ lo.

tkelman · 2016-05-12T17:32:00Z

test/numbers.jl

@@ -2253,6 +2253,9 @@ for T in [Int,BigInt], n = [1:1000;1000000]
        @test s[k] == primesmask(k, k)[1]
    end
 end
+let i = rand(1:2^40)


2^40 overflows on 32 bit

Oh, yes, and I swear I did not just try to make a point here - #5573

pabloferz · 2016-05-12T21:01:06Z

base/primes.jl

@@ -83,7 +83,7 @@ function primes(lo::Int, hi::Int)
    lo ≤ 3 ≤ hi && push!(list, 3)
    lo ≤ 5 ≤ hi && push!(list, 5)
    hi < 7 && return list
-    sizehint!(list, floor(Int, hi / log(hi)))
+    sizehint!(list, floor(Int, max(hi,0) / log(max(hi,2)) - max(lo,0) / log(max(lo,2))))


I think you don't need max here except for max(lo,0) and max(lo,2), as hi will be greater than or equal to 7 at this point.

I guess we can clean this up a bit by writing lo = max(2, lo) after the lo ≤ hi check, and then just write hi / log(hi) - lo / log(lo) for the sizehint.
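To make the effect of the suggested cleanup concrete, here is a minimal sketch comparing the old and the proposed hint on a short interval far from zero. The names `old_hint` and `new_hint` are hypothetical helpers for illustration, not functions in Base, and the sketch assumes lo ≤ hi and hi ≥ 7, as at the point where sizehint! is called.

```julia
# Hypothetical helpers for illustration; neither is a function in Base.

# Hint used before this PR: ignores lo entirely.
old_hint(lo::Int, hi::Int) = floor(Int, hi / log(hi))

# Hint after the suggested cleanup. Assumes lo ≤ hi and hi ≥ 7.
function new_hint(lo::Int, hi::Int)
    lo = max(2, lo)                       # keeps log(lo) > 0
    return floor(Int, hi / log(hi) - lo / log(lo))
end

# A short interval with 1 ≪ lo (values chosen to fit in a 32-bit Int as well).
lo, hi = 10^9, 10^9 + 10^6
println("old hint: ", old_hint(lo, hi), " elements")
println("new hint: ", new_hint(lo, hi), " elements")
```

On an interval like this the old hint scales with hi alone, while the new one scales roughly with hi - lo.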

mschauer · 2016-05-13T10:10:35Z

Thanks for the feedback. Not a big surprise, but for example now the batched infinite prime iterator
primeiter = Base.flatten(primes(1000000*(i-1)+1,1000000*i) for i in countfrom(1))
actually works

julia-before> @time first(drop(primeiter, 10^9))
 93.870623 seconds (2.00 G allocations: 81.113 TB, 10.60% gc time)
22801763513
julia-after> @time first(drop(primeiter, 10^9))
 88.758689 seconds (2.00 G allocations: 87.218 GB, 3.50% gc time)
22801763513

For comparison,

julia> @time primes(22801763513)[end];
114.720337 seconds (10 allocations: 13.415 GB, 0.15% gc time)

takes a bit longer.

pabloferz · 2016-05-13T10:20:07Z

We could use a better bound than n/log(n), but if the increase in time for big numbers is not that bad, this PR might be a good compromise.

mschauer · 2016-05-13T10:28:03Z

It's only slightly too big (with relative error of 5% for n=10^9 for example), which is good. That is better than choosing something too small but much closer (Li(n) for example).

pabloferz · 2016-05-13T10:30:17Z

A much better bound for n>3 is n/(log(n)-1.12). See here: https://projecteuclid.org/download/pdf_1/euclid.rmjm/1181070157

pabloferz · 2016-05-13T10:36:41Z

But that will only work for n > 8 as it is a decreasing function before that point.

pabloferz · 2016-05-13T11:05:55Z

Ok, as we have

x / log(x) < π(x) < x / (log(x) - 1.12)

then

hi / log(hi) - lo / (log(lo) - 1.12) < π_hi - π_lo < hi / (log(hi) - 1.12) - lo / log(lo)

so it would probably be better to have (though it would use more memory than you have now):

sizehint!(list, floor(Int, hi / (log(hi) - 1.12) - lo / log(lo)))

I guess it is just a matter of experimenting a bit.
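For reference, the interval bounds above follow by combining the two one-sided estimates, assuming both hold at lo and at hi (the range of validity is whatever the linked reference states; it is not re-derived here):

```latex
\begin{align*}
\pi(\mathit{hi}) - \pi(\mathit{lo})
  &< \frac{\mathit{hi}}{\log \mathit{hi} - 1.12} - \frac{\mathit{lo}}{\log \mathit{lo}}
  && \text{(upper estimate at } \mathit{hi}\text{, lower estimate at } \mathit{lo}\text{)} \\
\pi(\mathit{hi}) - \pi(\mathit{lo})
  &> \frac{\mathit{hi}}{\log \mathit{hi}} - \frac{\mathit{lo}}{\log \mathit{lo} - 1.12}
  && \text{(lower estimate at } \mathit{hi}\text{, upper estimate at } \mathit{lo}\text{)}
\end{align*}
```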

mschauer · 2016-05-13T11:41:18Z

Do we have π(x) < x / (log(x) - 1.12) ?

pabloferz · 2016-05-13T11:43:04Z

See equation 10 from the reference I pasted above.

mschauer · 2016-05-13T12:17:06Z

hi / (log(hi) - 1.12) - lo / log(lo) is much too big if hi-lo is relatively small. I'll take
hi / (log(hi) - 1.12) - lo / (log(lo) - 1.12) to make it a tighter upper bound in the standard setting primes(n). That also shaves another 10 percent off the memory consumption in the example.
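The two candidate hints can be compared against actual prime counts with a small experiment. This is a sketch under stated assumptions: `sieve_mask`, `primes_between`, `loose_hint` and `tight_hint` are hypothetical names introduced here, and a plain sieve stands in for Base's `primes` so that the snippet is self-contained.

```julia
# Plain sieve of Eratosthenes: mask[k] is true exactly when k is prime (1 ≤ k ≤ n).
function sieve_mask(n::Int)
    mask = trues(n)
    mask[1] = false
    for p in 2:isqrt(n)
        mask[p] || continue
        for m in p*p:p:n
            mask[m] = false
        end
    end
    return mask
end

# Number of primes p with lo < p ≤ hi, i.e. π(hi) - π(lo).
primes_between(mask, lo, hi) = count(mask[lo+1:hi])

upper(x) = x / (log(x) - 1.12)   # upper estimate of π(x) quoted in the thread
lower(x) = x / log(x)            # lower estimate of π(x) quoted in the thread

loose_hint(lo, hi) = upper(hi) - lower(lo)   # upper at hi, lower at lo
tight_hint(lo, hi) = upper(hi) - upper(lo)   # upper at both ends

mask = sieve_mask(10^6)
for (lo, hi) in [(10, 10^6), (10^5, 10^6), (999_000, 10^6)]
    println((lo = lo, hi = hi,
             actual = primes_between(mask, lo, hi),
             loose = floor(Int, loose_hint(lo, hi)),
             tight = floor(Int, tight_hint(lo, hi))))
end
```

Note that the difference of two upper estimates is not itself a proven upper bound on π(hi) - π(lo), so on very short intervals the tighter hint may undershoot; for sizehint! that only costs a later reallocation.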

pabloferz · 2016-05-13T12:30:27Z

Just beware of lo / (log(lo) - 1.12) when lo < 4; you should use lo / (log(lo) - (lo > 3 ? 1.12 : 0)) or something like that.
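A minimal sketch of such a guarded estimate follows. The names `pi_upper` and `guarded_hint` are hypothetical, and the `max(2, lo)` clamp and the final `max(0, ...)` are extra safeguards assumed for this sketch rather than something settled in the thread; the latter is there because x / (log(x) - 1.12) is decreasing for small x, so the difference can dip below zero.

```julia
# Hypothetical helper: the estimate x / (log(x) - 1.12), falling back to
# x / log(x) for x ≤ 3, where log(x) - 1.12 is negative.
pi_upper(x) = x / (log(x) - (x > 3 ? 1.12 : 0.0))

# Assumes lo ≤ hi and hi ≥ 7, as at the sizehint! call in primes(lo, hi).
function guarded_hint(lo::Int, hi::Int)
    lo = max(2, lo)                          # log(1) == 0 would give Inf
    estimate = pi_upper(hi) - pi_upper(lo)
    return max(0, floor(Int, estimate))      # estimate can be negative for small lo
end
```

For example, `guarded_hint(1, 10^6)` stays slightly above the true count of primes below 10^6, while `guarded_hint(4, 7)` is clamped to 0 instead of going negative.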

mschauer · 2016-05-13T15:28:18Z

I think we can take that one. That should work fine in practice, as sizehint!() rounds to the next multiple of a resizing factor and does not need much precision, but only good overall estimates which give enough space to accommodate the array.

pabloferz · 2016-05-13T15:33:29Z

LGTM.

mschauer · 2016-05-17T08:36:00Z

@ararslan Do you want to have a look? I think this is good to go.

ararslan · 2016-05-17T18:22:48Z

@mschauer Funny you should ask me, I'm just some guy. 😜

Looks good to me (and it appears the AppVeyor failure was just a timeout). 👍

ararslan · 2016-05-18T18:34:33Z

Want to try closing and reopening the PR to see if we can get AppVeyor to cooperate?

mschauer · 2016-05-19T08:50:12Z

Thanks to the unknown helping hand triggering AppVeyor.

mschauer · 2016-05-22T14:17:33Z

#16357

mschauer force-pushed the primes branch from b7e677f to c7086fc Compare May 12, 2016 17:09

tkelman reviewed May 12, 2016
mschauer force-pushed the primes branch 2 times, most recently from 7f07a50 to 85904a1 Compare May 12, 2016 18:07

pabloferz reviewed May 12, 2016
mschauer force-pushed the primes branch 2 times, most recently from 8a5d887 to 4ec97a0 Compare May 13, 2016 09:28

Better sizehint for primes

4ec97a0

Tweaking estimate of prime counting function

502a74c

mschauer force-pushed the primes branch from b184f09 to 502a74c Compare May 13, 2016 13:28

ararslan mentioned this pull request May 19, 2016

Contents and repositories for JuliaMath JuliaMath/Roadmap.jl#1

Closed

16 tasks

mschauer closed this May 22, 2016

simonbyrne mentioned this pull request May 27, 2016

Move PRs from Base JuliaMath/Primes.jl#8

Closed

3 tasks

mschauer mentioned this pull request May 27, 2016

Better sizehint for primes JuliaMath/Primes.jl#11

Merged
