Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve isprime performance for 32 bit integers #9

Merged
merged 4 commits into from
Jun 7, 2016
Merged

Conversation

pabloferz
Copy link
Member

Brought from JuliaLang/julia#16349.

As I commented over there there is another possibility (show below) instead of the current PR. It has been argued that it uses more memory (which is true, see JuliaLang/julia#16349 (comment)), but is much less memory than the current implementation anyway and is faster than this PR.

const bases = UInt16[15591,2018,166,7429,8064,16045,10503,4399,1949,1295,2776,3620,560,3128,5212,
2657,2300,2021,4652,1471,9336,4018,2398,20462,10277,8028,2213,6219,620,3763,4852,5012,3185,
1333,6227,5298,1074,2391,5113,7061,803,1269,3875,422,751,580,4729,10239,746,2951,556,2206,
3778,481,1522,3476,481,2487,3266,5633,488,3373,6441,3344,17,15105,1490,4154,2036,1882,1813,
467,3307,14042,6371,658,1005,903,737,1887,7447,1888,2848,1784,7559,3400,951,13969,4304,177,41,
19875,3110,13221,8726,571,7043,6943,1199,352,6435,165,1169,3315,978,233,3003,2562,2994,10587,
10030,2377,1902,5354,4447,1555,263,27027,2283,305,669,1912,601,6186,429,1930,14873,1784,1661,
524,3577,236,2360,6146,2850,55637,1753,4178,8466,222,2579,2743,2031,2226,2276,374,2132,813,
23788,1610,4422,5159,1725,3597,3366,14336,579,165,1375,10018,12616,9816,1371,536,1867,10864,
857,2206,5788,434,8085,17618,727,3639,1595,4944,2129,2029,8195,8344,6232,9183,8126,1870,3296,
7455,8947,25017,541,19115,368,566,5674,411,522,1027,8215,2050,6544,10049,614,774,2333,3007,
35201,4706,1152,1785,1028,1540,3743,493,4474,2521,26845,8354,864,18915,5465,2447,42,4511,
1660,166,1249,6259,2553,304,272,7286,73,6554,899,2816,5197,13330,7054,2818,3199,811,922,350,
7514,4452,3449,2663,4708,418,1621,1171,3471,88,11345,412,1559,194]

function witnesses(n::Union{UInt8,Int8,UInt16,Int16,UInt32,Int32})
    i = UInt64(n)
    i = ((i >> 16) $ i) * 0x45d9f3b
    i = ((i >> 16) $ i) * 0x45d9f3b
    i = ((i >> 16) $ i) & 255 + 1
    bases[i]
end

@coveralls
Copy link

coveralls commented May 27, 2016

Coverage Status

Coverage increased (+0.9%) to 96.97% when pulling aac32f1 on pz/isprime into 078a15c on master.

@simonbyrne
Copy link
Member

Sorry, I haven't really followed the discussion on the other threads (I don't actually know all that much about prime number algorithms), but is the only downside of the alternative method that it needs a slightly larger lookup table?

If so, that's definitely worth it: 512 bytes is basically nothing (a processor these days typically has 64K L1 cache), special functions implementations often use way larger tables, e.g.
https://github.com/JuliaLang/julia/blob/e11ff359f44121d9b9bf6057769465b29aa28405/base/special/log.jl

Also, add a reference to the paper in the comments (this should also be included in the eventual documents as well).

for m in (3,5,7,11,13,17,19,23)
n % m == 0 && return false
end
n < 841 && return n > 1
s = trailing_zeros(n-1)
d = (n-1) >>> s
for a in witnesses(n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied comment:

I think you can give a type hint

for a in witnesses(n)::Tuple{Vararg{Int}}

making the witnesses stack allocated.

@pabloferz
Copy link
Member Author

@simonbyrne I also think the larger lookup table is worth it. But the first time the improvement strategy was discussed there was someone that wasn't so happy about it.

So I'll change it to the other version and will address your comments about the reference.

@simonbyrne
Copy link
Member

might also see some speedup by using @inbounds bases[i].

@coveralls
Copy link

coveralls commented May 27, 2016

Coverage Status

Coverage increased (+3.2%) to 99.248% when pulling 77debe1 on pz/isprime into 078a15c on master.

i = (((n >> 16) $ n) * 0x45d9f3b) & 0xffffffff
i = ((i >> 16) $ i) & 15 + 1
(2,(2249,483,194,199,15,369,499,945,419,735,33,471,946,615,497,702)[i])
# Forišek and Jančina, "Fast Primality Testing for Integers That Fit into a Machine Word", 2015
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this FJ32_256? that seem to be the only one described in the paper (though it's a little different?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just took the hash and bases from the FJ32_256 there, since we were already had a Miller-Rabin test implemented.

That was the only thing we really needed, since everything else was already here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks! Would be good to note that this is the algorithm used :) Is there any value is the 64-bit version (for 64 bit types)? the paper seems to suggest they weren't sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll mention the algorithm used. The 64-bit version needs a way larger look-up table and is probably better to try one of the approaches discussed here JuliaLang/julia#11594

@mschauer
Copy link
Member

mschauer commented Jun 1, 2016

Looks good.

@coveralls
Copy link

coveralls commented Jun 1, 2016

Coverage Status

Coverage increased (+0.9%) to 96.992% when pulling 4dc4fae on pz/isprime into 078a15c on master.

@simonbyrne
Copy link
Member

So is this good to merge?

@simonbyrne simonbyrne merged commit 5407ca0 into master Jun 7, 2016
@simonbyrne simonbyrne deleted the pz/isprime branch June 7, 2016 09:49
@simonbyrne
Copy link
Member

Great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants