Updated shuffle implementation to use better hash function #1257

djns99 · 2020-08-14T02:54:33Z

Fixes #1256 by changing the hash function from taus88 to wyhash. Adds a test for a random distribution of numbers.
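As a rough illustration of the kind of distribution test described above, here is a minimal sketch. It is not the test added in testing/shuffle.cu: `std::shuffle` stands in for `thrust::shuffle` so that it runs without CUDA, and the helper names `chi_squared_uniform` and `tally_first_element` are hypothetical. It tallies where element 0 lands over many shuffles and computes Pearson's chi-squared statistic against a uniform expectation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Pearson chi-squared statistic of `counts` against a uniform expectation.
// Assumes at least one bin and at least one observation.
inline double chi_squared_uniform(const std::vector<std::size_t>& counts)
{
  assert(!counts.empty());
  const double total = static_cast<double>(
      std::accumulate(counts.begin(), counts.end(), std::size_t{0}));
  const double expected = total / static_cast<double>(counts.size());
  double stat = 0.0;
  for (std::size_t observed : counts)
  {
    const double diff = static_cast<double>(observed) - expected;
    stat += diff * diff / expected;
  }
  return stat;
}

// Hypothetical helper: shuffles 0..n-1 `trials` times and counts the final
// position of element 0. std::shuffle is a stand-in for thrust::shuffle.
inline std::vector<std::size_t> tally_first_element(std::size_t n,
                                                    std::size_t trials,
                                                    unsigned seed)
{
  std::mt19937 rng(seed);
  std::vector<std::size_t> counts(n, 0);
  std::vector<std::size_t> data(n);
  for (std::size_t t = 0; t < trials; ++t)
  {
    std::iota(data.begin(), data.end(), std::size_t{0});
    std::shuffle(data.begin(), data.end(), rng);
    const auto it = std::find(data.begin(), data.end(), std::size_t{0});
    ++counts[static_cast<std::size_t>(it - data.begin())];
  }
  return counts;
}
```

A low-quality shuffle shows up as a statistic far above the number of bins minus one; converting the statistic to a p-value requires the incomplete gamma function.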

thrust/system/detail/generic/shuffle.inl

testing/shuffle.cu

alliepiper · 2020-08-27T21:00:42Z

@djns99 @RAMitchell Just checking in: it looks like you two have been iterating on this PR. Let me know when you're ready for me to do a final review and start testing/landing it.

RAMitchell · 2020-08-28T01:06:59Z

@djns99 I'm happy with your current approach using murmurhash. Did you have anything else you wanted to add? It would be good to have a couple of comments with references for CephesFunctions and murmurhash. Please also check that the licenses are compatible, i.e. that we are allowed to use their code in this way.

djns99 · 2020-08-28T01:34:54Z

> @djns99 I'm happy with your current approach using murmurhash, did you have anything else you wanted to add? Would be good to have a couple of comments with references for CephesFunctions and murmurhash. Please also check the licenses are compatible i.e. we are allowed to use their code in this way.

@RAMitchell I probably won't have time to do this in the next few days. However, I am still having difficulty obtaining sufficiently random results, even if I increase the number of rounds. Some preliminary investigation indicates that the likely cause is insufficient randomness in the murmur hash function, although it is a significant improvement over the current results even at 8 rounds.
I would also note that it could be a test bug: I would expect a high enough number of rounds to pass, but I did not see this behaviour.

As for licensing, cephes was taken from the NIST suite for testing randomness, which according to https://csrc.nist.gov/Projects/Random-Bit-Generation/Documentation-and-Software is in the public domain. I took murmur hash from the smhasher repo; the licensing information there says it is also in the public domain.

RAMitchell · 2020-08-28T02:47:50Z

Ok, I think the key thing is for us to make the next toolkit release (11.1?). Let's catch up next week on this.

djns99 · 2020-09-09T03:07:45Z

@RAMitchell I have done a bit of work on this and decided to use the RC5 encryption algorithm https://doi.org/10.1007/3-540-60590-8_7. This uses a 12-20 round (I have it set to 12 at present) Feistel network using data-dependent rotations on a w-bit register. The only shortfall of this is that it requires us to round to a power of 4 instead of 2. However, it means we can make some guarantees about the strength of our internal function (i.e. any errors are due to an insufficiently random key).
There are still some issues with the test, which I am in the process of diagnosing.
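To illustrate why a balanced Feistel network needs a block of 2w bits (and therefore a domain size that is a power of 4), here is a minimal sketch. It is not the PR's implementation: `round_fn` is a hypothetical integer mixer, deliberately neither RC5 nor MurmurHash, and `feistel_permute` and `is_bijection` are illustrative names. The point it shows is that the network is a bijection on [0, 4^w) regardless of the round function, because each round can be undone.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical round function for illustration only. This is NOT RC5 or
// MurmurHash, just a cheap keyed integer mixer.
inline std::uint32_t round_fn(std::uint32_t x, std::uint32_t key)
{
  x ^= key;
  x *= 0x9E3779B1u;
  x ^= x >> 15;
  return x;
}

// Balanced Feistel network over a 2w-bit block (1 <= w <= 16).
// Each round maps (left, right) to (right, left ^ F(right)), which is
// invertible whatever F is, so the whole network is a bijection.
inline std::uint32_t feistel_permute(std::uint32_t value,
                                     std::uint32_t w,
                                     const std::vector<std::uint32_t>& round_keys)
{
  assert(w >= 1 && w <= 16);
  const std::uint32_t mask = (std::uint32_t{1} << w) - 1u;
  std::uint32_t left  = (value >> w) & mask;
  std::uint32_t right = value & mask;
  for (std::uint32_t key : round_keys)
  {
    const std::uint32_t next = left ^ (round_fn(right, key) & mask);
    left  = right;
    right = next;
  }
  return (left << w) | right;
}

// Brute-force check that every value in [0, 4^w) is produced exactly once.
// Limited to small w so the bookkeeping vector stays small.
inline bool is_bijection(std::uint32_t w,
                         const std::vector<std::uint32_t>& round_keys)
{
  assert(w >= 1 && w <= 12);
  const std::uint32_t domain = std::uint32_t{1} << (2u * w);
  std::vector<bool> seen(domain, false);
  for (std::uint32_t v = 0; v < domain; ++v)
  {
    const std::uint32_t out = feistel_permute(v, w, round_keys);
    if (out >= domain || seen[out]) return false;
    seen[out] = true;
  }
  return true;
}
```

The quality of the resulting permutation depends entirely on the round function and the keys; the sketch only demonstrates the bijection property, not randomness.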

As for licensing, the patent for the algorithm expired in 2015 (https://patents.google.com/patent/US5835600A/en), so I believe this means we are allowed to use it.

djns99 · 2020-09-09T03:48:38Z

Test issues are resolved.

RAMitchell

Regarding the use of RC5, that's up to the thrust team. You might want to have an alternative in mind.

> The only shortfall of this is that it requires us to round to a power of 4 instead of 2.

I'm a bit confused: the current code still uses the nearest power of two in get_cipher_bits.

Looks really good otherwise and glad to see you've got the tests passing at high accuracy!

thrust/system/detail/generic/shuffle.inl

djns99 · 2020-09-09T04:31:42Z

> I'm a bit confused, current code still uses nearest power of two in get_cipher_bits.

Note the change to

m >>= 2u;

This now counts the number of pairs of bits, with w representing one half of the Feistel block.
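The effect of shifting by two bits per step can be sketched as follows. This is an assumption-laden illustration, not the PR's `get_cipher_bits`: `half_width_bits` and `padded_domain` are hypothetical helpers, and the input is assumed to be at most 2^62 so the final shift stays in range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the code from the PR: returns w, the number of
// bits in one half of the Feistel block, such that 4^w = 2^(2w) >= m.
inline std::uint32_t half_width_bits(std::uint64_t m)
{
  assert(m > 0 && m <= (std::uint64_t{1} << 62));
  m -= 1;  // largest index that must be representable
  std::uint32_t w = 0;
  while (m != 0)
  {
    m >>= 2u;  // consume one pair of bits per iteration
    ++w;
  }
  return w;
}

// Size of the padded domain the cipher would permute: 4^w.
inline std::uint64_t padded_domain(std::uint64_t m)
{
  return std::uint64_t{1} << (2u * half_width_bits(m));
}
```

Under these assumptions, 1024 elements need no padding, while 1025 elements pad to 4096 rather than the 2048 a power-of-two rounding would give, which is the performance cost of rounding to a power of 4.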

RAMitchell · 2020-09-09T04:41:21Z

I see, thanks! Nearest power of four is not ideal from a performance perspective but I think we are still much faster than alternatives and robustness is priority one.

djns99 · 2020-09-09T04:47:50Z

> performance perspective

In theory we could modify the RC5 algorithm slightly and use a modulus by a different amount for each register to allow for a power of two instead of 4. However, this would likely require a more comprehensive analysis of the problem to ensure the correct behaviour is maintained, so I don't believe this is the place for this optimisation. There is a range of other candidate ciphers available to consider, but RC5/RC6 (though RC6 rounds to a power of 16) was the main one I found that explicitly supports a variable block width.

+ Uses RC5 cipher as bijective function
+ Adds more comprehensive tests for shuffle distribution

RAMitchell · 2020-09-09T21:54:16Z

@djns99 let us know when you are ready for final review.

djns99 · 2020-09-12T11:53:49Z

If we are ok with RC5 I don't think there is anything left to do at present. So we should be good to go ahead with review and merge.

alliepiper · 2020-09-18T15:22:10Z

Whatever implementation you two decide on is fine with me, but it sounds like there are some possible concerns with licensing, too?

Can someone summarize the potentially problematic portions of the current PR, along with their licenses, patents, and other IP concerns?

djns99 · 2020-09-18T20:36:31Z

There are two parts that potentially have licensing/IP issues. The first is the Cephes functions used in the test suite, which are taken directly from a NIST test suite for random numbers.
The second potential problem is the round function used for the Feistel network, for which there are currently two options: RC5 and MurmurHash2.

> As for licensing, cephes was taken from the NIST suite for testing randomness, which according to https://csrc.nist.gov/Projects/Random-Bit-Generation/Documentation-and-Software is in the public domain. I took murmur hash from the smhasher repo, the licensing information there say it is also in the public domain.

To the best of my understanding, the Cephes functions are fine, as is MurmurHash if we go that route.

> As for licensing, the patent for the algorithm expired in 2015 https://patents.google.com/patent/US5835600A/en so I believe this means we are allowed to use it.

If we go for RC5 (my preferred function, @RAMitchell) then it is covered by an expired patent, with the implementation being my own (based on the original paper https://doi.org/10.1007/3-540-60590-8_7). My understanding is that the expired patent means the algorithm is in the public domain, so it is acceptable to use as we have here. However, I don't know enough to say for sure.

To the best of my knowledge there are no other licensing/IP issues.

brycelelbach · 2020-09-20T01:48:25Z

I'm going to have one of the NVIDIA open source lawyers look at this.

alliepiper · 2020-10-04T18:03:06Z

@djns99 @RAMitchell I haven't been able to get a straight answer on this from our lawyers, and I get the feeling that the only way this patch will move forward is by rewriting it to avoid anything that is known to be associated with a non-NVIDIA owned patent.

Would you be willing to submit a different implementation that avoids the patent-related algorithms?

RAMitchell · 2020-10-04T21:31:09Z

Ok, we will look at rewriting with a different hash. @djns99 if you don't have time, I will take a stab at it later this week.

djns99 · 2020-10-04T21:55:35Z

@RAMitchell it would be good if you can. I suspect going back to MurmurHash2 will probably work now that I have fixed the tests. There is just a bit less academic backing behind its quality. We should also be able to go to a power of two instead of 4 again.

alliepiper · 2020-10-05T17:01:52Z

Thanks. Sorry about this, but it's the only way to get this moving again in a reasonable amount of time.

alliepiper · 2020-10-12T20:26:22Z

Replaced by #1309 due to IP concerns.

djns99 mentioned this pull request Aug 14, 2020: thrust::shuffle gives low randomness permutations #1256 (Closed)

djns99 force-pushed the fix_shuffle branch from 863b942 to 0768213, August 14, 2020 02:57

djns99 commented Aug 14, 2020 on thrust/system/detail/generic/shuffle.inl

djns99 force-pushed the fix_shuffle branch 2 times, most recently from 31e7415 to 47a7c62, August 14, 2020 03:14

RAMitchell reviewed Aug 14, 2020 (testing/shuffle.cu, thrust/system/detail/generic/shuffle.inl)

djns99 force-pushed the fix_shuffle branch 5 times, most recently from 134cf64 to f8a04a5, August 20, 2020 04:39

brycelelbach added this to the 1.11.0 milestone Sep 2, 2020

djns99 force-pushed the fix_shuffle branch from f8a04a5 to 0661b58, September 8, 2020 06:43

djns99 force-pushed the fix_shuffle branch 2 times, most recently from 640e7c3 to 3f1dcb4, September 9, 2020 03:43

RAMitchell reviewed Sep 9, 2020 (thrust/system/detail/generic/shuffle.inl)

djns99 force-pushed the fix_shuffle branch 2 times, most recently from 09f7a87 to f9ea205, September 9, 2020 04:49

Commit 836facb: Updated shuffle implementation to use better bijective function
+ Uses RC5 cipher as bijective function
+ Adds more comprehensive tests for shuffle distribution

djns99 force-pushed the fix_shuffle branch from f9ea205 to 836facb, September 9, 2020 04:51

alliepiper assigned RAMitchell and djns99 Sep 18, 2020

alliepiper added the "info needed" label Sep 18, 2020

alliepiper assigned alliepiper and unassigned RAMitchell and djns99 Sep 23, 2020

alliepiper added the "blocked" label and removed the "info needed" label Sep 25, 2020

alliepiper removed the "blocked" label Oct 5, 2020

alliepiper assigned RAMitchell and unassigned alliepiper Oct 5, 2020

RAMitchell mentioned this pull request Oct 10, 2020: Improve shuffle quality #1309 (Merged)

alliepiper closed this Oct 12, 2020

alliepiper removed this from the 1.11.0 milestone Oct 12, 2020
