[REVIEW] Add MD5 to existing hashing functionality #5438
Run a spell check on all comments.
I don't suggest using `thrust::copy` here. Use the compiler built-in `memcpy`; otherwise it may not get optimized away.
`memcpy` is slower:
https://stackoverflow.com/a/49037139/1550940 (tested on a compute capability 3.0 device; this still needs to be validated)
@jrhemstad `memcpy` copies one byte at a time in a loop. `thrust::copy_n` copies one element of the type's size at a time. For example, for `int` it uses an unrolled loop that copies 32 bits at a time.
No actual copies should be performed. The compiler will optimize them away. This is a standard trick for type-punning without breaking aliasing rules: https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8#how-do-we-type-pun-correctly
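As a minimal host-side sketch of the trick described above (not the code from this PR), a 32-bit word can be read from a byte buffer with `memcpy` into a local variable. The helper names `load_u32` and `store_u32` are hypothetical and exist only for illustration; whether the copy is actually elided depends on the compiler and on what it can see about the pointers involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of cudf): reads a 32-bit word from a byte
// buffer without dereferencing a reinterpret_cast'ed pointer, so it does not
// violate strict-aliasing rules.
inline std::uint32_t load_u32(const unsigned char* buffer, std::size_t byte_offset)
{
  std::uint32_t value;
  // Compilers commonly lower a fixed-size copy into a local to a single load.
  std::memcpy(&value, buffer + byte_offset, sizeof(value));
  return value;
}

// Hypothetical counterpart that writes a 32-bit word into a byte buffer.
inline void store_u32(unsigned char* buffer, std::size_t byte_offset, std::uint32_t value)
{
  std::memcpy(buffer + byte_offset, &value, sizeof(value));
}
```

With helpers like these, a call such as `load_u32(buffer, g * 4)` would play the role of the `memcpy` into a temporary that is discussed in this thread.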
Example: https://godbolt.org/z/xTrKnj
The copies are optimized away if these are registers!
(For global memory they are not, as expected.)
Yes, the compiler needs to see the declarations in order to elide the copies.
It is not expected that the same optimization doesn't work for global memory. This is purely a pessimization made by nvcc where if it can't see the declaration, it assumes the pointer is underaligned and performs 1B copies. Notice how gcc makes the same optimization without needing to go through temporaries: https://godbolt.org/z/f6Kra8
Changing this to `memcpy(&buffer_element_as_int, hash_state->buffer + g * 4, 4);` results in the compilation being killed. Maybe I'm missing something here, but that's why I had gone back to `thrust::copy_n` despite your original suggestion.
:/ That will require further investigation.
After removing the `md5_element_hasher` class, replacing the `copy_n` worked fine.
Use a funnel shift here, and check the other integer bit intrinsics too:
https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html
I substituted `__funnelshift_l` in, replacing lines 78-79 with `B = B + __funnelshift_l(F, F, shift_constants[((j / 16) * 4) + (j % 4)]);`, but I'm a little hesitant to commit and push the changes because it more than tripled my build time. Any idea why this would cause such a massive jump?
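For readers unfamiliar with the intrinsic: when both data operands are the same word, `__funnelshift_l(F, F, s)` amounts to a 32-bit rotate left by `s & 31`. The sketch below is a host-side C++ stand-in, not the device code from this PR. The function names `rotate_left` and `shift_for_step` are hypothetical, and the 16-entry table layout (four rounds of four repeating shift amounts, using the standard MD5 per-round values) is an assumption inferred from the index expression quoted above.

```cpp
#include <cstdint>

// Host-side stand-in for __funnelshift_l(x, x, shift): rotating a 32-bit word
// left by (shift & 31) bits. Not the CUDA intrinsic itself.
constexpr std::uint32_t rotate_left(std::uint32_t x, std::uint32_t shift)
{
  return (x << (shift & 31u)) | (x >> ((32u - (shift & 31u)) & 31u));
}

// Assumed compact layout: 4 rounds x 4 shift amounts that repeat within a
// round (the standard MD5 per-round shift amounts).
constexpr std::uint32_t shift_constants[16] = {7, 12, 17, 22,   // round 1
                                               5, 9,  14, 20,   // round 2
                                               4, 11, 16, 23,   // round 3
                                               6, 10, 15, 21};  // round 4

// Maps an MD5 step j in [0, 64) to its shift amount using the same index
// expression as in the comment above.
constexpr std::uint32_t shift_for_step(int j)
{
  return shift_constants[((j / 16) * 4) + (j % 4)];
}
```

Under these assumptions, the update in the comment corresponds to `B = B + rotate_left(F, shift_for_step(j));` on the host.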
Build time aside, does it improve performance?
Is it built only for your GPU architecture or for all GPU architectures?
I was building for multiple architectures, but I don't think I was comparing apples to apples because I was using ninja to build cudf. In a few tests adding and reverting changes, the compile times seem nearly identical.
Why is `MD5Hash` templated on `Key` in addition to the `operator()`s being templates? I would think the struct does not need to be a template; instead, just make the `operator()` a template. That would simplify your specializations.
I was largely copying the structure of the murmur hash function. I'll change the operator to a template.
Yes, but notice that `MurmurHash3_32` is a class template:
cudf/cpp/include/cudf/detail/utilities/hash_functions.cuh, lines 32 to 33 in 855e735
But the `operator()` is not:
cudf/cpp/include/cudf/detail/utilities/hash_functions.cuh, line 76 in 855e735
In your case, both are templates, which just complicates the specializations.
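To illustrate the shape being suggested, here is a minimal sketch in which the struct is not a template and only `operator()` is, so type-specific behavior becomes an ordinary overload rather than a specialization of a class template. `ExampleHash` is a hypothetical type, not cudf's `MD5Hash`, and FNV-1a is swapped in purely as a short placeholder for the real hash.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical hasher: the struct itself is NOT a template.
struct ExampleHash {
  // Generic path: hash the object representation of a trivially copyable key.
  template <typename Key>
  std::uint32_t operator()(const Key& key) const
  {
    return hash_bytes(reinterpret_cast<const unsigned char*>(&key), sizeof(Key));
  }

  // Type-specific path: a plain overload for strings, which overload
  // resolution prefers over the template for std::string arguments.
  std::uint32_t operator()(const std::string& key) const
  {
    return hash_bytes(reinterpret_cast<const unsigned char*>(key.data()), key.size());
  }

 private:
  // FNV-1a, used here only as a stand-in for the real hash computation.
  static std::uint32_t hash_bytes(const unsigned char* data, std::size_t size)
  {
    std::uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < size; ++i) {
      h ^= data[i];
      h *= 16777619u;
    }
    return h;
  }
};
```

A caller then writes `ExampleHash{}(key)` for any supported key type without naming a template argument on the struct.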
This was changed in order to fix the type-dispatching error. The base operator is no longer templated: https://github.com/rapidsai/cudf/pull/5438/files#diff-a6ce3f9a4f61a23dd6469473c7dbf15fR147
I removed the template when I removed the `md5_element_hasher`.
Instead of repeating `hash_constants` and `shift_constants` as function parameters, it might be nicer to make these data members of your `MD5Hash` class that are set at construction.
They could be made `__constant__`.
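A rough host-side sketch of the data-member suggestion above: the tables are handed over once at construction and the per-step lookups no longer need them as parameters. `ExampleHasher` and its member functions are hypothetical names, not the PR's `MD5Hash` interface, and the sketch does not cover the `__constant__` alternative, which is CUDA-specific.

```cpp
#include <cstdint>

// Hypothetical class: stores pointers to the two tables at construction
// instead of threading them through every function call.
class ExampleHasher {
 public:
  ExampleHasher(const std::uint32_t* hash_constants, const std::uint32_t* shift_constants)
    : hash_constants_(hash_constants), shift_constants_(shift_constants)
  {
  }

  // Additive constant for step j; assumes the table has an entry for j.
  std::uint32_t hash_constant(int j) const { return hash_constants_[j]; }

  // Shift amount for step j; assumes a 16-entry table indexed as in the thread.
  std::uint32_t shift(int j) const { return shift_constants_[((j / 16) * 4) + (j % 4)]; }

 private:
  const std::uint32_t* hash_constants_;   // not owned; must outlive the hasher
  const std::uint32_t* shift_constants_;  // not owned; must outlive the hasher
};
```

The trade-off is that the caller becomes responsible for keeping both tables alive for as long as the hasher is in use.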