-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up k_find_block_bounds #1105
Conversation
* There is something missing in the tests
block_size = 32 | ||
def reference_block_bounds(coords: NDArray, box: NDArray, block_size: int) -> Tuple[NDArray, NDArray]: | ||
# Make a copy to avoid modify the coordinates that end up used later by the Neighborlist | ||
coords = coords.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't have this copy, you can end up re-centering all of the coordinates which results in not testing the PBCs in the kernel. This is an issue on master, but thankfully the kernel was implemented correctly. But when the kernel incorrectly handled PBCs, tests would still pass. With this copy and a bad implementation, it does correctly fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch!
Still investigatingWhy bad implementations of k_find_block_bounds could pass when generating bad boxes
The issue was that the python was actually imaging the coordinates prior to passing them into the Neighborlist (https://github.com/proteneer/timemachine/blob/master/tests/test_nblist.py#L41-L59) which resulted in never actually testing that PBCs were handled correctly in the GPU kernel.
Benchmarks
A10 Cuda Arch 86
Ranges from a <1% speed up to a 3% speed up. The cases where it is most significant is where there are more neighborlists which is going to be the non-apo cases where there are all pairs + ixn groups which result in more neighborlist evaluations
Master
Shuffle syncing across threads
Kernel Timings
Master
Changes