PyCUDA and PyOpenCL backends for ASSET joint prob. matrix calculation #404
Merged
Benchmarks (compared to the original Python implementation):
Changes to the `_JSFUniformOrderStat3D` class:

- Added a `pycuda` backend that allows copying arrays from RAM (virtual Python memory) directly to CUDA global memory.
- Added a `pyopencl` backend, suited for all laptops with a built-in Intel GPU card. As in the PyCUDA backend, it allows copying directly to (Intel) GPU global memory.
- Renamed the `cuda` backend to `_cuda()`. This backend was the breakthrough in accelerating ASSET with CUDA, but it suffers from disk I/O operations. The `_cuda()` backend and the `joint_pmat_old.cu` file should be removed once the PyCUDA backend is no longer an experimental feature.
- Computed values can fall slightly outside the `[0, 1]` interval; for this reason, a `tolerance` parameter is added in the `joint_probability_matrix` function.
- Documented (in `install.rst`) how to prevent kernel resets when a computation takes a long time to finish. Doing so, of course, makes the system unresponsive until the compute program terminates.
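To illustrate the `[0, 1]`/`tolerance` point above: floating-point accumulation is inexact, so a quantity that is mathematically a probability can land marginally outside `[0, 1]`. A minimal sketch (the variable names and the clipping step are my illustration, not the PR's actual code):

```python
import numpy as np

# Floating-point accumulation is inexact: ten times 0.1 is not exactly 1.0.
print(sum([0.1] * 10) == 1.0)  # False on IEEE-754 doubles

# A probability computed through many such operations can therefore end up
# marginally outside [0, 1]. A tolerance-style check accepts values within
# `tolerance` of the interval and clips them back (artificial values below).
P = np.array([-1e-9, 0.5, 1.0 + 1e-9])
tolerance = 1e-7
assert np.all(P > -tolerance) and np.all(P < 1 + tolerance)
P = np.clip(P, 0.0, 1.0)
print(P.min(), P.max())
```

A strict `0 <= P <= 1` check would spuriously reject such values, which is why a small tolerance is the more robust acceptance criterion.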
Changes to the `_pmat_neighbors` function:

- Refactored the `_pmat_neighbors` function into a `_PMatNeighbors` class to facilitate different backends. It became thousands of times faster; the exact speedup doesn't matter - it runs in the blink of an eye.

Changes to other Python functions:
- The `du = np.diff(u)` computation is optimized in both memory and speed (see the `compute` function).
- Rewrote the `cluster_matrix_entries` function with chunking. The chunk size is controlled by the `working_memory` parameter. With `working_memory` set to 100, the peak memory allocation of `cluster_matrix_entries` (and, therefore, of ASSET itself, since it was the most memory-consuming part of ASSET after the pmat bug was fixed in "Memory efficient and faster implementation of ASSET pmat analytical" #399) is reduced compared to the master branch.

It's also possible to install pyopencl with pip if you've installed the Intel GPU driver manually. You need to make sure that pyopencl sees your system-wide `libOpenCL.so*`: first find the location of `libOpenCL.so`, then provide its directory path via the `LD_LIBRARY_PATH` environment variable. I didn't put this into `install.rst` because it's advanced stuff and conda-forge works reliably.

The CUDA part is not tested because Travis does not provide GPUs. Here are the steps to test it manually on Google Colab:
```shell
pip install -e git+https://github.com/INM-6/elephant#egg=elephant[extras,cuda]
python /content/src/elephant/elephant/test/test_asset.py
```
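For the manual pyopencl-with-pip route described earlier (locating `libOpenCL.so` and exporting its directory), the commands might look like the following sketch; the search directory and the exported path are illustrative and vary by distribution:

```shell
# Look for the system-wide OpenCL loader (location differs per distribution).
find /usr/lib -name 'libOpenCL.so*' 2>/dev/null

# Suppose it was found under /usr/lib/x86_64-linux-gnu (hypothetical path):
# export that directory so pyopencl picks up the loader at build/run time.
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH:-}
```

After that, `pip install pyopencl` should build against the system-wide loader.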