
PyCUDA and PyOpenCL backends for ASSET joint prob. matrix calculation #404

Merged

merged 52 commits into NeuralEnsemble:master from INM-6:cuda/asset on Feb 25, 2021

Conversation

@dizcza (Member) commented on Feb 16, 2021

Benchmarks (speedup over the original Python implementation):

  • PyCUDA: 1000x and more
  • PyOpenCL: 100x and more

Changes to _JSFUniformOrderStat3D class:

  1. Changed the default precision from double to float. I find that floats perform about as accurately as doubles on my built-in Intel graphics card, yet floats are ~4x faster than doubles for both backends. In either case, users can easily change the precision manually at any time.
  2. Added a pycuda() backend that copies arrays from RAM (host Python memory) directly to CUDA global memory.
  3. Added a pyopencl() backend, suited for laptops with a built-in Intel GPU. As with the PyCUDA backend, it copies arrays directly to (Intel) GPU global memory. A sketch of the host-to-device copy pattern both backends rely on is shown after this list.
  4. Renamed the cuda() backend to _cuda(). This backend was the breakthrough in accelerating ASSET with CUDA, but it suffers from disk I/O overhead. The _cuda() backend and the joint_pmat_old.cu file should be removed once the PyCUDA backend is no longer an experimental feature.
  5. Added a watchdog that barks when the computed values of the joint prob. matrix fall outside the valid [0, 1] interval (see the second sketch after this list). For this reason, a tolerance parameter is added to the joint_probability_matrix function.
  6. Added a description of how to install CUDA and OpenCL support to the Elephant installation documentation. In particular, if the PyOpenCL backend is used, users need to disable GPU Hangcheck (described in install.rst) to prevent kernel resets when a computation takes a long time to finish. Doing so, of course, makes the system unresponsive until the compute program terminates.
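Here is a minimal sketch of the host-to-device copy pattern that items 2 and 3 refer to (array names are illustrative, not Elephant's actual code; it assumes pycuda and pyopencl are installed with working drivers):

```python
import numpy as np

# --- PyCUDA: copy a host array straight into CUDA global memory ---
import pycuda.autoinit              # creates a default CUDA context
import pycuda.gpuarray as gpuarray

host_arr = np.random.rand(1024).astype(np.float32)  # float is the new default precision
dev_arr = gpuarray.to_gpu(host_arr)                 # RAM -> CUDA global memory
result_cuda = dev_arr.get()                         # device -> host copy back

# --- PyOpenCL: the same pattern for a built-in (Intel) GPU ---
import pyopencl as cl
import pyopencl.array as cl_array

ctx = cl.create_some_context()                      # pick an available OpenCL device
queue = cl.CommandQueue(ctx)
dev_arr_cl = cl_array.to_device(queue, host_arr)    # RAM -> GPU global memory
result_ocl = dev_arr_cl.get()                       # device -> host copy back
```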
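The watchdog in item 5 amounts to a range check with a tolerance. A hedged sketch, with illustrative names (not Elephant's exact code):

```python
import warnings

import numpy as np

def check_joint_pmat(jmat, tolerance=1e-5):
    # Warn when values of the joint prob. matrix escape [0, 1] by more
    # than the tolerance - a hint to switch back to double precision.
    if np.min(jmat) < -tolerance or np.max(jmat) > 1 + tolerance:
        warnings.warn("Joint prob. matrix values fall outside of [0, 1]; "
                      "consider setting precision='double'.")
```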

Changes to _pmat_neighbors function:

  1. Accelerated with CUDA and OpenCL; rewrote the function as the _PMatNeighbors class to facilitate different backends. It became thousands of times faster; the exact speedup doesn't matter - it runs in the blink of an eye.

Changes to other Python functions:

  1. Optimized du = np.diff(u) in both memory and speed (see the compute function).
  2. Reduced the memory footprint of the cluster_matrix_entries function ~5x with chunking; a sketch of the idea follows the table below. The chunk size is controlled by the working_memory parameter. With working_memory set to 100, the peak memory allocation (of cluster_matrix_entries and, therefore, of ASSET itself, since it was the most memory-consuming part of ASSET after the pmat bug was fixed in "Memory efficient and faster implementation of ASSET pmat analytical" #399) is reduced as follows, compared to the master branch:
| mmat.shape | No chunking, MB | Chunked, MB |
|------------|-----------------|-------------|
| (150, 150) | 4300            | 815         |
| (170, 170) | 7600            | 1440        |
| (200, 200) | MemoryError     | 3000        |
| (250, 250) | MemoryError     | 8000        |
| (270, 270) | MemoryError     | 11200       |
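A hedged sketch of the chunking idea (illustrative only, not Elephant's exact implementation): instead of materializing the full pairwise matrix at once, process it in row blocks whose size is derived from working_memory in MB:

```python
import numpy as np

def chunked_pairwise_distances(points, working_memory=100):
    # Yield pairwise-distance blocks of `points` (shape n x d) row by row,
    # bounding each block's working set near `working_memory` MB instead of
    # allocating the full n x n intermediate at once.
    n, d = points.shape
    # the broadcasted intermediate dominates: rows_per_chunk * n * d entries
    bytes_per_row = n * d * points.itemsize
    rows_per_chunk = max(1, int(working_memory * 2 ** 20 // bytes_per_row))
    for start in range(0, n, rows_per_chunk):
        stop = min(start + rows_per_chunk, n)
        block = np.linalg.norm(points[start:stop, None] - points[None, :],
                               axis=-1)
        yield start, block
```

With working_memory=100, each block stays near 100 MB, trading one huge allocation for several bounded ones - that's what turns the MemoryError rows in the table above into finite numbers.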

It's also possible to install pyopencl with pip if you've installed the Intel GPU driver manually. You need to make sure that pyopencl sees your system-wide libOpenCL.so*: first find the location of libOpenCL.so, then provide its directory path via the LD_LIBRARY_PATH environment variable. Here are the commands I used:

```sh
$ ldconfig -p | grep OpenCL
        libOpenCL.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libOpenCL.so.1
$ LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu pip install pyopencl
```

I didn't put this into install.rst because it's advanced stuff, and the conda-forge route works reliably.


The CUDA part is not tested on CI because Travis does not provide GPUs. Here are the steps to test it manually on Google Colab:

  1. Runtime -> Change runtime type -> GPU
  2. pip install -e git+https://github.com/INM-6/elephant#egg=elephant[extras,cuda]
  3. Restart the kernel.
  4. python /content/src/elephant/elephant/test/test_asset.py

@coveralls (Collaborator) commented on Feb 16, 2021

Coverage Status

Coverage decreased (-0.9%) to 88.749% when pulling abe146a on INM-6:cuda/asset into 5e95f77 on NeuralEnsemble:master.

@pep8speaks commented on Feb 19, 2021

Hello @dizcza! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-02-22 13:48:36 UTC

@dizcza merged commit e56b1ac into NeuralEnsemble:master on Feb 25, 2021
@dizcza deleted the cuda/asset branch on February 25, 2021 at 14:59