-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Add Nvcomp python bindings to kvikio. #24
Conversation
Thanks @thomcom ! As part of this PR will you also include example notebooks and the benchmark plots you built earlier. The plots don't need to be committed in the PR but uploaded in the PR comments |
… bit. Adding pure python wrappers for Cascade and Lz4 at this time.
Thanks @quasiben I've added the files, I'll move them around later. |
This is basically ready for review. Tomorrow I will add the notebook, benchmarks, and get the style updated correctly. |
Do we need to do something to make sure nvCOMP is available on CI? |
I don't believe so as nvcomp tooling is tested as part of cuDF/libcuDF @thomcom looks like there are some style check failures |
How is the nvCOMP binding code here tested? |
@jakirkham the python bindings are tested here in compress->decompress tests for Cascaded and LZ4. python/tests/test_nvcomp.pyJ |
Thanks. Was more asking are we running those on gpuCI here? Relatedly are there any gpuCI changes we need to include here to run the tests. |
Oh of course. I don't know if kvikio is running with any gpuCI yet, but there are tests! The nvcomp tests run automatically with the kvikio tests, so if kvikio is being run in gpuCI then so will nvcomp. |
These CI failures are mysterious to me. |
Maybe try merging in |
rerun tests |
Think this CI error is capturing the issue that I'm wondering about
Do we need to be cloning or downloading nvCOMP somewhere on CI first? |
Thinking we need to add this code and then call it somewhere like this. |
I think we're already depending on cudf in this, are we not? There's a piece in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks Thomson! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for spamming the same suggestion, but I think some docstrings would be good. I doesn't have to anything lengthy. Just a short description, list of arguments/types, and maybe an url to the nvcomp doc when relevant.
Otherwise it looks good!
@madsbk I added the docs that I forgot this morning. Thanks! I also dropped BatchedSnappy which has some issues I won't be working on presently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @thomcom, great work!
I took the liberty to rename nvcomp Cascade Benchmarks.ipynb -> nvcomp_cascade_benchmarks.ipynb
, hope that is okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for not reviewing sooner, I was waiting for repo permissions before I did much. I started making some comments on the pynvcomp files, but then I realized that those are taken from somewhere else. @thomcom did we modify them at all, or did we just copy them in and you just added nvcomp.py? I see at least a few discussions on those files around the Array type and the Enum import fix, so we've made nonzero modifications, but I'm not sure how much we intend to change them.
python/kvikio/nvcomp.py
Outdated
self.dtype = dtype | ||
self.config = config | ||
self.compressor = _lib._CascadedCompressor( | ||
cp_to_nvcomp_dtype(self.dtype).value, | ||
config.num_RLEs, | ||
config.num_deltas, | ||
config.use_bp, | ||
) | ||
self.decompressor = _lib._CascadedDecompressor() | ||
self.s = cp.cuda.Stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want all of these attributes to be visible to users, or are some of them internal details? Should they be marked as internal with underscores? Maybe we want immutable property-based access to some of them (i.e. a property without a setter)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think immutable is a good idea here, the fact that they are properties of the class is really just making them available as queryable parameters. They've been set in the underlying C++ class and are no longer accessible by any API, so I keep a record here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for future work: Depending on whether users might actually need to access these properties, I would recommend either renaming them with underscores (e.g. self._decompressor
) or turning them into properties instead of raw attributes.
self.use_bp = use_bp | ||
|
||
|
||
class CascadedCompressor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another note you can feel free to push to future work, but it certainly seems like all of these compressors should inherit from an abstract parent class with a concrete constructor to standardize initializing the dtype, config, and stream parameters and abstract compress
and decompress
methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I'd like to push it till later. :)
@thomcom my last request here would be to move |
Hey @vyasr, your comment is interesting to me. My approach with .pxd and .pyx naming has always been that the .pxd file defines the interface to the existing C/C++ code, and the pyx implements the python bindings for that interface. In this case, the pure cython in |
You might be interested in looking at the first section on this page. Specifically, it's valuable to distinguish between use-cases 1 and 3. Use-case 1 is really what the pxd file in this PR is doing (exposing C++ code). In RAPIDS we rarely write pure Cython, it's almost always just a wrapper around C++, so we don't run into use-case 3 as much, but there are some good examples in cudf like column.pxd and copying.pxd. Regarding the naming, I suggested the |
I think this is a good idea. Then, if we want to change to another naming scheme like @vyasr suggest, we can change the name of both libraries at the same time. |
…rds. Change CascadedOptions to defauled named args.
All requests have been answered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomcom I've unresolved a pair of conversations to track potential future work, but otherwise looks good to go to me now!
Thanks @thomcom, it is merged. |
This PR adds the basic bindings for Cascaded and Lz4, and a broken Snappy binding. I need to figure out how to import nvcomp properly before this will work with kvikio.