[Feature] Add Nvcomp python bindings to kvikio. #24

Merged
merged 44 commits into from
Mar 29, 2022

Conversation

thomcom
Contributor

@thomcom thomcom commented Mar 9, 2022

This PR adds the basic bindings for Cascaded and Lz4, and a broken Snappy binding. I need to figure out how to import nvcomp properly before this will work with kvikio.

@quasiben
Member

quasiben commented Mar 9, 2022

Thanks @thomcom! As part of this PR, will you also include example notebooks and the benchmark plots you built earlier? The plots don't need to be committed in the PR; they can just be uploaded in the PR comments.

@thomcom
Contributor Author

thomcom commented Mar 10, 2022

Thanks @quasiben, I've added the files; I'll move them around later.

@thomcom
Contributor Author

thomcom commented Mar 14, 2022

This is basically ready for review. Tomorrow I will add the notebook, benchmarks, and get the style updated correctly.

@jakirkham
Member

Do we need to do something to make sure nvCOMP is available on CI?

@quasiben
Member

> Do we need to do something to make sure nvCOMP is available on CI?

I don't believe so as nvcomp tooling is tested as part of cuDF/libcuDF

@thomcom looks like there are some style check failures

@jakirkham
Member

> I don't believe so as nvcomp tooling is tested as part of cuDF/libcuDF

How is the nvCOMP binding code here tested?

@thomcom
Contributor Author

thomcom commented Mar 15, 2022

@jakirkham the Python bindings are tested here in compress->decompress tests for Cascaded and LZ4: python/tests/test_nvcomp.py
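
The compress->decompress pattern mentioned above can be sketched roughly as follows. This is an illustration only, not the contents of python/tests/test_nvcomp.py: it assumes the standard-library `zlib` as a stand-in codec so that it runs without a GPU, and the helper name `roundtrip` is hypothetical.

```python
import zlib


def roundtrip(data: bytes) -> bytes:
    """Compress and then decompress `data` with a stand-in codec (zlib)."""
    compressed = zlib.compress(data)
    return zlib.decompress(compressed)


def test_roundtrip_preserves_data():
    # Repetitive input; the check is only that nothing is lost on the way back.
    original = bytes(range(16)) * 64
    assert roundtrip(original) == original


test_roundtrip_preserves_data()
```

A real test for the bindings would presumably swap the zlib calls for the compressor under test and compare device arrays instead of bytes.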

@jakirkham
Member

Thanks. Was more asking: are we running those on gpuCI here? Relatedly, are there any gpuCI changes we need to include here to run the tests?

@thomcom
Contributor Author

thomcom commented Mar 15, 2022

Oh of course. I don't know if kvikio is running with any gpuCI yet, but there are tests! The nvcomp tests run automatically with the kvikio tests, so if kvikio is being run in gpuCI then so will nvcomp.

@thomcom
Contributor Author

thomcom commented Mar 15, 2022

These CI failures are mysterious to me.

@jakirkham
Member

Maybe try merging in branch-22.04 or pushing another commit? Seems like CI failed to check out the PR, so I would just give it another try. If it still fails, we can ask for help :)

@thomcom
Contributor Author

thomcom commented Mar 15, 2022

rerun tests

@jakirkham
Member

Think this CI error is capturing the issue that I'm wondering about

      building 'kvikio._lib.nvcomp' extension
      /usr/local/gcc9/bin/gcc -Wsign-compare -DNDEBUG -fwrapv -O3 -Wall -fPIC -O3 -I/workspace/.conda-bld/kvikio_1647375755170/work/cpp/include -I/workspace/.conda-bld/kvikio_1647375755170/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/include -I/usr/local/cuda/include -I/workspace/.conda-bld/kvikio_1647375755170/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/include/python3.8 -c kvikio/_lib/nvcomp.cpp -o build/temp.linux-x86_64-3.8/kvikio/_lib/nvcomp.o -std=c++17
      kvikio/_lib/nvcomp.cpp:776:10: fatal error: nvcomp.h: No such file or directory
        776 | #include "nvcomp.h"
            |          ^~~~~~~~~~
      compilation terminated.

Do we need to be cloning or downloading nvCOMP somewhere on CI first?

@jakirkham
Member

Thinking we need to add this code and then call it somewhere like this.

@thomcom
Contributor Author

thomcom commented Mar 15, 2022

I think we're already depending on cudf in this, are we not? There's a piece in setup.py about loading the headers from NVCOMP_HOME if they aren't installed. I will look again.
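
A fallback of the kind described here could look roughly like the sketch below. It is not the PR's actual setup.py: the helper name `nvcomp_include_dirs` and the assumption that headers live under `$NVCOMP_HOME/include` are hypothetical, used only to illustrate the idea.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def nvcomp_include_dirs(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return extra include directories for the nvcomp headers.

    Hypothetical helper: if NVCOMP_HOME is set, assume the headers are in
    $NVCOMP_HOME/include; otherwise return an empty list and rely on the
    compiler's default search path.
    """
    env = os.environ if env is None else env
    home = env.get("NVCOMP_HOME")
    if not home:
        return []
    return [str(Path(home) / "include")]
```

The returned list could then be appended to the `include_dirs` of the extension definition. If the variable is unset and the headers are not installed, the build would still fail with the `nvcomp.h: No such file or directory` error shown in the CI log.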

Member

@jakirkham jakirkham left a comment

LGTM. Thanks Thomson! 😄

@jakirkham jakirkham requested review from madsbk and vyasr March 24, 2022 20:44
Member

@madsbk madsbk left a comment

Sorry for spamming the same suggestion, but I think some docstrings would be good. It doesn't have to be anything lengthy. Just a short description, a list of arguments/types, and maybe a URL to the nvcomp docs when relevant.

Otherwise it looks good!

@thomcom
Contributor Author

thomcom commented Mar 24, 2022

@madsbk I added the docs that I forgot this morning. Thanks! I also dropped BatchedSnappy which has some issues I won't be working on presently.

Member

@madsbk madsbk left a comment

Thanks @thomcom, great work!

I took the liberty of renaming "nvcomp Cascade Benchmarks.ipynb" to "nvcomp_cascade_benchmarks.ipynb"; hope that is okay.

Contributor

@vyasr vyasr left a comment

Apologies for not reviewing sooner, I was waiting for repo permissions before I did much. I started making some comments on the pynvcomp files, but then I realized that those are taken from somewhere else. @thomcom did we modify them at all, or did we just copy them in and you just added nvcomp.py? I see at least a few discussions on those files around the Array type and the Enum import fix, so we've made nonzero modifications, but I'm not sure how much we intend to change them.

python/cmake/kvikio_python_helpers.cmake Outdated Show resolved Hide resolved
python/cmake/thirdparty/get_nvcomp.cmake Outdated Show resolved Hide resolved
python/kvikio/_lib/pynvcomp.pxd Outdated Show resolved Hide resolved
python/kvikio/_lib/pynvcomp.pxd Outdated Show resolved Hide resolved
python/kvikio/_lib/pynvcomp.pxd Outdated Show resolved Hide resolved
python/kvikio/nvcomp.py Outdated Show resolved Hide resolved
Comment on lines 59 to 68
        self.dtype = dtype
        self.config = config
        self.compressor = _lib._CascadedCompressor(
            cp_to_nvcomp_dtype(self.dtype).value,
            config.num_RLEs,
            config.num_deltas,
            config.use_bp,
        )
        self.decompressor = _lib._CascadedDecompressor()
        self.s = cp.cuda.Stream()
Contributor

Do we want all of these attributes to be visible to users, or are some of them internal details? Should they be marked as internal with underscores? Maybe we want immutable property-based access to some of them (i.e. a property without a setter)?

Contributor Author

I think immutable is a good idea here, the fact that they are properties of the class is really just making them available as queryable parameters. They've been set in the underlying C++ class and are no longer accessible by any API, so I keep a record here.

Contributor

Note for future work: Depending on whether users might actually need to access these properties, I would recommend either renaming them with underscores (e.g. self._decompressor) or turning them into properties instead of raw attributes.
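
The two options discussed in this thread (underscore-prefixed attributes and properties without setters) can be combined, as in the sketch below. The class `CompressorSettings` and its fields are hypothetical stand-ins, not the PR's `CascadedCompressor`.

```python
class CompressorSettings:
    """Hypothetical stand-in for a compressor that records its settings."""

    def __init__(self, dtype: str, num_rles: int = 1) -> None:
        # Leading underscores mark these attributes as internal details.
        self._dtype = dtype
        self._num_rles = num_rles

    @property
    def dtype(self) -> str:
        """Read-only view: no setter is defined, so assignment fails."""
        return self._dtype

    @property
    def num_rles(self) -> int:
        """Read-only view of the recorded RLE count."""
        return self._num_rles
```

With this layout, `settings.dtype` can still be queried, while `settings.dtype = ...` raises `AttributeError`, which matches the "queryable but immutable" intent described above.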

python/kvikio/nvcomp.py Outdated Show resolved Hide resolved
self.use_bp = use_bp


class CascadedCompressor:
Contributor

Another note you can feel free to push to future work, but it certainly seems like all of these compressors should inherit from an abstract parent class with a concrete constructor to standardize initializing the dtype, config, and stream parameters and abstract compress and decompress methods.

Contributor Author

Indeed, I'd like to push it till later. :)
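
For reference, the abstract-parent idea deferred here might take a shape like the following. This is only a sketch under stated assumptions: `BaseCompressor` and `ZlibCompressor` are hypothetical names, and `zlib` stands in for a GPU codec so the example runs anywhere.

```python
import zlib
from abc import ABC, abstractmethod
from typing import Any, Optional


class BaseCompressor(ABC):
    """Hypothetical parent class; not part of this PR."""

    def __init__(
        self,
        dtype: str,
        config: Optional[Any] = None,
        stream: Optional[Any] = None,
    ) -> None:
        # Concrete constructor: every subclass stores these the same way.
        self._dtype = dtype
        self._config = config
        self._stream = stream

    @abstractmethod
    def compress(self, data: bytes) -> bytes:
        """Return the compressed form of `data`."""

    @abstractmethod
    def decompress(self, data: bytes) -> bytes:
        """Return the original bytes for compressed `data`."""


class ZlibCompressor(BaseCompressor):
    """Stand-in subclass that uses zlib instead of a GPU codec."""

    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def decompress(self, data: bytes) -> bytes:
        return zlib.decompress(data)
```

Because `compress` and `decompress` are abstract, instantiating `BaseCompressor` directly raises `TypeError`, so each concrete compressor has to provide both methods.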

python/tests/test_nvcomp.py Outdated Show resolved Hide resolved
@vyasr
Contributor

vyasr commented Mar 25, 2022

@thomcom my last request here would be to move pynvcomp.pxd to something like libnvcomp.pxd, then cimport it explicitly in pynvcomp.pyx. In Cython pxd and pyx files with the same name are interpreted in a very specific way, and in this case that relationship isn't accurate. In particular, there's no pure Cython in pynvcomp.pyx that you are trying to make cimportable in other Cython files, you're exclusively creating objects that are visible in pure Python. pynvcomp.pxd is not defining an interface for pynvcomp.pyx, it's importing the "symbols" from the C++ library. As such, it's misleading to have the names aligned.

@thomcom
Contributor Author

thomcom commented Mar 28, 2022

Hey @vyasr, your comment is interesting to me. My approach with .pxd and .pyx naming has always been that the .pxd file defines the interface to the existing C/C++ code, and the pyx implements the python bindings for that interface. In this case, the pure cython in pynvcomp.pyx is only the typecastings that convert between python objects and c pointers. It's not intended for cimport into other cython files, only to expose the C++ interface to python. Would you put the C++ interface and the python visible bindings into the .pyx file and not have a pxd file at all? I see that in _lib here there's a kvikio_cxx_api.pxd and a libkvikio.pyx, how about I rename into that pattern?

@vyasr
Contributor

vyasr commented Mar 28, 2022

You might be interested in looking at the first section on this page. Specifically, it's valuable to distinguish between use-cases 1 and 3. Use-case 1 is really what the pxd file in this PR is doing (exposing C++ code). In RAPIDS we rarely write pure Cython, it's almost always just a wrapper around C++, so we don't run into use-case 3 as much, but there are some good examples in cudf like column.pxd and copying.pxd.

Regarding the naming, I suggested the libFoo.pxd for C++ bindings and pyFoo.pyx or just Foo.pyx for the Cython since I think that's the common convention that I have observed across most other Cython packages that I've worked with. I'm fine with your proposal too since it maintains internal consistency in kvikio. Feel free to check with other devs too in case they express different preferences from mine.

@madsbk
Member

madsbk commented Mar 28, 2022

> I see that in _lib here there's a kvikio_cxx_api.pxd and a libkvikio.pyx, how about I rename into that pattern?

I think this is a good idea. Then, if we want to change to another naming scheme like @vyasr suggests, we can change the name of both libraries at the same time.

…rds. Change CascadedOptions to defaulted named args.
@thomcom
Contributor Author

thomcom commented Mar 28, 2022

All requests have been answered.

Contributor

@vyasr vyasr left a comment

@thomcom I've unresolved a pair of conversations to track potential future work, but otherwise looks good to go to me now!

@jakirkham jakirkham requested a review from madsbk March 28, 2022 18:56
@madsbk madsbk added the improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change), and python (Affects the Python API of KvikIO) labels Mar 29, 2022
@madsbk madsbk merged commit f0d2bed into rapidsai:branch-22.04 Mar 29, 2022
@madsbk
Member

madsbk commented Mar 29, 2022

Thanks @thomcom, it is merged.

Labels: improvement, non-breaking, python
5 participants