Use KvikIO as the GDS backend #10468
Codecov Report
| | branch-22.06 | #10468 | +/- |
|---|---|---|---|
| Coverage | 86.31% | 86.33% | +0.01% |
| Files | 140 | 140 | |
| Lines | 22300 | 22300 | |
| Hits | 19249 | 19253 | +4 |
| Misses | 3051 | 3047 | -4 |
Can we get a gbench on GDS before and after?
Are
Regarding the policy: I assume that KvikIO does not interact with the cuFile config?
For the most part, yes (if not completely). AFAIK, KvikIO includes all the best parts of this implementation :)
We should work out the dependency tree for cuDF and KvikIO with regard to nvcomp. Looking at rapidsai/kvikio#23, it seems that the current approach would be to use cuDF installs for nvcomp, which would make this dependency tree pretty messy.
@madsbk I think this should be moved to 22.06; IMO we are too close to the code freeze for a PR that affects most of IO.
Do we want an option to enable the use of `direct_read_source`?
@devavret Is there a guide on how to run this?
@vuule I agree, we need an option to enable/disable GDS, but KvikIO should work in all cases; we don't need a For now, I suggest that we have:
Regarding rapidsai/kvikio#23, it is only for the Python bindings; the C++ API this PR uses doesn't depend on nvcomp. The only hard dependency of KvikIO's C++ API is CUDAToolkit.
You can build and run the
@madsbk, could you remove those as part of this PR?
Will do; I just need to know how much to remove. It depends on the policy: do we want to change the
TODO: before merging, this should be changed to the KvikIO main repo.
I am closing this PR in favor of #10593.
This PR is a new take on #10468 that is less intrusive. It keeps the existing GDS backend and adds a new option, `LIBCUDF_CUFILE_POLICY=KVIKIO`, that makes cuDF use KvikIO. The default policy is still `LIBCUDF_CUFILE_POLICY=GDS`.

cc. @vuule, @devavret, @GregoryKimball

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Robert Maynard (https://github.com/robertmaynard)
- Devavret Makkar (https://github.com/devavret)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #10593
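For illustration, the option described above can be sketched as a small mapping from the policy string to a backend. The `io_backend` enum and the `read_cufile_policy` and `select_backend` helpers are hypothetical names, not cuDF's API. The `"GDS"` default follows the text above; treating `"OFF"` as plain host IO, treating `"ALWAYS"` as the existing GDS path, and rejecting unknown values are all assumptions of this sketch.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical backend choices; cuDF's internal names are not shown in the PR.
enum class io_backend { host, gds, kvikio };

// Returns the value of LIBCUDF_CUFILE_POLICY, or "GDS" (the stated default)
// when the variable is unset.
std::string read_cufile_policy() {
  const char* env = std::getenv("LIBCUDF_CUFILE_POLICY");
  return env != nullptr ? std::string{env} : std::string{"GDS"};
}

// Maps a policy string to a backend.
// "KVIKIO" opts in to KvikIO; "GDS" and "ALWAYS" keep the existing GDS
// backend; "OFF" skips cuFile entirely (the last two are assumptions).
io_backend select_backend(const std::string& policy) {
  if (policy == "KVIKIO") { return io_backend::kvikio; }
  if (policy == "GDS" || policy == "ALWAYS") { return io_backend::gds; }
  if (policy == "OFF") { return io_backend::host; }
  // Rejecting unknown values is an assumption; the PR does not say.
  throw std::invalid_argument("unknown LIBCUDF_CUFILE_POLICY value: " + policy);
}
```

A caller would combine the two as `select_backend(read_cufile_policy())`. Keeping the parse and the mapping separate makes the mapping testable without touching the environment.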
Replacing the current GDS `datasource` with KvikIO.

Background

KvikIO is a new C++ and Python interface to cuFile that combines the current GDS implementation in cuDF and the current GDS implementation in cuCIM into a common library. The idea is to avoid duplicated work and join forces to implement high-performance IO for all of RAPIDS. In addition, KvikIO eliminates the need for every project to write its own fallback implementation for when GDS and/or cuFile isn't available. KvikIO should work in any circumstance, even if `cufile.h` is available at compile time.

Policy

This PR reads `LIBCUDF_CUFILE_POLICY` to determine whether KvikIO should be used: `"OFF"` disables KvikIO, while `"GDS"` and `"ALWAYS"` enable it. It doesn't differentiate between `"GDS"` and `"ALWAYS"`; the question is whether it should.

The current GDS backend rewrites cuFile's configuration file in order to set cuFile's compatibility mode explicitly. Do we still want this feature? Isn't it better to just let users configure cuFile via the dedicated configuration file?
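The policy mapping described above can be sketched as follows. `kvikio_enabled` and `kvikio_enabled_from_env` are hypothetical helpers, not code from this PR, and treating an unset variable as `"GDS"` is an assumption of this sketch rather than something the PR states.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true when KvikIO should be used.
// "OFF" disables KvikIO; "GDS" and "ALWAYS" both enable it.
// Treating an unset variable (nullptr) as "GDS" is an assumption.
bool kvikio_enabled(const char* policy_env) {
  const std::string policy = policy_env != nullptr ? policy_env : "GDS";
  if (policy == "OFF") { return false; }
  if (policy == "GDS" || policy == "ALWAYS") { return true; }  // not differentiated
  throw std::invalid_argument("unknown LIBCUDF_CUFILE_POLICY value: " + policy);
}

// Hypothetical convenience wrapper that reads the real environment variable.
bool kvikio_enabled_from_env() {
  return kvikio_enabled(std::getenv("LIBCUDF_CUFILE_POLICY"));
}
```

Because `"GDS"` and `"ALWAYS"` both return `true`, callers cannot tell them apart; differentiating them would need a richer return type, such as an enum.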
Depends on rapidsai/kvikio#43
cc. @vuule