Parquet writer dictionary encoding refactor (#8476)
Replaces the previous Parquet dictionary encoding code with an implementation that uses the `static_map` from `cuCollections`.
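
Parquet dictionary encoding stores each distinct value of a column chunk once in a dictionary and replaces every value with its index into that dictionary; the hash map is what finds the distinct values. The snippet below is a minimal host-side C++ sketch of that idea only, assuming string values and using `std::unordered_map` in place of `cuco::static_map`. The names `dict_encoded` and `dictionary_encode` are hypothetical and are not cudf APIs, and this is not the GPU code in `chunk_dict.cu`.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type: one dictionary plus one index per input value.
struct dict_encoded {
  std::vector<std::string> dictionary;  // distinct values in first-seen order
  std::vector<std::uint32_t> indices;   // dictionary index of each input value
};

// Host-side sketch of dictionary encoding. std::unordered_map stands in for
// cuco::static_map; this is an illustration, not cudf's GPU implementation.
dict_encoded dictionary_encode(std::vector<std::string> const& values)
{
  dict_encoded out;
  std::unordered_map<std::string, std::uint32_t> value_to_index;
  out.indices.reserve(values.size());
  for (auto const& v : values) {
    // The next unused index is the current dictionary size. try_emplace only
    // inserts when v is not already present, so repeats reuse their index.
    auto const next = static_cast<std::uint32_t>(out.dictionary.size());
    auto const [it, inserted] = value_to_index.try_emplace(v, next);
    if (inserted) { out.dictionary.push_back(v); }
    out.indices.push_back(it->second);
  }
  return out;
}
```

For the input `{"a", "b", "a", "c", "b"}` this yields the dictionary `{"a", "b", "c"}` and the indices `{0, 1, 0, 2, 1}`. On the GPU many threads insert at once, so a fixed-capacity concurrent map such as `static_map` takes the place of this sequential loop; how cudf assigns the final indices is not shown here.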

Adds [`cuCollections`](https://github.com/NVIDIA/cuCollections) as a dependency of `libcudf`.

Closes #7873
Fixes #8890 

**Currently blocked on Pascal support for static_map in cuCollections**

(More details to be added)


Authors:
  - Devavret Makkar (https://github.com/devavret)
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Nghia Truong (https://github.com/ttnghia)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #8476
devavret authored Aug 19, 2021
1 parent 417b34d commit f95b43e
Showing 9 changed files with 708 additions and 739 deletions.
`cpp/CMakeLists.txt`: 8 changes (6 additions, 2 deletions)
```diff
@@ -138,6 +138,9 @@ include(cmake/thirdparty/CUDF_GetArrow.cmake)
 include(cmake/thirdparty/CUDF_GetDLPack.cmake)
 # find libcu++
 include(cmake/thirdparty/CUDF_GetLibcudacxx.cmake)
+# find cuCollections
+# Should come after including thrust and libcudacxx
+include(cmake/thirdparty/CUDF_GetcuCollections.cmake)
 # find or install GoogleTest
 include(cmake/thirdparty/CUDF_GetGTest.cmake)
 # preprocess jitify-able kernels
@@ -285,7 +288,7 @@ add_library(cudf
   src/io/orc/writer_impl.cu
   src/io/parquet/compact_protocol_writer.cpp
   src/io/parquet/page_data.cu
-  src/io/parquet/page_dict.cu
+  src/io/parquet/chunk_dict.cu
   src/io/parquet/page_enc.cu
   src/io/parquet/page_hdr.cu
   src/io/parquet/parquet.cpp
@@ -527,7 +530,8 @@ target_link_libraries(cudf
   PUBLIC ZLIB::ZLIB
          ${ARROW_LIBRARIES}
          cudf::Thrust
-         rmm::rmm)
+         rmm::rmm
+  PRIVATE cuco::cuco)
 
 if(CUDA_STATIC_RUNTIME)
   # Tell CMake what CUDA language runtime to use
```
`cpp/cmake/thirdparty/CUDF_GetcuCollections.cmake` (new file): 38 changes (38 additions, 0 deletions)
```diff
@@ -0,0 +1,38 @@
+#=============================================================================
+# Copyright (c) 2021, NVIDIA CORPORATION.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#=============================================================================
+
+function(find_and_configure_cucollections)
+
+  if(TARGET cuco::cuco)
+    return()
+  endif()
+
+  # Find or install cuCollections
+  CPMFindPackage(NAME cuco
+    GITHUB_REPOSITORY NVIDIA/cuCollections
+    GIT_TAG 0d602ae21ea4f38d23ed816aa948453d97b2ee4e
+    OPTIONS "BUILD_TESTS OFF"
+            "BUILD_BENCHMARKS OFF"
+            "BUILD_EXAMPLES OFF"
+  )
+
+  set(CUCO_INCLUDE_DIR "${cuco_SOURCE_DIR}/include" PARENT_SCOPE)
+
+  # Make sure consumers of cudf can also see cuco::cuco target
+  fix_cmake_global_defaults(cuco::cuco)
+endfunction()
+
+find_and_configure_cucollections()
```