diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index ae895daf28a..2c5ecf68690 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -19,15 +19,18 @@ Here are some guidelines to help the review process go smoothly. noted here: https://help.github.com/articles/closing-issues-using-keywords/ 5. If your pull request is not ready for review but you want to make use of the - continuous integration testing facilities please label it with `[WIP]`. + continuous integration testing facilities, please mark your pull request as a draft: + https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/changing-the-stage-of-a-pull-request#converting-a-pull-request-to-a-draft 6. If your pull request is ready to be reviewed without requiring additional - work on top of it, then remove the `[WIP]` label (if present) and replace - it with `[REVIEW]`. If assistance is required to complete the functionality, - for example when the C/C++ code of a feature is complete but Python bindings - are still required, then add the label `[HELP-REQ]` so that others can triage - and assist. The additional changes then can be implemented on top of the - same PR. If the assistance is done by members of the rapidsAI team, then no + work on top of it, then take it out of draft by marking it "Ready for review": + https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/changing-the-stage-of-a-pull-request#marking-a-pull-request-as-ready-for-review + + If assistance is required to complete the functionality, for example when the + C/C++ code of a feature is complete but Python bindings are still required, + then add the label `help wanted` so that others can triage and assist. + The additional changes can then be implemented on top of the same PR. 
+ If the assistance is done by members of the rapidsAI team, then no additional actions are required by the creator of the original PR for this, otherwise the original author of the PR needs to give permission to the person(s) assisting to commit to their personal fork of the project. If that @@ -39,10 +42,10 @@ Here are some guidelines to help the review process go smoothly. features or make changes out of the scope of those requested by the reviewer (doing this just add delays as already reviewed code ends up having to be re-reviewed/it is hard to tell what is new etc!). Further, please do not - rebase your branch on main/force push/rewrite history, doing any of these - causes the context of any comments made by reviewers to be lost. If - conflicts occur against main they should be resolved by merging main - into the branch used for making the pull request. + rebase your branch on the target branch, force push, or rewrite history. + Doing any of these causes the context of any comments made by reviewers to be lost. + If conflicts occur against the target branch they should be resolved by + merging the target branch into the branch used for making the pull request. Many thanks in advance for your cooperation! diff --git a/CHANGELOG.md b/CHANGELOG.md index 21ab8ed3274..df002654aa7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,7 @@ +# cuDF 0.20.0 (Date TBD) + +Please see https://github.com/rapidsai/cudf/releases/tag/v0.20.0a for the latest changes to this development branch. + # cuDF 0.19.0 (Date TBD) Please see https://github.com/rapidsai/cudf/releases/tag/v0.19.0a for the latest changes to this development branch. diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 8c332539ec7..4edd6965c4b 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -131,14 +131,14 @@ run each time you commit changes. 
Compiler requirements: -* `gcc` version 5.4+ -* `nvcc` version 10.0+ -* `cmake` version 3.14.0+ +* `gcc` version 9.3+ +* `nvcc` version 11.0+ +* `cmake` version 3.18.0+ CUDA/GPU requirements: -* CUDA 10.0+ -* NVIDIA driver 410.48+ +* CUDA 11.0+ +* NVIDIA driver 450.80.02+ * Pascal architecture or better You can obtain CUDA from [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads). @@ -320,7 +320,7 @@ flag. Below is a list of the available arguments and their purpose: | `PYARROW_VERSION` | 1.0.1 | Not supported | set pyarrow version | | `CMAKE_VERSION` | newest | >=3.14 | set cmake version | | `CYTHON_VERSION` | 0.29 | Not supported | set Cython version | -| `PYTHON_VERSION` | 3.6 | 3.7 | set python version | +| `PYTHON_VERSION` | 3.7 | 3.8 | set python version | --- diff --git a/README.md b/README.md index 687d25c200b..044f3bffa1a 100644 --- a/README.md +++ b/README.md @@ -57,35 +57,35 @@ Please see the [Demo Docker Repository](https://hub.docker.com/r/rapidsai/rapids ### CUDA/GPU requirements -* CUDA 10.1+ -* NVIDIA driver 418.39+ +* CUDA 11.0+ +* NVIDIA driver 450.80.02+ * Pascal architecture or better (Compute Capability >=6.0) ### Conda cuDF can be installed with conda ([miniconda](https://conda.io/miniconda.html), or the full [Anaconda distribution](https://www.anaconda.com/download)) from the `rapidsai` channel: -For `cudf version == 0.18` : +For `cudf version == 0.19` : ```bash # for CUDA 10.1 conda install -c rapidsai -c nvidia -c numba -c conda-forge \ - cudf=0.18 python=3.7 cudatoolkit=10.1 + cudf=0.19 python=3.7 cudatoolkit=10.1 # or, for CUDA 10.2 conda install -c rapidsai -c nvidia -c numba -c conda-forge \ - cudf=0.18 python=3.7 cudatoolkit=10.2 + cudf=0.19 python=3.7 cudatoolkit=10.2 ``` For the nightly version of `cudf` : ```bash -# for CUDA 10.1 +# for CUDA 11.0 conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \ - cudf python=3.7 cudatoolkit=10.1 + cudf python=3.7 cudatoolkit=11.0 -# or, for 
CUDA 10.2 +# or, for CUDA 11.2 conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \ - cudf python=3.7 cudatoolkit=10.2 + cudf python=3.7 cudatoolkit=11.2 ``` Note: cuDF is supported only on Linux, and with Python versions 3.7 and later. diff --git a/ci/release/update-version.sh b/ci/release/update-version.sh index 819a0dcf6bf..78e85501796 100755 --- a/ci/release/update-version.sh +++ b/ci/release/update-version.sh @@ -47,11 +47,14 @@ function sed_runner() { } # cpp update -sed_runner 's/'"CUDA_DATAFRAME VERSION .* LANGUAGES"'/'"CUDA_DATAFRAME VERSION ${NEXT_FULL_TAG} LANGUAGES"'/g' cpp/CMakeLists.txt +sed_runner 's/'"CUDF VERSION .* LANGUAGES"'/'"CUDF VERSION ${NEXT_FULL_TAG} LANGUAGES"'/g' cpp/CMakeLists.txt # cpp libcudf_kafka update sed_runner 's/'"CUDA_KAFKA VERSION .* LANGUAGES"'/'"CUDA_KAFKA VERSION ${NEXT_FULL_TAG} LANGUAGES"'/g' cpp/libcudf_kafka/CMakeLists.txt +# cpp cudf_jni update +sed_runner 's/'"CUDF_JNI VERSION .* LANGUAGES"'/'"CUDF_JNI VERSION ${NEXT_FULL_TAG} LANGUAGES"'/g' java/src/main/native/CMakeLists.txt + # doxyfile update sed_runner 's/PROJECT_NUMBER = .*/PROJECT_NUMBER = '${NEXT_FULL_TAG}'/g' cpp/doxygen/Doxyfile diff --git a/conda/environments/cudf_dev_cuda10.1.yml b/conda/environments/cudf_dev_cuda10.1.yml index fa0b1126190..3c26dedda20 100644 --- a/conda/environments/cudf_dev_cuda10.1.yml +++ b/conda/environments/cudf_dev_cuda10.1.yml @@ -11,10 +11,10 @@ dependencies: - clang=8.0.1 - clang-tools=8.0.1 - cupy>7.1.0,<9.0.0a0 - - rmm=0.19.* + - rmm=0.20.* - cmake>=3.14 - cmake_setuptools>=0.1.3 - - python>=3.6,<3.8 + - python>=3.7,<3.9 - numba>=0.49.0,!=0.51.0 - numpy - pandas>=1.0,<1.3.0dev0 @@ -36,15 +36,14 @@ dependencies: - pandoc=<2.0.0 - cudatoolkit=10.1 - pip - - partd - flake8=3.8.3 - black=19.10 - isort=5.0.7 - mypy=0.782 - typing_extensions - pre_commit - - dask>=2021.3.1 - - distributed>=2.22.0 + - dask==2021.4.0 + - distributed>=2.22.0,<=2021.4.0 - streamz - dlpack - arrow-cpp=1.0.1 diff --git
a/conda/environments/cudf_dev_cuda10.2.yml b/conda/environments/cudf_dev_cuda10.2.yml index 52d82c4f4ef..cc78894a99c 100644 --- a/conda/environments/cudf_dev_cuda10.2.yml +++ b/conda/environments/cudf_dev_cuda10.2.yml @@ -11,10 +11,10 @@ dependencies: - clang=8.0.1 - clang-tools=8.0.1 - cupy>7.1.0,<9.0.0a0 - - rmm=0.19.* + - rmm=0.20.* - cmake>=3.14 - cmake_setuptools>=0.1.3 - - python>=3.6,<3.8 + - python>=3.7,<3.9 - numba>=0.49,!=0.51.0 - numpy - pandas>=1.0,<1.3.0dev0 @@ -36,15 +36,14 @@ dependencies: - pandoc=<2.0.0 - cudatoolkit=10.2 - pip - - partd - flake8=3.8.3 - black=19.10 - isort=5.0.7 - mypy=0.782 - typing_extensions - pre_commit - - dask>=2021.3.1 - - distributed>=2.22.0 + - dask==2021.4.0 + - distributed>=2.22.0,<=2021.4.0 - streamz - dlpack - arrow-cpp=1.0.1 diff --git a/conda/environments/cudf_dev_cuda11.0.yml b/conda/environments/cudf_dev_cuda11.0.yml index 2e64365bdf6..10eb683657b 100644 --- a/conda/environments/cudf_dev_cuda11.0.yml +++ b/conda/environments/cudf_dev_cuda11.0.yml @@ -11,10 +11,10 @@ dependencies: - clang=8.0.1 - clang-tools=8.0.1 - cupy>7.1.0,<9.0.0a0 - - rmm=0.19.* + - rmm=0.20.* - cmake>=3.14 - cmake_setuptools>=0.1.3 - - python>=3.6,<3.8 + - python>=3.7,<3.9 - numba>=0.49,!=0.51.0 - numpy - pandas>=1.0,<1.3.0dev0 @@ -36,15 +36,14 @@ dependencies: - pandoc=<2.0.0 - cudatoolkit=11.0 - pip - - partd - flake8=3.8.3 - black=19.10 - isort=5.0.7 - mypy=0.782 - typing_extensions - pre_commit - - dask>=2021.3.1 - - distributed>=2.22.0 + - dask==2021.4.0 + - distributed>=2.22.0,<=2021.4.0 - streamz - dlpack - arrow-cpp=1.0.1 diff --git a/conda/environments/cudf_dev_cuda11.1.yml b/conda/environments/cudf_dev_cuda11.1.yml new file mode 100644 index 00000000000..30062e38021 --- /dev/null +++ b/conda/environments/cudf_dev_cuda11.1.yml @@ -0,0 +1,67 @@ +# Copyright (c) 2021, NVIDIA CORPORATION. 
+ +name: cudf_dev +channels: + - rapidsai + - nvidia + - rapidsai-nightly + - conda-forge + - defaults +dependencies: + - clang=8.0.1 + - clang-tools=8.0.1 + - cupy>7.1.0,<9.0.0a0 + - rmm=0.20.* + - cmake>=3.14 + - cmake_setuptools>=0.1.3 + - python>=3.7,<3.9 + - numba>=0.49,!=0.51.0 + - numpy + - pandas>=1.0,<1.3.0dev0 + - pyarrow=1.0.1 + - fastavro>=0.22.9 + - notebook>=0.5.0 + - cython>=0.29,<0.30 + - fsspec>=0.6.0 + - pytest + - pytest-benchmark + - pytest-xdist + - sphinx + - sphinx_rtd_theme + - sphinxcontrib-websupport + - nbsphinx + - numpydoc + - ipython + - recommonmark + - pandoc=<2.0.0 + - cudatoolkit=11.1 + - pip + - flake8=3.8.3 + - black=19.10 + - isort=5.0.7 + - mypy=0.782 + - typing_extensions + - pre_commit + - dask==2021.4.0 + - distributed>=2.22.0,<=2021.4.0 + - streamz + - dlpack + - arrow-cpp=1.0.1 + - arrow-cpp-proc * cuda + - boost-cpp>=1.72.0 + - double-conversion + - rapidjson + - flatbuffers + - hypothesis + - sphinx-markdown-tables + - sphinx-copybutton + - mimesis + - packaging + - protobuf + - nvtx>=0.2.1 + - cachetools + - pip: + - git+https://github.com/dask/dask.git@main + - git+https://github.com/dask/distributed.git@main + - git+https://github.com/python-streamz/streamz.git + - pyorc diff --git a/conda/environments/cudf_dev_cuda11.2.yml b/conda/environments/cudf_dev_cuda11.2.yml new file mode 100644 index 00000000000..63821910790 --- /dev/null +++ b/conda/environments/cudf_dev_cuda11.2.yml @@ -0,0 +1,67 @@ +# Copyright (c) 2021, NVIDIA CORPORATION. 
+ +name: cudf_dev +channels: + - rapidsai + - nvidia + - rapidsai-nightly + - conda-forge + - defaults +dependencies: + - clang=8.0.1 + - clang-tools=8.0.1 + - cupy>7.1.0,<9.0.0a0 + - rmm=0.20.* + - cmake>=3.14 + - cmake_setuptools>=0.1.3 + - python>=3.7,<3.9 + - numba>=0.49,!=0.51.0 + - numpy + - pandas>=1.0,<1.3.0dev0 + - pyarrow=1.0.1 + - fastavro>=0.22.9 + - notebook>=0.5.0 + - cython>=0.29,<0.30 + - fsspec>=0.6.0 + - pytest + - pytest-benchmark + - pytest-xdist + - sphinx + - sphinx_rtd_theme + - sphinxcontrib-websupport + - nbsphinx + - numpydoc + - ipython + - recommonmark + - pandoc=<2.0.0 + - cudatoolkit=11.2 + - pip + - flake8=3.8.3 + - black=19.10 + - isort=5.0.7 + - mypy=0.782 + - typing_extensions + - pre_commit + - dask==2021.4.0 + - distributed>=2.22.0,<=2021.4.0 + - streamz + - dlpack + - arrow-cpp=1.0.1 + - arrow-cpp-proc * cuda + - boost-cpp>=1.72.0 + - double-conversion + - rapidjson + - flatbuffers + - hypothesis + - sphinx-markdown-tables + - sphinx-copybutton + - mimesis + - packaging + - protobuf + - nvtx>=0.2.1 + - cachetools + - pip: + - git+https://github.com/dask/dask.git@main + - git+https://github.com/dask/distributed.git@main + - git+https://github.com/python-streamz/streamz.git + - pyorc diff --git a/conda/recipes/cudf/meta.yaml b/conda/recipes/cudf/meta.yaml index a119040bbcf..5635f54ba20 100644 --- a/conda/recipes/cudf/meta.yaml +++ b/conda/recipes/cudf/meta.yaml @@ -28,7 +28,7 @@ requirements: - numba >=0.49.0 - dlpack - pyarrow 1.0.1 - - libcudf {{ version }}=*_{{ GIT_DESCRIBE_NUMBER }} + - libcudf {{ version }} - rmm {{ minor_version }} - cudatoolkit {{ cuda_version }} run: diff --git a/conda/recipes/cudf_kafka/meta.yaml b/conda/recipes/cudf_kafka/meta.yaml index cc3f30091bf..0acd9ec4bb2 100644 --- a/conda/recipes/cudf_kafka/meta.yaml +++ b/conda/recipes/cudf_kafka/meta.yaml @@ -29,12 +29,12 @@ requirements: - python - cython >=0.29,<0.30 - setuptools - - cudf {{ version }}=*_{{ GIT_DESCRIBE_NUMBER }} - - libcudf_kafka {{ version 
}}=*_{{ GIT_DESCRIBE_NUMBER }} + - cudf {{ version }} + - libcudf_kafka {{ version }} run: - - libcudf_kafka {{ version }}=*_{{ GIT_DESCRIBE_NUMBER }} + - libcudf_kafka {{ version }} - python-confluent-kafka - - cudf {{ version }}=*_{{ GIT_DESCRIBE_NUMBER }} + - cudf {{ version }} test: requires: diff --git a/conda/recipes/custreamz/meta.yaml b/conda/recipes/custreamz/meta.yaml index 8edca7a51d0..f65b3cafbd7 100644 --- a/conda/recipes/custreamz/meta.yaml +++ b/conda/recipes/custreamz/meta.yaml @@ -23,15 +23,15 @@ requirements: host: - python - python-confluent-kafka - - cudf_kafka {{ version }}=*_{{ GIT_DESCRIBE_NUMBER }} + - cudf_kafka {{ version }} run: - python - - streamz - - cudf {{ version }}=*_{{ GIT_DESCRIBE_NUMBER }} - - dask >=2.22.0 - - distributed >=2.22.0 + - streamz + - cudf {{ version }} + - dask >=2.22.0,<=2021.4.0 + - distributed >=2.22.0,<=2021.4.0 - python-confluent-kafka - - cudf_kafka {{ version }}=*_{{ GIT_DESCRIBE_NUMBER }} + - cudf_kafka {{ version }} test: requires: diff --git a/conda/recipes/dask-cudf/meta.yaml b/conda/recipes/dask-cudf/meta.yaml index a8768e26056..8b503840b34 100644 --- a/conda/recipes/dask-cudf/meta.yaml +++ b/conda/recipes/dask-cudf/meta.yaml @@ -22,15 +22,15 @@ build: requirements: host: - python - - cudf {{ version }}=*_{{ GIT_DESCRIBE_NUMBER }} - - dask>=2021.3.1 - - distributed >=2.22.0 + - cudf {{ version }} + - dask==2021.4.0 + - distributed >=2.22.0,<=2021.4.0 run: - python - - cudf {{ version }}=*_{{ GIT_DESCRIBE_NUMBER }} - - dask>=2021.3.1 - - distributed >=2.22.0 - + - cudf {{ version }} + - dask==2021.4.0 + - distributed >=2.22.0,<=2021.4.0 + test: requires: - cudatoolkit {{ cuda_version }}.* diff --git a/conda/recipes/libcudf/meta.yaml b/conda/recipes/libcudf/meta.yaml index 39587b4bd05..75955428eab 100644 --- a/conda/recipes/libcudf/meta.yaml +++ b/conda/recipes/libcudf/meta.yaml @@ -178,12 +178,14 @@ test: - test -f $PREFIX/include/cudf/strings/detail/converters.hpp - test -f 
$PREFIX/include/cudf/strings/detail/copying.hpp - test -f $PREFIX/include/cudf/strings/detail/fill.hpp + - test -f $PREFIX/include/cudf/strings/detail/json.hpp - test -f $PREFIX/include/cudf/strings/detail/replace.hpp - test -f $PREFIX/include/cudf/strings/detail/utilities.hpp - test -f $PREFIX/include/cudf/strings/extract.hpp - test -f $PREFIX/include/cudf/strings/findall.hpp - test -f $PREFIX/include/cudf/strings/find.hpp - test -f $PREFIX/include/cudf/strings/find_multiple.hpp + - test -f $PREFIX/include/cudf/strings/json.hpp - test -f $PREFIX/include/cudf/strings/padding.hpp - test -f $PREFIX/include/cudf/strings/replace.hpp - test -f $PREFIX/include/cudf/strings/replace_re.hpp diff --git a/conda/recipes/libcudf_kafka/meta.yaml b/conda/recipes/libcudf_kafka/meta.yaml index 81ff922b8d7..5348ec471e9 100644 --- a/conda/recipes/libcudf_kafka/meta.yaml +++ b/conda/recipes/libcudf_kafka/meta.yaml @@ -25,7 +25,7 @@ requirements: build: - cmake >=3.17.0 host: - - libcudf {{ version }}=*_{{ GIT_DESCRIBE_NUMBER }} + - libcudf {{ version }} - librdkafka >=1.5.0,<1.5.3 run: - {{ pin_compatible('librdkafka', max_pin='x.x') }} #TODO: librdkafka should be automatically included here by run_exports but is not diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt index 5cd82e52180..453707e4559 100644 --- a/cpp/CMakeLists.txt +++ b/cpp/CMakeLists.txt @@ -28,7 +28,7 @@ elseif(CMAKE_CUDA_ARCHITECTURES STREQUAL "") set(CUDF_BUILD_FOR_DETECTED_ARCHS TRUE) endif() -project(CUDF VERSION 0.19.0 LANGUAGES C CXX) +project(CUDF VERSION 0.20.0 LANGUAGES C CXX) # Needed because GoogleBenchmark changes the state of FindThreads.cmake, # causing subsequent runs to have different values for the `Threads::Threads` target. 
@@ -137,8 +137,8 @@ include(cmake/thirdparty/CUDF_GetDLPack.cmake) include(cmake/thirdparty/CUDF_GetLibcudacxx.cmake) # find or install GoogleTest include(cmake/thirdparty/CUDF_GetGTest.cmake) -# Stringify libcudf and libcudacxx headers used in JIT operations -include(cmake/Modules/StringifyJITHeaders.cmake) +# preprocess jitify-able kernels +include(cmake/Modules/JitifyPreprocessKernels.cmake) # find cuFile include(cmake/Modules/FindcuFile.cmake) @@ -153,9 +153,6 @@ add_library(cudf src/ast/transform.cu src/binaryop/binaryop.cpp src/binaryop/compiled/binary_ops.cu - src/binaryop/jit/code/kernel.cpp - src/binaryop/jit/code/operation.cpp - src/binaryop/jit/code/traits.cpp src/labeling/label_bins.cu src/bitmask/null_mask.cu src/column/column.cu @@ -256,7 +253,6 @@ add_library(cudf src/io/utilities/parsing_utils.cu src/io/utilities/type_conversion.cpp src/jit/cache.cpp - src/jit/launcher.cpp src/jit/parser.cpp src/jit/type.cpp src/join/cross_join.cu @@ -302,8 +298,6 @@ add_library(cudf src/reshape/interleave_columns.cu src/reshape/tile.cu src/rolling/grouped_rolling.cu - src/rolling/jit/code/kernel.cpp - src/rolling/jit/code/operation.cpp src/rolling/rolling.cu src/round/round.cu src/scalar/scalar.cpp @@ -346,6 +340,7 @@ add_library(cudf src/strings/find.cu src/strings/find_multiple.cu src/strings/padding.cu + src/strings/json/json_path.cu src/strings/regex/regcomp.cpp src/strings/regex/regexec.cu src/strings/replace/backref_re.cu @@ -386,7 +381,6 @@ add_library(cudf src/text/tokenize.cu src/transform/bools_to_mask.cu src/transform/encode.cu - src/transform/jit/code/kernel.cpp src/transform/mask_to_bools.cu src/transform/nans_to_nulls.cu src/transform/row_bit_count.cu @@ -401,10 +395,11 @@ add_library(cudf set_target_properties(cudf PROPERTIES BUILD_RPATH "\$ORIGIN" + INSTALL_RPATH "\$ORIGIN" # set target compile options - CXX_STANDARD 14 + CXX_STANDARD 17 CXX_STANDARD_REQUIRED ON - CUDA_STANDARD 14 + CUDA_STANDARD 17 CUDA_STANDARD_REQUIRED ON 
POSITION_INDEPENDENT_CODE ON INTERFACE_POSITION_INDEPENDENT_CODE ON @@ -464,7 +459,7 @@ endif() target_compile_definitions(cudf PUBLIC "SPDLOG_ACTIVE_LEVEL=SPDLOG_LEVEL_${RMM_LOGGING_LEVEL}") -# Compile stringified JIT sources first -add_dependencies(cudf stringify_run) +# Preprocess jitify-able kernels first +add_dependencies(cudf jitify_preprocess_run) # Specify the target module library dependencies target_link_libraries(cudf @@ -475,9 +470,15 @@ target_link_libraries(cudf rmm::rmm) if(CUDA_STATIC_RUNTIME) - target_link_libraries(cudf PUBLIC CUDA::nvrtc CUDA::cudart_static CUDA::cuda_driver) + # Tell CMake what CUDA language runtime to use + set_target_properties(cudf PROPERTIES CUDA_RUNTIME_LIBRARY Static) + # Make sure to export to consumers what runtime we used + target_link_libraries(cudf PUBLIC CUDA::cudart_static CUDA::cuda_driver) else() - target_link_libraries(cudf PUBLIC CUDA::nvrtc CUDA::cudart CUDA::cuda_driver) + # Tell CMake what CUDA language runtime to use + set_target_properties(cudf PROPERTIES CUDA_RUNTIME_LIBRARY Shared) + # Make sure to export to consumers what runtime we used + target_link_libraries(cudf PUBLIC CUDA::cudart CUDA::cuda_driver) endif() # Add cuFile interface if available @@ -516,7 +517,7 @@ target_compile_options(cudftestutil ) target_compile_features(cudftestutil - PUBLIC cxx_std_14 $) + PUBLIC cxx_std_17 $) target_link_libraries(cudftestutil PUBLIC GTest::gmock @@ -582,7 +583,14 @@ install(DIRECTORY DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/libcudf) install(DIRECTORY ${Thrust_SOURCE_DIR}/ - DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/libcudf/Thrust) + DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/libcudf/Thrust + PATTERN "*.py" EXCLUDE + PATTERN "benchmark" EXCLUDE + PATTERN "build" EXCLUDE + PATTERN "doc" EXCLUDE + PATTERN "examples" EXCLUDE + PATTERN "test" EXCLUDE + PATTERN "testing" EXCLUDE) include(CMakePackageConfigHelpers) diff --git a/cpp/benchmarks/CMakeLists.txt b/cpp/benchmarks/CMakeLists.txt index 5aa7e0132f8..78cb35865e9 100644 --- a/cpp/benchmarks/CMakeLists.txt
+++ b/cpp/benchmarks/CMakeLists.txt @@ -17,7 +17,7 @@ find_package(Threads REQUIRED) add_library(cudf_datagen STATIC common/generate_benchmark_input.cpp) -target_compile_features(cudf_datagen PUBLIC cxx_std_14 cuda_std_14) +target_compile_features(cudf_datagen PUBLIC cxx_std_17 cuda_std_17) target_compile_options(cudf_datagen PUBLIC "$<$:${CUDF_CXX_FLAGS}>" @@ -202,3 +202,8 @@ ConfigureBench(STRINGS_BENCH string/substring_benchmark.cpp string/translate_benchmark.cpp string/url_decode_benchmark.cpp) + +################################################################################################### +# - json benchmark ------------------------------------------------------------------- +ConfigureBench(JSON_BENCH + string/json_benchmark.cpp) diff --git a/cpp/benchmarks/column/concatenate_benchmark.cpp b/cpp/benchmarks/column/concatenate_benchmark.cpp index b04cfba7d07..3634b2f08a2 100644 --- a/cpp/benchmarks/column/concatenate_benchmark.cpp +++ b/cpp/benchmarks/column/concatenate_benchmark.cpp @@ -1,5 +1,5 @@ /* - * Copyright (c) 2020, NVIDIA CORPORATION. + * Copyright (c) 2020-2021, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. 
@@ -62,7 +62,7 @@ static void BM_concatenate(benchmark::State& state) CHECK_CUDA(0); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); auto result = cudf::concatenate(column_views); } @@ -124,7 +124,7 @@ static void BM_concatenate_tables(benchmark::State& state) CHECK_CUDA(0); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); auto result = cudf::concatenate(table_views); } @@ -184,7 +184,7 @@ static void BM_concatenate_strings(benchmark::State& state) CHECK_CUDA(0); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); auto result = cudf::concatenate(column_views); } diff --git a/cpp/benchmarks/join/join_benchmark.cu b/cpp/benchmarks/join/join_benchmark.cu index fa6afdd908c..d1c11696ddd 100644 --- a/cpp/benchmarks/join/join_benchmark.cu +++ b/cpp/benchmarks/join/join_benchmark.cu @@ -1,5 +1,5 @@ /* - * Copyright (c) 2019, NVIDIA CORPORATION. + * Copyright (c) 2019-2021, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. 
@@ -103,7 +103,7 @@ static void BM_join(benchmark::State &state) // Benchmark the inner join operation for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); auto result = cudf::inner_join( probe_table, build_table, columns_to_join, columns_to_join, cudf::null_equality::UNEQUAL); diff --git a/cpp/benchmarks/sort/sort_benchmark.cpp b/cpp/benchmarks/sort/sort_benchmark.cpp index fb74469e7c0..fe68ddd0051 100644 --- a/cpp/benchmarks/sort/sort_benchmark.cpp +++ b/cpp/benchmarks/sort/sort_benchmark.cpp @@ -61,7 +61,7 @@ static void BM_sort(benchmark::State& state, bool nulls) auto input = cudf::table_view(column_views); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); auto result = (stable) ? cudf::stable_sorted_order(input) : cudf::sorted_order(input); } diff --git a/cpp/benchmarks/sort/sort_strings_benchmark.cpp b/cpp/benchmarks/sort/sort_strings_benchmark.cpp index 54e85b7ea8c..f5effcafcfb 100644 --- a/cpp/benchmarks/sort/sort_strings_benchmark.cpp +++ b/cpp/benchmarks/sort/sort_strings_benchmark.cpp @@ -1,5 +1,5 @@ /* - * Copyright (c) 2020, NVIDIA CORPORATION. + * Copyright (c) 2020-2021, NVIDIA CORPORATION. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. 
@@ -32,7 +32,7 @@ static void BM_sort(benchmark::State& state) auto const table = create_random_table({cudf::type_id::STRING}, 1, row_count{n_rows}); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); cudf::sort(table->view()); } } diff --git a/cpp/benchmarks/string/case_benchmark.cpp b/cpp/benchmarks/string/case_benchmark.cpp index 9c1c81da22a..508ae49e093 100644 --- a/cpp/benchmarks/string/case_benchmark.cpp +++ b/cpp/benchmarks/string/case_benchmark.cpp @@ -32,7 +32,7 @@ static void BM_case(benchmark::State& state) cudf::strings_column_view input(table->view().column(0)); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); cudf::strings::to_lower(input); } diff --git a/cpp/benchmarks/string/combine_benchmark.cpp b/cpp/benchmarks/string/combine_benchmark.cpp index 2a5013a9ae7..7dabd32e874 100644 --- a/cpp/benchmarks/string/combine_benchmark.cpp +++ b/cpp/benchmarks/string/combine_benchmark.cpp @@ -43,7 +43,7 @@ static void BM_combine(benchmark::State& state) cudf::string_scalar separator("+"); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); cudf::strings::concatenate(table->view(), separator); } diff --git a/cpp/benchmarks/string/contains_benchmark.cpp b/cpp/benchmarks/string/contains_benchmark.cpp index 1a2ac8ad602..79bdda77634 100644 --- a/cpp/benchmarks/string/contains_benchmark.cpp +++ b/cpp/benchmarks/string/contains_benchmark.cpp @@ -35,7 +35,7 @@ static void BM_contains(benchmark::State& state, contains_type ct) cudf::strings_column_view input(table->view().column(0)); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); // contains_re(), matches_re(), and count_re() all have similar functions // with count_re() being the most regex intensive switch (ct) 
{ diff --git a/cpp/benchmarks/string/copy_benchmark.cpp b/cpp/benchmarks/string/copy_benchmark.cpp index af9f5b4fa4a..b49bc878ca7 100644 --- a/cpp/benchmarks/string/copy_benchmark.cpp +++ b/cpp/benchmarks/string/copy_benchmark.cpp @@ -54,7 +54,7 @@ static void BM_copy(benchmark::State& state, copy_type ct) host_map_data.end()); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); switch (ct) { case gather: cudf::gather(source->view(), index_map); break; case scatter: cudf::scatter(source->view(), index_map, target->view()); break; diff --git a/cpp/benchmarks/string/extract_benchmark.cpp b/cpp/benchmarks/string/extract_benchmark.cpp index dbae18dde3b..aa1e59a22bf 100644 --- a/cpp/benchmarks/string/extract_benchmark.cpp +++ b/cpp/benchmarks/string/extract_benchmark.cpp @@ -14,6 +14,8 @@ * limitations under the License. */ +#include "string_bench_args.hpp" + #include #include #include @@ -23,43 +25,55 @@ #include #include -#include "string_bench_args.hpp" +#include class StringExtract : public cudf::benchmark { }; -static void BM_extract(benchmark::State& state, int re_instructions) +static void BM_extract(benchmark::State& state, int groups) { - cudf::size_type const n_rows{static_cast(state.range(0))}; - cudf::size_type const max_str_length{static_cast(state.range(1))}; - data_profile table_profile; - table_profile.set_distribution_params( - cudf::type_id::STRING, distribution_id::NORMAL, 0, max_str_length); - auto const table = - create_random_table({cudf::type_id::STRING}, 1, row_count{n_rows}, table_profile); - cudf::strings_column_view input(table->view().column(0)); - std::string const raw_pattern = - "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234" - "5678901234567890123456789012345678901234567890"; - std::string const pattern = "(" + raw_pattern.substr(0, re_instructions) + ")"; + auto const n_rows = static_cast(state.range(0)); + auto 
const n_length = static_cast(state.range(1)); + + std::default_random_engine generator; + std::uniform_int_distribution words_dist(0, 999); + + std::vector samples(100); // 100 unique rows of data to reuse + std::generate(samples.begin(), samples.end(), [&]() { + std::string row; // build a row of random tokens + while (static_cast(row.size()) < n_length) { + row += std::to_string(words_dist(generator)) + " "; + } + return row; + }); + + std::string pattern; // one "(\d+) " capture group per requested group + for (int g = 0; g < groups; ++g) { pattern += "(\\d+) "; } + + std::uniform_int_distribution distribution(0, samples.size() - 1); + auto elements = cudf::detail::make_counting_transform_iterator( + 0, [&](auto idx) { return samples.at(distribution(generator)); }); + cudf::test::strings_column_wrapper input(elements, elements + n_rows); + cudf::strings_column_view view(input); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); - auto results = cudf::strings::extract(input, pattern); + cuda_event_timer raii(state, true); + auto results = cudf::strings::extract(view, pattern); } - state.SetBytesProcessed(state.iterations() * input.chars_size()); + state.SetBytesProcessed(state.iterations() * view.chars_size()); } static void generate_bench_args(benchmark::internal::Benchmark* b) { - int const min_rows = 1 << 12; - int const max_rows = 1 << 24; - int const row_mult = 8; - int const min_rowlen = 1 << 5; - int const max_rowlen = 1 << 13; - int const len_mult = 4; - generate_string_bench_args(b, min_rows, max_rows, row_mult, min_rowlen, max_rowlen, len_mult); + int const min_rows = 1 << 12; + int const max_rows = 1 << 24; + int const row_multiplier = 8; + int const min_row_length = 1 << 5; + int const max_row_length = 1 << 13; + int const length_multiplier = 4; + generate_string_bench_args( + b, min_rows, max_rows, row_multiplier, min_row_length, max_row_length, length_multiplier); } #define STRINGS_BENCHMARK_DEFINE(name, instructions) \ @@ -70,6 +84,6 @@ static void
generate_bench_args(benchmark::internal::Benchmark* b) ->UseManualTime() \ ->Unit(benchmark::kMillisecond); -STRINGS_BENCHMARK_DEFINE(small, 4) -STRINGS_BENCHMARK_DEFINE(medium, 48) -STRINGS_BENCHMARK_DEFINE(large, 128) +STRINGS_BENCHMARK_DEFINE(small, 2) +STRINGS_BENCHMARK_DEFINE(medium, 10) +STRINGS_BENCHMARK_DEFINE(large, 30) diff --git a/cpp/benchmarks/string/factory_benchmark.cu b/cpp/benchmarks/string/factory_benchmark.cu index 6c5dceffaa8..802ca949976 100644 --- a/cpp/benchmarks/string/factory_benchmark.cu +++ b/cpp/benchmarks/string/factory_benchmark.cu @@ -63,7 +63,7 @@ static void BM_factory(benchmark::State& state) string_view_to_pair{}); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); cudf::make_strings_column(pairs); } diff --git a/cpp/benchmarks/string/filter_benchmark.cpp b/cpp/benchmarks/string/filter_benchmark.cpp index 123c5597df9..d510ca9baed 100644 --- a/cpp/benchmarks/string/filter_benchmark.cpp +++ b/cpp/benchmarks/string/filter_benchmark.cpp @@ -50,7 +50,7 @@ static void BM_filter_chars(benchmark::State& state, FilterAPI api) {cudf::char_utf8{'a'}, cudf::char_utf8{'c'}}}; for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); switch (api) { case filter: cudf::strings::filter_characters_of_type(input, types); break; case filter_chars: cudf::strings::filter_characters(input, filter_table); break; diff --git a/cpp/benchmarks/string/find_benchmark.cpp b/cpp/benchmarks/string/find_benchmark.cpp index 200527d606e..fd7c515eb0b 100644 --- a/cpp/benchmarks/string/find_benchmark.cpp +++ b/cpp/benchmarks/string/find_benchmark.cpp @@ -46,7 +46,7 @@ static void BM_find_scalar(benchmark::State& state, FindAPI find_api) cudf::test::strings_column_wrapper targets({"+", "-"}); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); switch 
(find_api) { case find: cudf::strings::find(input, target); break; case find_multi: diff --git a/cpp/benchmarks/string/json_benchmark.cpp b/cpp/benchmarks/string/json_benchmark.cpp new file mode 100644 index 00000000000..6fb6a07a8d0 --- /dev/null +++ b/cpp/benchmarks/string/json_benchmark.cpp @@ -0,0 +1,140 @@ +/* + * Copyright (c) 2021, NVIDIA CORPORATION. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include +#include +#include + +#include +#include + +#include +#include + +class JsonPath : public cudf::benchmark { +}; + +float frand() { return static_cast<float>(rand()) / static_cast<float>(RAND_MAX); } + +int rand_range(int min, int max) { return min + static_cast<int>(frand() * (max - min)); } + +std::vector Books{ + "{\n\"category\": \"reference\",\n\"author\": \"Nigel Rees\",\n\"title\": \"Sayings of the " + "Century\",\n\"price\": 8.95\n}", + "{\n\"category\": \"fiction\",\n\"author\": \"Evelyn Waugh\",\n\"title\": \"Sword of " + "Honour\",\n\"price\": 12.99\n}", + "{\n\"category\": \"fiction\",\n\"author\": \"Herman Melville\",\n\"title\": \"Moby " + "Dick\",\n\"isbn\": \"0-553-21311-3\",\n\"price\": 8.99\n}", + "{\n\"category\": \"fiction\",\n\"author\": \"J. R. R.
Tolkien\",\n\"title\": \"The Lord of the " + "Rings\",\n\"isbn\": \"0-395-19395-8\",\n\"price\": 22.99\n}"}; +constexpr int Approx_book_size = 110; +std::vector Bicycles{ + "{\"color\": \"red\", \"price\": 9.95}", + "{\"color\": \"green\", \"price\": 29.95}", + "{\"color\": \"blue\", \"price\": 399.95}", + "{\"color\": \"yellow\", \"price\": 99.95}", + "{\"color\": \"mauve\", \"price\": 199.95}", +}; +constexpr int Approx_bicycle_size = 33; +std::string Misc{"\n\"expensive\": 10\n"}; +std::string generate_field(std::vector const& values, int num_values) +{ + std::string res; + for (int idx = 0; idx < num_values; idx++) { + if (idx > 0) { res += std::string(",\n"); } + int vindex = std::min(static_cast<int>(floor(frand() * values.size())), + static_cast<int>(values.size() - 1)); + res += values[vindex]; + } + return res; +} + +std::string build_row(int desired_bytes) +{ + // always have at least 2 books and 2 bikes + int num_books = 2; + int num_bicycles = 2; + int remaining_bytes = + desired_bytes - ((num_books * Approx_book_size) + (num_bicycles * Approx_bicycle_size)); + + // divide up the remainder between books and bikes + float book_pct = frand(); + float bicycle_pct = 1.0f - book_pct; + num_books += (remaining_bytes * book_pct) / Approx_book_size; + num_bicycles += (remaining_bytes * bicycle_pct) / Approx_bicycle_size; + + std::string books = "\"book\": [\n" + generate_field(Books, num_books) + "]\n"; + std::string bicycles = "\"bicycle\": [\n" + generate_field(Bicycles, num_bicycles) + "]\n"; + + std::string store = "\"store\": {\n"; + if (frand() <= 0.5f) { + store += books + std::string(",\n") + bicycles; + } else { + store += bicycles + std::string(",\n") + books; + } + store += std::string("}\n"); + + std::string row = std::string("{\n"); + if (frand() <= 0.5f) { + row += store + std::string(",\n") + Misc; + } else { + row += Misc + std::string(",\n") + store; + } + row += std::string("}\n"); + return row; +} + +template <class... 
QueryArg> +static void BM_case(benchmark::State&
state, QueryArg&&... query_arg) +{ + srand(5236); + auto iter = thrust::make_transform_iterator( + thrust::make_counting_iterator(0), + [desired_bytes = state.range(1)](int index) { return build_row(desired_bytes); }); + int num_rows = state.range(0); + cudf::test::strings_column_wrapper input(iter, iter + num_rows); + cudf::strings_column_view scv(input); + size_t num_chars = scv.chars().size(); + + std::string json_path(query_arg...); + + for (auto _ : state) { + cuda_event_timer raii(state, true, 0); + auto result = cudf::strings::get_json_object(scv, json_path); + cudaStreamSynchronize(0); + } + + // this isn't strictly 100% accurate. a given query isn't necessarily + // going to visit every single incoming character. but in spirit it does. + state.SetBytesProcessed(state.iterations() * num_chars); +} + +#define JSON_BENCHMARK_DEFINE(name, query) \ + BENCHMARK_CAPTURE(BM_case, name, query) \ + ->ArgsProduct({{100, 1000, 100000, 400000}, {300, 600, 4096}}) \ + ->UseManualTime() \ + ->Unit(benchmark::kMillisecond); + +JSON_BENCHMARK_DEFINE(query0, "$"); +JSON_BENCHMARK_DEFINE(query1, "$.store"); +JSON_BENCHMARK_DEFINE(query2, "$.store.book"); +JSON_BENCHMARK_DEFINE(query3, "$.store.*"); +JSON_BENCHMARK_DEFINE(query4, "$.store.book[*]"); +JSON_BENCHMARK_DEFINE(query5, "$.store.book[*].category"); +JSON_BENCHMARK_DEFINE(query6, "$.store['bicycle']"); +JSON_BENCHMARK_DEFINE(query7, "$.store.book[*]['isbn']"); +JSON_BENCHMARK_DEFINE(query8, "$.store.bicycle[1]"); diff --git a/cpp/benchmarks/string/replace_benchmark.cpp b/cpp/benchmarks/string/replace_benchmark.cpp index 968b8f5abb0..0d785fd25aa 100644 --- a/cpp/benchmarks/string/replace_benchmark.cpp +++ b/cpp/benchmarks/string/replace_benchmark.cpp @@ -49,7 +49,7 @@ static void BM_replace(benchmark::State& state, replace_type rt) cudf::test::strings_column_wrapper repls({"", ""}); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); switch 
(rt) { case scalar: cudf::strings::replace(input, target, repl); break; case slice: cudf::strings::replace_slice(input, repl, 1, 10); break; diff --git a/cpp/benchmarks/string/replace_re_benchmark.cpp b/cpp/benchmarks/string/replace_re_benchmark.cpp index 616e2c0f22c..18ec28371e3 100644 --- a/cpp/benchmarks/string/replace_re_benchmark.cpp +++ b/cpp/benchmarks/string/replace_re_benchmark.cpp @@ -43,7 +43,7 @@ static void BM_replace(benchmark::State& state, replace_type rt) cudf::test::strings_column_wrapper repls({"#", ""}); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); switch (rt) { case replace_type::replace_re: // contains_re and matches_re use the same main logic cudf::strings::replace_re(input, "\\d+"); diff --git a/cpp/benchmarks/string/split_benchmark.cpp b/cpp/benchmarks/string/split_benchmark.cpp index 35bedb1b767..0494fba7642 100644 --- a/cpp/benchmarks/string/split_benchmark.cpp +++ b/cpp/benchmarks/string/split_benchmark.cpp @@ -44,7 +44,7 @@ static void BM_split(benchmark::State& state, split_type rt) cudf::string_scalar target("+"); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); switch (rt) { case split: cudf::strings::split(input, target); break; case split_ws: cudf::strings::split(input); break; diff --git a/cpp/benchmarks/string/substring_benchmark.cpp b/cpp/benchmarks/string/substring_benchmark.cpp index d47c42e45be..e8a66f7b323 100644 --- a/cpp/benchmarks/string/substring_benchmark.cpp +++ b/cpp/benchmarks/string/substring_benchmark.cpp @@ -54,7 +54,7 @@ static void BM_substring(benchmark::State& state, substring_type rt) cudf::test::strings_column_wrapper delimiters(delim_itr, delim_itr + n_rows); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); switch (rt) { case position: cudf::strings::slice_strings(input, 
1, max_str_length / 2); break; case multi_position: cudf::strings::slice_strings(input, starts, stops); break; diff --git a/cpp/benchmarks/string/translate_benchmark.cpp b/cpp/benchmarks/string/translate_benchmark.cpp index c49a986d744..49396b0ce71 100644 --- a/cpp/benchmarks/string/translate_benchmark.cpp +++ b/cpp/benchmarks/string/translate_benchmark.cpp @@ -54,7 +54,7 @@ static void BM_translate(benchmark::State& state, int entry_count) }); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); cudf::strings::translate(input, entries); } diff --git a/cpp/benchmarks/string/url_decode_benchmark.cpp b/cpp/benchmarks/string/url_decode_benchmark.cpp index 26c23ea23b4..fbb99bf3e8f 100644 --- a/cpp/benchmarks/string/url_decode_benchmark.cpp +++ b/cpp/benchmarks/string/url_decode_benchmark.cpp @@ -80,7 +80,7 @@ void BM_url_decode(benchmark::State& state) auto strings_view = cudf::strings_column_view(column); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); auto result = cudf::strings::url_decode(strings_view); } diff --git a/cpp/benchmarks/text/normalize_benchmark.cpp b/cpp/benchmarks/text/normalize_benchmark.cpp index 32c4fb7dcde..bb872fee0b3 100644 --- a/cpp/benchmarks/text/normalize_benchmark.cpp +++ b/cpp/benchmarks/text/normalize_benchmark.cpp @@ -41,7 +41,7 @@ static void BM_normalize(benchmark::State& state, bool to_lower) cudf::strings_column_view input(table->view().column(0)); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); nvtext::normalize_characters(input, to_lower); } diff --git a/cpp/benchmarks/text/normalize_spaces_benchmark.cpp b/cpp/benchmarks/text/normalize_spaces_benchmark.cpp index dcabb0c225c..6260bb02c55 100644 --- a/cpp/benchmarks/text/normalize_spaces_benchmark.cpp +++ 
b/cpp/benchmarks/text/normalize_spaces_benchmark.cpp @@ -42,7 +42,7 @@ static void BM_normalize(benchmark::State& state) cudf::strings_column_view input(table->view().column(0)); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); nvtext::normalize_spaces(input); } diff --git a/cpp/benchmarks/text/tokenize_benchmark.cpp b/cpp/benchmarks/text/tokenize_benchmark.cpp index f9e742f0f31..7bb84e11a4a 100644 --- a/cpp/benchmarks/text/tokenize_benchmark.cpp +++ b/cpp/benchmarks/text/tokenize_benchmark.cpp @@ -46,7 +46,7 @@ static void BM_tokenize(benchmark::State& state, tokenize_type tt) cudf::test::strings_column_wrapper delimiters({" ", "+", "-"}); for (auto _ : state) { - cuda_event_timer raii(state, true, 0); + cuda_event_timer raii(state, true, rmm::cuda_stream_default); switch (tt) { case tokenize_type::single: nvtext::tokenize(input); break; case tokenize_type::multi: diff --git a/cpp/cmake/Modules/FindcuFile.cmake b/cpp/cmake/Modules/FindcuFile.cmake index 4f67e186f42..880ad773369 100644 --- a/cpp/cmake/Modules/FindcuFile.cmake +++ b/cpp/cmake/Modules/FindcuFile.cmake @@ -62,6 +62,7 @@ find_path(cuFile_INCLUDE_DIR cufile.h HINTS ${PKG_cuFile_INCLUDE_DIRS} + /usr/local/cuda/include /usr/local/cuda/lib64 ) diff --git a/cpp/cmake/Modules/JitifyPreprocessKernels.cmake b/cpp/cmake/Modules/JitifyPreprocessKernels.cmake new file mode 100644 index 00000000000..eb1ade61440 --- /dev/null +++ b/cpp/cmake/Modules/JitifyPreprocessKernels.cmake @@ -0,0 +1,66 @@ +#============================================================================= +# Copyright (c) 2021, NVIDIA CORPORATION. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +#============================================================================= + +# Create `jitify_preprocess` executable +add_executable(jitify_preprocess "${JITIFY_INCLUDE_DIR}/jitify2_preprocess.cpp") + +target_link_libraries(jitify_preprocess CUDA::cudart ${CMAKE_DL_LIBS}) + +function(jit_preprocess_files) + cmake_parse_arguments(ARG + "" + "SOURCE_DIRECTORY" + "FILES" + ${ARGN} + ) + + foreach(ARG_FILE ${ARG_FILES}) + set(ARG_OUTPUT ${CUDF_GENERATED_INCLUDE_DIR}/include/jit_preprocessed_files/${ARG_FILE}.jit.hpp) + get_filename_component(jit_output_directory "${ARG_OUTPUT}" DIRECTORY ) + list(APPEND JIT_PREPROCESSED_FILES "${ARG_OUTPUT}") + add_custom_command(WORKING_DIRECTORY ${ARG_SOURCE_DIRECTORY} + DEPENDS jitify_preprocess "${ARG_SOURCE_DIRECTORY}/${ARG_FILE}" + OUTPUT ${ARG_OUTPUT} + VERBATIM + COMMAND ${CMAKE_COMMAND} -E make_directory "${jit_output_directory}" + COMMAND jitify_preprocess ${ARG_FILE} + -o ${CUDF_GENERATED_INCLUDE_DIR}/include/jit_preprocessed_files + -i + -m + -std=c++17 + -remove-unused-globals + -D__CUDACC_RTC__ + -I${CUDF_SOURCE_DIR}/include + -I${CUDF_SOURCE_DIR}/src + -I${LIBCUDACXX_INCLUDE_DIR} + -I${CUDAToolkit_INCLUDE_DIRS} + --no-preinclude-workarounds + --no-replace-pragma-once + ) + endforeach() + set(JIT_PREPROCESSED_FILES "${JIT_PREPROCESSED_FILES}" PARENT_SCOPE) +endfunction() + +jit_preprocess_files(SOURCE_DIRECTORY ${CUDF_SOURCE_DIR}/src + FILES binaryop/jit/kernel.cu + transform/jit/kernel.cu + rolling/jit/kernel.cu + ) + +add_custom_target(jitify_preprocess_run DEPENDS ${JIT_PREPROCESSED_FILES}) + +file(COPY 
"${LIBCUDACXX_INCLUDE_DIR}/" DESTINATION "${CUDF_GENERATED_INCLUDE_DIR}/include/libcudacxx") +file(COPY "${LIBCXX_INCLUDE_DIR}" DESTINATION "${CUDF_GENERATED_INCLUDE_DIR}/include/libcxx") diff --git a/cpp/cmake/Modules/SetGPUArchs.cmake b/cpp/cmake/Modules/SetGPUArchs.cmake index f09d5ead8e2..8ab3c14d671 100644 --- a/cpp/cmake/Modules/SetGPUArchs.cmake +++ b/cpp/cmake/Modules/SetGPUArchs.cmake @@ -38,16 +38,6 @@ if(NOT DEFINED CUDAToolkit_VERSION AND CMAKE_CUDA_COMPILER) unset(NVCC_OUT) endif() -if(CUDAToolkit_VERSION_MAJOR LESS 11) - list(REMOVE_ITEM SUPPORTED_CUDA_ARCHITECTURES "80") -endif() -if(CUDAToolkit_VERSION_MAJOR LESS 10) - list(REMOVE_ITEM SUPPORTED_CUDA_ARCHITECTURES "75") -endif() -if(CUDAToolkit_VERSION_MAJOR LESS 9) - list(REMOVE_ITEM SUPPORTED_CUDA_ARCHITECTURES "70") -endif() - if(${PROJECT_NAME}_BUILD_FOR_ALL_ARCHS) set(CMAKE_CUDA_ARCHITECTURES ${SUPPORTED_CUDA_ARCHITECTURES}) diff --git a/cpp/cmake/Modules/StringifyJITHeaders.cmake b/cpp/cmake/Modules/StringifyJITHeaders.cmake deleted file mode 100644 index 0bfb37773dc..00000000000 --- a/cpp/cmake/Modules/StringifyJITHeaders.cmake +++ /dev/null @@ -1,168 +0,0 @@ -#============================================================================= -# Copyright (c) 2018-2021, NVIDIA CORPORATION. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-#============================================================================= - -file(MAKE_DIRECTORY "${CUDF_GENERATED_INCLUDE_DIR}/include") - -# Create `stringify` executable -add_executable(stringify "${JITIFY_INCLUDE_DIR}/stringify.cpp") - -execute_process(WORKING_DIRECTORY ${CUDF_GENERATED_INCLUDE_DIR} - COMMAND ${CMAKE_COMMAND} -E make_directory - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include - ) - -# Use `stringify` to convert types.h to c-str for use in JIT code -add_custom_command(WORKING_DIRECTORY ${CUDF_SOURCE_DIR}/include - COMMENT "Stringify headers for use in JIT compiled code" - DEPENDS stringify - OUTPUT ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/types.h.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/types.hpp.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/bit.hpp.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/timestamps.hpp.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/fixed_point.hpp.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/durations.hpp.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/chrono.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/climits.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/cstddef.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/cstdint.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/ctime.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/limits.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/ratio.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/type_traits.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/version.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/__config.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/__pragma_pop.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/__pragma_push.jit - 
${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/__config.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/__pragma_pop.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/__pragma_push.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/__undef_macros.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/chrono.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/climits.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/cstddef.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/cstdint.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/ctime.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/limits.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/ratio.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/type_traits.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/version.jit - MAIN_DEPENDENCY ${CUDF_SOURCE_DIR}/include/cudf/types.h - ${CUDF_SOURCE_DIR}/include/cudf/types.hpp - ${CUDF_SOURCE_DIR}/include/cudf/utilities/bit.hpp - ${CUDF_SOURCE_DIR}/include/cudf/wrappers/timestamps.hpp - ${CUDF_SOURCE_DIR}/include/cudf/fixed_point/fixed_point.hpp - ${CUDF_SOURCE_DIR}/include/cudf/wrappers/durations.hpp - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/chrono - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/climits - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/cstddef - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/cstdint - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/ctime - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/limits - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/ratio - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/type_traits - 
${LIBCUDACXX_INCLUDE_DIR}/cuda/std/version - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/__config - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/__pragma_pop - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/__pragma_push - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/__config - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/__pragma_pop - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/__pragma_push - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/__undef_macros - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/chrono - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/climits - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/cstddef - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/cstdint - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/ctime - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/limits - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/ratio - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/type_traits - ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/version - - # stringified headers are placed underneath the bin include jit directory and end in ".jit" - COMMAND ${CUDF_BINARY_DIR}/stringify cudf/types.h > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/types.h.jit - COMMAND ${CUDF_BINARY_DIR}/stringify cudf/types.hpp > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/types.hpp.jit - COMMAND ${CUDF_BINARY_DIR}/stringify cudf/utilities/bit.hpp > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/bit.hpp.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ../src/rolling/rolling_jit_detail.hpp > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/rolling_jit_detail.hpp.jit - COMMAND ${CUDF_BINARY_DIR}/stringify cudf/wrappers/timestamps.hpp > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/timestamps.hpp.jit - COMMAND ${CUDF_BINARY_DIR}/stringify cudf/fixed_point/fixed_point.hpp > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/fixed_point.hpp.jit - COMMAND ${CUDF_BINARY_DIR}/stringify 
cudf/wrappers/durations.hpp > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/durations.hpp.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/chrono cuda_std_chrono > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/chrono.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/climits cuda_std_climits > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/climits.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/cstddef cuda_std_cstddef > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/cstddef.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/cstdint cuda_std_cstdint > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/cstdint.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/ctime cuda_std_ctime > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/ctime.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/limits cuda_std_limits > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/limits.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/ratio cuda_std_ratio > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/ratio.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/type_traits cuda_std_type_traits > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/type_traits.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/version cuda_std_version > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/version.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/__config cuda_std_detail___config > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/__config.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/__pragma_pop cuda_std_detail___pragma_pop > 
${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/__pragma_pop.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/__pragma_push cuda_std_detail___pragma_push > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/__pragma_push.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/__config cuda_std_detail_libcxx_include___config > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/__config.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/__pragma_pop cuda_std_detail_libcxx_include___pragma_pop > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/__pragma_pop.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/__pragma_push cuda_std_detail_libcxx_include___pragma_push > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/__pragma_push.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/__undef_macros cuda_std_detail_libcxx_include___undef_macros > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/__undef_macros.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/chrono cuda_std_detail_libcxx_include_chrono > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/chrono.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/climits cuda_std_detail_libcxx_include_climits > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/climits.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/cstddef cuda_std_detail_libcxx_include_cstddef > 
${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/cstddef.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/cstdint cuda_std_detail_libcxx_include_cstdint > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/cstdint.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/ctime cuda_std_detail_libcxx_include_ctime > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/ctime.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/limits cuda_std_detail_libcxx_include_limits > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/limits.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/ratio cuda_std_detail_libcxx_include_ratio > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/ratio.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/type_traits cuda_std_detail_libcxx_include_type_traits > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/type_traits.jit - COMMAND ${CUDF_BINARY_DIR}/stringify ${LIBCUDACXX_INCLUDE_DIR}/cuda/std/detail/libcxx/include/version cuda_std_detail_libcxx_include_version > ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/version.jit - ) - -add_custom_target(stringify_run DEPENDS - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/types.h.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/types.hpp.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/bit.hpp.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/timestamps.hpp.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/fixed_point.hpp.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/durations.hpp.jit - 
${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/chrono.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/climits.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/cstddef.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/cstdint.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/ctime.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/limits.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/ratio.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/type_traits.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/version.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/__config.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/__pragma_pop.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/__pragma_push.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/__config.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/__pragma_pop.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/__pragma_push.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/__undef_macros.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/chrono.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/climits.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/cstddef.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/cstdint.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/ctime.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/limits.jit - 
${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/ratio.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/type_traits.jit - ${CUDF_GENERATED_INCLUDE_DIR}/include/jit/libcudacxx/cuda/std/detail/libcxx/include/version.jit - ) - -################################################################################################### -# - copy libcu++ ---------------------------------------------------------------------------------- - -# `${LIBCUDACXX_INCLUDE_DIR}/` specifies that the contents of this directory will be installed (not the directory itself) -file(COPY "${LIBCUDACXX_INCLUDE_DIR}/" DESTINATION "${CUDF_GENERATED_INCLUDE_DIR}/include/libcudacxx") -file(COPY "${LIBCXX_INCLUDE_DIR}" DESTINATION "${CUDF_GENERATED_INCLUDE_DIR}/include/libcxx") diff --git a/cpp/cmake/thirdparty/CUDF_GetArrow.cmake b/cpp/cmake/thirdparty/CUDF_GetArrow.cmake index 002085c2973..c1c29a693d5 100644 --- a/cpp/cmake/thirdparty/CUDF_GetArrow.cmake +++ b/cpp/cmake/thirdparty/CUDF_GetArrow.cmake @@ -43,6 +43,7 @@ function(find_and_configure_arrow VERSION BUILD_STATIC) GIT_SHALLOW TRUE SOURCE_SUBDIR cpp OPTIONS "CMAKE_VERBOSE_MAKEFILE ON" + "CUDA_USE_STATIC_CUDA_RUNTIME ${CUDA_STATIC_RUNTIME}" "ARROW_IPC ON" "ARROW_CUDA ON" "ARROW_DATASET ON" diff --git a/cpp/cmake/thirdparty/CUDF_GetCPM.cmake b/cpp/cmake/thirdparty/CUDF_GetCPM.cmake index 19c07933d42..d0fe88eb398 100644 --- a/cpp/cmake/thirdparty/CUDF_GetCPM.cmake +++ b/cpp/cmake/thirdparty/CUDF_GetCPM.cmake @@ -1,6 +1,8 @@ -set(CPM_DOWNLOAD_VERSION 3b404296b539e596f39421c4e92bc803b299d964) # v0.27.5 +set(CPM_DOWNLOAD_VERSION 4fad2eac0a3741df3d9c44b791f9163b74aa7b07) # 0.32.0 if(CPM_SOURCE_CACHE) + # Expand relative path. 
This is important if the provided path contains a tilde (~) + get_filename_component(CPM_SOURCE_CACHE ${CPM_SOURCE_CACHE} ABSOLUTE) set(CPM_DOWNLOAD_LOCATION "${CPM_SOURCE_CACHE}/cpm/CPM_${CPM_DOWNLOAD_VERSION}.cmake") elseif(DEFINED ENV{CPM_SOURCE_CACHE}) set(CPM_DOWNLOAD_LOCATION "$ENV{CPM_SOURCE_CACHE}/cpm/CPM_${CPM_DOWNLOAD_VERSION}.cmake") @@ -12,7 +14,7 @@ if(NOT (EXISTS ${CPM_DOWNLOAD_LOCATION})) message(VERBOSE "CUDF: Downloading CPM.cmake to ${CPM_DOWNLOAD_LOCATION}") file( DOWNLOAD - https://raw.githubusercontent.com/TheLartians/CPM.cmake/${CPM_DOWNLOAD_VERSION}/cmake/CPM.cmake + https://raw.githubusercontent.com/cpm-cmake/CPM.cmake/${CPM_DOWNLOAD_VERSION}/cmake/CPM.cmake ${CPM_DOWNLOAD_LOCATION}) endif() diff --git a/cpp/cmake/thirdparty/CUDF_GetJitify.cmake b/cpp/cmake/thirdparty/CUDF_GetJitify.cmake index e041be26d64..6e853816ec5 100644 --- a/cpp/cmake/thirdparty/CUDF_GetJitify.cmake +++ b/cpp/cmake/thirdparty/CUDF_GetJitify.cmake @@ -1,5 +1,5 @@ #============================================================================= -# Copyright (c) 2020, NVIDIA CORPORATION. +# Copyright (c) 2020-2021, NVIDIA CORPORATION. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,9 +18,9 @@ function(find_and_configure_jitify) CPMFindPackage(NAME jitify - VERSION 1.0.0 + VERSION 2.0.0 GIT_REPOSITORY https://github.com/rapidsai/jitify.git - GIT_TAG cudf_0.16 + GIT_TAG cudf_0.19 GIT_SHALLOW TRUE DOWNLOAD_ONLY TRUE) set(JITIFY_INCLUDE_DIR "${jitify_SOURCE_DIR}" PARENT_SCOPE) diff --git a/cpp/cmake/thirdparty/CUDF_GetRMM.cmake b/cpp/cmake/thirdparty/CUDF_GetRMM.cmake index 136947674f9..9f6221d5d1f 100644 --- a/cpp/cmake/thirdparty/CUDF_GetRMM.cmake +++ b/cpp/cmake/thirdparty/CUDF_GetRMM.cmake @@ -14,19 +14,6 @@ # limitations under the License. 
#============================================================================= -function(cudf_save_if_enabled var) - if(CUDF_${var}) - unset(${var} PARENT_SCOPE) - unset(${var} CACHE) - endif() -endfunction() - -function(cudf_restore_if_enabled var) - if(CUDF_${var}) - set(${var} ON CACHE INTERNAL "" FORCE) - endif() -endfunction() - function(find_and_configure_rmm VERSION) if(TARGET rmm::rmm) @@ -37,9 +24,6 @@ function(find_and_configure_rmm VERSION) # 1. Pass `-D CPM_rmm_SOURCE=/path/to/rmm` to build a local RMM source tree # 2. Pass `-D CMAKE_PREFIX_PATH=/path/to/rmm/build` to use an existing local # RMM build directory as the install location for find_package(rmm) - cudf_save_if_enabled(BUILD_TESTS) - cudf_save_if_enabled(BUILD_BENCHMARKS) - CPMFindPackage(NAME rmm VERSION ${VERSION} GIT_REPOSITORY https://github.com/rapidsai/rmm.git @@ -50,8 +34,6 @@ function(find_and_configure_rmm VERSION) "CUDA_STATIC_RUNTIME ${CUDA_STATIC_RUNTIME}" "DISABLE_DEPRECATION_WARNING ${DISABLE_DEPRECATION_WARNING}" ) - cudf_restore_if_enabled(BUILD_TESTS) - cudf_restore_if_enabled(BUILD_BENCHMARKS) # Make sure consumers of cudf can also see rmm::rmm fix_cmake_global_defaults(rmm::rmm) diff --git a/cpp/cmake/thirdparty/CUDF_GetThrust.cmake b/cpp/cmake/thirdparty/CUDF_GetThrust.cmake index 5a304f234d2..daafe4a33a5 100644 --- a/cpp/cmake/thirdparty/CUDF_GetThrust.cmake +++ b/cpp/cmake/thirdparty/CUDF_GetThrust.cmake @@ -15,12 +15,23 @@ #============================================================================= function(find_and_configure_thrust VERSION) + # We only want to set `UPDATE_DISCONNECTED` while + # the GIT tag hasn't moved from the last time we cloned + set(cpm_thrust_disconnect_update "UPDATE_DISCONNECTED TRUE") + set(CPM_THRUST_CURRENT_VERSION ${VERSION} CACHE STRING "version of thrust we checked out") + if(NOT VERSION VERSION_EQUAL CPM_THRUST_CURRENT_VERSION) + set(CPM_THRUST_CURRENT_VERSION ${VERSION} CACHE STRING "version of thrust we checked out" FORCE) + 
set(cpm_thrust_disconnect_update "") + endif() + CPMAddPackage(NAME Thrust VERSION ${VERSION} GIT_REPOSITORY https://github.com/NVIDIA/thrust.git GIT_TAG ${VERSION} GIT_SHALLOW TRUE - PATCH_COMMAND patch -p1 -N < ${CUDF_SOURCE_DIR}/cmake/thrust.patch || true) + ${cpm_thrust_disconnect_update} + PATCH_COMMAND patch --reject-file=- -p1 -N < ${CUDF_SOURCE_DIR}/cmake/thrust.patch || true + ) thrust_create_target(cudf::Thrust FROM_OPTIONS) set(THRUST_LIBRARY "cudf::Thrust" PARENT_SCOPE) diff --git a/cpp/doxygen/Doxyfile b/cpp/doxygen/Doxyfile index 8fde8098bd3..eaa632860e5 100644 --- a/cpp/doxygen/Doxyfile +++ b/cpp/doxygen/Doxyfile @@ -38,7 +38,7 @@ PROJECT_NAME = "libcudf" # could be handy for archiving the generated documentation or if some version # control system is used. -PROJECT_NUMBER = 0.19.0 +PROJECT_NUMBER = 0.20.0 # Using the PROJECT_BRIEF tag one can provide an optional one line description # for a project that appears at the top of each page and should give viewer a @@ -2167,7 +2167,7 @@ SKIP_FUNCTION_MACROS = YES # the path). If a tag file is not located in the directory in which doxygen is # run, you must also specify the path to the tagfile here. -TAGFILES = rmm.tag=https://docs.rapids.ai/api/librmm/0.19 +TAGFILES = rmm.tag=https://docs.rapids.ai/api/librmm/0.20 # When a file name is specified after GENERATE_TAGFILE, doxygen will create a # tag file that is based on the input files it reads. See section "Linking to diff --git a/cpp/include/cudf/detail/gather.cuh b/cpp/include/cudf/detail/gather.cuh index bf488621d52..7a560e4c048 100644 --- a/cpp/include/cudf/detail/gather.cuh +++ b/cpp/include/cudf/detail/gather.cuh @@ -142,7 +142,11 @@ void gather_helper(InputItr source_itr, // Error case when no other overload or specialization is available template struct column_gatherer_impl { - std::unique_ptr operator()(...) { CUDF_FAIL("Unsupported type in gather."); } + template + std::unique_ptr operator()(Args&&...) 
+ { + CUDF_FAIL("Unsupported type in gather."); + } }; /** @@ -466,15 +470,20 @@ struct column_gatherer_impl { mr); }); - gather_bitmask( - // Table view of struct column. - cudf::table_view{ - std::vector{structs_column.child_begin(), structs_column.child_end()}}, - gather_map_begin, - output_struct_members, - nullify_out_of_bounds ? gather_bitmask_op::NULLIFY : gather_bitmask_op::DONT_CHECK, - stream, - mr); + auto const nullable = std::any_of(structs_column.child_begin(), + structs_column.child_end(), + [](auto const& col) { return col.nullable(); }); + if (nullable) { + gather_bitmask( + // Table view of struct column. + cudf::table_view{ + std::vector{structs_column.child_begin(), structs_column.child_end()}}, + gather_map_begin, + output_struct_members, + nullify_out_of_bounds ? gather_bitmask_op::NULLIFY : gather_bitmask_op::DONT_CHECK, + stream, + mr); + } return cudf::make_structs_column( gather_map_size, @@ -652,11 +661,15 @@ std::unique_ptr gather( mr)); } - gather_bitmask_op const op = bounds_policy == out_of_bounds_policy::NULLIFY - ? gather_bitmask_op::NULLIFY - : gather_bitmask_op::DONT_CHECK; - - gather_bitmask(source_table, gather_map_begin, destination_columns, op, stream, mr); + auto const nullable = bounds_policy == out_of_bounds_policy::NULLIFY || + std::any_of(source_table.begin(), source_table.end(), [](auto const& col) { + return col.nullable(); + }); + if (nullable) { + auto const op = bounds_policy == out_of_bounds_policy::NULLIFY ? gather_bitmask_op::NULLIFY + : gather_bitmask_op::DONT_CHECK; + gather_bitmask(source_table, gather_map_begin, destination_columns, op, stream, mr); + } return std::make_unique
(std::move(destination_columns)); } diff --git a/cpp/include/cudf/detail/scatter.cuh b/cpp/include/cudf/detail/scatter.cuh index 30764b9b89f..d069ed06cae 100644 --- a/cpp/include/cudf/detail/scatter.cuh +++ b/cpp/include/cudf/detail/scatter.cuh @@ -25,6 +25,7 @@ #include #include #include +#include #include #include #include @@ -32,6 +33,8 @@ #include #include +#include + namespace cudf { namespace detail { @@ -42,10 +45,9 @@ namespace detail { * function using the PASSTHROUGH op since the resulting map may contain index * values outside the target's range. * - * First, the gather-map is initialized with invalid entries. - * The gather_rows is used since it should always be outside the target size. - * - * Then, the `output[scatter_map[i]] = i`. + * First, the gather-map is initialized with an invalid index. + * The value `numeric_limits::lowest()` is used since it should always be outside the target size. + * Then, `output[scatter_map[i]] = i` for each `i`. * * @tparam MapIterator Iterator type of the input scatter map. * @param scatter_map_begin Beginning of scatter map. @@ -62,11 +64,16 @@ auto scatter_to_gather(MapIterator scatter_map_begin, { using MapValueType = typename thrust::iterator_traits::value_type; - // The gather_map is initialized with gather_rows value to identify pass-through entries - // when calling the gather_bitmask() which applies a pass-through whenever it finds a + // The gather_map is initialized with `numeric_limits::lowest()` value to identify pass-through + // entries when calling the gather_bitmask() which applies a pass-through whenever it finds a // value outside the range of the target column. - // We'll use the gather_rows value for this since it should always be outside the valid range. - auto gather_map = rmm::device_vector(gather_rows, gather_rows); + // We'll use the `numeric_limits::lowest()` value for this since it should always be outside the + // valid range. 
+ auto gather_map = rmm::device_uvector(gather_rows, stream); + thrust::uninitialized_fill(rmm::exec_policy(stream), + gather_map.begin(), + gather_map.end(), + std::numeric_limits::lowest()); // Convert scatter map to a gather map thrust::scatter( @@ -79,9 +86,46 @@ auto scatter_to_gather(MapIterator scatter_map_begin, return gather_map; } +/** + * @brief Create a complement map of `scatter_to_gather` map + * + * The purpose of this map is to create an identity-mapping for the rows that are not + * touched by the `scatter_map`. + * + * The output result of this mapping is firstly initialized as an identity-mapping + * (i.e., `output[i] = i`). Then, for each value `idx` from `scatter_map`, the value `output[idx]` + * is set to `numeric_limits::lowest()`, which is an invalid, out-of-bound index to identify the + * pass-through entries when calling the `gather_bitmask()` function. + * + */ +template +auto scatter_to_gather_complement(MapIterator scatter_map_begin, + MapIterator scatter_map_end, + size_type gather_rows, + rmm::cuda_stream_view stream) +{ + auto gather_map = rmm::device_uvector(gather_rows, stream); + thrust::sequence(rmm::exec_policy(stream), gather_map.begin(), gather_map.end(), 0); + + auto const out_of_bounds_begin = + thrust::make_constant_iterator(std::numeric_limits::lowest()); + auto const out_of_bounds_end = + out_of_bounds_begin + thrust::distance(scatter_map_begin, scatter_map_end); + thrust::scatter(rmm::exec_policy(stream), + out_of_bounds_begin, + out_of_bounds_end, + scatter_map_begin, + gather_map.begin()); + return gather_map; +} + template struct column_scatterer_impl { - std::unique_ptr operator()(...) const { CUDF_FAIL("Unsupported type for scatter."); } + template + std::unique_ptr operator()(Args&&...) 
const + { + CUDF_FAIL("Unsupported type for scatter."); + } }; template @@ -214,6 +258,89 @@ struct column_scatterer { } }; +template <> +struct column_scatterer_impl { + template + std::unique_ptr operator()(column_view const& source, + MapItRoot scatter_map_begin, + MapItRoot scatter_map_end, + column_view const& target, + rmm::cuda_stream_view stream, + rmm::mr::device_memory_resource* mr) const + { + CUDF_EXPECTS(source.num_children() == target.num_children(), + "Scatter source and target are not of the same type."); + + auto const scatter_map_size = std::distance(scatter_map_begin, scatter_map_end); + if (scatter_map_size == 0) { return std::make_unique(target, stream, mr); } + + structs_column_view const structs_src(source); + structs_column_view const structs_target(target); + std::vector> output_struct_members(structs_src.num_children()); + + std::transform(structs_src.child_begin(), + structs_src.child_end(), + structs_target.child_begin(), + output_struct_members.begin(), + [&scatter_map_begin, &scatter_map_end, stream, mr](auto const& source_col, + auto const& target_col) { + return type_dispatcher(source_col.type(), + column_scatterer{}, + source_col, + scatter_map_begin, + scatter_map_end, + target_col, + stream, + mr); + }); + + // We still need to call `gather_bitmask` even when the source's children are not nullable, + // as if the target's children have null_masks, those null_masks need to be updated after + // being scattered onto + auto const child_nullable = std::any_of(structs_src.child_begin(), + structs_src.child_end(), + [](auto const& col) { return col.nullable(); }) or + std::any_of(structs_target.child_begin(), + structs_target.child_end(), + [](auto const& col) { return col.nullable(); }); + if (child_nullable) { + auto const gather_map = + scatter_to_gather(scatter_map_begin, scatter_map_end, source.size(), stream); + gather_bitmask(cudf::table_view{std::vector{structs_src.child_begin(), + structs_src.child_end()}}, + 
gather_map.begin(), + output_struct_members, + gather_bitmask_op::PASSTHROUGH, + stream, + mr); + } + + // Need to put the result column in a vector to call `gather_bitmask` + std::vector> result; + result.emplace_back(cudf::make_structs_column(source.size(), + std::move(output_struct_members), + 0, + rmm::device_buffer{0, stream, mr}, + stream, + mr)); + + // Only gather bitmask from the target column for the rows that have not been scattered onto + // The bitmask from the source column will be gathered at the top level `scatter()` call + if (target.nullable()) { + auto const gather_map = + scatter_to_gather_complement(scatter_map_begin, scatter_map_end, target.size(), stream); + gather_bitmask(table_view{std::vector{target}}, + gather_map.begin(), + result, + gather_bitmask_op::PASSTHROUGH, + stream, + mr); + } + + return std::move(result.front()); + } +}; + /** * @brief Scatters the rows of the source table into a copy of the target table * according to a scatter map. @@ -278,10 +405,8 @@ std::unique_ptr
scatter( // Transform negative indices to index + target size auto updated_scatter_map_begin = thrust::make_transform_iterator(scatter_map_begin, index_converter{target.num_rows()}); - auto updated_scatter_map_end = thrust::make_transform_iterator(scatter_map_end, index_converter{target.num_rows()}); - auto result = std::vector>(target.num_columns()); std::transform(source.begin(), @@ -299,11 +424,16 @@ std::unique_ptr
scatter( mr); }); - auto gather_map = scatter_to_gather( - updated_scatter_map_begin, updated_scatter_map_end, target.num_rows(), stream); - - gather_bitmask(source, gather_map.begin(), result, gather_bitmask_op::PASSTHROUGH, stream, mr); - + // We still need to call `gather_bitmask` even when the source columns are not nullable, + // as if the target has null_mask, that null_mask needs to be updated after scattering + auto const nullable = + std::any_of(source.begin(), source.end(), [](auto const& col) { return col.nullable(); }) or + std::any_of(target.begin(), target.end(), [](auto const& col) { return col.nullable(); }); + if (nullable) { + auto const gather_map = scatter_to_gather( + updated_scatter_map_begin, updated_scatter_map_end, target.num_rows(), stream); + gather_bitmask(source, gather_map.begin(), result, gather_bitmask_op::PASSTHROUGH, stream, mr); + } return std::make_unique
(std::move(result)); } } // namespace detail diff --git a/cpp/include/cudf/detail/utilities/hash_functions.cuh b/cpp/include/cudf/detail/utilities/hash_functions.cuh index 31533a69487..7f3c05134e2 100644 --- a/cpp/include/cudf/detail/utilities/hash_functions.cuh +++ b/cpp/include/cudf/detail/utilities/hash_functions.cuh @@ -20,6 +20,7 @@ #include #include #include +#include #include using hash_value_type = uint32_t; @@ -231,6 +232,9 @@ MD5ListHasher::operator()(column_device_view data_col, } struct MD5Hash { + MD5Hash() = default; + constexpr MD5Hash(uint32_t seed) : m_seed(seed) {} + void __device__ finalize(md5_intermediate_data* hash_state, char* result_location) const { auto const full_length = (static_cast(hash_state->message_length)) << 3; @@ -302,6 +306,9 @@ struct MD5Hash { { md5_process(col.element(row_index), hash_state); } + + private: + uint32_t m_seed{cudf::DEFAULT_HASH_SEED}; }; template <> @@ -372,7 +379,7 @@ struct MurmurHash3_32 { using result_type = hash_value_type; MurmurHash3_32() = default; - CUDA_HOST_DEVICE_CALLABLE MurmurHash3_32(uint32_t seed) : m_seed(seed) {} + constexpr MurmurHash3_32(uint32_t seed) : m_seed(seed) {} CUDA_DEVICE_CALLABLE uint32_t rotl32(uint32_t x, int8_t r) const { @@ -469,7 +476,7 @@ struct MurmurHash3_32 { } private: - uint32_t m_seed{0}; + uint32_t m_seed{cudf::DEFAULT_HASH_SEED}; }; template <> @@ -542,13 +549,29 @@ hash_value_type CUDA_DEVICE_CALLABLE MurmurHash3_32::operator()(double c return this->compute_floating_point(key); } +template <> +hash_value_type CUDA_DEVICE_CALLABLE +MurmurHash3_32::operator()(cudf::list_view const& key) const +{ + cudf_assert(false && "List column hashing is not supported"); + return 0; +} + +template <> +hash_value_type CUDA_DEVICE_CALLABLE +MurmurHash3_32::operator()(cudf::struct_view const& key) const +{ + cudf_assert(false && "Direct hashing of struct_view is not supported"); + return 0; +} + template struct SparkMurmurHash3_32 { using argument_type = Key; using result_type = 
hash_value_type; SparkMurmurHash3_32() = default; - CUDA_HOST_DEVICE_CALLABLE SparkMurmurHash3_32(uint32_t seed) : m_seed(seed) {} + constexpr SparkMurmurHash3_32(uint32_t seed) : m_seed(seed) {} CUDA_DEVICE_CALLABLE uint32_t rotl32(uint32_t x, int8_t r) const { @@ -620,7 +643,7 @@ struct SparkMurmurHash3_32 { } private: - uint32_t m_seed{0}; + uint32_t m_seed{cudf::DEFAULT_HASH_SEED}; }; template <> @@ -671,6 +694,22 @@ SparkMurmurHash3_32::operator()(numeric::decimal64 const& ke return this->compute(key.value()); } +template <> +hash_value_type CUDA_DEVICE_CALLABLE +SparkMurmurHash3_32::operator()(cudf::list_view const& key) const +{ + cudf_assert(false && "List column hashing is not supported"); + return 0; +} + +template <> +hash_value_type CUDA_DEVICE_CALLABLE +SparkMurmurHash3_32::operator()(cudf::struct_view const& key) const +{ + cudf_assert(false && "Direct hashing of struct_view is not supported"); + return 0; +} + /** * @brief Specialization of MurmurHash3_32 operator for strings. */ @@ -740,6 +779,8 @@ SparkMurmurHash3_32::operator()(double const& key) const template struct IdentityHash { using result_type = hash_value_type; + IdentityHash() = default; + constexpr IdentityHash(uint32_t seed) : m_seed(seed) {} /** * @brief Combines two hash values into a new single hash value. 
Called @@ -752,7 +793,7 @@ struct IdentityHash { * * @returns A hash value that intelligently combines the lhs and rhs hash values */ - CUDA_HOST_DEVICE_CALLABLE result_type hash_combine(result_type lhs, result_type rhs) const + constexpr result_type hash_combine(result_type lhs, result_type rhs) const { result_type combined{lhs}; @@ -762,19 +803,22 @@ struct IdentityHash { } template - CUDA_HOST_DEVICE_CALLABLE std::enable_if_t::value, return_type> - operator()(Key const& key) const + constexpr std::enable_if_t::value, return_type> operator()( + Key const& key) const { cudf_assert(false && "IdentityHash does not support this data type"); return 0; } template - CUDA_HOST_DEVICE_CALLABLE std::enable_if_t::value, return_type> - operator()(Key const& key) const + constexpr std::enable_if_t::value, return_type> operator()( + Key const& key) const { return static_cast(key); } + + private: + uint32_t m_seed{cudf::DEFAULT_HASH_SEED}; }; template diff --git a/cpp/include/cudf/detail/utilities/vector_factories.hpp b/cpp/include/cudf/detail/utilities/vector_factories.hpp index 030d2c331c5..90e6a5c9643 100644 --- a/cpp/include/cudf/detail/utilities/vector_factories.hpp +++ b/cpp/include/cudf/detail/utilities/vector_factories.hpp @@ -14,6 +14,8 @@ * limitations under the License. */ +#pragma once + /** * @brief Convenience factories for creating device vectors from host spans * @file vector_factories.hpp @@ -231,6 +233,93 @@ rmm::device_uvector make_device_uvector_sync( return make_device_uvector_sync(device_span{c}, stream, mr); } +/** + * @brief Asynchronously construct a `std::vector` containing a copy of data from a + * `device_span` + * + * @note This function does not synchronize `stream`. 
+ * + * @tparam T The type of the data to copy + * @param v The device data to copy + * @param stream The stream on which to perform the copy + * @return The data copied to the host + */ +template +std::vector make_std_vector_async(device_span v, + rmm::cuda_stream_view stream = rmm::cuda_stream_default) +{ + std::vector result(v.size()); + CUDA_TRY(cudaMemcpyAsync( + result.data(), v.data(), v.size() * sizeof(T), cudaMemcpyDeviceToHost, stream.value())); + return result; +} + +/** + * @brief Asynchronously construct a `std::vector` containing a copy of data from a device + * container + * + * @note This function does not synchronize `stream`. + * + * @tparam Container The type of the container to copy from + * @tparam T The type of the data to copy + * @param c The input device container from which to copy + * @param stream The stream on which to perform the copy + * @return The data copied to the host + */ +template < + typename Container, + std::enable_if_t< + std::is_convertible>::value>* = + nullptr> +std::vector make_std_vector_async( + Container const& c, rmm::cuda_stream_view stream = rmm::cuda_stream_default) +{ + return make_std_vector_async(device_span{c}, stream); +} + +/** + * @brief Synchronously construct a `std::vector` containing a copy of data from a + * `device_span` + * + * @note This function synchronizes `stream`. + * + * @tparam T The type of the data to copy + * @param v The device data to copy + * @param stream The stream on which to perform the copy + * @return The data copied to the host + */ +template +std::vector make_std_vector_sync(device_span v, + rmm::cuda_stream_view stream = rmm::cuda_stream_default) +{ + auto result = make_std_vector_async(v, stream); + stream.synchronize(); + return result; +} + +/** + * @brief Synchronously construct a `std::vector` containing a copy of data from a device + * container + * + * @note This function synchronizes `stream`.
+ * + * @tparam Container The type of the container to copy from + * @tparam T The type of the data to copy + * @param c The input device container from which to copy + * @param stream The stream on which to perform the copy + * @return The data copied to the host + */ +template < + typename Container, + std::enable_if_t< + std::is_convertible>::value>* = + nullptr> +std::vector make_std_vector_sync( + Container const& c, rmm::cuda_stream_view stream = rmm::cuda_stream_default) +{ + return make_std_vector_sync(device_span{c}, stream); +} + } // namespace detail } // namespace cudf diff --git a/cpp/include/cudf/hashing.hpp b/cpp/include/cudf/hashing.hpp index 3f95b8b417b..0fb5002a953 100644 --- a/cpp/include/cudf/hashing.hpp +++ b/cpp/include/cudf/hashing.hpp @@ -39,7 +39,7 @@ std::unique_ptr hash( table_view const& input, hash_id hash_function = hash_id::HASH_MURMUR3, std::vector const& initial_hash = {}, - uint32_t seed = 0, + uint32_t seed = DEFAULT_HASH_SEED, rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()); /** @} */ // end of group diff --git a/cpp/include/cudf/join.hpp b/cpp/include/cudf/join.hpp index fcc0bcd444e..5a2c913d4c3 100644 --- a/cpp/include/cudf/join.hpp +++ b/cpp/include/cudf/join.hpp @@ -41,13 +41,14 @@ namespace cudf { * the matched row indices from the right table. * * @code{.pseudo} - * Left: {{0, 1, 2}} - * Right: {{1, 2, 3}} - * Result: {{1, 2}, {0, 1}} + * Left: {{0, 1, 2}} + * Right: {{1, 2, 3}} + * Result: {{1, 2}, {0, 1}} * - * Left: {{0, 1, 2}, {3, 4, 5}} - * Right: {{1, 2, 3}, {4, 6, 7}} - * Result: {{1}, {0}} + * Left: {{0, 1, 2}, {3, 4, 5}} + * Right: {{1, 2, 3}, {4, 6, 7}} + * Result: {{1}, {0}} + * @endcode * * @throw cudf::logic_error if number of elements in `left_keys` or `right_keys` * mismatch. @@ -77,10 +78,10 @@ inner_join(cudf::table_view const& left_keys, * in the columns being joined on match. 
* * @code{.pseudo} - * Left: {{0, 1, 2}} - * Right: {{4, 9, 3}, {1, 2, 5}} - * left_on: {0} - * right_on: {1} + * Left: {{0, 1, 2}} + * Right: {{4, 9, 3}, {1, 2, 5}} + * left_on: {0} + * right_on: {1} * Result: {{1, 2}, {4, 9}, {1, 2}} * @endcode * @@ -125,13 +126,14 @@ std::unique_ptr inner_join( * out-of-bounds value. * * @code{.pseudo} - * Left: {{0, 1, 2}} - * Right: {{1, 2, 3}} - * Result: {{0, 1, 2}, {None, 0, 1}} + * Left: {{0, 1, 2}} + * Right: {{1, 2, 3}} + * Result: {{0, 1, 2}, {None, 0, 1}} * - * Left: {{0, 1, 2}, {3, 4, 5}} - * Right: {{1, 2, 3}, {4, 6, 7}} - * Result: {{0, 1, 2}, {None, 0, None}} + * Left: {{0, 1, 2}, {3, 4, 5}} + * Right: {{1, 2, 3}, {4, 6, 7}} + * Result: {{0, 1, 2}, {None, 0, None}} + * @endcode * * @throw cudf::logic_error if number of elements in `left_keys` or `right_keys` * mismatch. @@ -163,16 +165,16 @@ left_join(cudf::table_view const& left_keys, * values in the left columns will be null. * * @code{.pseudo} - * Left: {{0, 1, 2}} - * Right: {{1, 2, 3}, {1, 2 ,5}} - * left_on: {0} - * right_on: {1} + * Left: {{0, 1, 2}} + * Right: {{1, 2, 3}, {1, 2 ,5}} + * left_on: {0} + * right_on: {1} * Result: { {0, 1, 2}, {NULL, 1, 2}, {NULL, 1, 2} } * - * Left: {{0, 1, 2}} - * Right {{1, 2, 3}, {1, 2, 5}} - * left_on: {0} - * right_on: {0} + * Left: {{0, 1, 2}} + * Right {{1, 2, 3}, {1, 2, 5}} + * left_on: {0} + * right_on: {0} * Result: { {0, 1, 2}, {NULL, 1, 2}, {NULL, 1, 2} } * @endcode * @@ -216,13 +218,14 @@ std::unique_ptr left_join( * representing a row from one table without a match in the other. 
* * @code{.pseudo} - * Left: {{0, 1, 2}} - * Right: {{1, 2, 3}} - * Result: {{0, 1, 2, None}, {None, 0, 1, 2}} + * Left: {{0, 1, 2}} + * Right: {{1, 2, 3}} + * Result: {{0, 1, 2, None}, {None, 0, 1, 2}} * - * Left: {{0, 1, 2}, {3, 4, 5}} - * Right: {{1, 2, 3}, {4, 6, 7}} - * Result: {{0, 1, 2, None, None}, {None, 0, None, 1, 2}} + * Left: {{0, 1, 2}, {3, 4, 5}} + * Right: {{1, 2, 3}, {4, 6, 7}} + * Result: {{0, 1, 2, None, None}, {None, 0, None, 1, 2}} + * @endcode * * @throw cudf::logic_error if number of elements in `left_keys` or `right_keys` * mismatch. @@ -254,16 +257,16 @@ full_join(cudf::table_view const& left_keys, * values in the left columns will be null. * * @code{.pseudo} - * Left: {{0, 1, 2}} - * Right: {{1, 2, 3}, {1, 2, 5}} - * left_on: {0} - * right_on: {1} + * Left: {{0, 1, 2}} + * Right: {{1, 2, 3}, {1, 2, 5}} + * left_on: {0} + * right_on: {1} * Result: { {0, 1, 2, NULL}, {NULL, 1, 2, 3}, {NULL, 1, 2, 5} } * - * Left: {{0, 1, 2}} - * Right: {{1, 2, 3}, {1, 2, 5}} - * left_on: {0} - * right_on: {0} + * Left: {{0, 1, 2}} + * Right: {{1, 2, 3}, {1, 2, 5}} + * left_on: {0} + * right_on: {0} * Result: { {0, 1, 2, NULL}, {NULL, 1, 2, 3}, {NULL, 1, 2, 5} } * @endcode * @@ -305,9 +308,9 @@ std::unique_ptr full_join( * for which there is a matching row in the right table. * * @code{.pseudo} - * TableA: {{0, 1, 2}} - * TableB: {{1, 2, 3}} - * right_on: {1} + * TableA: {{0, 1, 2}} + * TableB: {{1, 2, 3}} + * right_on: {1} * Result: {1, 2} * @endcode * @@ -338,16 +341,16 @@ std::unique_ptr> left_semi_join( * returns rows that exist in the right table. 
* * @code{.pseudo} - * TableA: {{0, 1, 2}} - * TableB: {{1, 2, 3}, {1, 2, 5}} - * left_on: {0} - * right_on: {1} + * TableA: {{0, 1, 2}} + * TableB: {{1, 2, 3}, {1, 2, 5}} + * left_on: {0} + * right_on: {1} * Result: { {1, 2} } * - * TableA {{0, 1, 2}, {1, 2, 5}} - * TableB {{1, 2, 3}} - * left_on: {0} - * right_on: {0} + * TableA {{0, 1, 2}, {1, 2, 5}} + * TableB {{1, 2, 3}} + * left_on: {0} + * right_on: {0} * Result: { {1, 2}, {2, 5} } * @endcode * @@ -386,8 +389,8 @@ std::unique_ptr left_semi_join( * for which there is no matching row in the right table. * * @code{.pseudo} - * TableA: {{0, 1, 2}} - * TableB: {{1, 2, 3}} + * TableA: {{0, 1, 2}} + * TableB: {{1, 2, 3}} * Result: {0} * @endcode * @@ -417,16 +420,16 @@ std::unique_ptr> left_anti_join( * returns rows that do not exist in the right table. * * @code{.pseudo} - * TableA: {{0, 1, 2}} - * TableB: {{1, 2, 3}, {1, 2, 5}} - * left_on: {0} - * right_on: {1} + * TableA: {{0, 1, 2}} + * TableB: {{1, 2, 3}, {1, 2, 5}} + * left_on: {0} + * right_on: {1} * Result: {{0}, {1}} * - * TableA: {{0, 1, 2}, {1, 2, 5}} - * TableB: {{1, 2, 3}} - * left_on: {0} - * right_on: {0} + * TableA: {{0, 1, 2}, {1, 2, 5}} + * TableB: {{1, 2, 3}} + * left_on: {0} + * right_on: {0} * Result: { {0} {1} } * @endcode * @@ -469,8 +472,8 @@ std::unique_ptr left_anti_join( * equal to `left.num_rows() * right.num_rows()`. Use with caution. 
* * @code{.pseudo} - * Left a: {0, 1, 2} - * Right b: {3, 4, 5} + * Left a: {0, 1, 2} + * Right b: {3, 4, 5} * Result: { a: {0, 0, 0, 1, 1, 1, 2, 2, 2}, b: {3, 4, 5, 3, 4, 5, 3, 4, 5} } * @endcode diff --git a/cpp/include/cudf/partitioning.hpp b/cpp/include/cudf/partitioning.hpp index ddde26ec762..6b1ad7db08b 100644 --- a/cpp/include/cudf/partitioning.hpp +++ b/cpp/include/cudf/partitioning.hpp @@ -83,6 +83,9 @@ std::pair, std::vector> partition( * @param input The table to partition * @param columns_to_hash Indices of input columns to hash * @param num_partitions The number of partitions to use + * @param hash_function Optional hash id that chooses the hash function to use + * @param seed Optional seed value to the hash function + * @param stream CUDA stream used for device memory operations and kernel launches * @param mr Device memory resource used to allocate the returned table's device memory. * * @returns An output table and a vector of row offsets to each partition @@ -92,6 +95,7 @@ std::pair, std::vector> hash_partition( std::vector const& columns_to_hash, int num_partitions, hash_id hash_function = hash_id::HASH_MURMUR3, + uint32_t seed = DEFAULT_HASH_SEED, rmm::cuda_stream_view stream = rmm::cuda_stream_default, rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()); diff --git a/cpp/include/cudf/scalar/scalar.hpp b/cpp/include/cudf/scalar/scalar.hpp index ded833f4ca0..745f88572b4 100644 --- a/cpp/include/cudf/scalar/scalar.hpp +++ b/cpp/include/cudf/scalar/scalar.hpp @@ -151,7 +151,7 @@ class fixed_width_scalar : public scalar { /** * @brief Implicit conversion operator to get the value of the scalar on the host */ - explicit operator value_type() const { return this->value(0); } + explicit operator value_type() const { return this->value(rmm::cuda_stream_default); } /** * @brief Get the value of the scalar @@ -449,7 +449,7 @@ class string_scalar : public scalar { /** * @brief Implicit conversion operator to get the value of the 
scalar in a host std::string */ - explicit operator std::string() const { return this->to_string(0); } + explicit operator std::string() const { return this->to_string(rmm::cuda_stream_default); } /** * @brief Get the value of the scalar in a host std::string diff --git a/cpp/include/cudf/strings/detail/json.hpp b/cpp/include/cudf/strings/detail/json.hpp new file mode 100644 index 00000000000..e6a0b49f102 --- /dev/null +++ b/cpp/include/cudf/strings/detail/json.hpp @@ -0,0 +1,40 @@ +/* + * Copyright (c) 2021, NVIDIA CORPORATION. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#pragma once + +#include + +#include + +namespace cudf { +namespace strings { +namespace detail { + +/** + * @copydoc cudf::strings::get_json_object + * + * @param stream CUDA stream used for device memory operations and kernel launches + */ +std::unique_ptr get_json_object( + cudf::strings_column_view const& col, + cudf::string_scalar const& json_path, + rmm::cuda_stream_view stream = rmm::cuda_stream_default, + rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()); + +} // namespace detail +} // namespace strings +} // namespace cudf diff --git a/cpp/include/cudf/strings/json.hpp b/cpp/include/cudf/strings/json.hpp new file mode 100644 index 00000000000..b39e4a2027c --- /dev/null +++ b/cpp/include/cudf/strings/json.hpp @@ -0,0 +1,50 @@ +/* + * Copyright (c) 2019-2021, NVIDIA CORPORATION. 
+ * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#pragma once + +#include + +namespace cudf { +namespace strings { + +/** + * @addtogroup strings_json + * @{ + * @file + */ + +/** + * @brief Apply a JSONPath string to all rows in an input strings column. + * + * Applies a JSONPath string to an incoming strings column where each row in the column + * is a valid json string. The output is returned by row as a strings column. + * + * https://tools.ietf.org/id/draft-goessner-dispatch-jsonpath-00.html + * Implements only the operators: $ . [] * + * + * @param col The input strings column. Each row must contain a valid json string + * @param json_path The JSONPath string to be applied to each row + * @param mr Resource for allocating device memory. 
+ * @return New strings column containing the retrieved json object strings
+ */
+std::unique_ptr get_json_object(
+  cudf::strings_column_view const& col,
+  cudf::string_scalar const& json_path,
+  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
+
+/** @} */ // end of doxygen group
+} // namespace strings
+} // namespace cudf
diff --git a/cpp/include/cudf/strings/strings_column_view.hpp b/cpp/include/cudf/strings/strings_column_view.hpp
index 0c7270b3ba8..4d3c2dcdc56 100644
--- a/cpp/include/cudf/strings/strings_column_view.hpp
+++ b/cpp/include/cudf/strings/strings_column_view.hpp
@@ -19,7 +19,7 @@
 #include
 #include
 
-#include
+#include
 
 /**
  * @file
@@ -86,23 +86,6 @@ class strings_column_view : private column_view {
 //! Strings column APIs.
 namespace strings {
 
-/**
- * @brief Prints the strings to stdout.
- *
- * @param strings Strings instance for this operation.
- * @param start Index of first string to print.
- * @param end Index of last string to print. Specify -1 for all strings.
- * @param max_width Maximum number of characters to print per string.
- *                  Specify -1 to print all characters.
- * @param delimiter The chars to print between each string.
- *                  Default is new-line character.
- */
-void print(strings_column_view const& strings,
-           size_type start = 0,
-           size_type end = -1,
-           size_type max_width = -1,
-           const char* delimiter = "\n");
-
 /**
  * @brief Create output per Arrow strings format.
  *
@@ -110,10 +93,10 @@ void print(strings_column_view const& strings,
  *
  * @param strings Strings instance for this operation.
  * @param stream CUDA stream used for device memory operations and kernel launches.
- * @param mr Device memory resource used to allocate the returned device_vectors.
+ * @param mr Device memory resource used to allocate the returned device vectors.
  * @return Pair containing a vector of chars and a vector of offsets.
  */
-std::pair, rmm::device_vector> create_offsets(
+std::pair, rmm::device_uvector> create_offsets(
   strings_column_view const& strings,
   rmm::cuda_stream_view stream = rmm::cuda_stream_default,
   rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
diff --git a/cpp/include/cudf/table/row_operators.cuh b/cpp/include/cudf/table/row_operators.cuh
index decd2879f54..61d714c5538 100644
--- a/cpp/include/cudf/table/row_operators.cuh
+++ b/cpp/include/cudf/table/row_operators.cuh
@@ -428,6 +428,7 @@ template