Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the hashing detail default allocator #14827

Closed

Conversation

PointKernel
Copy link
Member

@PointKernel PointKernel commented Jan 22, 2024

Description

Related to #11176

This PR removes the custom default_allocator class located in the global namespace since it's the same as rmm::mr::polymorphic_allocator. It also moves all free functions in hashing/detail/helper_functions.cuh into cudf::hashing::detail namespace.

There are no actual code changes but using cudf::hashing::detail::default_allocator as an alias of rmm::mr::polymorphic_allocator<char> to replace the previous default_allocator class.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@PointKernel PointKernel added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. tech debt improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 22, 2024
@PointKernel PointKernel self-assigned this Jan 22, 2024
@PointKernel PointKernel added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Jan 23, 2024
@PointKernel PointKernel marked this pull request as ready for review January 23, 2024 17:04
@PointKernel PointKernel requested a review from a team as a code owner January 23, 2024 17:04
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're here can we look into #11176? We don't have to address it in this PR, but the changes are related since this is all around the refactoring effort to use the new cuco data structures and removing the default_allocator is a step towards #11176. CC @ttnghia

The new namespace is a bit verbose to type everywhere. Not sure there's a great solution for that though, it does seem like the right place for the functionality.

@PointKernel
Copy link
Member Author

While we're here can we look into #11176? We don't have to address it in this PR, but the changes are related since this is all around the refactoring effort to use the new cuco data structures and removing the default_allocator is a step towards #11176. CC @ttnghia

Thanks for bringing this up.

I have that particular PR in my mind when working on this and feel cuco_helpers.cuh is a too-specific name while hashing/detail/helper_functions.cuh can provide more generic descriptions of what's contained in the file. Several things I'm planning to put in this file: size type sentinel, pair type alias, hash adaptor, and desired occupancy (in the file already but seems not widely used yet), etc. Happy to hear your and other reviewers' thoughts on this.

@ttnghia
Copy link
Contributor

ttnghia commented Jan 24, 2024

I have that particular PR in my mind when working on this and feel cuco_helpers.cuh is a too-specific name while hashing/detail/helper_functions.cuh can provide more generic descriptions of what's contained in the file.

I see that using hashing::detail:: is fine, as long as it contains the hashing-related functions. If we call cuco_helpers then what is the namespace? cudf::cuco_helpers::?

@ttnghia
Copy link
Contributor

ttnghia commented Jan 24, 2024

In addition, calling helper_functions.cuh is vague if we mostly put in the cuco stuff there. So maybe cuco_adapter.cuh, but still use namespace cudf::hashing::detail in it?

And I think we should add CUCO_ prefix to everything use exclusively for cuco objects. For example, instead of DEFAULT_HASH_TABLE_OCCUPANCY, call it CUCO_DEFAULT_HASH_TABLE_OCCUPANCY, and cuco_default_allocator instead of default_allocator. This can explicitly tell what they are, and prevent accidentally using it somewhere else.

mroeschke and others added 7 commits January 24, 2024 17:54
…sai#14758)

For methods that essentially do

```python
def select_by_foo(self, ...):
    ...
    return self.__class__(data={subset of self._data})
```

The `return` would perform validation on the returned subset of column, but I think that's unnecessary since that was done during initialization

Additionally
* Removed `_create_unsafe` in favor of a `verify=True|False` keyword in the constructor
* `_column_length` == `nrows` so removed `_column_length`
* Renamed `_compare_keys` to `_keys_equal`
* Remove seldom used/unnecessary methods

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#14758
…et (rapidsai#14833)

This PR uses rapids-cmake to handle per-target CMake linking to cudart.

Replaces rapidsai#13543 and rapidsai#11641.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Robert Maynard (https://github.com/robertmaynard)

URL: rapidsai#14833
)

Follow-up PR to rapidsai#14771.

I noticed the strings example code still had a deprecated function call:

```
-- Build files have been written to: /opt/conda/conda-bld/work/cpp/examples/strings/build
[1/8] Building CXX object CMakeFiles/libcudf_apis.dir/libcudf_apis.cpp.o
[2/8] Linking CXX executable libcudf_apis
[3/8] Building CUDA object CMakeFiles/custom_prealloc.dir/custom_prealloc.cu.o
[4/8] Building CUDA object CMakeFiles/custom_with_malloc.dir/custom_with_malloc.cu.o
[5/8] Linking CUDA executable custom_prealloc
[6/8] Linking CUDA executable custom_with_malloc
[7/8] Building CUDA object CMakeFiles/custom_optimized.dir/custom_optimized.cu.o
/opt/conda/conda-bld/work/cpp/examples/strings/custom_optimized.cu: In function 'std::unique_ptr<cudf::column> redact_strings(const cudf::column_view&, const cudf::column_view&)':
/opt/conda/conda-bld/work/cpp/examples/strings/custom_optimized.cu:158:40: warning: 'std::unique_ptr<cudf::column> cudf::make_strings_column(cudf::size_type, rmm::device_uvector<int>&&, rmm::device_uvector<char>&&, rmm::device_buffer&&, cudf::size_type)' is deprecated [-Wdeprecated-declarations]
  158 |   auto result =
      |               ~                        ^                                                          
/opt/conda/conda-bld/work/cpp/include/cudf/column/column_factories.hpp:510:42: note: declared here
  510 | [[deprecated]] std::unique_ptr<column> make_strings_column(size_type num_strings,
      |                                          ^~~~~~~~~~~~~~~~~~~
[8/8] Linking CUDA executable custom_optimized
```

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Mark Harris (https://github.com/harrism)

URL: rapidsai#14838
…14832)

Part of rapidsai/rmm#1388. This removes now-optional and soon-to-be deprecated functions from cuDF's custom device_memory_resource implementations:
 * `supports_get_mem_info()`
 * `do_get_mem_info()`

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Jason Lowe (https://github.com/jlowe)
  - Nghia Truong (https://github.com/ttnghia)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: rapidsai#14832
The goal of this PR is to address [10004](rapidsai#10004) by supporting parsing of JSON files containing single quotes for field/value strings. This is a follow-up work to the POC [PR 14545](rapidsai#14545)

Authors:
  - Shruti Shivakumar (https://github.com/shrshi)

Approvers:
  - Andy Grove (https://github.com/andygrove)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Elias Stehle (https://github.com/elstehle)
  - Robert (Bobby) Evans (https://github.com/revans2)

URL: rapidsai#14729
Fixes some merge issues with outdated versions from rapidsai#14768. I also made a minor tweak to `update-version.sh` that double-quotes some outputs to make pre-commit happier.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)

URL: rapidsai#14854
vuule and others added 2 commits January 24, 2024 17:54
…apidsai#14785)

Adds the APIs that control the stripe/row group size when using the chunked writer. This functions are already present in to_orc (non-chunked version of the same API).

Adding this options to facilitate smaller unit tests.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#14785
@PointKernel PointKernel requested review from a team as code owners January 25, 2024 01:54
@PointKernel PointKernel requested a review from mroeschke January 25, 2024 01:54
@github-actions github-actions bot added Python Affects Python cuDF API. CMake CMake build issue Java Affects Java cuDF API. ci labels Jan 25, 2024
@PointKernel
Copy link
Member Author

Oh sorry for messing up the git log. Will open another PR to address this work.

@PointKernel PointKernel deleted the remove-default-allocator branch January 25, 2024 01:57
@harrism
Copy link
Member

harrism commented Jan 25, 2024

I look forward to it. This is the only place that libcudf does "raw" allocation using mr->allocate / deallocate. Removing it helps with our move to async_resource_ref.

@PointKernel PointKernel mentioned this pull request Jan 25, 2024
3 tasks
rapids-bot bot pushed a commit that referenced this pull request Jan 30, 2024
Surpass #14827

Related to #11176

This PR adds a new `cudf::detail::cuco_allocator` to deprecate and replace the old `default_allocator` in the global namespace. Following the comments in #14827 (review), the new `cudf::detail::cuco_allocator` class is moved to `detail/cuco_helpers.cuh`. Free functions in `hashing/detail/helper_functions.cuh` are left in the global namespace without changes due to the verbose nested namespace expression.

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #14877
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants