Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable hugepage for arrow host allocations #13914

Merged
merged 8 commits into from
Aug 24, 2023

Conversation

madsbk
Copy link
Member

@madsbk madsbk commented Aug 18, 2023

This PR enables Transparent Huge Pages (THP) for large (>4MB) arrow allocations (host memory only).

Performance results on a DGX-1 (dgx14)

8MB 80MB 800MB 8GB Method
cudf-native 0.006 0.049 0.485 4.787 df.to_arrow() (branch-23.10)
Dask-serialize 0.004 0.032 0.310 3.122 distributed.protocol.serialize(df)
cudf-hugepage 0.004 0.030 0.299 3.046 df.to_arrow() (this PR)
speedup 1.5 1.63 1.62 1.57 cudf-native vs. cudf-hugepage

Notice, Dask-serialize also use THP, which is why its performance is on par with cudf-hugepage.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

cc. @pentschev (I think you did something similar in NumPy)

@madsbk madsbk added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 18, 2023
@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. labels Aug 18, 2023
@github-actions github-actions bot removed the Python Affects Python cuDF API. label Aug 18, 2023
@madsbk madsbk marked this pull request as ready for review August 18, 2023 11:45
@madsbk madsbk requested a review from a team as a code owner August 18, 2023 11:45
@madsbk madsbk requested review from harrism and vuule August 18, 2023 11:45
Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment.

@@ -30,7 +61,7 @@ std::unique_ptr<arrow::Buffer> allocate_arrow_buffer(int64_t const size, arrow::
*/
auto result = arrow::AllocateBuffer(size, ar_mr);
CUDF_EXPECTS(result.ok(), "Failed to allocate Arrow buffer");
return std::move(result).ValueOrDie();
return enable_hugepage(std::move(result).ValueOrDie());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for std::move here? I'm not sure you need that.

Generally return std::move(...) is an antipattern. When you are returning a local variable, usually the copy will be elided anyway. If there is a move ctor, it can be used automatically.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice, we use std::move on the intermediate result, not the value returned by ValueOrDie(). This moves the value (std::unique_ptr<arrow::Buffer>) out of result .
I have added explicit types to make this more clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I was assuming you just copied the std::move from the code that was there before, which just had return std::move, which I think was unnecessary (and disables the compiler from eliding the move with RVO, IIUC).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler does not allow return std::move(retval) in libcudf because it disables RVO. But std::move(result).ValueOrDie() is slightly different, as it does not move the return value, just casts result to rvalue, presumably so that ValueOrDie can move the value out of result. The moved value is most likely then RVOd to the caller (not sure how this is actually called :)).

Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few questions/suggestions

cpp/src/interop/detail/arrow_allocator.cpp Outdated Show resolved Hide resolved
significantly, see <https://github.com/rapidsai/cudf/pull/13914>.
*/
template <typename T>
T enable_hugepage(T buf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we limit the type here, since we don't want to ever copy T when calling the function?

Suggested change
T enable_hugepage(T buf)
T enable_hugepage(T&& buf)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires the use of return std::move(buf) otherwise it fails:

/home/nfs/mkristensen/repos/cudf/cpp/src/interop/detail/arrow_allocator.cpp:36:12: error: use of deleted function 'std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = arrow::Buffer; _Dp = std::default_delete<arrow::Buffer>]'
   36 |     return buf;
      |            ^~~
In file included from /datasets/mkristensen/miniconda3/envs/cudf-0818/x86_64-conda-linux-gnu/include/c++/11.4.0/memory:76,
                 from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/array/array_base.h:22,
                 from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/array.h:41,
                 from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/api.h:22,
                 from /home/nfs/mkristensen/repos/cudf/cpp/include/cudf/detail/interop.hpp:26,
                 from /home/nfs/mkristensen/repos/cudf/cpp/src/interop/detail/arrow_allocator.cpp:17:
/datasets/mkristensen/miniconda3/envs/cudf-0818/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/unique_ptr.h:468:7: note: declared here
  468 |       unique_ptr(const unique_ptr&) = delete;
      |       ^~~~~~~~~~
/home/nfs/mkristensen/repos/cudf/cpp/src/interop/detail/arrow_allocator.cpp:42:33: error: use of deleted function 'std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = arrow::Buffer; _Dp = std::default_delete<arrow::Buffer>]'
   42 |   if (addr == nullptr) { return buf; }
      |                                 ^~~
In file included from /datasets/mkristensen/miniconda3/envs/cudf-0818/x86_64-conda-linux-gnu/include/c++/11.4.0/memory:76,
                 from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/array/array_base.h:22,
                 from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/array.h:41,
                 from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/api.h:22,
                 from /home/nfs/mkristensen/repos/cudf/cpp/include/cudf/detail/interop.hpp:26,
                 from /home/nfs/mkristensen/repos/cudf/cpp/src/interop/detail/arrow_allocator.cpp:17:
/datasets/mkristensen/miniconda3/envs/cudf-0818/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/unique_ptr.h:468:7: note: declared here
  468 |       unique_ptr(const unique_ptr&) = delete;
      |       ^~~~~~~~~~
/home/nfs/mkristensen/repos/cudf/cpp/src/interop/detail/arrow_allocator.cpp:50:10: error: use of deleted function 'std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = arrow::Buffer; _Dp = std::default_delete<arrow::Buffer>]'
   50 |   return buf;
      |          ^~~
In file included from /datasets/mkristensen/miniconda3/envs/cudf-0818/x86_64-conda-linux-gnu/include/c++/11.4.0/memory:76,
                 from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/array/array_base.h:22,
                 from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/array.h:41,
                 from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/api.h:22,
                 from /home/nfs/mkristensen/repos/cudf/cpp/include/cudf/detail/interop.hpp:26,
                 from /home/nfs/mkristensen/repos/cudf/cpp/src/interop/detail/arrow_allocator.cpp:17:
/datasets/mkristensen/miniconda3/envs/cudf-0818/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/unique_ptr.h:468:7: note: declared here
  468 |       unique_ptr(const unique_ptr&) = delete;
      |       ^~~~~~~~~~

cpp/src/interop/detail/arrow_allocator.cpp Show resolved Hide resolved
zhuoxunyi referenced this pull request Aug 23, 2023
Fixes: #13864 

This PR fixes an issue with `loc` indexer where some special handling needs to be done when `columns` is of type `MultiIndex`.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #13929
@vuule
Copy link
Contributor

vuule commented Aug 23, 2023

Looks like RVO is not happy with the recent parameter type change to T&&?

significantly, see <https://github.com/rapidsai/cudf/pull/13914>.
*/
template <typename T>
T enable_hugepage(T&& buf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have enable_hugepage just take a host_span? Do we really need to forward std::unique_ptr<arrow::Buffer> through the function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to but it means we can avoid an in-place function and have a simply return wrapper return enable_hugepage(...).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't see the benefit, and I'm fine with that.

@madsbk
Copy link
Member Author

madsbk commented Aug 24, 2023

/merge

@rapids-bot rapids-bot bot merged commit f70f2cd into rapidsai:branch-23.10 Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants