-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable hugepage for arrow host allocations #13914
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment.
@@ -30,7 +61,7 @@ std::unique_ptr<arrow::Buffer> allocate_arrow_buffer(int64_t const size, arrow:: | |||
*/ | |||
auto result = arrow::AllocateBuffer(size, ar_mr); | |||
CUDF_EXPECTS(result.ok(), "Failed to allocate Arrow buffer"); | |||
return std::move(result).ValueOrDie(); | |||
return enable_hugepage(std::move(result).ValueOrDie()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for std::move
here? I'm not sure you need that.
Generally return std::move(...)
is an antipattern. When you are returning a local variable, usually the copy will be elided anyway. If there is a move ctor, it can be used automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice, we use std::move
on the intermediate result
, not the value returned by ValueOrDie()
. This moves the value (std::unique_ptr<arrow::Buffer>
) out of result
.
I have added explicit types to make this more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I was assuming you just copied the std::move from the code that was there before, which just had return std::move
, which I think was unnecessary (and disables the compiler from eliding the move with RVO, IIUC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler does not allow return std::move(retval)
in libcudf because it disables RVO. But std::move(result).ValueOrDie()
is slightly different, as it does not move the return value, just casts result
to rvalue, presumably so that ValueOrDie
can move the value out of result
. The moved value is most likely then RVOd to the caller (not sure how this is actually called :)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few questions/suggestions
significantly, see <https://github.com/rapidsai/cudf/pull/13914>. | ||
*/ | ||
template <typename T> | ||
T enable_hugepage(T buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we limit the type here, since we don't want to ever copy T when calling the function?
T enable_hugepage(T buf) | |
T enable_hugepage(T&& buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires the use of return std::move(buf)
otherwise it fails:
/home/nfs/mkristensen/repos/cudf/cpp/src/interop/detail/arrow_allocator.cpp:36:12: error: use of deleted function 'std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = arrow::Buffer; _Dp = std::default_delete<arrow::Buffer>]'
36 | return buf;
| ^~~
In file included from /datasets/mkristensen/miniconda3/envs/cudf-0818/x86_64-conda-linux-gnu/include/c++/11.4.0/memory:76,
from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/array/array_base.h:22,
from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/array.h:41,
from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/api.h:22,
from /home/nfs/mkristensen/repos/cudf/cpp/include/cudf/detail/interop.hpp:26,
from /home/nfs/mkristensen/repos/cudf/cpp/src/interop/detail/arrow_allocator.cpp:17:
/datasets/mkristensen/miniconda3/envs/cudf-0818/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/unique_ptr.h:468:7: note: declared here
468 | unique_ptr(const unique_ptr&) = delete;
| ^~~~~~~~~~
/home/nfs/mkristensen/repos/cudf/cpp/src/interop/detail/arrow_allocator.cpp:42:33: error: use of deleted function 'std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = arrow::Buffer; _Dp = std::default_delete<arrow::Buffer>]'
42 | if (addr == nullptr) { return buf; }
| ^~~
In file included from /datasets/mkristensen/miniconda3/envs/cudf-0818/x86_64-conda-linux-gnu/include/c++/11.4.0/memory:76,
from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/array/array_base.h:22,
from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/array.h:41,
from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/api.h:22,
from /home/nfs/mkristensen/repos/cudf/cpp/include/cudf/detail/interop.hpp:26,
from /home/nfs/mkristensen/repos/cudf/cpp/src/interop/detail/arrow_allocator.cpp:17:
/datasets/mkristensen/miniconda3/envs/cudf-0818/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/unique_ptr.h:468:7: note: declared here
468 | unique_ptr(const unique_ptr&) = delete;
| ^~~~~~~~~~
/home/nfs/mkristensen/repos/cudf/cpp/src/interop/detail/arrow_allocator.cpp:50:10: error: use of deleted function 'std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = arrow::Buffer; _Dp = std::default_delete<arrow::Buffer>]'
50 | return buf;
| ^~~
In file included from /datasets/mkristensen/miniconda3/envs/cudf-0818/x86_64-conda-linux-gnu/include/c++/11.4.0/memory:76,
from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/array/array_base.h:22,
from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/array.h:41,
from /datasets/mkristensen/miniconda3/envs/cudf-0818/include/arrow/api.h:22,
from /home/nfs/mkristensen/repos/cudf/cpp/include/cudf/detail/interop.hpp:26,
from /home/nfs/mkristensen/repos/cudf/cpp/src/interop/detail/arrow_allocator.cpp:17:
/datasets/mkristensen/miniconda3/envs/cudf-0818/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/unique_ptr.h:468:7: note: declared here
468 | unique_ptr(const unique_ptr&) = delete;
| ^~~~~~~~~~
Fixes: #13864 This PR fixes an issue with `loc` indexer where some special handling needs to be done when `columns` is of type `MultiIndex`. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #13929
Co-authored-by: Vukasin Milovanovic <[email protected]>
Looks like RVO is not happy with the recent parameter type change to T&&? |
significantly, see <https://github.com/rapidsai/cudf/pull/13914>. | ||
*/ | ||
template <typename T> | ||
T enable_hugepage(T&& buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we have enable_hugepage just take a host_span? Do we really need to forward std::unique_ptr<arrow::Buffer>
through the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to but it means we can avoid an in-place function and have a simply return wrapper return enable_hugepage(...)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't see the benefit, and I'm fine with that.
/merge |
This PR enables Transparent Huge Pages (THP) for large (>4MB) arrow allocations (host memory only).
Performance results on a DGX-1 (
dgx14
)df.to_arrow()
(branch-23.10)distributed.protocol.serialize(df)
df.to_arrow()
(this PR)Notice, Dask-serialize also use THP, which is why its performance is on par with cudf-hugepage.
Checklist
cc. @pentschev (I think you did something similar in NumPy)