Enable hugepage for arrow host allocations #13914

Merged 8 commits on Aug 24, 2023
Changes from 6 commits
39 changes: 35 additions & 4 deletions cpp/src/interop/detail/arrow_allocator.cpp
@@ -16,9 +16,40 @@

#include <cudf/detail/interop.hpp>

#include <memory>
#include <sys/mman.h>
#include <unistd.h>

namespace cudf {
namespace detail {

/*
Enable Transparent Huge Pages (THP) for large (>4MB) allocations.
`buf` is returned untouched.
Enabling THP can improve performance of device-host memory transfers
significantly, see <https://github.com/rapidsai/cudf/pull/13914>.
*/
template <typename T>
T enable_hugepage(T&& buf)
Contributor

Could we have enable_hugepage just take a host_span? Do we really need to forward std::unique_ptr<arrow::Buffer> through the function?

Member Author

We don't need to, but it means we can avoid an in-place function and have a simple wrapper: return enable_hugepage(...).

Contributor

I still don't see the benefit, and I'm fine with that.
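
For readers weighing the two shapes discussed in this thread, here is a minimal sketch. It is not the PR's code: `fake_buffer`, `advise_in_place`, `advise_and_return`, `allocate_a`, and `allocate_b` are hypothetical names, the "advice" is only a flag rather than a real `madvise` call, and the in-place variant takes a plain reference rather than a `host_span`. The pass-through variant takes the smart pointer by value, which is an assumption of this sketch and differs from the forwarding reference in the diff.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stand-in for arrow::Buffer; `advised` records that the hint ran.
struct fake_buffer {
  std::size_t bytes;
  bool advised = false;
  std::size_t size() const { return bytes; }
};

constexpr std::size_t hugepage_threshold = std::size_t{1} << 22;  // 4 MiB

// Shape A: in-place helper that only needs to see the buffer.
inline void advise_in_place(fake_buffer& buf)
{
  if (buf.size() >= hugepage_threshold) { buf.advised = true; }
}

// Shape B: pass-through helper that takes ownership and hands it back,
// so it works for both unique_ptr and shared_ptr.
template <typename Ptr>
Ptr advise_and_return(Ptr buf)
{
  advise_in_place(*buf);
  return buf;  // by-value parameter, so it is moved out implicitly
}

// With shape A the caller needs a named local before returning.
inline std::unique_ptr<fake_buffer> allocate_a(std::size_t size)
{
  auto buf = std::make_unique<fake_buffer>(fake_buffer{size});
  advise_in_place(*buf);
  return buf;
}

// With shape B the caller can stay a single return statement.
inline std::unique_ptr<fake_buffer> allocate_b(std::size_t size)
{
  return advise_and_return(std::make_unique<fake_buffer>(fake_buffer{size}));
}
```

Both callers end up with the same result; the difference is only whether the call site needs a named temporary.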

{
  if (buf->size() < (1u << 22u)) {  // Smaller than 4 MB
    return buf;
  }

#ifdef MADV_HUGEPAGE
  const auto pagesize = sysconf(_SC_PAGESIZE);
  void* addr = const_cast<uint8_t*>(buf->data());
  if (addr == nullptr) { return buf; }
  auto length{static_cast<std::size_t>(buf->size())};
  if (std::align(pagesize, pagesize, addr, length)) {
    // Intentionally not checking for errors that may be returned by older kernel versions;
    // optimistically tries enabling huge pages.
    madvise(addr, length, MADV_HUGEPAGE);
  }
#endif
  return buf;
}
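
The alignment step in the function above can be tried in isolation. The following is a minimal sketch on a raw pointer and size, assuming a Linux/POSIX system; `aligned_range`, `page_aligned_tail`, and `try_advise_hugepage` are hypothetical names that do not appear in the PR. It shows how `std::align` rounds the start address up to a page boundary and shrinks the length accordingly before the range is passed to `madvise`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical result type: the page-aligned tail of a buffer.
struct aligned_range {
  void* addr;          // first page-aligned address inside the buffer, or nullptr
  std::size_t length;  // bytes from addr to the end of the buffer
};

// Find the page-aligned part of [data, data + size).
inline aligned_range page_aligned_tail(void* data, std::size_t size, std::size_t pagesize)
{
  // std::align requires a power-of-two alignment.
  assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
  void* addr        = data;
  std::size_t space = size;
  // std::align rounds addr up to the next multiple of pagesize and reduces
  // space by the skipped bytes. It returns nullptr, leaving both untouched,
  // when fewer than pagesize bytes would remain after aligning.
  if (std::align(pagesize, pagesize, addr, space) == nullptr) { return {nullptr, 0}; }
  return {addr, space};
}

// Hypothetical wrapper: returns true if madvise was attempted, not whether it succeeded.
inline bool try_advise_hugepage(void* data, std::size_t size)
{
  constexpr std::size_t threshold = std::size_t{1} << 22;  // same 4 MiB cutoff as the diff
  if (data == nullptr || size < threshold) { return false; }
#ifdef MADV_HUGEPAGE
  auto const pagesize = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  auto const range    = page_aligned_tail(data, size, pagesize);
  if (range.addr == nullptr) { return false; }
  // Best effort, as in the diff: the return value is deliberately ignored.
  (void)madvise(range.addr, range.length, MADV_HUGEPAGE);
  return true;
#else
  return false;
#endif
}
```

Returning whether the call was attempted, rather than whether it succeeded, keeps the sketch in line with the diff's choice not to check the `madvise` result.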

std::unique_ptr<arrow::Buffer> allocate_arrow_buffer(int64_t const size, arrow::MemoryPool* ar_mr)
{
/*
@@ -28,9 +59,9 @@ std::unique_ptr<arrow::Buffer> allocate_arrow_buffer(int64_t const size, arrow::
To work around this issue we compile an allocation shim in C++ and use
that from our cuda sources
*/
-  auto result = arrow::AllocateBuffer(size, ar_mr);
+  arrow::Result<std::unique_ptr<arrow::Buffer>> result = arrow::AllocateBuffer(size, ar_mr);
CUDF_EXPECTS(result.ok(), "Failed to allocate Arrow buffer");
-  return std::move(result).ValueOrDie();
+  return enable_hugepage(std::move(result).ValueOrDie());
Member

What is the reason for std::move here? I'm not sure you need that.

Generally return std::move(...) is an antipattern. When you are returning a local variable, usually the copy will be elided anyway. If there is a move ctor, it can be used automatically.

Member Author

Note that we apply std::move to the intermediate result, not to the value returned by ValueOrDie(). This moves the value (std::unique_ptr<arrow::Buffer>) out of result.
I have added explicit types to make this clearer.

Member

Right, I was assuming you had just copied the std::move from the code that was there before, which just had return std::move. I think that was unnecessary (and it prevents the compiler from eliding the move with RVO, IIUC).

Contributor

The compiler does not allow return std::move(retval) in libcudf because it disables RVO. But std::move(result).ValueOrDie() is slightly different: it does not move the return value, it just casts result to an rvalue, presumably so that ValueOrDie can move the value out of result. The moved value is then most likely RVO'd to the caller (not sure what this is actually called :)).
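
The distinction drawn in this thread can be illustrated without Arrow. The sketch below uses a hypothetical `result_like<T>` class, modeled only loosely on `arrow::Result<T>`, with a hypothetical `value_or_die()` member that has separate overloads for lvalues and rvalues. It is an illustration under those assumptions, not Arrow's implementation.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for arrow::Result<T>, holding only the success value.
template <typename T>
class result_like {
 public:
  explicit result_like(T value) : value_(std::move(value)) {}

  // Called on an lvalue: lend out a reference, the value stays inside.
  const T& value_or_die() const& { return value_; }

  // Called on an rvalue: the wrapper is about to go away, so move the value out.
  T value_or_die() && { return std::move(value_); }

 private:
  T value_;
};

inline std::unique_ptr<int> make_value(int v)
{
  result_like<std::unique_ptr<int>> result{std::make_unique<int>(v)};
  // std::move(result) is only a cast to an rvalue; it selects the && overload.
  // The returned expression is the prvalue produced by value_or_die(), not a
  // moved-from local, so this is not the `return std::move(local);` pattern.
  // Without the cast, the const& overload would be chosen and the move-only
  // unique_ptr could not be copied into the return value.
  return std::move(result).value_or_die();
}
```

Because `std::unique_ptr` is move-only, dropping the `std::move` here would fail to compile rather than silently copy.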

}

std::shared_ptr<arrow::Buffer> allocate_arrow_bitmap(int64_t const size, arrow::MemoryPool* ar_mr)
@@ -42,9 +73,9 @@ std::shared_ptr<arrow::Buffer> allocate_arrow_bitmap(int64_t const size, arrow::
To work around this issue we compile an allocation shim in C++ and use
that from our cuda sources
*/
-  auto result = arrow::AllocateBitmap(size, ar_mr);
+  arrow::Result<std::shared_ptr<arrow::Buffer>> result = arrow::AllocateBitmap(size, ar_mr);
CUDF_EXPECTS(result.ok(), "Failed to allocate Arrow bitmap");
-  return std::move(result).ValueOrDie();
+  return enable_hugepage(std::move(result).ValueOrDie());
}

} // namespace detail