Batch memcpy the last offsets for output buffers of str and list cols in PQ reader #16905
Merged: rapids-bot merged 29 commits into rapidsai:branch-24.12 from mhaseeb123:fea-batch-memcpy-list-str-output-buff-offsets on Oct 3, 2024
Changes from 21 commits
74ee6ae Add capability to batch memcpy the last offsets to str and list out_bufs (mhaseeb123)
cab885d Move `WriteFinalOffsetsBatched` out of the for loop (mhaseeb123)
b15e3d3 Generalize the API and ORC changes by @vuule (mhaseeb123)
50dcd71 Use make_zeroed_device_uvector_async instead (mhaseeb123)
bd44ca0 Merge branch 'branch-24.12' into fea-batch-memcpy-list-str-output-buf… (mhaseeb123)
800b271 Add gtest for batched_memcpy (mhaseeb123)
31a755b Update cpp/include/cudf/io/detail/batched_memcpy.hpp (mhaseeb123)
b29329b Update cpp/include/cudf/io/detail/batched_memcpy.hpp (mhaseeb123)
4efb989 Comments update (mhaseeb123)
cc2829f Address reviewer comments (mhaseeb123)
78d68a8 Style fix (mhaseeb123)
d42da45 Remove the unnecessary iterator (mhaseeb123)
8d5640d Move batched_memxxx to include/detail/utilities (mhaseeb123)
9e063af Minor changes from reviews (mhaseeb123)
cf98118 Merge branch 'branch-24.12' into fea-batch-memcpy-list-str-output-buf… (mhaseeb123)
2372fbb Minor updates (mhaseeb123)
6100c94 Merge branch 'fea-batch-memcpy-list-str-output-buff-offsets' of https… (mhaseeb123)
4ea0930 Minor comment update (mhaseeb123)
3eea6e2 Minor comment update (mhaseeb123)
6d078c2 Style fix and add to CI. (mhaseeb123)
1cc4e1f Revert erroneous commit (mhaseeb123)
042cfc0 Update cpp/include/cudf/detail/utilities/batched_memcpy.hpp (mhaseeb123)
eee6f6d Apply suggestions from review (mhaseeb123)
828e0ac Minor updates from review (mhaseeb123)
ecc4252 Minor (mhaseeb123)
4bd83db Merge branch 'branch-24.12' into fea-batch-memcpy-list-str-output-buf… (mhaseeb123)
871854b Update cpp/src/io/parquet/page_data.cu (mhaseeb123)
3e30777 Comments update. (mhaseeb123)
16540a1 Merge branch 'branch-24.12' into fea-batch-memcpy-list-str-output-buf… (mhaseeb123)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
/* | ||
* Copyright (c) 2024, NVIDIA CORPORATION. | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
#include <cudf/detail/iterator.cuh> | ||
#include <cudf/utilities/memory_resource.hpp> | ||
|
||
#include <rmm/cuda_stream_view.hpp> | ||
#include <rmm/device_buffer.hpp> | ||
|
||
#include <cub/device/device_memcpy.cuh> | ||
#include <cuda/functional> | ||
#include <thrust/iterator/constant_iterator.h> | ||
|
||
namespace CUDF_EXPORT cudf { | ||
namespace detail { | ||
|
||
/** | ||
* @brief A helper function that copies a vector of vectors from source to destination addresses in | ||
* a batched manner. | ||
* | ||
* @tparam SrcIterator The type of the source address iterator | ||
* @tparam DstIterator The type of the destination address iterator | ||
* @tparam SizeIterator The type of the buffer size iterator | ||
* | ||
* @param[in] src_iter Iterator to source addresses | ||
* @param[in] dst_iter Iterator to destination addresses | ||
* @param[in] size_iter Iterator to the vector sizes (in bytes) | ||
* @param[in] num_buffs Number of buffers to be copied | ||
* @param[in] stream CUDA stream to use | ||
*/ | ||
template <typename SrcIterator, typename DstIterator, typename SizeIterator> | ||
void batched_memcpy_async(SrcIterator src_iter, | ||
DstIterator dst_iter, | ||
SizeIterator size_iter, | ||
size_t num_buffs, | ||
rmm::cuda_stream_view stream) | ||
{ | ||
// Get temp storage needed for cub::DeviceMemcpy::Batched | ||
size_t temp_storage_bytes = 0; | ||
cub::DeviceMemcpy::Batched( | ||
nullptr, temp_storage_bytes, src_iter, dst_iter, size_iter, num_buffs, stream.value()); | ||
|
||
// Allocate temporary storage | ||
rmm::device_buffer d_temp_storage{temp_storage_bytes, stream.value()}; | ||
|
||
// Perform copies | ||
cub::DeviceMemcpy::Batched(d_temp_storage.data(), | ||
temp_storage_bytes, | ||
src_iter, | ||
dst_iter, | ||
size_iter, | ||
num_buffs, | ||
stream.value()); | ||
} | ||
|
||
} // namespace detail | ||
} // namespace CUDF_EXPORT cudf |
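
For readers less familiar with `cub::DeviceMemcpy::Batched`, the semantics of the helper in this diff can be sketched on the host. The following is a CPU-only analog under stated assumptions, not the cuDF implementation: `batched_memcpy_host` and `write_final_offsets_example` are hypothetical names, and the buffer contents are invented for illustration. It mirrors the same iterator triple (source addresses, destination addresses, sizes in bytes) and applies it to the use case named in the PR title, writing one final offset into the last slot of each offsets buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-only stand-in for the batched copy: for each i in
// [0, num_buffs), copy size_iter[i] bytes from src_iter[i] to dst_iter[i].
// The helper in the diff issues all of these copies on the GPU instead.
template <typename SrcIterator, typename DstIterator, typename SizeIterator>
void batched_memcpy_host(SrcIterator src_iter,
                         DstIterator dst_iter,
                         SizeIterator size_iter,
                         std::size_t num_buffs)
{
  for (std::size_t i = 0; i < num_buffs; ++i) {
    std::memcpy(dst_iter[i], src_iter[i], size_iter[i]);
  }
}

// Hypothetical example: fill in the last offset of several offsets buffers
// with one batched call instead of one copy per buffer.
std::vector<std::vector<std::int32_t>> write_final_offsets_example()
{
  // Three offsets buffers whose last slot has not been written yet (made-up data).
  std::vector<std::vector<std::int32_t>> out_bufs = {{0, 2, 5, 0}, {0, 1, 0}, {0, 3, 4, 9, 0}};
  // One final offset per buffer (made-up data).
  std::vector<std::int32_t> final_offsets = {7, 4, 12};
  assert(final_offsets.size() == out_bufs.size());

  // Build the three parallel sequences the batched copy consumes.
  std::vector<void const*> srcs;
  std::vector<void*> dsts;
  std::vector<std::size_t> sizes;
  for (std::size_t i = 0; i < out_bufs.size(); ++i) {
    srcs.push_back(&final_offsets[i]);    // where the value comes from
    dsts.push_back(&out_bufs[i].back());  // last slot of the i-th offsets buffer
    sizes.push_back(sizeof(std::int32_t));
  }

  batched_memcpy_host(srcs.begin(), dsts.begin(), sizes.begin(), out_bufs.size());
  return out_bufs;
}
```

The device version in the diff differs in one visible way: it calls `cub::DeviceMemcpy::Batched` twice, first with a null temporary-storage pointer to query `temp_storage_bytes`, then again with an `rmm::device_buffer` of that size to perform the copies on the given stream.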
This is simply a reimplementation, as is, of one of the benchmarks in `parquet_reader_input.cpp`, so I am removing it altogether.