GH-33317: [C++] Utility method to ensure an array object meets an alignment requirement #14758
This is a good start but we need to be more thorough.
In most cases the alignment will already be correct. We want this method to be really fast in that scenario. So this means that methods should look like...
    std::shared_ptr<Array> Foo(std::shared_ptr<Array> thing) {
      if (needs_alignment(...)) {
        // a copy needs to be made
      } else {
        return std::move(thing);
      }
    }

This way we can call it like so...

    std::shared_ptr<Array> aligned = EnsureAlignment(std::move(unaligned), ...);

Then, if `unaligned` is already aligned it will be a straight move into `aligned` without ever making any copies (not even copies of `shared_ptr`).
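To make the pattern above concrete, here is a minimal, self-contained sketch. It does not use Arrow types: `ToyBuffer`, `MakeToyBuffer`, and `IsAligned` are hypothetical stand-ins invented for illustration, and this `EnsureAlignment` only mirrors the shape suggested in the review, not the signature that was eventually merged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffer: `size` bytes starting at
// storage.data() + offset. Not an Arrow type.
struct ToyBuffer {
  std::vector<std::uint8_t> storage;
  std::size_t offset = 0;
  std::size_t size = 0;
  const std::uint8_t* data() const { return storage.data() + offset; }
};

// True if the first data byte sits on an `alignment`-byte boundary.
bool IsAligned(const ToyBuffer& buf, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(buf.data()) % alignment == 0;
}

// Builds a buffer whose data pointer is `skew` bytes past an aligned address.
// skew == 0 yields an aligned buffer; skew == 1 yields a misaligned one.
std::shared_ptr<ToyBuffer> MakeToyBuffer(std::size_t size, std::size_t alignment,
                                         std::size_t skew) {
  auto buf = std::make_shared<ToyBuffer>();
  buf->storage.resize(size + alignment + skew);
  const auto addr = reinterpret_cast<std::uintptr_t>(buf->storage.data());
  buf->offset = (alignment - addr % alignment) % alignment + skew;
  buf->size = size;
  return buf;
}

// Takes the pointer by value so the caller can std::move into it.
std::shared_ptr<ToyBuffer> EnsureAlignment(std::shared_ptr<ToyBuffer> buf,
                                           std::size_t alignment) {
  if (!IsAligned(*buf, alignment)) {
    // Slow path: allocate aligned storage and copy the bytes over.
    auto aligned = MakeToyBuffer(buf->size, alignment, /*skew=*/0);
    std::memcpy(aligned->storage.data() + aligned->offset, buf->data(), buf->size);
    return aligned;
  }
  // Fast path: hand the same pointer back without touching the refcount.
  return std::move(buf);
}
```

Called as `EnsureAlignment(std::move(unaligned), 64)`, the fast path returns the same object and leaves the caller's original pointer empty, so no reference-count traffic occurs. Passing the pointer without `std::move` still works but costs one `shared_ptr` copy.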
The current implementation might be a bit complicated, so comments and suggestions for optimized approaches will be very helpful. Thanks in advance! Currently, we have
The approach looks ok. I have a few suggestions.
cpp/src/arrow/util/align_util.cc (outdated)

    bool CheckAlignment(const ChunkedArray& array, const int64_t& alignment,
                        std::vector<bool>& needs_alignment, const int& offset) {
      needs_alignment.resize(needs_alignment.size() + array.num_chunks(), false);
Why `needs_alignment.size() + array.num_chunks()` and not `offset + array.num_chunks()`?
The resize adds space for the extra elements that the check needs.
For example, suppose a table is made up of 2 chunked arrays, each of which consists of 2 arrays. Initially `needs_alignment` has size 2 (one entry per chunked array). When `CheckAlignment()` is called for the first chunked array, `needs_alignment` is resized to 4 (`/*needs_alignment.size()*/ 2 + /*array.num_chunks()*/ 2`). Now the first two bits of `needs_alignment` indicate the status of the 2 arrays of the first chunked array, the third is for that chunked array as a whole, and the last is for the other chunked array, which is yet to be checked. The same operation is repeated for the other chunked array.
If we used `offset` instead, then for the first iteration `offset` would be 0, so the resize would set the size to 2. That is not enough to store the alignment check bits of the two arrays of the first chunked array plus the second chunked array.
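The sizing arithmetic in this reply can be checked in isolation. The sketch below only reproduces the two candidate `resize` expressions on a plain `std::vector<bool>`. `GrowBySize` and `GrowByOffset` are hypothetical helper names invented here, and nothing else about `CheckAlignment` is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the resize as written in the PR, growing the flag
// vector by one slot per chunk on top of what is already there.
std::size_t GrowBySize(std::vector<bool>& needs_alignment, int num_chunks) {
  needs_alignment.resize(needs_alignment.size() + num_chunks, false);
  return needs_alignment.size();
}

// Hypothetical helper: the alternative raised in review, sizing the vector
// from the current offset instead of its current size.
std::size_t GrowByOffset(std::vector<bool>& needs_alignment, int offset,
                         int num_chunks) {
  needs_alignment.resize(offset + num_chunks, false);
  return needs_alignment.size();
}
```

Starting from a vector of size 2 (one slot per chunked array) and a first chunked array with 2 chunks, `GrowBySize` yields 4 slots, while `GrowByOffset` with `offset == 0` leaves the vector at 2 slots, which is the shortfall described above.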
Thanks, can you make two small changes in the test? They aren't strictly needed but they will help prevent future surprises for maintainers.
Thanks for adding this utility!
Benchmark runs are scheduled for baseline = f4680cd and contender = 1b439b0. 1b439b0 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
@sanjibansg It seems that this broke our nightly CI. Could you check this?
@kou cc: @westonpace
…gs an alignment requirement (apache#14758)

This PR adds a utility function which is responsible for ensuring that all the buffers of an arrow object are properly aligned. It checks all the buffers in an arrow object for alignment, and if not aligned properly, then allocates a buffer by specifying the required alignment and copies data from the previous buffer.

* Closes: apache#33317

Authored-by: Sanjiban Sengupta <[email protected]>
Signed-off-by: Weston Pace <[email protected]>
Thanks!
FYI: We can test this case by commenting
@github-actions crossbow submit amazon-linux-2-amd64
Revision: 221426d Submitted crossbow builds: ursacomputing/crossbow @ actions-a124ab2af3
@kou It is passing now! #34754 cc: @westonpace
This PR should fix the nightly builds CI error which occurred after merging #14758. In the EnsureAlignment utility for a Buffer, the modified buffer should be returned by `std::move`.

* Closes: #34753

Authored-by: Sanjiban Sengupta <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>