Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-33317: [C++] Utility method to ensure an array object meetings an alignment requirement #14758

Merged
merged 19 commits into from
Mar 24, 2023

Conversation

sanjibansg
Copy link
Contributor

@sanjibansg sanjibansg commented Nov 29, 2022

This PR adds a utility function which is responsible for ensuring that all the buffers of an arrow object are properly aligned. It checks all the buffers in an arrow object for alignment, and if not aligned properly, then allocates a buffer by specifying the required alignment and copies data from the previous buffer.

@github-actions
Copy link

@sanjibansg sanjibansg force-pushed the ensure-alignment branch 2 times, most recently from 6633ac8 to 1e68873 Compare December 30, 2022 09:27
@westonpace westonpace self-requested a review January 3, 2023 05:08
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start but we need to be more thorough.

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most cases the alignment will already be correct. We want this method to be really fast in that scenario. So this means that methods should look like...

std::shared_ptr<Array> Foo(std::shared_ptr<Array> thing) {
  if (needs_alignment(...)) {
    // a copy needs to be made
  } else {
    return std::move(thing);
  }
}

This way we can call it like so...

std::shared_ptr<Array> aligned = EnsureAlignment(std::move(unaligned), ...);

Then, if unaligned is already aligned it will be a straight move into aligned without ever making any copies (not even copies of shared_ptr)

@sanjibansg
Copy link
Contributor Author

In most cases the alignment will already be correct. We want this method to be really fast in that scenario. So this means that methods should look like...

std::shared_ptr<Array> Foo(std::shared_ptr<Array> thing) {
  if (needs_alignment(...)) {
    // a copy needs to be made
  } else {
    return std::move(thing);
  }
}

This way we can call it like so...

std::shared_ptr<Array> aligned = EnsureAlignment(std::move(unaligned), ...);

Then, if unaligned is already aligned it will be a straight move into aligned without ever making any copies (not even copies of shared_ptr)

The current implementation might be a bit complicated, so comments and suggestions for optimized approaches will be very helpful. Thanks in advance!

Currently, we have Check functions for various Arrow objects, which check their constituents for alignments. For ChunkedArray, RecordBatch, and Table, we use a boolean vector to track the constituents which require alignment thus avoiding unnecessary function calls.

@sanjibansg sanjibansg requested a review from westonpace March 17, 2023 10:42
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks ok. I have a few suggestions.


bool CheckAlignment(const ChunkedArray& array, const int64_t& alignment,
std::vector<bool>& needs_alignment, const int& offset) {
needs_alignment.resize(needs_alignment.size() + array.num_chunks(), false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why needs_alignment.size() + array.num_chunks() and not offset + array.num_chunks()?

Copy link
Contributor Author

@sanjibansg sanjibansg Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resize is basically to add space for the extra elements that should be added for the check.
For example, if we have a table made up of 2 chunked arrays, which then are made of 2 arrays each. Then, initially needs_alignment will be of size 2 (because of 2 chunked arrays). When the CheckAlignment() is called for the first chunked array, needs_alignment will be resized to 4 (/*needs_alignment.size()/* 2 + /*array.num_chunks()*/ 2), thus now in needs_alignment, first two bits indicate status of the 2 arrays of the first chunk array, the third one for the ChunkArray as a whole, and the last one for the other chunk array which yet to be checked. The similar operation gets repeated for the other chunked array.

Instead if we use offset, then for the first iteration here, offset will be 0. So, the resize will enforce its size to 2, which will not be enough for storing the alignment check bits of the first two arrays of the first chunk array, and the second chunk array.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Mar 21, 2023
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, can you make two small changes in the test? They aren't strictly needed but they will help prevent future surprises for maintainers.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Mar 24, 2023
@sanjibansg sanjibansg requested a review from westonpace March 24, 2023 16:58
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Mar 24, 2023
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this utility!

@westonpace westonpace changed the title ARROW-18119: [C++] Utility method to ensure an array object meetings an alignment requirement GH-33317: [C++] Utility method to ensure an array object meetings an alignment requirement Mar 24, 2023
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Mar 24, 2023
@github-actions
Copy link

@github-actions
Copy link

⚠️ GitHub issue #33317 has been automatically assigned in GitHub to PR creator.

@westonpace westonpace merged commit 1b439b0 into apache:main Mar 24, 2023
@sanjibansg sanjibansg deleted the ensure-alignment branch March 24, 2023 17:54
@ursabot
Copy link

ursabot commented Mar 25, 2023

Benchmark runs are scheduled for baseline = f4680cd and contender = 1b439b0. 1b439b0 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.86% ⬆️0.03%] test-mac-arm
[Finished ⬇️0.51% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.25% ⬆️0.65%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 1b439b07 ec2-t3-xlarge-us-east-2
[Finished] 1b439b07 test-mac-arm
[Finished] 1b439b07 ursa-i9-9960x
[Finished] 1b439b07 ursa-thinkcentre-m75q
[Finished] f4680cd7 ec2-t3-xlarge-us-east-2
[Failed] f4680cd7 test-mac-arm
[Finished] f4680cd7 ursa-i9-9960x
[Finished] f4680cd7 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@kou
Copy link
Member

kou commented Mar 25, 2023

@sanjibansg It seems that this broke our nightly CI. Could you check this?

https://github.com/ursacomputing/crossbow/actions/runs/4518357593/jobs/7958180632#step:6:1375

[ 34%] Building CXX object src/arrow/CMakeFiles/arrow_objlib.dir/util/align_util.cc.o
cd /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/src/arrow && /usr/lib64/ccache/c++  -DARROW_EXPORTING -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_HDFS -DARROW_WITH_BACKTRACE -DARROW_WITH_BROTLI -DARROW_WITH_BZ2 -DARROW_WITH_LZ4 -DARROW_WITH_RE2 -DARROW_WITH_SNAPPY -DARROW_WITH_TIMING_TESTS -DARROW_WITH_UTF8PROC -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD -DURI_STATIC_BUILD -DUTF8PROC_STATIC -I/root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/src -I/root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/src -I/root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/src/generated -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/thirdparty/flatbuffers/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/thirdparty/hadoop/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/boost_ep-prefix/src/boost_ep -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/orc_ep-install/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/zstd_ep-install/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/protobuf_ep-install/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/utf8proc_ep-install/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/re2_ep-install/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/xsimd_ep/src/xsimd_ep-install/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/jemalloc_ep-prefix/src  -Wno-noexcept-type -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches    -m64 -mtune=generic -fdiagnostics-color=always  -Wall -fno-semantic-interposition -msse4.2  -DNDEBUG -ftree-vectorize -fPIC   -pthread -std=c++1z -o CMakeFiles/arrow_objlib.dir/util/align_util.cc.o -c /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/src/arrow/util/align_util.cc
...
/root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/src/arrow/util/align_util.cc: In function 'arrow::Result<std::shared_ptr<arrow::Buffer> > arrow::util::EnsureAlignment(std::shared_ptr<arrow::Buffer>, int64_t, arrow::MemoryPool*)':
/root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/src/arrow/util/align_util.cc:104:12: error: could not convert 'new_buffer' from 'std::unique_ptr<arrow::Buffer>' to 'arrow::Result<std::shared_ptr<arrow::Buffer> >'
     return new_buffer;
            ^~~~~~~~~~
make[2]: *** [src/arrow/CMakeFiles/arrow_objlib.dir/util/align_util.cc.o] Error 1

@sanjibansg
Copy link
Contributor Author

sanjibansg commented Mar 27, 2023

@sanjibansg It seems that this broke our nightly CI. Could you check this?

https://github.com/ursacomputing/crossbow/actions/runs/4518357593/jobs/7958180632#step:6:1375

[ 34%] Building CXX object src/arrow/CMakeFiles/arrow_objlib.dir/util/align_util.cc.o
cd /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/src/arrow && /usr/lib64/ccache/c++  -DARROW_EXPORTING -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_HDFS -DARROW_WITH_BACKTRACE -DARROW_WITH_BROTLI -DARROW_WITH_BZ2 -DARROW_WITH_LZ4 -DARROW_WITH_RE2 -DARROW_WITH_SNAPPY -DARROW_WITH_TIMING_TESTS -DARROW_WITH_UTF8PROC -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD -DURI_STATIC_BUILD -DUTF8PROC_STATIC -I/root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/src -I/root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/src -I/root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/src/generated -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/thirdparty/flatbuffers/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/thirdparty/hadoop/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/boost_ep-prefix/src/boost_ep -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/orc_ep-install/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/zstd_ep-install/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/protobuf_ep-install/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/utf8proc_ep-install/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/re2_ep-install/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/xsimd_ep/src/xsimd_ep-install/include -isystem /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/build/jemalloc_ep-prefix/src  -Wno-noexcept-type -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches    -m64 -mtune=generic -fdiagnostics-color=always  -Wall -fno-semantic-interposition -msse4.2  -DNDEBUG -ftree-vectorize -fPIC   -pthread -std=c++1z -o CMakeFiles/arrow_objlib.dir/util/align_util.cc.o -c /root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/src/arrow/util/align_util.cc
...
/root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/src/arrow/util/align_util.cc: In function 'arrow::Result<std::shared_ptr<arrow::Buffer> > arrow::util::EnsureAlignment(std::shared_ptr<arrow::Buffer>, int64_t, arrow::MemoryPool*)':
/root/rpmbuild/BUILD/apache-arrow-12.0.0.dev290/cpp/src/arrow/util/align_util.cc:104:12: error: could not convert 'new_buffer' from 'std::unique_ptr<arrow::Buffer>' to 'arrow::Result<std::shared_ptr<arrow::Buffer> >'
     return new_buffer;
            ^~~~~~~~~~
make[2]: *** [src/arrow/CMakeFiles/arrow_objlib.dir/util/align_util.cc.o] Error 1

@kou
Thanks for letting me know. I will make a fix.
Also, should I make an Issue first, and then make a PR for the minor fix?

cc: @westonpace

rtpsw pushed a commit to rtpsw/arrow that referenced this pull request Mar 27, 2023
…gs an alignment requirement (apache#14758)

This PR adds a utility function which is responsible for ensuring that all the buffers of an arrow object are properly aligned. It checks all the buffers in an arrow object for alignment, and if not aligned properly, then allocates a buffer by specifying the required alignment and copies data from the previous buffer.
* Closes: apache#33317

Authored-by: Sanjiban Sengupta <[email protected]>
Signed-off-by: Weston Pace <[email protected]>
@kou
Copy link
Member

kou commented Mar 27, 2023

Thanks!
Could you open a new issue?

@kou
Copy link
Member

kou commented Mar 27, 2023

FYI: We can test this case by commenting @github-actions crossbow submit amazon-linux-2-amd64. See #14758 (comment) as an example.

@kou
Copy link
Member

kou commented Mar 27, 2023

@github-actions crossbow submit amazon-linux-2-amd64

@github-actions
Copy link

Revision: 221426d

Submitted crossbow builds: ursacomputing/crossbow @ actions-a124ab2af3

Task Status
amazon-linux-2-amd64 Github Actions

@sanjibansg
Copy link
Contributor Author

@kou It is passing now! #34754

cc: @westonpace

kou pushed a commit that referenced this pull request Mar 28, 2023
This PR should fix the nightly builds CI error which occurred after merging #14758. In the EnsureAlignment utility for a Buffer, the modified buffer should be returned by `std::move`.
* Closes: #34753

Authored-by: Sanjiban Sengupta <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
…gs an alignment requirement (apache#14758)

This PR adds a utility function which is responsible for ensuring that all the buffers of an arrow object are properly aligned. It checks all the buffers in an arrow object for alignment, and if not aligned properly, then allocates a buffer by specifying the required alignment and copies data from the previous buffer.
* Closes: apache#33317

Authored-by: Sanjiban Sengupta <[email protected]>
Signed-off-by: Weston Pace <[email protected]>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
…ache#34754)

This PR should fix the nightly builds CI error which occurred after merging apache#14758. In the EnsureAlignment utility for a Buffer, the modified buffer should be returned by `std::move`.
* Closes: apache#34753

Authored-by: Sanjiban Sengupta <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Utility method to ensure an array object meetings an alignment requirement
4 participants