forked from rapidsai/cudf
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
merge rapidsai/branch 22.10 #1
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[gpuCI] Forward-merge branch-22.08 to branch-22.10 [skip gpuci]
[gpuCI] Forward-merge branch-22.08 to branch-22.10 [skip gpuci]
Update JNI version to 22.10.0-SNAPSHOT jni cmake change has been covered in df24808#diff-85b071fcb8436e7b5eedefcb44bd66949e9d8062d160f7317e19fc50ace7c887 Authors: - Peixin (https://github.com/pxLi) Approvers: - Jason Lowe (https://github.com/jlowe) URL: #11338
This PR fixes timeouts like the following that are happening across all our s3 tests in code-base: ```python Traceback (most recent call last): File "/lib/python3.9/site-packages/urllib3/connectionpool.py", line 426, in _make_request six.raise_from(e, None) File "<string>", line 3, in raise_from File "/lib/python3.9/site-packages/urllib3/connectionpool.py", line 421, in _make_request httplib_response = conn.getresponse() File "/..pyenv/versions/3.9.6/lib/python3.9/http/client.py", line 1349, in getresponse response.begin() File "/..pyenv/versions/3.9.6/lib/python3.9/http/client.py", line 316, in begin version, status, reason = self._read_status() File "/..pyenv/versions/3.9.6/lib/python3.9/http/client.py", line 277, in _read_status line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1") File "/..pyenv/versions/3.9.6/lib/python3.9/socket.py", line 704, in readinto return self._sock.recv_into(b) socket.timeout: timed out ``` This seems to be an issue with `werkzeug` package which was updated 1 day ago. So until there is an upstream fix for it, we will temporarily need to pin this package. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: #11369
[gpuCI] Forward-merge branch-22.08 to branch-22.10 [skip gpuci]
I wanted to get this up in a PR of its own so we could get some discussion going if necessary. This adds a `byte_array_view` which is almost identical to a `string_view`. The goal is to be able to get these for list columns of bytes, so `list<uint8>` and `list<int8>`. I didn't template it on the type, but instead selected `uint8_t` because `std::byte` is a `uint8_t`. My PR for writing byte arrays in parquet will use this to get the rows of byte data for statistics and writing. That PR is forthcoming. I left this code down in cuio statistics due to the usage and the previous discussions regarding `.element`. I needed to wrap the `device_span` because I need comparison operators for the cub reduce. Authors: - Mike Wilson (https://github.com/hyperbolic2346) Approvers: - MithunR (https://github.com/mythrocks) - Vukasin Milovanovic (https://github.com/vuule) URL: #11322
This PR is needed for PR #11160 and another PR for writing list<int8>'s as byte arrays as well. This PR will be the basis for the statistics written by #11160, but it should be noted that it will work due to an implementation detail. The union of the statistics types for strings and byte arrays is the same layout. This means the string statistics that are stored from the string column can be read as byte array statistics later when writing out the file and it will work simply because they can alias to each other. Alternatives considered included changing the incoming table type in the case of strings being written as a byte array to be a byte array, which would only work because of the implementation detail that the columns are identical data layouts. Also considered was templating out multiple paths in [column_statistics.cuh:107](https://github.com/rapidsai/cudf/blob/e98feab966f6d1b9eba83323e851c955a1691865/cpp/src/io/statistics/column_statistics.cuh#L107) that we could use to branch based on the data type. The latter seems more the correct way, but comes with a performance and complexity issue. The method used in the PR was mainly chosen due to simplicity and the existing reliance on the data formats being identical throughout the code. It seems unlikely these will ever decouple. Authors: - Mike Wilson (https://github.com/hyperbolic2346) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - MithunR (https://github.com/mythrocks) - https://github.com/nvdbaranec URL: #11303
[gpuCI] Forward-merge branch-22.08 to branch-22.10 [skip gpuci]
etseidl
pushed a commit
that referenced
this pull request
Jun 9, 2023
This implements stacktrace and adds a stacktrace string into any exception thrown by cudf. By doing so, the exception carries information about where it originated, allowing the downstream application to trace back with much less effort. Closes rapidsai#12422. ### Example: ``` #0: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::sorted_order<false>(cudf::table_view, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x446 #1: cudf/cpp/build/libcudf.so : cudf::detail::sorted_order(cudf::table_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x113 #2: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::segmented_sorted_order_common<(cudf::detail::sort_method)1>(cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x66e #3: cudf/cpp/build/libcudf.so : cudf::detail::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x88 #4: cudf/cpp/build/libcudf.so : cudf::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::mr::device_memory_resource*)+0xb9 #5: cudf/cpp/build/gtests/SORT_TEST : ()+0xe3027 #6: cudf/cpp/build/lib/libgtest.so.1.13.0 : void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x8f rapidsai#7: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::Test::Run()+0xd6 rapidsai#8: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestInfo::Run()+0x195 rapidsai#9: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestSuite::Run()+0x109 rapidsai#10: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::internal::UnitTestImpl::RunAllTests()+0x44f rapidsai#11: cudf/cpp/build/lib/libgtest.so.1.13.0 : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x87 rapidsai#12: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::UnitTest::Run()+0x95 rapidsai#13: cudf/cpp/build/gtests/SORT_TEST : ()+0xdb08c rapidsai#14: /lib/x86_64-linux-gnu/libc.so.6 : ()+0x29d90 rapidsai#15: /lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0x80 rapidsai#16: cudf/cpp/build/gtests/SORT_TEST : ()+0xdf3d5 ``` ### Usage In order to retrieve a stacktrace with fully human-readable symbols, some compiling options must be adjusted. To make such adjustment convenient and effortless, a new cmake option (`CUDF_BUILD_STACKTRACE_DEBUG`) has been added. Just set this option to `ON` before building cudf and it will be ready to use. For downstream applications, whenever a cudf-type exception is thrown, it can retrieve the stored stacktrace and do whatever it wants with it. For example: ``` try { // cudf API calls } catch (cudf::logic_error const& e) { std::cout << e.what() << std::endl; std::cout << e.stacktrace() << std::endl; throw e; } // similar with catching other exception types ``` ### Follow-up work The next step would be patching `rmm` to attach stacktrace into `rmm::` exceptions. Doing so will allow debugging various memory exceptions thrown from libcudf using their stacktrace. ### Note: * This feature doesn't require libcudf to be built in Debug mode. * The flag `CUDF_BUILD_STACKTRACE_DEBUG` should not be turned on in production as it may affect code optimization. Instead, libcudf compiled with that flag turned on should be used only when needed, when debugging cudf throwing exceptions. * This flag removes the current optimization flag from compiling (such as `-O2` or `-O3`, if in Release mode) and replaces by `-Og` (optimize for debugging). * If this option is not set to `ON`, the stacktrace will not be available. This is to avoid expensive stracktrace retrieval if the throwing exception is expected. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Robert Maynard (https://github.com/robertmaynard) - Vyas Ramasubramani (https://github.com/vyasr) - Jason Lowe (https://github.com/jlowe) URL: rapidsai#13298
etseidl
pushed a commit
that referenced
this pull request
Sep 27, 2023
Pin conda packages to `aws-sdk-cpp<1.11`. The recent upgrade in version `1.11.*` has caused several issues with cleaning up (more details on changes can be read in [this link](https://github.com/aws/aws-sdk-cpp#version-111-is-now-available)), leading to Distributed and Dask-CUDA processes to segfault. The stack for one of those crashes looks like the following: ``` (gdb) bt #0 0x00007f5125359a0c in Aws::Utils::Logging::s_aws_logger_redirect_get_log_level(aws_logger*, unsigned int) () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../.././libaws-cpp-sdk-core.so #1 0x00007f5124968f83 in aws_event_loop_thread () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-io.so.1.0.0 #2 0x00007f5124ad9359 in thread_fn () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1 #3 0x00007f519958f6db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #4 0x00007f5198b1361f in clone () from /lib/x86_64-linux-gnu/libc.so.6 ``` Such segfaults now manifest frequently in CI, and in some cases are reproducible with a hit rate of ~30%. Given the approaching release time, it's probably the safest option to just pin to an older version of the package while we don't pinpoint the exact cause for the issue and a patched build is released upstream. The `aws-sdk-cpp` is statically-linked in the `pyarrow` pip package, which prevents us from using the same pinning technique. cuDF is currently pinned to `pyarrow=12.0.1` which seems to be built against `aws-sdk-cpp=1.10.*`, as per [recent build logs](https://github.com/apache/arrow/actions/runs/6276453828/job/17046177335?pr=37792#step:6:1372). Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Ray Douglass (https://github.com/raydouglass) URL: rapidsai#14173
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.