MapD Thrift interactions in C++ (and C++ in general) #3
Given that we want to leave room to explore how we do IPC, I don't think we want to commit to a lot of C++ code just yet. However, I think mapd (https://github.com/mapd/mapd-core) may already have that code. If that is the case, that would make things easier. Btw, there will likely be other parts of pygdf that will need to be in C++ soon. But I don't think we need everything to be in C++. In the end, the heavy lifting is done in Numba-compiled and other GPU code.
This is a somewhat Python-centric view of things -- it would be interesting to define a core implementation that can be used from any program with reasonable C/C++ FFI. This includes Rust, Julia, Go, Lua/LuaJIT, etc. Such a core library would be very small (< 10 KLOC I would say) and focused on data structures, memory management / movement, and APIs for user-defined functions. Numba then could use this public API to plug in custom code written by the user.
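To make the idea concrete, here is a purely hypothetical sketch of what such a small, C-compatible core API surface might look like. None of these names or signatures come from an existing project; they only illustrate the kind of handle-based, ref-counted API a language-agnostic core could expose to Numba, Rust, Julia, etc.:

```
// Hypothetical sketch only: illustrative names, not an existing API.
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

typedef struct core_column_ core_column;  /* opaque handle to a device-resident column */

/* Lifetime / memory management (ref-counted so multiple consumers can share a column). */
core_column* core_column_wrap(void* device_data, uint8_t* device_valid_bits,
                              int64_t length, int dtype);
void core_column_retain(core_column* col);
void core_column_release(core_column* col);

/* Data movement: copy a column's contents into a caller-provided host buffer. */
int core_column_copy_to_host(const core_column* col, void* host_buffer, int64_t buffer_size);

/* User-defined functions: apply a caller-compiled element-wise device function. */
typedef void (*core_elementwise_fn)(const void* in, void* out, int64_t n);
int core_apply_elementwise(const core_column* in, core_column* out, core_elementwise_fn fn);

#ifdef __cplusplus
}  /* extern "C" */
#endif
```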
MapD creates the IPC memory but always transfers its ownership to pygdf for now. I am all for a data exchange layer (that does ref-counting, etc.) and maybe renaming pygdf to something else. As for UDFs, I am not sure whether the use case is language-independent, in which case we might need an ABI, or whether we would link its LLVM bitcode.
I am open to the idea of a C++ core library, but it is too soon to commit to one. My approach to PyGDF for now is to keep it simple and loosely coupled so that things can evolve quickly. There are many technical questions about what the GDF will look like and all the management it needs. There are a lot more things to think about than just the CPU code.
By focusing on the IPC front first, we avoid a lot of the technical questions above since each process is isolated in its own process space. By using the Arrow and flatbuffers toolset, we can already share data among multiple languages. Given that the requirements for the GDF are still evolving, we should focus on the application side first to get a sense of the use cases. Then we can come back with a design for the C++ core library if it becomes necessary.
Would it make sense to start drafting some design documents to lay out the requirements and other constraints / implementation considerations and tradeoffs? I think it would be worth doing some up-front design work to build consensus around these questions. Among all the people here, there's a huge amount of domain expertise in doing analytics on the GPU, so I think this would be a productive use of time. With Apache Arrow we spent a lot of time iterating on the specification documents and the mechanics of zero-copy IPC / the Flatbuffer metadata, and I think resolving many of the design questions up front has been a big help.
@sklam this is a good list of problems to think about, and it could be longer, but I don't want to see it overcomplicated. Basically, we can isolate the management layer or service well regardless of what the use cases are, and have all memory allocation/deallocation rely on it, where probably only IPC memory works. As for error handling, that is another topic that can also be handled well; it just needs parallelism in tasking.
Yes, I think we should, as we want to be open about our design and implementation. Perhaps we should open a new repo under gpuopenanalytics to put all the design docs and questions in. At this stage, I guess the design note would just be "trying out Arrow on the GPU" =). Also, I would like to get the core cross-language IPC code into its own repo. The Arrow flatbuffers code does not need to be in this repo. And I don't want folks to think that PyGDF is The GDF. @wesm, is there a recommended way to ship the generated Python flatbuffer code?
@m1mc, yes, we should focus on IPC memory first. It will make things a lot simpler.
We can start a separate thread about this. Why would you want to reimplement the metadata serialization and IPC loading/unloading in Python (vs. simply using libarrow)? If there's some aspect of the libarrow API that's inadequate for this use case, I will be happy to scope it out and help do some development for this project.
One thing I've thought about is defining some C structs to make interacting with raw Arrow memory from LLVM or C simpler. So you could have:

```
#include <stdint.h>

typedef struct {
  int64_t length;
  int64_t null_count;
  const uint8_t* valid_bits;
  int type;
} arrow_base_t;

typedef struct {
  arrow_base_t base;
  const uint8_t* data;
} arrow_primitive_t;

typedef struct {
  arrow_base_t base;
  const int32_t* offsets;
  const uint8_t* data;
} arrow_string_t;
```

I don't know all the requirements, so it would be great to start some design docs in Markdown or some other format.
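As a hedged illustration of how flat structs like the ones sketched above could be consumed from C or LLVM-generated code (assuming that field layout and Arrow's least-significant-bit validity convention; the helper name is made up for this example):

```
/* Illustrative helper only: reads slot i of an int32 primitive array,
 * honoring the validity bitmap (bit set means the slot holds a valid value). */
static inline int32_t read_int32_or_default(const arrow_primitive_t* arr,
                                            int64_t i, int32_t fallback)
{
    int is_valid = arr->base.valid_bits == NULL ||
                   ((arr->base.valid_bits[i >> 3] >> (i & 7)) & 1);
    return is_valid ? ((const int32_t*)arr->data)[i] : fallback;
}
```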
I wouldn't want to reimplement anything that is already available. But there are some complications due to the metadata and data being in a single CUDA IPC memory region. It cannot be accessed directly through a normal CPU pointer. We need to keep it on the GPU to avoid a transfer to host (CPU) memory. Is libarrow willing to include CUDA support?
(Adding to my previous comment.) PyGDF only has the IPC reading part. The metadata is the only portion copied back to the host. The data is kept on the device. The reader tries to minimize device-to-host transfers. I am interested to know how MapD implemented the IPC serialization part. (ping @m1mc)
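For readers unfamiliar with that flow, here is a rough host-side sketch of the reading side using the plain CUDA runtime API. The layout assumed here (an 8-byte metadata length prefix followed by the flatbuffer metadata and then the column data) is purely illustrative, not necessarily what MapD or PyGDF actually emit, and error checking is omitted for brevity:

```
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>

// Open the shared region via CUDA IPC, copy only the small metadata portion to
// the host, and keep the column payload resident on the device.
std::vector<uint8_t> open_ipc_and_copy_metadata(const cudaIpcMemHandle_t& handle,
                                                void** device_payload)
{
    void* base = nullptr;
    cudaIpcOpenMemHandle(&base, handle, cudaIpcMemLazyEnablePeerAccess);

    // Hypothetical 8-byte length prefix describing the metadata size.
    int64_t metadata_size = 0;
    cudaMemcpy(&metadata_size, base, sizeof(metadata_size), cudaMemcpyDeviceToHost);

    // Copy the metadata bytes to the host for parsing (e.g. with the
    // flatbuffers-generated Schema/RecordBatch readers).
    std::vector<uint8_t> metadata(metadata_size);
    cudaMemcpy(metadata.data(),
               static_cast<uint8_t*>(base) + sizeof(metadata_size),
               metadata_size, cudaMemcpyDeviceToHost);

    // The column data itself stays on the GPU.
    *device_payload = static_cast<uint8_t*>(base) + sizeof(metadata_size) + metadata_size;
    return metadata;
}
```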
Since Arrow (up to 0.3.0) only has a CPU serializer, we are keeping and serializing results in system memory and uploading them to the device for now. Otherwise, we would have had to come up with our own GPU serializer in the short term, but that should be easy to do by generating an extra null bitmap in a kernel. BTW, PyGDF should have received separate metadata through the Thrift API.
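A minimal sketch of that "extra null bitmap in a kernel" idea, assuming a sentinel value marks nulls and one thread fills one byte of an Arrow-style validity bitmap (names and launch configuration are illustrative only):

```
#include <cstdint>

// Each thread produces one byte of the validity bitmap: bit set means valid.
__global__ void build_null_bitmap(const int32_t* values,
                                  int64_t n,
                                  int32_t null_sentinel,
                                  uint8_t* valid_bits)
{
    int64_t byte_idx = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
    int64_t first = byte_idx * 8;
    if (first >= n) return;

    uint8_t byte = 0;
    for (int bit = 0; bit < 8 && first + bit < n; ++bit) {
        if (values[first + bit] != null_sentinel) {
            byte |= (uint8_t)(1u << bit);
        }
    }
    valid_bits[byte_idx] = byte;
}
```

A launch such as `build_null_bitmap<<<(num_bytes + 255) / 256, 256>>>(...)` with `num_bytes = (n + 7) / 8` would cover the whole column.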
Yes. I just picked the one in the IPC memory. It doesn't have to be that way. The only reason for the current approach is that a single memory region feels more consistent. Probably not a strong reason. Depending on how the serialization is done, we can just keep one copy of the metadata instead of duplicating it on both host and device. FYI, besides the metadata for the Schema, there are small bits of metadata inside the RecordBatch headers that need to be parsed on the host.
I would like to start a |
* test binary_operator
* test one line
* essentially use _binaryop with a line flipped
* expand to all non commutative reflected ops
* revert rmul
* rbinaryop function for clarity
* adding eq datetime ops for pygdf
* flake8 fixes
* Drop Python 2.7, Add Python 3.7
* removing int coercion for datetime
* Remove Python 3.7 build
* bumping numba
* forgot to commit meta.yaml changes
* flake8
* commutative addition
* commutative subtraction and multiplication
* reflected floordiv and truediv
* cleanup
* stray comment
* change rsub method
* further testing rsub
* rsub docstring
* revert back
* type coercion
* revert to pseudo-commutative implementation
* commutative ops tests
* test comment cleanup
* Feature/reflected ops noncommutative testing (#1)
* np array solution
* cleanup
* np solution for division
* full reflected ops tests
* cleanup
* switching lambda scalar to 2
* Update README.md: Conda installation instruction needed changes with pygdf version.
* Feature/reflected ops update (#2)
* test binary_operator
* test one line
* essentially use _binaryop with a line flipped
* expand to all non commutative reflected ops
* revert rmul
* Feature/reflected ops update (#3)
* test binary_operator
* test one line
* essentially use _binaryop with a line flipped
* expand to all non commutative reflected ops
* revert rmul
* rbinaryop function for clarity
* add scalar to array generation to avoid division by zero behavior
* remove integer division test due to libgdf bug
* Fix timezone issue when converting from datetime object into datetime64
* Remove unused import to fix flake8
* Initial modifications for new join API
Use GDF_STRING instead of GDF_CATEGORY for string type detection
Updating changelog and fixing flake8 errors
This implements stacktrace support and adds a stacktrace string into any exception thrown by cudf. By doing so, the exception carries information about where it originated, allowing the downstream application to trace back with much less effort. Closes #12422.

### Example:

```
#0: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::sorted_order<false>(cudf::table_view, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x446
#1: cudf/cpp/build/libcudf.so : cudf::detail::sorted_order(cudf::table_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x113
#2: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::segmented_sorted_order_common<(cudf::detail::sort_method)1>(cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x66e
#3: cudf/cpp/build/libcudf.so : cudf::detail::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x88
#4: cudf/cpp/build/libcudf.so : cudf::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::mr::device_memory_resource*)+0xb9
#5: cudf/cpp/build/gtests/SORT_TEST : ()+0xe3027
#6: cudf/cpp/build/lib/libgtest.so.1.13.0 : void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x8f
#7: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::Test::Run()+0xd6
#8: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestInfo::Run()+0x195
#9: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestSuite::Run()+0x109
#10: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::internal::UnitTestImpl::RunAllTests()+0x44f
#11: cudf/cpp/build/lib/libgtest.so.1.13.0 : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x87
#12: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::UnitTest::Run()+0x95
#13: cudf/cpp/build/gtests/SORT_TEST : ()+0xdb08c
#14: /lib/x86_64-linux-gnu/libc.so.6 : ()+0x29d90
#15: /lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0x80
#16: cudf/cpp/build/gtests/SORT_TEST : ()+0xdf3d5
```

### Usage

In order to retrieve a stacktrace with fully human-readable symbols, some compiler options must be adjusted. To make such adjustment convenient and effortless, a new CMake option (`CUDF_BUILD_STACKTRACE_DEBUG`) has been added. Just set this option to `ON` before building cudf and it will be ready to use. For downstream applications, whenever a cudf-type exception is thrown, they can retrieve the stored stacktrace and do whatever they want with it.
For example:

```
try {
  // cudf API calls
} catch (cudf::logic_error const& e) {
  std::cout << e.what() << std::endl;
  std::cout << e.stacktrace() << std::endl;
  throw e;
}
// similar with catching other exception types
```

### Follow-up work

The next step would be patching `rmm` to attach stacktraces to `rmm::` exceptions. Doing so will allow debugging various memory exceptions thrown from libcudf using their stacktraces.

### Note:

* This feature doesn't require libcudf to be built in Debug mode.
* The flag `CUDF_BUILD_STACKTRACE_DEBUG` should not be turned on in production as it may affect code optimization. Instead, libcudf compiled with that flag turned on should be used only when needed, when debugging cudf throwing exceptions.
* This flag removes the current optimization flag from compilation (such as `-O2` or `-O3`, if in Release mode) and replaces it with `-Og` (optimize for debugging).
* If this option is not set to `ON`, the stacktrace will not be available. This is to avoid expensive stacktrace retrieval when the thrown exception is expected.

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- AJ Schmidt (https://github.com/ajschmidt8)
- Robert Maynard (https://github.com/robertmaynard)
- Vyas Ramasubramani (https://github.com/vyasr)
- Jason Lowe (https://github.com/jlowe)

URL: #13298
Pin conda packages to `aws-sdk-cpp<1.11`. The recent upgrade to version `1.11.*` has caused several issues with cleanup (more details on the changes can be read in [this link](https://github.com/aws/aws-sdk-cpp#version-111-is-now-available)), leading Distributed and Dask-CUDA processes to segfault. The stack for one of those crashes looks like the following:

```
(gdb) bt
#0 0x00007f5125359a0c in Aws::Utils::Logging::s_aws_logger_redirect_get_log_level(aws_logger*, unsigned int) () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../.././libaws-cpp-sdk-core.so
#1 0x00007f5124968f83 in aws_event_loop_thread () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-io.so.1.0.0
#2 0x00007f5124ad9359 in thread_fn () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1
#3 0x00007f519958f6db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007f5198b1361f in clone () from /lib/x86_64-linux-gnu/libc.so.6
```

Such segfaults now manifest frequently in CI, and in some cases are reproducible with a hit rate of ~30%. Given the approaching release time, it's probably the safest option to just pin to an older version of the package while we haven't pinpointed the exact cause of the issue and a patched build is released upstream.

The `aws-sdk-cpp` library is statically linked in the `pyarrow` pip package, which prevents us from using the same pinning technique. cuDF is currently pinned to `pyarrow=12.0.1`, which seems to be built against `aws-sdk-cpp=1.10.*`, as per [recent build logs](https://github.com/apache/arrow/actions/runs/6276453828/job/17046177335?pr=37792#step:6:1372).

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Ray Douglass (https://github.com/raydouglass)

URL: #14173
… pandas columns (#3) Fixes: rapidsai/xdf#322 This PR raises an error when a pandas column with a mix of bools & None is detected, i.e., when a boolean column is of type object rather than bool/boolean.
I am wondering if it would be faster and more maintainable to handle serialization, data movement, and other interactions between this system and other platforms (e.g. MapD) in a C++ library that is not Python-specific. This would add some complexity to the build system for this project, but that seems inevitable given the nature of the problem (you'll end up needing to use nvcc at some point to create some support libraries).