MapD Thrift interactions in C++ (and C++ in general) #3
Given that we want to leave room to explore how we do IPC, I don't think we want to commit to a lot of C++ code just yet. However, I think mapd (https://github.com/mapd/mapd-core) may already have that code. If that is the case, that would make things easier. Btw, there will likely be other parts of pygdf that will need to be in C++ soon. But I don't think we need everything to be in C++. In the end, the heavy lifting is done in Numba-compiled and other GPU code.
This is a somewhat Python-centric view of things -- it would be interesting to define a core implementation that can be used from any program with reasonable C/C++ FFI. This includes Rust, Julia, Go, Lua/LuaJIT, etc. Such a core library would be very small (< 10 KLOC I would say) and focused on data structures, memory management / movement, and APIs for user-defined functions. Numba then could use this public API to plug in custom code written by the user.
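To make the idea concrete, here is a purely hypothetical sketch of what such a small, C-compatible core API surface might look like. None of these names or signatures come from an existing project; they only illustrate the kind of handle-based, ref-counted API a language-agnostic core could expose to Numba, Rust, Julia, etc.:

```
// Hypothetical sketch only: illustrative names, not an existing API.
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

typedef struct core_column_ core_column;  /* opaque handle to a device-resident column */

/* Lifetime / memory management (ref-counted so multiple consumers can share a column). */
core_column* core_column_wrap(void* device_data, uint8_t* device_valid_bits,
                              int64_t length, int dtype);
void core_column_retain(core_column* col);
void core_column_release(core_column* col);

/* Data movement: copy a column's contents into a caller-provided host buffer. */
int core_column_copy_to_host(const core_column* col, void* host_buffer, int64_t buffer_size);

/* User-defined functions: apply a caller-compiled element-wise device function. */
typedef void (*core_elementwise_fn)(const void* in, void* out, int64_t n);
int core_apply_elementwise(const core_column* in, core_column* out, core_elementwise_fn fn);

#ifdef __cplusplus
}  /* extern "C" */
#endif
```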
MapD creates the IPC memory but always transfers its ownership to pygdf for now. I am all for a data exchange layer (that does ref-counting, etc.) and maybe renaming pygdf to something else. As for UDFs, I am not sure whether the use case is language-independent, in which case we might need an ABI, or whether we would link its LLVM bitcode.
I am open to the idea of a C++ core library, but it is too soon to commit to one. My approach to PyGDF for now is to keep it simple and loosely coupled so that things can evolve quickly. There are many technical questions about what the GDF will look like and all the management it needs. There are a lot more things to think about than just the CPU code.
By focusing on the IPC front first, we avoid a lot of the technical questions above since each process is isolated in its own process space. By using the Arrow and flatbuffers toolset, we can already share data among multiple languages. Given that the requirements for the GDF are still evolving, we should focus on the application side first to get a sense of the use cases. Then we can come back with a design for the C++ core library if it becomes necessary.
Would it make sense to start drafting some design documents to lay out the requirements and other constraints / implementation considerations and tradeoffs? I think it would be worth doing some up-front design work to build consensus around these questions. Among all the people here, there's a huge amount of domain expertise in doing analytics on the GPU, so I think this would be a productive use of time. With Apache Arrow we spent a lot of time iterating on the specification documents and the mechanics of zero-copy IPC / the Flatbuffer metadata, and I think resolving many of the design questions up front has been a big help.
@sklam this is a good list of problems to think about, and it could be longer, but I don't want to see it overcomplicated. Basically, we can isolate the management layer or service well regardless of what the use cases are, and have all memory allocation/deallocation rely on it, where probably only IPC memory works. As for error handling, that is another topic that can also be handled well; it just needs parallelism in tasking.
Yes, I think we should, as we want to be open about our design and implementation. Perhaps we should open a new repo under gpuopenanalytics to put all the design docs and questions in. At this stage, I guess the design note would just be "trying out Arrow on the GPU" =). Also, I would like to get the core cross-language IPC code into its own repo. The Arrow flatbuffers code does not need to be in this repo. And I don't want folks to think that PyGDF is The GDF. @wesm, is there a recommended way to ship the generated Python flatbuffer code?
@m1mc, yes, we should focus on IPC memory first. It will make things a lot simpler.
We can start a separate thread about this. Why would you want to reimplement the metadata serialization and IPC loading/unloading in Python (vs. simply using libarrow)? If there's some aspect of the libarrow API that's inadequate for this use case, I will be happy to scope it out and help do some development for this project.
One thing I've thought about is defining some C structs to make interacting with raw Arrow memory from LLVM or C simpler. So you could have:

```
#include <stdint.h>

typedef struct {
  int64_t length;
  int64_t null_count;
  const uint8_t* valid_bits;
  int type;
} arrow_base_t;

typedef struct {
  arrow_base_t base;
  const uint8_t* data;
} arrow_primitive_t;

typedef struct {
  arrow_base_t base;
  const int32_t* offsets;
  const uint8_t* data;
} arrow_string_t;
```

I don't know all the requirements, so it would be great to start some design docs in Markdown or some other format.
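As a hedged illustration of how flat structs like the ones sketched above could be consumed from C or LLVM-generated code (assuming that field layout and Arrow's least-significant-bit validity convention; the helper name is made up for this example):

```
/* Illustrative helper only: reads slot i of an int32 primitive array,
 * honoring the validity bitmap (bit set means the slot holds a valid value). */
static inline int32_t read_int32_or_default(const arrow_primitive_t* arr,
                                            int64_t i, int32_t fallback)
{
    int is_valid = arr->base.valid_bits == NULL ||
                   ((arr->base.valid_bits[i >> 3] >> (i & 7)) & 1);
    return is_valid ? ((const int32_t*)arr->data)[i] : fallback;
}
```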
I wouldn't want to reimplement anything that is already available. But there are some complications due to the metadata and data being in a single CUDA IPC memory region. It cannot be accessed directly through a normal CPU pointer. We need to keep it on the GPU to avoid a transfer to host (CPU) memory. Is libarrow willing to include CUDA support?
(Adding to my previous comment.) PyGDF only has the IPC reading part. The metadata is the only portion copied back to the host. The data is kept on the device. The reader tries to minimize device-to-host transfers. I am interested to know how MapD implemented the IPC serialization part. (ping @m1mc)
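For readers unfamiliar with that flow, here is a rough host-side sketch of the reading side using the plain CUDA runtime API. The layout assumed here (an 8-byte metadata length prefix followed by the flatbuffer metadata and then the column data) is purely illustrative, not necessarily what MapD or PyGDF actually emit, and error checking is omitted for brevity:

```
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>

// Open the shared region via CUDA IPC, copy only the small metadata portion to
// the host, and keep the column payload resident on the device.
std::vector<uint8_t> open_ipc_and_copy_metadata(const cudaIpcMemHandle_t& handle,
                                                void** device_payload)
{
    void* base = nullptr;
    cudaIpcOpenMemHandle(&base, handle, cudaIpcMemLazyEnablePeerAccess);

    // Hypothetical 8-byte length prefix describing the metadata size.
    int64_t metadata_size = 0;
    cudaMemcpy(&metadata_size, base, sizeof(metadata_size), cudaMemcpyDeviceToHost);

    // Copy the metadata bytes to the host for parsing (e.g. with the
    // flatbuffers-generated Schema/RecordBatch readers).
    std::vector<uint8_t> metadata(metadata_size);
    cudaMemcpy(metadata.data(),
               static_cast<uint8_t*>(base) + sizeof(metadata_size),
               metadata_size, cudaMemcpyDeviceToHost);

    // The column data itself stays on the GPU.
    *device_payload = static_cast<uint8_t*>(base) + sizeof(metadata_size) + metadata_size;
    return metadata;
}
```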
Since Arrow (up to 0.3.0) only has a CPU serializer, we are keeping and serializing results in system memory and uploading them to the device for now. Otherwise, we would have had to come up with our own GPU serializer in the short term, but that should be easy to do by generating an extra null bitmap in a kernel. BTW, PyGDF should have received separate metadata through the Thrift API.
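A minimal sketch of that "extra null bitmap in a kernel" idea, assuming a sentinel value marks nulls and one thread fills one byte of an Arrow-style validity bitmap (names and launch configuration are illustrative only):

```
#include <cstdint>

// Each thread produces one byte of the validity bitmap: bit set means valid.
__global__ void build_null_bitmap(const int32_t* values,
                                  int64_t n,
                                  int32_t null_sentinel,
                                  uint8_t* valid_bits)
{
    int64_t byte_idx = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
    int64_t first = byte_idx * 8;
    if (first >= n) return;

    uint8_t byte = 0;
    for (int bit = 0; bit < 8 && first + bit < n; ++bit) {
        if (values[first + bit] != null_sentinel) {
            byte |= (uint8_t)(1u << bit);
        }
    }
    valid_bits[byte_idx] = byte;
}
```

A launch such as `build_null_bitmap<<<(num_bytes + 255) / 256, 256>>>(...)` with `num_bytes = (n + 7) / 8` would cover the whole column.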
Yes. I just picked the one in the IPC memory. It doesn't have to be that way. The only reason for the current approach is that a single memory region feels more consistent. Probably not a strong reason. Depending on how the serialization is done, we can just keep one copy of the metadata instead of duplicating it on both host and device. FYI, besides the metadata for the Schema, there are small bits of metadata inside the RecordBatch headers that need to be parsed on the host.
I would like to start a |
* test binary_operator
* test one line
* essentially use _binaryop with a line flipped
* expand to all non commutative reflected ops
* revert rmul
* rbinaryop function for clarity
* adding eq datetime ops for pygdf
* flake8 fixes
* Drop Python 2.7, Add Python 3.7
* removing int coercion for datetime
* Remove Python 3.7 build
* bumping numba
* forgot to commit meta.yaml changes
* flake8
* commutative addition
* commutative subtraction and multiplication
* reflected floordiv and truediv
* cleanup
* stray comment
* change rsub method
* further testing rsub
* rsub docstring
* revert back
* type coercion
* revert to pseudo-commutative implementation
* commutative ops tests
* test comment cleanup
* Feature/reflected ops noncommutative testing (#1)
* np array solution
* cleanup
* np solution for division
* full reflected ops tests
* cleanup
* switching lambda scalar to 2
* Update README.md: Conda installation instruction needed changes with pygdf version.
* Feature/reflected ops update (#2)
* test binary_operator
* test one line
* essentially use _binaryop with a line flipped
* expand to all non commutative reflected ops
* revert rmul
* Feature/reflected ops update (#3)
* test binary_operator
* test one line
* essentially use _binaryop with a line flipped
* expand to all non commutative reflected ops
* revert rmul
* rbinaryop function for clarity
* add scalar to array generation to avoid division by zero behavior
* remove integer division test due to libgdf bug
* Fix timezone issue when converting from datetime object into datetime64
* Remove unused import to fix flake8
* Initial modifications for new join API
Use GDF_STRING instead of GDF_CATEGORY for string type detection
Updating changelog and fixing flake8 errors
This implements stacktrace support and adds a stacktrace string into any exception thrown by cudf. By doing so, the exception carries information about where it originated, allowing the downstream application to trace back with much less effort. Closes #12422.

### Example:

```
#0: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::sorted_order<false>(cudf::table_view, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x446
#1: cudf/cpp/build/libcudf.so : cudf::detail::sorted_order(cudf::table_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x113
#2: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::segmented_sorted_order_common<(cudf::detail::sort_method)1>(cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x66e
#3: cudf/cpp/build/libcudf.so : cudf::detail::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x88
#4: cudf/cpp/build/libcudf.so : cudf::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::mr::device_memory_resource*)+0xb9
#5: cudf/cpp/build/gtests/SORT_TEST : ()+0xe3027
#6: cudf/cpp/build/lib/libgtest.so.1.13.0 : void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x8f
#7: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::Test::Run()+0xd6
#8: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestInfo::Run()+0x195
#9: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestSuite::Run()+0x109
#10: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::internal::UnitTestImpl::RunAllTests()+0x44f
#11: cudf/cpp/build/lib/libgtest.so.1.13.0 : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x87
#12: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::UnitTest::Run()+0x95
#13: cudf/cpp/build/gtests/SORT_TEST : ()+0xdb08c
#14: /lib/x86_64-linux-gnu/libc.so.6 : ()+0x29d90
#15: /lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0x80
#16: cudf/cpp/build/gtests/SORT_TEST : ()+0xdf3d5
```

### Usage

In order to retrieve a stacktrace with fully human-readable symbols, some compiler options must be adjusted. To make such adjustment convenient and effortless, a new CMake option (`CUDF_BUILD_STACKTRACE_DEBUG`) has been added. Just set this option to `ON` before building cudf and it will be ready to use. For downstream applications, whenever a cudf-type exception is thrown, they can retrieve the stored stacktrace and do whatever they want with it.
For example:

```
try {
  // cudf API calls
} catch (cudf::logic_error const& e) {
  std::cout << e.what() << std::endl;
  std::cout << e.stacktrace() << std::endl;
  throw e;
}
// similar with catching other exception types
```

### Follow-up work

The next step would be patching `rmm` to attach stacktraces to `rmm::` exceptions. Doing so will allow debugging various memory exceptions thrown from libcudf using their stacktraces.

### Note:

* This feature doesn't require libcudf to be built in Debug mode.
* The flag `CUDF_BUILD_STACKTRACE_DEBUG` should not be turned on in production as it may affect code optimization. Instead, libcudf compiled with that flag turned on should be used only when needed, when debugging cudf throwing exceptions.
* This flag removes the current optimization flag from compilation (such as `-O2` or `-O3`, if in Release mode) and replaces it with `-Og` (optimize for debugging).
* If this option is not set to `ON`, the stacktrace will not be available. This is to avoid expensive stacktrace retrieval when the thrown exception is expected.

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- AJ Schmidt (https://github.com/ajschmidt8)
- Robert Maynard (https://github.com/robertmaynard)
- Vyas Ramasubramani (https://github.com/vyasr)
- Jason Lowe (https://github.com/jlowe)

URL: #13298
Pin conda packages to `aws-sdk-cpp<1.11`. The recent upgrade to version `1.11.*` has caused several issues with cleanup (more details on the changes can be read in [this link](https://github.com/aws/aws-sdk-cpp#version-111-is-now-available)), leading Distributed and Dask-CUDA processes to segfault. The stack for one of those crashes looks like the following:

```
(gdb) bt
#0 0x00007f5125359a0c in Aws::Utils::Logging::s_aws_logger_redirect_get_log_level(aws_logger*, unsigned int) () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../.././libaws-cpp-sdk-core.so
#1 0x00007f5124968f83 in aws_event_loop_thread () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-io.so.1.0.0
#2 0x00007f5124ad9359 in thread_fn () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1
#3 0x00007f519958f6db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007f5198b1361f in clone () from /lib/x86_64-linux-gnu/libc.so.6
```

Such segfaults now manifest frequently in CI, and in some cases are reproducible with a hit rate of ~30%. Given the approaching release time, it's probably the safest option to just pin to an older version of the package while we haven't pinpointed the exact cause of the issue and a patched build is released upstream.

The `aws-sdk-cpp` library is statically linked in the `pyarrow` pip package, which prevents us from using the same pinning technique. cuDF is currently pinned to `pyarrow=12.0.1`, which seems to be built against `aws-sdk-cpp=1.10.*`, as per [recent build logs](https://github.com/apache/arrow/actions/runs/6276453828/job/17046177335?pr=37792#step:6:1372).

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Ray Douglass (https://github.com/raydouglass)

URL: #14173
… pandas columns (#3) Fixes: rapidsai/xdf#322 This PR raises an error when a pandas column with a mix of bools & None is detected, i.e., when a boolean column is of type object rather than bool/boolean.
I am wondering if it would be faster and more maintainable to handle serialization, data movement, and other interactions between this system and other platforms (e.g. MapD) in a C++ library that is not Python-specific. This would add some complexity to the build system for this project, but that seems inevitable given the nature of the problem (you'll end up needing to use nvcc at some point to create some support libraries).