MapD Thrift interactions in C++ (and C++ in general) #3

Closed · wesm opened this issue May 16, 2017 · 15 comments
Labels: proposal (Change current process or code)

Comments


wesm commented May 16, 2017

I am wondering if it would be faster and more maintainable to handle serialization, data movement, and other interactions between this system and other platforms (e.g. MapD) in a C++ library that is not Python specific. This would add some complexity to the build system for this project, but that seems inevitable given the nature of the problem (that you'll end up needing to use nvcc at some point to create some support libraries).


sklam commented May 17, 2017

Given that we want to leave room to explore how we do IPC, I don't think we want to commit to a lot of C++ code just yet. However, I think MapD (https://github.com/mapd/mapd-core) may already have that code. If that is the case, that would make things easier.

Btw, there will likely be other parts of pygdf that will need to be in C++ soon. But I don't think we need everything to be in C++. In the end, the heavy lifting is done in Numba-compiled and other GPU code.


wesm commented May 18, 2017

In the end, the heavy lifting is done in Numba-compiled and other GPU code.

This is a somewhat Python-centric view of things -- it would be interesting to define a core implementation that can be used from any program with reasonable C/C++ FFI. This includes Rust, Julia, Go, Lua/LuaJIT, etc. Such a core library would be very small (< 10 KLOC I would say) and focused on data structures, memory management / movement, and APIs for user-defined functions. Numba then could use this public API to plug in custom code written by the user.
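
For illustration only, a rough sketch of what the C surface of such a core library might look like (all names here are hypothetical, not an existing API):

```
/* Hypothetical FFI sketch for a small GDF core library; illustrative only. */
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

typedef struct gdf_column_ gdf_column;  /* opaque handle to a device-resident column */

/* Wrap existing device memory (e.g. received over CUDA IPC) without copying it. */
int gdf_column_view(gdf_column** out, void* device_data,
                    const uint8_t* device_valid_bits,
                    int64_t length, int dtype);

/* Drop a reference; the library ref-counts the underlying device buffers. */
int gdf_column_release(gdf_column* col);

/* Register a user-defined elementwise function by device function pointer or
 * PTX, so Numba (or any other JIT) can plug in user code through the same API. */
int gdf_register_unary_udf(const char* name, const void* device_fn);

#ifdef __cplusplus
}
#endif
```

Any language with a C FFI (Rust, Julia, Go, LuaJIT, ...) could bind those few entry points directly.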


m1mc commented May 18, 2017

MapD creates the IPC memory but always hands its ownership over to pygdf for now. I am all for a data exchange layer (that does ref-counting etc.) and maybe renaming pygdf to something else. As for UDFs, I am not sure whether the use case is language-independent, in which case we might need an ABI, or whether we would just link its LLVM bitcode.
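
As a very rough sketch of what the consumer side of such a ref-counted exchange layer could look like (a hypothetical helper, assuming the CUDA IPC handle has already arrived from MapD over Thrift):

```
// Sketch: ref-counted view of a device buffer imported via CUDA IPC (illustrative only).
#include <cuda_runtime_api.h>
#include <memory>

std::shared_ptr<void> import_ipc_buffer(cudaIpcMemHandle_t handle) {
  void* dev_ptr = nullptr;
  if (cudaIpcOpenMemHandle(&dev_ptr, handle,
                           cudaIpcMemLazyEnablePeerAccess) != cudaSuccess) {
    return nullptr;
  }
  // The deleter only closes this process's mapping once the last reference is
  // dropped; the exporting process (MapD) still owns the allocation itself.
  return std::shared_ptr<void>(dev_ptr,
                               [](void* p) { cudaIpcCloseMemHandle(p); });
}
```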


sklam commented May 18, 2017

Such a core library would be very small (< 10 KLOC I would say) and focused on data structures, memory management / movement, and APIs for user-defined functions.

I am open to the idea of a C++ core library, but it is too soon to commit to one. My approach to PyGDF now is to keep it simple and loosely coupled so things can evolve quickly. There are many technical questions about what the GDF will look like and all the management it needs. There are a lot more things to think about than with CPU code:

  • GPU device context ownership?
    • Many applications tend to assume sole ownership of all GPU devices and contexts.
  • GPU memory management
    • Sharing: unified memory? IPC memory? Peer GPU memory access? Managed lifetime?
    • Special memory: page-locked host memory? Mapped host memory?
    • Working with custom allocators in other GPU software.
  • Asynchronous execution queues (CUDA streams)
    • Who owns them? Who manages them?
    • Async execution could already be a challenge.
  • GPU errors
    • An error may segfault or corrupt the GPU context, forcing the process to terminate and relaunch to recover (see the error descriptions in the CUDA runtime docs). This makes it problematic for multiple applications to share the same process.

By focusing on the IPC front first, we avoid a lot of the technical questions above, since each process is isolated in its own process space. By using the Arrow and Flatbuffers toolsets, we can already share data among multiple languages. Given that the requirements for the GDF are still evolving, we should focus on the application side first to get a sense of the use cases. Then we can come back with a design for the C++ core library if it becomes necessary.
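
For reference, the producer side of that IPC exchange is small; a sketch with the CUDA runtime API (error checking omitted):

```
// Sketch: export a device buffer from the producing process (e.g. the database server).
#include <cuda_runtime_api.h>

cudaIpcMemHandle_t export_device_buffer(void* dev_ptr) {
  // dev_ptr points at device memory already holding the Arrow-formatted batch.
  cudaIpcMemHandle_t handle;
  cudaIpcGetMemHandle(&handle, dev_ptr);
  return handle;  // a small opaque value; send it to the consumer over any
                  // channel (e.g. the Thrift API), which maps it with
                  // cudaIpcOpenMemHandle in its own address space
}
```

Each process keeps its own CUDA context and address space; only the handle crosses the process boundary.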


wesm commented May 18, 2017

My approach to PyGDF now is to keep it simple and loosely coupled so things can evolve quickly.

Would it make sense to start drafting some design documents to lay out the requirements and other constraints / implementation considerations and tradeoffs? I think it would be worth doing some up front design work to build consensus around these questions. Among all the people here, there's a huge amount of domain expertise in doing analytics on the GPU, so I think this would be a productive use of time.

With Apache Arrow we spent a lot of time iterating on the specification documents and the mechanics of zero-copy IPC / the Flatbuffer metadata, and I think resolving many of the design questions up front has been a big help.


m1mc commented May 18, 2017

@sklam this is a good list of problems to think about (and it could be longer), but I don't want to see it overcomplicated. Basically, we can isolate the management layer or service well regardless of what the use cases are, and have all memory allocation/deallocation go through it, where probably only IPC memory works. As for error handling, that is another topic that can also be handled well; we just need parallelism in tasking.


sklam commented May 18, 2017

Would it make sense to start drafting some design documents to lay out the requirements and other constraints / implementation considerations and tradeoffs?

Yes, I think we should, as we want to be open about our design and implementation. Perhaps we should open a new repo under gpuopenanalytics to hold all the design docs and questions. At this stage, I guess the design note would just be "trying out Arrow on the GPU" =).

Also, I would like to get the core cross-language IPC code into its own repo. The Arrow Flatbuffers code does not need to be in this repo. And I don't want folks to think that PyGDF is The GDF.

@wesm, is there a recommended way to ship the generated python flatbuffer code?


sklam commented May 18, 2017

@m1mc , yes, we should focus on IPC memory first. It will make things a lot simpler.


wesm commented May 18, 2017

@wesm, is there a recommended way to ship the generated python flatbuffer code?

We can start a separate thread about this. Why would you want to reimplement the metadata serialization and IPC loading/unloading in Python (vs. simply using libarrow)? If there's some aspect of the libarrow API that's inadequate for this use case, I will be happy to scope it out and do some development to help this project.


wesm commented May 18, 2017

One thing I've thought about is defining some C structs to make interacting with raw Arrow memory from LLVM or C simpler. So you could have

```
#include <stdint.h>

typedef struct {
  int64_t length;
  int64_t null_count;
  const uint8_t* valid_bits;  /* validity bitmap, 1 bit per value */
  int type;
} arrow_base_t;

typedef struct {
  arrow_base_t base;
  const uint8_t* data;
} arrow_primitive_t;

typedef struct {
  arrow_base_t base;
  const int32_t* offsets;
  const uint8_t* data;
} arrow_string_t;
```

I don't know all the requirements so it would be great to start some design docs in Markdown or some other format.
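
As a purely illustrative example of the kind of user code this would enable (assuming `data` holds int32 values and the Arrow LSB-first validity bitmap convention):

```
#include <stddef.h>
#include <stdint.h>

/* Illustrative UDF: sum the non-null values of an int32 primitive array. */
int64_t sum_int32_nonnull(const arrow_primitive_t* col) {
  const int32_t* values = (const int32_t*)col->data;
  int64_t sum = 0;
  for (int64_t i = 0; i < col->base.length; ++i) {
    int valid = col->base.valid_bits == NULL ||
                ((col->base.valid_bits[i / 8] >> (i % 8)) & 1);
    if (valid) sum += values[i];
  }
  return sum;
}
```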


sklam commented May 18, 2017

@wesm

Why would you want to reimplement the metadata serialization and IPC loading/unloading in Python (vs. simply using libarrow)? If there's some aspect of the libarrow API that's inadequate for this use case, I will be happy to scope it out and do some development to help this project.

I wouldn't want to reimplement anything that is already available. But there are some complications due to the metadata and data being in a single CUDA IPC memory region: it cannot be accessed directly through a normal CPU pointer, and we need to keep it on the GPU to avoid transfers to host (CPU) memory. Is libarrow willing to include CUDA support?


sklam commented May 18, 2017

(adding to my previous comment)

PyGDF only has the IPC reading part. The metadata is the only portion copied back to the host; the data is kept on the device. The reader tries to minimize device-to-host transfers.
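
A minimal sketch of that pattern, assuming the region starts with Arrow's usual 4-byte metadata-length prefix followed by the Flatbuffers metadata and then the data buffers (layout details hypothetical, error checking omitted):

```
// Sketch: copy only the metadata prefix back to the host; leave the data on the device.
#include <cuda_runtime_api.h>
#include <cstdint>
#include <vector>

std::vector<uint8_t> read_metadata_prefix(const uint8_t* dev_base) {
  int32_t meta_len = 0;
  cudaMemcpy(&meta_len, dev_base, sizeof(meta_len), cudaMemcpyDeviceToHost);

  std::vector<uint8_t> metadata(meta_len);
  cudaMemcpy(metadata.data(), dev_base + sizeof(meta_len), meta_len,
             cudaMemcpyDeviceToHost);
  // Parse `metadata` on the host to recover buffer offsets/lengths, then apply
  // those offsets to dev_base so the column data itself never leaves the GPU.
  return metadata;
}
```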

I am interested to know how MapD implemented the IPC serialization part. (ping @m1mc)


m1mc commented May 18, 2017

Since Arrow (up to 0.3.0) only has a CPU serializer, we are keeping and serializing results in system memory and uploading them to the device for now. Otherwise, we would have had to come up with our own GPU serializer in the short term, but that should be easy to do by generating an extra null bitmap in a kernel. BTW, PyGDF should have gotten separate metadata through the Thrift API.
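
A device-side bitmap builder along those lines could be as simple as this kernel sketch (hypothetical, not MapD's actual code; one bit per value, LSB-first within each byte as in Arrow):

```
// Sketch: pack a per-element validity flag into an Arrow-style null bitmap on the GPU.
#include <cuda_runtime.h>
#include <cstdint>

__global__ void build_null_bitmap(const uint8_t* is_valid,  // 0/1 flag per element
                                  uint8_t* bitmap,          // ceil(n / 8) bytes
                                  int64_t n) {
  int64_t byte_idx = (int64_t)blockIdx.x * blockDim.x + threadIdx.x;
  int64_t first = byte_idx * 8;
  if (first >= n) return;
  uint8_t packed = 0;
  for (int bit = 0; bit < 8 && first + bit < n; ++bit) {
    if (is_valid[first + bit]) packed |= (uint8_t)(1u << bit);
  }
  bitmap[byte_idx] = packed;  // each thread owns one output byte, so no atomics needed
}
```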


sklam commented May 19, 2017

BTW, PyGDF should have gotten separate metadata through the Thrift API.

Yes. I just picked the one in the IPC memory. It doesn't have to be that way. The only reason for the current approach is that a single memory region feels more consistent. Probably not a strong reason.

Depending on how the serialization is done, we can just keep one copy of the metadata instead of duplicating it on both host and device.

FYI, besides the metadata for the Schema, there are small bits of metadata inside the RecordBatch headers that need to be parsed on the host.


wesm commented May 19, 2017

I would like to start a libarrow_cuda add-on library in the Arrow codebase and try to get some simple IPC loading tests running on the host with the existing codebase to see if there are any issues. If others would like to get involved, see https://issues.apache.org/jira/browse/ARROW-1055.
