Collaborations on columnar data structures #1
Hi @wesm, thanks for this. Yes, we are excited about Arrow (even though we only support a subset at the moment) because it provides interoperability with lots of other things and makes sense as a way to represent columnar data. I don't see any reason why it should not perform well on the GPU, as the MapD native format is quite similar (except that we store nulls in-line when possible to save space and bandwidth). Would it make sense to set up a call with the project members so we can discuss ways to collaborate?
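To make the "nulls in-line" remark concrete, here is a minimal Python sketch of one way it could work. It assumes that in-line means reserving a sentinel value inside the data buffer itself; the comment above does not spell out MapD's actual encoding, and `INT32_NULL` and the function names are hypothetical.

```python
from array import array

# Hypothetical sentinel: the smallest 32-bit integer stands for "null".
# Assumption for illustration only; the thread does not name the value used.
INT32_NULL = -2**31


def encode_inline(values):
    """Store a list of ints/None in one buffer, with nulls as a sentinel."""
    for v in values:
        if v == INT32_NULL:
            raise ValueError("value collides with the null sentinel")
    # array("i") is a C int buffer (32-bit on common platforms).
    return array("i", [INT32_NULL if v is None else v for v in values])


def decode_inline(column):
    """Turn sentinel slots back into None."""
    return [None if v == INT32_NULL else v for v in column]


def sum_skip_nulls(column):
    """Aggregate while skipping null slots; no second buffer is consulted."""
    return sum(v for v in column if v != INT32_NULL)


col = encode_inline([3, None, 7])
print(decode_inline(col), sum_skip_nulls(col))
```

The trade-off in this sketch is that no separate validity buffer has to be stored or transferred, at the cost of giving up one representable value per type.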
That sounds good to me. Adding @julienledem @xhochy since they will be interested, and maybe others from the Apache Arrow team. I am interested in
As background, I did some GPU development for accelerating Bayesian inference problems years ago and did a fair amount of CUDA C and PyCUDA work, so I've had a long-standing interest in architecting data structures and memory access patterns for the GPU.
Bingo on all fronts; these are all things mentioned in the talk I gave last week at GTC. We also have some basic work to do to support the rest of the data types (the prototype handled only simple, uncompressed numerics to keep it simple).
Does the GPU benefit from columnar compression techniques like CPU-based columnar databases do?
@wesm, we already have some in the core engine, like dictionary compression. We are also planning to tokenize any string column that contains only digits to save memory, but these don't need to be columnar if you just mean something like RLE or HCC. All of these approaches aim to keep GPU decoding fast.
PFOR and run-length also planned, not yet done.
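For readers unfamiliar with the two simplest techniques named in these replies, the following is a small pure-Python sketch of dictionary encoding and run-length encoding. It illustrates the general ideas only and is not the engine's implementation; all function names are made up for this example.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code into a dictionary."""
    dictionary = []   # code -> value
    index = {}        # value -> code
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Decoding is a plain gather, which maps well onto parallel hardware."""
    return [dictionary[c] for c in codes]


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


cities = ["NY", "SF", "NY", "NY", "LA"]
dictionary, codes = dictionary_encode(cities)
print(dictionary, codes)
print(rle_encode([0, 0, 0, 1, 1, 0]))
```

Dictionary decoding touches each output slot independently, while run-length decoding needs the run offsets first (typically a prefix sum), which is one reason the two differ in how easily they decode on a GPU.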
* np array solution * cleanup * np solution for division * full reflected ops tests
* adding eq datetime ops for pygdf * flake8 fixes * Drop Python 2.7, Add Python 3.7 * removing int coercion for datetime * Remove Python 3.7 build * bumping numba * forgot to commit meta.yaml changes * flake8 * commutative addition * commutative subtraction and multiplication * reflected floordiv and truediv * cleanup * stray comment * change rsub method * further testing rsub * rsub docstring * revert back * type coercion * revert to pseudo-commutative implementation * commutative ops tests * test comment cleanup * Feature/reflected ops noncommutative testing (#1) * np array solution * cleanup * np solution for division * full reflected ops tests * cleanup * switching lambda scalar to 2 * Update README.md Conda installation instruction needed changes with pygdf version. * Feature/reflected ops update (#2) * test binary_operator * test one line * essentially use _binaryop with a line flipped * expand to all non commutative reflected ops * revert rmul * Feature/reflected ops update (#3) * test binary_operator * test one line * essentially use _binaryop with a line flipped * expand to all non commutative reflected ops * revert rmul * rbinaryop function for clarity * add scalar to array generation to avoid division by zero behavior * remove integer division test due to libgdf bug * Fix timezone issue when converting from datetime object into datetime64 * Remove unused import to fix flake8 * Initial modifications for new join API
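Several of the commit notes above concern reflected operators ("essentially use _binaryop with a line flipped", "rbinaryop function for clarity"). The sketch below shows that general pattern on a toy, list-backed `Column` class. The class and `_operands` helper are hypothetical and this is not the pygdf code; only the names `_binaryop` and `_rbinaryop` echo the commit messages.

```python
import operator


class Column:
    """Toy column backed by a Python list (illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def _operands(self, other):
        # Broadcast a scalar to the column length, or unwrap another Column.
        if isinstance(other, Column):
            if len(other.values) != len(self.values):
                raise ValueError("length mismatch")
            return other.values
        return [other] * len(self.values)

    def _binaryop(self, other, fn):
        # self is the left operand: fn(self, other)
        return Column(fn(a, b) for a, b in zip(self.values, self._operands(other)))

    def _rbinaryop(self, other, fn):
        # Same as _binaryop with the operands flipped: fn(other, self)
        return Column(fn(b, a) for a, b in zip(self.values, self._operands(other)))

    # Commutative ops can reuse the forward implementation.
    def __add__(self, other):
        return self._binaryop(other, operator.add)

    __radd__ = __add__

    # Non-commutative ops need the flipped version.
    def __sub__(self, other):
        return self._binaryop(other, operator.sub)

    def __rsub__(self, other):
        return self._rbinaryop(other, operator.sub)

    def __truediv__(self, other):
        return self._binaryop(other, operator.truediv)

    def __rtruediv__(self, other):
        return self._rbinaryop(other, operator.truediv)

    def __floordiv__(self, other):
        return self._binaryop(other, operator.floordiv)

    def __rfloordiv__(self, other):
        return self._rbinaryop(other, operator.floordiv)


print((10 - Column([1, 2, 3])).values)
print((Column([1, 2, 3]) - 10).values)
```

Python calls `__rsub__` only after the left operand's `__sub__` returns `NotImplemented`, which is what happens when the left operand is a plain `int` and the right one is a `Column`.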
Fea/cudf empty groubpy cont
* ENH: Support `GDF_BOOL8` in Parquet reader * Translate `parquet::BOOLEAN` to `GDF_BOOL8` * Remove no-longer-necessary type conversion in pytests * ENH: Support `GDF_BOOL8` in CSV reader * Add new type detection count for booleans * Add `true` and `false` as detected boolean values * Add extra pytest case for bools * ENH: Support `GDF_BOOL8` in CSV writer * Uncomment lines to call nvstring to convert booleans to string * Fix wrong order of na, true, false arguments * Add boolean column to gtest * Update CHANGELOG.md for PR * first draft of inequality_comparator to replace LesserRTTI * fixed build issues * Added optimization for inequality comparator so that it is faster if there are no nulls * ENH: Support `GDF_BOOL8` in ORC reader * Translate `orc::BOOLEAN` to `GDF_BOOL8` * Remove no-longer-necessary type conversion in pytests * WIP and compiles * split nulls and non nulls operators. This increases compile time. Commiting here anyways for historical reasons * fixed inequality_comparator, updated group by to use new equality_comparator, removed no longer needed null handling flag from context * cleaned up * updated CHANGELOG * improved formatting and added changes that somehow did not make it in a previous commit * fixed issues caused by formatting in previous commit * Fix issue by handling multiindex in series groupby * CHANGELOG * CHANGELOG again * refactored based on PR feedback and added more code documentation * Change output datatype for count groupby to np.int32 * Add assert to ensure count() datatype is updated if gdf_size_type changes * Add changelog entry * Don't check for dtype when doing groupby-count * Split device_atomics.cuh file split the file into `device_atomics.cuh` and `device_operators.cuh` separated the difinition of the device operators * Remove atomicCASImpl(int8 or int16) move atomicCASImpl(int8 or int16) into typesAtomicCASImpl * Simplify `atomicAdd` * Simplify atomicMin, atomixMax simplify atomicMin, atomixMax add cudf::bool8 for atomic 
test case for atomicAdd,Min,Max add cudf::bool8 specialization for genericAtomicOperation * Add more test coverage * Simplify atomicAnd/Or/Xor * Removed `genericAtomicOperationUnderlyingType` * Remove `typesAtomicOperation32|64` * Update doxygen texts for atomics * Add '__forceinline__ __device__' Add '__forceinline__ __device__' to `W genericAtomicOperator(W)` * add static_assert for long long int size Add size check assert between `long long int` and `int64_t` * remove redundant `sizeof(T)` from `CASImpl` remove redundant `sizeof(T)` when calling 'typesAtomicCASImpl` * remove redundant `sizeof(T)` from `atomic op impl` remove redundant `sizeof(T)` when calling 'genericAtomicOperationImpl` * Add `genericAtomicOperationImpl(int64_t, Sum)` Add native atomicAdd(uint64_t) call for sint64_t * Add comment for impl of atomicAdd(int64_t) Add comment for `genericAtomicOperationImpl<int64_t, DeviceSum, 8>` why it uses atomicAdd(uint64) inside * Removed `genericAtomicOperation(W)` Removed `genericAtomicOperation(W)` since it is not invoked for cudf::wrapper types. Merged it into `genericAtomicOperation(T)` Add size check assert at `type_reinterpret`. * CHANGELOG. * CHANGELOG. * Don't check for dtype when doing groupby-count in test_string.pu * Update CHANGELOG.md Co-Authored-By: Keith Kraus <[email protected]> * update changelog * Concatenate multiindexes. * Use temporary buffer for `NvString:create_from_bool` for GDF_BOOL8 * No guarantee that `cudf::bool8` and `bool` are same type for cast * Insane MultiIndex _concat method and many supporting tests. * Fix style and CHANGELOG * Add two more inverse tests. * CSV reader: support specifying a subset of dtypes when using usecols parameter. Include C++ API changes * Complete the support for partial dtype list w/ usecols. Expanded the test and refactored the dtype assignment. * remove unused include * Update CHANGELOG.md * fix Python style * Fix error cheking when setting the dtype array. 
* implemented PR feedback * Fix handling read only schema buffers in gpuarrow reader * Changelog # * Remove insane multiindex concatenation block and raise NotImplementedError instead. * typo fixed * Use one line list comprehension and eliminate shallow copies now that the _concat is not performed with levels/codes multiindices. * One more single line fix * Change the dtype behavior with usecols and list dtype parameter - user need to specify all column types, not just the active ones. * format fix * correct a comment * Handle more generalized numpy input instead of forcing unsigned char * Change test to use foreign memory similar to OmniSci * Update CHANGELOG.md * fixed build issue * REL v0.7.2 release * fix groupby count dtype issue * maintain the original series name in series.unique output * changelog * ENH: Add test for cudf::bool8 in booleans gtest * Fill mask with zeros when making a null column * Fix merge * Merge branch 'ohe-perf' of https://github.com/RFinkelberg/cudf into ohe-perf # Please enter a commit message to explain why this merge is necessary, # especially if it merges an updated upstream into a topic branch. # # Lines starting with '#' will be ignored, and an empty message aborts # the commit. * Some cleanup of bindings and fixes for s_v and v_s binops
Minor cudf serialization improvements
Fix for missing series index in isin
add factory methods for DType
1. Output struct size must match target column, not source. 2. Test case for rapidsai#1. 3. Also, move scatter_struct_tests and scatter_list_tests from .cu to .cpp, for faster compile.
Fix style checks for string_udfs.
…untime-checks Remove runtime checks for CUDA versions from strings_udf
This implements stacktrace and adds a stacktrace string into any exception thrown by cudf. By doing so, the exception carries information about where it originated, allowing the downstream application to trace back with much less effort.

Closes #12422.

### Example:
```
#0: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::sorted_order<false>(cudf::table_view, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x446
#1: cudf/cpp/build/libcudf.so : cudf::detail::sorted_order(cudf::table_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x113
#2: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::segmented_sorted_order_common<(cudf::detail::sort_method)1>(cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x66e
#3: cudf/cpp/build/libcudf.so : cudf::detail::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x88
#4: cudf/cpp/build/libcudf.so : cudf::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::mr::device_memory_resource*)+0xb9
#5: cudf/cpp/build/gtests/SORT_TEST : ()+0xe3027
#6: cudf/cpp/build/lib/libgtest.so.1.13.0 : void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x8f
#7: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::Test::Run()+0xd6
#8: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestInfo::Run()+0x195
#9: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestSuite::Run()+0x109
#10: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::internal::UnitTestImpl::RunAllTests()+0x44f
#11: cudf/cpp/build/lib/libgtest.so.1.13.0 : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x87
#12: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::UnitTest::Run()+0x95
#13: cudf/cpp/build/gtests/SORT_TEST : ()+0xdb08c
#14: /lib/x86_64-linux-gnu/libc.so.6 : ()+0x29d90
#15: /lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0x80
#16: cudf/cpp/build/gtests/SORT_TEST : ()+0xdf3d5
```

### Usage
To retrieve a stacktrace with fully human-readable symbols, some compile options must be adjusted. To make that adjustment convenient, a new cmake option (`CUDF_BUILD_STACKTRACE_DEBUG`) has been added. Set this option to `ON` before building cudf and it will be ready to use.

In a downstream application, whenever a cudf-type exception is thrown, the application can retrieve the stored stacktrace and do whatever it wants with it. For example:
```
try {
  // cudf API calls
} catch (cudf::logic_error const& e) {
  std::cout << e.what() << std::endl;
  std::cout << e.stacktrace() << std::endl;
  throw;  // rethrow the original exception (`throw e;` would throw a copy)
}
// similar with catching other exception types
```

### Follow-up work
The next step would be patching `rmm` to attach a stacktrace to `rmm::` exceptions. Doing so will allow debugging various memory exceptions thrown from libcudf using their stacktrace.

### Note:
* This feature doesn't require libcudf to be built in Debug mode.
* The flag `CUDF_BUILD_STACKTRACE_DEBUG` should not be turned on in production, as it may affect code optimization. Instead, a libcudf compiled with that flag turned on should be used only when needed, when debugging exceptions thrown by cudf.
* This flag removes the current optimization flag from compilation (such as `-O2` or `-O3`, if in Release mode) and replaces it with `-Og` (optimize for debugging).
* If this option is not set to `ON`, the stacktrace will not be available. This avoids expensive stacktrace retrieval when the thrown exception is expected.

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- AJ Schmidt (https://github.com/ajschmidt8)
- Robert Maynard (https://github.com/robertmaynard)
- Vyas Ramasubramani (https://github.com/vyasr)
- Jason Lowe (https://github.com/jlowe)

URL: #13298
Pin conda packages to `aws-sdk-cpp<1.11`. The recent upgrade to version `1.11.*` has caused several issues with cleaning up (more details on the changes can be read in [this link](https://github.com/aws/aws-sdk-cpp#version-111-is-now-available)), causing Distributed and Dask-CUDA processes to segfault. The stack for one of those crashes looks like the following:
```
(gdb) bt
#0 0x00007f5125359a0c in Aws::Utils::Logging::s_aws_logger_redirect_get_log_level(aws_logger*, unsigned int) () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../.././libaws-cpp-sdk-core.so
#1 0x00007f5124968f83 in aws_event_loop_thread () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-io.so.1.0.0
#2 0x00007f5124ad9359 in thread_fn () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1
#3 0x00007f519958f6db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007f5198b1361f in clone () from /lib/x86_64-linux-gnu/libc.so.6
```
Such segfaults now manifest frequently in CI, and in some cases are reproducible with a hit rate of ~30%. Given the approaching release time, the safest option is probably to pin to an older version of the package until we pinpoint the exact cause of the issue and a patched build is released upstream.

The `aws-sdk-cpp` library is statically linked in the `pyarrow` pip package, which prevents us from using the same pinning technique there. cuDF is currently pinned to `pyarrow=12.0.1`, which seems to be built against `aws-sdk-cpp=1.10.*`, as per [recent build logs](https://github.com/apache/arrow/actions/runs/6276453828/job/17046177335?pr=37792#step:6:1372).

Authors:
- Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Ray Douglass (https://github.com/raydouglass)

URL: #14173
Excited to see this new org created. I am interested to see if Apache Arrow (i.e. contiguous columnar data, validity bitmap for nulls) is the appropriate data model for data on the GPU, and if we can collaborate on some aspects of the code. It seems that CUDA 7 now supports C++11, so in theory we could compile the Arrow C++ libraries with nvcc and provide necessary APIs to enable Numba to interact with the raw memory buffers. This might simplify IPC with GPU main memory (record batch loading and unloading) and make less work for you here. I have an NVIDIA GPU on my home desktop, so I could help with testing.
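As a rough illustration of the data model mentioned here (a contiguous value buffer plus a validity bitmap for nulls), the pure-Python sketch below packs one bit per slot, least-significant bit first, with a set bit meaning "valid", as in the Arrow columnar format. The function names are hypothetical, and writing `0` into null slots is an assumption made for this example; the value stored in a null slot carries no meaning.

```python
def pack_validity(values):
    """Split a list with None into (data, validity bitmap)."""
    data = [0 if v is None else v for v in values]  # null slots hold a filler
    bitmap = bytearray((len(values) + 7) // 8)      # one bit per slot
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)          # LSB-first bit numbering
    return data, bytes(bitmap)


def is_valid(bitmap, i):
    """True if slot i holds a real value rather than a null."""
    return (bitmap[i // 8] >> (i % 8)) & 1 == 1


def unpack_validity(data, bitmap):
    """Rebuild the original list, restoring None for null slots."""
    return [v if is_valid(bitmap, i) else None for i, v in enumerate(data)]


values = [1, None, 3, None, 5, 6, 7, 8, None, 10]
data, bitmap = pack_validity(values)
print(data, bitmap.hex())
```

Ten slots need only two bitmap bytes here, and both buffers are flat and contiguous, so they can be copied to device memory without any per-element translation.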