
[DO NOT MERGE] FST benchmark #1


karthikeyann
Collaborator

DO NOT MERGE. This PR exists only to compare the diff.
It adds a benchmark for the Finite State Transducer, which parses and identifies JSON symbols.

vyasr and others added 19 commits July 20, 2022 23:30
This resolves rapidsai#8104. Note that that issue also requests an update to requirements.txt files, but those no longer exist.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - https://github.com/brandon-b-miller
  - H. Thomson Comer (https://github.com/thomcom)

URL: rapidsai#11306
In my endless wandering through parquet code, I found this unused code. Removing it.

Authors:
  - Mike Wilson (https://github.com/hyperbolic2346)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Yunsong Wang (https://github.com/PointKernel)

URL: rapidsai#11305
This just adds in a simple JNI binding for the join_strings cudf function.

Authors:
  - Robert (Bobby) Evans (https://github.com/revans2)

Approvers:
  - Alessandro Bellina (https://github.com/abellina)
  - Raza Jafri (https://github.com/razajafri)
  - Jason Lowe (https://github.com/jlowe)

URL: rapidsai#11309
Closes rapidsai#10994.

This PR removes the Arrow CUDA-IPC related code we have, which has two benefits:

1. It deletes code (I have confirmed that no one uses this code today)
2. It removes our dependency on Arrow CUDA, which contributes towards removing our shared lib dependency on `libcuda.so`

Authors:
  - Ashwin Srinath (https://github.com/shwina)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Matthew Roeschke (https://github.com/mroeschke)
  - Bradley Dice (https://github.com/bdice)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - https://github.com/jakirkham

URL: rapidsai#10995
…i#11252)

When handling list columns in the parquet reader, we have to run a preprocess step that computes several things per-page before we can decode values. If the user has further specified artificial row bounds (`skip_rows`, `num_rows`), we have to do a second pass during the preprocess step.

If the user has _not_ specified row bounds, there is no need for this; however, the code was naively always doing it. This PR simply detects when we're reading all rows (which is 99% of use cases) and skips the second pass.

Also includes some cleanup of redundant stream synchronizations.

Also worth mentioning, this `skip_rows`/`num_rows` feature is going to be deprecated in 22.08 so we will be able to follow up further in 22.10 to rip more of this code out.

Authors:
  - https://github.com/nvdbaranec

Approvers:
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Jim Brennan (https://github.com/jbrennan333)
  - Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#11252
This PR:

- [x] Deprecates `skiprows` & `num_rows` in the cudf parquet reader (`cudf.read_parquet`), since these parameters add a lot of overhead in the case of nested types and are not supported in `pd.read_parquet`

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Richard (Rick) Zamora (https://github.com/rjzamora)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#11218
Fixes: rapidsai#11256 
This PR fixes an issue with type casting when non-numpy dtypes are passed into the column constructor.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)

URL: rapidsai#11282
Closes rapidsai#10948 

Adds support for dictionary encoding with 24 bit indices.

Authors:
  - Devavret Makkar (https://github.com/devavret)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: rapidsai#11216
This change should make the test fail reliably, whereas the current approach is flaky and leads to not infrequent test failures.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Jake Hemstad (https://github.com/jrhemstad)
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#11326
This PR completely removes `cudf::lists::drop_list_duplicates`. It is replaced by the new API `cudf::lists::distinct`, which has a simpler implementation and better performance. The replacements for internal cudf usage have all been merged beforehand, so this work has no side effects and does not break any existing APIs.

Closes rapidsai#11114, rapidsai#11093, rapidsai#11053, rapidsai#11034, and closes rapidsai#9257.

Depends on:
 * rapidsai#11228
 * rapidsai#11149
 * rapidsai#11234
 * rapidsai#11233

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Jordan Jacobelli (https://github.com/Ethyling)
  - Robert Maynard (https://github.com/robertmaynard)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#11236
This PR adds a parallel _Finite-State Transducer_ (FST) algorithm. The FST is a key component of the nested JSON parser.

# Background


**An example of a Finite-State Transducer (FST), i.e., the algorithm we aim to mimic**:
[Slides from the JSON parser presentation, Slides 11-17](https://docs.google.com/presentation/d/1NTQdUMM44NzzHxLNnvcGLQk6pI-fdoM3cXqNqushMbU/edit?usp=sharing)

## Our GPU-based implementation
**The GPU-based algorithm builds on the following work:**
[ParPaRaw: Massively Parallel Parsing of Delimiter-Separated Raw Data](https://arxiv.org/pdf/1905.13415.pdf)

**The following sections are of relevance:**
- Section 3.1
- Section 4.5 (i.e., the Multi-fragment in-register array)

**How the algorithm works is illustrated in the following presentation:**
[ParPaRaw @VlLDB'20](https://eliasstehle.com/media/parparaw_vldb_2020.pdf#page=21)

## Relevant Data Structures
**A word about the motivation and need for the _Multi-fragment in-register array_:**

The composition of two state-transition vectors is a key operation in the prefix scan. For two state-transition vectors `lhs` and `rhs`, both comprising `N` items, it computes:
```
for (int32_t i = 0; i < N; ++i) {
  result[i] = rhs[lhs[i]];
}
return result;
```


The relevant part is the indexing into `rhs`: `rhs[lhs[i]]`, i.e., the index is `lhs[i]`, a runtime value that isn't known at compile time. It's important to understand that in CUB's prefix scan both `rhs` and `lhs` are thread-local variables. As such, they live either in the fast register file or in (slow, off-chip) local memory.
The register file has a shortcoming: it cannot be indexed dynamically. And here, we are dynamically indexing into `rhs`. So `rhs` would need to be spilled to local memory (backed by device memory) to allow for dynamic indexing, which would usually make the algorithm very slow. That's why we have the _Multi-fragment in-register array_. For its implementation details, I suggest reading [Section 4.5](https://arxiv.org/pdf/1905.13415.pdf).
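To make the composition concrete, here is a host-side C++ sketch (plain `std::array`, not the CUB thread-local fragments) of the operator; the state count `N = 4` is just an illustrative value, not the real parser's:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative state count; the real parser's DFA has a different number of states.
constexpr int32_t N = 4;
using StateVector = std::array<int32_t, N>;

// Compose two state-transition vectors: starting in state i, take the
// transition recorded in lhs, then the one recorded in rhs.
StateVector compose(StateVector const& lhs, StateVector const& rhs)
{
  StateVector result{};
  for (int32_t i = 0; i < N; ++i) {
    result[i] = rhs[lhs[i]];
  }
  return result;
}
```

Because `compose` is associative (though not commutative), a scan may combine per-symbol transition vectors in any bracketing, which is exactly the property the parallel prefix scan exploits.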

In contrast, the following example is fine and `foo` will be mapped to registers, because the loop can be fully unrolled when `N` is known at compile time and sufficiently small (at most tens of items):
```
// this is fine, if N is a compile-time constant
for (int32_t i = 1; i < N; ++i) {
  foo[i] = foo[i-1];
}
```

# Style & CUB Integration

The following files may be considered for integration into CUB at a later point, hence their deviation in style from cuDF.

- `in_reg_array.cuh`
- `agent_dfa.cuh`
- `device_dfa.cuh`
- `dispatch_dfa.cuh`

Authors:
  - Elias Stehle (https://github.com/elstehle)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Tobias Ribizel (https://github.com/upsj)
  - Karthikeyan (https://github.com/karthikeyann)

URL: rapidsai#11242
…imate of sizes (rapidsai#11288)

This is a possible workaround for issue rapidsai#11280.  We have a goal to support NVCOMP ZSTD in 22.08, so a short-term fix is desired.

There is a heuristic in `gpuParseCompressedStripeData` to estimate the size of the decompress buffer for very small compressed blocks.  For ZSTD, it is possible to have a high enough compression ratio that this heuristic underestimates the needed decompress size.

This PR adds a boolean parameter to allow us to disable the block size estimate for ZSTD. When the estimate is disabled, it falls back to the maximum block size, which is guaranteed to be big enough.
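The fallback can be sketched as follows (hypothetical names and an illustrative compression ratio; the real heuristic lives in `gpuParseCompressedStripeData` and differs in detail):

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical sketch, not the actual cuDF code: a decompressed ORC block is
// never larger than max_block_size, so that bound is always safe to use,
// just potentially wasteful.
std::size_t decompressed_size_bound(std::size_t compressed_size,
                                    std::size_t max_block_size,
                                    bool use_estimate)
{
  if (!use_estimate) { return max_block_size; }  // safe upper bound (ZSTD path)
  // Heuristic for small blocks: assume a bounded compression ratio
  // (illustrative factor of 4), capped at the maximum block size.
  return std::min(max_block_size, compressed_size * 4);
}
```

With the estimate disabled, a highly compressed ZSTD block can no longer overflow its decompress buffer, at the cost of allocating the maximum size for every block.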

cc: @devavret, @vuule

Authors:
  - Jim Brennan (https://github.com/jbrennan333)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#11288
Resolves rapidsai#3036 by making `make test` or `ninja test` default to showing output when tests fail.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#11321
I recently revamped our cuDF [CONTRIBUTING guide](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md). I would like to consider replacing the current PR template (which has a fairly daunting amount of text that is immediately deleted by many contributors) with a short checklist of actionable items and a reference to the CONTRIBUTING guide for longer content.

I kept this draft very minimal. Reviewers can see other examples here for inspiration: https://axolo.co/blog/p/part-3-github-pull-request-template. Happy to crowdsource others' thoughts here.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Jake Hemstad (https://github.com/jrhemstad)
  - Karthikeyan (https://github.com/karthikeyann)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#10774
…ins` (rapidsai#11330)

The current implementation of `cudf::detail::contains` can process input with arbitrary nested types. However, it was reported to have a severe performance issue when the input tables have many duplicate rows (rapidsai#11299). In order to fix the issue, rapidsai#11310 and rapidsai#11325 were created.

Unfortunately, rapidsai#11310 separates semi-anti-join from `cudf::detail::contains`, causing duplicated implementation. On the other hand, rapidsai#11325 addresses issue rapidsai#11299, but a semi-anti-join built on it still performs worse than the previous semi-anti-join implementation.

The changes in this PR include the following:
 * Fix the performance issue reported in rapidsai#11299 for the current `cudf::detail::contains` implementation that supports nested types.
 * Add a separate code path into `cudf::detail::contains` such that:
     * Input without a lists column (at any nesting level) is processed by a code path identical to the old semi-anti-join implementation. This makes sure the performance of semi-anti-join remains the same as before.
     * Input with a nested lists column, or with NaNs compared as unequal, is processed by another code path that supports nested types and the different NaN behavior. This makes sure support for nested types is not dropped.
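The path selection described above can be sketched roughly like this (hypothetical names; not the actual libcudf code):

```cpp
// Hypothetical sketch of the two-path dispatch; the real logic lives
// inside cudf::detail::contains.
enum class contains_path { hash_based, nested_aware };

// Inputs with a lists column at any nesting level, or inputs that must
// compare NaNs as unequal, need the slower nested-aware path; everything
// else uses the old semi-anti-join-style hash-based path.
contains_path select_contains_path(bool has_nested_lists, bool nans_unequal)
{
  if (has_nested_lists || nans_unequal) { return contains_path::nested_aware; }
  return contains_path::hash_based;
}
```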

Closes rapidsai#11299.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Bradley Dice (https://github.com/bdice)
  - MithunR (https://github.com/mythrocks)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Alessandro Bellina (https://github.com/abellina)

URL: rapidsai#11330
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Jul 26, 2022
Depends on #11242 Feature/finite state transducer 

Benchmark for Finite State Transducer
parse and identify JSON symbols
- [x] FST with output, output index, output str
- [x] FST without output index
- [x] FST without output
- [x] FST without output str

See elstehle#1 for the files modified only in this PR (i.e., excluding the parent PR it depends on)

Authors:
  - Karthikeyan (https://github.com/karthikeyann)
  - Elias Stehle (https://github.com/elstehle)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Elias Stehle (https://github.com/elstehle)

URL: #11243
@karthikeyann karthikeyann deleted the fst_benchmark branch July 26, 2022 09:50
@karthikeyann karthikeyann restored the fst_benchmark branch August 4, 2022 16:25
elstehle pushed a commit that referenced this pull request Jun 13, 2023
This implements stacktrace support and adds a stacktrace string into any exception thrown by cudf. By doing so, the exception carries information about where it originated, allowing downstream applications to trace back the error with much less effort.

Closes rapidsai#12422.

### Example:
```
#0: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::sorted_order<false>(cudf::table_view, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x446
#1: cudf/cpp/build/libcudf.so : cudf::detail::sorted_order(cudf::table_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x113
#2: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::segmented_sorted_order_common<(cudf::detail::sort_method)1>(cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x66e
#3: cudf/cpp/build/libcudf.so : cudf::detail::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x88
#4: cudf/cpp/build/libcudf.so : cudf::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::mr::device_memory_resource*)+0xb9
rapidsai#5: cudf/cpp/build/gtests/SORT_TEST : ()+0xe3027
rapidsai#6: cudf/cpp/build/lib/libgtest.so.1.13.0 : void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x8f
rapidsai#7: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::Test::Run()+0xd6
rapidsai#8: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestInfo::Run()+0x195
rapidsai#9: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestSuite::Run()+0x109
rapidsai#10: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::internal::UnitTestImpl::RunAllTests()+0x44f
rapidsai#11: cudf/cpp/build/lib/libgtest.so.1.13.0 : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x87
rapidsai#12: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::UnitTest::Run()+0x95
rapidsai#13: cudf/cpp/build/gtests/SORT_TEST : ()+0xdb08c
rapidsai#14: /lib/x86_64-linux-gnu/libc.so.6 : ()+0x29d90
rapidsai#15: /lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0x80
rapidsai#16: cudf/cpp/build/gtests/SORT_TEST : ()+0xdf3d5
```

### Usage

In order to retrieve a stacktrace with fully human-readable symbols, some compiling options must be adjusted. To make such adjustment convenient and effortless, a new cmake option (`CUDF_BUILD_STACKTRACE_DEBUG`) has been added. Just set this option to `ON` before building cudf and it will be ready to use.

For downstream applications, whenever a cudf-type exception is thrown, it can retrieve the stored stacktrace and do whatever it wants with it. For example:
```
try {
  // cudf API calls
} catch (cudf::logic_error const& e) {
  std::cout << e.what() << std::endl;
  std::cout << e.stacktrace() << std::endl;
  throw e;
} 
// similar with catching other exception types
```

### Follow-up work

The next step would be patching `rmm` to attach stacktrace into `rmm::` exceptions. Doing so will allow debugging various memory exceptions thrown from libcudf using their stacktrace.


### Note:
 * This feature doesn't require libcudf to be built in Debug mode.
 * The flag `CUDF_BUILD_STACKTRACE_DEBUG` should not be turned on in production as it may affect code optimization. Instead, libcudf compiled with that flag turned on should be used only when needed, i.e., when debugging exceptions thrown by cudf.
 * This flag removes the current optimization flags from compilation (such as `-O2` or `-O3`, if in Release mode) and replaces them with `-Og` (optimize for debugging).
 * If this option is not set to `ON`, the stacktrace will not be available. This avoids expensive stacktrace retrieval when thrown exceptions are expected.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Robert Maynard (https://github.com/robertmaynard)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Jason Lowe (https://github.com/jlowe)

URL: rapidsai#13298
elstehle pushed a commit that referenced this pull request Oct 5, 2023
Pin conda packages to `aws-sdk-cpp<1.11`. The recent upgrade to version `1.11.*` has caused several issues with cleanup (more details on the changes can be read in [this link](https://github.com/aws/aws-sdk-cpp#version-111-is-now-available)), leading to segfaults in Distributed and Dask-CUDA processes. The stack for one of those crashes looks like the following:

```
(gdb) bt
#0  0x00007f5125359a0c in Aws::Utils::Logging::s_aws_logger_redirect_get_log_level(aws_logger*, unsigned int) () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../.././libaws-cpp-sdk-core.so
#1  0x00007f5124968f83 in aws_event_loop_thread () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-io.so.1.0.0
#2  0x00007f5124ad9359 in thread_fn () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1
#3  0x00007f519958f6db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4  0x00007f5198b1361f in clone () from /lib/x86_64-linux-gnu/libc.so.6
```

Such segfaults now manifest frequently in CI, and in some cases are reproducible with a hit rate of ~30%. Given the approaching release, it's probably safest to just pin to an older version of the package until we pinpoint the exact cause of the issue and a patched build is released upstream.

The `aws-sdk-cpp` is statically-linked in the `pyarrow` pip package, which prevents us from using the same pinning technique. cuDF is currently pinned to `pyarrow=12.0.1` which seems to be built against `aws-sdk-cpp=1.10.*`, as per [recent build logs](https://github.com/apache/arrow/actions/runs/6276453828/job/17046177335?pr=37792#step:6:1372).

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#14173
elstehle pushed a commit that referenced this pull request Dec 4, 2023