Example code for blog on new row comparators #13795

divyegala · 2023-08-01T19:28:09Z

Description

Example code using a few libcudf APIs to demonstrate nested-type usage.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

karthikeyann · 2023-08-18T02:25:53Z

@divyegala where would be a good location to add code that round-trips data through the JSON reader and writer?

karthikeyann · 2023-08-18T12:40:14Z

Added a metadata round trip that uses the JSON reader and writer.

divyegala · 2023-08-18T13:11:48Z

@karthikeyann can you rebase and remove your commits? I removed example.cpp and only have deduplication.cpp in my local working branch. When I try to merge your commits and run the examples, I get std::bad_alloc, so I'm probably doing a bad merge somewhere. It'll be easier for me to push my commits and for you to apply yours on top of mine.

karthikeyann · 2023-08-23T17:50:58Z

The cmake compilation didn't work right away.
I added -DCMAKE_CUDA_ARCHITECTURES=native to all cmake commands in cpp/examples/build.sh to get it to compile.

GregoryKimball · 2023-09-20T19:24:43Z

@divyegala Would you please update this example to use an RMM pool? For the strings example, @davidwendt added this in examples/strings/common.hpp

copy-pr-bot · 2023-09-21T19:52:49Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

divyegala · 2023-09-21T19:53:05Z

/ok to test

cpp/examples/nested_types/deduplication.cpp

Co-authored-by: Gregory Kimball <[email protected]>

vyasr · 2023-11-14T19:04:02Z

/ok to test

cpp/examples/nested_types/CMakeLists.txt

cpp/examples/nested_types/deduplication.cpp

bdice · 2023-11-14T19:28:55Z

cpp/examples/nested_types/deduplication.cpp

+{
+  // Get count for each key
+  auto keys = cudf::table_view{{tbl.column(0)}};
+  auto val  = cudf::make_numeric_column(cudf::data_type{cudf::type_id::INT32}, keys.num_rows());


We need to file an issue so that this uses hash-based aggregations without any "forcing".

I think Vyas is asking, if you don't add this column to "force" hash aggregations, are some of the aggregations hash-based and others sort-based? My understanding of libcudf's behavior is that if any aggregation is sort-based, all the aggregations fall back to using sort-based implementations. Is that true?
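The all-or-nothing fallback described in the question above can be sketched in plain C++. This is only an illustration of the behavior being asked about, not libcudf's dispatch code: `AggKind`, `AggRequest`, `hash_supported`, and `choose_groupby_path` are hypothetical names, and the choice of which aggregations count as hash-capable is an assumption made for the example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT libcudf types.
enum class AggKind { Sum, Min, Max, Count, Median, Quantile };

struct AggRequest {
  AggKind kind;
  bool values_are_nested;  // e.g. the values column is a LIST or STRUCT
};

// Assumption for this sketch: only simple aggregations over non-nested
// values have a hash-based implementation.
bool hash_supported(AggRequest const& r)
{
  if (r.values_are_nested) { return false; }
  switch (r.kind) {
    case AggKind::Sum:
    case AggKind::Min:
    case AggKind::Max:
    case AggKind::Count: return true;
    default: return false;
  }
}

// All-or-nothing dispatch: a single request without hash support sends
// the whole batch of requests down the sort-based path.
std::string choose_groupby_path(std::vector<AggRequest> const& requests)
{
  bool const all_hash = std::all_of(requests.begin(), requests.end(), hash_supported);
  return all_hash ? "hash" : "sort";
}
```

Under these assumptions, one nested-type values column among otherwise hash-capable requests is enough to switch every aggregation in the batch to the sort-based path.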

cpp/examples/nested_types/deduplication.cpp

cpp/examples/nested_types/example.json

Co-authored-by: Bradley Dice <[email protected]>

divyegala · 2023-11-14T22:23:23Z

/ok to test

divyegala · 2023-11-14T22:36:43Z

/ok to test

cpp/examples/nested_types/deduplication.cpp

Co-authored-by: Bradley Dice <[email protected]>

divyegala · 2023-11-15T17:34:41Z

/ok to test

karthikeyann

LGTM 👍

vyasr · 2023-11-16T00:06:31Z

/ok to test

divyegala · 2023-11-16T02:04:22Z

/merge

In #13795, we found out that `nullable()` causes severe perf degradation for the nested-type case when the input is read from file via `cudf::io::read_json`. This is because the JSON reader adds a null mask for columns that don't have NULLs. This change is a no-overhead replacement that checks the actual null count instead of checking if a null mask is present.

This PR also solves a bug in quantile/median groupby where NULLs were being [set](https://github.com/rapidsai/cudf/blob/8deb3dd7573000e7d87f18a9e2bbe39cf2932e10/cpp/src/groupby/sort/group_quantiles.cu#L73) but the null count was not updated.

Authors:
- Divye Gala (https://github.com/divyegala)
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Vyas Ramasubramani (https://github.com/vyasr)
- David Wendt (https://github.com/davidwendt)
- Nghia Truong (https://github.com/ttnghia)

URL: #14363
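The difference between "a null mask is present" and "the column actually contains nulls" can be shown with a small self-contained sketch. `ToyColumn` is a hypothetical stand-in written for this example, not libcudf's column type; its member names only mirror the `nullable()` check and the null-count check mentioned above.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a column; NOT libcudf's column_view.
struct ToyColumn {
  std::vector<int> data;
  std::optional<std::vector<bool>> validity;  // present == a null mask was allocated

  // True whenever a mask exists, even if every row is marked valid.
  bool nullable() const { return validity.has_value(); }

  // Number of rows actually marked null. This toy recounts on every
  // call for simplicity.
  std::size_t null_count() const
  {
    if (!validity) { return 0; }
    std::size_t n = 0;
    for (bool valid : *validity) {
      if (!valid) { ++n; }
    }
    return n;
  }

  // True only when at least one row is null.
  bool has_nulls() const { return null_count() > 0; }
};

// Mask-based check: picks the null-aware path for any column with a mask.
bool needs_null_aware_path_by_mask(ToyColumn const& c) { return c.nullable(); }

// Count-based check: picks the null-aware path only when nulls exist.
bool needs_null_aware_path_by_count(ToyColumn const& c) { return c.has_nulls(); }
```

A column that carries an all-valid mask, as described for the JSON reader output, takes the null-aware path under the mask-based check but not under the count-based one.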

first draft example

34b8fb4

github-actions bot added the libcudf and CMake labels Aug 1, 2023

add dedup example

9059909

karthikeyann added the doc and non-breaking labels Aug 18, 2023

divyegala added 2 commits August 18, 2023 05:55

write new example

5f3b0da

delete old example

a15dfef

divyegala force-pushed the blog-example branch from 0fcd8e6 to a15dfef Compare August 18, 2023 16:34

karthikeyann and others added 4 commits August 18, 2023 23:46

add metadata to json writer

776b10d

Merge branch 'branch-23.10' into blog-example

3c2c0ee

add new column name

fa6f928

Merge branch 'branch-23.10' into blog-example

d0da110

add sort, drop filtering

e1f6c7c

GregoryKimball assigned divyegala Sep 5, 2023

divyegala marked this pull request as ready for review September 19, 2023 21:43

divyegala requested a review from a team as a code owner September 19, 2023 21:43

divyegala requested review from hyperbolic2346 and karthikeyann September 19, 2023 21:43

divyegala added 2 commits September 21, 2023 12:50

add pool mr, reduce unnecessary table_view copy

df45de2

Merge remote-tracking branch 'upstream/branch-23.10' into blog-example

c12bc71

GregoryKimball reviewed Sep 22, 2023

cpp/examples/nested_types/deduplication.cpp (outdated, resolved)

Update cpp/examples/nested_types/deduplication.cpp

496c28a

Co-authored-by: Gregory Kimball <[email protected]>

github-actions bot added the ci label Nov 14, 2023

vyasr approved these changes Nov 14, 2023

bdice reviewed Nov 14, 2023

This was referenced Nov 14, 2023

Groupby hash aggregations use sort-based implementation if nested-type columns are used as values #14412

Open

read_json does not compile if using std::string_view instead of std::string #14413

Closed

divyegala and others added 4 commits November 14, 2023 14:50

Update cpp/examples/nested_types/example.json

57ccb51

Co-authored-by: Bradley Dice <[email protected]>

add cout

4d6ac25

Merge remote-tracking branch 'origin/blog-example' into blog-example

b3fee13

Merge branch 'branch-23.12' into blog-example

6cf9f58

divyegala added 2 commits November 14, 2023 14:35

add param in docs

02438d1

Merge remote-tracking branch 'origin/blog-example' into blog-example

26ed48c

divyegala requested review from bdice and karthikeyann November 15, 2023 15:10

bdice approved these changes Nov 15, 2023

cpp/examples/nested_types/deduplication.cpp (resolved)

divyegala and others added 2 commits November 15, 2023 12:34

Update cpp/examples/nested_types/deduplication.cpp

57d68bd

Co-authored-by: Bradley Dice <[email protected]>

Merge branch 'branch-23.12' into blog-example

43c0fc3

karthikeyann approved these changes Nov 15, 2023

PointKernel approved these changes Nov 15, 2023

ttnghia approved these changes Nov 15, 2023

bdice mentioned this pull request Nov 15, 2023

Add CI jobs rapidsai/pynvjitlink#10

Merged

Merge branch 'branch-23.12' into blog-example

9625037

raydouglass approved these changes Nov 16, 2023

rapids-bot bot merged commit afd7d18 into rapidsai:branch-23.12 Nov 16, 2023
65 checks passed
