-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Example code for blog on new row comparators #13795
Conversation
@divyegala which is a good location for adding code to round tripping data using JSON reader and writer? |
Added metadata round trip for json reader, writer usages. |
@karthikeyann can you rebase and remove your commits? I removed |
0fcd8e6
to
a15dfef
Compare
The cmake compilation didn’t work right away. |
@divyegala Would you please update this example to use an RMM pool? For the strings example, @davidwendt added this in |
/ok to test |
Co-authored-by: Gregory Kimball <[email protected]>
/ok to test |
{ | ||
// Get count for each key | ||
auto keys = cudf::table_view{{tbl.column(0)}}; | ||
auto val = cudf::make_numeric_column(cudf::data_type{cudf::type_id::INT32}, keys.num_rows()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We need to file an issue that makes this use hash-based aggregations without any "forcing"
- I think Vyas is asking, if you don't add this column to "force" hash aggregations, are some of the aggregations hash-based and others sort-based? My understanding of libcudf's behavior is that if any aggregation is sort-based, all the aggregations fall back to using sort-based implementations. Is that true?
/ok to test |
/ok to test |
Co-authored-by: Bradley Dice <[email protected]>
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
/ok to test |
/merge |
In #13795, we found out that `nullable()` causes severe perf degradation for the nested-type case when the input is read from file via `cudf::io::read_json`. This is because the JSON reader adds a null mask for columns that don't have NULLs. This change is a no-overhead replacement that checks the actual null count instead of checking if a null mask is present. This PR also solves a bug in quantile/median groupby where NULLs were being [set](https://github.com/rapidsai/cudf/blob/8deb3dd7573000e7d87f18a9e2bbe39cf2932e10/cpp/src/groupby/sort/group_quantiles.cu#L73) but the null count was not updated. Authors: - Divye Gala (https://github.com/divyegala) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) - David Wendt (https://github.com/davidwendt) - Nghia Truong (https://github.com/ttnghia) URL: #14363
Description
Example code using a few libcudf APIs to demonstrate nested-type usage.
Checklist