Add libcudf example with large strings #15983
That's unexpected. I'll check the code and update here.
Approving -- only a few small suggestions.
cpp/examples/1billion/CMakeLists.txt
Outdated
rapids_cuda_set_architectures(RAPIDS)

project(
  billion
Can we rename this to `brc`? Or does the project need a unique name?
It's a little confusing to have three unique names for this example:
- The directory is named `1billion`
- The project is named `billion`
- The executable is named `brc` (and variations thereof)

Let's consolidate these.
I agree with giving the directory and project the same name. I want to use something easily readable/discoverable from the perspective of someone looking at the examples folder. My suggestion is to use `billion_rows` for the directory and project name.
I would like to keep the executable names shorter, since they are built in the context of the parent directory and would look less cumbersome, in my opinion. I'd also like to keep the `brc` variations because the blog uses those names in charts that would otherwise need to be regenerated.
That’s fine! Let’s do that.
Looks good, just a few non-blocking suggestions
Nit:
Lovely examples. Expected more complex code, especially for the pipeline.
Looks great
auto const mr_name = std::string("pool");
auto resource = create_memory_resource(mr_name);
nit: to keep it simple,
Suggested change:
- auto const mr_name = std::string("pool");
- auto resource = create_memory_resource(mr_name);
+ auto resource = create_memory_resource("pool");
/merge
Description
Creating an example that shows how to read large strings columns. This uses the 1 billion row challenge input data and provides three examples of loading this data:
- `brc` uses the CSV reader to load the input file in one call and aggregates the results using `groupby`
- `brc_chunks` uses the CSV reader to load the input file in chunks, aggregates each chunk, and computes the results
- `brc_pipeline` is the same as `brc_chunks`, but input chunks are processed in separate threads/streams.

Checklist
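Note on the chunked variants in the Description: aggregating each chunk separately only works if the per-chunk results can be merged afterwards. The following is a minimal sketch of that idea in plain standard C++. It is not libcudf and not the code added by this PR. The names `Stats`, `aggregate_chunk`, `merge_into` and `aggregate_chunked`, and the `station;temperature` line format, are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Running statistics for one station. Keeping sum and count (rather than a
// mean) is what allows partial results from different chunks to be merged.
struct Stats {
  double min;
  double max;
  double sum;
  std::size_t count;
};

using Result = std::map<std::string, Stats>;

// Aggregate one chunk of "station;temperature" lines (hypothetical format).
Result aggregate_chunk(std::vector<std::string> const& lines)
{
  Result out;
  for (auto const& line : lines) {
    auto const sep = line.find(';');
    if (sep == std::string::npos) { continue; }  // skip malformed lines
    auto const name = line.substr(0, sep);
    auto const temp = std::stod(line.substr(sep + 1));
    auto [it, inserted] = out.try_emplace(name, Stats{temp, temp, temp, 1});
    if (!inserted) {
      auto& s = it->second;
      s.min   = std::min(s.min, temp);
      s.max   = std::max(s.max, temp);
      s.sum += temp;
      ++s.count;
    }
  }
  return out;
}

// Fold one partial (per-chunk) result into the running total.
void merge_into(Result& total, Result const& part)
{
  for (auto const& [name, p] : part) {
    auto [it, inserted] = total.try_emplace(name, p);
    if (!inserted) {
      auto& s = it->second;
      s.min   = std::min(s.min, p.min);
      s.max   = std::max(s.max, p.max);
      s.sum += p.sum;
      s.count += p.count;
    }
  }
}

// Process the input chunk by chunk, merging each partial result.
Result aggregate_chunked(std::vector<std::string> const& lines, std::size_t chunk_size)
{
  chunk_size = std::max<std::size_t>(chunk_size, 1);  // avoid a zero step
  Result total;
  for (std::size_t begin = 0; begin < lines.size(); begin += chunk_size) {
    auto const end = std::min(begin + chunk_size, lines.size());
    std::vector<std::string> const chunk(lines.begin() + begin, lines.begin() + end);
    merge_into(total, aggregate_chunk(chunk));
  }
  return total;
}
```

Calling `aggregate_chunked(lines, n)` should give the same per-station min, max, sum and count as a single `aggregate_chunk(lines)` call. The mean is computed only at the end, as `sum / count`. Because each partial result depends only on its own chunk, the per-chunk step could also run concurrently, which is the idea behind the pipeline variant.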