Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the example in documentation for get_dremel_data() #17242

17 changes: 9 additions & 8 deletions cpp/include/cudf/lists/detail/dremel.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ struct dremel_data {
};

/**
* @brief Get the dremel offsets and repetition and definition levels for a LIST column
* @brief Get the dremel offsets, repetition levels, and definition levels for a LIST column
*
* Dremel is a query system created by Google for ad hoc data analysis. The Dremel engine is
* described in depth in the paper "Dremel: Interactive Analysis of Web-Scale
Expand All @@ -74,7 +74,7 @@ struct dremel_data {
*
* http://www.goldsborough.me/distributed-systems/2019/05/18/21-09-00-a_look_at_dremel/
* https://akshays-blog.medium.com/wrapping-head-around-repetition-and-definition-levels-in-dremel-powering-bigquery-c1a33c9695da
* https://blog.twitter.com/engineering/en_us/a/2013/dremel-made-simple-with-parquet
* https://blog.x.com/engineering/en_us/a/2013/dremel-made-simple-with-parquet
mhaseeb123 marked this conversation as resolved.
Show resolved Hide resolved
*
* The remainder of this documentation assumes familiarity with the Dremel concepts.
*
Expand Down Expand Up @@ -102,16 +102,17 @@ struct dremel_data {
* ```
* We can represent it in cudf format with two level of offsets like this:
* ```
* Level 0 offsets = {0, 0, 3, 5, 6}
* Level 0 offsets = {0, 0, 3, 4}
mhaseeb123 marked this conversation as resolved.
Show resolved Hide resolved
* Level 1 offsets = {0, 0, 3, 5, 5}
* Values = {1, 2, 3, 4, 5}
* ```
* The desired result of this function is the repetition and definition level values that
* correspond to the data values:
* This function returns the dremel offsets, repetition levels, and definition level
* values that correspond to the data values:
* ```
* col = {[], [[], [1, 2, 3], [4, 5]], [[]]}
* def = { 0 1, 2, 2, 2, 2, 2, 1 }
* rep = { 0, 0, 0, 2, 2, 1, 2, 0 }
* col = {[], [[], [1, 2, 3], [4, 5]], [[]]}
* dremel_offsets = { 0, 1, 7, 8}
* def_levels = { 0, 1, 2, 2, 2, 2, 2, 1 }
* rep_levels = { 0, 0, 1, 2, 2, 1, 2, 0 }
* ```
*
* Since repetition and definition levels arrays contain a value for each empty list, the size of
Copy link
Member Author

@mhaseeb123 mhaseeb123 Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewers: Would love an extra set of eyes at the explanation of the algorithm at L120-151 if you could. I looked into it while making the edits and the explanation looked fine to me so I didn't make any changes. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I'm just confused as to why we included the algorithm description in the public header :)
Not a concern for this PR

Expand Down
Loading