Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extract_list_element() with column_view indices #9214

Conversation

mythrocks
Copy link
Contributor

@mythrocks mythrocks commented Sep 10, 2021

Fixes #9172.

Adds an overload of extract_list_element() where the indices
may be specified as a column_view.

This function returns a list element from a potentially different index,
for each list row. The semantics of the scalar-index version of the function
are retained. i.e.:

  1. The index is 0-based.
  2. if (list_row == null) return null;
  3. if (index > list_row.size()) return null;
  4. if (index == null) return null;
  5. if (index < 0 || -index <= length) return list_row[length + index];

Currently a work in progress. The following still needs addressing:

  • Tests for nested list elements
  • Doxygen docs for the header
  • Polish

Fixes rapidsai#9124.

Adds an overload of `extract_list_element()` where the indices
may be specified as a column_view.

This function returns a list element from a potentially different index,
for each list row. The semantics of the scalar-index version of the function
are retained. i.e.:

0. The index is 0-based.
1. if (list_row == null) return null;
2. if (index > list_row.size()) return null;
3. if (index == null) return null;
4. if (index < 0 || -index <= length) return list_row[length + index];
@mythrocks mythrocks requested a review from a team as a code owner September 10, 2021 04:21
@mythrocks mythrocks marked this pull request as draft September 10, 2021 04:21
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Sep 10, 2021
@mythrocks mythrocks self-assigned this Sep 10, 2021
@mythrocks mythrocks added 2 - In Progress Currently a work in progress feature request New feature or request non-breaking Non-breaking change labels Sep 10, 2021
@mythrocks
Copy link
Contributor Author

WIP. The need to do the following:

  • Tests for nested list elements
  • Doxygen docs for the header
  • Polish

@jrhemstad
Copy link
Contributor

@mythrocks it's more helpful to have lists like that in the PR description as GH actually parses those and shows it in the PR list view.

Comment on lines +48 to +49
if constexpr (PositiveIndex) { return index < length ? index + offset : out_of_bounds; }
return index >= -length ? length + index + offset : out_of_bounds;
Copy link
Contributor

@jrhemstad jrhemstad Sep 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is far more complicated than it needs to be. This looks like precisely the same logic as what is done in gather. I don't think there's any need for the PositiveIndex template argument.

Suggested change
if constexpr (PositiveIndex) { return index < length ? index + offset : out_of_bounds; }
return index >= -length ? length + index + offset : out_of_bounds;
auto const wrapped = (index < 0) ? index + length + offset : index + offset;
return (wrapped >= 0 and wrapped < length) ? wrapped : out_of_bounds;

Copy link
Contributor Author

@mythrocks mythrocks Sep 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PositiveIndex, etc. is the original code I'm trying to retain for extract_list_element(list_column_view, size_type, mr). I interpreted your advice here to be to retain the algo and fit the column_view version around it.
It would certainly simplify things to remove PositiveIndex.

@jrhemstad
Copy link
Contributor

Come to think of it, this is just segmented_gather with a one element gather map per list. Do we need this new function?

@mythrocks
Copy link
Contributor Author

mythrocks commented Sep 10, 2021

this is just segmented_gather with a one element gather map per list. Do we need this new function?

@ttnghia alerted me to this fact on the issue, based on your advice in #9124.
There are a couple of possibly minor kinks to consider:

  1. For a null index value, the result needs to be null. segmented_gather() requires the gather_map_list rows not to have null elements. Perhaps copy_if_else() to replace the nulls with numeric_limits<size_type>::min()? As per docs in segmented_gather(), that seems undefined:
/**
* If indices in `gather_map_list` are outside the range `[-n, n)`, where `n` is the number of
 * elements in corresponding row of the source column, the behavior is undefined.
 * /
  1. segmented_gather() returns a LIST. We'd probably need to return child(1)?

Would you recommend that we redo both versions of extract_list_element() in terms of segmented_gather()?

@jrhemstad
Copy link
Contributor

  1. For a null index value, the result needs to be null.

I think we could add the NULLIFY logic to segmented_gather for out of bounds indices like is in normal gather.

segmented_gather() returns a LIST. We'd probably need to return child(1)?

Hm, good point. It could either be the caller's responsibility to extract the data child or we have a wrapper function to call segmented_gather and extract the child for the caller. I tend to prefer the former from the perspective of a more minimal API surface area, but I could be convinced of the latter.

@mythrocks
Copy link
Contributor Author

Come to think of it, this is just segmented_gather with a one element gather map per list.

Closing this PR in favour of #9367, where it's all (re)done via segmented_gather().

@mythrocks mythrocks closed this Oct 4, 2021
rapids-bot bot pushed a commit that referenced this pull request Oct 13, 2021
Fixes #9172.

Adds an overload of `extract_list_element()` where the indices
may be specified as a `column_view`.

This function returns a list element from a potentially different index,
for each list row. The semantics of the scalar-index version of the function
are retained. i.e.:

0. The index is 0-based.
1. `if (list_row == null) return null;`
2. `if (index > list_row.size()) return null;`
3. `if (index == null) return null;`
4. `if (index < 0 && -index <= length) return list_row[length + index];`

This commit also reworks `extract_list_element(list, size_type, stream)`,
to use `segmented_gather()`, as per the advice in #9214.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Mark Harris (https://github.com/harrism)
  - Nghia Truong (https://github.com/ttnghia)

URL: #9367
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Implement extract_list_element() with column_view indices
2 participants