[FEA] concatenate lists together #7767

Closed
revans2 opened this issue Mar 30, 2021 · 0 comments · Fixed by #8049
Labels: feature request (New feature or request), libcudf (Affects libcudf (C++/CUDA) code), Spark (Functionality that helps Spark RAPIDS)

revans2 commented Mar 30, 2021

Is your feature request related to a problem? Please describe.
We would like to support the concat operator in Apache Spark SQL for arrays. concat concatenates strings into a new string. It also concatenates lists of other element types into a new list.

This is very similar to what happens with strings. The main difference is that a list can hold null elements and a string cannot, so we need to make sure that those nulls are honored.

Also, if one of the input lists is null, the output list should be null.
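To make the requested null handling concrete, here is a minimal host-side sketch of the row-wise semantics. It is not libcudf code: the `Element`, `List`, and `Column` aliases and the function name `concatenate_rows_reference` are hypothetical, chosen only for illustration, and it assumes every input column has the same number of rows.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical host-side model (not libcudf types):
// an element may be null, and a whole list (one row) may be null.
using Element = std::optional<int>;
using List    = std::optional<std::vector<Element>>;
using Column  = std::vector<List>;

// Concatenate the lists found at the same row of every input column.
// Assumption: all columns have the same number of rows.
Column concatenate_rows_reference(std::vector<Column> const& columns)
{
  if (columns.empty()) { return {}; }

  std::size_t const num_rows = columns.front().size();
  Column result;
  result.reserve(num_rows);

  for (std::size_t row = 0; row < num_rows; ++row) {
    std::vector<Element> joined;
    bool any_null_list = false;

    for (auto const& col : columns) {
      List const& list = col[row];
      if (!list.has_value()) {
        // One null input list makes the whole output row null.
        any_null_list = true;
        break;
      }
      // Null elements inside a list are copied through unchanged.
      joined.insert(joined.end(), list->begin(), list->end());
    }

    if (any_null_list) {
      result.push_back(std::nullopt);
    } else {
      result.push_back(std::move(joined));
    }
  }
  return result;
}
```

Under this model, concatenating a row `{null, 2}` with an empty list `{}` gives `{null, 2}`, while concatenating a null list with `{9}` gives a null row.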

Describe the solution you'd like

I would like an API like the following that would take multiple lists of Something and concatenate them together.

```
std::unique_ptr<column> concatenate(
  table_view const& lists_columns,
  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
```

Describe alternatives you've considered
I think I could hack something together myself, but it would not be pretty.

@revans2 revans2 added feature request New feature or request Needs Triage Need team to review and classify Spark Functionality that helps Spark RAPIDS labels Mar 30, 2021
@kkraus14 kkraus14 added libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Apr 6, 2021
@rapids-bot rapids-bot bot closed this as completed in #8049 May 3, 2021
rapids-bot bot pushed a commit that referenced this issue May 3, 2021
This PR closes #7767. It implements `lists::concatenate_rows`, which concatenates, row by row, the lists found at the same row index across all lists columns of the given table.

For example:
```
s1 = [{0, 1}, {2, 3, 4}, {5}, {}, {6, 7}]
s2 = [{8}, {9}, {}, {10, 11, 12}, {13, 14, 15, 16}]
r = lists::concatenate_rows( table_view{s1, s2} )
r is now [{0, 1, 8}, {2, 3, 4, 9}, {5}, {10, 11, 12}, {6, 7, 13, 14, 15, 16}]
```

Currently, only lists columns of one depth level are supported.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Devavret Makkar (https://github.com/devavret)
  - Ray Douglass (https://github.com/raydouglass)

URL: #8049