Skip to content

Commit

Permalink
Implement lists::concatenate_list_elements (#8231)
Browse files Browse the repository at this point in the history
This PR implements `lists::concatenate_list_elements` for list type. Given a lists column in which each row is a list of lists, the output column is generated by concatenating all lists in the same row into a single list.

Example:
```
l = [ [{1, 2}, {3, 4}, {5}], [{6}, {}, {7, 8, 9}] ]
r = lists::concatenate_list_elements(l);
r is [ {1, 2, 3, 4, 5}, {6, 7, 8, 9} ]
```

This closes #8164. In addition, `lists::concatenate_rows` is rewritten using `lists::interleave_columns` following by `lists::concatenate_list_elements`, which is significantly shorter.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Jason Lowe (https://github.com/jlowe)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Devavret Makkar (https://github.com/devavret)

URL: #8231
  • Loading branch information
ttnghia authored May 20, 2021
1 parent 7427049 commit 944e932
Show file tree
Hide file tree
Showing 14 changed files with 959 additions and 452 deletions.
3 changes: 2 additions & 1 deletion conda/recipes/libcudf/meta.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -133,13 +133,14 @@ test:
- test -f $PREFIX/include/cudf/io/types.hpp
- test -f $PREFIX/include/cudf/ipc.hpp
- test -f $PREFIX/include/cudf/join.hpp
- test -f $PREFIX/include/cudf/lists/detail/combine.hpp
- test -f $PREFIX/include/cudf/lists/detail/concatenate.hpp
- test -f $PREFIX/include/cudf/lists/detail/copying.hpp
- test -f $PREFIX/include/cudf/lists/lists_column_factories.hpp
- test -f $PREFIX/include/cudf/lists/detail/drop_list_duplicates.hpp
- test -f $PREFIX/include/cudf/lists/detail/interleave_columns.hpp
- test -f $PREFIX/include/cudf/lists/detail/sorting.hpp
- test -f $PREFIX/include/cudf/lists/concatenate_rows.hpp
- test -f $PREFIX/include/cudf/lists/combine.hpp
- test -f $PREFIX/include/cudf/lists/count_elements.hpp
- test -f $PREFIX/include/cudf/lists/explode.hpp
- test -f $PREFIX/include/cudf/lists/drop_list_duplicates.hpp
Expand Down
3 changes: 2 additions & 1 deletion cpp/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -266,7 +266,8 @@ add_library(cudf
src/join/join.cu
src/join/semi_join.cu
src/lists/contains.cu
src/lists/concatenate_rows.cu
src/lists/combine/concatenate_list_elements.cu
src/lists/combine/concatenate_rows.cu
src/lists/copying/concatenate.cu
src/lists/copying/copying.cu
src/lists/copying/gather.cu
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@
namespace cudf {
namespace lists {
/**
* @addtogroup lists_concatenate_rows
* @addtogroup lists_combine
* @{
* @file
*/
Expand Down Expand Up @@ -53,16 +53,47 @@ enum class concatenate_null_policy { IGNORE, NULLIFY_OUTPUT_ROW };
*
* @param input Table of lists to be concatenated.
* @param null_policy The parameter to specify whether a null list element will be ignored from
* concatenation, or any concatenation involving a null list element will result in a null list.
* concatenation, or any concatenation involving a null element will result in a null list.
* @param mr Device memory resource used to allocate the returned column's device memory.
* @return A new column in which each row is a list resulted from concatenating all list elements in
* the corresponding row of the input table.
* the corresponding row of the input table.
*/
std::unique_ptr<column> concatenate_rows(
table_view const& input,
concatenate_null_policy null_policy = concatenate_null_policy::IGNORE,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Concatenating multiple lists on the same row of a lists column into a single list.
*
* Given a lists column where each row in the column is a list of lists of entries, an output lists
* column is generated by concatenating all the list elements at the same row together. If any row
* contains null list elements, the concatenation process will either ignore those null elements, or
* will simply set the entire resulting row to be a null element.
*
* @code{.pseudo}
* l = [ [{1, 2}, {3, 4}, {5}], [{6}, {}, {7, 8, 9}] ]
* r = lists::concatenate_list_elements(l);
* r is [ {1, 2, 3, 4, 5}, {6, 7, 8, 9} ]
* @endcode
*
* @throws cudf::logic_error if the input column is not at least two-level depth lists column (i.e.,
* each row must be a list of list).
* @throws cudf::logic_error if the input lists column contains nested typed entries that are not
* lists.
*
* @param input The lists column containing lists of list elements to concatenate.
* @param null_policy The parameter to specify whether a null list element will be ignored from
* concatenation, or any concatenation involving a null element will result in a null list.
* @param mr Device memory resource used to allocate the returned column's device memory.
* @return A new column in which each row is a list resulted from concatenating all list elements in
* the corresponding row of the input lists column.
*/
std::unique_ptr<column> concatenate_list_elements(
column_view const& input,
concatenate_null_policy null_policy = concatenate_null_policy::IGNORE,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/** @} */ // end of group
} // namespace lists
} // namespace cudf
49 changes: 49 additions & 0 deletions cpp/include/cudf/lists/detail/combine.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once

#include <cudf/column/column.hpp>
#include <cudf/lists/combine.hpp>
#include <cudf/lists/lists_column_view.hpp>

namespace cudf {
namespace lists {
namespace detail {
/**
* @copydoc cudf::lists::concatenate_rows
*
* @param stream CUDA stream used for device memory operations and kernel launches.
*/
std::unique_ptr<column> concatenate_rows(
table_view const& input,
concatenate_null_policy null_policy,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @copydoc cudf::lists::concatenate_list_elements
*
* @param stream CUDA stream used for device memory operations and kernel launches.
*/
std::unique_ptr<column> concatenate_list_elements(
column_view const& input,
concatenate_null_policy null_policy,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

} // namespace detail
} // namespace lists
} // namespace cudf
2 changes: 1 addition & 1 deletion cpp/include/doxygen_groups.h
Original file line number Diff line number Diff line change
Expand Up @@ -143,7 +143,7 @@
* @}
* @defgroup lists_apis Lists
* @{
* @defgroup lists_concatenate_rows Combining
* @defgroup lists_combine Combining
* @defgroup lists_extract Extracting
* @defgroup lists_contains Searching
* @defgroup lists_gather Gathering
Expand Down
Loading

0 comments on commit 944e932

Please sign in to comment.