Add explode_outer and explode_outer_position #7499

Merged
merged 18 commits into from
Mar 17, 2021
Changes from 15 commits
Commits
9541e27
First pass at explode_outer and explode_outer_position
hyperbolic2346 Mar 3, 2021
566a8d1
Merge remote-tracking branch 'upstream/branch-0.19' into mwilson/expl…
hyperbolic2346 Mar 3, 2021
557fcfc
merging two for_each calls into one
hyperbolic2346 Mar 4, 2021
7bcede0
linting
hyperbolic2346 Mar 4, 2021
869c27c
Merge remote-tracking branch 'upstream/branch-0.19' into mwilson/expl…
hyperbolic2346 Mar 8, 2021
552e43c
Merge remote-tracking branch 'upstream/branch-0.19' into mwilson/expl…
hyperbolic2346 Mar 8, 2021
23f4abc
Merge remote-tracking branch 'upstream/branch-0.19' into mwilson/expl…
hyperbolic2346 Mar 8, 2021
01cdde2
Merge remote-tracking branch 'upstream/branch-0.19' into mwilson/expl…
hyperbolic2346 Mar 9, 2021
4aee4ea
Update cpp/src/reshape/explode.cu
hyperbolic2346 Mar 9, 2021
b573889
Update cpp/src/reshape/explode.cu
hyperbolic2346 Mar 9, 2021
fb54f2e
Update cpp/src/reshape/explode.cu
hyperbolic2346 Mar 9, 2021
30d4d23
Updates based on review comments
hyperbolic2346 Mar 9, 2021
aa41842
updating comment to hopefully clarify the performance issue
hyperbolic2346 Mar 9, 2021
cf334cd
Updating from review comments
hyperbolic2346 Mar 12, 2021
2fd6652
Moving explode out of reshape and some test cleanup
hyperbolic2346 Mar 15, 2021
d2584e6
updating from review comments
hyperbolic2346 Mar 15, 2021
03701e7
adding explode.hpp to meta.yaml conda recipe
hyperbolic2346 Mar 16, 2021
44066f7
Merge remote-tracking branch 'upstream/branch-0.19' into mwilson/expl…
hyperbolic2346 Mar 16, 2021
200 changes: 200 additions & 0 deletions cpp/include/cudf/lists/explode.hpp
@@ -0,0 +1,200 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <cudf/column/column.hpp>
#include <cudf/table/table_view.hpp>
#include <cudf/types.hpp>
#include <memory>

namespace cudf {

/**
* @brief Explodes a list column's elements.
*
* Any list is exploded, which means the elements of the list in each row are expanded into new rows
* in the output. The corresponding rows for other columns in the input are duplicated. Example:
* ```
* [[5,10,15], 100],
* [[20,25], 200],
* [[30], 300],
* returns
* [5, 100],
* [10, 100],
* [15, 100],
* [20, 200],
* [25, 200],
* [30, 300],
* ```
*
* Nulls and empty lists propagate in different ways depending on what is null or empty.
*```
* [[5,null,15], 100],
* [null, 200],
* [[], 300],
* returns
* [5, 100],
* [null, 100],
* [15, 100],
* ```
* Note that null lists and empty lists are not included in the resulting table, but nulls inside
* lists are represented with a null entry for that column in that row.
*
* @param input_table Table to explode.
* @param explode_column_idx Column index to explode inside the table.
* @param mr Device memory resource used to allocate the returned table's device memory.
*
* @return A new table with the column at `explode_column_idx` exploded.
*/
std::unique_ptr<table> explode(
table_view const& input_table,
size_type explode_column_idx,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
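
Review note: the null and empty-list handling described in the doc comment above can be checked against a small host-side model. The sketch below is plain C++ with `std::optional` standing in for nulls. `Element`, `ListRow`, `OutRow` and `explode_reference` are hypothetical names used only for illustration; they are not libcudf types, and the loop says nothing about how the GPU implementation is structured.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Host-side stand-ins (hypothetical, not libcudf types):
// a list element may be null, and a whole list row may be null.
using Element = std::optional<int>;
using ListRow = std::optional<std::vector<Element>>;
using OutRow  = std::pair<Element, int>;  // (exploded value, duplicated other column)

// Reference model of the inner explode: null lists and empty lists
// contribute no output rows; null elements inside a list are kept.
std::vector<OutRow> explode_reference(std::vector<ListRow> const& lists,
                                      std::vector<int> const& other)
{
  assert(lists.size() == other.size());  // both columns describe the same rows
  std::vector<OutRow> out;
  for (std::size_t row = 0; row < lists.size(); ++row) {
    if (!lists[row].has_value()) { continue; }  // null list: dropped
    for (Element const& e : *lists[row]) {      // empty list: loop body never runs
      out.emplace_back(e, other[row]);
    }
  }
  return out;
}
```

Feeding it the second doc example (`[[5,null,15], 100]`, `[null, 200]`, `[[], 300]`) yields three rows, all carrying 100 in the second column.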

/**
* @brief Explodes a list column's elements and includes a position column.
*
* Any list is exploded, which means the elements of the list in each row are expanded into new rows
* in the output. The corresponding rows for other columns in the input are duplicated. A position
* column is added that has the index inside the original list for each row. Example:
* ```
* [[5,10,15], 100],
* [[20,25], 200],
* [[30], 300],
* returns
* [0, 5, 100],
* [1, 10, 100],
* [2, 15, 100],
* [0, 20, 200],
* [1, 25, 200],
* [0, 30, 300],
* ```
*
* Nulls and empty lists propagate in different ways depending on what is null or empty.
*```
* [[5,null,15], 100],
* [null, 200],
* [[], 300],
* returns
* [0, 5, 100],
* [1, null, 100],
* [2, 15, 100],
* ```
* Note that null lists and empty lists are not included in the resulting table, but nulls inside
* lists are represented with a null entry for that column in that row.
*
* @param input_table Table to explode.
* @param explode_column_idx Column index to explode inside the table.
* @param mr Device memory resource used to allocate the returned table's device memory.
*
* @return A new table with exploded value and position. The column order of return table is
* [cols before explode_input, explode_position, explode_value, cols after explode_input].
*/
std::unique_ptr<table> explode_position(
table_view const& input_table,
size_type explode_column_idx,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Explodes a list column's elements retaining any null entries or empty lists inside.
*
* Any list is exploded, which means the elements of the list in each row are expanded into new rows
* in the output. The corresponding rows for other columns in the input are duplicated. Example:
* ```
* [[5,10,15], 100],
* [[20,25], 200],
* [[30], 300],
* returns
* [5, 100],
* [10, 100],
* [15, 100],
* [20, 200],
* [25, 200],
* [30, 300],
* ```
*
* Nulls and empty lists propagate as null entries in the result.
*```
* [[5,null,15], 100],
* [null, 200],
* [[], 300],
* returns
* [5, 100],
* [null, 100],
* [15, 100],
* [null, 200],
* [null, 300],
* ```
*
* @param input_table Table to explode.
* @param explode_column_idx Column index to explode inside the table.
* @param mr Device memory resource used to allocate the returned table's device memory.
*
* @return A new table with the column at `explode_column_idx` exploded.
*/
std::unique_ptr<table> explode_outer(
table_view const& input_table,
size_type explode_column_idx,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
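
Review note: the practical difference between `explode` and `explode_outer` shows up in the output row count. A minimal sketch of that arithmetic, assuming a host-side vector of list lengths in which a null list is `std::nullopt` (`ListLength` and both function names are hypothetical, not libcudf API):

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical host-side description of a list column: each entry is the
// length of that row's list, or std::nullopt when the list itself is null.
using ListLength = std::optional<std::size_t>;

// Rows produced by the inner explode: one per element,
// nothing for null or empty lists.
std::size_t explode_row_count(std::vector<ListLength> const& lengths)
{
  std::size_t total = 0;
  for (auto const& len : lengths) { total += len.value_or(0); }
  return total;
}

// Rows produced by explode_outer: null and empty lists
// still emit one row holding a null value.
std::size_t explode_outer_row_count(std::vector<ListLength> const& lengths)
{
  std::size_t total = 0;
  for (auto const& len : lengths) {
    std::size_t const n = len.value_or(0);
    total += (n == 0) ? 1 : n;
  }
  return total;
}
```

For lengths `{3, null, 0}` this gives 3 rows for the inner variant and 5 for the outer one, matching the examples in the doc comments. When no list is null or empty, the two counts are equal.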

/**
* @brief Explodes a list column's elements retaining any null entries or empty lists and includes a
* position column.
*
* Any list is exploded, which means the elements of the list in each row are expanded into new rows
* in the output. The corresponding rows for other columns in the input are duplicated. A position
* column is added that has the index inside the original list for each row. Example:
* ```
* [[5,10,15], 100],
* [[20,25], 200],
* [[30], 300],
* returns
* [0, 5, 100],
* [1, 10, 100],
* [2, 15, 100],
* [0, 20, 200],
* [1, 25, 200],
* [0, 30, 300],
* ```
*
* Nulls and empty lists propagate as null entries in the result.
*```
* [[5,null,15], 100],
* [null, 200],
* [[], 300],
* returns
* [0, 5, 100],
* [1, null, 100],
* [2, 15, 100],
* [0, null, 200],
* [0, null, 300],
* ```
*
* @param input_table Table to explode.
* @param explode_column_idx Column index to explode inside the table.
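* @param mr Device memory resource used to allocate the returned table's device memory.
*
* @return A new table with the exploded values and their positions. The column order of the
* returned table is [cols before explode_input, explode_position, explode_value, cols after
* explode_input].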
*/
std::unique_ptr<table> explode_outer_position(
table_view const& input_table,
size_type explode_column_idx,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
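
Review note: a host-side model of the outer variant with positions, following the doc-comment example above. Plain C++ with `std::optional` as the null marker; `Element`, `ListRow`, `OutRow` and `explode_outer_position_reference` are hypothetical names, not libcudf API. The position of 0 for null and empty lists is taken from the example in this diff and is an assumption of the sketch, not a statement about the final behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Host-side stand-ins (hypothetical, not libcudf types).
using Element = std::optional<int>;
using ListRow = std::optional<std::vector<Element>>;

struct OutRow {
  int position;   // index of the value inside its original list
  Element value;  // exploded value; null for null elements and null/empty lists
  int other;      // duplicated value from the non-list column
};

// Reference model of explode_outer_position: every input row produces at
// least one output row, so null and empty lists survive as a null value.
std::vector<OutRow> explode_outer_position_reference(std::vector<ListRow> const& lists,
                                                     std::vector<int> const& other)
{
  assert(lists.size() == other.size());  // both columns describe the same rows
  std::vector<OutRow> out;
  for (std::size_t row = 0; row < lists.size(); ++row) {
    bool const null_or_empty = !lists[row].has_value() || lists[row]->empty();
    if (null_or_empty) {
      // Position 0 mirrors the doc-comment example in this diff.
      out.push_back(OutRow{0, std::nullopt, other[row]});
      continue;
    }
    int position = 0;
    for (Element const& e : *lists[row]) {
      out.push_back(OutRow{position++, e, other[row]});
    }
  }
  return out;
}
```

With the second doc example as input, the model returns five rows: positions 0, 1, 2 for the first list, then one null-valued row each for the null list and the empty list.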

/** @} */ // end of group

} // namespace cudf
86 changes: 0 additions & 86 deletions cpp/include/cudf/reshape.hpp
@@ -97,92 +97,6 @@ std::unique_ptr<column> byte_cast(
flip_endianness endian_configuration,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Explodes a list column's elements.
*
* Any list is exploded, which means the elements of the list in each row are expanded into new rows
* in the output. The corresponding rows for other columns in the input are duplicated. Example:
* ```
* [[5,10,15], 100],
* [[20,25], 200],
* [[30], 300],
* returns
* [5, 100],
* [10, 100],
* [15, 100],
* [20, 200],
* [25, 200],
* [30, 300],
* ```
*
* Nulls and empty lists propagate in different ways depending on what is null or empty.
*```
* [[5,null,15], 100],
* [null, 200],
* [[], 300],
* returns
* [5, 100],
* [null, 100],
* [15, 100],
* ```
* Note that null lists are not included in the resulting table, but nulls inside
* lists and empty lists will be represented with a null entry for that column in that row.
*
* @param input_table Table to explode.
* @param explode_column_idx Column index to explode inside the table.
* @param mr Device memory resource used to allocate the returned column's device memory.
*
* @return A new table with explode_col exploded.
*/
std::unique_ptr<table> explode(
table_view const& input_table,
size_type explode_column_idx,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Explodes a list column's elements and includes a position column.
*
* Any list is exploded, which means the elements of the list in each row are expanded into new rows
* in the output. The corresponding rows for other columns in the input are duplicated. A position
* column is added that has the index inside the original list for each row. Example:
* ```
* [[5,10,15], 100],
* [[20,25], 200],
* [[30], 300],
* returns
* [0, 5, 100],
* [1, 10, 100],
* [2, 15, 100],
* [0, 20, 200],
* [1, 25, 200],
* [0, 30, 300],
* ```
*
* Nulls and empty lists propagate in different ways depending on what is null or empty.
*```
* [[5,null,15], 100],
* [null, 200],
* [[], 300],
* returns
* [0, 5, 100],
* [1, null, 100],
* [2, 15, 100],
* ```
* Note that null lists are not included in the resulting table, but nulls inside
* lists and empty lists will be represented with a null entry for that column in that row.
*
* @param input_table Table to explode.
* @param explode_column_idx Column index to explode inside the table.
* @param mr Device memory resource used to allocate the returned column's device memory.
*
* @return A new table with exploded value and position. The column order of return table is
* [cols before explode_input, explode_position, explode_value, cols after explode_input].
*/
std::unique_ptr<table> explode_position(
table_view const& input_table,
size_type explode_column_idx,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/** @} */ // end of group

} // namespace cudf
26 changes: 25 additions & 1 deletion cpp/include/cudf/table/table.hpp
@@ -110,6 +110,27 @@ class table {
*/
std::vector<std::unique_ptr<column>> release();

/**
* @brief Returns a table_view built from a range of column indices.
*
* @throws std::out_of_range
* If any index is outside [0, num_columns())
*
* @param begin Iterator to the first column index in the range
* @param end Iterator one past the last column index in the range
* @return A table_view consisting of columns from the original table
* specified by the indices in the range `[begin, end)`
*/

template <typename InputIterator>
table_view select(InputIterator begin, InputIterator end) const
{
std::vector<column_view> columns(std::distance(begin, end));
std::transform(
begin, end, columns.begin(), [this](auto index) { return _columns.at(index)->view(); });
return table_view(columns);
}

/**
* @brief Returns a table_view with set of specified columns.
*
@@ -120,7 +141,10 @@ class table {
* @return A table_view consisting of columns from the original table
* specified by the elements of `column_indices`
*/
table_view select(std::vector<cudf::size_type> const& column_indices) const;
table_view select(std::vector<cudf::size_type> const& column_indices) const
{
return select(column_indices.begin(), column_indices.end());
}

/**
* @brief Returns a reference to the specified column
19 changes: 19 additions & 0 deletions cpp/include/cudf/table/table_view.hpp
@@ -174,6 +174,25 @@ class table_view : public detail::table_view_base<column_view> {
*/
table_view(std::vector<table_view> const& views);

/**
* @brief Returns a table_view built from a range of column indices.
*
* @throws std::out_of_range
* If any index is outside [0, num_columns())
*
* @param begin Iterator to the first column index in the range
* @param end Iterator one past the last column index in the range
* @return A table_view consisting of columns from the original table
* specified by the indices in the range `[begin, end)`
*/
template <typename InputIterator>
table_view select(InputIterator begin, InputIterator end) const
{
std::vector<column_view> columns(std::distance(begin, end));
std::transform(begin, end, columns.begin(), [this](auto index) { return this->column(index); });
return table_view(columns);
}
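
Review note: the iterator-range `select` added here follows a common pattern, where the index-vector overload forwards to a templated `[begin, end)` overload and bounds checking comes from `at()`. A minimal host-side sketch of that pattern, assuming a toy table whose columns are just names (`toy_table` is hypothetical and stands in for `table` / `table_view`; it is not libcudf code):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a table: each column is represented by its name.
class toy_table {
 public:
  explicit toy_table(std::vector<std::string> columns) : _columns(std::move(columns)) {}

  // Build a new table from the column indices in [begin, end).
  // std::vector::at throws std::out_of_range for an invalid index.
  template <typename InputIterator>
  toy_table select(InputIterator begin, InputIterator end) const
  {
    auto const count = std::distance(begin, end);
    assert(count >= 0);  // end must be reachable from begin
    std::vector<std::string> picked(static_cast<std::size_t>(count));
    std::transform(
      begin, end, picked.begin(), [this](auto index) { return this->_columns.at(index); });
    return toy_table(std::move(picked));
  }

  // Convenience overload that forwards to the iterator version.
  toy_table select(std::vector<std::size_t> const& indices) const
  {
    return select(indices.begin(), indices.end());
  }

  std::vector<std::string> const& columns() const { return _columns; }

 private:
  std::vector<std::string> _columns;
};
```

Selecting indices `{3, 0}` from columns `a, b, c, d` returns `d, a` in that order, and an index past the last column raises `std::out_of_range`, which is the behavior the `@throws` line in the doc comment describes.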

/**
* @brief Returns a table_view with set of specified columns.
*