Add explode_outer and explode_outer_position (#7499)
This code adds support for explode_outer and explode_outer_position. These differ from explode and explode_position by the way null and empty lists are handled. Explode discards null and empty lists and as such, lifts the child column directly out of the list column. Explode_outer must find these null and empty lists and make space for a null entry in the child column. This means we need to gather both the table and the exploded column. Further, we must make a pass on the exploded column to count these entries initially as we do not know the required size of the gather maps until we have this information and it isn't just the null count. If there are no null or empty lists in the input, the normal explode function is called as it is simpler, but it does come at the cost of marching the offsets looking for duplicates, which indicate null or empty lists.

closes #7466

Authors:
- Mike Wilson (@hyperbolic2346)

Approvers:
- AJ Schmidt (@ajschmidt8)
- Jake Hemstad (@jrhemstad)
- Nghia Truong (@ttnghia)

URL: #7499
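The counting pass described above can be illustrated on the host. This is a minimal sketch, not the code from this commit: it assumes the list column's offsets are available as a plain `std::vector` with one more entry than there are rows, and the helper names `count_null_or_empty_lists` and `explode_outer_row_count` are hypothetical. A pair of equal adjacent offsets marks a null or empty list, and each such list needs one extra output row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side helper: counts rows whose list is null or empty.
// Such rows show up as duplicate adjacent offsets (offsets[i] == offsets[i + 1]).
std::size_t count_null_or_empty_lists(std::vector<std::int32_t> const& offsets)
{
  assert(!offsets.empty());  // a list column with n rows carries n + 1 offsets
  std::size_t count = 0;
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    if (offsets[i] == offsets[i + 1]) { ++count; }
  }
  return count;
}

// Hypothetical helper: number of rows an outer explode would produce.
// Every child element yields one row, and every null or empty list
// yields one additional row holding a null entry.
std::size_t explode_outer_row_count(std::vector<std::int32_t> const& offsets)
{
  assert(!offsets.empty());
  auto const child_size = static_cast<std::size_t>(offsets.back() - offsets.front());
  return child_size + count_null_or_empty_lists(offsets);
}
```

When the count is zero, the output size equals the child size, which is the case where the message says the normal explode path can be used instead.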
1 parent 34cccfe, commit 0146f74
Showing 13 changed files with 1,381 additions and 810 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,200 @@ | ||
/* | ||
* Copyright (c) 2021, NVIDIA CORPORATION. | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
#pragma once | ||
|
||
#include <cudf/column/column.hpp> | ||
#include <cudf/table/table_view.hpp> | ||
#include <cudf/types.hpp> | ||
#include <memory> | ||
|
||
namespace cudf { | ||
|
||
/** | ||
* @brief Explodes a list column's elements. | ||
* | ||
* Any list is exploded, which means the elements of the list in each row are expanded into new rows | ||
* in the output. The corresponding rows for other columns in the input are duplicated. Example: | ||
* ``` | ||
* [[5,10,15], 100], | ||
* [[20,25], 200], | ||
* [[30], 300], | ||
* returns | ||
* [5, 100], | ||
* [10, 100], | ||
* [15, 100], | ||
* [20, 200], | ||
* [25, 200], | ||
* [30, 300], | ||
* ``` | ||
* | ||
* Nulls and empty lists propagate in different ways depending on what is null or empty. | ||
* ``` | ||
* [[5,null,15], 100], | ||
* [null, 200], | ||
* [[], 300], | ||
* returns | ||
* [5, 100], | ||
* [null, 100], | ||
* [15, 100], | ||
* ``` | ||
* Note that null lists and empty lists are not included in the resulting table, but nulls | ||
* inside lists will be represented with a null entry for that column in that row. | ||
* | ||
* @param input_table Table to explode. | ||
* @param explode_column_idx Column index to explode inside the table. | ||
* @param mr Device memory resource used to allocate the returned table's device memory. | ||
* | ||
* @return A new table with the column at explode_column_idx exploded. | ||
*/ | ||
std::unique_ptr<table> explode( | ||
table_view const& input_table, | ||
size_type explode_column_idx, | ||
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()); | ||
|
||
/** | ||
* @brief Explodes a list column's elements and includes a position column. | ||
* | ||
* Any list is exploded, which means the elements of the list in each row are expanded into new rows | ||
* in the output. The corresponding rows for other columns in the input are duplicated. A position | ||
* column is added that has the index inside the original list for each row. Example: | ||
* ``` | ||
* [[5,10,15], 100], | ||
* [[20,25], 200], | ||
* [[30], 300], | ||
* returns | ||
* [0, 5, 100], | ||
* [1, 10, 100], | ||
* [2, 15, 100], | ||
* [0, 20, 200], | ||
* [1, 25, 200], | ||
* [0, 30, 300], | ||
* ``` | ||
* | ||
* Nulls and empty lists propagate in different ways depending on what is null or empty. | ||
* ``` | ||
* [[5,null,15], 100], | ||
* [null, 200], | ||
* [[], 300], | ||
* returns | ||
* [0, 5, 100], | ||
* [1, null, 100], | ||
* [2, 15, 100], | ||
* ``` | ||
* Note that null lists and empty lists are not included in the resulting table, but nulls | ||
* inside lists will be represented with a null entry for that column in that row. | ||
* | ||
* @param input_table Table to explode. | ||
* @param explode_column_idx Column index to explode inside the table. | ||
* @param mr Device memory resource used to allocate the returned table's device memory. | ||
* | ||
* @return A new table with exploded value and position. The column order of the returned table is | ||
* [cols before the exploded column, explode_position, explode_value, cols after the exploded column]. | ||
*/ | ||
std::unique_ptr<table> explode_position( | ||
table_view const& input_table, | ||
size_type explode_column_idx, | ||
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()); | ||
|
||
/** | ||
* @brief Explodes a list column's elements retaining any null entries or empty lists inside. | ||
* | ||
* Any list is exploded, which means the elements of the list in each row are expanded into new rows | ||
* in the output. The corresponding rows for other columns in the input are duplicated. Example: | ||
* ``` | ||
* [[5,10,15], 100], | ||
* [[20,25], 200], | ||
* [[30], 300], | ||
* returns | ||
* [5, 100], | ||
* [10, 100], | ||
* [15, 100], | ||
* [20, 200], | ||
* [25, 200], | ||
* [30, 300], | ||
* ``` | ||
* | ||
* Nulls and empty lists propagate as null entries in the result. | ||
* ``` | ||
* [[5,null,15], 100], | ||
* [null, 200], | ||
* [[], 300], | ||
* returns | ||
* [5, 100], | ||
* [null, 100], | ||
* [15, 100], | ||
* [null, 200], | ||
* [null, 300], | ||
* ``` | ||
* | ||
* @param input_table Table to explode. | ||
* @param explode_column_idx Column index to explode inside the table. | ||
* @param mr Device memory resource used to allocate the returned table's device memory. | ||
* | ||
* @return A new table with the column at explode_column_idx exploded. | ||
*/ | ||
std::unique_ptr<table> explode_outer( | ||
table_view const& input_table, | ||
size_type explode_column_idx, | ||
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()); | ||
|
||
/** | ||
* @brief Explodes a list column's elements retaining any null entries or empty lists and includes a | ||
* position column. | ||
* | ||
* Any list is exploded, which means the elements of the list in each row are expanded into new rows | ||
* in the output. The corresponding rows for other columns in the input are duplicated. A position | ||
* column is added that has the index inside the original list for each row. Example: | ||
* ``` | ||
* [[5,10,15], 100], | ||
* [[20,25], 200], | ||
* [[30], 300], | ||
* returns | ||
* [0, 5, 100], | ||
* [1, 10, 100], | ||
* [2, 15, 100], | ||
* [0, 20, 200], | ||
* [1, 25, 200], | ||
* [0, 30, 300], | ||
* ``` | ||
* | ||
* Nulls and empty lists propagate as null entries in the result. | ||
* ``` | ||
* [[5,null,15], 100], | ||
* [null, 200], | ||
* [[], 300], | ||
* returns | ||
* [0, 5, 100], | ||
* [1, null, 100], | ||
* [2, 15, 100], | ||
* [0, null, 200], | ||
* [0, null, 300], | ||
* ``` | ||
* | ||
* @param input_table Table to explode. | ||
* @param explode_column_idx Column index to explode inside the table. | ||
* @param mr Device memory resource used to allocate the returned table's device memory. | ||
* | ||
* @return A new table with the column at explode_column_idx exploded and a position column added. | ||
*/ | ||
std::unique_ptr<table> explode_outer_position( | ||
table_view const& input_table, | ||
size_type explode_column_idx, | ||
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource()); | ||
|
||
/** @} */ // end of group | ||
|
||
} // namespace cudf |
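The row-by-row behavior documented for explode_outer_position can be mimicked with a small host-side reference. This is only a sketch under stated assumptions, not the libcudf implementation: the types `element`, `list`, and `exploded_row` and the function `explode_outer_position_reference` are hypothetical, nullable values are modeled with `std::optional`, and the table is reduced to one list column plus one integer column. Following the examples in the doc comments above, a null or empty list produces a single row with position 0 and a null value.

```cpp
#include <optional>
#include <utility>
#include <vector>

// Hypothetical host-side model: a nullable int element and a nullable list of them.
using element = std::optional<int>;
using list    = std::optional<std::vector<element>>;

// One output row: position within the source list, the exploded value,
// and the duplicated value from the other column.
struct exploded_row {
  int position;
  element value;
  int other;
};

// Hypothetical reference for the documented outer-explode-with-position behavior.
std::vector<exploded_row> explode_outer_position_reference(
  std::vector<std::pair<list, int>> const& rows)
{
  std::vector<exploded_row> out;
  for (auto const& [lst, other] : rows) {
    if (!lst.has_value() || lst->empty()) {
      // Null and empty lists are retained as a single null entry.
      out.push_back({0, std::nullopt, other});
      continue;
    }
    int position = 0;
    for (auto const& value : *lst) {
      // Nulls inside a list stay null; the other column is duplicated.
      out.push_back({position++, value, other});
    }
  }
  return out;
}
```

Dropping the `if` branch (so that null and empty lists emit nothing) gives the behavior documented for explode_position.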