Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support get_element from LIST column #8071

Merged
merged 24 commits into from
May 7, 2021
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions conda/recipes/libcudf/meta.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,7 @@ test:
- test -f $PREFIX/include/cudf/detail/groupby/sort_helper.hpp
- test -f $PREFIX/include/cudf/detail/hashing.hpp
- test -f $PREFIX/include/cudf/detail/interop.hpp
- test -f $PREFIX/include/cudf/detail/is_element_valid.hpp
- test -f $PREFIX/include/cudf/detail/null_mask.hpp
- test -f $PREFIX/include/cudf/detail/nvtx/nvtx3.hpp
- test -f $PREFIX/include/cudf/detail/nvtx/ranges.hpp
Expand Down
1 change: 1 addition & 0 deletions cpp/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,7 @@ add_library(cudf
src/binaryop/compiled/binary_ops.cu
src/labeling/label_bins.cu
src/bitmask/null_mask.cu
src/bitmask/is_element_valid.cpp
src/column/column.cu
src/column/column_device_view.cu
src/column/column_factories.cpp
Expand Down
10 changes: 10 additions & 0 deletions cpp/docs/DEVELOPER_GUIDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -255,6 +255,11 @@ currently supported by cudf. Each type of value is represented by a separate typ
which are all derived from `cudf::scalar`. e.g. A `numeric_scalar` holds a single numerical value,
a `string_scalar` holds a single string. The data for the stored value resides in device memory.

A `list_scalar` holds the underlying data of a list. This means the underlying data can be any type
that cudf supports. e.g. A `list_scalar` representing a list of integers stores an `cudf::column`
of type `INT32`, a `list_scalar` represents a list of list of integers stores an `cudf::column` of
type `LIST`, which in turn stores a column of type `INT32`.
isVoid marked this conversation as resolved.
Show resolved Hide resolved

|Value type|Scalar class|Notes|
|-|-|-|
|fixed-width|`fixed_width_scalar<T>`| `T` can be any fixed-width type|
Expand All @@ -263,11 +268,14 @@ a `string_scalar` holds a single string. The data for the stored value resides i
|timestamp|`timestamp_scalar<T>` | `T` can be `timestamp_D`, `timestamp_s`, etc.|
|duration|`duration_scalar<T>` | `T` can be `duration_D`, `duration_s`, etc.|
|string|`string_scalar`| This class object is immutable|
|list|`list_scalar`| Underlying data can be any type supported by cudf |

### Construction
`scalar`s can be created using either their respective constructors or using factory functions like
`make_numeric_scalar()`, `make_timestamp_scalar()` or `make_string_scalar()`.

// TODO: add details for `list_scalar`

isVoid marked this conversation as resolved.
Show resolved Hide resolved
### Casting
All the factory methods return a `unique_ptr<scalar>` which needs to be statically downcasted to
its respective scalar class type before accessing its value. Their validity (nullness) can be
Expand All @@ -290,6 +298,8 @@ and its validity from the device. This can be obtained using the function
`get_scalar_device_view(ScalarType s)`. Note that a device view is not provided for a base scalar
object, only for the derived typed scalar class objects.

// TODO: add details for `list_scalar`

# libcudf++ API and Implementation

## Streams
Expand Down
46 changes: 46 additions & 0 deletions cpp/include/cudf/detail/is_element_valid.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <cudf/column/column_view.hpp>

#include <rmm/cuda_stream_view.hpp>

namespace cudf {
namespace detail {

/**
* @brief Return validity of a row
*
* Retrieves the specified row validity from device memory.
isVoid marked this conversation as resolved.
Show resolved Hide resolved
*
* @note Synchronizes `stream`.
*
* @throw cudf::logic_error if `element_index < 0 or >= col_view.size()`
*
* @param col_view The column to retrieve the validity from.
* @param element_index The index of the row to retrieve.
* @param stream The stream to use for copying the validity to the host.
* @return Host boolean that indicates the validity of the row.
*/

bool is_element_valid_sync(column_view const& col_view,
size_type element_index,
rmm::cuda_stream_view stream = rmm::cuda_stream_default);

} // namespace detail
} // namespace cudf
18 changes: 17 additions & 1 deletion cpp/include/cudf/scalar/scalar.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -607,7 +607,7 @@ class list_scalar : public scalar {
list_scalar& operator=(list_scalar&& other) = delete;

/**
* @brief Construct a new list scalar object from existing device data
* @brief Construct a new list scalar object by copying from existing device data
*
* @param elements The elements of the list
* @param is_valid Whether the value held by the scalar is valid
Expand All @@ -622,6 +622,22 @@ class list_scalar : public scalar {
{
}

/**
* @brief Move existing device data into a new list scalar object
*
* @param elements The elements of the list
* @param is_valid Whether the value held by the scalar is valid
* @param stream CUDA stream used for device memory operations.
* @param mr Device memory resource to use for device memory allocation
*/
list_scalar(cudf::column const&& elements,
bool is_valid = true,
rmm::cuda_stream_view stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())
: scalar(data_type(type_id::LIST), is_valid, stream, mr), _data(elements)
{
}

/**
isVoid marked this conversation as resolved.
Show resolved Hide resolved
* @brief Returns a non-owning, immutable view to underlying device data
*/
Expand Down
47 changes: 47 additions & 0 deletions cpp/src/bitmask/is_element_valid.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@

/*
* Copyright (c) 2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include <cudf/column/column_view.hpp>
#include <cudf/utilities/bit.hpp>
#include <cudf/utilities/error.hpp>

#include <rmm/cuda_stream_view.hpp>

namespace cudf {
namespace detail {

bool is_element_valid_sync(column_view const& col_view,
size_type element_index,
rmm::cuda_stream_view stream)
{
CUDF_EXPECTS(element_index >= 0 and element_index < col_view.size(), "invalid index.");
if (!col_view.nullable()) { return true; }

bitmask_type word;
// null_mask() returns device ptr to bitmask without offset
size_type index = element_index + col_view.offset();
CUDA_TRY(cudaMemcpyAsync(&word,
col_view.null_mask() + word_index(index),
sizeof(bitmask_type),
cudaMemcpyDeviceToHost,
stream.value()));
stream.synchronize();
return static_cast<bool>(word & (bitmask_type{1} << intra_word_index(index)));
}

} // namespace detail
} // namespace cudf
19 changes: 17 additions & 2 deletions cpp/src/copying/get_element.cu
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
* Copyright (c) 2020-2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand All @@ -17,8 +17,11 @@
#include <cudf/column/column_device_view.cuh>
#include <cudf/copying.hpp>
#include <cudf/detail/indexalator.cuh>
#include <cudf/detail/is_element_valid.hpp>
#include <cudf/detail/utilities/cuda.cuh>
#include <cudf/dictionary/dictionary_column_view.hpp>
#include <cudf/lists/detail/copying.hpp>
#include <cudf/lists/lists_column_view.hpp>
#include <cudf/scalar/scalar_device_view.cuh>
#include <cudf/scalar/scalar_factories.hpp>

Expand Down Expand Up @@ -122,7 +125,19 @@ struct get_element_functor {
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource *mr = rmm::mr::get_current_device_resource())
{
CUDF_FAIL("get_element_functor not supported for list_view");
bool valid = is_element_valid_sync(input, index, stream);

if (valid) {
lists_column_view lcv(input);
// Make a copy of the row
auto row_slice_contents =
lists::detail::copy_slice(lcv, index, index + 1, stream, mr)->release();
nvdbaranec marked this conversation as resolved.
Show resolved Hide resolved
// Construct scalar with row data
return std::make_unique<list_scalar>(
std::move(*row_slice_contents.children[1]), valid, stream, mr);
} else {
return make_default_constructed_scalar(data_type(type_id::LIST));
}
}

template <typename T, std::enable_if_t<cudf::is_fixed_point<T>()> *p = nullptr>
Expand Down
2 changes: 1 addition & 1 deletion cpp/src/scalar/scalar_factories.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -125,7 +125,7 @@ template <>
std::unique_ptr<cudf::scalar> default_scalar_functor::operator()<list_view>(
rmm::cuda_stream_view stream, rmm::mr::device_memory_resource* mr)
{
CUDF_FAIL("list_view type not supported");
return std::make_unique<list_scalar>(column(), false, stream, mr);
}

template <>
Expand Down
3 changes: 2 additions & 1 deletion cpp/tests/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -367,7 +367,8 @@ ConfigureTest(TEXT_TEST
ConfigureTest(BITMASK_TEST
bitmask/valid_if_tests.cu
bitmask/set_nullmask_tests.cu
bitmask/bitmask_tests.cu)
bitmask/bitmask_tests.cu
bitmask/is_element_valid_tests.cpp)


###################################################################################################
Expand Down
92 changes: 92 additions & 0 deletions cpp/tests/bitmask/is_element_valid_tests.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include <cudf_test/base_fixture.hpp>
#include <cudf_test/column_wrapper.hpp>

#include <cudf/copying.hpp>
#include <cudf/detail/is_element_valid.hpp>
#include <cudf/detail/iterator.cuh>

#include <thrust/iterator/counting_iterator.h>

namespace cudf {
namespace test {

struct IsElementValidTest : public BaseFixture {
};

TEST_F(IsElementValidTest, IsElementValidBasic)
{
fixed_width_column_wrapper<int32_t> col({1, 1, 1, 1, 1}, {1, 0, 0, 0, 1});
EXPECT_TRUE(cudf::detail::is_element_valid_sync(col, 0));
EXPECT_FALSE(cudf::detail::is_element_valid_sync(col, 1));
EXPECT_FALSE(cudf::detail::is_element_valid_sync(col, 2));
EXPECT_FALSE(cudf::detail::is_element_valid_sync(col, 3));
EXPECT_TRUE(cudf::detail::is_element_valid_sync(col, 4));
}

TEST_F(IsElementValidTest, IsElementValidLarge)
{
auto filter = [](auto i) { return static_cast<bool>(i % 3); };
auto val = thrust::make_counting_iterator(0);
auto valid = cudf::detail::make_counting_transform_iterator(0, filter);
size_type num_rows = 1000;

fixed_width_column_wrapper<int32_t> col(val, val + num_rows, valid);

for (int i = 0; i < num_rows; i++) {
EXPECT_EQ(cudf::detail::is_element_valid_sync(col, i), filter(i));
}
}

TEST_F(IsElementValidTest, IsElementValidOffset)
{
fixed_width_column_wrapper<int32_t> col({1, 1, 1, 1, 1}, {1, 0, 0, 0, 1});
{
auto offset_col = slice(col, {1, 5}).front();
EXPECT_FALSE(cudf::detail::is_element_valid_sync(offset_col, 0));
EXPECT_FALSE(cudf::detail::is_element_valid_sync(offset_col, 1));
EXPECT_FALSE(cudf::detail::is_element_valid_sync(offset_col, 2));
EXPECT_TRUE(cudf::detail::is_element_valid_sync(offset_col, 3));
}
{
auto offset_col = slice(col, {2, 5}).front();
EXPECT_FALSE(cudf::detail::is_element_valid_sync(offset_col, 0));
EXPECT_FALSE(cudf::detail::is_element_valid_sync(offset_col, 1));
EXPECT_TRUE(cudf::detail::is_element_valid_sync(offset_col, 2));
}
}

TEST_F(IsElementValidTest, IsElementValidOffsetLarge)
{
auto filter = [](auto i) { return static_cast<bool>(i % 3); };
size_type offset = 37;
auto val = thrust::make_counting_iterator(0);
auto valid = cudf::detail::make_counting_transform_iterator(0, filter);
size_type num_rows = 1000;

fixed_width_column_wrapper<int32_t> col(val, val + num_rows, valid);
auto offset_col = slice(col, {offset, num_rows}).front();

for (int i = 0; i < offset_col.size(); i++) {
EXPECT_EQ(cudf::detail::is_element_valid_sync(offset_col, i), filter(i + offset));
}
}

} // namespace test

} // namespace cudf
Loading