Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for pylibcudf binaryops #15470

Merged
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
1b482bf
patch changes from original PR
brandon-b-miller Apr 4, 2024
0a46a0f
first halfway decent testing strategy
brandon-b-miller Apr 4, 2024
f5f33e6
refactor out a lot of repeated code
brandon-b-miller Apr 4, 2024
61ab85b
cleanup
brandon-b-miller Apr 4, 2024
ba03539
add some more tests
brandon-b-miller Apr 4, 2024
4911f33
start to work nulls in
brandon-b-miller Apr 5, 2024
9eba971
merge latest/resolve conflicts
brandon-b-miller Jul 1, 2024
562c765
start to refactor, pass add, almost sub
brandon-b-miller Jul 1, 2024
e533469
simpler approach
brandon-b-miller Jul 8, 2024
b81e017
Merge branch 'branch-24.08' into pylibcudf-binops-tests
brandon-b-miller Jul 8, 2024
e5f34e7
merge latest / resolve conflicts
brandon-b-miller Jul 9, 2024
322a7de
address reviews
brandon-b-miller Jul 9, 2024
ecbb895
refactor again
brandon-b-miller Jul 10, 2024
076b83d
address reviews, add tests
brandon-b-miller Jul 10, 2024
e06d5cd
fix a few ops
brandon-b-miller Jul 11, 2024
473845f
few more fixes
brandon-b-miller Jul 11, 2024
6e0c816
address more reviews
brandon-b-miller Jul 15, 2024
f0a62a9
Merge branch 'branch-24.08' into pylibcudf-binops-tests
brandon-b-miller Jul 16, 2024
77c709f
fix cpp test
brandon-b-miller Jul 16, 2024
4e653d6
fix atan2
brandon-b-miller Jul 16, 2024
6968f95
update assert_column_eq to handle nans
brandon-b-miller Jul 16, 2024
cc8afe9
only check nans for floating
brandon-b-miller Jul 16, 2024
12a748b
Merge branch 'branch-24.08' into pylibcudf-binops-tests
brandon-b-miller Jul 16, 2024
8c44149
address reviews
brandon-b-miller Jul 18, 2024
959e9a7
refactor again
brandon-b-miller Jul 18, 2024
367dfe9
merge /resolve
brandon-b-miller Jul 18, 2024
849e586
adjust docstring
brandon-b-miller Jul 18, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions cpp/include/cudf/binaryop.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -290,6 +290,17 @@ cudf::data_type binary_operation_fixed_point_output_type(binary_operator op,

namespace binops {

/**
* @brief Returns true if the binary operator is supported for the given input types.
*
* @param out The output data type
* @param lhs The left-hand cudf::data_type
* @param rhs The right-hand cudf::data_type
* @param op The binary operator
* @return true if the binary operator is supported for the given input types
*/
bool is_supported_operation(data_type out, data_type lhs, data_type rhs, binary_operator op);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also #16239. That PR exposes reflection for casting.


/**
* @brief Computes output valid mask for op between a column and a scalar
*
Expand Down
7 changes: 6 additions & 1 deletion cpp/src/binaryop/binaryop.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,11 @@
namespace cudf {
namespace binops {

bool is_supported_operation(data_type out, data_type lhs, data_type rhs, binary_operator op)
{
return cudf::binops::compiled::is_supported_operation(out, lhs, rhs, op);
}

/**
* @brief Computes output valid mask for op between a column and a scalar
*/
Expand Down Expand Up @@ -194,7 +199,7 @@ std::unique_ptr<column> binary_operation(LhsType const& lhs,
rmm::device_async_resource_ref mr)
{
if constexpr (std::is_same_v<LhsType, column_view> and std::is_same_v<RhsType, column_view>)
CUDF_EXPECTS(lhs.size() == rhs.size(), "Column sizes don't match");
CUDF_EXPECTS(lhs.size() == rhs.size(), "Column sizes don't match", std::invalid_argument);

if (lhs.type().id() == type_id::STRING and rhs.type().id() == type_id::STRING and
output_type.id() == type_id::STRING and
Expand Down
14 changes: 14 additions & 0 deletions python/cudf/cudf/_lib/pylibcudf/binaryop.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -84,3 +84,17 @@ cpdef Column binary_operation(
raise ValueError(f"Invalid arguments {lhs} and {rhs}")

return Column.from_libcudf(move(result))


def is_supported_operation(
brandon-b-miller marked this conversation as resolved.
Show resolved Hide resolved
DataType out,
DataType lhs,
DataType rhs,
binary_operator op
):
return cpp_binaryop.is_supported_operation(
brandon-b-miller marked this conversation as resolved.
Show resolved Hide resolved
out.c_obj,
lhs.c_obj,
rhs.c_obj,
op
)
42 changes: 31 additions & 11 deletions python/cudf/cudf/_lib/pylibcudf/libcudf/binaryop.pxd
Original file line number Diff line number Diff line change
@@ -1,9 +1,11 @@
# Copyright (c) 2020-2024, NVIDIA CORPORATION.

from libc.stdint cimport int32_t
from libc.stdint cimport bool, int32_t
from libcpp cimport bool
brandon-b-miller marked this conversation as resolved.
Show resolved Hide resolved
from libcpp.memory cimport unique_ptr
from libcpp.string cimport string

from cudf._lib.exception_handler cimport cudf_exception_handler
from cudf._lib.pylibcudf.libcudf.column.column cimport column
from cudf._lib.pylibcudf.libcudf.column.column_view cimport column_view
from cudf._lib.pylibcudf.libcudf.scalar.scalar cimport scalar
Expand All @@ -19,48 +21,66 @@ cdef extern from "cudf/binaryop.hpp" namespace "cudf" nogil:
TRUE_DIV
FLOOR_DIV
MOD
PMOD
PYMOD
POW
INT_POW
LOG_BASE
ATAN2
SHIFT_LEFT
SHIFT_RIGHT
SHIFT_RIGHT_UNSIGNED
BITWISE_AND
BITWISE_OR
BITWISE_XOR
LOGICAL_AND
LOGICAL_OR
EQUAL
NOT_EQUAL
LESS
GREATER
LESS_EQUAL
GREATER_EQUAL
NULL_EQUALS
NULL_NOT_EQUALS
BITWISE_AND
BITWISE_OR
BITWISE_XOR
LOGICAL_AND
LOGICAL_OR
NULL_MAX
brandon-b-miller marked this conversation as resolved.
Show resolved Hide resolved
NULL_MIN
GENERIC_BINARY
NULL_LOGICAL_AND
NULL_LOGICAL_OR
INVALID_BINARY

cdef unique_ptr[column] binary_operation (
const scalar& lhs,
const column_view& rhs,
binary_operator op,
data_type output_type
) except +
) except +cudf_exception_handler

cdef unique_ptr[column] binary_operation (
const column_view& lhs,
const scalar& rhs,
binary_operator op,
data_type output_type
) except +
) except +cudf_exception_handler

cdef unique_ptr[column] binary_operation (
const column_view& lhs,
const column_view& rhs,
binary_operator op,
data_type output_type
) except +
) except +cudf_exception_handler

cdef unique_ptr[column] binary_operation (
const column_view& lhs,
const column_view& rhs,
const string& op,
data_type output_type
) except +
) except +cudf_exception_handler

cdef extern from "cudf/binaryop.hpp" namespace "cudf::binops" nogil:
cdef bool is_supported_operation(
data_type output_type,
data_type lhs_type,
data_type rhs_type,
binary_operator op
) except +cudf_exception_handler
247 changes: 247 additions & 0 deletions python/cudf/cudf/pylibcudf_tests/test_binaryops.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can make a mapping like:

{
  "ADD": operator.add,
  "SUB": operator.sub,
  ...
  "PYMOD": operator.mod,
  ...
}

and use that for testing everything?

Original file line number Diff line number Diff line change
@@ -0,0 +1,247 @@
# Copyright (c) 2024, NVIDIA CORPORATION.


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

import numpy as np
import pyarrow as pa
import pytest
from utils import assert_column_eq

from cudf._lib import pylibcudf as plc


def idfn(param):
ltype, rtype, outtype = param
return f"{ltype}-{rtype}-{outtype}"
brandon-b-miller marked this conversation as resolved.
Show resolved Hide resolved


@pytest.fixture(params=[True, False], ids=["nulls", "no_nulls"])
def nulls(request):
return request.param


@pytest.fixture
def pa_data(request, nulls):
ltype, rtype, outtype = request.param
values = make_col(ltype, nulls), make_col(rtype, nulls), outtype
return values


@pytest.fixture
def plc_data(pa_data):
lhs, rhs, outtype = pa_data
return (
plc.interop.from_arrow(lhs),
plc.interop.from_arrow(rhs),
plc.interop.from_arrow(pa.from_numpy_dtype(np.dtype(outtype))),
)


def make_col(dtype, nulls):
if dtype == "int64":
data = [1, 2, 3, 4, 5]
pa_type = pa.int32()
elif dtype == "uint64":
data = [1, 2, 3, 4, 5]
pa_type = pa.uint32()
brandon-b-miller marked this conversation as resolved.
Show resolved Hide resolved
elif dtype == "float64":
data = [1.0, 2.0, 3.0, 4.0, 5.0]
pa_type = pa.float32()
elif dtype == "bool":
data = [True, False, True, False, True]
pa_type = pa.bool_()
elif dtype == "timestamp64[ns]":
data = [
np.datetime64("2022-01-01"),
np.datetime64("2022-01-02"),
np.datetime64("2022-01-03"),
np.datetime64("2022-01-04"),
np.datetime64("2022-01-05"),
]
pa_type = pa.timestamp("ns")
elif dtype == "timedelta64[ns]":
data = [
np.timedelta64(1, "ns"),
np.timedelta64(2, "ns"),
np.timedelta64(3, "ns"),
np.timedelta64(4, "ns"),
np.timedelta64(5, "ns"),
]
pa_type = pa.duration("ns")
else:
raise ValueError("Unsupported dtype")

if nulls:
data[3] = None

return pa.array(data, type=pa_type)


def _test_binaryop_inner(pa_data, plc_data, pyop, plc_op):
lhs_py, rhs_py, outty_py = pa_data
lhs_plc, rhs_plc, outty_plc = plc_data

def get_result():
return plc.binaryop.binary_operation(
lhs_plc,
rhs_plc,
plc_op,
outty_plc,
)

if not plc.binaryop.is_supported_operation(
outty_plc, lhs_plc.type(), rhs_plc.type(), plc_op
):
with pytest.raises(TypeError):
get_result()
return

expect = [
pyop(x, y) for x, y in zip(lhs_py.to_pylist(), rhs_py.to_pylist())
]
expect = pa.array(expect, type=outty_py)
got = get_result()
assert_column_eq(expect, got)


@pytest.mark.parametrize(
"pa_data",
[
("int64", "int64", "int64"),
("int64", "float64", "float64"),
("int64", "int64", "datetime64[ns]"),
],
indirect=True,
ids=idfn,
)
def test_add(pa_data, plc_data):
def add(x, y):
if x is None or y is None:
return None
return x + y

_test_binaryop_inner(
pa_data,
plc_data,
add,
plc.binaryop.BinaryOperator.ADD,
)


@pytest.mark.parametrize(
"pa_data",
[("int64", "int64", "int64"), ("int64", "float64", "float64")],
indirect=True,
ids=idfn,
)
def test_sub(pa_data, plc_data):
def sub(x, y):
if x is None or y is None:
return None
return x - y

_test_binaryop_inner(
pa_data,
plc_data,
sub,
plc.binaryop.BinaryOperator.SUB,
)


@pytest.mark.parametrize(
"pa_data",
[("int64", "int64", "int64"), ("int64", "float64", "float64")],
indirect=True,
ids=idfn,
)
def test_mul(pa_data, plc_data):
def mul(x, y):
if x is None or y is None:
return None
return x * y

_test_binaryop_inner(
pa_data,
plc_data,
mul,
plc.binaryop.BinaryOperator.MUL,
)


@pytest.mark.parametrize(
"pa_data",
[("int64", "int64", "int64"), ("int64", "float64", "float64")],
indirect=True,
ids=idfn,
)
def test_div(pa_data, plc_data):
def div(x, y):
if x is None or y is None:
return None
return x / y
brandon-b-miller marked this conversation as resolved.
Show resolved Hide resolved

_test_binaryop_inner(
pa_data,
plc_data,
div,
plc.binaryop.BinaryOperator.DIV,
)


@pytest.mark.parametrize(
"pa_data",
[("int64", "int64", "int64"), ("int64", "float64", "float64")],
indirect=True,
ids=idfn,
)
def test_floordiv(pa_data, plc_data):
def floordiv(x, y):
if x is None or y is None:
return None
return x // y

_test_binaryop_inner(
pa_data,
plc_data,
floordiv,
plc.binaryop.BinaryOperator.FLOOR_DIV,
)


@pytest.mark.parametrize(
"pa_data",
[("int64", "int64", "int64"), ("int64", "float64", "float64")],
indirect=True,
ids=idfn,
)
def test_truediv(pa_data, plc_data):
def truediv(x, y):
if x is None or y is None:
return None
return x / y

_test_binaryop_inner(
pa_data,
plc_data,
truediv,
plc.binaryop.BinaryOperator.TRUE_DIV,
)


@pytest.mark.parametrize(
"pa_data",
[("int64", "int64", "int64"), ("int64", "float64", "float64")],
indirect=True,
ids=idfn,
)
def test_mod(pa_data, plc_data):
def mod(x, y):
if x is None or y is None:
return None
return x % y

_test_binaryop_inner(
pa_data,
plc_data,
mod,
plc.binaryop.BinaryOperator.MOD,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python's modulus operator is implemented as PYMOD. We probably need more expansive inputs to test this code accurately.

Suggested change
plc.binaryop.BinaryOperator.MOD,
plc.binaryop.BinaryOperator.PYMOD,

Reference: https://stackoverflow.com/questions/1907565/c-and-python-different-behaviour-of-the-modulo-operation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need more expansive inputs to test this code accurately.

The goal of the pylibcudf tests is generally to test that the wrappers are implemented correctly: we defer to libcudf for covering the implementation correctness of all the kernels.

So I guess in this case we would want an input that can differentiate between getting MOD and PYMOD mixed up in the wrapper.

)
Loading