Add Python bindings for string literal support in AST #13073

Merged
Changes from 14 commits
17 commits
982af8a
string scalar support in AST - proof of concept
karthikeyann Mar 30, 2023
0a9eb86
Add cudf::ast::generic_scalar_device_view
karthikeyann Apr 4, 2023
50ee55d
remove filter by range example from test code
karthikeyann Apr 4, 2023
9735d51
cleanup docs
karthikeyann Apr 4, 2023
8653e61
Merge branch 'branch-23.06' of github.com:rapidsai/cudf into fea-stri…
karthikeyann Apr 4, 2023
7ad5c5d
add cython bindings, unit tests for string literal support in AST
karthikeyann Apr 5, 2023
3a40c31
Apply suggestions from code review
karthikeyann Apr 18, 2023
24b6589
Merge branch 'branch-23.06' into fea-cython-string_scalar_ast_compare
karthikeyann Apr 18, 2023
3405241
Merge branch 'branch-23.06' into fea-cython-string_scalar_ast_compare
karthikeyann Apr 20, 2023
4c44afb
cleanup cython Literal, update docs
karthikeyann Apr 21, 2023
74cd710
Merge branch 'branch-23.06' of github.com:rapidsai/cudf into fea-cyth…
karthikeyann Apr 21, 2023
5d80737
Merge branch 'branch-23.06' into fea-cython-string_scalar_ast_compare
karthikeyann Apr 25, 2023
c3d8b66
Merge branch 'branch-23.06' into fea-cython-string_scalar_ast_compare
karthikeyann Apr 26, 2023
215d4db
Merge branch 'branch-23.06' into fea-cython-string_scalar_ast_compare
karthikeyann May 2, 2023
fa31360
Update python/cudf/cudf/core/dataframe.py
vyasr May 3, 2023
ab9fd3d
Merge branch 'branch-23.06' into fea-cython-string_scalar_ast_compare
vyasr May 3, 2023
30c4b96
Merge branch 'branch-23.06' into fea-cython-string_scalar_ast_compare
karthikeyann May 5, 2023
16 changes: 3 additions & 13 deletions python/cudf/cudf/_lib/expressions.pxd
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION.
+# Copyright (c) 2022-2023, NVIDIA CORPORATION.

 from libc.stdint cimport int32_t, int64_t
 from libcpp.memory cimport unique_ptr
@@ -9,25 +9,15 @@ from cudf._lib.cpp.expressions cimport (
     literal,
     operation,
 )
-from cudf._lib.cpp.scalar.scalar cimport numeric_scalar
-
-ctypedef enum scalar_type_t:
-    INT
-    DOUBLE
-
-
-ctypedef union int_or_double_scalar_ptr:
-    unique_ptr[numeric_scalar[int64_t]] int_ptr
-    unique_ptr[numeric_scalar[double]] double_ptr
+from cudf._lib.cpp.scalar.scalar cimport numeric_scalar, scalar, string_scalar


 cdef class Expression:
     cdef unique_ptr[expression] c_obj


 cdef class Literal(Expression):
-    cdef scalar_type_t c_scalar_type
-    cdef int_or_double_scalar_ptr c_scalar
+    cdef unique_ptr[scalar] c_scalar


 cdef class ColumnReference(Expression):
25 changes: 9 additions & 16 deletions python/cudf/cudf/_lib/expressions.pyx
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION.
+# Copyright (c) 2022-2023, NVIDIA CORPORATION.

 from enum import Enum

@@ -77,27 +77,20 @@ class TableReference(Enum):
 # restrictive at the moment.
 cdef class Literal(Expression):
     def __cinit__(self, value):
-        # TODO: Would love to find a better solution than unions for literals.
-        cdef int intval
-        cdef double doubleval
-
         if isinstance(value, int):
-            self.c_scalar_type = scalar_type_t.INT
-            intval = value
-            self.c_scalar.int_ptr = make_unique[numeric_scalar[int64_t]](
-                intval, True
-            )
+            self.c_scalar.reset(new numeric_scalar[int64_t](value, True))
Contributor

  • Shouldn't every new have a corresponding delete? Note that if we need to call delete, it can be done in the __dealloc__ method of this class
  • Can we continue to use a unique/shared pointer so we don't have to worry about defining __dealloc__ or calling delete?

Contributor

Oh, I think I see what's going on here: we're resetting the unique pointer to hold a new numeric_scalar. But why do it this way instead of using make_unique?

Contributor Author

c_scalar is a unique pointer here. This line calls reset on it with a raw pointer to the newly created object. The unique pointer takes ownership and deletes the pointed-to object when it is itself destroyed.

Contributor Author

@karthikeyann karthikeyann May 5, 2023

make_unique creates a unique pointer to numeric_scalar. It can't be type-cast and stored in the generic base-class pointer, hence this approach. I saw a similar approach in another Cython file as well:
https://github.com/rapidsai/cudf/blob/branch-23.06/python/cudf/cudf/_lib/scalar.pyx#L250

Contributor

Thanks! I approved the PR, but still have a question. Why prefer:

c_scalar.reset(new ....)

versus:

c_scalar = make_unique(...)

?

Contributor Author

@karthikeyann karthikeyann May 5, 2023

Storing the result in unique_ptr<scalar> would require releasing the derived-type unique_ptr anyway:
unique_ptr[scalar] c_scalar = unique_ptr(static_cast<scalar*>(make_unique(...).release()));

Hence I used the new operator directly.
Is 'releasing the unique_ptr and typecasting' safer than using reset(new ...)?

Contributor

Got it - thanks for clarifying! Definitely a TIL for me

Is 'releasing the unique_ptr and typecasting' safer than using reset(new ...)?

Do you mean is it safer from a Cython perspective? If so, the answer is that it should be no different than in C++. I would say whatever pattern you prefer in C++, we should use here as well.

             self.c_obj = <expression_ptr> make_unique[libcudf_exp.literal](
-                <numeric_scalar[int64_t] &>dereference(self.c_scalar.int_ptr)
+                <numeric_scalar[int64_t] &>dereference(self.c_scalar)
             )
         elif isinstance(value, float):
-            self.c_scalar_type = scalar_type_t.DOUBLE
-            doubleval = value
-            self.c_scalar.double_ptr = make_unique[numeric_scalar[double]](
-                doubleval, True
+            self.c_scalar.reset(new numeric_scalar[double](value, True))
+            self.c_obj = <expression_ptr> make_unique[libcudf_exp.literal](
+                <numeric_scalar[double] &>dereference(self.c_scalar)
             )
+        elif isinstance(value, str):
+            self.c_scalar.reset(new string_scalar(value.encode(), True))
             self.c_obj = <expression_ptr> make_unique[libcudf_exp.literal](
-                <numeric_scalar[double] &>dereference(self.c_scalar.double_ptr)
+                <string_scalar &>dereference(self.c_scalar)
             )


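As a rough illustration of the isinstance chain in Literal.__cinit__, the pure-Python sketch below mirrors the same branch order without touching libcudf. The helper name classify_literal, the type labels, and the final TypeError are assumptions made for this example and are not part of the PR. It also surfaces one property of the chain: because bool is a subclass of int in Python, True and False take the integer branch.

```python
def classify_literal(value):
    """Return a (kind, payload) pair following the same branch order as the diff.

    Hypothetical helper for illustration only; it is not part of cudf.
    """
    # Same order as the diff: int first, then float, then str.
    # bool is a subclass of int, so True/False land in the int branch.
    if isinstance(value, int):
        return ("int64", int(value))
    elif isinstance(value, float):
        return ("float64", float(value))
    elif isinstance(value, str):
        # str.encode() defaults to UTF-8; the diff passes these bytes
        # to the string_scalar constructor.
        return ("string", value.encode())
    # Assumption: the diff shows no fallback branch, so this error is
    # an addition for the sketch.
    raise TypeError(f"Unsupported literal type: {type(value).__name__}")


print(classify_literal(1))
print(classify_literal(1.5))
print(classify_literal("abc"))
```

Checking int before float matters little here because the two types do not overlap, but any future bool branch would have to come before the int check to be reachable.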
4 changes: 2 additions & 2 deletions python/cudf/cudf/core/_internals/expressions.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION.
+# Copyright (c) 2022-2023, NVIDIA CORPORATION.

 import ast
 import functools
@@ -115,7 +115,7 @@ def visit_Name(self, node):
         self.stack.append(ColumnReference(col_id))

     def visit_Constant(self, node):
-        if not isinstance(node, ast.Num):
+        if not isinstance(node, (ast.Num, ast.Str)):
             raise ValueError(
                 f"Unsupported literal {repr(node.value)} of type "
                 "{type(node.value).__name__}"
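For readers unfamiliar with how the parser sees literals, here is a minimal standard-library sketch of a visitor that accepts numeric and string constants and rejects everything else. It is not the PR's implementation: the names LiteralCollector and collect_literals are hypothetical, and it inspects node.value directly instead of using ast.Num and ast.Str, which, as far as I know, have been deprecated since Python 3.8 and were removed in Python 3.12.

```python
import ast


class LiteralCollector(ast.NodeVisitor):
    """Collect int, float, and str literals found in an expression."""

    def __init__(self):
        self.literals = []

    def visit_Constant(self, node):
        value = node.value
        # bool is a subclass of int, so it is excluded explicitly here.
        if isinstance(value, bool) or not isinstance(value, (int, float, str)):
            raise ValueError(
                f"Unsupported literal {value!r} of type {type(value).__name__}"
            )
        self.literals.append(value)


def collect_literals(expr):
    """Parse a single expression and return its literals in visit order."""
    collector = LiteralCollector()
    collector.visit(ast.parse(expr, mode="eval"))
    return collector.literals


print(collect_literals("a < '1'"))
print(collect_literals("a + 2.5 > 1"))
```

Both single-quoted and double-quoted string literals arrive as the same ast.Constant node, so the two quoting styles need no separate handling at this level.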
3 changes: 2 additions & 1 deletion python/cudf/cudf/core/dataframe.py
@@ -7057,7 +7057,8 @@ def eval(self, expr: str, inplace: bool = False, **kwargs):
           Specifically, `&` must be used for bitwise operators on integers,
           not `and`, which is specifically for the logical and between
           booleans.
-        * Only numerical types are currently supported.
+        * Only numerical types are currently supported on all operators.
+        * String types are supported only on comparison operators.
vyasr marked this conversation as resolved.
         * Operators generally will not cast automatically. Users are
           responsible for casting columns to suitable types before
           evaluating a function.
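The docstring notes that string columns support comparison operators only and that no automatic casting takes place. A small pure-Python sketch of what that implies, under the assumption that string comparisons in eval are lexicographic, as they are for Python str: comparing against the literal '1' is a string comparison, not a numeric one. The helper compare_strings and the OPS table are hypothetical, and nothing below calls cudf.

```python
import operator

# Hypothetical mapping from operator text to a Python comparison function.
OPS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}


def compare_strings(values, op, literal):
    """Compare each string in values against a string literal, element-wise."""
    return [OPS[op](value, literal) for value in values]


values = ["0", "1", "10", "9"]
# Lexicographic order: "10" sorts after "1" (longer with equal prefix)
# and before "9" (first character decides).
print(compare_strings(values, "<", "1"))
print(compare_strings(["10"], "<", "9"))
```

If numeric ordering is what is wanted, the column would need to be cast to a numeric type before evaluating the expression, in line with the casting note in the docstring.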
3 changes: 3 additions & 0 deletions python/cudf/cudf/tests/test_dataframe.py
@@ -9820,6 +9820,9 @@ def df_eval(request):
             float,
         ),
         ("a_b_are_equal = (a == b)", int),
+        ("a > b", str),
+        ("a < '1'", str),
+        ('a == "1"', str),
vyasr marked this conversation as resolved.
     ],
 )
 def test_dataframe_eval(df_eval, expr, dtype):