Implement DataFrame.eval using libcudf ASTs #8022

Merged on Apr 28, 2022 (69 commits).
Commits (the file changes shown below are from 63 of the 69 commits):

9ecaa59 (vyasr, Apr 20, 2021) Expose ast operator and reference enums to Python.
8ad2866 (vyasr, Apr 20, 2021) Expose limited usage of literal to Python.
cdba8e9 (vyasr, Apr 20, 2021) Add column reference to Cython and make all constructors pass through…
dfd9e25 (vyasr, Apr 20, 2021) Switch raw pointers to unique pointers and remove manual deallocation.
7fc023e (vyasr, Apr 20, 2021) Switch unique pointers to shared pointers and add expression to Cytho…
0e716e0 (vyasr, Apr 20, 2021) Export compute_column to Python.
f5db47f (vyasr, Apr 20, 2021) Minor Cython cleanup.
98049b9 (vyasr, Apr 20, 2021) Add basic Python API.
ea8d39e (vyasr, Apr 21, 2021) Add Python AST parsing functionality to simplify expression construct…
d75b5e9 (vyasr, Apr 21, 2021) Add map for proper operator mapping from Python AST.
ab935a6 (vyasr, Apr 21, 2021) Remove unnecessary push/pop ops and add beginnings of new API.
423d47c (vyasr, Apr 21, 2021) Use column names directly rather than passing the df through.
163b1fc (vyasr, Apr 21, 2021) Add better API and implement a non-recursive version of the tree-pars…
eb26e86 (vyasr, Apr 22, 2021) Remove extra list being passed around.
c16c3be (vyasr, Apr 22, 2021) Explicitly separate the parse stack from the stored list of nodes to …
e3748aa (vyasr, Apr 22, 2021) Separate all logic into individual lines for profiling.
225ff96 (vyasr, Apr 22, 2021) Simplify logic by only storing a single pointer and rely on casting i…
77b871e (vyasr, Apr 22, 2021) Switch back from shared_ptr to unique_ptr.
d2da703 (vyasr, Apr 22, 2021) Substitute df._column_names for df.columns.tolist().
b37e505 (vyasr, Apr 22, 2021) Delete older unnecessary versions.
f7048d5 (vyasr, Apr 22, 2021) Inline more of the object construction for brevity.
e27a2a3 (vyasr, Apr 22, 2021) Some further simplification and removal of extra functions.
22a6a0e (vyasr, Apr 23, 2021) Basic working version of literals.
567d138 (vyasr, Apr 23, 2021) Add docstring, rename ast_visit to ast_traverse, and cleanup code.
eacce8a (vyasr, Apr 23, 2021) Add support for comparison operators.
2b2928f (vyasr, Apr 23, 2021) Enable binops, note that we may need to stack a compatibility API ove…
e040aa2 (vyasr, Apr 23, 2021) Add docstring for evaluate_expression.
6f543b2 (vyasr, Aug 25, 2021) Update all Python bindings to new C++ APIs.
971a5fd (vyasr, Aug 25, 2021) Fix compilation errors.
a99d9b6 (vyasr, Aug 25, 2021) Various cleanup tasks.
963683f (vyasr, Apr 8, 2022) Update to stop using deprecated APIs and update copyright years.
14b45bb (vyasr, Apr 8, 2022) Implement basic version of eval.
2c8b5dc (vyasr, Apr 8, 2022) Fix implementation of comparison ops.
cdb3b25 (vyasr, Apr 8, 2022) Add some basic tests of eval.
cce5983 (vyasr, Apr 8, 2022) Add support for chained operators.
dc2f735 (vyasr, Apr 8, 2022) Remove reliance on implicit field ordering.
d298e35 (vyasr, Apr 8, 2022) Unify comparison operator logic to work identically for one or multip…
dfdb91e (vyasr, Apr 9, 2022) Handle bool ops correctly relative to binops, fix the return type of …
8575706 (vyasr, Apr 9, 2022) Unify CompareOp and BoolOp branches.
977a318 (vyasr, Apr 11, 2022) Enable C++ exception propagation and execution without the GIL.
5c162f7 (vyasr, Apr 11, 2022) Add support for unary functions.
b11924d (vyasr, Apr 11, 2022) Improve error handling for constants.
2379c4d (vyasr, Apr 11, 2022) Simplify testing.
53ed0fd (vyasr, Apr 11, 2022) Some cleanup.
efe03cf (vyasr, Apr 11, 2022) Fix recursive cases and add more tests.
c0316f2 (vyasr, Apr 11, 2022) Minor cleanup.
8c2da11 (vyasr, Apr 11, 2022) Improve some code style and formatting.
8fcca8d (vyasr, Apr 11, 2022) Enable unary ops and do some cleanup.
76505b6 (vyasr, Apr 11, 2022) Some attempts to streamline the function flow.
6371149 (vyasr, Apr 12, 2022) Enable caching properly.
4c27f70 (vyasr, Apr 12, 2022) Fix docstring.
954b998 (vyasr, Apr 12, 2022) Test out approach using a NodeVisitor instead of a recursive function.
5198aac (vyasr, Apr 13, 2022) Remove recursive function in favor of visitor.
3f3885f (vyasr, Apr 14, 2022) Employ a union to differentiate between different scalar types for li…
ff08921 (vyasr, Apr 14, 2022) Partially enable USub and enable UAdd.
279c916 (vyasr, Apr 14, 2022) Enable assignment and inplace operations.
237c54f (vyasr, Apr 14, 2022) Move ast.evaluate_expression to transform.compute_column to match C++…
9f64040 (vyasr, Apr 14, 2022) Switch to using a list of columns for the Cython API.
7a228d2 (vyasr, Apr 14, 2022) Some minor streamlining of the logic.
aedd3f1 (vyasr, Apr 14, 2022) Fix comment.
8ba26bc (vyasr, Apr 14, 2022) Simplify casting and improve explanation.
6c1543e (vyasr, Apr 19, 2022) Fix missing parenthesis.
31241be (vyasr, Apr 21, 2022) Switch from a simple 'in' check to using a regex to avoid treating ==…
2623d0a (vyasr, Apr 26, 2022) Move all 'ast' files to 'expressions' to reflect C++ namespaces accur…
9b0d4e4 (vyasr, Apr 26, 2022) Address some PR comments.
4d6898e (vyasr, Apr 26, 2022) Address remaining PR comments aside from splitting out the visitor in…
d8f0ebc (vyasr, Apr 26, 2022) Move expression parsing logic into a separate pure Python module.
2c58b60 (vyasr, Apr 27, 2022) Address final PR comments.
f009e5b (vyasr, Apr 28, 2022) Fix import issue hidden by locally persistent files.
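Several commits above describe parsing the `eval` string with Python's own `ast` module, mapping Python operator nodes onto libcudf operators, supporting chained comparisons, and replacing a recursive function with an `ast.NodeVisitor`. The following is a minimal pure-Python sketch of that idea, not the PR's code: `ExprBuilder`, `parse_expr`, and the nested-tuple output are hypothetical, and of the operator names only `ADD` and `SUB` are confirmed by the diff below; the rest are assumptions modeled on libcudf's naming.

```python
import ast

# Operator names modeled on libcudf's ast_operator enum. Only ADD and SUB
# appear in the diff in this PR; the other names are assumptions.
BINARY_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "TRUE_DIV"}
COMPARE_OPS = {
    ast.Eq: "EQUAL",
    ast.NotEq: "NOT_EQUAL",
    ast.Lt: "LESS",
    ast.Gt: "GREATER",
    ast.LtE: "LESS_EQUAL",
    ast.GtE: "GREATER_EQUAL",
}
BOOL_OPS = {ast.And: "LOGICAL_AND", ast.Or: "LOGICAL_OR"}


def _lookup(table, op):
    """Translate a Python operator node into an operator name."""
    try:
        return table[type(op)]
    except KeyError:
        raise ValueError(f"unsupported operator: {type(op).__name__}") from None


class ExprBuilder(ast.NodeVisitor):
    """Hypothetical visitor turning a Python expression into nested tuples.

    Columns become ("col", name), numeric constants become ("lit", value),
    and operations become (operator_name, operand, operand).
    """

    def __init__(self, columns):
        self.columns = set(columns)

    def visit_Expression(self, node):
        # ast.parse(..., mode="eval") wraps the expression in this node.
        return self.visit(node.body)

    def visit_Name(self, node):
        if node.id not in self.columns:
            raise NameError(f"unknown column: {node.id!r}")
        return ("col", node.id)

    def visit_Constant(self, node):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
            raise TypeError(f"unsupported constant: {node.value!r}")
        return ("lit", node.value)

    def visit_BinOp(self, node):
        op = _lookup(BINARY_OPS, node.op)
        return (op, self.visit(node.left), self.visit(node.right))

    def visit_Compare(self, node):
        # `a < b < c` is a single Compare node with ops [Lt, Lt]. Expand it
        # to (a < b) and (b < c); the middle operand is reused, which is
        # safe here because column references have no side effects.
        operands = [self.visit(node.left)] + [self.visit(c) for c in node.comparators]
        pairs = [
            (_lookup(COMPARE_OPS, op), lhs, rhs)
            for op, lhs, rhs in zip(node.ops, operands, operands[1:])
        ]
        return _fold_left("LOGICAL_AND", pairs)

    def visit_BoolOp(self, node):
        # `a and b and c` is a single BoolOp node with three values.
        op = _lookup(BOOL_OPS, node.op)
        return _fold_left(op, [self.visit(v) for v in node.values])

    def generic_visit(self, node):
        # Anything without a visit_* method (calls, attributes, unary ops,
        # ...) is rejected instead of being silently skipped.
        raise ValueError(f"unsupported syntax: {type(node).__name__}")


def _fold_left(op, items):
    """Combine [x, y, z] into (op, (op, x, y), z)."""
    result = items[0]
    for item in items[1:]:
        result = (op, result, item)
    return result


def parse_expr(expr, columns):
    """Parse an eval-style string against the given column names."""
    return ExprBuilder(columns).visit(ast.parse(expr, mode="eval"))
```

For example, `parse_expr("a < b < 2", ["a", "b"])` yields a `LOGICAL_AND` of two `LESS` nodes. Emitting tuples keeps the sketch testable without a GPU; a real implementation would instead build expression objects such as the `Literal`, `ColumnReference`, and `Operation` classes declared in `ast.pxd` below.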
cpp/include/cudf/ast/expressions.hpp (6 changes: 4 additions, 2 deletions)
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2020-2021, NVIDIA CORPORATION.
+ * Copyright (c) 2020-2022, NVIDIA CORPORATION.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -21,6 +21,8 @@
 #include <cudf/types.hpp>
 #include <cudf/utilities/error.hpp>
 
+#include <cstdint>
+
 namespace cudf {
 namespace ast {
 
@@ -53,7 +55,7 @@ struct expression {
 /**
  * @brief Enum of supported operators.
  */
-enum class ast_operator {
+enum class ast_operator : int32_t {
   // Binary operators
   ADD, ///< operator +
   SUB, ///< operator -
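Apart from the copyright year, the change to `expressions.hpp` gives `ast_operator` an explicit `int32_t` underlying type (hence the new `<cstdint>` include). A scoped enum already defaults to `int`, so on common platforms this states the width explicitly rather than changing it, giving the Python bindings a fixed 32-bit integer to mirror. The sketch below is a hypothetical, partial Python mirror: `ASTOperator` and `to_c_int32` are illustrative names, and only `ADD` and `SUB` are included because they are the only enumerators visible in this diff.

```python
import ctypes
from enum import IntEnum


class ASTOperator(IntEnum):
    """Hypothetical, partial Python mirror of libcudf's ast_operator.

    Enumerators without explicit values are numbered from 0 in declaration
    order, so ADD is 0 and SUB is 1. A complete mirror would have to list
    every enumerator in exactly the C++ order.
    """

    ADD = 0
    SUB = 1


def to_c_int32(op):
    # With int32_t as the declared underlying type, an operator can be
    # handed across the language boundary as a 32-bit signed integer.
    return ctypes.c_int32(int(op))
```

Because the mirror relies on declaration order, reordering or inserting enumerators in the C++ header would silently change the values the Python side must use.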
python/cudf/cudf/_lib/__init__.py (3 changes: 2 additions, 1 deletion)
@@ -1,7 +1,8 @@
-# Copyright (c) 2020-2021, NVIDIA CORPORATION.
+# Copyright (c) 2020-2022, NVIDIA CORPORATION.
 import numpy as np
 
 from . import (
+    ast,
     avro,
     binaryop,
     concat,
python/cudf/cudf/_lib/ast.pxd (33 changes: 33 additions, 0 deletions)
@@ -0,0 +1,33 @@
+# Copyright (c) 2022, NVIDIA CORPORATION.
+
+from libc.stdint cimport int32_t, int64_t
+from libcpp.memory cimport unique_ptr
+
+from cudf._lib.cpp.ast cimport column_reference, expression, literal, operation
+from cudf._lib.cpp.scalar.scalar cimport numeric_scalar
+
+ctypedef enum scalar_type_t:
+    INT
+    DOUBLE
+
+
+ctypedef union int_or_double_scalar_ptr:
+    unique_ptr[numeric_scalar[int64_t]] int_ptr
+    unique_ptr[numeric_scalar[double]] double_ptr
+
+
+cdef class Expression:
+    cdef unique_ptr[expression] c_obj
+
+
+cdef class Literal(Expression):
+    cdef scalar_type_t c_scalar_type
+    cdef int_or_double_scalar_ptr c_scalar
+
+
+cdef class ColumnReference(Expression):
+    pass
+
+
+cdef class Operation(Expression):
+    pass
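In `ast.pxd`, `Literal` pairs a `scalar_type_t` tag with the `int_or_double_scalar_ptr` union (see commit 3f3885f), so a single slot holds either an `int64_t` or a `double` scalar pointer and the tag records which member is valid. A minimal pure-Python analogue of that tagged-union pattern is sketched below. `ScalarType`, `TaggedLiteral`, and `make_literal` are hypothetical names, and the int64 range check and the rejection of `bool` are assumptions for illustration, not behavior taken from the PR.

```python
from dataclasses import dataclass
from enum import Enum


class ScalarType(Enum):
    # Mirrors the two scalar_type_t tags declared in ast.pxd.
    INT = "int64_t"
    DOUBLE = "double"


INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


@dataclass(frozen=True)
class TaggedLiteral:
    """Hypothetical analogue of the Cython Literal class.

    scalar_type plays the role of c_scalar_type: it tells any reader of
    value which union member (int_ptr or double_ptr) would be live.
    """

    scalar_type: ScalarType
    value: object


def make_literal(value):
    # bool is a subclass of int in Python; rejecting it is an assumption.
    if isinstance(value, bool):
        raise TypeError("bool literals are not handled in this sketch")
    if isinstance(value, int):
        # An int64 scalar cannot hold arbitrary-precision Python ints.
        if not INT64_MIN <= value <= INT64_MAX:
            raise OverflowError(f"{value} does not fit in int64")
        return TaggedLiteral(ScalarType.INT, value)
    if isinstance(value, float):
        return TaggedLiteral(ScalarType.DOUBLE, value)
    raise TypeError(f"unsupported literal type: {type(value).__name__}")
```

Keeping the tag next to the payload means every consumer must branch on `scalar_type` before touching `value`, which is the same discipline the Cython class needs before dereferencing either union member.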