Introduce NamedColumn concept in cudf-polars #15914
-class NamedExpr(Expr):
-    __slots__ = ("name", "children")
-    _non_child = ("dtype", "name")
+class NamedExpr:
Decided to deliberately not make this one an Expr (it should not appear when evaluating expressions themselves, only when constructing return values in dataframe (IR) nodes).
This could be helpful to leave as a comment.
-class NamedExpr:
+class NamedExpr:
+    # NamedExpr does not inherit from Expr because it should not appear when
+    # evaluating expressions themselves, only when constructing return values
+    # in dataframe (IR) nodes.
Done.
Names in the result dataframe only appear from PyExprIR and thence NamedExpr nodes. To avoid name tracking issues, only require a name when translating a NamedExpr.
Expressions must now be translated with the node which is to provide the schema active.
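The separation described in these commit messages can be illustrated with a minimal sketch. It is not the cudf-polars implementation: `Expr`, `Col`, `Add`, and `select` here are hypothetical stand-ins that operate on plain Python lists, and only the idea that `NamedExpr` wraps an expression without being one is taken from the discussion above.

```python
from __future__ import annotations

from dataclasses import dataclass


class Expr:
    """Hypothetical stand-in for an expression node; it carries no output name."""

    def evaluate(self, frame: dict[str, list]) -> list:
        raise NotImplementedError


@dataclass
class Col(Expr):
    """Look up an existing column in the frame (illustrative only)."""

    source: str

    def evaluate(self, frame: dict[str, list]) -> list:
        return frame[self.source]


@dataclass
class Add(Expr):
    """Elementwise sum of two sub-expressions (illustrative only)."""

    left: Expr
    right: Expr

    def evaluate(self, frame: dict[str, list]) -> list:
        lhs = self.left.evaluate(frame)
        rhs = self.right.evaluate(frame)
        return [a + b for a, b in zip(lhs, rhs)]


@dataclass
class NamedExpr:
    # Deliberately not an Expr: it never appears inside an expression tree,
    # only at the point where a dataframe (IR) node builds its output.
    name: str
    value: Expr

    def evaluate(self, frame: dict[str, list]) -> tuple[str, list]:
        # The name is attached only here, once the unnamed result exists.
        return self.name, self.value.evaluate(frame)


def select(frame: dict[str, list], exprs: list[NamedExpr]) -> dict[str, list]:
    """Hypothetical IR-node evaluation: the only place names are required."""
    return dict(e.evaluate(frame) for e in exprs)
```

In this sketch, `select({"a": [1, 2], "b": [10, 20]}, [NamedExpr("total", Add(Col("a"), Col("b")))])` evaluates the inner expressions without any names and attaches `"total"` only when the output frame is assembled.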
769b248 to 9b87759
We can't decide expression-by-expression whether the result should be broadcast to the size of the context DataFrame. It is only when we return "out" to construct a new DataFrame (i.e. when we are evaluating an IR node) that we have the necessary information.
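A rough sketch of deferring the broadcast to the point where the new DataFrame is constructed follows. The `Scalar` wrapper and `build_frame` helper are hypothetical names for illustration, columns are modelled as plain lists, and the target size is assumed to be the row count of the context DataFrame as stated above.

```python
from dataclasses import dataclass


@dataclass
class Scalar:
    """Hypothetical marker for a length-one result that has not been broadcast."""

    value: object


def build_frame(results: dict, context_rows: int) -> dict:
    """Assemble an output frame, expanding scalars only at this boundary.

    `results` maps output names to either a list (a full column) or a Scalar.
    Expression evaluation itself never broadcasts; only this step knows the
    target number of rows.
    """
    out = {}
    for name, value in results.items():
        if isinstance(value, Scalar):
            # Broadcast-expand the scalar to the size of the context frame.
            out[name] = [value.value] * context_rows
        else:
            if len(value) != context_rows:
                raise ValueError(f"column {name!r} has length {len(value)}")
            out[name] = list(value)
    return out
```

With this shape, individual expressions can return `Scalar` results freely, and the frame no longer needs to track which of its columns are scalars.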
Generally the Python code looks good
This looks good (to the best of my knowledge). Maybe @vyasr or @brandon-b-miller should double check this though.
I looked over this PR to acquaint myself with more of the internals of cudf-polars. I have just a couple comments. Thanks!
/merge
Description
Simplify name tracking in expression evaluation by only requiring names for columns when putting them into a DataFrame. At the same time, this allows us to have one place where we broadcast-expand Scalars to the size of the DataFrame, so we can expunge tracking them in the DataFrame itself.

Additionally, adapt to minor changes on the polars side in terms of translating the DSL: we no longer need to handle CSE expressions specially, and sorting by multiple keys takes a list of descending flags, rather than a single bool as previously.

Checklist