Abstract polars function expression nodes to ensure they are serializable #17418
Two suggestions
@@ -31,22 +30,49 @@
__all__ = ["BooleanFunction"]


class BooleanFunctionName(Enum):
suggestion: use `IntEnum`. Also, let's nest this class inside the `BooleanFunction` class below, and just call it `Name`.
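A minimal sketch of that suggestion, assuming a trimmed-down `BooleanFunction` with three illustrative members. The constructor and the member list are assumptions made for the example, not the PR's actual code:

```python
from enum import IntEnum, auto


class BooleanFunction:
    """Trimmed-down stand-in for the expression class (assumption)."""

    class Name(IntEnum):
        # Illustrative members only; auto() numbers them starting from 1.
        IsIn = auto()
        IsNull = auto()
        Not = auto()

    def __init__(self, name: "BooleanFunction.Name") -> None:
        self.name = name


expr = BooleanFunction(BooleanFunction.Name.IsNull)
# IntEnum members are plain Python ints, so they carry no polars state.
print(isinstance(expr.name, int), BooleanFunction.Name.__qualname__)
```

Nesting keeps the enum namespaced under the class it describes, so call sites read `BooleanFunction.Name.IsNull`.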
Done in 3f69d98.
@staticmethod
def get_polars_type(tp: BooleanFunctionName):
    function, name = str(tp).split(".")
    if function != "BooleanFunction":
        raise ValueError("BooleanFunction required")
    return getattr(BooleanFunctionName, name)
I think this doesn't return a polars object, but one of ours. Also, because we're calling `getattr`, that suggests we should make this a `classmethod`:
@classmethod
def from_polars(cls, obj: pl_expr.BooleanFunction) -> Self:
    function, name = str(obj).split(".", maxsplit=1)
    if function != "BooleanFunction":
        raise ValueError("BooleanFunction required")
    return getattr(cls, name)
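To try the suggested classmethod without polars installed, here is a self-contained sketch. `FakePolarsFunction` is a hypothetical stand-in whose only job is to produce a `"BooleanFunction.<member>"` string from `str()`, which is the format the snippet above relies on; the member list is also illustrative:

```python
from enum import IntEnum, auto


class FakePolarsFunction:
    """Hypothetical stand-in for a polars node; only str() matters here."""

    def __init__(self, text: str) -> None:
        self._text = text

    def __str__(self) -> str:
        return self._text


class BooleanFunction:
    class Name(IntEnum):
        # Illustrative members only.
        IsIn = auto()
        IsNull = auto()
        Not = auto()

        @classmethod
        def from_polars(cls, obj) -> "BooleanFunction.Name":
            # Split "BooleanFunction.IsIn" into its prefix and member name.
            function, name = str(obj).split(".", maxsplit=1)
            if function != "BooleanFunction":
                raise ValueError("BooleanFunction required")
            # Look the member up on our own enum, not on the polars type.
            return getattr(cls, name)


member = BooleanFunction.Name.from_polars(FakePolarsFunction("BooleanFunction.IsIn"))
print(member is BooleanFunction.Name.IsIn)
```

Because the lookup goes through `cls`, the same method body would work unchanged on any other enum that follows the same naming scheme, apart from the prefix check.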
Yeah, I totally mixed everything up here, thanks for catching that. Done in be7fa52.
- if self.name == pl_expr.BooleanFunction.IsIn and not all(
+ if self.name == BooleanFunctionName.IsIn and not all(
nit: Because we have enums, we should now use `is` for comparison.
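A short illustration of why identity is the tighter check, using a hypothetical two-member `Name` enum rather than the PR's real one. With an `IntEnum`, `==` also matches a bare integer of the same value, whereas `is` only matches the member itself:

```python
from enum import IntEnum, auto


class Name(IntEnum):
    # Illustrative members; the real enum has more.
    IsIn = auto()
    IsNull = auto()


def is_membership_test(name) -> bool:
    # Enum members are singletons, so identity is the precise check.
    return name is Name.IsIn


raw = 1  # a plain int that happens to equal Name.IsIn
print(raw == Name.IsIn)         # equality accepts the bare int
print(is_membership_test(raw))  # identity rejects it
```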
Done in 7d009f9.
Thanks Peter, I think this makes sense. Should we add a test that these things are now pickleable, or does that need other bits and pieces?
Thanks for the review, Lawrence. I agree, tests are a good idea. While writing some tests in 5a04207 I found something that could be better, and I've made that small improvement in 9b54437. Those tests check that serialization of the …
Some small further suggestions
@pytest.mark.parametrize(
    "function", [BooleanFunction, TemporalFunction, StringFunction]
)
Let's abstract this into a reusable fixture:
@pytest.fixture(params=["BooleanFunction", "StringFunction", "TemporalFunction"])
def function(request):
    return request.param
Good idea, done in fcf820f. I've kept the module references instead of using strings and resolved them in the tests with `__name__`; let me know if you have a strong preference for strings instead.
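One possible reading of "resolved with `__name__`", sketched with the standard library only. The empty classes and `fake_module` are hypothetical stand-ins, and how the real tests use the name is not shown in this thread:

```python
from types import SimpleNamespace


# Empty stand-ins for the three expression classes (assumption).
class BooleanFunction: ...
class StringFunction: ...
class TemporalFunction: ...


# Hypothetical namespace exposing objects under the same names.
fake_module = SimpleNamespace(
    BooleanFunction="boolean node",
    StringFunction="string node",
    TemporalFunction="temporal node",
)


def resolve(function_cls: type) -> str:
    # The class carries its own name, so no separate string parameter is needed.
    return getattr(fake_module, function_cls.__name__)


print(resolve(StringFunction))
```

Parametrizing over the classes therefore loses nothing compared with strings: the string is always recoverable from the class, while the reverse needs a lookup table.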
No, that's fine.
/merge
Description
Use `Enum`s to define Python types as references to `polars.polars._expr_nodes.*Function` so as to ensure that `cudf_polars.dsl.expressions` specializations of `Expr` are serializable.
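The effect described here can be sketched with a pickle round trip. The class below is a simplified stand-in (its constructor and members are assumptions), holding a pure-Python enum member where the original code held a polars object:

```python
import pickle
from enum import IntEnum, auto


class BooleanFunction:
    """Simplified stand-in for the expression class (assumption)."""

    class Name(IntEnum):
        # Illustrative members only.
        IsIn = auto()
        IsNull = auto()

    def __init__(self, name: "BooleanFunction.Name") -> None:
        self.name = name


original = BooleanFunction(BooleanFunction.Name.IsIn)
# pickle stores the enum by class reference and value, so the class must be
# importable from the module that defines it when the data is loaded back.
restored = pickle.loads(pickle.dumps(original))
print(restored.name is BooleanFunction.Name.IsIn)
```

Unpickling hands back the same singleton member, so identity checks with `is` keep working on the restored object.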