feat: add more Spark Expressions #1724
base: main
Conversation
Thank you very much for looking into this! 🙏
```python
def _all(_input: Column) -> Column:
    from pyspark.sql import functions as F  # noqa: N812

    return F.bool_and(_input)
```
`bool_and` and `bool_or` are only available from pyspark 3.5, while narwhals' minimum supported version is 3.3.
Is there a way to write these for older versions of Spark?
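One possible pre-3.5 fallback (an assumption, not something the PR implements): Spark orders booleans with `false < true`, so `F.min` over a boolean column behaves like `bool_and` and `F.max` like `bool_or`. The pure-Python sketch below illustrates that identity; the `agg_all`/`agg_any` names are hypothetical, and in PySpark the analogous calls would be `F.min(col)` and `F.max(col)`.

```python
# Illustration of the min/max fallback for bool_and / bool_or on older
# Spark: min over booleans is an AND-reduction, max is an OR-reduction,
# because False < True. Function names here are hypothetical.

def agg_all(values: list) -> bool:
    """AND-reduction via min: True only if every value is True."""
    return min(values)


def agg_any(values: list) -> bool:
    """OR-reduction via max: True if at least one value is True."""
    return max(values)


print(agg_all([True, True, False]))  # False
print(agg_any([False, False, True]))  # True
```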
this would be new functionality, maybe we could bump the minimum pyspark version?
not opposed ^ pyspark 3.5 introduced a whole batch of functions that would let us work on other expressions like the ones mentioned in #1714
This seems the easier way forward. Alternatively, raise for those methods when `backend_version` is pre-3.5, e.g.:

```python
if self._backend_version < (3, 5):
    msg = ...
    raise NotImplementedError(msg)
...
```
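Fleshing out the snippet above as a runnable sketch: the class is a simplified stand-in that mirrors narwhals' `SparkLikeExpr` name, but the constructor, message text, and method body are illustrative assumptions, not the library's actual implementation.

```python
# Sketch of the suggested version-gate pattern: raise NotImplementedError
# for methods that need pyspark>=3.5 when the backend is older.
class SparkLikeExpr:
    def __init__(self, backend_version: tuple) -> None:
        # backend_version is the pyspark version as a tuple, e.g. (3, 3).
        self._backend_version = backend_version

    def all(self) -> None:
        # bool_and only exists in pyspark>=3.5, so gate on the version.
        if self._backend_version < (3, 5):
            msg = "`all` requires pyspark>=3.5 (it relies on `bool_and`)"
            raise NotImplementedError(msg)
        # On 3.5+, the bool_and-based expression would be returned here.


try:
    SparkLikeExpr((3, 3)).all()
except NotImplementedError as exc:
    print(exc)
```

Tuple comparison makes the check read naturally: `(3, 3) < (3, 5)` is true, so only older backends raise.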
It'd definitely make it easier for us :) Happy to make a PR for increasing the version later this afternoon.
My only worry is that it may hurt adoption a bit. From what I have seen, updating to the latest pyspark is not always easy in enterprise (it may depend on vendors, may require updating the cluster, etc.). But we can worry about loosening the dependency after we implement all the methods 👌
thanks for the perspective - sure, if it's not too painful to support 3.3+, i'd say that's ok
moving the discussion to #1744
Hey @EdAbati, took an initial swing at implementing these.
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below
Added the following methods to `SparkLikeExpr` and `SparkLikeNamespace`:
- `any`
- `all`
- `null_count`
- `any_horizontal`
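To illustrate the semantics of the last method (this is an analogue, not the PR's implementation): `any`/`all` reduce a single column vertically, while `any_horizontal` reduces across columns row-wise. In PySpark this kind of horizontal reduction is commonly built with `functools.reduce` over the `|` operator on columns; the pure-Python sketch below shows the row-wise behaviour, with illustrative row/column names.

```python
from functools import reduce


# Pure-Python analogue of any_horizontal: row-wise OR across the given
# columns. Rows are modelled as dicts of column name -> bool.
def any_horizontal(rows: list, columns: list) -> list:
    return [
        reduce(lambda acc, col: acc or row[col], columns, False)
        for row in rows
    ]


rows = [{"a": True, "b": False}, {"a": False, "b": False}]
print(any_horizontal(rows, ["a", "b"]))  # [True, False]
```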
Copied the respective tests over; I couldn't run them without Java on my machine, but running them locally on their respective test datasets worked for me.
Let me know if anything needs to be updated!