feat: add more Spark Expressions #1724

Open
wants to merge 11 commits into
base: main

lucas-nelson-uiuc
Contributor

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Added the following methods to SparkLikeExpr and SparkLikeNamespace:

  • any
  • all
  • null_count
  • any_horizontal

Copied respective tests over - couldn't run them without Java on my machine but running them locally on their respective test datasets worked for me.

Let me know if anything needs to be updated!

@lucas-nelson-uiuc changed the title from "Missing spark expr" to "feat: add more Spark Expressions" on Jan 4, 2025
Collaborator

@EdAbati EdAbati left a comment

Thank you very much for looking into this! 🙏

tests/spark_like_test.py (outdated, resolved)
def _all(_input: Column) -> Column:
    from pyspark.sql import functions as F  # noqa: N812

    return F.bool_and(_input)
Collaborator

bool_and and bool_or are only available from pyspark 3.5, while narwhals' minimum supported pyspark version is 3.3.

Is there a way to write these for older versions of Spark?

Member

this would be new functionality, maybe we could bump the minimum pyspark version?

Contributor Author

not opposed ^ pyspark 3.5 introduced a whole batch of functions that would let us work on other expressions like the ones mentioned in #1714

Member

This seems the easier way forward. Alternatively, raise for those methods when backend_version is below 3.5, e.g.:

if self._backend_version < (3, 5):
    msg = ...
    raise NotImplementedError(msg)

...
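
To make the idea above concrete, here is a minimal standalone sketch of such a guard. The function name `require_backend_version` and the error message wording are invented for illustration and are not taken from narwhals; only the tuple comparison and the `NotImplementedError` come from the snippet above.

```python
from typing import Tuple


def require_backend_version(
    backend_version: Tuple[int, ...],
    minimum: Tuple[int, ...],
    method_name: str,
) -> None:
    # Version tuples compare element by element, so (3, 3, 2) < (3, 5) is True.
    if backend_version < minimum:
        required = ".".join(str(part) for part in minimum)
        found = ".".join(str(part) for part in backend_version)
        # Hypothetical message; the original snippet leaves the text elided.
        msg = f"`{method_name}` requires pyspark>={required}, found {found}."
        raise NotImplementedError(msg)
```

A method such as `all` would call `require_backend_version(self._backend_version, (3, 5), "all")` before touching `F.bool_and`, so users on older versions get a clear error instead of an `AttributeError`.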

Collaborator

@EdAbati EdAbati Jan 6, 2025

It'd definitely make it easier for us :) Happy to make a PR for increasing the version later this afternoon.

My only worry is that it may hurt adoption a bit. From what I have seen, updating to the latest pyspark is not always easy in enterprise settings (it may depend on vendors, may require updating the cluster, etc.). But we can worry about loosening the dependency after we implement all the methods 👌

Member

thanks for the perspective - sure, if it's not too painful to support 3.3+, i'd say that that's ok

Collaborator

moving the discussion to #1744

@lucas-nelson-uiuc
Contributor Author

Hey @EdAbati ,

Took an initial swing at implementing the replace_strict() method - think I took care of everything except for handling the test_replace_non_full test (checks that replacement is exhaustive). Left some thoughts and other questions in my commit - lmk what you think!

4 participants