Use spark3 by default #1549
88bee0e to 557a2fe
Codecov Report
@@ Coverage Diff @@
## staging #1549 +/- ##
============================================
+ Coverage 0.00% 62.07% +62.07%
============================================
Files 84 84
Lines 8492 8492
============================================
+ Hits 0 5271 +5271
- Misses 0 3221 +3221
@laserprec @miguelgfierro I wonder whether it is possible to specify a custom command line flag that tells
I don't know of an approach other than using extras. Was the issue with this change that version 3 is not installed by default?
Yes, pip sometimes installed v3 (when I tried on a DSVM) and other times v2 (when Jianjie tried on Windows). I think we would like to install v3 by default (since it's the version we mainly support), but we may want to leave v2 as an option (although I don't see any use case where v2 is a strong requirement).
@laserprec do you know why that happens? pip on Windows should by default install the latest pyspark release (3.2.0); this is what happens on my local machine when I do
Ah, looking into this, it turns out my local environment is corrupted with a global dependency that pins pyspark to 2 😅. After removing it, I think the question I have with keeping dependency anchored in range of Like @gramhagen mentioned, there isn't a good way to switch between spark3 and spark2 unless we introduce a new extra
So if you follow the installation process as in the README in a clean environment, do you get version 3?
I reran the installation in a clean Docker image; we get the latest pyspark version with
FYI @anargyri, I think our smoke tests failed on pyspark==3.2.0. For more detail see: https://github.com/microsoft/recommenders/runs/3928935302?check_suite_focus=true
No issue running on pyspark==3.1.2:
3.2.0 was released 9 hours ago 😄. We may also want to adjust the upper bound to
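For anyone repeating the clean-environment check above, here is a minimal sketch of how to see which version pip actually resolved. It uses only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is hypothetical and is not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed.

    Hypothetical helper for illustration; not part of the recommenders code base.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the resolved pyspark version, or None when pyspark is absent.
print("pyspark:", installed_version("pyspark"))
```

Running this in both the clean image and the suspect local environment should make a stray global pin visible, since the two would report different versions.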
:-> Yes, I agree, let's do
Sounds good. I can apply this change. I think we have a consensus on not introducing an extra dependency solely for spark2 support (e.g.
I think eventually we want to update the lower bound to
84cffea to ce81bdd
2.4.5<=pyspark<3.2.0 is fine with me. Can we add an issue capturing what needs to change to support >=3.2.0?
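To make the agreed bounds concrete, here is a small sketch of what 2.4.5<=pyspark<3.2.0 accepts and rejects. It is a simplification that assumes plain numeric release versions such as 3.1.2. pip itself applies PEP 440 rules, which also cover pre-releases and other forms not handled here. The names `parse_release` and `in_supported_range` are hypothetical.

```python
from typing import Tuple

LOWER: Tuple[int, ...] = (2, 4, 5)  # inclusive lower bound from the discussion
UPPER: Tuple[int, ...] = (3, 2, 0)  # exclusive upper bound from the discussion


def parse_release(v: str) -> Tuple[int, ...]:
    """Turn 'X.Y.Z' into a tuple of ints, padding short versions with zeros.

    Assumption: only plain numeric releases (no 'rc', 'dev', or local tags).
    """
    parts = tuple(int(p) for p in v.split("."))
    return parts + (0,) * (3 - len(parts))


def in_supported_range(v: str) -> bool:
    """Check LOWER <= v < UPPER using tuple comparison."""
    return LOWER <= parse_release(v) < UPPER


for candidate in ("2.4.4", "2.4.5", "3.1.2", "3.2.0"):
    print(candidate, in_supported_range(candidate))
```

With these bounds, 3.1.2 (which ran without issue above) is accepted and 3.2.0 (where the smoke tests failed) is excluded.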
LGTM
Done. Please see #1553. The errors are accessible in the pipeline history and are linked in the issue.
Description
Set pyspark dependency to >=2.4.5, <3.2.0 (revised from >=3.0.0, <4.0.0).
Checklist:
staging branch and not to main branch.