[SPARK-49792][PYTHON][BUILD] Upgrade to numpy 2 for building and testing Spark branches #48180
There is a new pandas version, https://pandas.pydata.org/pandas-docs/version/2.2.3/whatsnew/v2.2.3.html, that supports numpy 2.1: pandas-dev/pandas#59444
Thank you @bjornjorgensen!
@@ -193,6 +195,10 @@ def predict(inputs):
batch_sizes = preds["preds"].to_numpy()
self.assertTrue(all(batch_sizes <= batch_size))

# TODO(SPARK-49793): enable the test below
@WeichenXu123 may I get your input on that please?
More details can be found here https://issues.apache.org/jira/browse/SPARK-49793.
Do you have the error message and stack trace for numpy 2 + caching?
Would you please see https://issues.apache.org/jira/browse/SPARK-49793? There is no error but the results are unexpected.
Got it. I need some time to investigate, but we can disable it as a workaround for now.
Sounds good, thank you!
also cc @leewyang as the test author
Maybe helpful here: the Ruff linter/formatter has some rules to check for NumPy 2 deprecations (https://docs.astral.sh/ruff/rules/numpy2-deprecation/).
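For readers unfamiliar with this kind of check, here is a rough stdlib-only sketch of the idea: walk the syntax tree and flag attribute accesses on the `np` alias that no longer exist in NumPy 2. This is not how Ruff implements its rule. The `REPLACEMENTS` table and the `find_removed_numpy_names` helper are hypothetical names made up for illustration, and the table holds only the one rename that comes up in this PR.

```python
import ast

# Hypothetical, deliberately tiny mapping: only the rename mentioned in this PR.
# A real linter rule covers many more names.
REPLACEMENTS = {"mat": "asmatrix"}


def find_removed_numpy_names(source: str, alias: str = "np"):
    """Return sorted (line, old, new) tuples for each `alias.<removed name>` access."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == alias
            and node.attr in REPLACEMENTS
        ):
            old = f"{alias}.{node.attr}"
            new = f"{alias}.{REPLACEMENTS[node.attr]}"
            hits.append((node.lineno, old, new))
    return sorted(hits)


sample = (
    "import numpy as np\n"
    "m = np.mat([[1, 2], [3, 4]])\n"
    "ok = np.asmatrix([[1]])\n"
)

for line, old, new in find_removed_numpy_names(sample):
    print(f"line {line}: replace {old} with {new}")
```

The sketch only matches the literal alias it is given, so it would miss `import numpy` or `from numpy import mat`; a real linter resolves imports properly, which is one reason to prefer the Ruff rule over a home-grown script.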
13d9f7a to 97d465c
The test failures we are trying to fix here are almost all related to this issue. Thank you @codesorcery for sharing!
53e3775 to cde53ea
Retriggered the unrelated tests.
Retriggering.
@HyukjinKwon @zhengruifeng @dongjoon-hyun would you please review?
In general LGTM, pending @WeichenXu123's feedback on the failed ML caching test.
+1, LGTM. Thank you, @xinrong-meng and all.
Merged to master for Apache Spark 4.0.0 in February 2025.
Thank you @dongjoon-hyun!
…ing Spark branches

### What changes were proposed in this pull request?
Upgrade numpy to 2.1.0 for building and testing Spark branches.

Failed tests are categorized into the following groups:
- Most of test failures fixed are related to pandas-dev/pandas#59838 (comment).
- Replaced np.mat with np.asmatrix.
- TODO: SPARK-49793

### Why are the changes needed?
Ensure compatibility with newer NumPy, which is utilized by Pandas (on Spark).

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48180 from xinrong-meng/np_upgrade.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Upgrade numpy to 2.1.0 for building and testing Spark branches.
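Failed tests are categorized into the following groups:
- Most of the test failures fixed are related to pandas-dev/pandas#59838 (comment).
- Replaced np.mat with np.asmatrix.
- TODO: SPARK-49793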
Why are the changes needed?
Ensure compatibility with newer NumPy, which is utilized by Pandas (on Spark).
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
No.
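The commit message for this PR mentions replacing `np.mat` with `np.asmatrix`. As a minimal sketch of that rename (the `data` values are made up for illustration and are not taken from the Spark test suite): `np.mat` was removed from the main namespace in NumPy 2.0, while `np.asmatrix` is available in both 1.x and 2.x, so the replacement should work regardless of which version is installed.

```python
import numpy as np

# Illustrative input only; not from the Spark code base.
data = [[1, 2], [3, 4]]

# Before (fails on NumPy 2.x because np.mat no longer exists):
#     m = np.mat(data)
# After (works on both NumPy 1.x and 2.x):
m = np.asmatrix(data)

# The result is still an np.matrix, so `*` remains matrix multiplication.
product = m * m
print(type(m).__name__)
print(product.tolist())
```

Because the returned object is the same `np.matrix` type as before, downstream code that relies on matrix semantics should not need further changes for this particular rename.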