[SPARK-49792][PYTHON][BUILD] Upgrade to numpy 2 for building and testing Spark branches #48180

xinrong-meng · 2024-09-20T08:11:07Z

What changes were proposed in this pull request?

Upgrade numpy to 2.1.0 for building and testing Spark branches.

Failed tests are categorized into the following groups:

- Most of the fixed test failures are related to "Inconsistent Return Types Between numpy 1.26.4 and numpy 2.1.0 in pandas 2.2.2" (pandas-dev/pandas#59838 (comment)).
- Replaced np.mat with np.asmatrix.
- TODO: SPARK-49793
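For readers unfamiliar with the np.mat change: np.mat was removed in NumPy 2.0, while np.asmatrix is available in both 1.x and 2.x. A minimal sketch of the replacement, where the helper name to_matrix is hypothetical and not taken from this PR:

```python
import numpy as np


def to_matrix(data: np.ndarray) -> np.matrix:
    """Hypothetical helper: interpret an ndarray as a matrix.

    np.mat(data) no longer exists in NumPy 2.x; np.asmatrix(data)
    is the drop-in spelling that works on both major versions.
    """
    return np.asmatrix(data)


m = to_matrix(np.array([[1, 2], [3, 4]]))
# For np.matrix, `*` is the matrix product, not the element-wise product.
product = m * m
```

Call sites that only need 2-D arrays could likely use plain ndarrays with the `@` operator instead, but that is a larger change than the rename described here.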

Why are the changes needed?

Ensure compatibility with newer NumPy, which is utilized by Pandas (on Spark).

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

bjornjorgensen · 2024-09-22T18:12:07Z

There is a new pandas version, https://pandas.pydata.org/pandas-docs/version/2.2.3/whatsnew/v2.2.3.html, that has support for numpy 2.1: pandas-dev/pandas#59444

xinrong-meng · 2024-09-26T03:15:58Z

Thank you @bjornjorgensen!
I think we can separate the pandas upgrade from the numpy upgrade, as the current pandas version should be compatible with numpy 2.1.0 as well.

xinrong-meng · 2024-09-26T05:17:50Z

python/pyspark/ml/tests/test_functions.py

@@ -193,6 +195,10 @@ def predict(inputs):
        batch_sizes = preds["preds"].to_numpy()
        self.assertTrue(all(batch_sizes <= batch_size))

+    # TODO(SPARK-49793): enable the test below


@WeichenXu123 may I get your input on that please?
More details can be found here https://issues.apache.org/jira/browse/SPARK-49793.

WeichenXu123 · 2024-10-11

Do you have the error message and stack trace for numpy 2 + caching?

xinrong-meng · 2024-10-14

Would you please see https://issues.apache.org/jira/browse/SPARK-49793? There is no error, but the results are unexpected.

WeichenXu123 · 2024-10-14

Got it. I need some time to investigate, but we can disable it as a workaround for now.

xinrong-meng · 2024-10-14

Sounds good, thank you!

zhengruifeng · 2024-10-15

Also cc @leewyang as the test author.
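As an illustration of the "disable it as a workaround" approach: one common way to do this in a unittest-based suite is unittest.skip with a reason that points at the ticket. The sketch below is an assumption about the general technique, not the actual change in this PR; the class and test names are hypothetical.

```python
import unittest


class CachingExampleTests(unittest.TestCase):  # hypothetical test class
    # TODO(SPARK-49793): enable the test below
    @unittest.skip("SPARK-49793: unexpected results with numpy 2 and caching")
    def test_caching(self):
        # Never executed while the skip decorator is in place.
        self.fail("not reached while skipped")

    def test_unrelated(self):
        # Other tests in the class keep running as usual.
        self.assertEqual(1 + 1, 2)


def run_example() -> unittest.TestResult:
    """Run the example class and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CachingExampleTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Keeping the ticket id in both the TODO comment and the skip reason makes the disabled test easy to find again once the underlying issue is understood.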

codesorcery · 2024-10-01T12:35:12Z

Maybe helpful here: the Ruff linter/formatter has some rules to check for NumPy 2 deprecations (https://docs.astral.sh/ruff/rules/numpy2-deprecation/).
I intended to create a pull request for adding those checks to the build pipeline after #47083 was merged, but unfortunately didn't find the time back then.
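A rough sketch of how such a check might be wired into a script, assuming Ruff is installed and using NPY201, the rule code for numpy2-deprecation. The function names and the example path are hypothetical and not part of the Spark build:

```python
import shutil
import subprocess
from typing import List, Optional


def build_ruff_numpy2_command(paths: List[str]) -> List[str]:
    """Build a Ruff invocation that checks only the NumPy 2 deprecation rule."""
    # NPY201 is Ruff's rule code for numpy2-deprecation.
    return ["ruff", "check", "--select", "NPY201", *paths]


def run_numpy2_check(paths: List[str]) -> Optional[subprocess.CompletedProcess]:
    """Run the check if Ruff is on PATH; return None when it is not installed."""
    if shutil.which("ruff") is None:
        return None
    return subprocess.run(
        build_ruff_numpy2_command(paths),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code means findings, not a crash
    )


# Example path only; adjust to the directories that import numpy.
command = build_ruff_numpy2_command(["python/pyspark"])
```

A check like this would catch usages such as np.mat statically, before the test suite runs against NumPy 2.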

xinrong-meng · 2024-10-07T03:21:53Z

The test failures we are trying to fix here are almost all related to this issue. Thank you @codesorcery for sharing!

xinrong-meng · 2024-10-11T00:39:50Z

Retriggered irrelevant tests

xinrong-meng · 2024-10-11T04:42:53Z

[info] - interrupt all - background queries, foreground interrupt *** FAILED *** (20 seconds, 50 milliseconds)
[info]   The code passed to eventually never returned normally. Attempted 30 times over 20.046569918 seconds. Last failure message: q2Interrupted was false. (SparkSessionE2ESuite.scala:71)
[info]   org.scalatest.exceptions.TestFailedDueToTimeoutException:

Retriggering

xinrong-meng · 2024-10-14T01:36:34Z

@HyukjinKwon @zhengruifeng @dongjoon-hyun would you please review?

zhengruifeng · 2024-10-14T01:52:39Z

In general, LGTM, pending @WeichenXu123's feedback on the failed ML caching test.

dongjoon-hyun

+1, LGTM. Thank you, @xinrong-meng and all.

dongjoon-hyun · 2024-10-15T22:28:56Z

Merged to master for Apache Spark 4.0.0 (targeted for February 2025).

xinrong-meng · 2024-10-16T00:50:31Z

Thank you @dongjoon-hyun !

…ing Spark branches

### What changes were proposed in this pull request?
Upgrade numpy to 2.1.0 for building and testing Spark branches.

Failed tests are categorized into the following groups:
- Most of test failures fixed are related to pandas-dev/pandas#59838 (comment).
- Replaced np.mat with np.asmatrix.
- TODO: SPARK-49793

### Why are the changes needed?
Ensure compatibility with newer NumPy, which is utilized by Pandas (on Spark).

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48180 from xinrong-meng/np_upgrade.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

github-actions bot added the BUILD label Sep 20, 2024

github-actions bot added PYTHON PANDAS API ON SPARK ML labels Sep 24, 2024

xinrong-meng changed the title ~~[WIP] Upgrade numpy to 2.1.0~~ [SPARK-49792][PS][BUILD] Upgrade numpy to 2.1.0 Sep 26, 2024

xinrong-meng changed the title ~~[SPARK-49792][PS][BUILD] Upgrade numpy to 2.1.0~~ [SPARK-49792][PS][BUILD] Upgrade numpy to 2.1.0 for building and testing Spark branches Sep 26, 2024

xinrong-meng marked this pull request as ready for review September 26, 2024 03:17

xinrong-meng changed the title ~~[SPARK-49792][PS][BUILD] Upgrade numpy to 2.1.0 for building and testing Spark branches~~ [SPARK-49792][PS][BUILD] Upgrade numpy for building and testing Spark branches Sep 26, 2024

xinrong-meng changed the title ~~[SPARK-49792][PS][BUILD] Upgrade numpy for building and testing Spark branches~~ [SPARK-49792][PS][BUILD] Upgrade to numpy 2 for building and testing Spark branches Sep 26, 2024

xinrong-meng commented Sep 26, 2024


xinrong-meng force-pushed the np_upgrade branch from 13d9f7a to 97d465c Compare October 7, 2024 02:52

xinrong-meng changed the title ~~[SPARK-49792][PS][BUILD] Upgrade to numpy 2 for building and testing Spark branches~~ [WIP][SPARK-49792][PS][BUILD] Upgrade to numpy 2 for building and testing Spark branches Oct 7, 2024

xinrong-meng marked this pull request as draft October 7, 2024 03:17

github-actions bot added the MLLIB label Oct 8, 2024

np 2.1.0

cde53ea

xinrong-meng force-pushed the np_upgrade branch from 53e3775 to cde53ea Compare October 10, 2024 02:36

xinrong-meng changed the title ~~[WIP][SPARK-49792][PS][BUILD] Upgrade to numpy 2 for building and testing Spark branches~~ [SPARK-49792][PS][BUILD] Upgrade to numpy 2 for building and testing Spark branches Oct 11, 2024

xinrong-meng marked this pull request as ready for review October 11, 2024 00:38

xinrong-meng requested review from WeichenXu123 and zhengruifeng October 14, 2024 02:04

zhengruifeng approved these changes Oct 15, 2024


zhengruifeng changed the title ~~[SPARK-49792][PS][BUILD] Upgrade to numpy 2 for building and testing Spark branches~~ [SPARK-49792][PYTHON][BUILD] Upgrade to numpy 2 for building and testing Spark branches Oct 15, 2024

dongjoon-hyun approved these changes Oct 15, 2024


dongjoon-hyun closed this in 0e75d19 Oct 15, 2024
