[SPARK-47725][INFRA] Set up the CI for pyspark-connect package #45870
Closed
Conversation
HyukjinKwon
added a commit
that referenced
this pull request
Apr 5, 2024
…ckage

### What changes were proposed in this pull request?
This PR is a followup of #45150 that adds the new `shell` module into the PyPI package.

### Why are the changes needed?
So the PyPI package contains the `shell` module.

### Does this PR introduce _any_ user-facing change?
No, the main change has not been released yet.

### How was this patch tested?
The test case will be added at #45870. It was found while working on that PR.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45882 from HyukjinKwon/SPARK-47081-followup.

Lead-authored-by: Hyukjin Kwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
dongjoon-hyun
pushed a commit
that referenced
this pull request
Apr 5, 2024
…ible with pyspark-connect

### What changes were proposed in this pull request?
This PR proposes to make `pyspark.testing.connectutils` compatible with `pyspark-connect`.

### Why are the changes needed?
This is the base work to set up the CI for pyspark-connect.

### Does this PR introduce _any_ user-facing change?
No, test-only.

### How was this patch tested?
Tested in #45870.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45887 from HyukjinKwon/SPARK-47735.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
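The compatibility work described above can be pictured as feature detection: probe for the JVM-backed classic PySpark and skip anything that needs it. The sketch below is illustrative only; `have_classic_pyspark` and `requires_classic` are hypothetical names, not the actual `pyspark.testing.connectutils` API.

```python
# Hypothetical sketch of the compatibility pattern: feature-detect the
# classic (JVM-backed) PySpark so test utilities degrade gracefully when
# only pyspark-connect is installed. Helper names are illustrative.
import importlib.util
import unittest


def have_classic_pyspark() -> bool:
    """True if the py4j bridge (classic PySpark's JVM gateway) is importable."""
    return importlib.util.find_spec("py4j") is not None


def requires_classic(test_item):
    """Skip a test that needs the classic, JVM-backed PySpark."""
    return unittest.skipUnless(
        have_classic_pyspark(), "classic PySpark (py4j) is not installed"
    )(test_item)
```

With a guard like this, the same test module can run under both the classic package and `pyspark-connect`, skipping rather than erroring where the JVM is absent.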
dongjoon-hyun
pushed a commit
that referenced
this pull request
Apr 5, 2024
…ts SPARK_CONNECT_TESTING_REMOTE env

### What changes were proposed in this pull request?
This PR is a followup of #45868 that proposes to make the testing script inherit the SPARK_CONNECT_TESTING_REMOTE environment variable.

### Why are the changes needed?
So the testing script can set `SPARK_CONNECT_TESTING_REMOTE` and make the variable effective.

### Does this PR introduce _any_ user-facing change?
No, test-only.

### How was this patch tested?
Manually tested at #45870.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45886 from HyukjinKwon/SPARK-47724-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
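The fix amounts to launching the test subprocess from a copy of the parent environment so the variable is inherited. A minimal sketch, assuming a placeholder child process rather than Spark's actual test runner:

```python
# Sketch: forward SPARK_CONNECT_TESTING_REMOTE to a test subprocess by
# copying the parent environment. The child here is a placeholder that
# just echoes the variable back; it stands in for the real test runner.
import os
import subprocess
import sys


def run_child_with_env(extra_env=None):
    env = dict(os.environ)  # inherit the parent env, including the remote URL
    if extra_env:
        env.update(extra_env)
    child = "import os; print(os.environ.get('SPARK_CONNECT_TESTING_REMOTE', ''))"
    result = subprocess.run(
        [sys.executable, "-c", child], env=env, capture_output=True, text=True
    )
    return result.stdout.strip()
```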
HyukjinKwon
added a commit
that referenced
this pull request
Apr 7, 2024
…sts finished

### What changes were proposed in this pull request?
This PR proposes to drop the tables after the tests finish.

### Why are the changes needed?
- To clean up resources properly.
- Leftover tables can affect other test cases when only one session is shared across tests.

### Does this PR introduce _any_ user-facing change?
No, test-only.

### How was this patch tested?
Tested in #45870.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45913 from HyukjinKwon/SPARK-46722-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
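The cleanup pattern is the usual `tearDown` bookkeeping: record each table a test creates and drop it afterwards so a shared session does not leak state into later tests. A sketch with a stand-in session object (the real code uses a SparkSession):

```python
# Sketch of drop-tables-after-test. FakeSession stands in for a shared
# SparkSession; only enough SQL is parsed to demonstrate the bookkeeping.
import unittest


class FakeSession:
    def __init__(self):
        self.tables = set()

    def sql(self, stmt):
        parts = stmt.split()
        if stmt.startswith("CREATE TABLE "):
            self.tables.add(parts[2])
        elif stmt.startswith("DROP TABLE IF EXISTS "):
            self.tables.discard(parts[4])


class TableCleanupTest(unittest.TestCase):
    spark = FakeSession()  # one shared session across all tests

    def setUp(self):
        self._created = []

    def create_table(self, name):
        self.spark.sql(f"CREATE TABLE {name}")
        self._created.append(name)

    def tearDown(self):
        # Drop everything this test created, even if assertions failed,
        # so later tests see a clean shared session.
        for name in self._created:
            self.spark.sql(f"DROP TABLE IF EXISTS {name}")
```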
HyukjinKwon
added a commit
that referenced
this pull request
Apr 7, 2024
…ith pyspark-connect

### What changes were proposed in this pull request?
This PR proposes to make `pyspark.worker_utils` compatible with `pyspark-connect`.

### Why are the changes needed?
In order for `pyspark-connect` to work without the classic PySpark packages and dependencies. Spark Connect does not support `Broadcast` and `Accumulator`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Yes, at #45870. Once the CI is set up there, it will be tested there properly.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45914 from HyukjinKwon/SPARK-47751.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
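One way to picture this change is an import guard: only bind the JVM-backed `Broadcast` machinery when classic PySpark is importable, and fail with a clear message otherwise. This is a sketch under that assumption; `supports_broadcast` and `require_broadcast` are made-up names, not `pyspark.worker_utils` functions.

```python
# Illustrative import guard: Spark Connect does not support Broadcast or
# Accumulator, so bind the JVM-backed class only when it is available.
try:
    from pyspark.broadcast import Broadcast  # classic (JVM) PySpark only
    _has_classic = True
except ImportError:
    Broadcast = None
    _has_classic = False


def supports_broadcast() -> bool:
    """Whether the JVM-backed Broadcast machinery can be used."""
    return _has_classic


def require_broadcast():
    if not supports_broadcast():
        raise RuntimeError("Broadcast is not supported with pyspark-connect")
    return Broadcast
```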
HyukjinKwon
added a commit
that referenced
this pull request
Apr 7, 2024
… with pyspark-connect

### What changes were proposed in this pull request?
This PR proposes to make `pyspark.testing` compatible with `pyspark-connect` by using the no-op context manager `contextlib.nullcontext` instead of `QuietTest`, which requires JVM access.

### Why are the changes needed?
In order for `pyspark-connect` to work without the classic PySpark packages and dependencies. Also, the logs are hidden because they are written to a separate file, so it is actually already quiet.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Yes, at #45870. Once the CI is set up there, it will be tested there properly.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45916 from HyukjinKwon/SPARK-47753.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
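The substitution is straightforward: when there is no JVM to quiet, hand back `contextlib.nullcontext()` so call sites stay unchanged. A minimal sketch; `quiet_test` is an illustrative wrapper, not the exact `pyspark.testing` helper:

```python
# Sketch: pick a no-op context manager when no JVM log suppression is
# possible (pyspark-connect), keeping `with quiet_test(...):` call sites
# identical on both code paths.
import contextlib


def quiet_test(jvm_quiet_cm=None):
    """Return the JVM-backed manager (the classic QuietTest) or a no-op fallback."""
    if jvm_quiet_cm is None:
        # pyspark-connect: logs already go to a separate file, nothing to silence
        return contextlib.nullcontext()
    return jvm_quiet_cm


with quiet_test():
    result = 1 + 1  # body runs unchanged under the no-op manager
```

Because `nullcontext` satisfies the same context-manager protocol as `QuietTest`, the test bodies need no changes at all.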
HyukjinKwon
added a commit
that referenced
this pull request
Apr 7, 2024
…k-connect

### What changes were proposed in this pull request?
This PR proposes to make `pyspark.pandas` compatible with `pyspark-connect`.

### Why are the changes needed?
In order for `pyspark-connect` to work without the classic PySpark packages and dependencies.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Yes, at #45870. Once the CI is set up there, it will be tested there properly.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45915 from HyukjinKwon/SPARK-47752.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
HyukjinKwon
added a commit
that referenced
this pull request
Apr 8, 2024
…streaming_foreach_batch`

### What changes were proposed in this pull request?
This PR proposes to drop the tables after the tests finish.

### Why are the changes needed?
- To clean up resources properly.
- Leftover tables can affect other test cases when only one session is shared across tests.

### Does this PR introduce _any_ user-facing change?
No, test-only.

### How was this patch tested?
Tested in #45870.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45920 from HyukjinKwon/minor-cleanup-table.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
HyukjinKwon
added a commit
that referenced
this pull request
Apr 8, 2024
…setup.py

### What changes were proposed in this pull request?
This PR is a followup of #42563 (but using a new JIRA as it's already released), which adds `pyspark.sql.connect.protobuf` into `setup.py`.

### Why are the changes needed?
So PyPI-packaged PySpark can support the protobuf functions with Spark Connect on.

### Does this PR introduce _any_ user-facing change?
Yes. The new feature is now available with Spark Connect on if users install Spark Connect by `pip`.

### How was this patch tested?
Being tested in #45870.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45924 from HyukjinKwon/SPARK-47762.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
HyukjinKwon
added a commit
that referenced
this pull request
Apr 8, 2024
…setup.py

This PR is a followup of #42563 (but using a new JIRA as it's already released), which adds `pyspark.sql.connect.protobuf` into `setup.py`, so PyPI-packaged PySpark can support the protobuf functions with Spark Connect on. The new feature is now available with Spark Connect on if users install Spark Connect by `pip`. Being tested in #45870.

Closes #45924 from HyukjinKwon/SPARK-47762.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit f94d95d)
Signed-off-by: Hyukjin Kwon <[email protected]>
Test results: https://github.com/HyukjinKwon/spark/actions/runs/8598881063

Should be ready to go, cc @zhengruifeng @dongjoon-hyun @ueshin
dongjoon-hyun
approved these changes
Apr 8, 2024
+1, LGTM. Nice. Thank you, @HyukjinKwon .
Let me merge this to start from this.
HyukjinKwon
added a commit
that referenced
this pull request
Apr 11, 2024
…pository

### What changes were proposed in this pull request?
This is a followup of #45870 that skips the run in forked repositories.

### Why are the changes needed?
For consistency, and to save resources in forked repositories by default.

### Does this PR introduce _any_ user-facing change?
No, test-only.

### How was this patch tested?
Should be tested in individual forked repositories.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45992 from HyukjinKwon/SPARK-47725-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
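The skip amounts to comparing the repository slug that GitHub Actions exposes via `GITHUB_REPOSITORY` against `apache/spark`. Below is a Python sketch of the same condition a workflow-level `if:` would express; the helper name is hypothetical:

```python
# Sketch of the fork-skip condition: scheduled jobs run only in the
# canonical apache/spark repository, not in forks, by default.
import os


def should_run_scheduled_job(env=None) -> bool:
    """Decide whether a scheduled CI job should run in this repository."""
    if env is None:
        env = os.environ
    return env.get("GITHUB_REPOSITORY") == "apache/spark"
```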
HyukjinKwon
added a commit
that referenced
this pull request
May 3, 2024
…5 client <> 4.0 server

### What changes were proposed in this pull request?
This PR proposes to skip the tests that fail with a 3.5 client and a 4.0 server in Spark Connect (by adding `SPARK_SKIP_CONNECT_COMPAT_TESTS`). This is a base work for #46298. This partially backports #45870. This PR also adds the `SPARK_CONNECT_TESTING_REMOTE` environment variable so developers can run PySpark unittests against a Spark Connect server.

### Why are the changes needed?
In order to set up the CI that tests a 3.5 client and a 4.0 server in Spark Connect.

### Does this PR introduce _any_ user-facing change?
No, test-only.

### How was this patch tested?
Tested it in my fork, see #46298.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46334 from HyukjinKwon/SPARK-48088.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
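The mechanism described can be sketched as an environment-driven `unittest` skip: tests known to break across the 3.5 client / 4.0 server boundary are skipped whenever `SPARK_SKIP_CONNECT_COMPAT_TESTS` is set. The decorator name below is illustrative, not PySpark's actual helper:

```python
# Sketch: skip version-compatibility-sensitive tests when the env flag
# SPARK_SKIP_CONNECT_COMPAT_TESTS is set (as a cross-version CI would do).
import os
import unittest


def skip_in_compat_mode(reason="fails with 3.5 client <> 4.0 server"):
    return unittest.skipIf(
        os.environ.get("SPARK_SKIP_CONNECT_COMPAT_TESTS") is not None, reason
    )


class ExampleCompatTests(unittest.TestCase):
    @skip_in_compat_mode()
    def test_server_side_feature(self):
        self.assertTrue(True)
```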
### What changes were proposed in this pull request?
This PR proposes to set up a scheduled job for the `pyspark-connect` package. The CI tests `pyspark-connect` with the test cases, removing `python/lib/pyspark.zip` and `python/lib/py4j.zip` to make sure we don't use the JVM in `pyspark-connect`.

### Why are the changes needed?
In order to verify the feature coverage of `pyspark-connect`.

### Does this PR introduce _any_ user-facing change?
No, test-only.

### How was this patch tested?
Manually tested in my fork, https://github.com/HyukjinKwon/spark/actions/runs/8598881063

### Was this patch authored or co-authored using generative AI tooling?
No.
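The "no JVM" property the CI enforces by removing `python/lib/pyspark.zip` and `python/lib/py4j.zip` can be spot-checked from Python: if `py4j` is importable, something has leaked onto `sys.path`. An illustrative check, not the actual CI script:

```python
# Sketch: fail fast if the JVM bridge is reachable in a pyspark-connect
# test environment where the pyspark/py4j zips were removed.
import importlib.util


def assert_no_jvm_bridge():
    """Raise if py4j (classic PySpark's JVM gateway) is importable."""
    if importlib.util.find_spec("py4j") is not None:
        raise AssertionError(
            "py4j is importable; pyspark-connect tests must not touch the JVM"
        )
```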