Add Spark Connect Tests - CI & Test Suite Update #244
Conversation
This commit introduces the following changes:
* Updates the `ci.yml` file by introducing a new step under the `test` job to perform tests using Spark-Connect.
* Creates a shell script that downloads & installs Spark binaries and then runs the Spark-Connect server.
* Creates a pytest module that tests a very simple function on Spark-Connect.
* Updates the Makefile to add a new step for the Spark-Connect tests.
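The pytest module itself isn't shown in this thread; a minimal sketch of such a test might look like the following. The function name, file layout, and the default Spark-Connect port 15002 are illustrative assumptions, not the PR's actual code.

```python
# Hypothetical sketch of a Spark-Connect pytest module; names and the
# default port 15002 are assumptions, not the PR's actual test code.
import os


def single_space(s: str) -> str:
    """A very simple function to exercise via Spark-Connect."""
    return " ".join(s.split())


def test_single_space_on_spark_connect():
    if not os.environ.get("SPARK_CONNECT_MODE_ENABLE"):
        return  # no Spark-Connect server available; skip quietly
    # Lazy import keeps this module loadable without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
    df = spark.createDataFrame([("a   b",)], ["s"])
    assert single_space(df.collect()[0]["s"]) == "a b"
```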
Amazing, thank you @SemyonSinchenko! Looks like they haven't locked down
These dependencies should be optional for us (or be in the "provided" scope). Quinn itself should depend on Python only. Answering the question: we are mostly OK with that, except numpy. The 2.0 release came after the release of Spark, and I'm more than sure that Spark won't work with 2.0; so, I would like to add
Understood; thanks! This is an interesting PR/issue. I have been testing Spark-Connect against the other test cases as well and have noted which ones might need to be worked on (for a later PR). Thanks for your guidance @SemyonSinchenko.
As per the review comment, the recently added dependencies such as pyarrow and pandas are optional and not required for Spark-Classic. Update the pyproject.toml to reflect that and lock the Poetry file.
Most likely fixes issue #247 as well. Please review and let me know.
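The optional-dependency approach discussed above can be sketched as a guarded import. The module layout and the `connect` extras name below are illustrative assumptions, not quinn's actual packaging.

```python
# Sketch of keeping heavy dependencies optional; the "connect" extras
# name is an assumption, not quinn's actual packaging.
try:
    import pyarrow  # noqa: F401 -- only needed for Spark-Connect features
    HAS_PYARROW = True
except ImportError:
    HAS_PYARROW = False


def require_pyarrow(feature: str) -> None:
    """Fail with a helpful message when an optional dependency is missing."""
    if not HAS_PYARROW:
        raise ImportError(
            f"{feature} requires pyarrow; install the optional extras, "
            "e.g. pip install 'quinn[connect]'"
        )
```

With this pattern, core quinn functions import cleanly on a bare Python install, and only the Spark-Connect code path asks for the extra packages.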
I left some comments
Noted on the changes. Have responded to a few comments for clarifications.
* Remove the test_spark_connect.py file as it is not relevant anymore.
* Update the Makefile to remove the Spark-Connect test.
* Hardcode the Hadoop version to 3, as 2 is EOL.
@SemyonSinchenko - final go, I reckon 🤞🏽 I have made all the changes. Happy for those comments to be resolved and the PR to be merged if you are happy with it and if the tests pass.
LGTM
@nijanthanvijayakumar Thank you for the work!
Merged commit cfcb720 into mrpowers-io:planning-1.0-release.
My pleasure. Thanks for the guidance and support.
Proposed changes

This PR addresses the open issue #241 to add a Spark-Connect test to the CI workflows and introduces the following changes.

Updates to existing files

* Updates the `ci.yml` file: adds a new step under the `test` job to run a test file against the Spark-Connect server's SparkSession instance, and introduces the environment variable `SPARK_CONNECT_MODE_ENABLE` to switch between the Spark-Connect session and the local Spark session.
* Updates `spark.py` to create a remote Spark-Connect instance.
* Updates the `pyproject.toml` with the following dependencies: `pandas = "^1.5.3"`, `pyarrow = "16.1.0"`, `numpy = "^1.21.0"`, `grpcio = "^1.48.1"`, `grpcio-status = "^1.64.1"`.
* Updates the `Makefile`: excludes `test_spark_connect.py` from the regular pytest run, and adds a separate target to run `test_spark_connect.py`.

New files

* Creates a new `test_spark_connect.py` file.
* Creates a new shell script for running the Spark-Connect server.
Further comments

* Please review the newly added dependencies (pandas, pyarrow) and let me know if those versions are correct.
* I wanted to add a check to `run_spark_connect_server.sh` to verify whether the Spark-Connect server is running. I tried `netstat`, which didn't work; I will work on that in a different PR.
* Should the `pr.yml` file also be updated for this Spark-Connect test case?
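As a portable alternative to `netstat` for the server-readiness check mentioned above, a plain TCP probe works on most CI images. This is a sketch; 15002 is Spark-Connect's default port, and the function name is illustrative.

```python
# Sketch of a readiness probe: poll a TCP port until it accepts
# connections or the timeout elapses. Works without netstat.
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once (host, port) accepts TCP connections."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # server not up yet; retry
    return False
```

The shell script could invoke this via `python -c ...` after starting the server, e.g. to wait for `localhost:15002` before running the tests.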