Wrap scalar generation into spark session in integration test #9405
Merged
Commits (7):
- b9aadf1 — add negative scale config before calling lit (thirtiseven)
- c48f8f2 — warp gen scalars with a spark session (thirtiseven)
- ecdb851 — change every use of scalar gen to wrap them into a spark session (thirtiseven)
- 84712e8 — clean up (thirtiseven)
- 808bc25 — Update integration_tests/README.md (thirtiseven)
- 3c43343 — address comments (thirtiseven)
- 815f85c — move f.lit into spark session (thirtiseven)
Putting this in a `cpu_session` fixes the current problem, but it adds a new one. If `gen_scalars` is called from inside a `with_*_session` it will have other problems. `with_spark_session` calls `reset_spark_session_conf`, which does more than just reset the conf: it clears out the catalog too, with no way to get the original config or catalog back after it exits. That means `with_gpu_session` -> `gen_scalars` will result in the query running on the CPU after the `gen_scalars` call.

I see a few ways to properly fix this:

1. Require `spark.sql.legacy.allowNegativeScaleOfDecimal` when launching spark and have the test framework throw an exception if it is not set. Then we remove references to it in all of the tests for consistency. Then we file a follow-on issue to fix `with_spark_session` to not allow nesting and to throw an exception if it is nested.
2. Have `gen_scalars` spawn a Spark session itself so the conf is set while the scalars are generated.
3. Wrap every scalar-generation call site in a Spark session, and file a follow-on issue to fix `with_spark_session` to not allow nesting and to throw an exception if it is nested.

I personally prefer option 1, but I am fine with option 2 or 3. Talking to @jlowe, he really prefers option 3. The main difference between option 3 and option 2 for me is really about the amount of code that needs to change. If we just fix the one test and add some docs, that feels like a really small change. If we have to fix nesting/etc., that feels a bit larger, but it is something we need to do either way, and it would mean all tests that use `gen_scalar` would be able to deal with all decimal values properly.
I'm not a fan of 2. It's again surprising behavior (who would expect it to spawn a Spark session?). I'm fine with either 1 or 3, and even with 1, I still think we should fix the test(s). We should be putting all data generation inside a Spark session context of some kind.
Thanks, updated the code to option 3.

Now I wrap all scalar generation with a `with_cpu_session`, no matter whether it calls `f.lit` or uses `DecimalGen`. Not sure if we only want to move the cases that can possibly fail into Spark sessions.
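The adopted pattern can be sketched with toy stand-ins (`with_cpu_session`, `gen_scalar`, and the conf flag below are simplified substitutes; the real helpers live in the spark-rapids integration-test framework and receive a live SparkSession):

```python
# Toy stand-ins, for illustration only: the real with_cpu_session applies
# session confs (e.g. spark.sql.legacy.allowNegativeScaleOfDecimal) before
# running the supplied function with an actual SparkSession.
_negative_scale_allowed = False


def with_cpu_session(func):
    global _negative_scale_allowed
    _negative_scale_allowed = True       # conf is in effect only inside the session
    try:
        return func(None)                # the real wrapper passes the SparkSession
    finally:
        _negative_scale_allowed = False


def gen_scalar(precision, scale):
    # Mirrors why f.lit / DecimalGen failed at module scope: a negative
    # decimal scale is rejected unless the legacy conf is in effect.
    if scale < 0 and not _negative_scale_allowed:
        raise ValueError("negative decimal scale requires the legacy conf")
    return (precision, scale)


# Before the PR: calling gen_scalar(7, -2) at collection time raised.
# After the PR: every call site wraps the generation in a session.
scalar = with_cpu_session(lambda spark: gen_scalar(7, -2))
```

The design trade-off discussed above is visible here: wrapping every call site is more code churn than having the generator spawn a session internally, but it keeps session creation explicit at the test level.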
Follow-on issue: #9412