Add support for Spark 2.x Explain Api #4529
Conversation
…at would have run on GPU Signed-off-by: Thomas Graves <[email protected]>
…nto explainonlymode
Signed-off-by: Thomas Graves <[email protected]>
This reverts commit 1e5b1ab.
Upmerged to the latest again to pull in the 3.x explain changes, and also picked up another config change. This is ready and nothing else to update at the moment.
build
I'm through almost all the diffs but skipped the huge GpuOverrides changes for now, posting what I have so far. Wondering if this would be a smaller delta from the baseline if we had the spark2 jar pull in the few cudf classes it needs, or even encapsulate/require the cudf jar. We already do not require CUDA or a GPU on the driver, so even if we pull in cudf classes during planning it shouldn't try to load native code or find CUDA or a GPU. That might simplify quite a bit since we wouldn't have to hack out anything that might touch a cudf class (even one as benign as DType).
sed -n '/class GpuBroadcastNestedLoopJoinMeta/,/override def convertToGpu/{/override def convertToGpu/!p}' ../spark2-sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastNestedLoopJoinMeta.scala > $tmp_dir/GpuBroadcastNestedLoopJoinMeta_new.out
sed -n '/class GpuBroadcastNestedLoopJoinMeta/,/override def convertToGpu/{/override def convertToGpu/!p}' ../sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastNestedLoopJoinExec.scala > $tmp_dir/GpuBroadcastNestedLoopJoinMeta_old.out
diff $tmp_dir/GpuBroadcastNestedLoopJoinMeta_new.out $tmp_dir/GpuBroadcastNestedLoopJoinMeta_old.out > $tmp_dir/GpuBroadcastNestedLoopJoinMeta.newdiff
diff -c spark2diffs/GpuBroadcastNestedLoopJoinMeta.diff $tmp_dir/GpuBroadcastNestedLoopJoinMeta.newdiff
Nit: Seems like we could factor out this code into a function parameterized by the class name, filename, and sed pattern, and then simply build an array or string of inputs for a for loop, which would probably reduce the boilerplate and thus the file size quite a bit. It would also be nice to have the list ordered in some manner so it's easy to scan for a particular file/class. Not must-fix, IMHO.
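A rough sketch of what such a helper could look like (purely illustrative; the function name, argument layout, and $tmp_dir handling are assumptions, not code from this PR):

compare_class() {
  # compare_class <class_name> <new_file> <old_file> <start_pattern> <end_pattern>
  # Extracts the region from <start_pattern> up to (but excluding) <end_pattern>
  # from both copies, diffs them, and compares the result against the stored diff.
  local class_name="$1" new_file="$2" old_file="$3" start_pat="$4" end_pat="$5"
  sed -n "/${start_pat}/,/${end_pat}/{/${end_pat}/!p}" "$new_file" > "$tmp_dir/${class_name}_new.out"
  sed -n "/${start_pat}/,/${end_pat}/{/${end_pat}/!p}" "$old_file" > "$tmp_dir/${class_name}_old.out"
  diff "$tmp_dir/${class_name}_new.out" "$tmp_dir/${class_name}_old.out" > "$tmp_dir/${class_name}.newdiff"
  diff -c "spark2diffs/${class_name}.diff" "$tmp_dir/${class_name}.newdiff"
}

compare_class GpuBroadcastNestedLoopJoinMeta \
  ../spark2-sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastNestedLoopJoinMeta.scala \
  ../sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastNestedLoopJoinExec.scala \
  'class GpuBroadcastNestedLoopJoinMeta' 'override def convertToGpu'

The class entries could then live in one ordered list, one line per class, and drive a simple loop.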
Co-authored-by: Jason Lowe <[email protected]>
commented out code and some unneeded AQE checks
…spark-rapids into spark2shim-sep-module-upmerge
Pulling in just DType might reduce the diffs quite a bit. Now that I know the diffs, I have some other ideas for integrating it with the sql-plugin code for the next release.
build
extra imports, don't build tests jar if empty
build
build
build
build
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
  <classifier>${spark.version.classifier}</classifier>
Do we want/need the classifier on this jar?
Yes, at least my intention was to leave it so it's clear this jar is for Spark 2.4. Also, the plan was that if we needed to support 2.3 for this release it would just be a separate jar rather than doing an entire parallel-world-like setup again. If you have other thoughts let me know.
contributes to #4360
This PR adds a new module that is a copy of a bunch of the code from sql-plugin but only keeps the tagging functionality, so all convert functions and dependencies on cudf were removed. This allows running with a single jar (rapids-4-spark-sql-meta_2.11-22.02.0-SNAPSHOT-spark24.jar) against Spark 2.x and calling the explain API, val output = com.nvidia.spark.rapids.ExplainPlan.explainPotentialGpuPlan(mydf), to get an idea of how the user's application may run on the GPU. It also tells us whether the plugin doesn't support some functions the user's application is using.
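For illustration, a minimal sketch of calling the API from a Spark 2.4 application follows; the SparkSession setup and the DataFrame are placeholders, only ExplainPlan.explainPotentialGpuPlan comes from this PR:

import org.apache.spark.sql.SparkSession

object ExplainExample {
  def main(args: Array[String]): Unit = {
    // Assumes the rapids-4-spark-sql-meta jar is on the driver classpath of a Spark 2.4 app.
    val spark = SparkSession.builder().appName("rapids-explain-example").getOrCreate()

    // Placeholder DataFrame standing in for the user's real workload.
    val mydf = spark.range(0, 1000).toDF("id").selectExpr("id", "id * 2 as doubled")

    // Returns a description of which operators could run on the GPU and why others could not.
    val output = com.nvidia.spark.rapids.ExplainPlan.explainPotentialGpuPlan(mydf)
    println(output)

    spark.stop()
  }
}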
I unfortunately was unable to find a good way to split the main sql-plugin code into tagging and convert before 22.02, so this is a separate module with a lot of copied code. In the next release we will look at commonizing the code.
Note I named the jar rapids-4-spark-sql-meta_2.11-22.02.0-SNAPSHOT-spark24.jar thinking that if we are able to commonize the code, the rapids-4-spark-sql-meta name could apply there as well. If people have other ideas please let me know.
In my testing this one jar worked for all the Spark 2.4.x versions I tried. It even works against CDH, which ships its own 2.4.x jar version, although I only tested one version of CDH. Note that management said we only have to support Spark 2.4 and newer.
Also note that Spark 2.4 builds with Scala 2.11 by default, so that is what I use here. Spark optionally supported 2.12, but I haven't supported that here. Scala 2.11 doesn't support -Ywarn-unused:imports, so I had to copy some build code into the module pom file.
Since it's completely separate code it won't break if someone changes sql-plugin in an incompatible way, so I don't have it built by default. It can be built in 2 ways:
I tried to add comments about what 2.x supports in various places to make diffing the code easier. It's also possible I missed some unused imports, as IntelliJ doesn't seem to import that module properly.
All testing is manual right now. Testing involved running all NDS queries with Spark 2.4.x, getting the explain output, and comparing it to the explain output from Spark 3.x. There were very few differences, and the diffs were all because Spark 2.x doesn't support the exact same functions the same way in Catalyst. I also manually hacked the integration tests to run the explain and look for any exceptions it caused. This only kind of worked: lots of tests failed because they rely on features not present in 2.x. I manually went through a bunch of the output, but want to see if I can make another pass and get rid of things not supported.
Some things Spark 2 doesn't support or handles differently (I'm sure there are things I'm missing):
Once this is merged, people will have to update the meta information in 2 places until we can get it commonized. I'm sure I will have to make another pass to get things up to date after this PR goes in.
If all this goes in, I will have to modify the deploy scripts to release the jar.
I added a script that attempts to diff all the classes and functions added here. It's in the scripts directory. There is also a directory scripts/spark2diffs that has a bunch of diff files used by that script. I did use the script to upmerge to the latest, and it worked in the sense that it found a diff, which I manually resolved. I may try to run the script in premerge but can do that in a separate PR.