Create non-shim specific version of ParquetCachedBatchSerializer #3473
Merged
Conversation
Commits
- Multiple commits, Signed-off-by: Raza Jafri <[email protected]> (one reverts commit cafaa08)
- Multiple commits, Signed-off-by: Thomas Graves <[email protected]>
tgravescs commented on Sep 14, 2021
 * @param conf the configuration for the job.
 * @return an RDD of the input cached batches transformed into the ColumnarBatch format.
 */
override def gpuConvertCachedBatchToColumnarBatch(
this is the only function not in the Spark CachedBatchSerializer
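The reviewer's point can be illustrated with a minimal sketch (names and types are simplified stand-ins, not the real Spark or plugin interfaces): the GPU serializer adds one method on top of the interface Spark itself defines.

```scala
// Simplified sketch: the base trait stands in for Spark's
// CachedBatchSerializer; the GPU subclass adds a method that has no
// counterpart in the Spark API. Seq[String] stands in for the real
// RDD-based types purely for illustration.
trait BaseCachedBatchSerializer {
  def convertCachedBatchToColumnarBatch(batches: Seq[String]): Seq[String]
}

class GpuCachedBatchSerializer extends BaseCachedBatchSerializer {
  override def convertCachedBatchToColumnarBatch(batches: Seq[String]): Seq[String] =
    batches

  // GPU-only entry point, not part of the base interface.
  def gpuConvertCachedBatchToColumnarBatch(batches: Seq[String]): Seq[String] =
    convertCachedBatchToColumnarBatch(batches)
}
```

Because the extra method exists only on the subclass, callers that need it must know they are talking to the GPU implementation rather than the generic Spark interface.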
revans2 reviewed on Sep 14, 2021
sql-plugin/src/main/311+-all/scala/com/nvidia/spark/ParquetCachedBatchSerializer.scala
revans2 approved these changes on Sep 14, 2021
gerashegalov added a commit that referenced this pull request on Nov 12, 2021:
- Upgrade to Scala 2.12.15
- Add `-Xfatal-warnings` to scalac params
- Add `nowarn` annotations to existing warnings

Closes #3473
This builds on #3390 (credit to @razajafri as well) to create a common, user-facing generic class for ParquetCachedBatchSerializer that then loads the shim-specific version of it as necessary.
Fixes #3314.
The user just has to set:
--conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer
and loading the correct shim is handled automatically. Right now there is only one version, under the 311+-all directory.
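For illustration, the setting could be passed at submit time like this (a hedged sketch: the jar and application names below are placeholders, not the plugin's actual artifact names):

```shell
# Illustrative only: enable the serializer when submitting an application.
# Jar and application file names are placeholders.
spark-submit \
  --jars rapids-4-spark.jar \
  --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer \
  my-app.jar
```

The same key can equally be set in spark-defaults.conf or on a SparkSession builder; it is an ordinary Spark SQL configuration.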
This includes the Spark 3.1.1 version of com.nvidia.spark.ParquetCachedBatchSerializer at the base of the jar so it can be loaded; after that, the shim version loaded will be the one under the Spark-specific directory (spark312/, spark311/, etc.).
I did not add the function to ShimLoader (like newDriverPlugin) because the code lives in the spark311+-all directory, since CachedBatchSerializer isn't available until Spark 3.1.0.
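The shim-selection idea described above can be sketched in isolation (class and method names here are illustrative, not the plugin's actual internals): a thin user-facing facade resolves a version-specific implementation by class name at runtime.

```scala
// Hypothetical sketch of the shim-loading pattern: a user-facing entry
// point reflectively instantiates a Spark-version-specific implementation.
trait SerializerShim {
  def shimId: String
}

// Stand-in for an implementation that would live under a
// version-specific directory such as spark311/.
class Spark311SerializerShim extends SerializerShim {
  override def shimId: String = "spark311"
}

object ShimResolver {
  // In the real plugin the class name would be derived from the detected
  // Spark version; here the caller supplies it directly for illustration.
  def load(className: String): SerializerShim =
    Class.forName(className)
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[SerializerShim]
}
```

The reflective indirection is what lets a single configured class name work across Spark versions: only the facade is referenced by the user, and the concrete class is chosen after the runtime Spark version is known.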
Fixed the Databricks 8.2 build and ran tests. Ran tests on Spark 3.1.1, 3.1.2, 3.2.0, and Databricks 8.2. I was not able to validate the test script updates.