Enable testing parquet with zstd for spark releases 3.2.0 and later #5898

Merged (1 commit) on Jun 24, 2022

jbrennan333
Contributor

Signed-off-by: Jim Brennan [email protected]

Closes #5580

Spark releases starting with 3.2.0 include support for zstd compression without requiring any additional jars/libs. Now that we have zstd decompression support in cuDF, we should add zstd to the list of compressors to use in test_parquet_compress_read_round_trip.
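
As a rough illustration of the version gating described above, the following Python sketch builds a codec list that only includes zstd for Spark 3.2.0 and later. The baseline codec list and the helper names `parse_spark_version` and `parquet_compress_options` are assumptions made for this example; they are not taken from the PR diff, and the real integration test may structure this differently.

```python
import re

# Assumed baseline codecs for illustration only; the real test's list may differ.
BASE_PARQUET_CODECS = ["none", "uncompressed", "snappy", "gzip"]


def parse_spark_version(version):
    """Return (major, minor, patch) from a string such as '3.2.1' or '3.2.0-SNAPSHOT'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version!r}")
    return tuple(int(part) for part in match.groups())


def parquet_compress_options(spark_version):
    """Hypothetical helper: codecs to exercise in a parquet round-trip test."""
    codecs = list(BASE_PARQUET_CODECS)
    # zstd works out of the box only from Spark 3.2.0 onward, so gate on the version.
    if parse_spark_version(spark_version) >= (3, 2, 0):
        codecs.append("zstd")
    return codecs
```

In a pytest suite, a list like this would typically feed `pytest.mark.parametrize` so that each codec becomes its own test case, and older Spark versions simply never see the zstd case.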

@jbrennan333
Contributor Author

I have verified this with spark-3.1.3, spark-3.2.0, and spark-3.2.1.

@jbrennan333
Contributor Author

build

jlowe added this to the Jun 20 - Jul 8 milestone on Jun 23, 2022
jlowe added the "test" (Only impacts tests) label on Jun 23, 2022
@jbrennan333
Contributor Author

I'm not sure what failed here. @tgravescs should I rerun tests?

@jlowe
Contributor

jlowe commented Jun 24, 2022

It failed in the Scala unit tests:

22/06/23 20:27:33.626 dispatcher-event-loop-1 INFO ExecutorPluginContainer: Exception while shutting down plugin com.nvidia.spark.SQLPlugin.
ai.rapids.cudf.RmmException: Could not shut down RMM there appear to be outstanding allocations
	at ai.rapids.cudf.Rmm.shutdown(Rmm.java:219)
	at ai.rapids.cudf.Rmm.shutdown(Rmm.java:179)
	at com.nvidia.spark.rapids.GpuDeviceManager$.shutdown(GpuDeviceManager.scala:146)
	at com.nvidia.spark.rapids.RapidsExecutorPlugin.shutdown(Plugin.scala:285)
	at org.apache.spark.internal.plugin.ExecutorPluginContainer.$anonfun$shutdown$4(PluginContainer.scala:144)
	at org.apache.spark.internal.plugin.ExecutorPluginContainer.$anonfun$shutdown$4$adapted(PluginContainer.scala:141)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.internal.plugin.ExecutorPluginContainer.shutdown(PluginContainer.scala:141)
	at org.apache.spark.executor.Executor.$anonfun$stop$4(Executor.scala:332)
	at org.apache.spark.executor.Executor.$anonfun$stop$4$adapted(Executor.scala:332)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:332)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:222)
	at org.apache.spark.executor.Executor.stop(Executor.scala:332)
	at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receiveAndReply$1.applyOrElse(LocalSchedulerBackend.scala:83)
	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
	at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

[...]

22/06/23 20:27:33.710 ScalaTest-main-running-GpuDeviceManagerSuite INFO RapidsExecutorPlugin: RAPIDS Accelerator build: {version=22.08.0-SNAPSHOT, user=, url=https://github.com/NVIDIA/spark-rapids.git, date=2022-06-23T19:54:14Z, revision=64fbd7de41d37016e9a7014732be3d97b7bbeecf, cudf_version=22.08.0-SNAPSHOT, branch=HEAD}
22/06/23 20:27:33.710 ScalaTest-main-running-GpuDeviceManagerSuite INFO RapidsExecutorPlugin: cudf build: {version=22.08.0-SNAPSHOT, user=, url=https://github.com/rapidsai/cudf.git, date=2022-06-23T02:28:56Z, revision=31ad35c583fad22d6b976af7e1990df50efd7bc7, branch=HEAD}
22/06/23 20:27:33.711 ScalaTest-main-running-GpuDeviceManagerSuite INFO RapidsExecutorPlugin: Initializing memory from Executor Plugin
22/06/23 20:27:33.711 ScalaTest-main-running-GpuDeviceManagerSuite ERROR RapidsExecutorPlugin: Exception in the executor plugin, shutting down!
java.lang.IllegalStateException: Cannot initialize memory due to previous shutdown failing
	at com.nvidia.spark.rapids.GpuDeviceManager$.initializeMemory(GpuDeviceManager.scala:327)
	at com.nvidia.spark.rapids.GpuDeviceManager$.initializeGpuAndMemory(GpuDeviceManager.scala:137)
	at com.nvidia.spark.rapids.RapidsExecutorPlugin.init(Plugin.scala:232)
	at org.apache.spark.internal.plugin.ExecutorPluginContainer.$anonfun$executorPlugins$1(PluginContainer.scala:125)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.internal.plugin.ExecutorPluginContainer.<init>(PluginContainer.scala:113)
	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:211)
	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:199)
	at org.apache.spark.executor.Executor.$anonfun$plugins$1(Executor.scala:253)
	at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:222)
	at org.apache.spark.executor.Executor.<init>(Executor.scala:253)
	at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
	at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:579)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2678)
	at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:942)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:936)
	at com.nvidia.spark.rapids.TestUtils$.withGpuSparkSession(TestUtils.scala:126)
	at com.nvidia.spark.rapids.GpuDeviceManagerSuite.$anonfun$new$3(GpuDeviceManagerSuite.scala:50)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
	at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	at org.scalatest.Transformer.apply(Transformer.scala:22)
	at org.scalatest.Transformer.apply(Transformer.scala:20)
	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
	at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
	at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
	at org.scalatest.FunSuite.withFixture(FunSuite.scala:1560)
	at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
	at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
	at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
	at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
	at com.nvidia.spark.rapids.GpuDeviceManagerSuite.org$scalatest$BeforeAndAfter$$super$runTest(GpuDeviceManagerSuite.scala:26)
	at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:203)
	at org.scalatest.BeforeAndAfter.runTest$(BeforeAndAfter.scala:192)
	at com.nvidia.spark.rapids.GpuDeviceManagerSuite.runTest(GpuDeviceManagerSuite.scala:26)
	at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
	at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:396)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
	at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:379)
	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
	at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
	at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
	at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
	at org.scalatest.Suite.run(Suite.scala:1147)
	at org.scalatest.Suite.run$(Suite.scala:1129)
	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
	at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
	at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
	at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
	at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
	at com.nvidia.spark.rapids.GpuDeviceManagerSuite.org$scalatest$BeforeAndAfter$$super$run(GpuDeviceManagerSuite.scala:26)
	at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:258)
	at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:256)
	at com.nvidia.spark.rapids.GpuDeviceManagerSuite.run(GpuDeviceManagerSuite.scala:26)
	at org.scalatest.Suite.callExecuteOnSuite$1(Suite.scala:1210)
	at org.scalatest.Suite.$anonfun$runNestedSuites$1(Suite.scala:1257)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at org.scalatest.Suite.runNestedSuites(Suite.scala:1255)
	at org.scalatest.Suite.runNestedSuites$(Suite.scala:1189)
	at org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:30)
	at org.scalatest.Suite.run(Suite.scala:1144)
	at org.scalatest.Suite.run$(Suite.scala:1129)
	at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:30)
	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
	at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1346)
	at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1340)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1340)
	at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:1031)
	at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:1010)
	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1506)
	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1010)
	at org.scalatest.tools.Runner$.main(Runner.scala:827)
	at org.scalatest.tools.Runner.main(Runner.scala)

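Looks like a unit test may have a leak in it somewhere: RMM could not shut down because of outstanding allocations, and GpuDeviceManagerSuite then failed to initialize memory because the previous shutdown had failed. This does not appear to be related to this PR, since it doesn't modify the unit tests.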
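
To make the cascade in the two traces easier to follow, here is a toy Python model of the behavior the log suggests: a shutdown that finds outstanding allocations is recorded as failed, and the next attempt to initialize is refused. `ToyDeviceManager` and `run_suite` are hypothetical names invented for this sketch; this is not the spark-rapids or cuDF implementation, only a simplified picture of why a leak in one test can fail an unrelated suite that runs later.

```python
class ToyDeviceManager:
    """Simplified stand-in for a process-wide GPU memory manager."""

    def __init__(self):
        self.outstanding = 0          # allocations not yet freed
        self.shutdown_failed = False  # sticky flag set by a failed shutdown

    def initialize(self):
        if self.shutdown_failed:
            raise RuntimeError(
                "Cannot initialize memory due to previous shutdown failing")

    def allocate(self):
        self.outstanding += 1

    def free(self):
        self.outstanding -= 1

    def shutdown(self):
        if self.outstanding > 0:
            self.shutdown_failed = True
            raise RuntimeError(
                "Could not shut down: there appear to be outstanding allocations")


def run_suite(manager, body):
    """Run one test body; a failed shutdown is swallowed, as in the first trace."""
    manager.initialize()  # raises if an earlier suite's shutdown failed
    body(manager)
    try:
        manager.shutdown()
    except RuntimeError:
        pass  # only logged in the real run; the damage shows up in the next suite
```

Running a leaky body first (one that allocates and never frees) succeeds quietly, but the next call to `run_suite` on the same manager raises at `initialize`, which matches the order of the two exceptions in the log.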

@jlowe
Contributor

jlowe commented Jun 24, 2022

build

tgravescs changed the title from "Enable testing zstd for spark releases 3.2.0 and later" to "Enable testing parquet with zstd for spark releases 3.2.0 and later" on Jun 24, 2022
Successfully merging this pull request may close these issues.

[FEA] Enable zstd integration tests for parquet and orc