[BUG] orc_write_test.py::test_write_round_trip_corner failed with DATAGEN_SEED=1715517863
[2024-05-12T14:16:03.380Z] _ test_write_round_trip_corner[native-SetValues(MapType(StringType(), StringType(), True),[{}, None, {'A': ''}, {'B': None}])] _
[2024-05-12T14:16:03.380Z] [gw2] linux -- Python 3.9.19 /opt/conda/bin/python
[2024-05-12T14:16:03.380Z]
[2024-05-12T14:16:03.380Z] spark_tmp_path = '/tmp/pyspark_tests//it-test-340-213-151-km8zv-r0k7f-gw2-1812-1083524667/'
[2024-05-12T14:16:03.380Z] orc_gen = SetValues(MapType(StringType(), StringType(), True),[{}, None, {'A': ''}, {'B': None}])
[2024-05-12T14:16:03.380Z] orc_impl = 'native'
[2024-05-12T14:16:03.380Z]
[2024-05-12T14:16:03.380Z] @pytest.mark.parametrize('orc_gen', orc_write_odd_empty_strings_gens_sample, ids=idfn)
[2024-05-12T14:16:03.381Z] @pytest.mark.parametrize('orc_impl', ["native", "hive"])
[2024-05-12T14:16:03.381Z] def test_write_round_trip_corner(spark_tmp_path, orc_gen, orc_impl):
[2024-05-12T14:16:03.381Z] gen_list = [('_c0', orc_gen)]
[2024-05-12T14:16:03.381Z] data_path = spark_tmp_path + '/ORC_DATA'
[2024-05-12T14:16:03.381Z] > assert_gpu_and_cpu_writes_are_equal_collect(
[2024-05-12T14:16:03.381Z] lambda spark, path: gen_df(spark, gen_list, 128000, num_slices=1).write.orc(path),
[2024-05-12T14:16:03.381Z] lambda spark, path: spark.read.orc(path),
[2024-05-12T14:16:03.381Z] data_path,
[2024-05-12T14:16:03.381Z] conf={'spark.sql.orc.impl': orc_impl, 'spark.rapids.sql.format.orc.write.enabled': True})
[2024-05-12T14:16:03.381Z]
[2024-05-12T14:16:03.381Z] ../../src/main/python/orc_write_test.py:99:
[2024-05-12T14:16:03.381Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2024-05-12T14:16:03.381Z] ../../src/main/python/asserts.py:285: in assert_gpu_and_cpu_writes_are_equal_collect
[2024-05-12T14:16:03.381Z] _assert_gpu_and_cpu_writes_are_equal(write_func, read_func, base_path, 'COLLECT', conf=conf)
[2024-05-12T14:16:03.381Z] ../../src/main/python/asserts.py:272: in _assert_gpu_and_cpu_writes_are_equal
[2024-05-12T14:16:03.381Z] from_gpu = with_cpu_session(gpu_bring_back, conf=conf)
[2024-05-12T14:16:03.381Z] ../../src/main/python/spark_session.py:147: in with_cpu_session
[2024-05-12T14:16:03.381Z] return with_spark_session(func, conf=copy)
[2024-05-12T14:16:03.381Z] /opt/conda/lib/python3.9/contextlib.py:79: in inner
[2024-05-12T14:16:03.381Z] return func(*args, **kwds)
[2024-05-12T14:16:03.381Z] ../../src/main/python/spark_session.py:131: in with_spark_session
[2024-05-12T14:16:03.381Z] ret = func(_spark)
[2024-05-12T14:16:03.381Z] ../../src/main/python/asserts.py:205: in <lambda>
[2024-05-12T14:16:03.381Z] bring_back = lambda spark: limit_func(spark).collect()
[2024-05-12T14:16:03.381Z] ../../../spark-3.4.0-bin-hadoop3-scala2.13/python/pyspark/sql/dataframe.py:1216: in collect
[2024-05-12T14:16:03.381Z] sock_info = self._jdf.collectToPython()
[2024-05-12T14:16:03.381Z] /home/jenkins/agent/workspace/jenkins-rapids_integration-scala213-dev-github-151-3.4.0/jars/spark-3.4.0-bin-hadoop3-scala2.13/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322: in __call__
[2024-05-12T14:16:03.381Z] return_value = get_return_value(
[2024-05-12T14:16:03.381Z] ../../../spark-3.4.0-bin-hadoop3-scala2.13/python/pyspark/errors/exceptions/captured.py:169: in deco
[2024-05-12T14:16:03.381Z] return f(*a, **kw)
[2024-05-12T14:16:03.381Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2024-05-12T14:16:03.381Z]
[2024-05-12T14:16:03.381Z] answer = 'xro1529653'
[2024-05-12T14:16:03.381Z] gateway_client =
[2024-05-12T14:16:03.381Z] target_id = 'o1529652', name = 'collectToPython'
[2024-05-12T14:16:03.381Z]
[2024-05-12T14:16:03.381Z] def get_return_value(answer, gateway_client, target_id=None, name=None):
[2024-05-12T14:16:03.381Z] """Converts an answer received from the Java gateway into a Python object.
[2024-05-12T14:16:03.381Z]
[2024-05-12T14:16:03.381Z] For example, string representation of integers are converted to Python
[2024-05-12T14:16:03.381Z] integer, string representation of objects are converted to JavaObject
[2024-05-12T14:16:03.381Z] instances, etc.
[2024-05-12T14:16:03.381Z]
[2024-05-12T14:16:03.381Z] :param answer: the string returned by the Java gateway
[2024-05-12T14:16:03.381Z] :param gateway_client: the gateway client used to communicate with the Java
[2024-05-12T14:16:03.381Z] Gateway. Only necessary if the answer is a reference (e.g., object,
[2024-05-12T14:16:03.381Z] list, map)
[2024-05-12T14:16:03.381Z] :param target_id: the name of the object from which the answer comes from
[2024-05-12T14:16:03.381Z] (e.g., *object1* in `object1.hello()`). Optional.
[2024-05-12T14:16:03.381Z] :param name: the name of the member from which the answer comes from
[2024-05-12T14:16:03.381Z] (e.g., *hello* in `object1.hello()`). Optional.
[2024-05-12T14:16:03.381Z] """
[2024-05-12T14:16:03.381Z] if is_error(answer)[0]:
[2024-05-12T14:16:03.381Z] if len(answer) > 1:
[2024-05-12T14:16:03.381Z] type = answer[1]
[2024-05-12T14:16:03.381Z] value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
[2024-05-12T14:16:03.381Z] if answer[1] == REFERENCE_TYPE:
[2024-05-12T14:16:03.381Z] > raise Py4JJavaError(
[2024-05-12T14:16:03.381Z] "An error occurred while calling {0}{1}{2}.\n".
[2024-05-12T14:16:03.381Z] format(target_id, ".", name), value)
[2024-05-12T14:16:03.381Z] E py4j.protocol.Py4JJavaError: An error occurred while calling o1529652.collectToPython.
[2024-05-12T14:16:03.381Z] E : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 23550.0 failed 1 times, most recent failure: Lost task 0.0 in stage 23550.0 (TID 78185) (100.103.204.21 executor 0): java.io.IOException: Error reading file: file:/tmp/pyspark_tests/it-test-340-213-151-km8zv-r0k7f-gw2-1812-1083524667/ORC_DATA/GPU/part-00000-302369e3-93aa-4290-82a2-f949443489f1-c000.snappy.orc
[2024-05-12T14:16:03.381Z] E at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1450)
[2024-05-12T14:16:03.381Z] E at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:207)
[2024-05-12T14:16:03.381Z] E at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:100)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:594)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.scheduler.Task.run(Task.scala:139)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
[2024-05-12T14:16:03.382Z] E at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[2024-05-12T14:16:03.382Z] E at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[2024-05-12T14:16:03.382Z] E at java.base/java.lang.Thread.run(Thread.java:840)
[2024-05-12T14:16:03.382Z] E Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream column 3 kind LENGTH position: 341 length: 341 range: 0 offset: 341 limit: 341 range 0 = 83086 to 83427 uncompressed: 512 to 512
[2024-05-12T14:16:03.382Z] E at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:60)
[2024-05-12T14:16:03.382Z] E at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:329)
[2024-05-12T14:16:03.382Z] E at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:379)
[2024-05-12T14:16:03.382Z] E at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1984)
[2024-05-12T14:16:03.382Z] E at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2022)
[2024-05-12T14:16:03.382Z] E at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2120)
[2024-05-12T14:16:03.382Z] E at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1963)
[2024-05-12T14:16:03.382Z] E at org.apache.orc.impl.TreeReaderFactory$MapTreeReader.nextVector(TreeReaderFactory.java:2888)
[2024-05-12T14:16:03.382Z] E at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
[2024-05-12T14:16:03.382Z] E at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100)
[2024-05-12T14:16:03.382Z] E at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77)
[2024-05-12T14:16:03.382Z] E at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1425)
[2024-05-12T14:16:03.382Z] E ... 24 more
[2024-05-12T14:16:03.382Z] E
[2024-05-12T14:16:03.382Z] E Driver stacktrace:
[2024-05-12T14:16:03.382Z] E at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720)
[2024-05-12T14:16:03.382Z] E at scala.collection.immutable.List.foreach(List.scala:333)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1206)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1206)
[2024-05-12T14:16:03.382Z] E at scala.Option.foreach(Option.scala:437)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1206)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2984)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.SparkContext.runJob(SparkContext.scala:2263)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.SparkContext.runJob(SparkContext.scala:2284)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.SparkContext.runJob(SparkContext.scala:2303)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.SparkContext.runJob(SparkContext.scala:2328)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1019)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[2024-05-12T14:16:03.382Z] E at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.rdd.RDD.collect(RDD.scala:1018)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:448)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.Dataset.$anonfun$collectToPython$1(Dataset.scala:3997)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4167)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:526)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:4165)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4165)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3994)
[2024-05-12T14:16:03.383Z] E at jdk.internal.reflect.GeneratedMethodAccessor96.invoke(Unknown Source)
[2024-05-12T14:16:03.383Z] E at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2024-05-12T14:16:03.383Z] E at java.base/java.lang.reflect.Method.invoke(Method.java:568)
[2024-05-12T14:16:03.383Z] E at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
[2024-05-12T14:16:03.383Z] E at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
[2024-05-12T14:16:03.383Z] E at py4j.Gateway.invoke(Gateway.java:282)
[2024-05-12T14:16:03.383Z] E at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
[2024-05-12T14:16:03.383Z] E at py4j.commands.CallCommand.execute(CallCommand.java:79)
[2024-05-12T14:16:03.383Z] E at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
[2024-05-12T14:16:03.383Z] E at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
[2024-05-12T14:16:03.383Z] E at java.base/java.lang.Thread.run(Thread.java:840)
[2024-05-12T14:16:03.383Z] E Caused by: java.io.IOException: Error reading file: file:/tmp/pyspark_tests/it-test-340-213-151-km8zv-r0k7f-gw2-1812-1083524667/ORC_DATA/GPU/part-00000-302369e3-93aa-4290-82a2-f949443489f1-c000.snappy.orc
[2024-05-12T14:16:03.383Z] E at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1450)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:207)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:100)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:594)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.scheduler.Task.run(Task.scala:139)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
[2024-05-12T14:16:03.383Z] E at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
[2024-05-12T14:16:03.383Z] E at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[2024-05-12T14:16:03.383Z] E at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[2024-05-12T14:16:03.383Z] E ... 1 more
[2024-05-12T14:16:03.383Z] E Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream column 3 kind LENGTH position: 341 length: 341 range: 0 offset: 341 limit: 341 range 0 = 83086 to 83427 uncompressed: 512 to 512
[2024-05-12T14:16:03.383Z] E at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:60)
[2024-05-12T14:16:03.383Z] E at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:329)
[2024-05-12T14:16:03.383Z] E at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:379)
[2024-05-12T14:16:03.383Z] E at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1984)
[2024-05-12T14:16:03.383Z] E at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2022)
[2024-05-12T14:16:03.384Z] E at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2120)
[2024-05-12T14:16:03.384Z] E at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1963)
[2024-05-12T14:16:03.384Z] E at org.apache.orc.impl.TreeReaderFactory$MapTreeReader.nextVector(TreeReaderFactory.java:2888)
[2024-05-12T14:16:03.384Z] E at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
[2024-05-12T14:16:03.384Z] E at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100)
[2024-05-12T14:16:03.384Z] E at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77)
[2024-05-12T14:16:03.384Z] E at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1425)
[2024-05-12T14:16:03.384Z] E ... 24 more
[2024-05-12T14:16:03.384Z]
[2024-05-12T14:16:03.384Z] /home/jenkins/agent/workspace/jenkins-rapids_integration-scala213-dev-github-151-3.4.0/jars/spark-3.4.0-bin-hadoop3-scala2.13/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326: Py4JJavaError
[2024-05-12T14:16:03.384Z] ----------------------------- Captured stdout call -----------------------------
[2024-05-12T14:16:03.384Z] ### CPU RUN ###
[2024-05-12T14:16:03.384Z] ### GPU RUN ###
[2024-05-12T14:16:03.384Z] ### WRITE: GPU TOOK 0.3831362724304199 CPU TOOK 0.46770763397216797 ###
[2024-05-12T14:16:03.384Z] _ test_write_round_trip_corner[hive-SetValues(MapType(StringType(), StringType(), True),[{}, None, {'A': ''}, {'B': None}])] _
[2024-05-12T14:16:03.384Z] [gw2] linux -- Python 3.9.19 /opt/conda/bin/python
[2024-05-12T14:16:03.384Z]
[2024-05-12T14:16:03.384Z] spark_tmp_path = '/tmp/pyspark_tests//it-test-340-213-151-km8zv-r0k7f-gw2-1812-294764446/'
[2024-05-12T14:16:03.384Z] orc_gen = SetValues(MapType(StringType(), StringType(), True),[{}, None, {'A': ''}, {'B': None}])
[2024-05-12T14:16:03.384Z] orc_impl = 'hive'
[2024-05-12T14:16:03.384Z]
[2024-05-12T14:16:03.384Z] @pytest.mark.parametrize('orc_gen', orc_write_odd_empty_strings_gens_sample, ids=idfn)
[2024-05-12T14:16:03.384Z] @pytest.mark.parametrize('orc_impl', ["native", "hive"])
[2024-05-12T14:16:03.384Z] def test_write_round_trip_corner(spark_tmp_path, orc_gen, orc_impl):
[2024-05-12T14:16:03.384Z] gen_list = [('_c0', orc_gen)]
[2024-05-12T14:16:03.384Z] data_path = spark_tmp_path + '/ORC_DATA'
[2024-05-12T14:16:03.384Z] > assert_gpu_and_cpu_writes_are_equal_collect(
[2024-05-12T14:16:03.384Z] lambda spark, path: gen_df(spark, gen_list, 128000, num_slices=1).write.orc(path),
[2024-05-12T14:16:03.384Z] lambda spark, path: spark.read.orc(path),
[2024-05-12T14:16:03.384Z] data_path,
[2024-05-12T14:16:03.384Z] conf={'spark.sql.orc.impl': orc_impl, 'spark.rapids.sql.format.orc.write.enabled': True})
[2024-05-12T14:16:03.384Z]
[2024-05-12T14:16:03.384Z] ../../src/main/python/orc_write_test.py:99:
[2024-05-12T14:16:03.384Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2024-05-12T14:16:03.384Z] ../../src/main/python/asserts.py:285: in assert_gpu_and_cpu_writes_are_equal_collect
[2024-05-12T14:16:03.384Z] _assert_gpu_and_cpu_writes_are_equal(write_func, read_func, base_path, 'COLLECT', conf=conf)
[2024-05-12T14:16:03.384Z] ../../src/main/python/asserts.py:272: in _assert_gpu_and_cpu_writes_are_equal
[2024-05-12T14:16:03.384Z] from_gpu = with_cpu_session(gpu_bring_back, conf=conf)
[2024-05-12T14:16:03.384Z] ../../src/main/python/spark_session.py:147: in with_cpu_session
[2024-05-12T14:16:03.384Z] return with_spark_session(func, conf=copy)
[2024-05-12T14:16:03.384Z] /opt/conda/lib/python3.9/contextlib.py:79: in inner
[2024-05-12T14:16:03.384Z] return func(*args, **kwds)
[2024-05-12T14:16:03.384Z] ../../src/main/python/spark_session.py:131: in with_spark_session
[2024-05-12T14:16:03.384Z] ret = func(_spark)
[2024-05-12T14:16:03.384Z] ../../src/main/python/asserts.py:205: in <lambda>
[2024-05-12T14:16:03.384Z] bring_back = lambda spark: limit_func(spark).collect()
[2024-05-12T14:16:03.384Z] ../../../spark-3.4.0-bin-hadoop3-scala2.13/python/pyspark/sql/dataframe.py:1216: in collect
[2024-05-12T14:16:03.384Z] sock_info = self._jdf.collectToPython()
[2024-05-12T14:16:03.384Z] /home/jenkins/agent/workspace/jenkins-rapids_integration-scala213-dev-github-151-3.4.0/jars/spark-3.4.0-bin-hadoop3-scala2.13/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322: in __call__
[2024-05-12T14:16:03.384Z] return_value = get_return_value(
[2024-05-12T14:16:03.384Z] ../../../spark-3.4.0-bin-hadoop3-scala2.13/python/pyspark/errors/exceptions/captured.py:169: in deco
[2024-05-12T14:16:03.384Z] return f(*a, **kw)
[2024-05-12T14:16:03.384Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2024-05-12T14:16:03.384Z]
[2024-05-12T14:16:03.384Z] answer = 'xro1533235'
[2024-05-12T14:16:03.384Z] gateway_client =
[2024-05-12T14:16:03.384Z] target_id = 'o1533234', name = 'collectToPython'
[2024-05-12T14:16:03.384Z]
[2024-05-12T14:16:03.384Z] def get_return_value(answer, gateway_client, target_id=None, name=None):
[2024-05-12T14:16:03.384Z] """Converts an answer received from the Java gateway into a Python object.
[2024-05-12T14:16:03.384Z]
[2024-05-12T14:16:03.384Z] For example, string representation of integers are converted to Python
[2024-05-12T14:16:03.384Z] integer, string representation of objects are converted to JavaObject
[2024-05-12T14:16:03.384Z] instances, etc.
[2024-05-12T14:16:03.384Z]
[2024-05-12T14:16:03.384Z] :param answer: the string returned by the Java gateway
[2024-05-12T14:16:03.384Z] :param gateway_client: the gateway client used to communicate with the Java
[2024-05-12T14:16:03.384Z] Gateway. Only necessary if the answer is a reference (e.g., object,
[2024-05-12T14:16:03.384Z] list, map)
[2024-05-12T14:16:03.384Z] :param target_id: the name of the object from which the answer comes from
[2024-05-12T14:16:03.384Z] (e.g., *object1* in `object1.hello()`). Optional.
[2024-05-12T14:16:03.384Z] :param name: the name of the member from which the answer comes from
[2024-05-12T14:16:03.384Z] (e.g., *hello* in `object1.hello()`). Optional.
[2024-05-12T14:16:03.384Z] """
[2024-05-12T14:16:03.384Z] if is_error(answer)[0]:
[2024-05-12T14:16:03.384Z] if len(answer) > 1:
[2024-05-12T14:16:03.385Z] type = answer[1]
[2024-05-12T14:16:03.385Z] value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
[2024-05-12T14:16:03.385Z] if answer[1] == REFERENCE_TYPE:
[2024-05-12T14:16:03.385Z] > raise Py4JJavaError(
[2024-05-12T14:16:03.385Z] "An error occurred while calling {0}{1}{2}.\n".
[2024-05-12T14:16:03.385Z] format(target_id, ".", name), value)
[2024-05-12T14:16:03.385Z] E py4j.protocol.Py4JJavaError: An error occurred while calling o1533234.collectToPython.
[2024-05-12T14:16:03.385Z] E : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 23600.0 failed 1 times, most recent failure: Lost task 0.0 in stage 23600.0 (TID 78235) (100.103.204.21 executor 0): java.io.IOException: Error reading file: file:/tmp/pyspark_tests/it-test-340-213-151-km8zv-r0k7f-gw2-1812-294764446/ORC_DATA/GPU/part-00000-1075a7c6-a73f-47be-8317-a6aafaed34f6-c000.snappy.orc
[2024-05-12T14:16:03.385Z] E at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1450)
[2024-05-12T14:16:03.385Z] E at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
[2024-05-12T14:16:03.385Z] E at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:93)
[2024-05-12T14:16:03.385Z] E at org.apache.hadoop.hive.ql.io.orc.SparkOrcNewRecordReader.nextKeyValue(SparkOrcNewRecordReader.java:82)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.hasNext(RecordReaderIterator.scala:61)
[2024-05-12T14:16:03.385Z] E at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:576)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
[2024-05-12T14:16:03.385Z] E at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:576)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.scheduler.Task.run(Task.scala:139)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
[2024-05-12T14:16:03.385Z] E at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[2024-05-12T14:16:03.385Z] E at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[2024-05-12T14:16:03.385Z] E at java.base/java.lang.Thread.run(Thread.java:840)
[2024-05-12T14:16:03.385Z] E Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream column 3 kind LENGTH position: 341 length: 341 range: 0 offset: 341 limit: 341 range 0 = 83086 to 83427 uncompressed: 512 to 512
[2024-05-12T14:16:03.385Z] E at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:60)
[2024-05-12T14:16:03.385Z] E at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:329)
[2024-05-12T14:16:03.385Z] E at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:379)
[2024-05-12T14:16:03.385Z] E at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1984)
[2024-05-12T14:16:03.385Z] E at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2022)
[2024-05-12T14:16:03.385Z] E at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2120)
[2024-05-12T14:16:03.385Z] E at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1963)
[2024-05-12T14:16:03.385Z] E at org.apache.orc.impl.TreeReaderFactory$MapTreeReader.nextVector(TreeReaderFactory.java:2888)
[2024-05-12T14:16:03.385Z] E at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
[2024-05-12T14:16:03.385Z] E at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100)
[2024-05-12T14:16:03.385Z] E at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77)
[2024-05-12T14:16:03.385Z] E at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1425)
[2024-05-12T14:16:03.385Z] E ... 23 more
[2024-05-12T14:16:03.385Z] E
[2024-05-12T14:16:03.385Z] E Driver stacktrace:
[2024-05-12T14:16:03.385Z] E at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720)
[2024-05-12T14:16:03.385Z] E at scala.collection.immutable.List.foreach(List.scala:333)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1206)
[2024-05-12T14:16:03.385Z] E at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1206)
[2024-05-12T14:16:03.385Z] E at scala.Option.foreach(Option.scala:437)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1206)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2984)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.SparkContext.runJob(SparkContext.scala:2263)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.SparkContext.runJob(SparkContext.scala:2284)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.SparkContext.runJob(SparkContext.scala:2303)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.SparkContext.runJob(SparkContext.scala:2328)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1019)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.rdd.RDD.collect(RDD.scala:1018)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:448)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.Dataset.$anonfun$collectToPython$1(Dataset.scala:3997)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4167)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:526)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:4165)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4165)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3994)
[2024-05-12T14:16:03.386Z] E at jdk.internal.reflect.GeneratedMethodAccessor96.invoke(Unknown Source)
[2024-05-12T14:16:03.386Z] E at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2024-05-12T14:16:03.386Z] E at java.base/java.lang.reflect.Method.invoke(Method.java:568)
[2024-05-12T14:16:03.386Z] E at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
[2024-05-12T14:16:03.386Z] E at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
[2024-05-12T14:16:03.386Z] E at py4j.Gateway.invoke(Gateway.java:282)
[2024-05-12T14:16:03.386Z] E at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
[2024-05-12T14:16:03.386Z] E at py4j.commands.CallCommand.execute(CallCommand.java:79)
[2024-05-12T14:16:03.386Z] E at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
[2024-05-12T14:16:03.386Z] E at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
[2024-05-12T14:16:03.386Z] E at java.base/java.lang.Thread.run(Thread.java:840)
[2024-05-12T14:16:03.386Z] E Caused by: java.io.IOException: Error reading file: file:/tmp/pyspark_tests/it-test-340-213-151-km8zv-r0k7f-gw2-1812-294764446/ORC_DATA/GPU/part-00000-1075a7c6-a73f-47be-8317-a6aafaed34f6-c000.snappy.orc
[2024-05-12T14:16:03.386Z] E at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1450)
[2024-05-12T14:16:03.386Z] E at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
[2024-05-12T14:16:03.386Z] E at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:93)
[2024-05-12T14:16:03.386Z] E at org.apache.hadoop.hive.ql.io.orc.SparkOrcNewRecordReader.nextKeyValue(SparkOrcNewRecordReader.java:82)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.hasNext(RecordReaderIterator.scala:61)
[2024-05-12T14:16:03.386Z] E at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:576)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
[2024-05-12T14:16:03.386Z] E at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:576)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.scheduler.Task.run(Task.scala:139)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
[2024-05-12T14:16:03.386Z] E at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
[2024-05-12T14:16:03.386Z] E at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[2024-05-12T14:16:03.387Z] E at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[2024-05-12T14:16:03.387Z] E ... 1 more
[2024-05-12T14:16:03.387Z] E Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream column 3 kind LENGTH position: 341 length: 341 range: 0 offset: 341 limit: 341 range 0 = 83086 to 83427 uncompressed: 512 to 512
[2024-05-12T14:16:03.387Z] E at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:60)
[2024-05-12T14:16:03.387Z] E at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:329)
[2024-05-12T14:16:03.387Z] E at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:379)
[2024-05-12T14:16:03.387Z] E at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1984)
[2024-05-12T14:16:03.387Z] E at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2022)
[2024-05-12T14:16:03.387Z] E at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2120)
[2024-05-12T14:16:03.387Z] E at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1963)
[2024-05-12T14:16:03.387Z] E at org.apache.orc.impl.TreeReaderFactory$MapTreeReader.nextVector(TreeReaderFactory.java:2888)
[2024-05-12T14:16:03.387Z] E at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
[2024-05-12T14:16:03.387Z] E at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100)
[2024-05-12T14:16:03.387Z] E at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77)
[2024-05-12T14:16:03.387Z] E at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1425)
[2024-05-12T14:16:03.387Z] E ... 23 more
[2024-05-12T14:16:03.387Z]
[2024-05-12T14:16:03.387Z] /home/jenkins/agent/workspace/jenkins-rapids_integration-scala213-dev-github-151-3.4.0/jars/spark-3.4.0-bin-hadoop3-scala2.13/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326: Py4JJavaError
Describe the bug
Two cases of orc_write_test.py::test_write_round_trip_corner failed:
test_write_round_trip_corner[native-SetValues(MapType(StringType(), StringType(), True),[{}, None, {'A': ''}, {'B': None}])]
test_write_round_trip_corner[hive-SetValues(MapType(StringType(), StringType(), True),[{}, None, {'A': ''}, {'B': None}])]
Details: the full pytest failure output for both cases is quoted above.
Steps/Code to reproduce bug
Failed in the rapids_integration-scala213-dev-github CI pipeline; a standalone repro sketch follows.
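A minimal sketch of the failing round trip is below. It approximates what the test does through gen_df and assert_gpu_and_cpu_writes_are_equal_collect, but it is not the project's harness; it assumes the RAPIDS Accelerator jar is on the Spark classpath and that loading it via spark.plugins is sufficient. The corner values and configs are taken from the failing parametrization above.

```python
# Hedged repro sketch: write the corner-case map<string,string> column with the
# GPU plugin enabled, then read the file back on the CPU, mirroring the failing
# GPU-write/CPU-read leg of the test. Assumes the RAPIDS Accelerator jar is on
# the classpath (spark.plugins below); this is not the project's test harness.
from pyspark.sql import SparkSession
from pyspark.sql.types import MapType, StringType, StructField, StructType

spark = (SparkSession.builder
         .appName("orc-map-corner-repro")
         .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
         .config("spark.rapids.sql.format.orc.write.enabled", True)
         .config("spark.sql.orc.impl", "native")  # the hive impl fails the same way
         .getOrCreate())

schema = StructType([
    StructField("_c0", MapType(StringType(), StringType(), True), True)])

# The four values cycled by SetValues: empty map, null map, empty-string value,
# and null value. 128000 rows matches the test; the row count may matter, since
# the EOFException points at a compressed-stream chunk boundary.
rows = [({},), (None,), ({"A": ""},), ({"B": None},)] * 32000
df = spark.createDataFrame(rows, schema).coalesce(1)  # one file, like num_slices=1

path = "/tmp/ORC_DATA"
df.write.mode("overwrite").orc(path)               # written on the GPU

spark.conf.set("spark.rapids.sql.enabled", False)  # force the CPU reader
spark.read.orc(path).collect()                     # expected: java.io.EOFException
```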
Expected behavior
Test cases pass
Environment details (please complete the following information)
From the log above: Spark 3.4.0 (Scala 2.13, Hadoop 3) on Linux, Python 3.9.19, py4j 0.10.9.7.
Additional context
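The updated title pins DATAGEN_SEED=1715517863, the seed of the integration tests' random data generator, so rerunning the suite with that seed in the environment should regenerate the same corner-case data. Both traces above fail while reading the GPU-written file (ORC_DATA/GPU/part-00000-...), inside RunLengthIntegerReaderV2 decoding the LENGTH stream of column 3 (the map's string children), which suggests the GPU ORC writer emitted a malformed length encoding for this mix of empty and null strings; that is an inference from the stack traces, not a confirmed root cause.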