[BUG] test_parquet_read_merge_schema failed w/ TITAN V #5493

pxLi · 2022-05-16T00:57:44Z

Describe the bug
Blossom pipeline rapids_it-ubuntu16-dev-github, build ID 14. This pipeline uses a TITAN V, which has less GPU memory than the T4/V100.

The failures are mostly caused by: java.lang.AssertionError: End address is too high for setBytes 0x7fcfe5a57628 < 0x7fcfe5a57624
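
The two addresses in the assertion message can be compared directly to see how far the write ran past the buffer. This is a minimal sketch, assuming the first address is the end of the attempted write and the second is the end of the buffer, as the wording "End address is too high" suggests. The helper name `overrun_bytes` is hypothetical and is not part of cudf or the plugin.

```python
import re


def overrun_bytes(message: str) -> int:
    """Return how many bytes the first hex address exceeds the second.

    Hypothetical helper for reading the assertion message; it assumes the
    message contains exactly the two addresses being compared.
    """
    first, second = (int(h, 16) for h in re.findall(r"0x[0-9a-fA-F]+", message)[:2])
    return first - second


msg = "End address is too high for setBytes 0x7fcfe5a57628 < 0x7fcfe5a57624"
# Difference between the two addresses in the message, in bytes.
print(overrun_bytes(msg))
```

Under that assumption, the write overshoots the end of the host buffer by a few bytes rather than by a large amount, which is consistent with the buffer being sized slightly too small rather than with a memory-exhaustion problem.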

[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema[-reader_confs3]
[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema[-reader_confs4]
[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema[parquet-reader_confs3]
[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema[parquet-reader_confs4]
[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema_from_conf[-reader_confs3]
[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema_from_conf[-reader_confs4]
[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema_from_conf[parquet-reader_confs3]
[2022-05-15T15:44:49.997Z] FAILED ../../src/main/python/parquet_test.py::test_parquet_read_merge_schema_from_conf[parquet-reader_confs4]

[2022-05-15T15:44:49.702Z] =================================== FAILURES ===================================
[2022-05-15T15:44:49.702Z] ________________ test_parquet_read_merge_schema[-reader_confs3] ________________
[2022-05-15T15:44:49.702Z] 
[2022-05-15T15:44:49.702Z] spark_tmp_path = '/tmp/pyspark_tests//it-ub16-302-4-zgm59-x6qfb-main-1147-1482896633/'
[2022-05-15T15:44:49.702Z] v1_enabled_list = ''
[2022-05-15T15:44:49.702Z] reader_confs = {'spark.rapids.sql.format.parquet.reader.footer.type': 'NATIVE', 'spark.rapids.sql.format.parquet.reader.type': 'PERFILE'}
[2022-05-15T15:44:49.702Z] 
[2022-05-15T15:44:49.702Z]     @pytest.mark.parametrize('reader_confs', reader_opt_confs)
[2022-05-15T15:44:49.702Z]     @pytest.mark.parametrize('v1_enabled_list', ["", "parquet"])
[2022-05-15T15:44:49.702Z]     def test_parquet_read_merge_schema(spark_tmp_path, v1_enabled_list, reader_confs):
[2022-05-15T15:44:49.702Z]         # Once https://github.com/NVIDIA/spark-rapids/issues/133 and https://github.com/NVIDIA/spark-rapids/issues/132 are fixed
[2022-05-15T15:44:49.702Z]         # we should go with a more standard set of generators
[2022-05-15T15:44:49.702Z]         parquet_gens = [byte_gen, short_gen, int_gen, long_gen, float_gen, double_gen,
[2022-05-15T15:44:49.702Z]         string_gen, boolean_gen, DateGen(start=date(1590, 1, 1)),
[2022-05-15T15:44:49.702Z]         TimestampGen(start=datetime(1900, 1, 1, tzinfo=timezone.utc))] + decimal_gens
[2022-05-15T15:44:49.702Z]         first_gen_list = [('_c' + str(i), gen) for i, gen in enumerate(parquet_gens)]
[2022-05-15T15:44:49.702Z]         first_data_path = spark_tmp_path + '/PARQUET_DATA/key=0'
[2022-05-15T15:44:49.702Z]         with_cpu_session(
[2022-05-15T15:44:49.702Z]                 lambda spark : gen_df(spark, first_gen_list).write.parquet(first_data_path),
[2022-05-15T15:44:49.702Z]                 conf=rebase_write_legacy_conf)
[2022-05-15T15:44:49.702Z]         second_gen_list = [(('_c' if i % 2 == 0 else '_b') + str(i), gen) for i, gen in enumerate(parquet_gens)]
[2022-05-15T15:44:49.702Z]         second_data_path = spark_tmp_path + '/PARQUET_DATA/key=1'
[2022-05-15T15:44:49.702Z]         with_cpu_session(
[2022-05-15T15:44:49.702Z]                 lambda spark : gen_df(spark, second_gen_list).write.parquet(second_data_path),
[2022-05-15T15:44:49.702Z]                 conf=rebase_write_corrected_conf)
[2022-05-15T15:44:49.703Z]         data_path = spark_tmp_path + '/PARQUET_DATA'
[2022-05-15T15:44:49.703Z]         all_confs = copy_and_update(reader_confs, {'spark.sql.sources.useV1SourceList': v1_enabled_list})
[2022-05-15T15:44:49.703Z] >       assert_gpu_and_cpu_are_equal_collect(
[2022-05-15T15:44:49.703Z]                 lambda spark : spark.read.option('mergeSchema', 'true').parquet(data_path),
[2022-05-15T15:44:49.703Z]                 conf=all_confs)
[2022-05-15T15:44:49.703Z] 
[2022-05-15T15:44:49.703Z] ../../src/main/python/parquet_test.py:364: 
[2022-05-15T15:44:49.703Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-05-15T15:44:49.703Z] ../../src/main/python/asserts.py:508: in assert_gpu_and_cpu_are_equal_collect
[2022-05-15T15:44:49.703Z]     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
[2022-05-15T15:44:49.703Z] ../../src/main/python/asserts.py:428: in _assert_gpu_and_cpu_are_equal
[2022-05-15T15:44:49.703Z]     run_on_gpu()
[2022-05-15T15:44:49.703Z] ../../src/main/python/asserts.py:422: in run_on_gpu
[2022-05-15T15:44:49.703Z]     from_gpu = with_gpu_session(bring_back, conf=conf)
[2022-05-15T15:44:49.703Z] ../../src/main/python/spark_session.py:131: in with_gpu_session
[2022-05-15T15:44:49.703Z]     return with_spark_session(func, conf=copy)
[2022-05-15T15:44:49.703Z] ../../src/main/python/spark_session.py:98: in with_spark_session
[2022-05-15T15:44:49.703Z]     ret = func(_spark)
[2022-05-15T15:44:49.703Z] ../../src/main/python/asserts.py:201: in <lambda>
[2022-05-15T15:44:49.703Z]     bring_back = lambda spark: limit_func(spark).collect()
[2022-05-15T15:44:49.703Z] /home/jenkins/agent/workspace/jenkins-rapids_it-ubuntu16-dev-github-4/jars/spark-3.1.2-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/dataframe.py:677: in collect
[2022-05-15T15:44:49.703Z]     sock_info = self._jdf.collectToPython()
[2022-05-15T15:44:49.703Z] /home/jenkins/agent/workspace/jenkins-rapids_it-ubuntu16-dev-github-4/jars/spark-3.1.2-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
[2022-05-15T15:44:49.703Z]     return_value = get_return_value(
[2022-05-15T15:44:49.703Z] /home/jenkins/agent/workspace/jenkins-rapids_it-ubuntu16-dev-github-4/jars/spark-3.1.2-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/utils.py:111: in deco
[2022-05-15T15:44:49.703Z]     return f(*a, **kw)
[2022-05-15T15:44:49.703Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-05-15T15:44:49.703Z] 
[2022-05-15T15:44:49.703Z] answer = 'xro2558741'
[2022-05-15T15:44:49.703Z] gateway_client = <py4j.java_gateway.GatewayClient object at 0x7f7b3ecf1c10>
[2022-05-15T15:44:49.703Z] target_id = 'o2558740', name = 'collectToPython'
[2022-05-15T15:44:49.703Z] 
[2022-05-15T15:44:49.703Z]     def get_return_value(answer, gateway_client, target_id=None, name=None):
[2022-05-15T15:44:49.703Z]         """Converts an answer received from the Java gateway into a Python object.
[2022-05-15T15:44:49.703Z]     
[2022-05-15T15:44:49.703Z]         For example, string representation of integers are converted to Python
[2022-05-15T15:44:49.703Z]         integer, string representation of objects are converted to JavaObject
[2022-05-15T15:44:49.703Z]         instances, etc.
[2022-05-15T15:44:49.703Z]     
[2022-05-15T15:44:49.703Z]         :param answer: the string returned by the Java gateway
[2022-05-15T15:44:49.703Z]         :param gateway_client: the gateway client used to communicate with the Java
[2022-05-15T15:44:49.703Z]             Gateway. Only necessary if the answer is a reference (e.g., object,
[2022-05-15T15:44:49.703Z]             list, map)
[2022-05-15T15:44:49.703Z]         :param target_id: the name of the object from which the answer comes from
[2022-05-15T15:44:49.703Z]             (e.g., *object1* in `object1.hello()`). Optional.
[2022-05-15T15:44:49.703Z]         :param name: the name of the member from which the answer comes from
[2022-05-15T15:44:49.703Z]             (e.g., *hello* in `object1.hello()`). Optional.
[2022-05-15T15:44:49.703Z]         """
[2022-05-15T15:44:49.703Z]         if is_error(answer)[0]:
[2022-05-15T15:44:49.703Z]             if len(answer) > 1:
[2022-05-15T15:44:49.703Z]                 type = answer[1]
[2022-05-15T15:44:49.703Z]                 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
[2022-05-15T15:44:49.703Z]                 if answer[1] == REFERENCE_TYPE:
[2022-05-15T15:44:49.703Z] >                   raise Py4JJavaError(
[2022-05-15T15:44:49.703Z]                         "An error occurred while calling {0}{1}{2}.\n".
[2022-05-15T15:44:49.703Z]                         format(target_id, ".", name), value)
[2022-05-15T15:44:49.703Z] E                   py4j.protocol.Py4JJavaError: An error occurred while calling o2558740.collectToPython.
[2022-05-15T15:44:49.703Z] E                   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 45957.0 failed 1 times, most recent failure: Lost task 5.0 in stage 45957.0 (TID 198266) (10.233.113.151 executor 0): java.lang.AssertionError: End address is too high for setBytes 0x7fcfe5a57628 < 0x7fcfe5a57624
[2022-05-15T15:44:49.703Z] E                   	at ai.rapids.cudf.MemoryBuffer.addressOutOfBoundsCheck(MemoryBuffer.java:138)
[2022-05-15T15:44:49.703Z] E                   	at ai.rapids.cudf.HostMemoryBuffer.setBytes(HostMemoryBuffer.java:313)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.HostMemoryOutputStream.write(HostMemoryStreams.scala:39)
[2022-05-15T15:44:49.703Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$readPartFile$3(GpuParquetScan.scala:1404)
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.closeOnExcept(Arm.scala:87)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.closeOnExcept$(Arm.scala:85)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.closeOnExcept(GpuMultiFileReader.scala:263)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$readPartFile$2(GpuParquetScan.scala:1396)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:263)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$readPartFile$1(GpuParquetScan.scala:1394)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:263)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.readPartFile(GpuParquetScan.scala:1393)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.readPartFile$(GpuParquetScan.scala:1388)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readPartFile(GpuParquetScan.scala:2008)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readToTable(GpuParquetScan.scala:2080)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReader.$anonfun$readBatch$1(GpuParquetScan.scala:2061)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)�[0m
[2022-05-15T15:44:49.703Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:263)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readBatch(GpuParquetScan.scala:2049)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReader.next(GpuParquetScan.scala:2035)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.PartitionReaderWithBytesRead.next(dataSourceUtil.scala:62)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ColumnarPartitionReaderWithPartitionValues.next(ColumnarPartitionReaderWithPartitionValues.scala:36)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.sql.execution.datasources.v2.PartitionedFileReader.next(FilePartitionReaderFactory.scala:54)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:67)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.PartitionIterator.hasNext(dataSourceUtil.scala:29)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.MetricsBatchIterator.hasNext(dataSourceUtil.scala:46)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$2(GpuColumnarToRowExec.scala:239)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.withResource(GpuColumnarToRowExec.scala:187)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:238)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:215)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:255)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.scheduler.Task.run(Task.scala:131)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at java.lang.Thread.run(Thread.java:748)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   �[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   Driver stacktrace:�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2258)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2207)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2206)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2206)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1079)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1079)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at scala.Option.foreach(Option.scala:407)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1079)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2445)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2387)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2376)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2217)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2261)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:390)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.sql.Dataset.$anonfun$collectToPython$1(Dataset.scala:3519)�[0m
[2022-05-15T15:44:49.704Z] �[1m�[31mE                   	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3516)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at sun.reflect.GeneratedMethodAccessor82.invoke(Unknown Source)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at java.lang.reflect.Method.invoke(Method.java:498)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at py4j.Gateway.invoke(Gateway.java:282)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at py4j.commands.CallCommand.execute(CallCommand.java:79)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at py4j.GatewayConnection.run(GatewayConnection.java:238)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at java.lang.Thread.run(Thread.java:748)�[0m
[2022-05-15T15:44:49.705Z] E                   Caused by: java.lang.AssertionError: End address is too high for setBytes 0x7fcfe5a57628 < 0x7fcfe5a57624
[2022-05-15T15:44:49.705Z] E                   	at ai.rapids.cudf.MemoryBuffer.addressOutOfBoundsCheck(MemoryBuffer.java:138)
[2022-05-15T15:44:49.705Z] E                   	at ai.rapids.cudf.HostMemoryBuffer.setBytes(HostMemoryBuffer.java:313)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.HostMemoryOutputStream.write(HostMemoryStreams.scala:39)
[2022-05-15T15:44:49.705Z] E                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$readPartFile$3(GpuParquetScan.scala:1404)
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.closeOnExcept(Arm.scala:87)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.closeOnExcept$(Arm.scala:85)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.closeOnExcept(GpuMultiFileReader.scala:263)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$readPartFile$2(GpuParquetScan.scala:1396)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:263)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$readPartFile$1(GpuParquetScan.scala:1394)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:263)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.readPartFile(GpuParquetScan.scala:1393)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReaderBase.readPartFile$(GpuParquetScan.scala:1388)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readPartFile(GpuParquetScan.scala:2008)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readToTable(GpuParquetScan.scala:2080)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReader.$anonfun$readBatch$1(GpuParquetScan.scala:2061)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.FilePartitionReaderBase.withResource(GpuMultiFileReader.scala:263)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReader.readBatch(GpuParquetScan.scala:2049)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ParquetPartitionReader.next(GpuParquetScan.scala:2035)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.PartitionReaderWithBytesRead.next(dataSourceUtil.scala:62)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ColumnarPartitionReaderWithPartitionValues.next(ColumnarPartitionReaderWithPartitionValues.scala:36)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at org.apache.spark.sql.execution.datasources.v2.PartitionedFileReader.next(FilePartitionReaderFactory.scala:54)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:67)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.PartitionIterator.hasNext(dataSourceUtil.scala:29)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.MetricsBatchIterator.hasNext(dataSourceUtil.scala:46)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$2(GpuColumnarToRowExec.scala:239)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.withResource(GpuColumnarToRowExec.scala:187)�[0m
[2022-05-15T15:44:49.705Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:238)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:215)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:255)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at org.apache.spark.scheduler.Task.run(Task.scala:131)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)�[0m
[2022-05-15T15:44:49.706Z] �[1m�[31mE                   	... 1 more�[0m
[2022-05-15T15:44:49.706Z]

revans2 · 2022-05-17T11:53:51Z

I am skeptical that this has anything to do with the TITAN V. The tests fail in a part of the code that has not even touched the GPU yet. Low memory on the TITAN V would reduce the parallelism of the tests, but the test settings are set up so that it should not matter. I tried to recreate the failures with Spark 3.1.1, which is the version that precommit was using, and with Spark 3.1.2, which is the version in the CI run that is failing. Neither attempt reproduced the problem.

I was able to verify that the test failure is reproducible in CI, so now I am going to slowly work towards reproducing it myself. Perhaps it is Ubuntu 16 instead of 20? Or it could be running all of the tests in the same application? Not sure.

revans2 · 2022-05-17T12:09:09Z

The only other idea that I have right now is that it might be the order in which files and directories are returned. It could be that they are being returned in different orders and that is causing schema discovery to come up with something different? Not really sure because it should be merging the schemas to produce the read schema.
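
To illustrate why listing order should not change which fields end up in the merged read schema, here is a small sketch that mimics the column naming from the failing test (`_c<i>` in the first directory, alternating `_c<i>` and `_b<i>` in the second). It only merges field names, ignores types, and is not Spark's actual schema-merging code. The helper names and the column count of 4 are illustrative assumptions.

```python
def column_names(prefix_for_index, count):
    """Build column names the way the test does: prefix + index."""
    return [prefix_for_index(i) + str(i) for i in range(count)]


def merge_field_names(*file_schemas):
    """Union of field names in first-seen order (simplified stand-in for schema merging)."""
    merged = []
    for schema in file_schemas:
        for name in schema:
            if name not in merged:
                merged.append(name)
    return merged


count = 4  # illustrative; the test uses len(parquet_gens)
first = column_names(lambda i: "_c", count)                           # key=0 files
second = column_names(lambda i: "_c" if i % 2 == 0 else "_b", count)  # key=1 files

merged_a = merge_field_names(first, second)
merged_b = merge_field_names(second, first)

# The set of merged fields does not depend on the order the files are listed in.
print(set(merged_a) == set(merged_b))

# Each individual file is missing some fields of the merged read schema.
print([n for n in merged_a if n not in first])
print([n for n in merged_a if n not in second])
```

The field order can differ between the two listings, but the set of fields is the same. What matters for this issue is the last two lines: neither file contains every column of the merged read schema.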

…5500) The native footer reader for Parquet fetches data fields based entirely on the read schema, which may lead to an overflow if merge schema is enabled. When merge schema is enabled, the file schema of an individual file partition may not contain the complete (read) schema. In that situation, the native footer reader produces incorrect footers. Fall back to the CPU for Parquet reading if merge schema and the native footer reader are both enabled, to avoid buffer overflows like #5493.
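
The fallback condition described above can be sketched as a simple predicate. This is a hedged illustration, not the plugin's actual implementation: the function name `should_fall_back_to_cpu` is hypothetical, and the assumption that a per-read `mergeSchema` option takes precedence over the `spark.sql.parquet.mergeSchema` session conf reflects how the two tests in this issue enable schema merging (one through the read option, one through the conf). The footer type key is the one shown in `reader_confs` in the log.

```python
FOOTER_TYPE_KEY = "spark.rapids.sql.format.parquet.reader.footer.type"
MERGE_SCHEMA_CONF_KEY = "spark.sql.parquet.mergeSchema"


def _is_true(value) -> bool:
    """Interpret a conf/option value such as 'true' or True as a boolean."""
    return str(value).strip().lower() == "true"


def should_fall_back_to_cpu(conf: dict, read_options: dict) -> bool:
    """Hypothetical predicate: fall back when merge schema and the native footer reader are both on."""
    native_footer = str(conf.get(FOOTER_TYPE_KEY, "")).upper() == "NATIVE"
    if "mergeSchema" in read_options:
        # Assumption: the per-read option wins over the session conf.
        merge_schema = _is_true(read_options["mergeSchema"])
    else:
        merge_schema = _is_true(conf.get(MERGE_SCHEMA_CONF_KEY, "false"))
    return native_footer and merge_schema


# Settings matching the failing parametrization in the log above.
failing_conf = {FOOTER_TYPE_KEY: "NATIVE",
                "spark.rapids.sql.format.parquet.reader.type": "PERFILE"}
print(should_fall_back_to_cpu(failing_conf, {"mergeSchema": "true"}))
```

With the failing test's settings the predicate selects the CPU path, while a read without merge schema, or with a non-native footer reader, would stay on the GPU.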

pxLi added the labels bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) May 16, 2022

sperlingxx mentioned this issue May 16, 2022

Fallback parquet reading with merged schema and native footer reader #5500

Merged

revans2 self-assigned this May 16, 2022

sameerz added the label P1 (Nice to have for release) and removed the label ? - Needs Triage (Need team to review and classify) May 17, 2022

revans2 mentioned this issue Jul 11, 2022

Add in better native parquet footer implementation and remove the old one NVIDIA/spark-rapids-jni#365

Merged

revans2 closed this as completed in NVIDIA/spark-rapids-jni#365 Jul 18, 2022
