While testing the coalesce code for compressed batches using LZ4, I got the following exception. I first thought it was caused by a branch I am working on, but it also happens on 23.04 without my change:
ai.rapids.cudf.nvcomp.NvcompException: nvcomp decompress output size mismatch
at ai.rapids.cudf.nvcomp.NvcompJni.batchedLZ4DecompressAsync(Native Method)
at ai.rapids.cudf.nvcomp.BatchedLZ4Decompressor.decompressAsync(BatchedLZ4Decompressor.java:79)
at com.nvidia.spark.rapids.BatchedNvcompLZ4Decompressor.decompressAsync(NvcompLZ4CompressionCodec.scala:94)
at com.nvidia.spark.rapids.BatchedBufferDecompressor.$anonfun$decompressBatch$1(TableCompressionCodec.scala:323)
at com.nvidia.spark.rapids.BatchedBufferDecompressor.$anonfun$decompressBatch$1$adapted(TableCompressionCodec.scala:321)
at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
at com.nvidia.spark.rapids.BatchedBufferDecompressor.withResource(TableCompressionCodec.scala:258)
at com.nvidia.spark.rapids.BatchedBufferDecompressor.decompressBatch(TableCompressionCodec.scala:321)
at com.nvidia.spark.rapids.BatchedBufferDecompressor.finishAsync(TableCompressionCodec.scala:305)
at com.nvidia.spark.rapids.GpuCompressionAwareCoalesceIterator.$anonfun$popAll$8(GpuCoalesceBatches.scala:639)
at com.nvidia.spark.rapids.GpuCompressionAwareCoalesceIterator.$anonfun$popAll$8$adapted(GpuCoalesceBatches.scala:631)
at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.withResource(GpuCoalesceBatches.scala:237)
at com.nvidia.spark.rapids.GpuCompressionAwareCoalesceIterator.$anonfun$popAll$4(GpuCoalesceBatches.scala:631)
at com.nvidia.spark.rapids.Arm.closeOnExcept(Arm.scala:109)
at com.nvidia.spark.rapids.Arm.closeOnExcept$(Arm.scala:107)
at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.closeOnExcept(GpuCoalesceBatches.scala:237)
at com.nvidia.spark.rapids.GpuCompressionAwareCoalesceIterator.popAll(GpuCoalesceBatches.scala:615)
at com.nvidia.spark.rapids.GpuCoalesceIterator.concatAllAndPutOnGPU(GpuCoalesceBatches.scala:543)
at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.$anonfun$next$6(GpuCoalesceBatches.scala:478)
I also see this:
23/03/06 22:49:51 WARN TaskSetManager: Lost task 13.0 in stage 1.0 (TID 18) (127.0.0.1 executor 0): ai.rapids.cudf.CudaException: CUDA error encountered at: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-363-cuda11/thirdparty/cudf/java/src/main/native/src/CudaJni.cpp:377: 1 cudaErrorInvalidValue invalid argument
at ai.rapids.cudf.Cuda.asyncMemcpyOnStream(Native Method)
at ai.rapids.cudf.Cuda.asyncMemcpy(Cuda.java:529)
at ai.rapids.cudf.Cuda.multiBufferCopyAsync(Cuda.java:582)
at ai.rapids.cudf.nvcomp.BatchedLZ4Decompressor.fetchMetadata(BatchedLZ4Decompressor.java:192)
at ai.rapids.cudf.nvcomp.BatchedLZ4Decompressor.buildAddrsSizesBuffer(BatchedLZ4Decompressor.java:121)
at ai.rapids.cudf.nvcomp.BatchedLZ4Decompressor.decompressAsync(BatchedLZ4Decompressor.java:67)
at com.nvidia.spark.rapids.BatchedNvcompLZ4Decompressor.decompressAsync(NvcompLZ4CompressionCodec.scala:94)
at com.nvidia.spark.rapids.BatchedBufferDecompressor.$anonfun$decompressBatch$1(TableCompressionCodec.scala:323)
at com.nvidia.spark.rapids.BatchedBufferDecompressor.$anonfun$decompressBatch$1$adapted(TableCompressionCodec.scala:321)
at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
at com.nvidia.spark.rapids.BatchedBufferDecompressor.withResource(TableCompressionCodec.scala:258)
at com.nvidia.spark.rapids.BatchedBufferDecompressor.decompressBatch(TableCompressionCodec.scala:321)
at com.nvidia.spark.rapids.BatchedBufferDecompressor.finishAsync(TableCompressionCodec.scala:305)
at com.nvidia.spark.rapids.GpuCompressionAwareCoalesceIterator.$anonfun$popAll$8(GpuCoalesceBatches.scala:639)
at com.nvidia.spark.rapids.GpuCompressionAwareCoalesceIterator.$anonfun$popAll$8$adapted(GpuCoalesceBatches.scala:631)
at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
I can confirm that if I compress and decompress in the same stack, I do not see the issue. I have tried this with both LZ4 and the copy codec. The problem I am seeing occurs after the buffers have been cached in the spill framework, so as far as I can tell this is a spill framework issue with compressed vectors.
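For illustration, here is a minimal sketch of the kind of same-stack round-trip check described above. It is not the plugin's test and it does not use nvcomp: `java.util.zip` stands in for the codec, and `RoundTripCheck`, `CompressedBuffer`, `compress`, and `decompress` are hypothetical names. It only shows the invariant the first exception appears to complain about, namely that the uncompressed size recorded alongside a compressed buffer must match what the decompressor actually produces.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical stand-in for a codec round trip; java.util.zip replaces nvcomp LZ4 here.
final class RoundTripCheck {
    // Compressed bytes plus the uncompressed size recorded at compression time.
    static final class CompressedBuffer {
        final byte[] data;
        final int uncompressedSize;

        CompressedBuffer(byte[] data, int uncompressedSize) {
            this.data = data;
            this.uncompressedSize = uncompressedSize;
        }
    }

    static CompressedBuffer compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return new CompressedBuffer(out.toByteArray(), input.length);
    }

    // Decompresses into a buffer sized from the recorded metadata and fails
    // if the stream yields fewer or more bytes than that metadata claims.
    static byte[] decompress(CompressedBuffer buf) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(buf.data);
            byte[] output = new byte[buf.uncompressedSize];
            int total = 0;
            while (total < output.length && !inflater.finished()) {
                int n = inflater.inflate(output, total, output.length - total);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unusable stream
                }
                total += n;
            }
            // Probe with one spare byte: any extra output means the recorded size was too small.
            byte[] scratch = new byte[1];
            int extra = inflater.finished() ? 0 : inflater.inflate(scratch);
            if (total != buf.uncompressedSize || extra != 0 || !inflater.finished()) {
                throw new IllegalStateException("decompress output size mismatch");
            }
            return output;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }
}
```

A check like this passes when the buffer and its metadata travel together, which matches the same-stack result. If metadata or buffer contents were altered between compression and decompression (for example while cached), this is the place where a size mismatch would surface.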