[BUG] cudf should throw exception when reading parquet file with unsupported codec #10602

wbo4958 · 2022-04-06T09:21:01Z

Describe the bug
cuDF returns wrong values when reading a Parquet file compressed with the zstd codec in the latest 22.04 nightly build. The file is test.zstd.txt.

In other runs, cuDF throws the exception "rmm-src/include/rmm/cuda_stream_view.hpp:81: cudaErrorIllegalAddress an illegal memory access was encountered", which I did not reproduce.

Meta info of test.zstd.txt

creator:     parquet-mr version 1.12.1 (build 2a5c06c58fa987f85aa22170be14d927d5ff6e7d) 
extra:       org.apache.spark.version = 3.2.0 
extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"a","type":"integer","nullable":false,"metadata":{}}]} 

file schema: spark_schema 
--------------------------------------------------------------------------------
a:           REQUIRED INT32 R:0 D:0

row group 1: RC:3 TS:35 OFFSET:4 
--------------------------------------------------------------------------------
a:            INT32 ZSTD DO:0 FPO:4 SZ:44/35/0.80 VC:3 ENC:BIT_PACKED,PLAIN ST:[min: 1, max: 3, num_nulls: 0]

cudf read

In [1]: import cudf

In [2]: df = cudf.read_parquet("parquet-zstd")

In [3]: df.a
Out[3]: 
0    768
1    768
2      0
Name: a, dtype: int32

pandas read

In [14]: import pandas as pd

In [15]: df = pd.read_parquet("parquet-zstd")

In [16]: df.a
Out[16]: 
0    1
1    2
2    3
Name: a, dtype: int32

Expected behavior

Although cuDF does not support zstd compression, it should throw an exception with a message such as "unsupported zstd codec" instead of returning wrong values or failing with cudaErrorIllegalAddress ...

Closes #10602

This PR adds a compression type check for each chunk in the input file. The reader throws if an unsupported compression is used.

Authors:
- Vukasin Milovanovic (https://github.com/vuule)

Approvers:
- https://github.com/brandon-b-miller
- Yunsong Wang (https://github.com/PointKernel)
- Bradley Dice (https://github.com/bdice)

URL: #10610
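The per-chunk check described here can be approximated on the caller side. The following is only a rough Python sketch, not the code from the PR: the function names and the `SUPPORTED_CODECS` allow-list are hypothetical, and the allow-list is illustrative rather than the set of codecs the cuDF reader actually accepts. It assumes pyarrow is installed for reading the Parquet footer.

```python
from typing import Iterable, List, Tuple

# Hypothetical allow-list for illustration only; NOT the set the cuDF reader accepts.
SUPPORTED_CODECS = frozenset({"UNCOMPRESSED", "SNAPPY", "GZIP"})


def check_chunk_codecs(
    chunks: Iterable[Tuple[int, str, str]],
    supported=SUPPORTED_CODECS,
) -> None:
    """Raise ValueError on the first (row_group, column, codec) whose codec is not in `supported`."""
    for row_group, column, codec in chunks:
        if codec.upper() not in supported:
            raise ValueError(
                f"unsupported {codec} codec in row group {row_group}, column '{column}'"
            )


def list_chunk_codecs(path: str) -> List[Tuple[int, str, str]]:
    """Collect the codec of every column chunk from the Parquet footer using pyarrow."""
    # Imported lazily so that check_chunk_codecs itself has no third-party dependency.
    import pyarrow.parquet as pq

    meta = pq.ParquetFile(path).metadata
    chunks = []
    for rg in range(meta.num_row_groups):
        for col in range(meta.num_columns):
            chunk = meta.row_group(rg).column(col)
            # `compression` is a string such as "ZSTD" or "SNAPPY".
            chunks.append((rg, chunk.path_in_schema, chunk.compression))
    return chunks
```

With these helpers, calling `check_chunk_codecs(list_chunk_codecs("parquet-zstd"))` before `cudf.read_parquet` would raise for a ZSTD-compressed file like the one in this report, under the assumed allow-list, instead of letting wrong values through.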

wbo4958 added the bug (Something isn't working), Needs Triage (Need team to review and classify), and cuIO (cuIO issue) labels Apr 6, 2022

jlowe added the Spark (Functionality that helps Spark RAPIDS) label Apr 6, 2022

vuule mentioned this issue Apr 6, 2022

Verify compression type in Parquet reader #10610

Merged

rapids-bot bot closed this as completed in #10610 Apr 7, 2022

bdice removed the Needs Triage (Need team to review and classify) label Mar 4, 2024
