[TEST] Compatibility tests for data formats #8666
Labels
reliability
Features to improve reliability or bugs that severely impact the reliability of the plugin
task
Work required that improves the product but is not user facing
test
Only impacts tests
Here is a list of tests to confirm data format compatibility with Apache Spark, in the Spark RAPIDS plugin. This list is a work in progress:
Orc:
SPARK-5309: ORC STRING column uses dictionary compression. OrcQuerySuite.scala#359. (Note: Not sure how this test is verifying dictionary_v2). #8797
ORC reads at scale with all null values: Like OrcQuerySuite.scala#L173, but with large number of rows. #8731
SPARK-16610: Honour orc.compress on writes, when compress is unset: OrcQuerySuite.scala#L189. #8781
compress should be honoured when set (ZLIB, Snappy, None). Refer to OrcQuerySuite.scala#L224. #8782
SPARK-9170: Upper case ORC column names are not implicitly stored in lowercase. Refer to OrcQuerySuite.scala:371 #8793
SQLConf.IGNORE_CORRUPT_FILES should be honored #8840
Test predicate pushdown (PPD) with timestamps, decimals, booleans, etc. Refer to OrcQuerySuite.scala#L464. #8823
Parquet:
Add test for the timestamp error case described in SPARK-10177 #8693
Add test for Parquet schema interpretation problem described in SPARK-16344 #8694
Add an xfail test for Parquet reads for LIST<STRUCT<int, string>> #8708
Add tests for column names with dots #8704: Support for Parquet columns with names containing dot('.')
Selecting complex fields: (Sampling from SchemaPruningSuite.)
Add test to verify Parquet predicate pushdown for fields having dots in the names #9094
[Test] Test predicate pushdown (PPD) with timestamps, decimals, booleans, etc for Parquet file #9127
P3: Parquet reads with CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING=true. #9074 (No reference Spark tests. Not sure how this is tested.)
[FEA] Support parquet.block.size to control row group size for parquet #9126
Statistics tests for Parquet files written by GPU #8762
Add test to verify the fallback for UDT for parquet, refer to link1 and link2 #9137
Test compatibility between pyarrow and GPU
Test compatibility between fastparquet and GPU #9550
P1: Written/read by Spark and read/written by GPU. Generate files in parquet testing cases
P1: Written/read by Hive and read/written by GPU. Generate files in parquet testing cases
Verify compression codec in Parquet file #9151
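
For the ORC compression items above (SPARK-16610 and the `compress` option), the expectation a test would assert can be sketched as a small precedence rule. This is a minimal sketch, not plugin or Spark code: the helper name `effective_orc_codec` is hypothetical, the `snappy` default is an assumption (the real session default depends on the Spark version), and the precedence order (`compression`, then `orc.compress`, then the session default) is my understanding of Spark's behaviour and should be confirmed against the referenced OrcQuerySuite tests.

```python
from typing import Mapping


def effective_orc_codec(write_options: Mapping[str, str],
                        session_default: str = "snappy") -> str:
    """Return the codec an ORC write is expected to use (hypothetical helper).

    Assumed precedence, highest first:
      1. the `compression` write option
      2. the `orc.compress` write option
      3. the session default (assumed here to be "snappy")
    """
    for key in ("compression", "orc.compress"):
        value = write_options.get(key)
        if value is not None:
            return value.lower()
    return session_default.lower()


# Expectations a compatibility test might check on both CPU and GPU writes.
expected = {
    "orc.compress only": effective_orc_codec({"orc.compress": "ZLIB"}),
    "compression wins": effective_orc_codec(
        {"compression": "NONE", "orc.compress": "ZLIB"}),
    "neither set": effective_orc_codec({}),
}

for name, codec in expected.items():
    print(f"{name}: {codec}")
```

In an actual test, the codec recorded in the written ORC file would be compared against this expectation for files written by the CPU and by the GPU.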