[BUG] test_iceberg_parquet_read_round_trip FAILED "TypeError: object of type 'NoneType' has no len()" #6718

NvTimLiu · 2022-10-07T03:49:31Z

Describe the bug

iceberg_test.py::test_iceberg_parquet_read_round_trip[COALESCING-[Byte, Short, Integer, ...

TypeError: object of type 'NoneType' has no len()

    spark_tmp_table_factory = <conftest.TmpTableFactory object at 0x7f93403b2670>
    data_gens = [Byte, Short, Integer, Long, Float, Double, ...]
    reader_type = 'COALESCING'

        @iceberg
        @ignore_order(local=True) # Iceberg plans with a thread pool and is not deterministic in file ordering
        @pytest.mark.parametrize("data_gens", iceberg_gens_list, ids=idfn)
        @pytest.mark.parametrize('reader_type', rapids_reader_types)
        def test_iceberg_parquet_read_round_trip(spark_tmp_table_factory, data_gens, reader_type):
            gen_list = [('_c' + str(i), gen) for i, gen in enumerate(data_gens)]
            table = spark_tmp_table_factory.get()
            tmpview = spark_tmp_table_factory.get()
            def setup_iceberg_table(spark):
                df = gen_df(spark, gen_list)
                df.createOrReplaceTempView(tmpview)
                spark.sql("CREATE TABLE {} USING ICEBERG AS SELECT * FROM {}".format(table, tmpview))
            with_cpu_session(setup_iceberg_table)
    >       assert_gpu_and_cpu_are_equal_collect(
                lambda spark : spark.sql("SELECT * FROM {}".format(table)),
                conf={'spark.rapids.sql.format.parquet.reader.type': reader_type})

    ../../src/main/python/iceberg_test.py:88:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    ../../src/main/python/asserts.py:548: in assert_gpu_and_cpu_are_equal_collect
        _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
    ../../src/main/python/asserts.py:479: in _assert_gpu_and_cpu_are_equal
        assert_equal(from_cpu, from_gpu)
    ../../src/main/python/asserts.py:106: in assert_equal
        _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
    ../../src/main/python/asserts.py:42: in _assert_equal
        _assert_equal(cpu[index], gpu[index], float_check, path + [index])
    ../../src/main/python/asserts.py:35: in _assert_equal
        _assert_equal(cpu[field], gpu[field], float_check, path + [field])
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    cpu = Row(child0=[110, -70, 109, -17, 97, 0, -66], child1=108, child2=-4.712884395247157e+25, child3=Decimal('3306845829.53'))
    gpu = None
    float_check = <function get_float_check.<locals>.<lambda> at 0x7f9336140550>
    path = [0, '_c19']

        def _assert_equal(cpu, gpu, float_check, path):
            t = type(cpu)
            if (t is Row):
    >           assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
    E           TypeError: object of type 'NoneType' has no len()

    ../../src/main/python/asserts.py:31: TypeError

tgravescs · 2022-10-07T13:46:02Z

I can reproduce this locally on 22.12 but wasn't able to on 22.10.

tgravescs · 2022-10-07T13:49:29Z

The GPU output seems to be missing a column entry. On the first row of output, the GPU has `_c19` as `None`, while the CPU has it as: `_c19=Row(child0=[110, -70, 109, -17, 97, 0, -66], child1=108, child2=-4.712884395247157e+25, child3=Decimal('3306845829.53'))`.

tgravescs · 2022-10-07T14:17:19Z

cuDF from 10/4 works, so something must have changed there.

tgravescs · 2022-10-07T14:37:16Z

It fails with the spark-rapids-jni jar from 10/6, so it is likely something that went in on the 4th or 5th.

tgravescs · 2022-10-10T15:48:39Z

Note that this is happening when the data type is:
`StructGen([['child0', ArrayGen(float_gen)]])`

It only happens when you select enough data for coalescing to kick in. It also only happens with Iceberg; reading the raw Parquet files with the coalescing reader works fine.
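Roughly speaking, the COALESCING reader type (selected in the test through `spark.rapids.sql.format.parquet.reader.type`) combines data from several Parquet files into one buffer before decoding it on the GPU, so a small read may never take the multi-file path. The greedy packing below is a hypothetical simplification meant only to show that threshold effect. `group_for_coalescing` and `target_bytes` are illustrative names, not the spark-rapids algorithm or one of its configuration keys, and sizes are in arbitrary units.

```python
def group_for_coalescing(file_sizes, target_bytes):
    """Greedily pack consecutive file sizes into batches of at most target_bytes.

    Hypothetical simplification: a file larger than target_bytes still gets a
    batch of its own, and schema compatibility between files is ignored.
    """
    batches = []
    current, current_bytes = [], 0
    for size in file_sizes:
        # Start a new batch when adding this file would overshoot the target.
        if current and current_bytes + size > target_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# A small selection touches one file, so there is nothing to combine.
print(group_for_coalescing([40], target_bytes=128))  # -> [[40]]
# A larger selection puts several files into the same batch.
print(group_for_coalescing([40, 40, 40, 40], target_bytes=128))  # -> [[40, 40, 40], [40]]
```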

With Iceberg, a lot of these columns come back as null instead of their actual values.
Selecting the exact same Iceberg table with 22.10 works fine.

tgravescs · 2022-10-10T17:45:37Z

I finally got a Parquet file that reproduces this and sent it to the cuDF folks.

tgravescs · 2022-10-11T13:56:59Z

Going to xfail the test temporarily.

Commit message of the cuDF fix (rapidsai/cudf#11910):

Fixes NVIDIA/spark-rapids#6718

There was a bug introduced recently in #11752 where an insufficient check for whether an input column contained repetition information could cause incorrect results for column hierarchies with structs at the root.

Authors:
- https://github.com/nvdbaranec

Approvers:
- Jim Brennan (https://github.com/jbrennan333)
- Nghia Truong (https://github.com/ttnghia)
- Mike Wilson (https://github.com/hyperbolic2346)

URL: #11910
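The commit message says only that the repetition check was insufficient for hierarchies with a struct at the root. It does not say how the check was written. The sketch below is a hypothetical illustration, not the cuDF code, of one way such a check can fall short. A check that looks only at the top-level type sees `struct` for `StructGen([['child0', ArrayGen(float_gen)]])` and concludes there is no repetition. A check that walks the whole type tree finds the nested list. It relies on the standard Parquet rule that each list level on the path to a leaf adds one repetition level and structs add none. `TypeNode` and the check functions are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class TypeNode:
    """A node in a hypothetical logical column-type tree, e.g. struct -> list -> float."""
    kind: str  # "struct", "list", or a leaf type such as "float"
    children: List["TypeNode"] = field(default_factory=list)


def max_repetition_levels(node: TypeNode, depth: int = 0) -> Iterator[Tuple[str, int]]:
    """Yield (leaf_kind, max_repetition_level) for each leaf under node.

    Each list on the path to a leaf adds one repetition level; structs add none.
    """
    if node.kind == "list":
        depth += 1
    if not node.children:
        yield node.kind, depth
    for child in node.children:
        yield from max_repetition_levels(child, depth)


def root_only_check(node: TypeNode) -> bool:
    """Hypothetical insufficient check: looks only at the top-level type."""
    return node.kind == "list"


def full_hierarchy_check(node: TypeNode) -> bool:
    """Looks for a list anywhere below the column, including under a struct root."""
    return any(level > 0 for _, level in max_repetition_levels(node))


list_of_float = TypeNode("list", [TypeNode("float")])
# Same shape as StructGen([['child0', ArrayGen(float_gen)]]): struct<child0: list<float>>
struct_of_list = TypeNode("struct", [TypeNode("list", [TypeNode("float")])])

for name, col in [("list<float>", list_of_float), ("struct<list<float>>", struct_of_list)]:
    print(name, root_only_check(col), full_hierarchy_check(col), list(max_repetition_levels(col)))
```

Both columns have a leaf with a maximum repetition level of 1. Only the root-only check disagrees between them, and it does so exactly when a struct sits at the root.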

NvTimLiu added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels Oct 7, 2022

tgravescs added the P0 (Must have for release) label Oct 7, 2022

tgravescs mentioned this issue Oct 7, 2022

[BUG] null pointer exception selecting single column from iceberg table #6723

Closed

tgravescs self-assigned this Oct 10, 2022

tgravescs removed the ? - Needs Triage (Need team to review and classify) label Oct 10, 2022

tgravescs assigned nvdbaranec Oct 11, 2022

tgravescs mentioned this issue Oct 11, 2022

Temporarily xfail failing test_iceberg_parquet_read_round_trip test #6756

Merged

nvdbaranec mentioned this issue Oct 12, 2022

Fix an issue reading struct-of-list types in Parquet. rapidsai/cudf#11910

Merged

tgravescs mentioned this issue Oct 13, 2022

Revert "Temporarily xfail failing test_iceberg_parquet_read_round_trip test" #6783

Merged

tgravescs closed this as completed in #6783 Oct 13, 2022
