
[BUG] CPU mismatch GPU result in test_hash_groupby_collect_with_single_distinct intermittently #7104

Closed
pxLi opened this issue Nov 18, 2022 · 4 comments
Labels
bug, duplicate, test


@pxLi
Collaborator

pxLi commented Nov 18, 2022

Describe the bug
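test_hash_groupby_collect_with_single_distinct[[('a', RepeatSeq(Long)), ('b', RepeatSeq(Boolean)), ('c', LongRange(not_null))]][IGNORE_ORDER({'local': True})] - AssertionError: GPU and CPU boolean values are different at [18, 'sort_array(collect_list(b), true)', 3]

The only differing row is a=9223372036854775807 (index 18): its sorted collect_list(b) contains three False values on the CPU but four on the GPU, although both sides report count(c)=10.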

[2022-11-18T03:41:02.798Z] CPU OUTPUT: [Row(a=-7540734677356764604, sort_array(collect_list(b), true)=[False, False, True, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=-5831592707909023540, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=-5133656973475552689, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=-4426181692283497353, sort_array(collect_list(b), true)=[False, True, True, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=-3917032101531217289, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=-3502159106106506455, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=-2697073954890740236, sort_array(collect_list(b), true)=[False, True, True, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=-2123199122092230623, sort_array(collect_list(b), true)=[False, False, True, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=-1, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=207981845540287738, sort_array(collect_list(b), true)=[False, False, True, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=393905103838704542, sort_array(collect_list(b), true)=[False, False, False, False, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=875130347651831881, sort_array(collect_list(b), true)=[False, False, False, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=4751953708995107450, sort_array(collect_list(b), true)=[False, False, False, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=6084712057446794809, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=7198729688045931692, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=7528354001793048440, sort_array(collect_list(b), true)=[False, False, False, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=7618709293599214015, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=7984374766242566542, sort_array(collect_list(b), true)=[False, False, True, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=9223372036854775807, sort_array(collect_list(b), true)=[False, False, False, True, True, True, True, True, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=10, count(c)=10)]

[2022-11-18T03:41:02.799Z] GPU OUTPUT: [Row(a=-7540734677356764604, sort_array(collect_list(b), true)=[False, False, True, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=-5831592707909023540, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=-5133656973475552689, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=-4426181692283497353, sort_array(collect_list(b), true)=[False, True, True, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=-3917032101531217289, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=-3502159106106506455, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=-2697073954890740236, sort_array(collect_list(b), true)=[False, True, True, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=-2123199122092230623, sort_array(collect_list(b), true)=[False, False, True, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=-1, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=207981845540287738, sort_array(collect_list(b), true)=[False, False, True, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=393905103838704542, sort_array(collect_list(b), true)=[False, False, False, False, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=875130347651831881, sort_array(collect_list(b), true)=[False, False, False, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=4751953708995107450, sort_array(collect_list(b), true)=[False, False, False, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=6084712057446794809, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=7198729688045931692, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=7528354001793048440, sort_array(collect_list(b), true)=[False, False, False, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=7618709293599214015, sort_array(collect_list(b), true)=[True, True, True, True, True], sort_array(collect_set(b), true)=[True], count(c)=5, count(c)=5),
Row(a=7984374766242566542, sort_array(collect_list(b), true)=[False, False, True, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=5, count(c)=5),
Row(a=9223372036854775807, sort_array(collect_list(b), true)=[False, False, False, False, True, True, True, True, True, True], sort_array(collect_set(b), true)=[False, True], count(c)=10, count(c)=10)]
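To see how the reported path [18, 'sort_array(collect_list(b), true)', 3] falls out of these two outputs, here is a minimal sketch of a recursive comparison in the spirit of the `_assert_equal` helper shown in the traceback. It is not the project's implementation: rows are modeled as plain dicts instead of `pyspark.sql.Row`, the 18 rows that matched are replaced by hypothetical placeholders, and only the values of row 18 (a=9223372036854775807) are copied from the output.

```python
from typing import Any, List, Optional


def first_mismatch(cpu: Any, gpu: Any, path: Optional[List[Any]] = None) -> Optional[List[Any]]:
    """Return the path to the first differing value, or None if both sides are equal.

    Rows are modeled as dicts (field name -> value), an assumption for this sketch.
    """
    path = [] if path is None else path
    if isinstance(cpu, dict) and isinstance(gpu, dict):
        if list(cpu.keys()) != list(gpu.keys()):
            return path  # the rows have different fields
        for field in cpu:
            found = first_mismatch(cpu[field], gpu[field], path + [field])
            if found is not None:
                return found
        return None
    if isinstance(cpu, list) and isinstance(gpu, list):
        if len(cpu) != len(gpu):
            return path  # the lists have different lengths
        for index, (c, g) in enumerate(zip(cpu, gpu)):
            found = first_mismatch(c, g, path + [index])
            if found is not None:
                return found
        return None
    # Leaf value (bool, int, ...): compare directly.
    return None if cpu == gpu else path


COL = "sort_array(collect_list(b), true)"
# Row 18 values copied from the CPU/GPU OUTPUT in this issue.
cpu_last = {"a": 9223372036854775807, COL: [False] * 3 + [True] * 7}
gpu_last = {"a": 9223372036854775807, COL: [False] * 4 + [True] * 6}
# Hypothetical stand-ins for rows 0-17, which matched on both sides.
placeholders = [{"a": i} for i in range(18)]

print(first_mismatch(placeholders + [cpu_last], placeholders + [gpu_last]))
# → [18, 'sort_array(collect_list(b), true)', 3]
```

Index 3 is the first position where the CPU list already holds True while the GPU list still holds False, which matches `cpu = True, gpu = False` in the traceback.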

[2022-11-18T03:41:02.797Z] _ test_hash_groupby_collect_with_single_distinct[[('a', RepeatSeq(Long)), ('b', RepeatSeq(Boolean)), ('c', LongRange(not_null))]] _
[2022-11-18T03:41:02.797Z] [gw3] linux -- Python 3.8.15 /usr/bin/python
[2022-11-18T03:41:02.797Z] 
[2022-11-18T03:41:02.797Z] data_gen = [('a', RepeatSeq(Long)), ('b', RepeatSeq(Boolean)), ('c', LongRange(not_null))]
[2022-11-18T03:41:02.797Z] 
[2022-11-18T03:41:02.797Z]     @ignore_order(local=True)
[2022-11-18T03:41:02.797Z]     @pytest.mark.parametrize('data_gen', _full_gen_data_for_collect_op, ids=idfn)
[2022-11-18T03:41:02.797Z]     def test_hash_groupby_collect_with_single_distinct(data_gen):
[2022-11-18T03:41:02.797Z]         # test collect_ops with other distinct aggregations
[2022-11-18T03:41:02.797Z] >       assert_gpu_and_cpu_are_equal_collect(
[2022-11-18T03:41:02.797Z]             lambda spark: gen_df(spark, data_gen, length=100)
[2022-11-18T03:41:02.797Z]                 .groupby('a')
[2022-11-18T03:41:02.797Z]                 .agg(f.sort_array(f.collect_list('b')),
[2022-11-18T03:41:02.797Z]                      f.sort_array(f.collect_set('b')),
[2022-11-18T03:41:02.797Z]                      f.countDistinct('c'),
[2022-11-18T03:41:02.797Z]                      f.count('c')))
[2022-11-18T03:41:02.797Z] 
[2022-11-18T03:41:02.797Z] ../../src/main/python/hash_aggregate_test.py:736: 
[2022-11-18T03:41:02.797Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-11-18T03:41:02.797Z] ../../src/main/python/asserts.py:548: in assert_gpu_and_cpu_are_equal_collect
[2022-11-18T03:41:02.797Z]     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
[2022-11-18T03:41:02.797Z] ../../src/main/python/asserts.py:479: in _assert_gpu_and_cpu_are_equal
[2022-11-18T03:41:02.797Z]     assert_equal(from_cpu, from_gpu)
[2022-11-18T03:41:02.797Z] ../../src/main/python/asserts.py:106: in assert_equal
[2022-11-18T03:41:02.797Z]     _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
[2022-11-18T03:41:02.797Z] ../../src/main/python/asserts.py:42: in _assert_equal
[2022-11-18T03:41:02.797Z]     _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2022-11-18T03:41:02.797Z] ../../src/main/python/asserts.py:35: in _assert_equal
[2022-11-18T03:41:02.797Z]     _assert_equal(cpu[field], gpu[field], float_check, path + [field])
[2022-11-18T03:41:02.797Z] ../../src/main/python/asserts.py:42: in _assert_equal
[2022-11-18T03:41:02.797Z]     _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2022-11-18T03:41:02.797Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-11-18T03:41:02.797Z] 
[2022-11-18T03:41:02.797Z] cpu = True, gpu = False
[2022-11-18T03:41:02.797Z] float_check = <function get_float_check.<locals>.<lambda> at 0x7fcbd6f26ee0>
[2022-11-18T03:41:02.797Z] path = [18, 'sort_array(collect_list(b), true)', 3]
[2022-11-18T03:41:02.797Z] 
[2022-11-18T03:41:02.797Z]     def _assert_equal(cpu, gpu, float_check, path):
[2022-11-18T03:41:02.797Z]         t = type(cpu)
[2022-11-18T03:41:02.797Z]         if (t is Row):
[2022-11-18T03:41:02.797Z]             assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2022-11-18T03:41:02.797Z]             if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
[2022-11-18T03:41:02.797Z]                 assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)
[2022-11-18T03:41:02.797Z]                 for field in cpu.__fields__:
[2022-11-18T03:41:02.797Z]                     _assert_equal(cpu[field], gpu[field], float_check, path + [field])
[2022-11-18T03:41:02.797Z]             else:
[2022-11-18T03:41:02.797Z]                 for index in range(len(cpu)):
[2022-11-18T03:41:02.797Z]                     _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2022-11-18T03:41:02.797Z]         elif (t is list):
[2022-11-18T03:41:02.797Z]             assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2022-11-18T03:41:02.798Z]             for index in range(len(cpu)):
[2022-11-18T03:41:02.798Z]                 _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2022-11-18T03:41:02.798Z]         elif (t is tuple):
[2022-11-18T03:41:02.798Z]             assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2022-11-18T03:41:02.798Z]             for index in range(len(cpu)):
[2022-11-18T03:41:02.798Z]                 _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2022-11-18T03:41:02.798Z]         elif (t is pytypes.GeneratorType):
[2022-11-18T03:41:02.798Z]             index = 0
[2022-11-18T03:41:02.798Z]             # generator has no zip :( so we have to do this the hard way
[2022-11-18T03:41:02.798Z]             done = False
[2022-11-18T03:41:02.798Z]             while not done:
[2022-11-18T03:41:02.798Z]                 sub_cpu = None
[2022-11-18T03:41:02.798Z]                 sub_gpu = None
[2022-11-18T03:41:02.798Z]                 try:
[2022-11-18T03:41:02.798Z]                     sub_cpu = next(cpu)
[2022-11-18T03:41:02.798Z]                 except StopIteration:
[2022-11-18T03:41:02.798Z]                     done = True
[2022-11-18T03:41:02.798Z]     
[2022-11-18T03:41:02.798Z]                 try:
[2022-11-18T03:41:02.798Z]                     sub_gpu = next(gpu)
[2022-11-18T03:41:02.798Z]                 except StopIteration:
[2022-11-18T03:41:02.798Z]                     done = True
[2022-11-18T03:41:02.798Z]     
[2022-11-18T03:41:02.798Z]                 if done:
[2022-11-18T03:41:02.798Z]                     assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
[2022-11-18T03:41:02.798Z]                 else:
[2022-11-18T03:41:02.798Z]                     _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
[2022-11-18T03:41:02.798Z]     
[2022-11-18T03:41:02.798Z]                 index = index + 1
[2022-11-18T03:41:02.798Z]         elif (t is dict):
[2022-11-18T03:41:02.798Z]             # The order of key/values is not guaranteed in python dicts, nor are they guaranteed by Spark
[2022-11-18T03:41:02.798Z]             # so sort the items to do our best with ignoring the order of dicts
[2022-11-18T03:41:02.798Z]             cpu_items = list(cpu.items()).sort(key=_RowCmp)
[2022-11-18T03:41:02.798Z]             gpu_items = list(gpu.items()).sort(key=_RowCmp)
[2022-11-18T03:41:02.798Z]             _assert_equal(cpu_items, gpu_items, float_check, path + ["map"])
[2022-11-18T03:41:02.798Z]         elif (t is int):
[2022-11-18T03:41:02.798Z]             assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
[2022-11-18T03:41:02.798Z]         elif (t is float):
[2022-11-18T03:41:02.798Z]             if (math.isnan(cpu)):
[2022-11-18T03:41:02.798Z]                 assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
[2022-11-18T03:41:02.798Z]             else:
[2022-11-18T03:41:02.798Z]                 assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
[2022-11-18T03:41:02.798Z]         elif isinstance(cpu, str):
[2022-11-18T03:41:02.798Z]             assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)
[2022-11-18T03:41:02.798Z]         elif isinstance(cpu, datetime):
[2022-11-18T03:41:02.798Z]             assert cpu == gpu, "GPU and CPU timestamp values are different at {}".format(path)
[2022-11-18T03:41:02.798Z]         elif isinstance(cpu, date):
[2022-11-18T03:41:02.798Z]             assert cpu == gpu, "GPU and CPU date values are different at {}".format(path)
[2022-11-18T03:41:02.798Z]         elif isinstance(cpu, bool):
[2022-11-18T03:41:02.798Z] >           assert cpu == gpu, "GPU and CPU boolean values are different at {}".format(path)
[2022-11-18T03:41:02.798Z] E           AssertionError: GPU and CPU boolean values are different at [18, 'sort_array(collect_list(b), true)', 3]
[2022-11-18T03:41:02.798Z] 
[2022-11-18T03:41:02.798Z] ../../src/main/python/asserts.py:90: AssertionError
[2022-11-18T03:41:02.798Z] ----------------------------- Captured stdout call -----------------------------
[2022-11-18T03:41:02.798Z] ### CPU RUN ###
[2022-11-18T03:41:02.798Z] ### GPU RUN ###
[2022-11-18T03:41:02.798Z] ### COLLECT: GPU TOOK 0.2924313545227051 CPU TOOK 0.29257845878601074 ###

Steps/Code to reproduce bug
It is not always reproducible; I have just seen it fail once.
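Because the failure is intermittent, a deterministic oracle for the aggregation in the traceback can help when checking captured input by hand. The following is a plain-Python sketch of what `groupby('a').agg(sort_array(collect_list('b')), sort_array(collect_set('b')), countDistinct('c'), count('c'))` should return per group, assuming Spark's documented null handling (collect_list, collect_set, count and countDistinct all skip nulls). `reference_agg` and the sample rows are hypothetical and are not part of the spark-rapids test suite.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Result = Tuple[List[bool], List[bool], int, int]


def reference_agg(rows: Iterable[Tuple[Hashable, Optional[bool], Optional[int]]]) -> Dict[Hashable, Result]:
    """Hypothetical CPU-side oracle for the test's aggregation over (a, b, c) rows."""
    groups = defaultdict(list)
    for a, b, c in rows:
        groups[a].append((b, c))

    out = {}
    for a, vals in groups.items():
        bs = [b for b, _ in vals if b is not None]  # collect_list(b) skips nulls
        cs = [c for _, c in vals if c is not None]  # count(c) ignores nulls
        out[a] = (
            sorted(bs),       # sort_array(collect_list(b)), ascending: False < True
            sorted(set(bs)),  # sort_array(collect_set(b))
            len(set(cs)),     # countDistinct(c)
            len(cs),          # count(c)
        )
    return out


# Hypothetical sample rows, not taken from the failing run.
rows = [(1, True, 10), (1, False, 11), (1, True, 11), (2, None, 5)]
print(reference_agg(rows))
# → {1: ([False, True, True], [False, True], 2, 3), 2: ([], [], 1, 1)}
```

Note that the order in which collect_list gathers values does not matter here: after sorting, the result depends only on how many False and True values each group holds.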

@pxLi added the bug, ? - Needs Triage, and test labels Nov 18, 2022
@pxLi
Collaborator Author

pxLi commented Nov 18, 2022

@pxLi changed the title from "[BUG] CPU mismatch GPU result in test_hash_groupby_collect_with_single_distinct" to "[BUG] CPU mismatch GPU result in test_hash_groupby_collect_with_single_distinct intermittently" Nov 18, 2022
@firestarman
Collaborator

firestarman commented Nov 18, 2022

I can also reproduce this locally on Spark 3.1.3 and 3.2.2; the test failed in both runs.

@jlowe self-assigned this Nov 18, 2022
@sameerz
Collaborator

sameerz commented Nov 21, 2022

@jlowe is this a duplicate of issue #7092?

@jlowe added the duplicate label and removed the ? - Needs Triage label Nov 21, 2022
@jlowe
Contributor

jlowe commented Nov 21, 2022

Yes, this is a duplicate of #7092; the underlying problem is sorting a list of booleans.

@jlowe closed this as completed Nov 21, 2022
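Since `sort_array` only reorders elements, a result can also be sanity-checked without comparing against the CPU: it must be non-decreasing and must hold the same multiset of values as the list that was sorted. Below is a minimal sketch of that invariant check. `is_valid_ascending_sort` is a hypothetical helper, and because the unsorted collect_list input is not in the log, the CPU result for row 18 stands in for it (this assumes both runs aggregated the same input rows). The GPU list is in sorted order, yet it fails the check because it holds four False values where the CPU holds three, which is consistent with the diagnosis above that the boolean sort itself is at fault rather than row ordering.

```python
from collections import Counter
from typing import List


def is_valid_ascending_sort(result: List[bool], original: List[bool]) -> bool:
    """Return True if `result` is a non-decreasing permutation of `original`.

    Nulls are not modeled: collect_list skips them, so none reach sort_array here.
    """
    non_decreasing = all(x <= y for x, y in zip(result, result[1:]))
    same_values = Counter(result) == Counter(original)
    return non_decreasing and same_values


# Row 18 (a=9223372036854775807) from the CPU/GPU OUTPUT in this issue.
cpu_row18 = [False] * 3 + [True] * 7
gpu_row18 = [False] * 4 + [True] * 6

# The CPU result stands in for the unsorted input, which the log does not show.
print(is_valid_ascending_sort(cpu_row18, cpu_row18))  # True
print(is_valid_ascending_sort(gpu_row18, cpu_row18))  # False: ordered, but 4 False vs 3
```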