[BUG] test_hash_groupby_approx_percentile_long_repeated_keys failed with TypeError #3703

jlowe · 2021-09-29T16:17:54Z

test_hash_groupby_approx_percentile_long_repeated_keys failed during a Databricks premerge CI run with the following error:

[2021-09-29T00:36:22.861Z] =================================== FAILURES ===================================
[2021-09-29T00:36:22.861Z] ____________ test_hash_groupby_approx_percentile_long_repeated_keys ____________
[2021-09-29T00:36:22.861Z] [gw3] linux -- Python 3.7.10 /databricks/conda/envs/databricks-ml-gpu/bin/python
[2021-09-29T00:36:22.861Z] 
[2021-09-29T00:36:22.861Z]     @ignore_order(local=True)
[2021-09-29T00:36:22.861Z]     def test_hash_groupby_approx_percentile_long_repeated_keys():
[2021-09-29T00:36:22.861Z]         compare_percentile_approx(
[2021-09-29T00:36:22.861Z]             lambda spark: gen_df(spark, [('k', RepeatSeqGen(LongGen(), length=20)),
[2021-09-29T00:36:22.861Z]                                          ('v', LongRangeGen())], length=100),
[2021-09-29T00:36:22.862Z] >           [0.05, 0.25, 0.5, 0.75, 0.95])
[2021-09-29T00:36:22.862Z] 
[2021-09-29T00:36:22.862Z] ../../src/main/python/hash_aggregate_test.py:1084: 
[2021-09-29T00:36:22.862Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2021-09-29T00:36:22.862Z] 
[2021-09-29T00:36:22.862Z] df_fun = <function test_hash_groupby_approx_percentile_long_repeated_keys.<locals>.<lambda> at 0x7f088f769f80>
[2021-09-29T00:36:22.862Z] percentiles = [0.05, 0.25, 0.5, 0.75, 0.95]
[2021-09-29T00:36:22.862Z] 
[2021-09-29T00:36:22.862Z]     def compare_percentile_approx(df_fun, percentiles):
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]         # create SQL statements for exact and approx percentiles
[2021-09-29T00:36:22.862Z]         p_exact_sql = create_percentile_sql("percentile", percentiles)
[2021-09-29T00:36:22.862Z]         p_approx_sql = create_percentile_sql("approx_percentile", percentiles)
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]         def run_exact(spark):
[2021-09-29T00:36:22.862Z]             df = df_fun(spark)
[2021-09-29T00:36:22.862Z]             df.createOrReplaceTempView("t")
[2021-09-29T00:36:22.862Z]             return spark.sql(p_exact_sql)
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]         def run_approx(spark):
[2021-09-29T00:36:22.862Z]             df = df_fun(spark)
[2021-09-29T00:36:22.862Z]             df.createOrReplaceTempView("t")
[2021-09-29T00:36:22.862Z]             return spark.sql(p_approx_sql)
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]         # run exact percentile on CPU
[2021-09-29T00:36:22.862Z]         exact = run_with_cpu(run_exact, 'COLLECT', _approx_percentile_conf)
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]         # run approx_percentile on CPU and GPU
[2021-09-29T00:36:22.862Z]         approx_cpu, approx_gpu = run_with_cpu_and_gpu(run_approx, 'COLLECT', _approx_percentile_conf)
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]         for result in zip(exact, approx_cpu, approx_gpu):
[2021-09-29T00:36:22.862Z]             # assert that keys match
[2021-09-29T00:36:22.862Z]             assert result[0]['k'] == result[1]['k']
[2021-09-29T00:36:22.862Z]             assert result[1]['k'] == result[2]['k']
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]             exact = result[0]['the_percentile']
[2021-09-29T00:36:22.862Z]             cpu = result[1]['the_percentile']
[2021-09-29T00:36:22.862Z]             gpu = result[2]['the_percentile']
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]             if exact is not None:
[2021-09-29T00:36:22.862Z]                 if isinstance(exact, list):
[2021-09-29T00:36:22.862Z] >                   for x in zip(exact, cpu, gpu):
[2021-09-29T00:36:22.862Z] E                   TypeError: zip argument #3 must support iteration
[2021-09-29T00:36:22.862Z] 
[2021-09-29T00:36:22.862Z] ../../src/main/python/hash_aggregate_test.py:1151: TypeError
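
The traceback shows `zip(exact, cpu, gpu)` rejecting its third argument, so `gpu` was not iterable at that point even though `exact` was a list. The log does not show the actual value of `gpu`; the sketch below assumes it was `None` or a scalar. The helper name `check_percentile_lists` is hypothetical and not part of the test suite. It only illustrates how a guard could turn the bare `TypeError` into a message that names the offending result.

```python
def zip_with_none_raises():
    """Return True if zipping two lists with None raises TypeError.

    This mirrors the failing call shape, assuming gpu was None.
    """
    try:
        zip([1.0], [1.0], None)
    except TypeError:
        return True
    return False


def check_percentile_lists(exact, cpu, gpu):
    """Hypothetical guard: validate the three results before zipping them.

    Raises AssertionError naming the result that is not a list, or
    reporting a length mismatch, instead of a TypeError from zip().
    """
    for name, value in (("exact", exact), ("cpu", cpu), ("gpu", gpu)):
        if not isinstance(value, list):
            raise AssertionError(
                "%s percentile result is %r, expected a list" % (name, value)
            )
    if not (len(exact) == len(cpu) == len(gpu)):
        raise AssertionError(
            "percentile result lengths differ: exact=%d cpu=%d gpu=%d"
            % (len(exact), len(cpu), len(gpu))
        )
    return list(zip(exact, cpu, gpu))
```

With a guard like this, a failing run would report which of the three results had the unexpected type and what its value was, which the original traceback leaves out.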

jlowe added the bug and ? - Needs Triage labels Sep 29, 2021

jlowe mentioned this issue Sep 29, 2021

Added support for Array[Struct] to GpuCreateArray [databricks] #3690

Merged

abellina mentioned this issue Sep 29, 2021

[BUG] test_hash_groupby_approx_percentile_long_repeated_keys could hang in integration tests intermittently #3692

Closed

sameerz added the P0 label and removed the ? - Needs Triage label Oct 5, 2021

sameerz assigned andygrove Oct 5, 2021

sameerz mentioned this issue Oct 5, 2021

[BUG] YARN illegal memory access GpuOutOfCoreSortIterator. #3697

Closed

andygrove added this to the Oct 4 - Oct 15 milestone Oct 7, 2021

andygrove mentioned this issue Oct 8, 2021

Enable approx percentile tests #3770

Merged

sameerz modified the milestones: Oct 4 - Oct 15, Oct 18 - Oct 29 Oct 15, 2021

Salonijain27 modified the milestones: Oct 18 - Oct 29, Nov 1 - Nov 12 Oct 29, 2021

andygrove closed this as completed in #3770 Nov 8, 2021
