[BUG] test_hash_groupby_approx_percentile_long_repeated_keys failed with TypeError #3703

Closed
jlowe opened this issue Sep 29, 2021 · 0 comments · Fixed by #3770
Labels: bug (Something isn't working), P0 (Must have for release)

jlowe commented Sep 29, 2021

test_hash_groupby_approx_percentile_long_repeated_keys failed during a Databricks premerge CI run with the following error:

[2021-09-29T00:36:22.861Z] =================================== FAILURES ===================================
[2021-09-29T00:36:22.861Z] ____________ test_hash_groupby_approx_percentile_long_repeated_keys ____________
[2021-09-29T00:36:22.861Z] [gw3] linux -- Python 3.7.10 /databricks/conda/envs/databricks-ml-gpu/bin/python
[2021-09-29T00:36:22.861Z] 
[2021-09-29T00:36:22.861Z]     @ignore_order(local=True)
[2021-09-29T00:36:22.861Z]     def test_hash_groupby_approx_percentile_long_repeated_keys():
[2021-09-29T00:36:22.861Z]         compare_percentile_approx(
[2021-09-29T00:36:22.861Z]             lambda spark: gen_df(spark, [('k', RepeatSeqGen(LongGen(), length=20)),
[2021-09-29T00:36:22.861Z]                                          ('v', LongRangeGen())], length=100),
[2021-09-29T00:36:22.862Z] >           [0.05, 0.25, 0.5, 0.75, 0.95])
[2021-09-29T00:36:22.862Z] 
[2021-09-29T00:36:22.862Z] ../../src/main/python/hash_aggregate_test.py:1084: 
[2021-09-29T00:36:22.862Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2021-09-29T00:36:22.862Z] 
[2021-09-29T00:36:22.862Z] df_fun = <function test_hash_groupby_approx_percentile_long_repeated_keys.<locals>.<lambda> at 0x7f088f769f80>
[2021-09-29T00:36:22.862Z] percentiles = [0.05, 0.25, 0.5, 0.75, 0.95]
[2021-09-29T00:36:22.862Z] 
[2021-09-29T00:36:22.862Z]     def compare_percentile_approx(df_fun, percentiles):
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]         # create SQL statements for exact and approx percentiles
[2021-09-29T00:36:22.862Z]         p_exact_sql = create_percentile_sql("percentile", percentiles)
[2021-09-29T00:36:22.862Z]         p_approx_sql = create_percentile_sql("approx_percentile", percentiles)
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]         def run_exact(spark):
[2021-09-29T00:36:22.862Z]             df = df_fun(spark)
[2021-09-29T00:36:22.862Z]             df.createOrReplaceTempView("t")
[2021-09-29T00:36:22.862Z]             return spark.sql(p_exact_sql)
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]         def run_approx(spark):
[2021-09-29T00:36:22.862Z]             df = df_fun(spark)
[2021-09-29T00:36:22.862Z]             df.createOrReplaceTempView("t")
[2021-09-29T00:36:22.862Z]             return spark.sql(p_approx_sql)
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]         # run exact percentile on CPU
[2021-09-29T00:36:22.862Z]         exact = run_with_cpu(run_exact, 'COLLECT', _approx_percentile_conf)
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]         # run approx_percentile on CPU and GPU
[2021-09-29T00:36:22.862Z]         approx_cpu, approx_gpu = run_with_cpu_and_gpu(run_approx, 'COLLECT', _approx_percentile_conf)
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]         for result in zip(exact, approx_cpu, approx_gpu):
[2021-09-29T00:36:22.862Z]             # assert that keys match
[2021-09-29T00:36:22.862Z]             assert result[0]['k'] == result[1]['k']
[2021-09-29T00:36:22.862Z]             assert result[1]['k'] == result[2]['k']
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]             exact = result[0]['the_percentile']
[2021-09-29T00:36:22.862Z]             cpu = result[1]['the_percentile']
[2021-09-29T00:36:22.862Z]             gpu = result[2]['the_percentile']
[2021-09-29T00:36:22.862Z]     
[2021-09-29T00:36:22.862Z]             if exact is not None:
[2021-09-29T00:36:22.862Z]                 if isinstance(exact, list):
[2021-09-29T00:36:22.862Z] >                   for x in zip(exact, cpu, gpu):
[2021-09-29T00:36:22.862Z] E                   TypeError: zip argument #3 must support iteration
[2021-09-29T00:36:22.862Z] 
[2021-09-29T00:36:22.862Z] ../../src/main/python/hash_aggregate_test.py:1151: TypeError
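The traceback shows that `exact` was a list while the third argument to `zip`, `gpu`, was not iterable. The log does not say what value `gpu` actually held, so the sketch below assumes it was `None` purely for illustration. `reproduce_zip_failure` and `zip_percentiles` are hypothetical helpers, not part of `hash_aggregate_test.py`. They only show how this class of failure arises and how a type check before `zip` could make the report name the offending side.

```python
def reproduce_zip_failure():
    """Trigger the same kind of TypeError seen in the CI log."""
    exact = [1.0, 2.0, 3.0]
    cpu = [1.0, 2.0, 3.0]
    gpu = None  # assumption: the log does not show the real GPU value
    try:
        # zip() calls iter() on every argument up front, so a
        # non-iterable third argument fails before any pair is produced.
        list(zip(exact, cpu, gpu))
    except TypeError as err:
        return type(err).__name__
    return None


def zip_percentiles(exact, cpu, gpu):
    """Zip the three results, naming the side that is not a list."""
    for name, value in (("exact", exact), ("cpu", cpu), ("gpu", gpu)):
        if not isinstance(value, list):
            raise AssertionError(
                f"{name} percentile result is {value!r}, expected a list"
            )
    return list(zip(exact, cpu, gpu))


if __name__ == "__main__":
    print(reproduce_zip_failure())
    print(zip_percentiles([1.0], [1.0], [1.0]))
```

With a check like this, a mismatch would surface as an assertion naming `gpu` rather than as a `TypeError` raised inside `zip`. Whether the underlying problem was in the test or in the GPU result is not something the log alone establishes.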