@razajafri found this in one of his PRs: the results from the CPU and GPU do not match for `test_no_fallback_when_ansi_enabled`, which I added in #3597:
I am not entirely sure how this is happening, given the query does a `coalesce(1)` and an `orderBy` on every column.
But the orderBy is not the last thing in the query?
```python
df = gen_df(spark, [('a', data_gen), ('b', data_gen)], length=100)
# coalesce to a single partition and sort, because first/last
# are not deterministic otherwise
df = df.coalesce(1).orderBy("a", "b")
return df.groupBy('a').agg(f.first("b"), f.last("b"), f.min("b"), f.max("b"))
```
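To illustrate why the test needs that `coalesce(1).orderBy(...)` step at all, here is a minimal plain-Python sketch (no Spark; `first_per_key` is a hypothetical helper, not part of the test) showing that a `first`-style aggregate depends entirely on the row order it sees:

```python
# Sketch: first() per group is order-sensitive. The same rows in a
# different order give a different answer, which is why the test pins
# the row order before aggregating.
def first_per_key(rows):
    out = {}
    for k, v in rows:
        out.setdefault(k, v)  # keeps only the first value seen per key
    return out

rows = [("a", 1), ("a", 2), ("b", 3), ("b", 4)]

assert first_per_key(rows) == {"a": 1, "b": 3}
# Reversing the input changes the result of the "same" aggregation:
assert first_per_key(reversed(rows)) == {"a": 2, "b": 4}
```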
What prevents a Spark implementation from using a hash aggregate for the grouping? In that case the order of the output rows would be non-deterministic, because the operator is just dumping its hash table contents.
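To make that concern concrete, here is a hypothetical toy hash aggregate (not Spark's implementation): even with sorted input, the output order follows the hash table's bucket layout, not the input's sort order. It leans on two CPython details, that `hash(n) == n` for small ints and that dicts preserve insertion order:

```python
# Toy hash aggregation: rows are bucketed by hash(key) % nbuckets, and
# output is produced by walking the buckets, so sorted input does not
# imply sorted output.
def hash_agg_min(rows, nbuckets=4):
    buckets = [dict() for _ in range(nbuckets)]
    for k, v in rows:
        b = buckets[hash(k) % nbuckets]
        b[k] = min(v, b.get(k, v))
    # Dump the table bucket by bucket: keys come out in hash order.
    return [(k, v) for b in buckets for k, v in b.items()]

sorted_rows = [(0, 'a'), (1, 'b'), (4, 'c'), (5, 'd')]
# Keys 0 and 4 share bucket 0, keys 1 and 5 share bucket 1, so the
# output key order is [0, 4, 1, 5] even though the input was sorted.
assert [k for k, _ in hash_agg_min(sorted_rows)] == [0, 4, 1, 5]
```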