[BUG] Integration tests failing on Databricks-11.3 due to mixing of aggregations in HashAggregateExec and SortAggregateExec #7345

Closed
5 tasks done
Tracked by #7325
nartal1 opened this issue Dec 13, 2022 · 0 comments · Fixed by #7385
Assignees
Labels
bug Something isn't working

nartal1 commented Dec 13, 2022

Describe the bug
The tests below are failing because the plan appears to mix CPU aggregations with GPU aggregations, which we don't support. This needs more investigation.

  • test_hash_groupby_collect_partial_replace_with_distinct_fallback
  • test_groupby_std_variance_partial_replace_fallback
  • test_hash_avg_nulls_partial_only
  • test_groupby_std_variance
  • test_groupby_std_variance_nulls

Reason for failure (the Exec name varies, but the reason is the same):

!Exec <SortAggregateExec> cannot run on GPU because mixing CPU and GPU aggregations is not supported. The data type of following expressions will be converted in GPU runtime: buf#216: Converted BinaryType to ArrayType(IntegerType,false); buf#218: Converted BinaryType to ArrayType(IntegerType,false)
!Exec <HashAggregateExec> cannot run on GPU because mixing CPU and GPU aggregations is not supported

Steps/Code to reproduce bug
Build and run integration tests on Databricks-11.3 environment.
Build steps: pull in PR #7152 locally if it has not been merged yet, then run:
/home/ubuntu/spark-rapids$ ./jenkins/databricks/build.sh

Test steps: Modify test.sh under jenkins/databricks to run a specific test.
Example: Comment out all lines starting from export TEST_PARALLEL=${TEST_PARALLEL:-4} in the file and add the lines below:

export TEST_PARALLEL=${TEST_PARALLEL:-1}
TEST_NAME=$1
bash /home/ubuntu/spark-rapids/integration_tests/run_pyspark_from_build.sh -k $TEST_NAME

Run Test - ./jenkins/databricks/test.sh test_hash_avg_nulls_partial_only

Expected behavior
All of the tests above should pass on the Databricks-11.3 shim.

Environment details (please complete the following information)
Databricks-11.3 runtime.


nartal1 added the "bug" and "? - Needs Triage" labels Dec 13, 2022
sameerz removed the "? - Needs Triage" label Dec 13, 2022
gerashegalov self-assigned this Dec 15, 2022
gerashegalov added a commit to gerashegalov/spark-rapids that referenced this issue Dec 16, 2022
Compare minor components only as a tie breaker if major components are
equal

Contributes to NVIDIA#7345

Fixes xfail for is_databricks_<ver>_or_later

Signed-off-by: Gera Shegalov <[email protected]>
gerashegalov added a commit that referenced this issue Dec 17, 2022
Compare minor components only as a tie breaker if major components are equal

Fixes #7345

Fixes xfail for is_databricks104_or_later on 11.3

Signed-off-by: Gera Shegalov <[email protected]>
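
The commit message above describes comparing the minor component only as a tie breaker when the major components are equal. A minimal Python sketch of that idea follows. The function names are hypothetical and are not the actual spark-rapids helpers, and the "naive" variant is only an assumption about the kind of comparison that could misclassify 11.3 against 10.4; the source does not show the original code.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split a 'major.minor' string into integers (hypothetical helper)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def naive_or_later(current: str, required: str) -> bool:
    """Assumed buggy form: requires BOTH components to be >= independently."""
    cur_major, cur_minor = parse_version(current)
    req_major, req_minor = parse_version(required)
    return cur_major >= req_major and cur_minor >= req_minor


def is_version_or_later(current: str, required: str) -> bool:
    """Tuple comparison is lexicographic: the minor component is consulted
    only when the major components are equal."""
    return parse_version(current) >= parse_version(required)


if __name__ == "__main__":
    # 11.3 is later than 10.4, but its minor component (3) is smaller than 4.
    print(naive_or_later("11.3", "10.4"))       # False
    print(is_version_or_later("11.3", "10.4"))  # True
```

Under this assumption, a check such as is_databricks104_or_later would wrongly report False on an 11.3 runtime with the naive form, which is consistent with the xfail condition not being applied there.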