fix: Mark cast from float/double to decimal as incompatible #1372

Merged
andygrove merged 9 commits into apache:main from avg-decimal-fix
Feb 7, 2025

Conversation

andygrove
Member

@andygrove andygrove commented Feb 6, 2025

Which issue does this PR close?

Closes #1354

Follow on issue: #1371

Rationale for this change

Fix a correctness issue: Comet can produce different results from Spark when averaging a decimal (#1354).

What changes are included in this PR?

  • Add test to demonstrate the bug
  • Mark cast from float/double to decimal as incompatible
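
As a rough illustration of why a cast from a double to a decimal can differ between two engines, the sketch below converts the same value along two plausible paths. It is plain Scala with no Spark or Comet dependency, and the object and method names (FloatToDecimalSketch, viaString, viaExactBinary) are hypothetical. It does not claim to show which path either engine actually takes, only that the choice of path can change the last digit.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Hypothetical sketch: two ways to turn a double into a fixed-scale decimal.
object FloatToDecimalSketch {

  // Path A: start from the shortest decimal string that round-trips the double.
  def viaString(d: Double, scale: Int): JBigDecimal =
    new JBigDecimal(java.lang.Double.toString(d)).setScale(scale, RoundingMode.HALF_UP)

  // Path B: start from the exact binary value stored in the double.
  def viaExactBinary(d: Double, scale: Int): JBigDecimal =
    new JBigDecimal(d).setScale(scale, RoundingMode.HALF_UP)

  def main(args: Array[String]): Unit = {
    // 1.005 is stored as a value slightly below 1.005, so the two paths
    // round to different results at scale 2.
    val d = 1.005
    println(s"via string:       ${viaString(d, 2)}")
    println(s"via exact binary: ${viaExactBinary(d, 2)}")
  }
}
```

An aggregate such as avg over the cast values inherits any such difference, which is one reason to treat the cast as incompatible until both sides are known to agree.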

How are these changes tested?

New test + existing tests

@andygrove
Member Author

I don't understand the following test failure:

2025-02-06T17:32:38.2264489Z - final decimal avg *** FAILED *** (17 milliseconds)
2025-02-06T17:32:38.2265038Z   org.apache.spark.sql.AnalysisException: Inserting into an RDD-based table is not allowed.;
2025-02-06T17:32:38.2265561Z 'InsertIntoStatement Repartition 1, false, false, false
2025-02-06T17:32:38.2265884Z +- LocalRelation [col1#340151, col2#340152]

Comment on lines -873 to +892
- val table = "t1"
+ val table = s"final_decimal_avg_$dictionaryEnabled"
Member Author

These changes ended up not being strictly necessary, but the test mixed a hard-coded t1 with $tableName references in its SQL, so I made them consistent.
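
A minimal sketch of the consistency being described: derive the table name once and interpolate it into every statement, so that no SQL string falls back to a hard-coded name. The object name and the SQL statements are illustrative assumptions, not the statements in the actual test; only the name pattern final_decimal_avg_$dictionaryEnabled comes from the diff.

```scala
// Hypothetical sketch: one table-name value reused across all SQL strings.
object TableNameSketch {

  def statements(dictionaryEnabled: Boolean): Seq[String] = {
    // Name pattern taken from the diff; it is unique per dictionary setting.
    val table = s"final_decimal_avg_$dictionaryEnabled"
    Seq(
      // Illustrative statements only; the real test's schema and data differ.
      s"CREATE TABLE $table(a DECIMAL(38, 37), b INT) USING PARQUET",
      s"INSERT INTO $table VALUES (1.23, 1)",
      s"SELECT b, avg(a) FROM $table GROUP BY b"
    )
  }

  def main(args: Array[String]): Unit =
    statements(dictionaryEnabled = true).foreach(println)
}
```

Making the name unique per parameter value also keeps it from colliding with a table or view of the same name left behind by another test.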

@codecov-commenter

codecov-commenter commented Feb 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 39.17%. Comparing base (f09f8af) to head (880328b).
Report is 20 commits behind head on main.

Additional details and impacted files
@@              Coverage Diff              @@
##               main    #1372       +/-   ##
=============================================
- Coverage     56.12%   39.17%   -16.96%     
- Complexity      976     2065     +1089     
=============================================
  Files           119      262      +143     
  Lines         11743    60327    +48584     
  Branches       2251    12836    +10585     
=============================================
+ Hits           6591    23631    +17040     
- Misses         4012    32223    +28211     
- Partials       1140     4473     +3333     


@andygrove andygrove marked this pull request as ready for review February 7, 2025 02:35
withTable(tableName) {
  val table = spark.read.parquet(filename).coalesce(1)
  table.createOrReplaceTempView(tableName)
  checkSparkAnswer(s"SELECT c1, avg(c7) FROM $tableName GROUP BY c1 ORDER BY c1")
Contributor

Would you mind adding // https://github.com/apache/datafusion-comet/issues/1371 and a note to switch to checkSparkAnswerAndNumOfAggregates once that issue is resolved?

Member Author

Added
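
For readers following along, the requested annotation might look roughly like the sketch below. The exact wording of the comment that was committed is not shown in this thread, so the text here is an assumption. To keep the snippet runnable on its own, checkSparkAnswer is replaced by a local stand-in that only records the query; it is not the Comet test helper.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of the annotated call, with a stand-in helper.
object FollowUpCommentSketch {

  // Stand-in for the test helper of the same name: records the query only.
  val executed: ArrayBuffer[String] = ArrayBuffer.empty[String]
  def checkSparkAnswer(query: String): Unit = executed += query

  def run(tableName: String): Unit = {
    // TODO: use checkSparkAnswerAndNumOfAggregates once
    // https://github.com/apache/datafusion-comet/issues/1371 is resolved
    checkSparkAnswer(s"SELECT c1, avg(c7) FROM $tableName GROUP BY c1 ORDER BY c1")
  }

  def main(args: Array[String]): Unit = {
    run("example_table") // "example_table" is a placeholder name
    executed.foreach(println)
  }
}
```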

@andygrove andygrove merged commit 26b8d57 into apache:main Feb 7, 2025
74 checks passed
@andygrove andygrove deleted the avg-decimal-fix branch February 7, 2025 20:07
Development

Successfully merging this pull request may close these issues.

Comet can produce different results to Spark when averaging a decimal
3 participants