[BUG] improve Spark Connect compatibility for types and count behavior #3352
CodSpeed Performance Report
Merging #3352 will degrade performance by 24.95%.
Graphite Automations
"Notify author when CI fails" took an action on this PR (11/21/24).
1 teammate was notified to this PR based on Andrew Gazelka's automation.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3352      +/-   ##
==========================================
+ Coverage   77.35%   77.44%   +0.08%
==========================================
  Files         684      684
  Lines       83637    83680      +43
==========================================
+ Hits        64694    64802     +108
+ Misses      18943    18878      -65
I don't understand why we need to do a physical cast on the tables. Shouldn't this all be handled during planning, not execution?
Iterating over the materialized tables like this is extremely expensive and inefficient.
@universalmind303 how would I do casting on a |
This commit enhances Spark Connect compatibility in two key areas:
Type Compatibility:
- to_spark_compatible_datatype for proper type casting
Count Behavior:
- count(lit(1)) to match Spark's behavior
This addresses compatibility issues when running Daft through Spark Connect,
particularly for unsigned integer types which aren't supported in Spark's type
system and count aggregations that need to match Spark's semantics.
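
To make the two areas concrete, here is a minimal, self-contained Python sketch. It is not the code from this PR: the mapping table and the names spark_compatible_type, count_literal, and count_column are hypothetical and exist only for illustration. It assumes unsigned types are widened to the next larger signed type, and that count over a literal counts every row while count over a column skips nulls, which is how Spark SQL treats count(1) versus count(col).

```python
from typing import Optional, Sequence

# Hypothetical mapping for illustration only (not Daft's actual table).
# Spark has no unsigned integer types, so each unsigned type is widened
# to the next larger signed type so that every value still fits.
_UNSIGNED_TO_SIGNED = {
    "uint8": "int16",
    "uint16": "int32",
    "uint32": "int64",
    # Assumption: there is no wider signed integer to widen to, so values
    # above 2**63 - 1 would not fit and need separate handling.
    "uint64": "int64",
}


def spark_compatible_type(dtype: str) -> str:
    """Return a signed stand-in for unsigned types; pass other types through."""
    return _UNSIGNED_TO_SIGNED.get(dtype, dtype)


def count_literal(rows: Sequence[Optional[int]]) -> int:
    """count(lit(1)): the literal is never null, so every row is counted."""
    return len(rows)


def count_column(rows: Sequence[Optional[int]]) -> int:
    """count(col): only non-null values are counted."""
    return sum(1 for value in rows if value is not None)


if __name__ == "__main__":
    column = [10, None, 30, None]
    print(spark_compatible_type("uint32"))  # widened signed type
    print(count_literal(column), count_column(column))
```

Resolving the type mapping on the schema, as in spark_compatible_type, keeps the decision independent of the data itself, which is one way to think about the planning-time approach raised in review.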
Related: #3421