Ballista shuffle is finally working as intended, providing scalable distributed joins #750

andygrove · 2021-07-18T20:25:10Z

Which issue does this PR close?

Builds on #738. Closes #707.

With this PR we finally have scalable distributed joins.
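A shuffle is what makes a distributed join scalable. Both inputs are repartitioned by a hash of the join key, so rows that can match always land in the same partition index, and each partition pair can then be joined independently by a separate task. The sketch below illustrates that idea in a single process. It is a hedged illustration, not Ballista's code: the function names (`partition_for`, `shuffle`, `join_partition`, `distributed_join`) are hypothetical, and in-memory vectors stand in for the files a real shuffle writes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Pick one of `n` output partitions for a join key by hashing it.
fn partition_for<K: Hash>(key: &K, n: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % n as u64) as usize
}

/// Shuffle "write": split rows into `n` buckets by the hash of their key.
fn shuffle<K: Hash + Clone, V: Clone>(rows: &[(K, V)], n: usize) -> Vec<Vec<(K, V)>> {
    let mut buckets = vec![Vec::new(); n];
    for row in rows {
        buckets[partition_for(&row.0, n)].push(row.clone());
    }
    buckets
}

/// Ordinary in-memory hash join of one left partition with the matching right partition.
fn join_partition<K, L, R>(left: &[(K, L)], right: &[(K, R)]) -> Vec<(K, L, R)>
where
    K: Hash + Eq + Clone,
    L: Clone,
    R: Clone,
{
    // Build side: key -> all left values with that key.
    let mut build: HashMap<K, Vec<L>> = HashMap::new();
    for (k, l) in left {
        build.entry(k.clone()).or_default().push(l.clone());
    }
    // Probe side: emit one output row per matching left value.
    let mut out = Vec::new();
    for (k, r) in right {
        if let Some(hits) = build.get(k) {
            for l in hits {
                out.push((k.clone(), l.clone(), r.clone()));
            }
        }
    }
    out
}

/// Both sides are shuffled with the same `n` and hash function, so matching
/// keys share a partition index and each pair could run as a separate task.
fn distributed_join<K, L, R>(left: &[(K, L)], right: &[(K, R)], n: usize) -> Vec<(K, L, R)>
where
    K: Hash + Eq + Clone,
    L: Clone,
    R: Clone,
{
    let left_parts = shuffle(left, n);
    let right_parts = shuffle(right, n);
    left_parts
        .iter()
        .zip(right_parts.iter())
        .flat_map(|(l, r)| join_partition(l.as_slice(), r.as_slice()))
        .collect()
}

fn main() {
    // (order key, label) and (order key, line number): toy stand-ins for TPC-H tables.
    let orders = vec![(1u32, "order-a"), (2, "order-b"), (3, "order-c")];
    let lineitems = vec![(1u32, 10u32), (1, 11), (3, 30), (4, 40)];
    let mut joined = distributed_join(&orders, &lineitems, 4);
    joined.sort();
    println!("{joined:?}");
}
```

The result does not depend on `n`; only the amount of work each partition (and therefore each task) receives changes.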

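TPC-H Query 12 performance at SF=100:

+---------------------------+-----------+
| Executor concurrent tasks | Time (ms) |
+---------------------------+-----------+
| 2                         | 32592.94  |
| 4                         | 17865.03  |
| 8                         | 11641.62  |
| 16                        | 9296.60   |
+---------------------------+-----------+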
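The timings can be summarized as speedup relative to the 2-task run. The sketch below only redoes that arithmetic on the numbers in the table; "parallel efficiency" here means speedup divided by the increase in concurrent tasks, a conventional metric assumed for illustration rather than one reported in this PR.

```rust
/// Query 12 timings at SF=100 from the table: (executor concurrent tasks, time in ms).
const RUNS: [(u32, f64); 4] = [(2, 32592.94), (4, 17865.03), (8, 11641.62), (16, 9296.60)];

/// Returns (tasks, speedup vs. the first run, parallel efficiency) for each run.
fn scaling(runs: &[(u32, f64)]) -> Vec<(u32, f64, f64)> {
    let (base_tasks, base_ms) = runs[0];
    runs.iter()
        .map(|&(tasks, ms)| {
            let speedup = base_ms / ms;
            // Efficiency: achieved speedup divided by the increase in concurrency.
            let efficiency = speedup / (tasks as f64 / base_tasks as f64);
            (tasks, speedup, efficiency)
        })
        .collect()
}

fn main() {
    for (tasks, speedup, eff) in scaling(&RUNS) {
        println!("{tasks:>2} tasks: {speedup:.2}x speedup, {:.0}% efficiency", eff * 100.0);
    }
}
```

On these numbers the speedups come out at roughly 1.8x, 2.8x and 3.5x for 4, 8 and 16 concurrent tasks, so the query keeps getting faster as concurrency grows, with diminishing returns at the higher task counts.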

+------------+-----------------+----------------+
| l_shipmode | high_line_count | low_line_count |
+------------+-----------------+----------------+
| MAIL       | 623097          | 934694         |
| SHIP       | 622959          | 934510         |
+------------+-----------------+----------------+
Query 12 avg time: 9896.37 ms

Integration tests pass.

Rationale for this change

This makes Ballista work as originally intended.

What changes are included in this PR?

Many bug fixes in the shuffle implementation, including having CompletedTask carry metadata for shuffle output partitions.

Are there any user-facing changes?

No

andygrove · 2021-07-20T04:19:02Z

@houqp @Dandandan @edrevo @alamb @jorgecarleitao Ballista is finally working with scalable distributed joins, at least it is for TPC-H. I plan on following up with some further smaller code cleanup PRs now that the functionality is working.

alamb

I reviewed the code -- while I am not a Ballista expert, it seems reasonable to me.

One thing I did notice was that there don't appear to be any new / updated tests in this PR.

alamb · 2021-07-20T12:03:57Z

ballista/rust/core/src/execution_plans/unresolved_shuffle.rs

@@ -69,7 +78,8 @@ impl ExecutionPlan for UnresolvedShuffleExec {
    }

    fn output_partitioning(&self) -> Partitioning {
-        Partitioning::UnknownPartitioning(self.partition_count)
+        //TODO the output partition is known and should be populated here!


Is this something that you want to finish up in this PR?

I've filed https://github.com/apache/arrow-datafusion/issues/758 as a follow-up for implementing this since it involves more serde work.
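The TODO refers to `output_partitioning` returning `Partitioning::UnknownPartitioning(self.partition_count)` even though the scheme used by the upstream shuffle writer is known. A minimal sketch of what populating it could look like, under stated assumptions: the `Partitioning` enum below is a simplified stand-in for DataFusion's (column names instead of physical expressions), and `UnresolvedShuffle` with its `shuffle_partitioning` field is hypothetical. In Ballista this information would also have to survive plan serialization, which is the serde work mentioned above.

```rust
/// Simplified stand-in for DataFusion's `Partitioning` enum; the real `Hash`
/// variant holds physical expressions rather than column names.
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    UnknownPartitioning(usize),
    Hash(Vec<String>, usize),
}

/// Hypothetical, simplified unresolved shuffle node. `shuffle_partitioning`
/// is an assumed field holding the scheme the upstream shuffle writer used.
struct UnresolvedShuffle {
    partition_count: usize,
    shuffle_partitioning: Option<Partitioning>,
}

impl UnresolvedShuffle {
    fn output_partitioning(&self) -> Partitioning {
        match &self.shuffle_partitioning {
            // Known: report the scheme the writer actually used.
            Some(p) => p.clone(),
            // Otherwise fall back to the behaviour in the current diff.
            None => Partitioning::UnknownPartitioning(self.partition_count),
        }
    }
}

fn main() {
    let known = UnresolvedShuffle {
        partition_count: 4,
        shuffle_partitioning: Some(Partitioning::Hash(vec!["l_orderkey".to_string()], 4)),
    };
    let unknown = UnresolvedShuffle { partition_count: 4, shuffle_partitioning: None };
    println!("{:?}", known.output_partitioning());
    println!("{:?}", unknown.output_partitioning());
}
```

Reporting the real scheme lets the rest of the plan see how the data is actually distributed instead of treating every shuffle output as opaque.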

andygrove · 2021-07-20T13:48:48Z

> One thing I did notice was that there don't appear to be any new / updated tests in this PR.

I've added an additional test to check that TPC-H query 12 gets planned with correct partitioning information in the shuffle readers.
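The new test itself is not shown in this thread. As a hedged illustration of the idea only, the following walks a toy plan tree and checks that every shuffle reader reports hash partitioning with the expected partition count. `Plan`, `shuffle_reader_partitionings`, and the simplified `Partitioning` enum are hypothetical and do not mirror Ballista's actual test or types.

```rust
/// Simplified stand-in for a partitioning scheme (column names, partition count).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    UnknownPartitioning(usize),
    Hash(Vec<String>, usize),
}

/// Hypothetical, heavily simplified physical plan tree.
enum Plan {
    ShuffleReader { partitioning: Partitioning },
    HashJoin { left: Box<Plan>, right: Box<Plan> },
    Other { children: Vec<Plan> },
}

/// Collect the partitioning reported by every shuffle reader in the tree.
fn shuffle_reader_partitionings(plan: &Plan, out: &mut Vec<Partitioning>) {
    match plan {
        Plan::ShuffleReader { partitioning } => out.push(partitioning.clone()),
        Plan::HashJoin { left, right } => {
            shuffle_reader_partitionings(left, out);
            shuffle_reader_partitionings(right, out);
        }
        Plan::Other { children } => {
            for child in children {
                shuffle_reader_partitionings(child, out);
            }
        }
    }
}

fn main() {
    let hash = |col: &str| Partitioning::Hash(vec![col.to_string()], 2);
    let plan = Plan::Other {
        children: vec![Plan::HashJoin {
            left: Box::new(Plan::ShuffleReader { partitioning: hash("o_orderkey") }),
            right: Box::new(Plan::ShuffleReader { partitioning: hash("l_orderkey") }),
        }],
    };
    let mut found = Vec::new();
    shuffle_reader_partitionings(&plan, &mut found);
    // Every shuffle reader should be hash-partitioned into 2 partitions.
    assert_eq!(found.len(), 2);
    for p in &found {
        assert!(matches!(p, Partitioning::Hash(_, 2)), "unexpected partitioning: {p:?}");
    }
    println!("all {} shuffle readers are hash-partitioned into 2 partitions", found.len());
}
```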

alamb · 2021-07-21T11:19:33Z

🎉

* feat: Optimize CreateNamedStruct to preserve dictionaries. Instead of serializing the return data_type, we just serialize the field names. The original implementation was done this way because it led to a slightly simpler implementation, but it is clear from apache#750 that this was the wrong choice and leads to issues with the physical data_type.
* Support dictionary data_types in StructVector and MapVector
* Add length checks

andygrove added 5 commits July 17, 2021 09:03

CompletedTask now includes meta-data for shuffle output partitions

85a1401

Bug fix

bb64c62

Save

45a4aa7

save

6a9476a

Ballista shuffle mechanism now works as intended

c775d2f
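The first commit above has `CompletedTask` carry metadata for shuffle output partitions. One way such metadata can be used, sketched here under assumptions: the scheduler groups the files written by every map task by output partition, so the reader for partition `p` knows every executor and path it must fetch from. `ShuffleWritePartition`, `PartitionLocation`, `resolve_shuffle`, and all field names are illustrative; Ballista's own types may differ.

```rust
/// Hypothetical metadata for one shuffle output file written by a map task.
#[derive(Debug, Clone)]
struct ShuffleWritePartition {
    output_partition: usize,
    path: String,
    num_rows: u64,
}

/// Hypothetical completed-task report: the executor that ran the task and
/// the shuffle files it produced.
struct CompletedTask {
    executor_id: String,
    partitions: Vec<ShuffleWritePartition>,
}

/// Where a shuffle reader can fetch one piece of its input.
#[derive(Debug, Clone, PartialEq)]
struct PartitionLocation {
    executor_id: String,
    path: String,
}

/// Group every map task's output files by output partition, so reader `p`
/// gets the full list of locations it must fetch and concatenate.
fn resolve_shuffle(tasks: &[CompletedTask], num_partitions: usize) -> Vec<Vec<PartitionLocation>> {
    let mut locations = vec![Vec::new(); num_partitions];
    for task in tasks {
        for part in &task.partitions {
            // In this sketch only: files with zero rows are not worth fetching.
            if part.num_rows == 0 {
                continue;
            }
            locations[part.output_partition].push(PartitionLocation {
                executor_id: task.executor_id.clone(),
                path: part.path.clone(),
            });
        }
    }
    locations
}

fn main() {
    let tasks = vec![
        CompletedTask {
            executor_id: "exec-1".into(),
            partitions: vec![
                ShuffleWritePartition { output_partition: 0, path: "/tmp/stage1/task0/0".into(), num_rows: 10 },
                ShuffleWritePartition { output_partition: 1, path: "/tmp/stage1/task0/1".into(), num_rows: 0 },
            ],
        },
        CompletedTask {
            executor_id: "exec-2".into(),
            partitions: vec![
                ShuffleWritePartition { output_partition: 0, path: "/tmp/stage1/task1/0".into(), num_rows: 7 },
                ShuffleWritePartition { output_partition: 1, path: "/tmp/stage1/task1/1".into(), num_rows: 3 },
            ],
        },
    ];
    for (p, locs) in resolve_shuffle(&tasks, 2).iter().enumerate() {
        println!("reader {p}: {locs:?}");
    }
}
```

Skipping empty files is a choice made only in this sketch; it shows why reporting per-partition statistics alongside each file path can be useful to the scheduler.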

github-actions bot added the ballista label Jul 18, 2021

andygrove added 3 commits July 18, 2021 14:55

code cleanup

a6657f2

merge from master

c460784

integration tests pass

274d092

andygrove marked this pull request as ready for review July 20, 2021 04:15

andygrove requested review from Dandandan, alamb and jorgecarleitao July 20, 2021 04:15

error handling

bf8e087

alamb approved these changes Jul 20, 2021


Additional test

94c5dc6

andygrove mentioned this pull request May 19, 2022

Ballista: UnresolvedShuffleExec and ShuffleReaderExec should show correct partitioning scheme apache/datafusion-ballista#16

Closed

add links to follow on issue

1696e7e

andygrove merged commit ed5746d into apache:master Jul 21, 2021

andygrove deleted the ballista-shuffle-working branch July 21, 2021 00:25

houqp added the enhancement New feature or request label Jul 29, 2021
