Ballista: Implement map-side shuffle #543

Merged

andygrove merged 5 commits into apache:master from andygrove:shuffle-write-many

Jun 26, 2021

Member

andygrove commented Jun 11, 2021

Which issue does this PR close?

Closes #456

Rationale for this change

Another step towards implementing full shuffle support.

What changes are included in this PR?

Are there any user-facing changes?

The result meta-data from executing a query stage now has an additional column with a partition number.

andygrove added 3 commits

June 11, 2021 13:51

Rough out shuffle writer for multiple partitions (0ffc108)

save (319405c)

save (12f2f5e)

codecov-commenter commented Jun 11, 2021

Codecov Report

Merging #543 (eb2d673) into master (63e3045) will increase coverage by 0.08%.
The diff coverage is 90.90%.

@@            Coverage Diff             @@
##           master     #543      +/-   ##
==========================================
+ Coverage   76.08%   76.17%   +0.08%     
==========================================
  Files         156      156              
  Lines       27035    27174     +139     
==========================================
+ Hits        20570    20699     +129     
- Misses       6465     6475      +10

Impacted Files	Coverage Δ
...lista/rust/core/src/execution_plans/query_stage.rs	`85.13% <90.78%> (+9.34%)`	⬆️
ballista/rust/core/src/serde/scheduler/mod.rs	`60.71% <100.00%> (+1.78%)`	⬆️
datafusion/src/physical_plan/mod.rs	`79.09% <100.00%> (+0.38%)`	⬆️
datafusion/src/physical_plan/planner.rs	`77.53% <0.00%> (-2.66%)`	⬇️
ballista/rust/core/src/utils.rs	`25.53% <0.00%> (-2.06%)`	⬇️
...ista/rust/core/src/serde/physical_plan/to_proto.rs	`49.38% <0.00%> (-0.93%)`	⬇️
datafusion/src/physical_plan/hash_join.rs	`84.89% <0.00%> (-0.63%)`	⬇️
datafusion/src/physical_plan/expressions/case.rs	`75.00% <0.00%> (-0.57%)`	⬇️
datafusion/src/execution/context.rs	`92.00% <0.00%> (-0.09%)`	⬇️
ballista/rust/client/src/context.rs	`0.00% <0.00%> (ø)`
... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data


          clippy

3dedcc5

andygrove marked this pull request as ready for review

June 11, 2021 22:27

Member Author

andygrove commented Jun 11, 2021

andygrove requested review from alamb, Dandandan and jorgecarleitao

June 11, 2021 22:27

Dandandan reviewed

ballista/rust/core/src/execution_plans/query_stage.rs

+                                              .into_array(input_batch.num_rows()))
+                                      })
+                                      .collect::<Result<Vec<_>>>()?;
+                                  hashes_buf.clear();

Contributor

Dandandan Jun 13, 2021

Maybe we could reuse this code better at some point?

Dandandan approved these changes

Contributor

Dandandan left a comment

LGTM

houqp reviewed

ballista/rust/core/src/execution_plans/query_stage.rs (outdated, resolved)


          Update ballista/rust/core/src/execution_plans/query_stage.rs

eb2d673

Co-authored-by: QP Hou

edrevo reviewed

ballista/rust/core/src/execution_plans/query_stage.rs

+                                      })
+                                      .collect::<Result<Vec<_>>>()?;
+                                  hashes_buf.clear();
+                                  hashes_buf.resize(arrays[0].len(), 0);

Contributor

edrevo Jun 14, 2021

Noob question: is there a guarantee that all record batches have at least one element?

Contributor

Dandandan Jun 14, 2021

There needs to be at least one column, based on the expressions in the hash repartitioning, which I think should be a prerequisite for hash repartitioning. I am not sure whether DataFusion checks that explicitly when constructing it.

Contributor

alamb Jun 14, 2021

Yes, I believe so: https://github.com/apache/arrow-rs/blob/master/arrow/src/record_batch.rs#L114-L118
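The buffer handling in the reviewed lines (clear, resize to the row count, then fill with one hash per row) can be sketched with the standard library alone. This is a simplified illustration, not the PR's code: it hashes a plain slice of `i64` keys instead of Arrow arrays, and `hash_rows` and `partition_indices` are hypothetical names introduced here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: writes one hash per row into a reusable buffer.
/// The buffer is cleared and resized first, mirroring the
/// `hashes_buf.clear()` / `hashes_buf.resize(..)` calls in the diff.
fn hash_rows(keys: &[i64], hashes_buf: &mut Vec<u64>) {
    hashes_buf.clear();
    hashes_buf.resize(keys.len(), 0);
    for (slot, key) in hashes_buf.iter_mut().zip(keys) {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        *slot = hasher.finish();
    }
}

/// Hypothetical helper: groups row indices by output partition,
/// where the partition is the row hash modulo `n`.
fn partition_indices(
    keys: &[i64],
    n: usize,
    hashes_buf: &mut Vec<u64>,
) -> Vec<Vec<usize>> {
    assert!(n > 0, "need at least one output partition");
    hash_rows(keys, hashes_buf);
    let mut indices = vec![Vec::new(); n];
    for (row, hash) in hashes_buf.iter().enumerate() {
        indices[(*hash % n as u64) as usize].push(row);
    }
    indices
}
```

Because the sketch sizes the buffer from the key slice itself, an empty input simply yields empty partitions. The code under review sizes it from `arrays[0]`, which is why the question of whether at least one array always exists matters there.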

ballista/rust/core/src/execution_plans/query_stage.rs

-                              Err(DataFusionError::NotImplemented(
-                                  "Shuffle partitioning not implemented yet".to_owned(),
-                              ))
+                          Some(Partitioning::Hash(exprs, n)) => {

Contributor

edrevo Jun 14, 2021

Just thinking out loud without any data to back me up, but it may be worth special-casing n == 1 so that we skip hashing entirely, since all of the data is going to end up in the same partition anyway.

Member Author

andygrove Jun 26, 2021

That makes sense. I filed https://github.com/apache/arrow-datafusion/issues/626 for this. I'd like to get the basic end-to-end shuffle mechanism working before we start optimizing too much.
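A minimal sketch of the suggested fast path, assuming rows are keyed by a plain `i64` rather than by evaluated Arrow expressions. The function name `assign_partitions` is hypothetical and this is not the implementation tracked in the follow-up issue.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: returns the output partition for every row.
/// With a single output partition no hash is computed at all.
fn assign_partitions(keys: &[i64], n: usize) -> Vec<usize> {
    assert!(n > 0, "need at least one output partition");
    if n == 1 {
        // Fast path: every row goes to partition 0.
        return vec![0; keys.len()];
    }
    keys.iter()
        .map(|key| {
            let mut hasher = DefaultHasher::new();
            key.hash(&mut hasher);
            (hasher.finish() % n as u64) as usize
        })
        .collect()
}
```

Whether the saving is significant in practice would need measurement, as the comment above notes.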

ballista/rust/core/src/execution_plans/query_stage.rs

+                              // we won't necessarily produce output for every possible partition, so we
+                              // create writers on demand
+                              let mut writers: Vec<Option<Arc<Mutex<ShuffleWriter>>>> = vec![];

Contributor

edrevo Jun 14, 2021

Looks like Arc + Mutex is unnecessary if you use .iter_mut() where needed.

Member Author

andygrove Jun 26, 2021

I tried changing this but ran into ownership issues. I'll go ahead and merge, and perhaps someone can help me fix this in a follow-up PR.
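For illustration, on-demand writers can be held as plain `Option` values and borrowed mutably when needed. The sketch below is synchronous and uses a hypothetical stand-in `ShuffleWriter` that only counts rows, so it does not reproduce the ownership issues mentioned above, which may come from the surrounding code in the PR.

```rust
/// Hypothetical stand-in for the PR's ShuffleWriter; it only counts rows.
#[derive(Debug, Default)]
struct ShuffleWriter {
    rows_written: usize,
    finished: bool,
}

impl ShuffleWriter {
    fn write(&mut self, num_rows: usize) {
        self.rows_written += num_rows;
    }

    fn finish(&mut self) {
        self.finished = true;
    }
}

/// Hypothetical driver: `batches` holds (partition, row count) pairs.
/// Indexing panics if a partition number is not less than `n`.
fn write_partitions(n: usize, batches: &[(usize, usize)]) -> Vec<Option<ShuffleWriter>> {
    // One empty slot per possible partition; writers are created lazily.
    let mut writers: Vec<Option<ShuffleWriter>> = (0..n).map(|_| None).collect();
    for &(partition, num_rows) in batches {
        writers[partition]
            .get_or_insert_with(ShuffleWriter::default)
            .write(num_rows);
    }
    // Visit only the writers that were actually created.
    for writer in writers.iter_mut().flatten() {
        writer.finish();
    }
    writers
}
```

Partitions that never receive a batch stay `None`, so no writer is created for them.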

andygrove added the ballista label

andygrove mentioned this pull request

Implement fast path for QueryStageExec when writing 1 shuffle partition apache/datafusion-ballista#21

Open

andygrove merged commit 61199b9 into apache:master

andygrove deleted the shuffle-write-many branch

June 26, 2021 16:05

edrevo mentioned this pull request

Remove unnecessary mutex #639

Merged

houqp added the api change and enhancement labels

Reviewers

edrevo left review comments

houqp left review comments

Dandandan approved these changes

alamb: awaiting requested review

jorgecarleitao: awaiting requested review

Labels

api change, enhancement