Ballista: Prep for fixing shuffle mechanism, part 1 #738

Merged (4 commits) on Jul 19, 2021

@andygrove andygrove commented Jul 17, 2021

Which issue does this PR close?

Closes #737.

Query 12 iteration 0 took 29591.0 ms
+------------+-----------------+----------------+
| l_shipmode | high_line_count | low_line_count |
+------------+-----------------+----------------+
| MAIL       | 623092          | 934696         |
| SHIP       | 622964          | 934514         |
+------------+-----------------+----------------+
Query 12 avg time: 29591.04 ms

Rationale for this change

This fixes a bug introduced in #712 and also gets us closer to truly supporting shuffle.

What changes are included in this PR?

  • Executors now collect metadata about the output partitions from ShuffleWriterExec and return it to the scheduler
  • Comments now mark the places where the current design is broken, with links to the relevant GitHub issue
  • ShuffleWriterExec now reports inputRows and outputRows metrics
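
As a rough illustration of the first and third items, here is a minimal sketch of a shuffle write step that counts input and output rows and produces per-partition metadata for the scheduler. This is not the code in this PR: `ShufflePartitionMeta`, `ShuffleWriteMetrics`, `write_shuffle`, the modulo partitioning, and the path layout are all hypothetical names and choices made up for the example, and no data is actually written to disk.

```rust
/// Hypothetical summary of one shuffle output partition, i.e. the kind of
/// metadata an executor could hand back to the scheduler.
#[derive(Debug, Clone, PartialEq)]
struct ShufflePartitionMeta {
    partition_id: usize,
    path: String,
    num_rows: u64,
}

/// Hypothetical counters standing in for the inputRows / outputRows metrics.
#[derive(Debug, Default, PartialEq)]
struct ShuffleWriteMetrics {
    input_rows: u64,
    output_rows: u64,
}

/// Assigns each row key to an output partition (simple modulo, an assumption
/// for this sketch) and returns metadata for every non-empty partition
/// together with the row counters.
fn write_shuffle(
    keys: &[u64],
    num_partitions: usize,
    work_dir: &str,
) -> (Vec<ShufflePartitionMeta>, ShuffleWriteMetrics) {
    let mut metrics = ShuffleWriteMetrics::default();
    let mut counts = vec![0u64; num_partitions];

    for key in keys {
        metrics.input_rows += 1;
        let partition = (key % num_partitions as u64) as usize;
        counts[partition] += 1;
        metrics.output_rows += 1;
    }

    // Only report partitions that actually received rows.
    let meta = counts
        .iter()
        .enumerate()
        .filter(|(_, rows)| **rows > 0)
        .map(|(id, rows)| ShufflePartitionMeta {
            partition_id: id,
            path: format!("{}/{}/data.arrow", work_dir, id),
            num_rows: *rows,
        })
        .collect();

    (meta, metrics)
}
```

Calling `write_shuffle(&[1, 2, 3, 4, 5], 2, "/tmp/job/stage1")` returns two metadata entries (one per non-empty partition) plus counters with five input rows and five output rows. With that information the scheduler knows where each output partition lives instead of having to guess.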

Are there any user-facing changes?

No

@andygrove andygrove marked this pull request as draft July 17, 2021 16:20
andygrove commented

If there are no objections I will go ahead and merge this tonight since it only touches Ballista files.

@andygrove andygrove merged commit e4df37a into apache:master Jul 19, 2021
@andygrove andygrove deleted the ballista-shuffle-prep-1 branch July 19, 2021 23:58
alamb commented Jul 20, 2021

Looks good @andygrove. Sorry, I have been focusing on Arrow and DataFusion and don't have much experience with Ballista. Are there others in the Ballista community we can ask for feedback in the future so these PRs don't sit unreviewed for too long?

andygrove commented

@alamb No problem at all. Ballista is still very early and experimental (although it is now actually close to usable), so I think it has been hard for others to start contributing. My plan now is to focus on performance and scalability testing and on optimizations, and more generally to tidy things up and add documentation so that it is easier for others to get involved. I think that once we can demonstrate the value of Ballista it will start to get some adoption, and that will drive contributions.

@houqp houqp added the bug Something isn't working label Jul 29, 2021
Linked issue: Ballista queries (and integration tests) returning incorrect results