Add Ballista examples #775

Merged Jul 27, 2021 (8 commits)

Conversation

andygrove
Member

Which issue does this PR close?

Closes #774.

Rationale for this change

Examples help new users get started.

What changes are included in this PR?

  • There are now Ballista versions of the DataFusion DataFrame and SQL examples
  • The Ballista README provides an extremely brief overview of how Ballista works. This will need much more work but it is a start.

Are there any user-facing changes?

No

Member

@houqp houqp left a comment

the scheduler overview section is very clear 👍


// execute the query - note that calling collect on the DataFrame
// trait will execute the query with DataFusion so we have to call
// collect on the BallistaContext instead and pass it the DataFusion
Contributor

👍

Contributor

@alamb alamb left a comment

LGTM -- documentation for the win!

edition = "2018"
publish = false


Contributor

I wonder if it is worth adding bin targets here?

As it is I can't run these examples:

(arrow_dev) alamb@MacBook-Pro:~/Software/arrow-datafusion/ballista-examples$ cargo run 
error: a bin target must be available for `cargo run`

Maybe something like

[[bin]]
name = "dataframe"
path = "src/ballista_dataframe.rs"

Member Author

@andygrove andygrove Jul 27, 2021

I was following the same pattern that we use in datafusion-examples where we use cargo run --example rather than cargo run --bin.

% cargo run --example
error: "--example" takes one argument.
Available examples:
    ballista-dataframe
    ballista-sql

It is a little odd that we package the examples in their own crate, so maybe packaging them as binaries makes more sense now?

Contributor

The main motivation for extracting them into a separate folder/crate for datafusion-examples was to reduce the number of dependencies and the compilation time.
Maybe bin works just as well?

Member Author

I have updated this to use --bin now.

Contributor

🎉
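
For readers following the thread, here is a minimal sketch of what the bin-target approach discussed above might look like in the examples crate's Cargo.toml. The target names reuse the example names that cargo listed earlier (ballista-dataframe and ballista-sql). The first path comes from the review suggestion; the second path is an assumption made by analogy and is not taken from the merged change.

```toml
# Hypothetical Cargo.toml fragment; the manifest actually merged in this PR may differ.

[[bin]]
name = "ballista-dataframe"
path = "src/ballista_dataframe.rs"  # path as suggested in the review comment

[[bin]]
name = "ballista-sql"
path = "src/ballista_sql.rs"        # assumed by analogy; not shown in the thread
```

With targets declared like this, each example would be started by name, for instance `cargo run --bin ballista-dataframe`. A bare `cargo run` would still need either a single bin target or a `default-run` setting to know which binary to pick.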

@andygrove andygrove merged commit c74136d into apache:master Jul 27, 2021
@andygrove andygrove deleted the ballista-examples-docs branch July 27, 2021 18:31
@houqp houqp added the documentation Improvements or additions to documentation label Jul 29, 2021