
Implement Python bindings for BallistaContext #15

Closed
andygrove opened this issue Aug 11, 2021 · 12 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), python

Comments

@andygrove
Member

andygrove commented Aug 11, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We have Python bindings for DataFusion's ExecutionContext. It would be good to also support Ballista's BallistaContext so that we can use Python to run distributed queries.

Describe the solution you'd like
Probably something like this?

import ballista

ctx = ballista.BallistaContext(...)
df = ctx.read_parquet(...)
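As a rough illustration of the client-side shape proposed above, here is a pure-Python mock. `MockBallistaContext`, `LazyDataFrame`, their methods, and the scheduler port 50050 are hypothetical and are not the real `ballista` package API. The only point is that the context holds scheduler connection details and that `read_parquet` builds a query plan instead of reading data locally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LazyDataFrame:
    """Hypothetical stand-in for a DataFrame: records plan steps, executes nothing."""
    plan: List[str] = field(default_factory=list)

    def select(self, *columns: str) -> "LazyDataFrame":
        # Each transformation returns a new frame with one more plan step.
        return LazyDataFrame(self.plan + [f"Projection: {', '.join(columns)}"])


@dataclass
class MockBallistaContext:
    """Hypothetical client-only context: it only knows where the scheduler is."""
    host: str = "localhost"
    port: int = 50050  # assumed scheduler port, for illustration only

    def read_parquet(self, path: str) -> LazyDataFrame:
        # Reading only adds a scan node; a real client would submit the plan
        # to the scheduler, which would distribute the work to executors.
        return LazyDataFrame([f"ParquetScan: {path}"])


ctx = MockBallistaContext()
df = ctx.read_parquet("data/example.parquet").select("a", "b")
print(df.plan)
```

Keeping the context client-only in this way means Python users would not need to run executors themselves; they would only need a reachable scheduler.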

Describe alternatives you've considered
Another approach might be to make Ballista an optional feature of DataFusion and add new methods to the DataFusion ExecutionContext instead. However, that would probably pull in many additional dependencies and blur the line between DataFusion and Ballista, and I think there is a strong case for keeping DataFusion as the library/embedded engine and Ballista as the distributed one.

Additional context
N/A

andygrove added the enhancement, ballista, python, and help wanted labels Aug 11, 2021
@kination
Contributor

@andygrove I'd like to work on this, if you don't have any other plans.

@andygrove
Member Author

Thank you @djKooks, that would be great.

@kination
Contributor

kination commented Aug 16, 2021

@andygrove
Would it be okay to structure it like the following?

...
ballista/
   - rust/
   - ui/
   - python/     <- create binding here
datafusion/
datafusion-cli/
...

Or should I update the current DataFusion Python binding inside the existing python/ directory?

@andygrove
Member Author

Yes, I think that makes sense.

@andygrove
Member Author

I would be interested to hear what others think though. @alamb @Dandandan @jorgecarleitao @houqp do you have an opinion on this?

@jorgecarleitao
Member

I agree that this makes the most sense 👍

Out of curiosity, do the bindings come with the client and executors, or just the client?

@alamb
Contributor

alamb commented Aug 16, 2021

I agree that a separate Python binding for Ballista, in the location suggested by @djKooks in #15, makes sense.

@kination
Contributor

@alamb @jorgecarleitao @andygrove thanks for the suggestions 🙇

do the bindings come with the client and executors, or just the client?

I think the client alone will be enough as a first step, but do you have any other suggestions?

@alamb
Contributor

alamb commented Aug 29, 2021

I do not have anything more to add here. Since I don't use the Python bindings myself, I don't have much to offer on specifics.

@kination
Contributor

@andygrove @alamb @jorgecarleitao thanks for the comments.
I will start the implementation in the following branch: apache/datafusion#988
(I will request a review when it's ready 🙇)

andygrove transferred this issue from apache/datafusion May 19, 2022
@nl5887
Contributor

nl5887 commented Jun 6, 2022

Related to #58

@andygrove
Member Author

This has now been implemented.


5 participants