Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Any reason why logical plan is optimized twice? #705

Closed
aprimadi opened this issue Jul 10, 2021 · 6 comments
Closed

Any reason why logical plan is optimized twice? #705

aprimadi opened this issue Jul 10, 2021 · 6 comments
Labels
question Further information is requested

Comments

@aprimadi
Copy link
Contributor

Hello, I am just curious why the logical plan is being optimized twice.

First, it was optimized here:
https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/execution/context.rs#L205

Then it was optimized again when collecting result
https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/execution/dataframe_impl.rs#L147

@houqp
Copy link
Member

houqp commented Jul 11, 2021

probably an oversight? the optimize in sql method does look like redundant.

@Dandandan
Copy link
Contributor

I think it's not an oversight: collect is not necessarily called when executing queries, for example when writing the results to disk. As collect loads the result into memory in one thread/partition, this is not always what you want to do.
We might do something to avoid optimizing twice when calling both functions, but maybe it's not a big deal for now?

@houqp
Copy link
Member

houqp commented Jul 11, 2021

That's a good point. Perhaps the better structure would be to push optimize into query evaluation/materialization methods like save and collect. This way, we can make sure the engine always execute with an optimized plan regardless whether it's built from sql or plan builder.

@alamb
Copy link
Contributor

alamb commented Jul 16, 2021

his way, we can make sure the engine always execute with an optimized plan regardless whether it's built from sql or plan builder.

I like that plan

@alamb
Copy link
Contributor

alamb commented Aug 4, 2021

I wonder if this ticket is tracking anything useful or if the question has been answered and we can close the ticket?

@alamb alamb added the question Further information is requested label Aug 4, 2021
@houqp
Copy link
Member

houqp commented Aug 5, 2021

i think we already got the answer to the question.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants