Change return type of 'DataFrame.collect()' #442
This PR appears to change the signature of the free function datafusion::physical_plan::collect. I think the definition of DataFrame::collect is here:
https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/dataframe.rs#L221
@andygrove is this what you had in mind in #47?
The change in this PR looks like a good start and is probably necessary, but yes, the goal is to update DataFrame::collect.
@alamb @andygrove thanks for the review. I will fix this soon 🙏 From what I have reviewed so far, it seems the return format needs to be fixed. Please let me know if I have misunderstood anything 🙇
@djKooks do you plan to keep working on this PR?
@alamb sure~
=> Am I thinking correctly?
Thank you @djKooks
I think #47 is referring to DataFrame::collect(), https://docs.rs/datafusion/4.0.0/datafusion/dataframe/trait.DataFrame.html#tymethod.collect, not physical_plan::collect(), which is what this PR currently changes.
@andygrove can you please weigh in here on the desired API?
#47 says collect(); however, collect() elsewhere in the code returns Vec<RecordBatch>.
Perhaps it would make sense to add a new function like DataFrame::execute() that returns a SendableRecordBatchStream and leave collect() the way it is?
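To make the suggestion above concrete, here is a minimal, std-only Rust sketch of the two API shapes side by side. Everything in it is a hypothetical stand-in rather than DataFusion's real code: `RecordBatch` is a toy struct, `BatchStream` uses a synchronous `Iterator` in place of the asynchronous `SendableRecordBatchStream`, and the `execute` name is taken only from the proposal in this thread.

```rust
// Hypothetical stand-in for an Arrow record batch; not DataFusion's type.
#[derive(Debug, Clone, PartialEq)]
struct RecordBatch {
    rows: Vec<i64>,
}

// Assumption: a boxed synchronous iterator plays the role of
// `SendableRecordBatchStream`, which in DataFusion is an async stream.
type BatchStream = Box<dyn Iterator<Item = Result<RecordBatch, String>> + Send>;

// Hypothetical stand-in for a DataFrame that already knows its batches.
struct DataFrame {
    batches: Vec<RecordBatch>,
}

impl DataFrame {
    // Proposed streaming entry point: batches are yielded one at a time.
    fn execute(&self) -> BatchStream {
        Box::new(self.batches.clone().into_iter().map(Ok::<RecordBatch, String>))
    }

    // Existing buffering entry point keeps its signature and simply
    // drains the stream into a Vec, stopping at the first error.
    fn collect(&self) -> Result<Vec<RecordBatch>, String> {
        self.execute().collect()
    }
}

// A streaming consumer: counts rows without holding every batch in memory.
fn count_rows(stream: BatchStream) -> Result<usize, String> {
    let mut total = 0;
    for batch in stream {
        total += batch?.rows.len();
    }
    Ok(total)
}
```

With this shape, existing callers of collect() keep receiving a Vec of batches, while callers that want to process results incrementally can use the streaming function instead.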
Sorry for the delayed response. I have been working on related areas and plan to look at this later today or tomorrow.
I have created #789 to implement streaming versions of the collect methods.
@andygrove okay. I will close this.
Thanks for prodding this along @djKooks
Which issue does this PR close?
This PR addresses issue #47.
What changes are included in this PR?
Change return type of 'DataFrame.collect()' to 'SendableRecordBatchStream'.
This is currently a draft, so please let me know if this change is not what you intended 🙏