Add write_ipc to ExecutionContext #1893

Closed
wants to merge 2 commits into from

Conversation

matthewmturner
Contributor

@matthewmturner matthewmturner commented Feb 28, 2022

Which issue does this PR close?

Closes #1777 task 3

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Feb 28, 2022
@matthewmturner
Contributor Author

I'm wondering if IpcWriteOptions should implement Clone and/or Copy, like parquet's WriterProperties.

Contributor

@alamb alamb left a comment

We should probably include some tests as well

let mut tasks = vec![];
for i in 0..plan.output_partitioning().partition_count() {
    let plan = plan.clone();
    let filename = format!("part-{}.parquet", i);
Contributor

probably needs to be named something other than .parquet
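
A minimal sketch of the rename, not the code from this PR: the helper name `ipc_part_filename` is hypothetical, and the `.arrow` extension is an assumption based on the extension the Arrow documentation recommends for IPC files. The thread does not say which name was finally chosen.

```rust
/// Hypothetical helper: builds the output filename for one partition.
/// The `.arrow` extension is an assumption, not something this PR settled on.
fn ipc_part_filename(partition: usize) -> String {
    format!("part-{}.arrow", partition)
}
```

Inside the loop this would replace the `format!("part-{}.parquet", i)` call, so each partition is written to its own file with an extension that matches its contents.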

Ok(()) => {
    let mut tasks = vec![];
    for i in 0..plan.output_partitioning().partition_count() {
        let plan = plan.clone();
Contributor

why does the plan need to be cloned?

@matthewmturner
Contributor Author

@alamb sorry, I had basically copied this from write_parquet, which was the source of both your comments.

I don't think the clone on the plan was needed, so I removed it in write_ipc, write_parquet, and write_csv. If you would rather I change those other functions in a separate PR, I can do that.

I think IpcWriteOptions should implement Clone, so I'm going to submit a PR for that in arrow.
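
To illustrate why Clone matters here, a sketch under stated assumptions: `WriteOptions` is a hypothetical stand-in for `IpcWriteOptions`, and `options_per_partition` is an illustrative helper, not DataFusion code. It assumes each partition's writer takes its options by value, so a single caller-supplied value has to be duplicated once per partition.

```rust
/// Hypothetical stand-in for arrow's `IpcWriteOptions`; the field is illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct WriteOptions {
    alignment: usize,
}

/// Produces one owned copy of the options per output partition.
/// Without `Clone` on `WriteOptions`, this would not compile.
fn options_per_partition(options: &WriteOptions, partitions: usize) -> Vec<WriteOptions> {
    (0..partitions).map(|_| options.clone()).collect()
}
```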

let path = fs_path.join(&filename);
let file = fs::File::create(path)?;
let mut writer = match writer_properties {
    Some(props) => FileWriter::try_new_with_options(
Member

I noticed you opened a ticket, apache/arrow-rs#1382, to fix the compile failure. 👍

Another thought: could we let FileWriter have only a try_new method and delete the try_new_with_options method, just like ArrowWriter? That would reduce the API surface exposed to Arrow users.

Contributor Author

Just to confirm: you would then add the IpcWriteOptions as a new parameter to try_new, right?

Member

Yes, Option<IpcWriteOptions>. Then we can call xxx.unwrap_or_else(|| IpcWriteOptions::default()) to fall back to the defaults when no options are passed.
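
The proposal above could look roughly like this. It is a sketch only: `WriteOptions` and `Writer` are hypothetical stand-ins for `IpcWriteOptions` and `FileWriter`, and the real `try_new` signature in arrow takes other arguments that are omitted here.

```rust
/// Hypothetical stand-in for arrow's `IpcWriteOptions`; the field is illustrative only.
#[derive(Debug, Clone, PartialEq, Default)]
struct WriteOptions {
    alignment: usize,
}

/// Hypothetical stand-in for a writer with a single constructor.
struct Writer {
    options: WriteOptions,
}

impl Writer {
    /// Single entry point: `None` means "use the default options".
    fn try_new(options: Option<WriteOptions>) -> Result<Self, String> {
        // Same pattern as suggested in the review comment.
        let options = options.unwrap_or_else(WriteOptions::default);
        Ok(Writer { options })
    }
}
```

Callers that do not care about options pass `None`, and callers that do pass `Some(options)`, so a separate `try_new_with_options` constructor is no longer needed. Since the stand-in type implements `Default`, `options.unwrap_or_default()` would behave the same way.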

@alamb
Contributor

alamb commented Mar 2, 2022

I think IpcWriteOptions should implement Clone, so I'm going to submit a PR for that in arrow.

👍

@alamb
Contributor

alamb commented Mar 21, 2022

#2048 has the arrow upgrade

@alamb
Contributor

alamb commented Apr 15, 2022

Marking as a draft to make it clear this still has some work to do. Please mark it ready for review when it is :)

@alamb alamb marked this pull request as draft April 15, 2022 15:32
@andygrove andygrove removed the datafusion Changes in the datafusion crate label Jun 3, 2022
@alamb
Contributor

alamb commented Jan 14, 2023

This PR is more than 6 months old, so I'm closing it for now to clean up the PR list. Please reopen it if this is a mistake and you plan to work on it more.

@alamb alamb closed this Jan 14, 2023
Development

Successfully merging this pull request may close these issues.

[EPIC] Improve DataFusions ability to work with files
4 participants