Create ListingTableConfig which includes file format and schema inference #1715
Just as an FYI, I haven't had time this week to work on DataFusion PRs. I plan to pick up work on this early next week.
@alamb @houqp would you mind checking this out before I go updating all the other tests and places where this is used? I confess I struggled a bit with the borrow checker here, and there are likely more idiomatic ways to achieve the goal of this, but I think that it at least serves as a starting point.
Thanks @matthewmturner -- I think this is looking good
    file_schema: SchemaRef,
    options: ListingOptions,
) -> Self {
pub async fn new(mut config: ListingTableConfig) -> Result<Self> {
I like this API -- it would be cool if we could add some more documentation (either here or in ListingTableConfig) explaining that the options / schema are inferred if not explicitly created.
Also, as this can now return an Err, perhaps renaming to pub async fn try_new(...) would be a more idiomatic / clearer name.
Yes, I was planning on adding docs once I was sure the API was good, and I will update to try_new!
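To illustrate the two suggestions in this thread (a fallible constructor named with a try_ prefix, and a config whose unset fields are inferred), here is a minimal sketch. It is not the code from this PR: TableConfig, Table, ConfigError, and with_format are hypothetical stand-ins, only the format is modelled (not the schema), and everything is synchronous so that it runs on the standard library alone, whereas the constructor discussed above is async.

```rust
use std::path::Path;

/// Hypothetical stand-in for a listing-table config (not DataFusion's type).
#[derive(Debug, Clone, PartialEq)]
pub struct TableConfig {
    pub path: String,
    /// `None` means "not set explicitly, so infer it from the path".
    pub format: Option<String>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    /// No format was given and none could be inferred for this path.
    MissingFormat(String),
}

impl TableConfig {
    pub fn new(path: impl Into<String>) -> Self {
        Self { path: path.into(), format: None }
    }

    /// Set the format explicitly; inference will not override it.
    pub fn with_format(mut self, format: impl Into<String>) -> Self {
        self.format = Some(format.into());
        self
    }

    /// Fill in the format from the file extension if it was not set.
    pub fn infer(mut self) -> Self {
        if self.format.is_none() {
            self.format = Path::new(&self.path)
                .extension()
                .and_then(|ext| ext.to_str())
                .map(|ext| ext.to_lowercase());
        }
        self
    }
}

#[derive(Debug, PartialEq)]
pub struct Table {
    pub path: String,
    pub format: String,
}

impl Table {
    /// The `try_` prefix signals that construction can fail.
    pub fn try_new(config: TableConfig) -> Result<Self, ConfigError> {
        let TableConfig { path, format } = config.infer();
        match format {
            Some(format) => Ok(Table { path, format }),
            None => Err(ConfigError::MissingFormat(path)),
        }
    }
}
```

A caller would write something like Table::try_new(TableConfig::new("data/file.parquet"))? and get an Err rather than a panic when nothing can be inferred.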
@alamb any idea why only one CI check is running?
I think it is because the PR has a conflict
Oh, duh, that makes sense. Sorry. Will fix.
@alamb do you know of any Rust magic that would allow me to use an async method for
Thinking out loud here. I'm wondering if I can move the required async functionality to all be within
I do not know 😞
I think that sounds like a very good idea to me 👍
As an update, I need to handle the case of inferring file format when a partitioned file / directory is provided.
@alamb I think this is finally ready. I haven't included partition column inference, but I think that can be added as a follow-on PR if needed.
let table =
    ListingTable::new(Arc::new(LocalFileSystem {}), filename, schema, opt);
let config = ListingTableConfig::new(Arc::new(LocalFileSystem {}), filename)
    .infer()
this is cool
I think it is looking really good @matthewmturner 👍
datafusion/tests/path_partition.rs
Outdated
@@ -271,7 +274,7 @@ async fn parquet_overlapping_columns() -> Result<()> {
    Ok(())
}

fn register_partitioned_aggregate_csv(
async fn register_partitioned_aggregate_csv(
This doesn't need to be async anymore, does it?
You are right. I've fixed it.
@alamb thanks! As always, I appreciate your guidance. Is it OK for me to include this feature in the post for 7.0?
Ok no now it has a conflict!
Working on it.
f5b3e9b to 33aebb6
33aebb6 to f7360d6
@alamb hopefully we're good now
neat @matthewmturner
"json" => Ok(Arc::new(JsonFormat::default())),
"parquet" => Ok(Arc::new(ParquetFormat::default())),
_ => Err(DataFusionError::Internal(
    "Unable to infer file type".into(),
We can make the log more specific by including the suffix.
Good idea. I'll update.
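One way the suggestion above could look is sketched below, using only the standard library. FileKind and infer_file_kind are hypothetical names for illustration, the error is a plain String rather than DataFusionError, and the csv arm is an assumption (only the json and parquet arms are visible in the diff above).

```rust
use std::path::Path;

/// Hypothetical stand-in for the file formats matched in the diff above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileKind {
    Csv,
    Json,
    Parquet,
}

/// Infer the file kind from the path's suffix. On failure, the error
/// message names the offending suffix and the path it came from.
pub fn infer_file_kind(path: &str) -> Result<FileKind, String> {
    // A path without an extension is treated as having an empty suffix.
    let suffix = Path::new(path)
        .extension()
        .and_then(|ext| ext.to_str())
        .unwrap_or("");

    match suffix {
        "csv" => Ok(FileKind::Csv),
        "json" => Ok(FileKind::Json),
        "parquet" => Ok(FileKind::Parquet),
        other => Err(format!(
            "Unable to infer file type from suffix {:?} of path {:?}",
            other, path
        )),
    }
}
```

Including the suffix in the message lets a user tell at a glance whether the path was mistyped or the format is simply unsupported.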
…datafusion into add_listing_config
Thanks again for the good work @matthewmturner
Which issue does this PR close?
Closes #1705
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?