Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split datafusion-object-store module #2065

Merged
merged 2 commits into from
Mar 24, 2022
Merged

split datafusion-object-store module #2065

merged 2 commits into from
Mar 24, 2022

Conversation

yahoNanJing
Copy link
Contributor

@yahoNanJing yahoNanJing commented Mar 23, 2022

Which issue does this PR close?

Closes #1772.

Rationale for this change

By splitting the object store into a separate module, the object stores in datafusion-contrib no longer need to depend on the whole datafusion crate. They only need to depend on this separated module. What's more, if we hope to introduce the object stores in datafusion-contrib as some featuers in the datafusion crate, the issue of dependency cyclic can be avoided.

What changes are included in this PR?

  • Introduces a new crate datafusion-storage for object store interfaces.
  • Moves the object store in previous datasource mod into this new datafusion-storage crate.

The reason of still keeping ObjectStoreRegistry in the datafusion crate is

  1. for future introduces object store features which can make it possible to have many default object stores besides the LocalFileSystem, like HadoopFileSystem in datafusion-objectstore-hdfs, S3FileSystem in datafusion-objectstore-s3.
  2. the object stores in datafusion-contrib don't need it.

Are there any user-facing changes?

@github-actions github-actions bot added ballista datafusion Changes in the datafusion crate labels Mar 23, 2022
@yahoNanJing
Copy link
Contributor Author

Hi @jimexist, @matthewmturner, @yjshen, @alamb, could you help review this patch?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- thank you @yahoNanJing

cc @yjshen @rdettai @matthewmturner and @tustvold

datafusion-storage/README.md Outdated Show resolved Hide resolved
datafusion-storage/README.md Outdated Show resolved Hide resolved
@matthewmturner
Copy link
Contributor

Overall looks good. I think my only point, similar to what we discussed before, is on some of the ObjectStore functionality that is still hanging around in datafusion core i.e. ListingTable and functionality in datasource. I know the ObjectStore doesnt need them and this keeps datafusion-storage light, which of course is nice. To me they just seem logically coupled so at first was odd seeing the functionality split like that.

That being said, i have given this more thought and given how the above mentioned functionalities leverage physical plan, etc. i do think it makes sense to have datafusion wrap these functionalities together. just walking through my thought process out loud :)

thanks for this.

@alamb
Copy link
Contributor

alamb commented Mar 23, 2022

To me they just seem logically coupled so at first was odd seeing the functionality split like that.

🤔 maybe it is time for datafusion-listing-table crate

Copy link
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alamb
Copy link
Contributor

alamb commented Mar 24, 2022

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

split datafusion-datasource module
6 participants