Multiple paths for source files #394

Closed
MaksymFedorchuk opened this issue Jun 24, 2021 · 2 comments · Fixed by #399
Labels: accepted, question

Comments

MaksymFedorchuk commented Jun 24, 2021

I need to read files from multiple folders, but so far I haven't found an option in Cobrix to do this. Is there a way to read multiple folders without creating multiple RDDs or Datasets? If not, please treat this as an enhancement request.

Example:
source_folders = ["example1/folder_1/","example2/folder_2/"]
spark.read.format("cobol").load(source_folders)

Parquet and other popular formats already support loading from multiple paths.
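
For context, the fallback this request is trying to avoid looks roughly like the sketch below: one read per folder, followed by a union. The function name `read_folders_by_union` and the copybook path are hypothetical, and the sketch assumes every folder shares the same copybook and therefore the same schema.

```python
from functools import reduce


def read_folders_by_union(spark, copybook_path, folders):
    """Read each folder into its own DataFrame, then union them into one."""
    if not folders:
        raise ValueError("at least one folder is required")
    # One DataFrame per folder: exactly the overhead the request wants to avoid.
    frames = [
        spark.read.format("cobol").option("copybook", copybook_path).load(folder)
        for folder in folders
    ]
    # DataFrame.union matches columns by position, so the schemas must agree.
    return reduce(lambda left, right: left.union(right), frames)


# Usage (assumes an active SparkSession named `spark`):
# df = read_folders_by_union(
#     spark, "/path/to/copybook.cob", ["example1/folder_1/", "example2/folder_2/"]
# )
```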

MaksymFedorchuk added the question label Jun 24, 2021
Collaborator

yruslan commented Jun 24, 2021

That's a good idea! We'll check it out and implement it.

yruslan self-assigned this Jun 24, 2021
yruslan added the accepted label Jun 24, 2021
Collaborator

yruslan commented Jun 30, 2021

While looking into this, I noticed that supporting multiple paths in .load(...) requires rewriting the data source provider in terms of FileFormat instead of RelationProvider, so it might take some time to implement.
In the meantime, if the data files are in subdirectories of the same root folder, you can use "/path/*", and the data source will recurse one level into each subfolder.
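
As a rough illustration of when the "/path/*" workaround applies, the sketch below checks whether all requested folders are direct children of one root and, if so, builds the glob. The helper names `common_root_glob` and `load_with_glob` and the copybook path are hypothetical. Note that the glob also matches any sibling subfolders that are not in the list.

```python
import posixpath


def common_root_glob(folders):
    """Return '<root>/*' if every folder is a direct child of one root, else None."""
    # Strip trailing slashes so dirname() yields the parent directory.
    parents = {posixpath.dirname(folder.rstrip("/")) for folder in folders}
    if len(parents) != 1:
        return None
    root = parents.pop()
    return posixpath.join(root, "*") if root else "*"


def load_with_glob(spark, copybook_path, folders):
    """Load all folders through a single glob, if they share a parent."""
    pattern = common_root_glob(folders)
    if pattern is None:
        raise ValueError("folders do not share a single parent directory")
    return spark.read.format("cobol").option("copybook", copybook_path).load(pattern)
```

For the folders in the original question ("example1/folder_1/" and "example2/folder_2/") the parents differ, so `common_root_glob` returns None and this workaround would not apply.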

If rewriting from RelationProvider to FileFormat turns out to be too hard, we'll add a Cobrix extension option, for instance .option("paths", "/comma,/separated,/paths"), as a workaround for the time being.
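
If the proposed option were added under that name, calling code might look like the sketch below. This is only an assumption based on the comment above: the option name "paths" is a proposal, not a confirmed Cobrix API, and whether .load() would then be called without arguments is also a guess. The helper names and the copybook path are hypothetical.

```python
def paths_option_value(folders):
    """Join folders into the comma-separated string the proposed option expects."""
    for folder in folders:
        # A comma inside a path would be indistinguishable from a separator.
        if "," in folder:
            raise ValueError(f"path contains a comma: {folder!r}")
    return ",".join(folders)


def read_with_paths_option(spark, copybook_path, folders):
    """Sketch only: 'paths' is the option name proposed in this thread."""
    return (
        spark.read.format("cobol")
        .option("copybook", copybook_path)
        .option("paths", paths_option_value(folders))  # hypothetical option
        .load()  # assumption: no path argument when 'paths' is set
    )
```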
