Add support to ArrowDataSource in SourceInfo #16050

Merged 6 commits on Jun 25, 2024

lithomas1 commented Jun 17, 2024

Description

ArrowDataSources weren't previously supported in SourceInfo, since we didn't need them for Avro.

Adding support now so we can pass tests for the ORC reader and co., even though ArrowDataSource may be removed in the future.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@lithomas1 added the "feature request" and "non-breaking" labels Jun 17, 2024
@github-actions bot added the "Python", "CMake", and "pylibcudf" labels Jun 17, 2024
@lithomas1 marked this pull request as ready for review June 18, 2024 00:07
@lithomas1 requested a review from a team as a code owner June 18, 2024 00:07
@lithomas1 requested review from wence- and charlesbluca June 18, 2024 00:07
vyasr left a comment

Actually, rather than doing this I think we should go ahead and deprecate the native file support. @rjzamora any objections?

@lithomas1

> Actually, rather than doing this I think we should go ahead and deprecate the native file support. @rjzamora any objections?

If we want to port all the I/O readers to pylibcudf, we'll have to add native file support here (at least until the deprecation is executed). I suppose we could read all the bytes from the Arrow NativeFile as an alternative, but that's probably not a good idea.

Is it possible to put this in for now, given the diff is pretty small?
(This PR just makes plc.io.SourceInfo match make_source_info, so we can remove that when everything is ported. It doesn't add anything new.)

This PR also does a couple of other things besides the Arrow support, e.g. raising ValueError for unsupported sources and fixing the case where there's an empty buffer.
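
For anyone following along, the source handling described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Cython in this PR: `classify_sources` and the "filepath"/"buffer" labels are hypothetical names, and only paths and host buffers are modeled. It shows an unsupported type raising ValueError, and one way an empty buffer can be accepted (checking the type rather than the truthiness of the source); whether that was the actual empty-buffer bug is not stated here.

```python
import io
import os


def classify_sources(sources):
    """Return a (kind, normalized_sources) pair for a homogeneous source list.

    Hypothetical helper for illustration only; not part of pylibcudf.
    """
    if not sources:
        raise ValueError("Need to pass at least one source")

    # The first element decides which kind of source the whole list must be.
    first = sources[0]
    if isinstance(first, (str, os.PathLike)):
        kind, accepted = "filepath", (str, os.PathLike)
    elif isinstance(first, (bytes, io.BytesIO)):
        kind, accepted = "buffer", (bytes, io.BytesIO)
    else:
        raise ValueError(f"Unsupported source type: {type(first).__name__}")

    normalized = []
    for src in sources:
        # Type check, not truthiness: b"" is a valid (empty) buffer.
        if not isinstance(src, accepted):
            raise ValueError("All sources must be of the same kind")
        if kind == "filepath":
            normalized.append(os.fspath(src))
        elif isinstance(src, io.BytesIO):
            normalized.append(src.getvalue())
        else:
            normalized.append(src)
    return kind, normalized
```

With this sketch, `classify_sources([b""])` is accepted as a buffer source, while `classify_sources([42])` raises ValueError instead of failing later with a less helpful error.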

vyasr commented Jun 25, 2024

Fair enough, I agree that if we're planning a deprecation cycle we don't want to gate the progress here. I'll review as is.

vyasr left a comment

Feel free to do the deprecation in this PR or a follow-up, I'm approving either way.

@@ -84,6 +86,13 @@ cdef class SourceInfo:

            self.c_obj = move(source_info(c_files))
            return
        elif isinstance(sources[0], Datasource):
            for csrc in sources:

Let's go ahead and insert the deprecation warning as part of this PR, either here or in the NativeFileDatasource constructor.
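
A rough sketch of what a constructor-level deprecation warning could look like, in plain Python rather than Cython. `NativeFileSource` is a hypothetical stand-in class, the message text is invented for illustration, and the choice of `FutureWarning` (shown to end users by default) is an assumption rather than something decided in this thread.

```python
import warnings


class NativeFileSource:
    """Hypothetical stand-in for a wrapper around an Arrow NativeFile."""

    def __init__(self, native_file):
        # stacklevel=2 attributes the warning to the caller's line,
        # not to this constructor.
        warnings.warn(
            "Support for Arrow NativeFile sources is deprecated and may be "
            "removed in a future release; pass a file path or bytes instead.",
            FutureWarning,
            stacklevel=2,
        )
        self.native_file = native_file
```

Warning in the constructor means every code path that builds such a source is covered, whereas warning at the `isinstance` branch would only cover sources passed through SourceInfo.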

@lithomas1

I'll take care of this in a followup, since I have another PR stacked on top of this one.

Thanks for the review!

@lithomas1

/merge

@rapids-bot bot merged commit e4bd9e8 into rapidsai:branch-24.08 Jun 25, 2024
80 checks passed
@lithomas1 deleted the pylibcudf-arrow-io branch June 25, 2024 18:33