[FEA] cuio: use datasource as the exclusive reader input argument #6185

Closed
cwharris opened this issue Sep 9, 2020 · 5 comments
Labels: cuIO, feature request, libcudf

cwharris commented Sep 9, 2020

Internally, readers instantiate datasources (from file paths or otherwise), presumably for convenience. This complicates the reader interfaces and implementations. Let's factor the readers' filepath arguments out into explicit datasource initialization outside of the reader, and have readers accept std::unique_ptr<datasource> or std::vector<std::unique_ptr<datasource>> (in the case of parquet, for example) as the exclusive file/data input.

explicit impl(std::unique_ptr<datasource> source,
              std::string filepath,
              reader_options const &options,
              rmm::mr::device_memory_resource *mr);

explicit impl(std::unique_ptr<datasource> source,
              std::string filepath,
              reader_options const &args,
              rmm::mr::device_memory_resource *mr);

reader::reader(std::vector<std::string> const &filepaths,
               reader_options const &options,
               rmm::mr::device_memory_resource *mr)
{
  CUDF_EXPECTS(filepaths.size() == 1, "Only a single source is currently supported.");
  _impl = std::make_unique<impl>(datasource::create(filepaths[0]), options, mr);
}

reader::reader(std::vector<std::string> const &filepaths,
               reader_options const &options,
               rmm::mr::device_memory_resource *mr)
  : _impl(std::make_unique<impl>(datasource::create(filepaths), options, mr))
{
}
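
To make the request concrete, here is a minimal, self-contained sketch of a reader whose only input is a set of already-constructed datasources. Everything in it is a hypothetical stand-in: the `datasource`, `file_source_stub`, `reader_options`, `reader`, and `make_sources` names below are illustrative mocks, not the real libcudf classes, and the stub never opens a file.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the datasource abstraction; NOT the libcudf class.
class datasource {
 public:
  virtual ~datasource() = default;
  virtual std::size_t size() const = 0;

  // Mirrors the shape of the datasource::create(filepath) call in the
  // snippets above, but this mock only records the path.
  static std::unique_ptr<datasource> create(std::string const& filepath);
};

// Assumption: a trivial source that stores the path instead of reading a file.
class file_source_stub : public datasource {
 public:
  explicit file_source_stub(std::string path) : _path(std::move(path)) {}
  std::size_t size() const override { return _path.size(); }  // placeholder value

 private:
  std::string _path;
};

std::unique_ptr<datasource> datasource::create(std::string const& filepath)
{
  return std::make_unique<file_source_stub>(filepath);
}

// Hypothetical options type; the real reader options carry many more fields.
struct reader_options {
  bool placeholder = false;
};

// The reader no longer knows about file paths: it only owns datasources.
class reader {
 public:
  reader(std::vector<std::unique_ptr<datasource>> sources, reader_options const& options)
    : _sources(std::move(sources)), _options(options)
  {
    for (auto const& s : _sources) { assert(s != nullptr); }
  }

  std::size_t num_sources() const { return _sources.size(); }
  reader_options const& options() const { return _options; }

 private:
  std::vector<std::unique_ptr<datasource>> _sources;
  reader_options _options;
};

// Caller-side helper: path-to-datasource conversion happens outside the reader.
std::vector<std::unique_ptr<datasource>> make_sources(std::vector<std::string> const& filepaths)
{
  std::vector<std::unique_ptr<datasource>> sources;
  sources.reserve(filepaths.size());
  for (auto const& p : filepaths) { sources.push_back(datasource::create(p)); }
  return sources;
}
```

With this shape, a caller would write something like `reader r(make_sources(paths), reader_options{});`, and the single-source versus multi-source distinction becomes a property of the vector rather than of separate overloads.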

cwharris added the feature request, Needs Triage, cuIO, and tech debt labels Sep 9, 2020
vuule commented Sep 9, 2020

We have a separate path for file names because JSON and CSV derive the compression type from the file extension. If we pass this information in a different way, we can remove the filepath overloads.
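
One way to read "pass this information in a different way" is to resolve the compression type on the caller side and hand it to the reader as an explicit option alongside the datasource. The sketch below illustrates that idea only; the `compression` enum, `extension_of`, `infer_compression`, and `reader_options_sketch` names are assumptions made for this example and do not correspond to the actual libcudf API or its extension mapping.

```cpp
#include <string>

// Hypothetical compression kinds for illustration only.
enum class compression { none, gzip, bzip2, zip, xz };

// Returns the text after the last '.' in the file name, or an empty string
// if the name has no extension (dots in directory names are ignored).
inline std::string extension_of(std::string const& filepath)
{
  auto const dot   = filepath.find_last_of('.');
  auto const slash = filepath.find_last_of("/\\");
  if (dot == std::string::npos) { return ""; }
  if (slash != std::string::npos && dot < slash) { return ""; }
  return filepath.substr(dot + 1);
}

// Assumed extension-to-compression mapping; the real readers may differ.
inline compression infer_compression(std::string const& filepath)
{
  auto const ext = extension_of(filepath);
  if (ext == "gz") { return compression::gzip; }
  if (ext == "bz2") { return compression::bzip2; }
  if (ext == "zip") { return compression::zip; }
  if (ext == "xz") { return compression::xz; }
  return compression::none;
}

// Hypothetical options struct: the reader receives the resolved compression
// type directly and never needs to see the file path.
struct reader_options_sketch {
  compression comp = compression::none;
};
```

A caller holding a path would compute `infer_compression(path)` once, store the result in the options, and then construct the datasource separately, so the reader itself needs no filepath overload.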

cwharris commented

Sounds like that makes this issue dependent on #6188

kkraus14 added the libcudf label and removed the Needs Triage label Sep 15, 2020
github-actions commented

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

vuule commented Sep 28, 2021

This will be closed by #9088 and #9089

vuule commented Nov 12, 2021

@cwharris can we close this one now? :)

cwharris reopened this Nov 15, 2021