Simplify read_csv by removing unnecessary reader/impl classes #9041

Merged 23 commits on Oct 27, 2021
Changes from 11 commits
Commits
9c72e56
simplify io/functions.cpp data source/sink factories
cwharris Aug 14, 2021
9e92ca2
begin replacing csv_reader with pure functions
cwharris Aug 14, 2021
6492349
pass parse_options explicitly in csv_reader
cwharris Aug 14, 2021
3e365b5
replace csv reader impl::select_data_types with pure function
cwharris Aug 14, 2021
a4497c0
replace csv reader impl::column_flags_ member with local variable
cwharris Aug 14, 2021
6d708b7
make csv reader impl::find_first_row_start a standalone function
cwharris Aug 14, 2021
26e37e2
make csv reader impl:col_names_ a local variable
cwharris Aug 14, 2021
9d84753
replace csv reader impl::num_records with local variable.
cwharris Aug 14, 2021
7ce862e
convert csv reader impl ::num_actual_columns and ::num_active_columns…
cwharris Aug 14, 2021
9010fe1
remove csv reader class and impl class in favor of functions
cwharris Aug 14, 2021
7cda106
rearrange some functions to delete some unnecessary declarations.
cwharris Aug 14, 2021
884bde6
Merge branch 'branch-21.10' of github.com:rapidsai/cudf into io-funct…
cwharris Aug 16, 2021
88e2399
remove filepath-related logic from csv and json readers
cwharris Aug 17, 2021
62b9520
remove filepath logic from avro, parquet, orc readers
cwharris Aug 17, 2021
fb01294
move range size padding calculation out of json/csv reader and in to …
cwharris Aug 18, 2021
d422aeb
remove filepaths from json reader
cwharris Aug 18, 2021
a67150e
Merge branch 'io-functions-simplify' into io-simplify-csv
cwharris Aug 18, 2021
640375b
re-delete csv reader_impl header
cwharris Aug 21, 2021
051f0ce
Merge branch 'branch-21.10' of github.com:rapidsai/cudf into io-simpl…
cwharris Aug 24, 2021
07b05e8
re-remove csv/reader_impl.hpp
cwharris Aug 24, 2021
92033c3
fix bad merge where changes in 9079 were deleted.
cwharris Aug 25, 2021
24b3949
add back read_csv impl function get_data_types_from_column_names
cwharris Aug 26, 2021
e8a8887
Merge branch 'branch-21.12' into io-simplify-csv
cwharris Oct 26, 2021
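
Several of the commits above follow one pattern: a value that an impl method used to store in a member (for example `num_records`) becomes a local variable returned from a standalone function. The sketch below illustrates that pattern only. `impl_sketch` and `count_records_pure` are hypothetical names invented for this example, and the newline-counting logic is a stand-in, not the cudf implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Before (hypothetical): the helper communicates its result through a member,
// so callers must know to call count_records() before reading num_records().
class impl_sketch {
 public:
  explicit impl_sketch(std::string data) : data_(std::move(data)) {}

  void count_records()
  {
    num_records_ = static_cast<std::size_t>(std::count(data_.begin(), data_.end(), '\n'));
  }

  std::size_t num_records() const { return num_records_; }

 private:
  std::string data_;
  std::size_t num_records_ = 0;  // hidden state shared between methods
};

// After (hypothetical): a pure function. The input is an explicit parameter,
// the result is the return value, and the caller keeps it in a local variable.
std::size_t count_records_pure(std::string const& data)
{
  return static_cast<std::size_t>(std::count(data.begin(), data.end(), '\n'));
}
```

With the second form, a caller writes `auto const num_records = count_records_pure(data);` and passes `num_records` on explicitly, so there is no call-ordering requirement hidden in the class.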
60 changes: 13 additions & 47 deletions cpp/include/cudf/io/detail/csv.hpp
@@ -24,55 +24,21 @@ namespace cudf {
 namespace io {
 namespace detail {
 namespace csv {
 
 /**
- * @brief Class to read CSV dataset data into columns.
+ * @brief Reads the entire dataset.
+ *
+ * @param sources Input `datasource` object to read the dataset from
+ * @param options Settings for controlling reading behavior
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource to use for device memory allocation
+ *
+ * @return The set of columns along with table metadata
  */
-class reader {
- private:
-  class impl;
-  std::unique_ptr<impl> _impl;
-
- public:
-  /**
-   * @brief Constructor from an array of file paths
-   *
-   * @param filepaths Paths to the files containing the input dataset
-   * @param options Settings for controlling reading behavior
-   * @param stream CUDA stream used for device memory operations and kernel launches
-   * @param mr Device memory resource to use for device memory allocation
-   */
-  explicit reader(std::vector<std::string> const& filepaths,
-                  csv_reader_options const& options,
-                  rmm::cuda_stream_view stream,
-                  rmm::mr::device_memory_resource* mr);
-
-  /**
-   * @brief Constructor from an array of datasources
-   *
-   * @param sources Input `datasource` objects to read the dataset from
-   * @param options Settings for controlling reading behavior
-   * @param stream CUDA stream used for device memory operations and kernel launches
-   * @param mr Device memory resource to use for device memory allocation
-   */
-  explicit reader(std::vector<std::unique_ptr<cudf::io::datasource>>&& sources,
-                  csv_reader_options const& options,
-                  rmm::cuda_stream_view stream,
-                  rmm::mr::device_memory_resource* mr);
-
-  /**
-   * @brief Destructor explicitly-declared to avoid inlined in header
-   */
-  ~reader();
-
-  /**
-   * @brief Reads the entire dataset.
-   *
-   * @param stream CUDA stream used for device memory operations and kernel launches.
-   *
-   * @return The set of columns along with table metadata
-   */
-  table_with_metadata read(rmm::cuda_stream_view stream = rmm::cuda_stream_default);
-};
+table_with_metadata read_csv(std::unique_ptr<cudf::io::datasource>&& source,
+                             csv_reader_options const& options,
+                             rmm::cuda_stream_view stream,
+                             rmm::mr::device_memory_resource* mr);
 
 class writer {
  public:
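
The header change above replaces a construct-then-read class with a single function that takes ownership of the datasource. The following is a minimal, self-contained sketch of that shape change, not the cudf code. `datasource_sketch`, `options_sketch`, `table_sketch`, `read_csv_sketch`, and `reader_sketch` are all hypothetical stand-ins, the stream and memory-resource parameters are omitted, and the parsing is deliberately naive (no quoting, no type inference).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real cudf types.
struct datasource_sketch {
  std::string bytes;  // raw CSV text
};
struct options_sketch {
  char delimiter = ',';
};
using table_sketch = std::vector<std::vector<std::string>>;  // rows of fields

// New shape: one free function that owns the source for the duration of the call.
table_sketch read_csv_sketch(std::unique_ptr<datasource_sketch>&& source,
                             options_sketch const& options)
{
  assert(source != nullptr);
  auto const owned = std::move(source);  // take ownership from the caller
  table_sketch rows;
  std::vector<std::string> row;
  std::string field;
  for (char c : owned->bytes) {
    if (c == options.delimiter) {
      row.push_back(std::move(field));
      field.clear();
    } else if (c == '\n') {
      row.push_back(std::move(field));
      field.clear();
      rows.push_back(std::move(row));
      row.clear();
    } else {
      field.push_back(c);
    }
  }
  // Flush a final row that has no trailing newline.
  if (!field.empty() || !row.empty()) {
    row.push_back(std::move(field));
    rows.push_back(std::move(row));
  }
  return rows;
}

// Old shape, for comparison: a class whose only job is to hold the arguments
// until read() is called. It adds a type and a lifetime but no behavior.
class reader_sketch {
 public:
  reader_sketch(std::unique_ptr<datasource_sketch>&& source, options_sketch const& options)
    : _source(std::move(source)), _options(options)
  {
  }

  table_sketch read() { return read_csv_sketch(std::move(_source), _options); }

 private:
  std::unique_ptr<datasource_sketch> _source;
  options_sketch _options;
};
```

A call site under these assumptions changes from `reader_sketch(std::move(src), opts).read()` to `read_csv_sketch(std::move(src), opts)`. Note that the old wrapper can only be read once, because `read()` moves the source out; the free function makes that single-use ownership explicit in its signature.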