Support casting of Map type to string in JSON reader #14936
@@ -25,6 +25,10 @@
 #include <map>
 #include <vector>

+// Forward declaration of parse_options from parsing_utils.cuh
+namespace cudf::io {
+struct parse_options;
+}
 namespace cudf::io::json {

 /**
@@ -284,6 +288,15 @@ reduce_to_column_tree(tree_meta_t& tree,
                       device_span<size_type> row_offsets,
                       rmm::cuda_stream_view stream);

+/**
+ * @brief Retrieves the parse_options to be used for type inference and type casting
+ *
+ * @param options The reader options to influence the relevant type inference and type casting
+ * options
+ */
+cudf::io::parse_options parsing_options(cudf::io::json_reader_options const& options,
+                                        rmm::cuda_stream_view stream);
+
 /** @copydoc host_parse_nested_json
  * All processing is done in device memory.
  *
@@ -293,6 +306,32 @@ table_with_metadata device_parse_nested_json(device_span<SymbolT const> input,
                                              rmm::cuda_stream_view stream,
                                              rmm::mr::device_memory_resource* mr);

+/**
+ * @brief Get the path data type of a column by path if present in input schema
+ *
+ * @param path path of the column
+ * @param options json reader options which holds schema
+ * @return data type of the column if present
+ */
+std::optional<data_type> get_path_data_type(
+  host_span<std::pair<std::string, cudf::io::json::NodeT>> path,
+  cudf::io::json_reader_options const& options);
+
+/**
+ * @brief Helper class to get path of a column by column id from reduced column tree
+ *
+ */
+struct path_from_tree {
+  host_span<NodeT const> column_categories;
+  host_span<NodeIndexT const> column_parent_ids;
+  host_span<std::string const> column_names;
+  bool is_array_of_arrays;
+  NodeIndexT const row_array_parent_col_id;
+
+  using path_rep = std::pair<std::string, cudf::io::json::NodeT>;
+  std::vector<path_rep> get_path(NodeIndexT this_col_id);
+};
+
 /**
  * @brief Parses the given JSON string and generates table from the given input.
  *

Review comment on get_path_data_type: Can we have this as a member of the ...

If we need to know the tree path only for mixed types, can we create the object only when the option is enabled?

The object is lightweight: it holds spans and a couple of primitives. So it may not matter much if the suggestion is aimed at reducing runtime or memory.
I added the struct because in the future it can have a memoizer.
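
To make the memoizer idea concrete, here is a minimal host-only sketch of what a caching variant of path_from_tree might look like. It is not the cuDF implementation and rests on several assumptions: std::vector replaces host_span, the node_category enum is a stand-in for cudf::io::json::NodeT, the root is marked by a hypothetical parent id of -1, and the is_array_of_arrays and row_array_parent_col_id handling is left out entirely. The name memoized_path_from_tree is also invented for illustration.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-ins for the cuDF types; names and values are assumptions for illustration.
enum class node_category : std::int8_t { STRUCT, LIST, VALUE };
using node_index = std::int32_t;
constexpr node_index root_sentinel = -1;  // hypothetical "no parent" marker

struct memoized_path_from_tree {
  using path_rep = std::pair<std::string, node_category>;

  std::vector<node_category> column_categories;
  std::vector<node_index> column_parent_ids;
  std::vector<std::string> column_names;

  // Cache of already computed root-to-column paths, keyed by column id.
  std::unordered_map<node_index, std::vector<path_rep>> cache;

  // Returns the path from the root column down to this_col_id (inclusive).
  std::vector<path_rep> const& get_path(node_index this_col_id)
  {
    auto const it = cache.find(this_col_id);
    if (it != cache.end()) { return it->second; }

    std::vector<path_rep> path;
    node_index const parent = column_parent_ids[this_col_id];
    if (parent != root_sentinel) {
      // Copy the parent's (now cached) path, then extend it with this column.
      path = get_path(parent);
    }
    path.emplace_back(column_names[this_col_id], column_categories[this_col_id]);
    return cache.emplace(this_col_id, std::move(path)).first->second;
  }
};
```

With this shape, looking up a deep column also fills the cache for every ancestor, so sibling columns only pay for their own last step. The trade-off is that the struct is no longer just spans and primitives, which is the property the reply above relies on for cheap unconditional construction.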