Return per-file metadata from readers #10782
Codecov Report
@@             Coverage Diff              @@
##           branch-22.06   #10782  +/-   ##
================================================
+ Coverage         86.40%   86.45%  +0.04%
================================================
  Files               143      143
  Lines             22448    22493     +45
================================================
+ Hits              19396    19446     +50
+ Misses             3052     3047      -5
// Assumes that all input files have the same metadata
// TODO: verify this assumption
auto it = keyval_maps[0].find("pandas");
if (it != keyval_maps[0].end()) {
Not sure how it works if the first file is written by pandas (thus has `index_columns`) while the second one is written by cudf (has no `index_columns`). To be verified once the Python changes have been finished.
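One way to probe the assumption questioned above is to compare the `"pandas"` entry of every input file rather than reading only `keyval_maps[0]`. The following is a minimal sketch, not the reader's actual code: the `kv_map` alias and the `pandas_metadata_consistent` helper are hypothetical names, and the `"pandas"` value is treated as an opaque string (no JSON parsing of `index_columns`).

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical alias for one file's key-value metadata.
using kv_map = std::unordered_map<std::string, std::string>;

// Returns true when all files agree on the "pandas" entry:
// either every file lacks the key, or every file has it with the same value.
bool pandas_metadata_consistent(std::vector<kv_map> const& keyval_maps)
{
  if (keyval_maps.empty()) { return true; }

  auto const first     = keyval_maps.front().find("pandas");
  bool const first_has = first != keyval_maps.front().end();

  for (auto const& m : keyval_maps) {
    auto const it  = m.find("pandas");
    bool const has = it != m.end();
    if (has != first_has) { return false; }                      // mixed pandas / non-pandas files
    if (has && it->second != first->second) { return false; }    // differing pandas metadata
  }
  return true;
}
```

With a check like this, the mixed case (first file from pandas, second from cudf) is detected explicitly instead of silently inheriting the first file's index columns.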
rerun tests
@gpucibot merge
Issue #10775
C++ side of the fix for the issue above.
Adds `per_file_user_data` to `table_metadata` so that readers can return one map per file instead of merging the maps from multiple input files into a single map, which overwrites elements that share a key.
The original `user_data` member now holds the metadata from the first input file, instead of trying (and failing) to merge the maps. It will be removed in the future. Got no good way to deprecate it, as the `table_metadata` struct does not have encapsulation :(
"breaking" label because the logic of `user_data` changed. Not expected to impact any working code.
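The difference between a merged map and per-file maps can be sketched as follows. This is an illustration only: `metadata_sketch`, `make_metadata`, `file_metadata`, and `merge_overwriting` are hypothetical names, the struct is a simplified stand-in rather than the real `table_metadata`, and the "later file wins" merge is just one assumed form of the lossy behavior described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical alias for one file's key-value metadata.
using kv_map = std::unordered_map<std::string, std::string>;

// Simplified stand-in for the two metadata members discussed in this PR.
struct metadata_sketch {
  kv_map user_data;                         // metadata of the first input file only
  std::vector<kv_map> per_file_user_data;   // one map per input file, in input order
};

// Fill both members from the maps read from each input file.
metadata_sketch make_metadata(std::vector<kv_map> const& keyval_maps)
{
  metadata_sketch out;
  out.per_file_user_data = keyval_maps;
  if (!keyval_maps.empty()) { out.user_data = keyval_maps.front(); }
  return out;
}

// Access the metadata of file `i`; the index must be in range.
kv_map const& file_metadata(metadata_sketch const& md, std::size_t i)
{
  assert(i < md.per_file_user_data.size());
  return md.per_file_user_data[i];
}

// Assumed lossy merge for comparison: later files overwrite earlier keys.
kv_map merge_overwriting(std::vector<kv_map> const& keyval_maps)
{
  kv_map merged;
  for (auto const& m : keyval_maps) {
    for (auto const& [key, value] : m) { merged[key] = value; }
  }
  return merged;
}
```

If two files both carry a `"pandas"` key with different values, `merge_overwriting` keeps only one of them, while `per_file_user_data` keeps both and `user_data` consistently reports the first file's value.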