-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update JNI JSON reader column compatability for Spark #13477
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nits, lgtm.
DType[] ret = new DType[types.size()]; | ||
for (int i = 0; i < types.size(); i++) { | ||
ret[i] = types.get(i); | ||
} | ||
return ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DType[] ret = new DType[types.size()]; | |
for (int i = 0; i < types.size(); i++) { | |
ret[i] = types.get(i); | |
} | |
return ret; | |
return types.toArray(new DType[types.size()]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat. That would pack nicely to:
return types == null ? null : types.toArray(new DType[types.size()]);
int[] dTypeIds, int[] dTypeScales, | ||
String filePath, long address, long length, | ||
boolean dayFirst, boolean lines) throws CudfException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: These parameters are now mis-indented
// We might need to rearrange the columns to match what we want. | ||
DType[] types = schema.getTypes(); | ||
ColumnVector[] columns = new ColumnVector[neededColumns.length]; | ||
try (Table tbl = twm.releaseTable()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Seems like this should be the first thing in the method so no matter what happens (NPE, whatever) we're closing the table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TableWithMetadata will close the table if we don't pull it out first.
@@ -968,6 +971,42 @@ public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer) { | |||
return readJSON(schema, opts, buffer, 0, buffer.length); | |||
} | |||
|
|||
private static Table gatherJSONColumns(Schema schema, TableWithMeta twm) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to have a comment on this method explaining it will release and close the underlying table of twm as a side-effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one nitpick, but LGTM!
/merge |
/merge |
Description
This moves the logic to update the columns returned from the JSON reader to java. It also updated the code to be able to deal with requested columns that were not in the data. It is not perfect because it will not work if the input file had no columns at all in it.
But it fixes issues for a file that has valid columns in it, but none of them are the columns that we requested.
This is a work around for #13473, but is not perfect.
Checklist