Update JNI JSON reader column compatibility for Spark #13477

Merged
merged 5 commits into from
Jun 1, 2023

Conversation

revans2
Contributor

@revans2 revans2 commented May 31, 2023

Description

This moves the logic that updates the columns returned from the JSON reader into Java. It also updates the code to handle requested columns that are not in the data. It is not perfect: it will not work if the input file has no columns at all, as in:

{}
{}

But it does fix the case where a file has valid columns in it, none of which are the columns that we requested.

This is a workaround for #13473, not a complete fix.
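
The column handling described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: plain Java collections stand in for cuDF tables and column vectors, and the names `GatherColumnsSketch` and `gather` are hypothetical. The row count is passed in explicitly because the description does not say how the real code obtains it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the column-gathering step; plain collections
// replace cuDF tables and column vectors.
final class GatherColumnsSketch {

  // Returns one column per requested name, in the requested order.
  // Columns that were requested but are missing from `parsed` are
  // filled with `rowCount` nulls.
  static Map<String, List<Object>> gather(List<String> requested,
                                          Map<String, List<Object>> parsed,
                                          int rowCount) {
    Map<String, List<Object>> result = new LinkedHashMap<>();
    for (String name : requested) {
      List<Object> column = parsed.get(name);
      if (column == null) {
        // Requested column was not in the data: substitute an all-null column.
        column = new ArrayList<>(Collections.<Object>nCopies(rowCount, null));
      }
      result.put(name, column);
    }
    return result;
  }

  public static void main(String[] args) {
    // The "file" contained columns a and b, but the caller asked for c and a.
    Map<String, List<Object>> parsed = new LinkedHashMap<>();
    parsed.put("a", Arrays.<Object>asList(1, 2));
    parsed.put("b", Arrays.<Object>asList("x", "y"));

    Map<String, List<Object>> out = gather(Arrays.asList("c", "a"), parsed, 2);
    System.out.println(out); // prints {c=[null, null], a=[1, 2]}
  }
}
```

Columns that were read but not requested (`b` here) are simply dropped, and the result follows the requested order rather than the file order.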

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@revans2 revans2 added the labels bug (Something isn't working), 3 - Ready for Review (Ready for review by team), 4 - Needs cuDF (Java) Reviewer, and non-breaking (Non-breaking change) on May 31, 2023
@revans2 revans2 self-assigned this May 31, 2023
@revans2 revans2 requested a review from a team as a code owner May 31, 2023 15:35
@github-actions github-actions bot added the label Java (Affects Java cuDF API) on May 31, 2023
Member

@jlowe jlowe left a comment

Small nits, lgtm.

Comment on lines +82 to +86
DType[] ret = new DType[types.size()];
for (int i = 0; i < types.size(); i++) {
ret[i] = types.get(i);
}
return ret;
Member

Suggested change
- DType[] ret = new DType[types.size()];
- for (int i = 0; i < types.size(); i++) {
- ret[i] = types.get(i);
- }
- return ret;
+ return types.toArray(new DType[types.size()]);

Contributor

Neat. That would pack nicely to:

return types == null ? null : types.toArray(new DType[types.size()]); 
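
As a self-contained illustration of that one-liner, here is a small sketch in which `String` stands in for `DType` so that it compiles without cuDF; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example; String stands in for DType.
final class ToArraySketch {

  // Converts the list to an array, passing a null list through as null.
  static String[] toArrayOrNull(List<String> types) {
    return types == null ? null : types.toArray(new String[types.size()]);
  }

  public static void main(String[] args) {
    List<String> types = new ArrayList<>(Arrays.asList("INT32", "STRING"));
    System.out.println(Arrays.toString(toArrayOrNull(types))); // prints [INT32, STRING]
    System.out.println(toArrayOrNull(null) == null);           // prints true
  }
}
```

`List.toArray(T[])` allocates a new array of the right size when the one passed in is too small, so `new String[0]` would work as the argument too.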

Comment on lines 240 to 242
int[] dTypeIds, int[] dTypeScales,
String filePath, long address, long length,
boolean dayFirst, boolean lines) throws CudfException;
Member

Nit: These parameters are now mis-indented

// We might need to rearrange the columns to match what we want.
DType[] types = schema.getTypes();
ColumnVector[] columns = new ColumnVector[neededColumns.length];
try (Table tbl = twm.releaseTable()) {
Member

Nit: Seems like this should be the first thing in the method so no matter what happens (NPE, whatever) we're closing the table.

Contributor Author

TableWithMeta will close the table if we don't pull it out first.

@@ -968,6 +971,42 @@ public static Table readJSON(Schema schema, JSONOptions opts, byte[] buffer) {
return readJSON(schema, opts, buffer, 0, buffer.length);
}

private static Table gatherJSONColumns(Schema schema, TableWithMeta twm) {
Member

Would be nice to have a comment on this method explaining it will release and close the underlying table of twm as a side-effect.
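
The ownership pattern under discussion in these two threads can be sketched as follows. This is an illustration under assumptions, not cuDF code: `Resource` and `Holder` are hypothetical stand-ins for the table and its metadata wrapper, and the doc comment on `useAndClose` shows the kind of side-effect note requested here.

```java
// Hypothetical stand-ins: Resource plays the table, Holder plays the wrapper.
final class ReleaseSketch {

  static final class Resource implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
      closed = true;
    }
  }

  static final class Holder implements AutoCloseable {
    private Resource resource;

    Holder(Resource resource) {
      this.resource = resource;
    }

    // Transfers ownership to the caller; the holder no longer closes it.
    Resource release() {
      Resource r = resource;
      resource = null;
      return r;
    }

    // Closes the resource only if it has not been released.
    @Override
    public void close() {
      if (resource != null) {
        resource.close();
        resource = null;
      }
    }
  }

  /**
   * Uses the holder's resource. As a side effect, this releases the
   * resource from the holder and closes it before returning.
   */
  static void useAndClose(Holder holder) {
    try (Resource r = holder.release()) {
      // Work with r here; it is closed when the block exits, even on error.
    }
  }

  public static void main(String[] args) {
    Resource released = new Resource();
    Holder holder = new Holder(released);
    useAndClose(holder);
    holder.close(); // no-op: ownership was already transferred
    System.out.println(released.closed); // prints true

    Resource kept = new Resource();
    new Holder(kept).close(); // never released, so the holder closes it
    System.out.println(kept.closed); // prints true
  }
}
```

Either path closes the resource exactly once: the holder closes it if nothing was released, and the `try` block closes it once ownership has moved.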

Contributor

@mythrocks mythrocks left a comment

Just one nitpick, but LGTM!

@revans2
Contributor Author

revans2 commented Jun 1, 2023

/merge

@revans2
Contributor Author

revans2 commented Jun 1, 2023

/merge

Labels
3 - Ready for Review (Ready for review by team), bug (Something isn't working), Java (Affects Java cuDF API), non-breaking (Non-breaking change)
3 participants