Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable JsonToStructs for input schema other than Map<String, String> #8557

Merged
merged 3 commits into from
Jun 13, 2023

Conversation

cindyyuanjiang
Copy link
Collaborator

@cindyyuanjiang cindyyuanjiang commented Jun 12, 2023

We ran into JsonToStructs test failure: #8558. To prevent potential data corruption for users, we will tag JsonToStructs to not work on GPU for any input schema other than Map<String, String> and xfail corresponding integration tests until we have a fix for the above issue.

…nd xfail corresponding tests

Signed-off-by: Cindy Jiang <[email protected]>
@cindyyuanjiang cindyyuanjiang self-assigned this Jun 12, 2023
@cindyyuanjiang
Copy link
Collaborator Author

build

@gerashegalov
Copy link
Collaborator

Is it possible to file an OSS issue describing the bug with repro steps and reference it in the PR ?

}
case _ => ()
case _ =>
willNotWorkOnGpu("JsonToStructs only supports MapType<StringType, StringType> " +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not introduced by this PR but may be worth fixing

  1. it' should be GpuJsonToStructs since CPU JsonToStructs very well supports it
  2. The end user does not necessarily know or care about the backing Scala class

Should we say "from_json on GPU only supports MapType<StringType, StringType> ..." ?

I would also include the actual type provided by the user in that message

Copy link
Collaborator Author

@cindyyuanjiang cindyyuanjiang Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I updated the message.
For the actual type, I didn't include it because it would have been included already in the from_json expression, e.g.: !Expression <JsonToStructs> from_json(StructField(a,StringType,true), StructField(b,IntegerType,true), a#116, Some(UTC)) cannot run on GPU because JsonToStructs only supports MapType<StringType, StringType> input schema, but received StructType(StructField(a,StringType,true),StructField(b,IntegerType,true))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to know that input types are already included. agreed no reason to repeat it then.

@@ -380,6 +380,7 @@ def test_from_json_map_fallback():
'JsonToStructs',
conf={"spark.rapids.sql.expression.JsonToStructs": True})

@pytest.mark.xfail(reason='GPU from_json does not support struct type schema')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once the issue is filed we can use its URL as the reason #8557 (comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes updated. Thank you!

Signed-off-by: Cindy Jiang <[email protected]>
@sameerz sameerz added the bug Something isn't working label Jun 13, 2023
@cindyyuanjiang
Copy link
Collaborator Author

build

@@ -3382,11 +3382,9 @@ object GpuOverrides extends Logging {
override def tagExprForGpu(): Unit =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 3378 can we remove the STRUCT support? That should update the docs to make it clear that we only supp Map<STRING,STRING>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Changes updated.

revans2
revans2 previously approved these changes Jun 13, 2023
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit

gerashegalov
gerashegalov previously approved these changes Jun 13, 2023
Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, potentially pending Bobby's comment

Signed-off-by: Cindy Jiang <[email protected]>
@cindyyuanjiang cindyyuanjiang dismissed stale reviews from gerashegalov and revans2 via 1456cb4 June 13, 2023 02:09
@cindyyuanjiang
Copy link
Collaborator Author

build

@pxLi pxLi merged commit d1f681e into NVIDIA:branch-23.06 Jun 13, 2023
@cindyyuanjiang cindyyuanjiang deleted the disable-gpu-json-to-struct branch June 13, 2023 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants