-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable JsonToStructs
for input schema other than Map<String, String>
#8557
Disable JsonToStructs
for input schema other than Map<String, String>
#8557
Conversation
…nd xfail corresponding tests Signed-off-by: Cindy Jiang <[email protected]>
build |
Is it possible to file an OSS issue describing the bug with repro steps and reference it in the PR ? |
} | ||
case _ => () | ||
case _ => | ||
willNotWorkOnGpu("JsonToStructs only supports MapType<StringType, StringType> " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not introduced by this PR but may be worth fixing
- it' should be GpuJsonToStructs since CPU JsonToStructs very well supports it
- The end user does not necessarily know or care about the backing Scala class
Should we say "from_json on GPU only supports MapType<StringType, StringType> ..." ?
I would also include the actual type provided by the user in that message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I updated the message.
For the actual type, I didn't include it because it would have been included already in the from_json
expression, e.g.: !Expression <JsonToStructs> from_json(StructField(a,StringType,true), StructField(b,IntegerType,true), a#116, Some(UTC)) cannot run on GPU because JsonToStructs only supports MapType<StringType, StringType> input schema, but received StructType(StructField(a,StringType,true),StructField(b,IntegerType,true))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to know that input types are already included. agreed no reason to repeat it then.
@@ -380,6 +380,7 @@ def test_from_json_map_fallback(): | |||
'JsonToStructs', | |||
conf={"spark.rapids.sql.expression.JsonToStructs": True}) | |||
|
|||
@pytest.mark.xfail(reason='GPU from_json does not support struct type schema') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once the issue is filed we can use its URL as the reason #8557 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes updated. Thank you!
Signed-off-by: Cindy Jiang <[email protected]>
build |
@@ -3382,11 +3382,9 @@ object GpuOverrides extends Logging { | |||
override def tagExprForGpu(): Unit = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On line 3378 can we remove the STRUCT support? That should update the docs to make it clear that we only supp Map<STRING,STRING>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Changes updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a nit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, potentially pending Bobby's comment
Signed-off-by: Cindy Jiang <[email protected]>
1456cb4
build |
We ran into
JsonToStructs
test failure: #8558. To prevent potential data corruption for users, we will tagJsonToStructs
to not work on GPU for any input schema other thanMap<String, String>
and xfail corresponding integration tests until we have a fix for the above issue.