Use parser from spark to normalize json path in GetJsonObject #10466
```diff
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2021-2023, NVIDIA CORPORATION.
+ * Copyright (c) 2021-2024, NVIDIA CORPORATION.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -16,7 +16,7 @@
 package com.nvidia.spark.rapids

-import ai.rapids.cudf.{ColumnVector, GetJsonObjectOptions}
+import ai.rapids.cudf.{ColumnVector, GetJsonObjectOptions, Scalar}
 import com.nvidia.spark.rapids.Arm.withResource

 import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression}
@@ -32,8 +32,20 @@ case class GpuGetJsonObject(json: Expression, path: Expression)
   override def nullable: Boolean = true
   override def prettyName: String = "get_json_object"

-  override def doColumnar(lhs: GpuColumnVector, rhs: GpuScalar): ColumnVector = {
-    lhs.getBase().getJSONObject(rhs.getBase,
+  def normalizeJsonPath(path: GpuScalar): Scalar = {
+    val pathStr = if (path.isValid) {
+      path.getValue.toString()
+    } else {
+      throw new IllegalArgumentException("Invalid path")
+    }
+    // Remove all leading whitespace before names, i.e. erase whitespace after . or ['
+    val normalizedPath = pathStr.replaceAll("""\.(\s+)""", ".").replaceAll("""\['(\s+)""", "['")
+    Scalar.fromString(normalizedPath)
+  }
+
+  override def doColumnar(lhs: GpuColumnVector, rhs: GpuScalar): ColumnVector = {
+    val normalizedScalar = normalizeJsonPath(rhs)
+    lhs.getBase().getJSONObject(normalizedScalar,
       GetJsonObjectOptions.builder().allowSingleQuotes(true).build());
   }
```

Review comment on the regex-based `normalizeJsonPath`:

Reviewer: I don't like us using regular expressions here. It might work, but I don't see it as a good long-term solution. Could we please take the code that Spark uses to parse a JSON path and modify it so that we output a normalized JSON path instead? That would also give us the ability to detect errors in the path before we send it to CUDF and to make sure that we are compatible with what CUDF supports. Meaning we really just reuse their code and then convert the parsed result back into a normalized path string.

Author: Thanks! Updated to use JsonPathParser; this approach also looks to fix some other issues related to JSON path. I updated the PR description to explain why this PR can fix these issues.
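To make the suggested direction concrete, here is a rough sketch of normalizing a path from parsed instructions. It assumes an adapted copy of Spark's (package-private) JsonPathParser whose output resembles the `PathInstruction` ADT below; the ADT and the `unparse` helper are illustrative only, not the actual Spark or spark-rapids code.

```scala
// Hypothetical, simplified mirror of the instruction list an adapted copy of
// Spark's JsonPathParser might produce; names here are assumptions for this sketch.
sealed trait PathInstruction
object PathInstruction {
  case object Subscript extends PathInstruction            // '[' ... ']'
  case object Key extends PathInstruction                  // '.'
  case object Wildcard extends PathInstruction             // '*'
  case class Index(index: Long) extends PathInstruction    // [123]
  case class Named(name: String) extends PathInstruction   // .name or ['name']
}

object NormalizeJsonPath {
  import PathInstruction._

  // Re-render the parsed instructions as a canonical path string so that
  // whitespace and quoting variants collapse to one form before reaching cuDF.
  // Returning None signals a construct we do not want to hand to cuDF.
  def unparse(instructions: List[PathInstruction]): Option[String] = {
    val sb = new StringBuilder("$")
    var supported = true
    instructions.foreach {
      case Named(name)     => sb.append('.').append(name)
      case Index(i)        => sb.append('[').append(i).append(']')
      case Key | Subscript => ()                 // structural markers, nothing to emit
      case Wildcard        => supported = false  // treat wildcards as unsupported in this sketch
    }
    if (supported) Some(sb.toString) else None
  }
}
```

For example, `unparse(List(Key, Named("a"), Subscript, Index(0)))` yields `Some("$.a[0]")`, while any path the parser rejects surfaces as an error before the query reaches cuDF, which is the compatibility check the reviewer asks for.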
Review comment on throwing `IllegalArgumentException` for an invalid path:

Reviewer: I don't know how big of a deal this is, but it means that the path is null. We should not throw an exception here; we should return None, because a null path should result in a null for all resulting values rather than an exception.

Author: Done.
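A minimal sketch of that change, written as the two methods might sit inside `GpuGetJsonObject` from the diff above (the imports from the diff plus `ai.rapids.cudf.DType` are assumed; this is not the final implementation):

```scala
// Sketch only: return None for a null path so the caller can emit an all-null
// column instead of throwing.
def normalizeJsonPath(path: GpuScalar): Option[Scalar] = {
  if (!path.isValid) {
    None // null path => no exception, caller handles it
  } else {
    val pathStr = path.getValue.toString
    // Same whitespace normalization as in the diff above.
    val normalized = pathStr.replaceAll("""\.(\s+)""", ".").replaceAll("""\['(\s+)""", "['")
    Some(Scalar.fromString(normalized))
  }
}

override def doColumnar(lhs: GpuColumnVector, rhs: GpuScalar): ColumnVector = {
  normalizeJsonPath(rhs) match {
    case Some(normalized) =>
      withResource(normalized) { path =>
        lhs.getBase().getJSONObject(path,
          GetJsonObjectOptions.builder().allowSingleQuotes(true).build())
      }
    case None =>
      // A null path results in a null value for every row.
      withResource(Scalar.fromNull(DType.STRING)) { nullStr =>
        ColumnVector.fromScalar(nullStr, lhs.getBase().getRowCount().toInt)
      }
  }
}
```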