-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support JSON_EXISTS, JSON_VALUE and JSON_QUERY functions #9831
Conversation
fe236ff
to
70e48c9
Compare
0acac64
to
e1f3ccd
Compare
5fa3108
to
1866670
Compare
a53e24a
to
5b355b5
Compare
4a4b801
to
a17b750
Compare
core/trino-parser/src/main/antlr4/io/trino/sql/parser/SqlBase.g4
Outdated
Show resolved
Hide resolved
core/trino-parser/src/main/antlr4/io/trino/jsonpath/JsonPath.g4
Outdated
Show resolved
Hide resolved
b1c768f
to
b5d5f26
Compare
this(Optional.of(location), inputExpression, inputFormat, jsonPath, pathParameters, errorBehavior); | ||
} | ||
|
||
public JsonExists( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, the rewriter should create the JsonExists node with the same location as the original one, and we should remove the location-less constructor. Same for the other ones.
return toSlice(jsonExpression, UTF_32, errorBehavior, omitQuotes); | ||
} | ||
|
||
private static Slice toSlice(JsonNode json, EncodingSpecificConstants constants, long errorBehavior, boolean omitQuotes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd call this serialize
import static io.trino.spi.type.TypeUtils.readNativeValue; | ||
import static io.trino.sql.analyzer.ExpressionAnalyzer.JSON_NO_PARAMETERS_ROW_TYPE; | ||
|
||
public final class JsonParametersToArrayHelper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this class for? The name is too vague and the method below is not very clear. I'm thinking of whether it should be renamed or placed elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a javadoc, and renamed the class to ParameterUtil
.
function = plannerContext.getMetadata().resolveFunction(session, QualifiedName.of(JSON_EXISTS_FUNCTION_NAME), fromTypes(argumentTypes.build())); | ||
} | ||
catch (TrinoException e) { | ||
if (e.getLocation().isPresent()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would this happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When function is not registered, which shouldn't happen. Should I remove the try-catch?
|
||
// wrapper behavior, empty behavior and error behavior will be passed as arguments to function | ||
// quotes behavior is handled by the corresponding output function | ||
ImmutableList.Builder<Type> argumentTypes = ImmutableList.<Type>builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call .build()
here directly and declare the variable as List<Type>
. No need to have an explicit variable to hold the intermediate.
// process(node.getPath()); | ||
// types.put(PathNodeRef.of(node), BOOLEAN); | ||
// return BOOLEAN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove or add a TODO comment.
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
package io.trino.sql.planner; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be in the "planner" package, but in the io.trino.json
package.
this.invocationContext = requireNonNull(invocationContext, "invocationContext is null"); | ||
} | ||
|
||
public List<Object> evaluate(IrJsonPath path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems backward. The fixed part should be the path (i.e., the "program" or "IR"), the functions, etc. The variable part should be the json over which the path is being evaluated.
That will allow caching resolutions and other similar decisions and attach them to the corresponding IR nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored
right = (TypedValue) rightObject; | ||
} | ||
|
||
ResolvedOperatorAndCoercions operators = getOperators(OperatorType.valueOf(node.getOperator().name()), left.getType(), right.getType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the operator being cached? Ideally, it should be attached to the node
(via a map of node->operator) to avoid having to re-resolve it every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operators are cached in CachingResolver
.
b5341a7
to
04d22e4
Compare
@martint I refactored operator caching according to our discussion. Please take a look at the last commit. |
32c0f72
to
4838a88
Compare
*/ | ||
public void initializeIfNecessary(IrJsonPath path, ConnectorSession session) | ||
{ | ||
if (evaluator != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious by looking at this why it's safe to skip re-initializing the evaluator if it's already set. Technically, and from the JsonPathInvocationContext's perspective, the path could be different. The fact that it's not, is a side effect of how the JSON constructs in SQL work (the path is constant) and how they are mapped to functions. But that knowledge is too far removed from this site.
I don't think this class is needed. Instead, the functions should initialize and hold on to the JsonPathEvaluator
instance upon first invocation (i.e., wherever they are calling initializeIfNecessary
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that JsonPathInvocationContext
is needed. It is passed through the instanceFactory
and this way an instance of JsonPathInvocationContext
persists through multiple rows processed by the JSON-funciton.
We can't skip it and just pass JsonPathEvaluator
as the context object, because we can create the JsonPathEvaluator
only after the function is invoked with certain arguments (including path). The context object already exists at that moment (it comes as one of the arguments to the function).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. This is the state object for the (stateful) function. However, it could be changed to be just that -- a state object, and leave all logic to be done in the caller.
Let `handleSubqueries` find subqueries nested under `Expression` nodes, linked via children not being `Expression`.
`json_extract` and `json_extract_scalar` are old JSON-processing functions which have very limited capabilities, and are not compliant with the spec. However, they have simple lightweight implementation, optimized for the use case. `json_query` and `json_value` are new spec-compliant JSON-processing functions, which support the complete specification for JSON path. Their implementation is much more complicated, resulting both from the scope of the supported feature, and the "streaming" semantics. These benchmarks are to compare both implementations applied to the common use-case. Benchmark json_extract vs json_query: Benchmark (depth) Mode Cnt Score Error Units BenchmarkJsonFunctions.benchmarkJsonExtractFunction 1 avgt 15 813.030 ± 12.263 ns/op BenchmarkJsonFunctions.benchmarkJsonExtractFunction 3 avgt 15 939.202 ± 94.621 ns/op BenchmarkJsonFunctions.benchmarkJsonExtractFunction 6 avgt 15 1004.352 ± 12.945 ns/op BenchmarkJsonFunctions.benchmarkJsonQueryFunction 1 avgt 15 1136.371 ± 17.502 ns/op BenchmarkJsonFunctions.benchmarkJsonQueryFunction 3 avgt 15 1399.780 ± 27.967 ns/op BenchmarkJsonFunctions.benchmarkJsonQueryFunction 6 avgt 15 1666.810 ± 29.572 ns/op Benchmark json_extract_scalar vs json_value: Benchmark (depth) Mode Cnt Score Error Units BenchmarkJsonFunctions.benchmarkJsonExtractScalarFunction 1 avgt 15 644.762 ± 9.195 ns/op BenchmarkJsonFunctions.benchmarkJsonExtractScalarFunction 3 avgt 15 720.244 ± 13.288 ns/op BenchmarkJsonFunctions.benchmarkJsonExtractScalarFunction 10 avgt 15 928.069 ± 17.623 ns/op BenchmarkJsonFunctions.benchmarkJsonValueFunction 1 avgt 15 811.158 ± 14.694 ns/op BenchmarkJsonFunctions.benchmarkJsonValueFunction 3 avgt 15 1042.795 ± 40.555 ns/op BenchmarkJsonFunctions.benchmarkJsonValueFunction 10 avgt 15 1663.328 ± 27.320 ns/op
private final Metadata metadata; | ||
private final Session session; | ||
private final TypeCoercion typeCoercion; | ||
private final NonEvictableCache<IrPathNodeRef<IrPathNode>, NonEvictableCache<InputTypes, ResolvedOperatorAndCoercions>> operators = buildNonEvictableCache(CacheBuilder.newBuilder().maximumSize(MAX_CACHE_SIZE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not flatten the caches an use a composite key made of IrPathNode ref
, left type
and right type
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then one node could take too much space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be ok, especially if there's a case where a single node sees many combinations of types.
It would also make the code simpler. If later we find that this is not sufficient, we can think of better caching and eviction strategies, or JIT-style codegen/execution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache is now flattened.
Nice! |
No description provided.