The problem of comparing an evaluated expression with a json literal #74

Closed
danielaparker opened this issue Mar 13, 2021 · 15 comments

@danielaparker

danielaparker commented Mar 13, 2021

This issue relates to suggested expression syntax described in #17.

For a given JSON document

[[1, 2, 3], [1], [2, 3], 1, 2]

the following queries are considered:

$[?(@ == 2)]               (1)

$[?(@ == [1,2,3])]       (2)

$[?(@[0:1]==[1])]        (3)

In these three queries, the left side of the expression is a JSONPath selector against the current node[1], and the right side is a JSON literal.

If the selector on the left side is evaluated such that only a single value is returned when only one match is possible (instead of an array of a single value), the following results are expected:

[2]                      (4)

[[1, 2, 3]]              (5)

[[1, 2, 3], [1]]         (6)

Cpp (jsoncons), Java (com.jayway.jsonpath), Objective-C (SMJJSONPath), and Python (jsonpath) are consistent with these results. For comparable queries in JSONPath Comparison, see "Filter expression with bracket notation with number", "Filter expression with equals array", and "Filter expression with equals array for array slice with range 1".

But let's consider the case where the selector on the left side is evaluated such that an array is always returned, including an array of a single value (the typical behavior of a JSONPath evaluator). If we take the first element of that array to compare with the JSON literal on the right, we match (4) and (5), but not (6).

It seems to me that for sensible results, in the context of filter expressions, the selector on the left side must be evaluated such that only a single value is returned when only one match is possible.
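
The two evaluation strategies described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any of the implementations mentioned: the helper names (`select_current`, `select_slice_0_1`, `filter_unwrapped`, `filter_first_element`) and the `singular` flag are assumptions made for the example.

```python
doc = [[1, 2, 3], [1], [2, 3], 1, 2]

def select_current(node):
    """'@': exactly one match is possible, the current node itself."""
    return [node]

def select_slice_0_1(node):
    """'@[0:1]': a slice may match several values; non-arrays match nothing."""
    return node[0:1] if isinstance(node, list) else []

def filter_unwrapped(doc, selector, singular, literal):
    """Compare the bare value when only one match is possible,
    otherwise compare the array of matches."""
    selected = []
    for node in doc:
        matches = selector(node)
        if singular:
            if matches and matches[0] == literal:
                selected.append(node)
        elif matches == literal:
            selected.append(node)
    return selected

def filter_first_element(doc, selector, literal):
    """Always evaluate to an array and compare only its first element."""
    selected = []
    for node in doc:
        matches = selector(node)
        if matches and matches[0] == literal:
            selected.append(node)
    return selected

# Queries (1)-(3) with single values left unwrapped: results (4)-(6).
print(filter_unwrapped(doc, select_current, True, 2))            # [2]
print(filter_unwrapped(doc, select_current, True, [1, 2, 3]))    # [[1, 2, 3]]
print(filter_unwrapped(doc, select_slice_0_1, False, [1]))       # [[1, 2, 3], [1]]

# Query (3) under the "first element of the array" rule: nothing matches,
# because the first element 1 is compared with the literal [1].
print(filter_first_element(doc, select_slice_0_1, [1]))          # []
```

For queries (1) and (2) the two strategies agree; they only diverge once the selector can produce more than one value, as the slice in (3) does.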

I note that most implementations do not support comparisons between evaluated JSONPath expressions and JSON arrays. In JSONPath Comparison, only six give the expected result for a query analogous to (2), and four for query (3).

Most implementations give the expected result for (1).

[1]: In many implementations, particularly ones using an external script engine, these aren't in fact evaluated as JSONPath selectors, but in some they are.

@danielaparker danielaparker changed the title Issues with comparing an evaluated JSONPath expression to a json literal The problem of comparing an evaluated expression with a json literal Mar 14, 2021
@gregsdennis
Collaborator

gregsdennis commented Mar 14, 2021

> It seems to me that for sensible results, in the context of filter expressions, the selector on the left side must be evaluated such that only a single value is returned when only one match is possible.

This seems to touch on value equivalency a bit, which @danielaparker commented on in another issue and I replied to.

Consider the cases (1) and (4). If the expression returns a single value, it could be valid to operate on that single value directly rather than its containing array (which is what is technically returned). JSON Logic does something similar to make authoring logic constructs simpler. It's a sort of short-hand or syntax sugar.

I can see value in this.

(Also, I've already coded it, although it went against all of the strongly-typed bones in my body.)

@danielaparker
Author

danielaparker commented Mar 14, 2021

> > It seems to me that for sensible results, in the context of filter expressions, the selector on the left side must be evaluated such that only a single value is returned when only one match is possible.
>
> This seems to touch on value equivalency a bit, which @danielaparker commented on in another issue and I replied to.
>
> Consider the cases (1) and (4). If the expression returns a single value, it could be valid to operate on that single value directly rather than its containing array (which is what is technically returned).

Right, taking the first element is the obvious rule, and corresponds to intuition when the JSONPath expression does in fact evaluate to a single value, as in (1) and (2). But that rule doesn't correspond to intuition when the JSONPath expression evaluates to multiple values, such as with a wildcard selector, a union selector, or a slice selector, as in (3).

In any case, the draft needs to specify rules for how the results of these expressions are to be compared. The simplest would be to define operators only between single values, which is what the majority of implementations have. But a few implementations have been more ambitious.

@gregsdennis
Collaborator

gregsdennis commented Mar 14, 2021

So supposing that the spec defines behavior for single values / single-valued arrays, what's the behavior when multiple values are returned? I think either:

  • this needs to be explicitly defined (e.g. operators return false, null, some unevaluatable state, etc.), or
  • the spec should explicitly state that behavior is not defined.

The nice thing about the second option is that it leaves the behavior open to the implementations for now. Then future versions of the spec can use the behavior defined by the implementations to define it officially.


But then some cases might require returning multiple values, such as functions, e.g. len() or contains().

@danielaparker
Author

@gregsdennis

> So supposing that the spec defines behavior for single values / single-valued arrays, what's the behavior when multiple values are returned? I think either:
>
>   • this needs to be explicitly defined (e.g. operators return false, null, some unevaluatable state, etc.), or
>   • the spec should explicitly state that behavior is not defined.

I think the general rule should be that the JSONPath expression be evaluated in expressions without putting single values in an array. That works for all cases, and in particular it works for the comparison of a slice and an array in (3). Putting single values in an array makes sense for the final result, but not in expressions.

To make life easier for legacy implementations, I would suggest the specification only require comparison of paths returning single values. In this case it doesn't make any difference whether the JSONPath evaluator puts an array around the single value or not: the implementation can use the single value or the element in the array. It can be left as an implementation extension whether to support more general paths such as slices in expressions, as a few currently do.

Daniel

@gregsdennis
Collaborator

> ... JSONPath expression be evaluated in expressions without putting single values in an array. That works for all cases... - @danielaparker

It doesn't work for all cases when combining this with #59. If paths can start with @ then it makes sense that they always return an array. This allows the expression evaluation to use the overall path logic to evaluate the path. In this approach, the operator interprets a single-value array as merely the value as necessary.

There's also the ambiguity where the value returned is actually an array, e.g. $[?(@.foo == [1, 2, 3])]

I think returning an array has valid use cases.

@danielaparker
Author

danielaparker commented Mar 15, 2021

@gregsdennis wrote:

> There's also the ambiguity where the value returned is actually an array, e.g. $[?(@.foo == [1, 2, 3])]

That's effectively covered by example (2) in the OP, for the simplest possible case.

> I think returning an array has valid use cases.

It would be helpful to keep the discussion example driven. Comparing a path result on the left side with a JSON literal or another path on the right side is really hard to wrap one's mind around without examples, at least with "return as array". This is one situation where the "return as single value" advocates, whose case is represented in section 8 of minutes-109-jsonpath-00, have an easier story.

Could you explain with an example how you think these two comparisons should work, with consistent rules in both cases:

  • A comparison of a slice expression against the current node, with a JSON literal array (i.e. example (3))
  • A comparison of one slice expression against the current node, with another slice expression against the current node on the right side.

Also, be explicit about what information you assume is available to the equality operator when performing the evaluation. In the JMESPath equality operator, for example, it is specified that the operator only sees a JSON value on the left and a JSON value on the right; the expression evaluation has already taken place.

@cabo
Member

cabo commented Mar 15, 2021 via email

@gregsdennis
Collaborator

I was thinking about this some more.

Having the operator itself strip off the outer array for single-value arrays doesn't make sense. If you use substitution, evaluating the path on @..foo == 1 would result in [1] == 1 which is wrong. So for this case, it makes sense to not allow multiple return values and always strip the array.

However, supposing that we define functions, you could have something like contains(@..foo, 1) in which case, you might want to allow multiple return values.

But then there's an ambiguity between @..foo returning multiple values and it returning a single value which happens to be an array.

Considering these variants, I think it only makes sense to force the return of a single value from paths inside expressions. I just don't see a reliable way to return the full match set and still get the expressions that we want.

I see this as an implicit single() function wrapping the path, e.g. single(@..foo) == 1 where @..foo returns its normal array of matches, but the single() ensures that it only contains one value, otherwise an error results (or, to be non-destructive, the expression evaluates to false so that the item is not selected).

If this single() function were the default behavior (implicit), we could define another function, say all_results(), that could be used explicitly to declare that we want the full match set so that we can operate on the result as an array.
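
A rough sketch of that idea in Python follows. Only the names single(), all_results() and contains() come from the discussion above; their signatures, the `NO_VALUE` sentinel, and the `descendants_named` helper standing in for @..foo are assumptions for illustration, and the "evaluate to false" option is the one shown.

```python
NO_VALUE = object()  # assumed sentinel: compares unequal to every JSON value

def descendants_named(node, name):
    """Stand-in for '@..name': collect every member value called `name`, at any depth."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name:
                found.append(value)
            found.extend(descendants_named(value, name))
    elif isinstance(node, list):
        for item in node:
            found.extend(descendants_named(item, name))
    return found

def single(matches):
    """Implicit wrapper: yield the one match, or NO_VALUE so the comparison is false."""
    return matches[0] if len(matches) == 1 else NO_VALUE

def all_results(matches):
    """Explicit opt-in: operate on the full match set as an array."""
    return list(matches)

def contains(values, item):
    return item in values

one = {"foo": 1}
many = {"foo": 1, "bar": {"foo": 2}}
nested = {"foo": [1, 2]}

print(single(descendants_named(one, "foo")) == 1)                # True
print(single(descendants_named(many, "foo")) == 1)               # False: two matches
print(contains(all_results(descendants_named(many, "foo")), 1))  # True
print(single(descendants_named(nested, "foo")) == [1, 2])        # True: one match, an array
```

Under these assumptions the ambiguity goes away: a single match that happens to be the array [1, 2] is reached through single(), while two separate matches 1 and 2 are only visible through all_results().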

@gregsdennis
Collaborator

Sorry if that's not clear ☝️. Was thinking out loud a bit.

@cabo
Member

cabo commented Mar 15, 2021 via email

@gregsdennis
Collaborator

> This whole discussion is confused between this (returning multiple values) and JSON arrays. - @cabo

That's because the multiple values are returned as a JSON array.

Since a path returns an array, it makes sense to at least consider and discuss taking that array of results into account when evaluating expressions that can include paths.

@danielaparker
Author

danielaparker commented Mar 16, 2021

@cabo wrote

> On 2021-03-13, at 22:16, Daniel Parker wrote:
>
> > But let's consider the case where the selector on the left side is evaluated such that an array is always returned, including an array of a single value. If we take the first element of that array to compare with the JSON literal on the right, we match (4) and (5), but not (6).
>
> I’m having a hard time following this conversation. Why would we want to do what you describe?
>
> Grüße, Carsten

The question is, what does it mean to compute the result of a JSONPath expression evaluated against the current node, and compare it to a JSON literal? For advocates of "return as single value", whose case is represented in section 8 of minutes-109-jsonpath-00, the answer is completely clear. The path evaluates to a single value, which can be compared to a JSON literal on the right hand side, and there is no issue.

However, for advocates of "return as array" (the majority), the answer is less clear, as the JSONPath Comparison examples illustrate. Rules need to be defined, and the obvious one is not equivalent to the "return as single value" case.

@cabo
Member

cabo commented Mar 17, 2021

We are completely free to define what it means to compare a literal to a nodelist. We don't have to have the same semantics as with comparing a literal to an array (which has an obvious answer).

@danielaparker
Author

danielaparker commented Mar 17, 2021

@cabo wrote:

> This whole discussion is confused between this (returning multiple values) and JSON arrays. JSON arrays are items in the input. Multiple values are, just that, multiple values. (Call them, collectively, collections if you like.) Can we keep these two concepts separate?

The distinction between "return as array" and "return as single value", as discussed in minutes-109-jsonpath-00, is well understood. In terms of existing practice, the JSONPath Comparisons show that 9 out of 41 implementations return a single value where only one match is possible, and 32 return an array.

In Goessner, the "collection" of "multiple values" is a JSON array. Goessner writes:

> "Please note, that the return value of jsonPath is an array, which is also a valid JSON structure. So you might want to apply jsonPath to the resulting structure again or use one of your favorite array methods as sort with it."

As far as I can tell, all JSONPath implementations, whether "return as array" or "return as single value", are consistent with the idea of JSON in, JSON out: the result can be provided back as input. That's true when the "value" option is specified, and also when the "path" option is specified.

@cabo
Member

cabo commented Mar 17, 2021

I don't doubt that many implementations implement returning the nodelist as returning an array. That was not my point.
The nodelist returned by a nested query does not need to be reified as a JSON array before it is compared to a literal.
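
One way to picture this is to keep the nodelist as its own type and only turn it into a JSON array for the final result. The sketch below is illustrative Python; the `NodeList` class and the rule chosen for `equals_literal` (true only when exactly one node was selected and that node equals the literal) are assumptions, just one of the semantics that could be defined.

```python
class NodeList:
    """Result of a nested query; deliberately not a JSON array."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def equals_literal(self, literal):
        # Assumed rule: a nodelist equals a literal only if it holds
        # exactly one node and that node equals the literal.
        return len(self.nodes) == 1 and self.nodes[0] == literal

    def to_json(self):
        # Reify as a JSON array only when producing the final output.
        return list(self.nodes)

two_nodes = NodeList([1, 2])    # two selected values: 1 and 2
one_array = NodeList([[1, 2]])  # one selected value that is the array [1, 2]

print(two_nodes.equals_literal([1, 2]))  # False
print(one_array.equals_literal([1, 2]))  # True
print(two_nodes.to_json())               # [1, 2]
```

Because the comparison is defined on the nodelist itself, two selected values and one selected array stay distinguishable, even though a reified nodelist of 1 and 2 would look the same as the array [1, 2].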
