Query expression language support #17

Closed
gregsdennis opened this issue Sep 18, 2020 · 27 comments

@gregsdennis
Collaborator

In his post, Goessner indicates that for [(...)] ("container query") and [?(...)] ("item query"), the contained expression should use "the underlying script engine."

This presents a problem for consistency and interoperability between systems. A JSON Path written with a JavaScript expression won't work when evaluated using an implementation written in PHP or .NET.

To address this, we should either define that the scripting language is something well-known (e.g. ECMAScript 2015 or C), or we should define our own language (a domain-specific language, or DSL).

Proposal

I like the idea of a simple DSL, and this proposal outlines the rules around such a language.

Exploring the data with @

JSON elements can be explored using JSON Path within the expression, and the values that are returned can be compared using simple comparison operators.

This alone enables expressions like ?(@.price<10) and ?(@.isbn). These state "the path @.price returns a value and that value is less than 10," and "the path @.isbn returns a value," respectively. And because these are just JSON Paths, implementations will already have the parsing logic for them. Further, it means that indexer syntax for property names will work, so that ?(@['price']<10) and ?(@['isbn']) will also work.
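As a rough illustration of those two readings, the Python sketch below treats an item query as a predicate applied to each child. The names `MISSING`, `get`, and `select` are hypothetical and not taken from any JSON Path implementation, and only one-step paths such as `@.price` are handled.

```python
# Hypothetical sketch: an item query as a predicate applied to each child.
MISSING = object()  # sentinel meaning "the path returned no value"

def get(item, name):
    """Evaluate the one-step path @.name (equivalent to @['name'])."""
    if isinstance(item, dict) and name in item:
        return item[name]
    return MISSING

def select(items, predicate):
    """Keep the children for which the query expression holds."""
    return [item for item in items if predicate(item)]

books = [
    {"title": "Sayings of the Century", "price": 8.95},
    {"title": "Moby Dick", "isbn": "0-553-21311-3", "price": 8.99},
    {"title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99},
]

def price_below_10(item):
    # ?(@.price<10): the path returns a value and that value is less than 10
    price = get(item, "price")
    return price is not MISSING and price < 10

def has_isbn(item):
    # ?(@.isbn): the path returns a value
    return get(item, "isbn") is not MISSING

cheap = select(books, price_below_10)
with_isbn = select(books, has_isbn)
```

Using a sentinel rather than `None` keeps "no value" distinct from a JSON `null`, which is one possible way to read "the path returns a value".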

Operators

The basic comparison, mathematical, and boolean operators should be sufficient, at least for the initial revision of the specification. I expect most programmers will be familiar with the C-style operators, so I propose we use those.

==   equal
!=   not equal
<    less than
<=   less than or equal to
>    greater than
>=   greater than or equal to

+    addition
-    subtraction
*    multiplication
/    division
%    modulus

&&   and
||   or

I'm open to alternatives, but this gives us a good grounding.
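For a sense of how small the evaluator core could be, here is a Python sketch that maps the operators listed above onto functions from the standard `operator` module. The table names and `apply_binary` are hypothetical. Two choices are assumptions rather than part of the proposal: `/` is treated as true division, and `%` follows Python's semantics, which differ from C for negative operands.

```python
import operator

# Hypothetical operator tables for the proposed DSL.
COMPARISON = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}
ARITHMETIC = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.truediv,  # assumption: true division
    "%": operator.mod,      # assumption: Python's modulus semantics
}
BOOLEAN = {
    "&&": lambda left, right: bool(left) and bool(right),
    "||": lambda left, right: bool(left) or bool(right),
}

def apply_binary(symbol, left, right):
    """Apply one binary operator to two already-evaluated operands."""
    for table in (COMPARISON, ARITHMETIC, BOOLEAN):
        if symbol in table:
            return table[symbol](left, right)
    raise ValueError(f"unsupported operator: {symbol}")

# ?(@.price<10) with @.price evaluating to 8.95
is_cheap = apply_binary("<", 8.95, 10)
# (@.length-1) with @.length evaluating to 4
last_index = apply_binary("-", 4, 1)
```

Parsing and operator precedence are deliberately left out; the sketch only covers evaluation once operands are known.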

Perhaps we can just use single =, &, and | instead of the doubles? That might open up a ^ for an XOR or a !& for a NAND. (!| for NOR is going to be fun to read.)

Reserved words

I'd also like to propose that we define a number of reserved properties, like length. This enables functionality like in the container query syntax example (@.length-1). If the user wants to reference an object property named length, they would need to specifically use the indexer syntax (@['length'] - 1) which would fetch the value from the length property of an object, subtract one, and return the result.
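The distinction between the reserved word and a property that happens to be named length can be sketched in Python as follows. Both function names are hypothetical, and restricting the reserved word to arrays and objects is an assumption made for the sketch.

```python
def reserved_length(value):
    """@.length as a reserved word: the number of children of the value."""
    if isinstance(value, (list, dict)):
        return len(value)
    # Assumption: the reserved word is undefined for other JSON types.
    raise TypeError("length is only defined here for arrays and objects")

def property_named_length(value):
    """@['length']: an ordinary lookup of a property called "length"."""
    return value["length"]

plank = {"length": 240, "width": 30}

# (@.length-1): two properties, minus one
via_reserved_word = reserved_length(plank) - 1
# (@['length'] - 1): the stored value 240, minus one
via_indexer = property_named_length(plank) - 1
```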

Additional reserved words that we may need are open for proposal/discussion.

Open question: Does this enable these reserved keywords outside of the context of a query expression, i.e. does $.length give the number of child items of the root value?

Backtracking?

There aren't any examples in Goessner's post, but it might be desirable to navigate up the JSON structure to fetch a value to be used in an expression.

For example (with Goessner's example data), suppose I wanted to find the books which cost less than the bike. There isn't a way to get from iterating over the book array to outside of the book array where the bicycle is.

Alternatively, I could use the root operator $ to start from the beginning within the expression and do something like $..book[?(@.price<$..bicycle.price)], so maybe this isn't explicitly needed for now.
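The root-based alternative can be sketched in Python with Goessner's example data (titles and prices only). The function name is hypothetical and the two paths are hard-coded rather than parsed, so this only illustrates the idea that `$` stays bound to the root while `@` iterates over the books.

```python
store = {
    "store": {
        "book": [
            {"title": "Sayings of the Century", "price": 8.95},
            {"title": "Sword of Honour", "price": 12.99},
            {"title": "Moby Dick", "price": 8.99},
            {"title": "The Lord of the Rings", "price": 22.99},
        ],
        "bicycle": {"color": "red", "price": 19.95},
    }
}

def books_cheaper_than_bicycle(root):
    # $..bicycle.price is resolved once from the root and yields one value.
    bicycle_price = root["store"]["bicycle"]["price"]
    # @ is bound to each book in turn; $ still refers to the root.
    return [book for book in root["store"]["book"]
            if book["price"] < bicycle_price]

cheaper = books_cheaper_than_bicycle(store)
```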

Restrictions

For these cases, I think it makes sense to require that these internal paths MUST only return a single value. Returning multiple values should remain an explicitly undefined behavior (allows the implementation to decide how to handle it). (However doing something like ?(@..price.length>4) to get "are there more than 4 objects that contain a price property?" does make sense. The full path still just returns a single value, even though the @..price portion returns multiple.)

@gregsdennis
Collaborator Author

It just occurred to me that in item queries like ?(@.price<10), the 10 is really just a JSON literal. That could imply that you could have any JSON literal here, including booleans, strings, null, arrays, and objects:

$..name[?(@.values==[1,2,3])]

This raises a question of what JSON equality is. The basic types are fairly straightforward, but the complex types are more... complex. JSON.org states that arrays are ordered and objects are key/value pairs, but not necessarily ordered.
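One possible reading of JSON equality, with ordered arrays and unordered objects, is sketched below in Python. `json_equal` is a hypothetical name, and treating `1` and `1.0` as equal while keeping booleans distinct from numbers is an assumption, not something the thread has settled.

```python
def json_equal(a, b):
    """Deep equality: arrays are ordered, object members are not."""
    # bool is a subclass of int in Python, so handle booleans first.
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b  # assumption: numbers compare by value
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(
            json_equal(x, y) for x, y in zip(a, b)
        )
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(
            json_equal(a[key], b[key]) for key in a
        )
    if type(a) is not type(b):
        return False
    return a == b  # strings and null (None)

same_array = json_equal([1, 2, 3], [1, 2, 3])
reordered_array = json_equal([1, 2, 3], [3, 2, 1])
reordered_object = json_equal({"a": 1, "b": 2}, {"b": 2, "a": 1})
```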

@gregsdennis
Collaborator Author

oops.

@gregsdennis
Collaborator Author

Also probably needs term grouping, i.e. () support.

@gregsdennis
Collaborator Author

Given that @ could support the full JSON Path syntax, there is a question of having indexers with query expressions inside an @-path. I think this is fine. The only question is around scoping for the @.

I think it makes sense to confine the scope to the innermost context. This raises another question around how to reference the current item of any intermediate contexts.

@glyn
Collaborator

glyn commented Sep 28, 2020

To address this, we should either define that the scripting language is something well-known (e.g. ECMAScript 2015 or C), or we should define our own language (a domain-specific language, or DSL).

Another option is to omit query expressions (but keep filters). I suspect there's not much consensus around query expressions or use of them for that matter.

@gregsdennis
Collaborator Author

gregsdennis commented Sep 28, 2020

Query expressions are what you're calling "filters." What are you suggesting we remove? Without expression support, you can't do something like $..[(@.length-1)] without special-casing "@.length-1". What if someone wanted to do $..[(@.length-2)]? The only way to do this is to support expressions in general.

Also pertains to #16 (comment). You can't have the above if - can be part of a key. Otherwise you're saying that length-1 is the key.

Furthermore, I led this issue with

In his post, Goessner indicates that for [(...)] ("container query") and [?(...)] ("item query"), the contained expression should use "the underlying script engine."

His post explicitly calls them expressions, too:

XPath   JSONPath   Description
...     ...        ...
[]      ?()        applies a filter (script) expression.
n/a     ()         script expression, using the underlying script engine.

Expression support for these kinds of query is integral to JSON Path. You can't just not have it. Half of your query syntax goes away if you don't.

@glyn
Collaborator

glyn commented Sep 29, 2020

I think $..[(@.length-1)] is equivalent to $..[-1] and $..[(@.length-2)] is equivalent to $..[-2]. Maybe there are other usages of subtraction "in the wild" which will influence the decision - I'm just not aware of them.

I've not really looked at "container queries" using script expressions because they were so vaguely defined by Gössner. I think it's a good topic for the Working Group to look into.

@gregsdennis
Collaborator Author

Those equivalencies are correct, but without expression support, you can't parse the @.length-1 version without explicitly catering for that string or providing general expression support.

I think expressions provide the end user (the path author) a better tool.

@gregsdennis
Collaborator Author

I'm thinking of examples like

$[?(@.prop+3<$.alertAt)] // comparison against another value
$[?(@.prop%2=0)]  // finds even numbers
$[?(@.prop="string")]

We don't know what people will want to do with expressions, and I don't like the idea of limiting our support to the simple examples in Goessner's post.

The last example really becomes important if we specify that the location of the value can also be returned.

@glyn
Collaborator

glyn commented Sep 30, 2020

For me, filters [?()] seem to offer far more capabilities than "container queries" [()]. I'm wondering whether JSONPath would suffer much from dropping "container queries", especially if there isn't much consensus around them.

@gregsdennis
Collaborator Author

I can't say much about usage in the wild, but container queries can do things like $.values[($.index)] which can dynamically select a value based on the value in the property index. I can see value in it, but...

Usage stats probably aren't something we're likely to obtain accurate numbers on, either.
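The dynamic selection in $.values[($.index)] can be sketched in Python as below. The function name and the sample documents are hypothetical; the point is only that the container expression is evaluated first and its result is then used as the index or key.

```python
def select_by_index_property(root):
    # ($.index) is evaluated first and must yield a single value.
    index = root["index"]
    # The result is then used as an ordinary index or key into $.values.
    return root["values"][index]

array_doc = {"index": 2, "values": ["a", "b", "c", "d"]}
object_doc = {"index": "x", "values": {"x": 1, "y": 2}}

from_array = select_by_index_property(array_doc)
from_object = select_by_index_property(object_doc)
```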

@glyn
Collaborator

glyn commented Oct 1, 2020

I can't say much about usage in the wild, but container queries can do things like $.values[($.index)] which can dynamically select a value based on the value in the property index. I can see value in it, but...

That example is pretty compelling. Someone out there must be using it...

@gregsdennis
Collaborator Author

gregsdennis commented Oct 6, 2020

A need for expression support in the wild: Get root element using jsonpath based on sub elements condition

@gregsdennis
Collaborator Author

gregsdennis commented Oct 8, 2020

The path that I suggested in that SO question is $[?(@.phoneNumbers[*].type=="iPhone")].id, but I think it would be better served with a function syntax (will elaborate in a bit).

The reasoning behind this is the requirement that paths in expressions should only return a single value. However @.phoneNumbers[*].type actually returns multiple values, and we want to ensure that one of those values is "iPhone".

To that end, alongside .length, I would like to propose another reserved word: .contains. The kicker here is that we want to pass a parameter (the value to check for), so we need a parameter list: .contains("iPhone"). This, then, returns a boolean value, just like the == operator.
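A minimal Python sketch of how such a .contains could behave, assuming it receives every value returned by the inner path. The function names and the sample data are hypothetical and only loosely modelled on the question.

```python
def contains(values, wanted):
    """Hypothetical .contains(wanted): true if any returned value matches."""
    return any(value == wanted for value in values)

def phone_types(person):
    # @.phoneNumbers[*].type: may return zero, one, or many values
    return [phone["type"]
            for phone in person.get("phoneNumbers", [])
            if "type" in phone]

people = [
    {"id": 1, "phoneNumbers": [{"type": "iPhone"}, {"type": "home"}]},
    {"id": 2, "phoneNumbers": [{"type": "home"}]},
    {"id": 3},
]

# $[?(@.phoneNumbers[*].type.contains("iPhone"))].id
matching_ids = [person["id"] for person in people
                if contains(phone_types(person), "iPhone")]
```

Like ==, the function returns a boolean, so it can sit anywhere a comparison could.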


Extending this, it may make sense to have all reserved words carry a parameter list, even if that list is empty. For example, .length(). This would leave the door open for .length to refer to a property named "length" instead.

(It looks like the Java implementation does this.)

@glyn
Collaborator

glyn commented Oct 8, 2020

The Working Group will need to decide whether to extend the syntax or standardise only what's already in use.

@gregsdennis
Collaborator Author

gregsdennis commented Oct 17, 2020

I've been thinking about the container expression syntax [(...)] a bit more, and I think it may be useful as an evaluatable index.

If we take (@.length-1) to evaluate to a numeric index, then this opens the door for the expression to also be used in the slice syntax. For example $[:(@.length-2)] would be equivalent to $[:-2]. Not really sure how useful that is. It's not really a good example, but I imagine having the feature would result in some imaginative uses.
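The slice equivalence can be checked with a short Python sketch, since Python slices treat negative bounds the same way. `resolve_bound` and `slice_until` are hypothetical names, and the expression is modelled as a callable that receives the current array.

```python
def resolve_bound(bound, current):
    """A bound is either a literal integer or an expression over @."""
    return bound(current) if callable(bound) else bound

def slice_until(array, end):
    """Evaluate $[:end] where end may be a container expression."""
    return array[:resolve_bound(end, array)]

values = [10, 20, 30, 40, 50]

# $[:(@.length-2)]
with_expression = slice_until(values, lambda current: len(current) - 2)
# $[:-2]
with_literal = slice_until(values, -2)
```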

I definitely wouldn't suggest this as a feature for the first draft, but it's something to consider.

@glyn
Collaborator

glyn commented Nov 5, 2020

Survey of script expression support

The Script expression comparison shows some implementations which support script expressions, or container queries, ([()]):

*: script expressions are not clearly defined

In addition, JMESPath (an alternative to JSONPath) includes an interesting set of built-in functions which are probably candidates for including in script expressions.

@gregsdennis
Collaborator Author

Both of my implementations in dotnet support expressions: Manatee.Json & JsonPath.Net

@gregsdennis
Collaborator Author

Just logging another question about expressions in the wild about testing for the absence of a property. Their go-to attempt was to use a ! operator.

@gregsdennis
Collaborator Author

*: script expressions are not clearly defined - @glyn

This is why we should clearly define the expected support. It should be worded so that implementors may augment the syntax as well (while also providing documentation that such support is non-standard and may not be compatible with other systems).

@gregsdennis
Collaborator Author

Note that JMESPath is strict, sum only sums numbers.

Coming from a strongly typed language, this does appeal to me, but I'm biased. Favor language agnosticity.

@glyn
Collaborator

glyn commented Mar 12, 2021

=~ Left matches regular expression [?(@.author =~ /Evelyn.*?/)]

I raised #70 to cover the details of regular expressions in filters.

@bettio

bettio commented Mar 19, 2021

How do we expect filters to work with the following JSON input:

[{"k": "1.0"}, {"k": 2}]

and JSONPaths such as $[?(@.k == 1)]?

When using $[?(@.k == 1)] jayway JSONPath outputs the following JSON:

[ ]

When using $[?(@.k == 2.0)] jayway JSONPath outputs the following JSON:

[
   {
      "k" : 2
   }
]

While Goessner, when using $[?(@.k == 1)], outputs:

[
   {
      "k" : "1.0"
   }
]

I think that === should be introduced, however I'm not sure if it is widely supported or not.

@danielaparker

danielaparker commented Mar 19, 2021

How do we expect filters to work with the following JSON input:

[{"k": "1.0"}, {"k": 2}]

and JSONPaths such as $[?(@.k == 1)]?

When using $[?(@.k == 1)] jayway JSONPath outputs the following JSON:

[ ]

When using $[?(@.k == 2.0)] jayway JSONPath outputs the following JSON:

[
   {
      "k" : 2
   }
]

While Goessner, when using $[?(@.k == 1)], outputs:

[
   {
      "k" : "1.0"
   }
]

I think Goessner is right.

I think that === should be introduced, however I'm not sure if it is widely supported or not.

It's supported in implementations like Goessner that use Javascript for a scripting language. But not in Jayway, which implements its own evaluator. It would look foreign to users coming from a strongly typed background. But with numbers, keep in mind that 1.0 and 1 have the same JSON type.

I prefer the JMESPath approach, which only has "==", but defines type requirements for comparators such as equality operators.
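The two behaviours reported above can be contrasted with a Python sketch. `strict_equal` is one hypothetical reading of a type-aware ==, and `loose_equal` is only a rough imitation of JavaScript's string-to-number coercion; neither is the actual Jayway or Goessner code.

```python
def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def strict_equal(left, right):
    """Type-aware ==: 2 equals 2.0, but a string never equals a number."""
    if is_number(left) and is_number(right):
        return left == right
    return type(left) is type(right) and left == right

def loose_equal(left, right):
    """Rough imitation of coercing ==: a numeric string may equal a number."""
    if isinstance(left, str) and is_number(right):
        try:
            return float(left) == right
        except ValueError:
            return False
    if is_number(left) and isinstance(right, str):
        return loose_equal(right, left)
    return strict_equal(left, right)

data = [{"k": "1.0"}, {"k": 2}]

strict_k_eq_1 = [item for item in data if strict_equal(item["k"], 1)]
strict_k_eq_2 = [item for item in data if strict_equal(item["k"], 2.0)]
loose_k_eq_1 = [item for item in data if loose_equal(item["k"], 1)]
```

With the type-aware version there is no need for a separate === operator, which is the trade-off being weighed here.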

@bettio

bettio commented Mar 21, 2021

I'm leaving here a further comment as a reminder that scope for @ needs to be defined, as discussed in #75.

@danielaparker

@bettio wrote:

I'm leaving here a further comment as a reminder that scope for @ needs to be defined, as discussed in #75.

See here for my views.

@cabo
Member

cabo commented Jan 17, 2022

early discussion that helped in working out the terms.
Nothing actionable appears to remain, closing now.

@cabo cabo closed this as completed Jan 17, 2022