Determinism? #60

Closed
cabo opened this issue Mar 10, 2021 · 33 comments

Comments

@cabo
Member

cabo commented Mar 10, 2021

Does applying a JSONPath query to a JSON data item always generate the same result?

Some selectors do not have a natural order of the nodelists they return, e.g., $.* applied to a JSON object

We could prescribe an order (e.g., sorting keys lexicographically or in JavaScript order) if determinism is important

We could disallow .* on JSON objects

@goessner
Collaborator

goessner commented Mar 10, 2021 via email

@glyn
Collaborator

glyn commented Mar 10, 2021

> I can't see a benefit of always guaranteeing the same -whatever- order.

A minor benefit would be that the Compliance Test Suite would be simpler if the output always had a defined implementation-independent ordering.

> And disallowing iterating over objects by '$.*' might break existing implementations

Agreed.

@cabo
Member Author

cabo commented Mar 10, 2021

(Determinism in interop testing is a major benefit for me. But that can be added by reprocessing after the JSONpath processing, so it is not completely a killer argument.)

@glyn
Collaborator

glyn commented Mar 10, 2021

Another selector which requires careful definition for ordering is .., particularly when this recurses over an object or objects.

@goessner
Collaborator

hmm ... compliance testing is a point, sure. Simple first tests of $.obj.* and $..* regarding ordering should be possible with current test suite ... ?

@glyn
Collaborator

glyn commented Mar 10, 2021

> hmm ... compliance testing is a point, sure. Simple first tests of $.obj.* and $..* regarding ordering should be possible with current test suite ... ?

Anything is "possible". ;-)

Tests involving objects which have only one name/value pair are trivially easy to test because any non-determinism of object enumeration is irrelevant. But tests involving objects with two or more name/value pairs need to capture the ordering possibilities somehow. Note that the current test suite is a JSON document so there would need to be a declarative way of specifying the ordering possibilities.
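One declarative option would be for each test to enumerate its admissible outputs and for harnesses to accept any of them. A minimal sketch (the field names `selector`, `document`, and `results` are hypothetical, not taken from the actual test suite):

```python
import json

# Hypothetical CTS entry: since object member order is
# implementation-defined, the test lists every admissible output.
test_case = json.loads("""
{
  "selector": "$.*",
  "document": {"a": 1, "b": 2},
  "results": [[1, 2], [2, 1]]
}
""")

def passes(actual, case):
    # The test passes if the implementation's output matches
    # any one of the declared orderings exactly.
    return actual in case["results"]

print(passes([2, 1], test_case))  # True
print(passes([1, 3], test_case))  # False
```

A test framework then needs only an exact-match loop over the alternatives, keeping the harness simple at the cost of more verbose test generation.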

@gregsdennis
Collaborator

We covered this topic over in the JSON Schema repo somewhere with quite a lengthy discussion. I'll see if I can dig it up.

In short, parsers in some languages just can't be made to be deterministic. Reading the same object from a file (where the keys are certainly in a defined sequence) will yield differing orderings when printing or even debugging the value.

This topic, therefore, falls into the "language agnosticism" category. I don't think it's something we can enforce.

@bettio

bettio commented Mar 12, 2021

> Some selectors do not have a natural order of the nodelists they return, e.g., $.* applied to a JSON object

> We could disallow .* on JSON objects

$.* is accepted (and correctly evaluated) by ~93% of the implementations in the JSONPath comparison.

Each of them has a number of users; it is really important not to disrupt their existing expressions and their habits.

> We could prescribe an order (e.g., sorting keys lexicographically or in JavaScript order) if determinism is important

I don't think determinism is that important: $.* on an object evaluates to an array, but semantically it is an unordered set (which is not a JSON type, therefore a different type is used ¹). That is still a meaningful result, and developers should remember that. Hence we shouldn't discard this feature.

Anyway, what I found confusing when implementing ExJSONPath was the lack of any official recommendation about the behaviour of $.object.* on an object.

I think we should clearly state that the result has to be treated as unordered, but we might suggest using lexicographic order for validation purposes when possible (I think a huge number of implementations are using ordered maps for JSON object storage, so it shouldn't be hard).
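The lexicographic suggestion is cheap to sketch. Here is a hypothetical wildcard step (not from any particular implementation) that sorts member names before yielding values, making the result reproducible:

```python
def wildcard(node):
    """Expand '.*' over a node. For objects, sort member names
    lexicographically so the resulting nodelist is reproducible;
    arrays already have a natural order."""
    if isinstance(node, dict):
        return [node[key] for key in sorted(node)]
    if isinstance(node, list):
        return list(node)
    return []  # atomic values have no children

print(wildcard({"b": 2, "a": 1}))  # [1, 2], regardless of insertion order
```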

> Determinism in interop testing is a major benefit for me.

Removing an implemented feature used by a number of users, merely for testing purposes, is something I would rather not discuss.

> hmm .. depends, how deterministic JSON parsers are when iterating over objects ... they should work in ascending chronological order ...

I think a number of JSONPath implementations are completely decoupled from the JSON parser, and they just accept maps/lists/atomic-values as input (ExJSONPath is one of them; it doesn't depend on any JSON parser and it doesn't care about JSON parsing).

I'm not even sure if JSONPath is only used for JSON inputs, it works so nicely with any JSON-like structure.

¹ I'm not 100% sure that all implementations are using JSON types, there might be implementations using unordered sets for real as containers for $.object.* results.

@gregsdennis
Collaborator

> Each of them has a number of users; it is really important not to disrupt their existing expressions and their habits.

A noble goal, to be sure. But I imagine that sudden conformance to a spec would naturally engender breaking changes, especially in a landscape of implementations as large as the one we're dealing with. I don't think it can be avoided.

As such, my feeling is that we get the spec right rather than make concessions just because the majority does it some way.

> I think a number of JSONPath implementations are completely decoupled from the JSON parser...

Absolutely correct.

@cabo
Member Author

cabo commented Mar 12, 2021 via email

@cabo
Member Author

cabo commented Mar 13, 2021

I think the summary so far is

  • We recognize that there are operations in JSONPath whose result is ordered but that order needs to be created based on unordered input.
  • We are not going to cut off our noses to spite our faces.
  • We have a choice between imposing a language-defined order (e.g., sorting) or leaving the result non-deterministic.

I gather that there is a general direction toward non-determinism, as imposing an order would be too expensive, but I'm not sure we have a conclusion here.

Also: https://mailarchive.ietf.org/arch/msg/jsonpath/N_emdQBtZg3xTuVQBFyXcjAR4ZE

@timbray
Contributor

timbray commented Mar 16, 2021

I agree with Carsten's general direction in support of nondeterminism, simply because 8259 explicitly discourages assigning semantics to the ordering of object members. If we want determinism, we probably have to specify RFC 8785 (JCS) conformance, which doesn't feel reasonable. As for testing, anyone who's doing this seriously in the real world has to have a JSONObject.equals() method already, right?

@glyn
Collaborator

glyn commented Mar 17, 2021

I also agree with supporting non-determinism, but please note that JSONObject.equals() won't necessarily help because the non-deterministic portions of the output won't necessarily be part of a JSONObject. For example, applying .* to an input consisting of a JSON object will produce a node list in a non-deterministic order.

I think we'd either need to decorate such tests with an indication of where non-determinism can occur, so that each test framework can compare the output to the expected value while taking non-determinism into account, or we'd need a way of capturing a set of possible alternative outputs. The latter may be preferable: it would complicate the generation of tests, which is a one-off process, but would simplify the implementation of test frameworks, of which there are likely to be many. (There may be other solutions too - I haven't spent very long on this.)

@gregsdennis
Collaborator

FWIW, in implementing JSON Schema, I've found that I need both contents-equality and sequence-equality on arrays depending on the circumstance.

Given that JSON Path outputs an array, and the contents of that array are likely to be generated non-deterministically due to evaluating objects, I think it makes sense for implementations to verify evaluation results (e.g. when processing a test suite) in a "contents-equality" sense. It's not per the JSON definition of array equality (which mandates sequence-equality), but I think the circumstance fits.

The fact that the result array is not guaranteed to be deterministic would need to be declared in the specification, probably accompanied by a remark that this deviates from JSON array equality and perhaps even an explanation why.
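Such a "contents-equality" check cannot simply use a set or hash-based multiset, because JSON objects and arrays are unhashable in many languages. One naive but self-contained sketch (O(n²); the function name is illustrative):

```python
def contents_equal(xs, ys):
    """True if xs and ys contain the same JSON values, ignoring order.
    Quadratic, but JSON values need not be hashable, so a set won't do."""
    if len(xs) != len(ys):
        return False
    remaining = list(ys)
    for x in xs:
        for i, y in enumerate(remaining):
            if x == y:
                del remaining[i]  # consume the match so duplicates count
                break
        else:
            return False  # x has no match left in ys
    return True

print(contents_equal([{"a": 1}, 2], [2, {"a": 1}]))  # True
print(contents_equal([1, 1, 2], [1, 2, 2]))          # False
```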

@cabo
Member Author

cabo commented Mar 17, 2021

Nodelists are not arrays.
(They can be represented in a JSONPath API as arrays. Or as lists. Or as tuples. ...)

@goessner
Collaborator

I do agree with common non-determinism here. Naming results as nodelists is clever. Shouldn't we define the term 'nodelist' then?

@cabo
Member Author

cabo commented Mar 17, 2021

Yes, we should define nodelist. I made a first attempt in PR #72 now.

@gregsdennis
Collaborator

We need to tread carefully here. Are we saying we're comfortable outputting something that's not JSON but acts really similar?

All implementations (including @goessner) output an array (or optionally false) as JSON. If we suddenly call the output a "nodelist" are we breaking that?

@cabo
Member Author

cabo commented Mar 17, 2021

> If we suddenly call the output a "nodelist" are we breaking that?

No. I agree that many APIs will output platform arrays here (which may or may not feel similar to JSON arrays). Saying that the output is a nodelist gives them freedom to define their API and how it presents the nodelist.

More importantly, we make processing for embedded queries less finicky.

@cabo
Member Author

cabo commented Mar 18, 2021

Daniel, nobody can (or will try to) stop you thinking about a nodelist as a JSON array, if that suits your mental model.
Giving a separate name to that concept does have its advantages.
It, for instance, helps finding answers to the role of nested queries in filter expressions.
It also helps reflect the fact that some implementations handle nodelists differently from others.
So I don't know why we are having this discussion.
(And why is it under "Determinism?".)

@gregsdennis
Collaborator

gregsdennis commented Mar 18, 2021

> It also helps reflect the fact that some implementations handle nodelists differently from others.

This is irrelevant because of many of the points that @danielaparker mentioned: essentially, implementations don't use nodelists from what we can tell, they use arrays. And isn't our charter to break existing implementations as little as possible? If all implementations use arrays, changing that is not something we should consider lightly.

@cabo We're discussing it here because of your comment. Following some discussion elsewhere about expressions, you spammed "nodelist" comments anywhere it might be related in order to promote the idea before anyone had a chance to discuss it. So now we're discussing it.


All of that said, I think we should consider the output format (#23 & #44), determinism (this issue), and paths embedded in expressions (#74) together. There is some subtle interplay going on.

If the output is an array of objects that specify location and value, then both that format and the sequence of the items within it need to be considered when returning from an embedded path so that the remainder of the expression can handle it properly.

If embedded paths are to follow some different output that operators can then use, we need to specify what that is as well. Maybe that is the idea of a nodelist. But we need to discuss it first and come to some level of agreement.

@cabo
Member Author

cabo commented Mar 18, 2021

It is a pretty common technique in model building to identify important concepts, and once one has identified them, to give them (tentative) names. That allows the participants to refer to the concept-in-formation, and to gradually increase certainty about it. Once there is enough certainty, it may be possible to identify some concepts with each other and turn them into one, but you don't do that before achieving that certainty.

I think much of the confusion comes from a perception that whatever is in the draft at one time is cast in concrete. That is not a healthy way to run this, and maybe I was making an unwarranted assumption that this would be obvious.

So, indeed, feedback is good, but not so much about reasons wanting to eliminate/equate concepts prematurely and end the discussion about them before understanding them, but more about the properties of the concept itself. The fact that XPath arrived at a specific form of nodelist after trying nodesets is certainly interesting feedback. We may not need that complexity, but it is much easier to eliminate complexity once we understand the concepts.

@glyn
Collaborator

glyn commented Mar 19, 2021

> Given that JSON Path outputs an array, and the contents of that array are likely to be generated non-deterministically due to evaluating objects, I think it makes sense for implementations to verify evaluation results (e.g. when processing a test suite) in a "contents-equality" sense. It's not per the JSON definition of array equality (which mandates sequence-equality), but I think the circumstance fits.

I'm not sure this is sufficient. Let me explain why with an example.

Take the input argument:

{ "a": {"c": 0, "d": 1},
  "b": {"e": 2, "f": 3}
}

and the selector $.*.*.

I think the valid outputs should be:

  • [0, 1, 2, 3]
  • [0, 1, 3, 2]
  • [1, 0, 2, 3]
  • [1, 0, 3, 2]
  • [2, 3, 0, 1]
  • [2, 3, 1, 0]
  • [3, 2, 0, 1]
  • [3, 2, 1, 0]

But I think the following output, for example, should be invalid:

  • [0, 2, 1, 3]

The spec should allow for this level of non-determinism, but not more.

Similarly, if the Compliance Test Suite covers this example, it should allow precisely the valid outputs to pass. I don't see how a "contents-equality" comparison of arrays would help. This makes me favour encoding sets of possible outputs in the CTS as described in #60 (comment).
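For what it's worth, this list of valid outputs can be generated mechanically: permute the member order of each object independently, then evaluate the two wildcard stages in sequence. A sketch (function name is hypothetical):

```python
from itertools import permutations, product

def valid_outputs(doc):
    """Enumerate every nodelist $.*.* may produce for a two-level
    object: the outer object's members may enumerate in any order,
    and so may each inner object's, but stages never interleave
    results across siblings."""
    results = set()
    for outer in permutations(doc.values()):            # first '*' stage
        inner = [permutations(child.values()) for child in outer]
        for choice in product(*inner):                  # second '*' stage
            results.add(tuple(v for order in choice for v in order))
    return sorted(results)

doc = {"a": {"c": 0, "d": 1}, "b": {"e": 2, "f": 3}}
for out in valid_outputs(doc):
    print(list(out))
# Prints exactly the eight orderings listed above; [0, 2, 1, 3] is absent.
```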

@cabo
Member Author

cabo commented Mar 19, 2021

>   • [0, 1, 2, 3]
>   • [0, 1, 3, 2]
>   • [1, 0, 2, 3]
>   • [1, 0, 3, 2]

Actually, [2, 3, 0, 1] etc. as well (the outer container also is a map).

@glyn
Collaborator

glyn commented Mar 19, 2021

> Actually, [2, 3, 0, 1] etc. as well (the outer container also is a map).

Whoops! Yes. Edited the list so it is now complete.

@gregsdennis
Collaborator

> (@glyn) But I think the following output, for example, should be invalid:
>
>   • [0, 2, 1, 3]

I'm curious why you think this should be invalid. I understand how you got the result, and if you include (or replace with) the paths to those values, it's just as (or more) clear.

Partial determinism is going to create a bunch of weird edge cases that will be impossible to completely define.

@cabo
Member Author

cabo commented Mar 19, 2021

That of course depends on the semantics of the nodelist. If, as I'm assuming, it is sequence-preserving, you cannot arrive at [0, 2, 1, 3], which would be (JSON pointer syntax, I'm lazy) /a/c, /b/e, /a/d, /b/f.

@goessner
Collaborator

goessner commented Mar 19, 2021

This is quite a complicated discussion. I'm trying to sum things up:

  • JSON objects as maps are unordered by nature.
  • Due to that, JSONPath query results are non-deterministic regarding their order – which is common sense here.
  • But determinism, i.e. reproducibility of JSONPath query results, is required for compliance tests.
  • Now implementations usually return query results as an array – pragmatically JSON in ... JSON out.
  • JSON arrays are ordered by nature, which might falsely signal determinism of results.
  • So calling the result a "node-list" might help out here, but introduces another level of complexity in terms and their usage.

Having understood that determinism is solely required for compliance testing, I am not sure how to interpret Glyn's example above. Should future compliance tests really spell out every possible valid and invalid result ordering?

I am sure someone among us would be able to implement something that exactly reproduces [0, 2, 1, 3] as a result (no one would, of course). Thus it's implementation dependent.

But if we only need to ensure that we always generate a single reproducible result, we can achieve this by something like ...

  1. Generate a list of output paths.
  2. Sort it lexically or however.
  3. Generate the list of values from that path list – preserving order.

... which we can even communicate in the draft as a recipe.

Regarding "node-list", there is indeed in JavaScript a concept of "array-like objects". Those objects only need to be able to tell the list length and provide an iterator to visit all entries. De facto, DOM node-lists are exactly such things. This is what the above discussion reminds me of.

Maybe I'm still missing another important point.
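The three-step recipe above might look like the following sketch, using JSON Pointer-style paths for illustration (function names are hypothetical; a real implementation would also escape `/` and `~` in member names per RFC 6901):

```python
def wildcard_paths(doc):
    """Step 1: produce the output paths for '$.*' over an object,
    in whatever order the map happens to iterate."""
    return ["/" + key for key in doc]

def deterministic_result(doc):
    paths = sorted(wildcard_paths(doc))   # step 2: sort lexically
    return [doc[p[1:]] for p in paths]    # step 3: values, preserving path order

print(deterministic_result({"b": 2, "a": 1}))  # [1, 2]
```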

@glyn
Collaborator

glyn commented Mar 19, 2021

> * But determinism, i.e. reproducibility of JSONPath query results, is required for compliance tests.

Neither determinism of the spec nor determinism for a given JSONPath implementation are necessary for compliance tests, but non-determinism does add some complexity to testing. That said, I am in favour of non-determinism in the spec because it gives the most freedom to implementations.

@glyn
Collaborator

glyn commented Mar 19, 2021

> (@glyn) But I think the following output, for example, should be invalid:
>
>   • [0, 2, 1, 3]
>
> I'm curious why you think this should be invalid.

Because I would expect each "stage" of a selector to be evaluated in turn. In other words, the intermediate results which are output from one stage to the next can be in a non-deterministic order, but that order won't be modified by other stages as those stages are operating on the individual values from earlier stages.

> I understand how you got the result, and if you include (or replace with) the paths to those values, it's just as (or more) clear.

Sorry, but what is more clear?

> Partial determinism is going to create a bunch of weird edge cases that will be impossible to completely define.

I don't think so. I think once we have addressed the non-determinism of specific selector "stages" applied to objects, that's pretty much it. What you describe as "partial determinism" is simply a result of how the spec is constructed from small pieces in a uniform way.
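The stage-by-stage model can be sketched directly: each stage maps over the previous stage's nodelist in order and concatenates the per-node results, so sibling results never interleave. (Names are illustrative; `wildcard` handles only objects for brevity.)

```python
def apply_stage(nodelist, stage):
    """Apply one selector stage to every node of the incoming
    nodelist in order, concatenating the per-node results. The
    order *within* one object's expansion is free, but the outer
    sequence is preserved, so [0, 2, 1, 3] can never arise."""
    out = []
    for node in nodelist:
        out.extend(stage(node))
    return out

def wildcard(node):
    # Object member order is non-deterministic in principle;
    # Python's insertion-ordered dicts stand in for it here.
    return list(node.values()) if isinstance(node, dict) else []

doc = {"a": {"c": 0, "d": 1}, "b": {"e": 2, "f": 3}}
print(apply_stage(apply_stage([doc], wildcard), wildcard))
# [0, 1, 2, 3] with insertion-ordered dicts
```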

@glyn
Collaborator

glyn commented Mar 19, 2021

To be clear, non-determinism in the spec doesn't force implementations to be non-deterministic. But it allows implementations to trade off efficiency against non-determinism. It also allows for variations between (deterministic) implementations.

@gregsdennis
Collaborator

Also related to (maybe duplicates) #27.

@cabo
Member Author

cabo commented Nov 10, 2021

Closed by #134
