Determinism? #60
hmm ... depends on how deterministic JSON parsers are when iterating over
objects ... they should work in ascending chronological order ...
https://stackoverflow.com/questions/42491226/is-json-stringify-deterministic-in-v8
JavaScript 'Object.keys' also guarantees the same order
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/keys
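To see the guarantee in action (modern engines enumerate integer-like keys in ascending order first, then the remaining keys in insertion order):

```js
// Own-property order: integer-like keys ascending, then string keys in
// insertion order; Object.keys and (in V8) JSON.stringify follow it.
const obj = {};
obj.b = 1;
obj["2"] = 2;
obj.a = 3;
obj["1"] = 4;

console.log(Object.keys(obj));    // ["1", "2", "b", "a"]
console.log(JSON.stringify(obj)); // {"1":4,"2":2,"b":1,"a":3}
```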
So ... order should always be the same, as long as the implementation is
the same. How different implementations behave is uncertain at best.
But why should we care and heal deficiencies of JSON parsers in that case?
I can't see a benefit of always guaranteeing the same -whatever- order.
And disallowing iterating over objects by '$.*' might break existing
implementations.
A minor benefit would be that the Compliance Test Suite would be simpler if the output always had a defined implementation-independent ordering.
Agreed.
(Determinism in interop testing is a major benefit for me. But that can be added by reprocessing after the JSONpath processing, so it is not completely a killer argument.)
Another selector which requires careful definition for ordering is
hmm ... compliance testing is a point, sure. Simple first tests of
Anything is "possible". ;-) Tests involving objects which have only one name/value pair are trivially easy to test because any non-determinism of object enumeration is irrelevant. But tests involving objects with two or more name/value pairs need to capture the ordering possibilities somehow. Note that the current test suite is a JSON document, so there would need to be a declarative way of specifying the ordering possibilities.
We covered this topic over in the JSON Schema repo somewhere with quite a lengthy discussion. I'll see if I can dig it up. In short, parsers in some languages just can't be made deterministic. Reading the same object from a file (where the keys are certainly in a defined sequence) will yield differing orderings when printing or even debugging the value. This topic, therefore, falls into the "language agnosticism" category. I don't think it's something we can enforce.
Each of them has a number of users; it is really important not to disrupt their existing expressions and their habits.
I don't think determinism is that important. Anyway, what I found confusing when implementing ExJSONPath was the lack of any official recommendation about ordering. I think we should clearly state that the result has to be treated as unordered, but we might suggest using lexicographic order for validation purposes when possible (I think a huge number of implementations are using ordered maps for JSON object storage, so it shouldn't be hard).
Removing, for testing purposes, an implemented feature used by a number of users is something that I wish not to discuss.
I think a number of JSONPath implementations are completely decoupled from the JSON parser, and they just accept maps/lists/atomic-values as input (ExJSONPath is one of them; it doesn't depend on any JSON parser and it doesn't care about JSON parsing). I'm not even sure if JSONPath is only used for JSON inputs; it works so nicely with any JSON-like structure. ¹ I'm not 100% sure that all implementations are using JSON types, there might be implementations using unordered sets for real as containers for
A noble goal, to be sure. But I imagine that sudden conformance to a spec would naturally engender breaking changes, especially in a landscape of implementations as large as the one we're dealing with. I don't think it'll be avoided. As such, my feeling is that we get the spec right rather than make concessions just because the majority does it some way.
Absolutely correct.
On 2021-03-12, at 12:26, Greg Dennis wrote:
As such, my feeling is that we get the spec right rather than make concessions just because the majority does it some way.
We have some of this in our DNA, but we also have a clear mandate not to disregard the installed base (this is JSONpath, not a new protocol based on JSONpath).
So any argument on this spectrum will need to be carefully weighed, because both extremes are wrong.
Regards, Carsten
I think the summary so far is
I gather that there is a general direction toward non-determinism, as imposing an order would be too expensive, but I'm not sure we have a conclusion here. Also: https://mailarchive.ietf.org/arch/msg/jsonpath/N_emdQBtZg3xTuVQBFyXcjAR4ZE
I agree with Carsten's general direction in support of nondeterminism, simply because RFC 8259 explicitly discourages assigning semantics to the ordering of object members. If we want determinism, we probably have to specify RFC 8785 (JCS) conformance, which doesn't feel reasonable. As for testing, anyone who's doing this seriously in the real world has to have a JSONObject.equals() method already, right?
I also agree with supporting non-determinism, but please note that JSONObject.equals() won't necessarily help because the non-deterministic portions of the output won't necessarily be part of a JSONObject. For example, applying a wildcard to an object produces values whose ordering varies at the top level of the result, outside any JSONObject. I think we'd either need to decorate such tests with an indication of where non-determinism can occur, so that each test framework can compare the output to the expected value while taking non-determinism into account, or we'd need a way of capturing a set of possible alternative outputs. The latter may be preferable: although it would complicate the generation of tests, that is a one-off process, and it would simplify implementing test frameworks, of which there are likely to be many. (There may be other solutions too - haven't spent very long on this.)
FWIW, in implementing JSON Schema, I've found that I need both contents-equality and sequence-equality on arrays depending on the circumstance. Given that JSON Path outputs an array, and the contents of that array are likely to be generated non-deterministically due to evaluating objects, I think it makes sense for implementations to verify evaluation results (e.g. when processing a test suite) in a "contents-equality" sense. It's not per the JSON definition of array equality (which mandates sequence-equality), but I think the circumstance fits. The fact that the result array is not guaranteed to be deterministic would need to be declared in the specification, probably associated with a remark about it deviating from JSON array equality and perhaps even an explanation why.
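A minimal sketch of such a contents-equality check (helper names are mine, not from JSON Schema or any particular JSONPath implementation): two arrays match if one is a permutation of the other under deep equality.

```js
// Deep structural equality for JSON-like values: arrays compare in
// sequence, objects by key set and member values.
function deepEqual(a, b) {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" ||
      a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  if (Array.isArray(a)) {
    return a.length === b.length && a.every((x, i) => deepEqual(x, b[i]));
  }
  const ka = Object.keys(a), kb = Object.keys(b);
  return ka.length === kb.length &&
         ka.every(k => kb.includes(k) && deepEqual(a[k], b[k]));
}

// "Contents-equality": equal iff one array is a permutation of the
// other, ignoring element order.
function contentsEqual(a, b) {
  if (a.length !== b.length) return false;
  const rest = [...b];
  for (const x of a) {
    const i = rest.findIndex(y => deepEqual(x, y));
    if (i < 0) return false;
    rest.splice(i, 1);
  }
  return true;
}

contentsEqual([1, { a: 2 }], [{ a: 2 }, 1]); // true
```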
Nodelists are not arrays.
I do agree with common non-determinism here. Naming results as nodelists is clever. Shouldn't we define that term 'nodelist', then?
Yes, we should define nodelist. I made a first attempt in PR #72 now.
We need to tread carefully here. Are we saying we're comfortable outputting something that's not JSON but acts really similar? All implementations (including @goessner) output an array (or optionally
No. I agree that many APIs will output platform arrays here (which may or may not feel similar to JSON arrays). Saying that the output is a nodelist gives them freedom to define their APIs and how these present the nodelist. More importantly, we make processing for embedded queries less finicky.
Daniel, nobody can (or will try to) stop you thinking about a nodelist as a JSON array, if that suits your mental model.
This is irrelevant because of many of the points that @danielaparker mentioned: essentially, implementations don't use nodelists from what we can tell, they use arrays. And isn't our charter to break existing implementations as little as possible? If all implementations use arrays, changing that is not something we should consider lightly.

@cabo We're discussing it here because of your comment. Following some discussion elsewhere about expressions, you spammed "nodelist" comments anywhere it might be related in order to promote the idea before anyone had a chance to discuss it. So now we're discussing it.

All of that said, I think we should consider the output format (#23 & #44), determinism (this issue), and paths embedded in expressions (#74) together. There is some subtle interplay going on. If the output is an array of objects that specify location and value, then both that format and the sequence of the items within it need to be considered when returning from an embedded path so that the remainder of the expression can handle it properly. If embedded paths are to follow some different output that operators can then use, we need to specify what that is as well. Maybe that is the idea of a nodelist. But we need to discuss it first and come to some level of agreement.
It is a pretty common technique in model building to identify important concepts and, once one has identified them, to give them (tentative) names. That allows the participants to refer to the concept-in-formation, and to gradually increase certainty about it. Once there is enough certainty, it may be possible to identify some concepts with each other and turn them into one, but you don't do that before achieving that certainty.

I think much of the confusion comes from a perception that whatever is in the draft at one time is cast in concrete. That is not a healthy way to run this, and maybe I was making an unwarranted assumption that this would be obvious. So, indeed, feedback is good, but not so much about reasons for wanting to eliminate/equate concepts prematurely and ending the discussion about them before understanding them, but more about the properties of the concept itself.

The fact that XPath arrived at a specific form of nodelist after trying nodesets is certainly interesting feedback. We may not need that complexity, but it is much easier to eliminate complexity once we understand the concepts.
I'm not sure this is sufficient. Let me explain why with an example. Take the input argument `{"a": {"c": 0, "d": 1}, "b": {"e": 2, "f": 3}}`
and the selector `$.*.*`. I think the valid outputs should be: `[0,1,2,3]`, `[1,0,2,3]`, `[0,1,3,2]`, `[1,0,3,2]`, `[2,3,0,1]`, `[3,2,0,1]`, `[2,3,1,0]`, `[3,2,1,0]` (each object's members may be enumerated in either order).
But I think the following output, for example, should be invalid: `[0, 2, 1, 3]`.
The spec should allow for this level of non-determinism, but not more. Similarly, if the Compliance Test Suite covers this example, it should allow precisely the valid outputs to pass. I don't see how a "contents-equality" comparison of arrays would help. This makes me favour encoding sets of possible outputs in the CTS as described in #60 (comment).
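As a sketch, a CTS entry carrying alternative outputs might look like this (field names are hypothetical, not the actual suite's schema):

```js
// Hypothetical CTS entry: "results" lists every acceptable output, and
// a framework passes an implementation if its output deep-equals any
// one of them. Glyn's example above would list all eight valid orderings.
const test = {
  name: "wildcard on an object with two members",
  selector: "$.*",
  document: { a: 1, b: 2 },
  results: [
    [1, 2], // member "a" enumerated first
    [2, 1]  // member "b" enumerated first
  ]
};
```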
Actually, [2, 3, 0, 1] etc. as well (the outer container also is a map).
Whoops! Yes. Edited the list so it is now complete.
I'm curious why you think this should be invalid. I understand how you got the result, and if you include (or replace with) the paths to those values, it's just as (or more) clear. Partial determinism is going to create a bunch of weird edge cases that will be impossible to completely define.
That of course depends on the semantics of the nodelist. If, as I'm assuming, it is sequence-preserving, you cannot arrive at [0, 2, 1, 3], which would be (JSON pointer syntax, I'm lazy) /a/c, /b/e, /a/d, /b/f.
This is quite a complicated discussion. I'm trying to sum things up:
Having understood that determinism is solely required for compliance testing, I am not sure how to interpret Glyn's example above. Should future compliance tests really tell about every possible valid and invalid result ordering? I am sure some of us would be able to implement something exactly reproducing those orderings. But if we only need to ensure that we always generate a single reproducible result, we can achieve this by something like the following ...
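(A sketch of one possible recipe, assuming it simply means enumerating object members in sorted key order; this is illustrative, not draft text.)

```js
// Illustrative recipe: make wildcard iteration deterministic by always
// visiting object members in lexicographically sorted key order;
// arrays keep their natural index order.
function* children(node) {
  if (Array.isArray(node)) {
    for (const [i, v] of node.entries()) yield [i, v];
  } else if (node !== null && typeof node === "object") {
    for (const key of Object.keys(node).sort()) yield [key, node[key]];
  }
}

// [...children({ b: 1, a: 0 })] yields [["a", 0], ["b", 1]]
// regardless of insertion order.
```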
... which we can even communicate in the draft as a recipe. Regarding "node-list", there is indeed in JavaScript a concept of "array-like objects". Those objects only need to be able to tell the list length and provide an iterator to visit all entries. De facto, DOM node-lists are exactly such things. This is what the above discussion reminds me of. Maybe I still miss another important point.
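For illustration, a minimal array-like object in that sense (nothing JSONPath-specific, just the JavaScript pattern):

```js
// An array-like "nodelist": exposes length and indexed entries plus an
// iterator, but is not an Array.
const nodelist = {
  length: 2,
  0: 42,
  1: "hello",
  *[Symbol.iterator]() {
    for (let i = 0; i < this.length; i++) yield this[i];
  }
};

console.log([...nodelist]);           // [42, "hello"]
console.log(Array.from(nodelist));    // [42, "hello"]
console.log(Array.isArray(nodelist)); // false
```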
Neither determinism of the spec nor determinism for a given JSONPath implementation is necessary for compliance tests, but non-determinism does add some complexity to testing. That said, I am in favour of non-determinism in the spec because it gives the most freedom to implementations.
Because I would expect each "stage" of a selector to be evaluated in turn. In other words, the intermediate results which are output from one stage to the next can be in a non-deterministic order, but that order won't be modified by other stages as those stages are operating on the individual values from earlier stages.
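A sketch of that stage-by-stage model (function names are mine, not draft text):

```js
// Each stage visits the incoming nodelist in order, so the relative
// order contributed by earlier stages is preserved; only the
// enumeration of each individual object's members may vary.
function applyStage(nodelist, stage) {
  const out = [];
  for (const node of nodelist) {
    out.push(...stage(node)); // per-node results, in whatever order
  }
  return out;
}

// A wildcard stage: object member order is implementation-defined.
const wildcard = node =>
  node !== null && typeof node === "object" ? Object.values(node) : [];

const doc = { a: { c: 0, d: 1 }, b: { e: 2, f: 3 } };
applyStage(applyStage([doc], wildcard), wildcard);
// Possible: [0, 1, 2, 3], [2, 3, 0, 1], etc., but never the
// interleaving [0, 2, 1, 3].
```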
Sorry, but what is more clear?
I don't think so. I think once we have addressed the non-determinism of specific selector "stages" applied to objects, that's pretty much it. What you describe as "partial determinism" is simply a result of how the spec is constructed from small pieces in a uniform way.
To be clear, non-determinism in the spec doesn't force implementations to be non-deterministic. But it allows implementations to trade off efficiency against determinism. It also allows for variations between (deterministic) implementations.
Also related to (maybe duplicates) #27.
Closed by #134
Does applying a JSONPath query to a JSON data item always generate the same result?
Some selectors do not have a natural order of the nodelists they return, e.g., $.* applied to a JSON object
We could prescribe an order (e.g., sorting keys lexicographically or in JavaScript order) if determinism is important
We could disallow .* on JSON objects