Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial determinism of the descendant selector (..) #103

Closed
glyn opened this issue Jun 15, 2021 · 17 comments · Fixed by #134
Closed

Partial determinism of the descendant selector (..) #103

glyn opened this issue Jun 15, 2021 · 17 comments · Fixed by #134
Assignees
Labels

Comments

@glyn
Copy link
Collaborator

glyn commented Jun 15, 2021

At today's meeting, I agreed to submit a PR to float the idea of partial determinism for the descendants operator, even though the consensus seems to be against any determinism for that operator. Given the descendants operator PR #102 isn't yet merged, I'll start with an issue to capture discussion and avoid rebasing effort.

We probably don't want to force any determinism on implementations, so the idea is to provide a recommendation for implementations which want to provide partial determinism and use "SHOULD" language so the recommendation is optional. Then implementations which opt in will agree on the ordering recommentations.

The benefits of partial determinism are:

  • improved predictability and interoperation
  • relative ease for JSONPath users writing tests, as there are fewer possible outputs.

In PR #102 there are no constraints on the non-determinism of the descendants operator:

The descendant-selector is inspired by ECMAScript for XML (E4X). It selects the node and all its descendants.

We agreed in issue #60 to allow non-determinism in the order in which objects are iterated and the same should be true when the descendants operator encounters an object. But there are other details of the ordering which could be tied down:

  • arrays SHOULD be visited in array order
  • nodes SHOULD be visited strictly before their descendants (so-called pre-order traversal).

(Post-order traversal, in which nodes are visited strictly after their descendants, would be an equally valid alternative.)

One of the reasons given at today's meeting for allowing full non-determinism was for parallel processing, so it would be interesting to discuss what constraints the above rules produce on parallel processing. The order of collection of the results seems to be more significant than whether or not parallel processing is used.

@glyn glyn self-assigned this Jun 15, 2021
@gregsdennis
Copy link
Collaborator

gregsdennis commented Jun 22, 2021

My preference is pre-order traversal (node then descendants).

@ietf-wg-jsonpath ietf-wg-jsonpath deleted a comment from gregsdennis Jun 27, 2021
@ietf-wg-jsonpath ietf-wg-jsonpath deleted a comment from gregsdennis Jun 27, 2021
@cabo
Copy link
Member

cabo commented Jul 23, 2021

Except that I don't know what's "SHOULD" about that, I agree with these partial ordering guidelines.

If we want to introduce different compliance levels, that is fine, but it SHOULD :-) not be coupled to using the term SHOULD vs. MUST.
"Full compliance requires..."
vs. "For conditional compliance, at least..."

@glyn
Copy link
Collaborator Author

glyn commented Jul 23, 2021

I probably misunderstood RFC 2119, or applied it atypically. I am comfortable with using @cabo's compliance level language instead.

@gregsdennis
Copy link
Collaborator

We need to be careful here that we don't optimize for tests. It seems to be a focus of this discussion that we want to make verification of output in the test suite easier.

Rather, we need to focus on supportability and interoperability. If that aligns with running the test, that's good. But don't let making running the tests easier at the cost of making things significantly more difficult for implementors.

It's non-trivial, but also not very complex, to verify unordered sets of nodes in output.

@glyn
Copy link
Collaborator Author

glyn commented Jul 24, 2021 via email

@timbray
Copy link
Contributor

timbray commented Jul 26, 2021

I recommend checking out the language around interoperability in 8259. Something like that might work here.

@glyn
Copy link
Collaborator Author

glyn commented Jul 30, 2021

I'm having a hard time judging the consensus, or lack of it, here. Regardless of the detailed language, please could interested parties indicate which of these options they favour:

A. Don't tie down the non-determinism of .. at all
B. Require .. to visit arrays in array order and visit nodes before their descendants
C. Some other option involving B with the option of A or A with the option of B, using some kind of language TBD.

/cc @goessner

@cabo
Copy link
Member

cabo commented Jul 30, 2021 via email

@remorhaz
Copy link
Contributor

remorhaz commented Sep 9, 2021

B. Require .. to visit arrays in array order and visit nodes before their descendants

I would vote for this option, but in the same time we have slices with negative steps, which naturally do sort of "backwards" iteration in Python. Here is an example - while there's no clear consensus, we see that all implementations with "python-compatible" slices return values in reverse order, no exceptions.

If we determine order of array items visiting (thus determining the order of selected nodes in output), shouldn't we also take into consideration the effect of selectors, and if yes - then should the order of union items affect the order of visit?

We can consider part of selectors as "order neutral" and others are "order affecting", with affecting selectors overriding natural order. At first glance I see slices and unions as order-affecting selectors.

@glyn
Copy link
Collaborator Author

glyn commented Sep 9, 2021

@remorhaz I don't follow. The ordering of .. is not affected by other selectors (e.g. involving negative array steps or unions).

@remorhaz
Copy link
Contributor

remorhaz commented Sep 9, 2021

The ordering of .. is not affected by other selectors (e.g. involving negative array steps or unions).

This is not absolutely clear for me. Let us have the following structure:

[
  [
    [1, 2, 3, 4],
    [5, 6, 7, 8]
  ]
]

And we apply $..[::-1] query. At first we apply .. selector to the root node, and it returns following nodes:

  1. $[0]: [[1, 2, 3, 4], [5, 6, 7, 8]]
  2. $[0][0]: [1, 2, 3, 4]
  3. $[0][0][0]: 1
  4. $[0][0][1]: 2
  5. $[0][0][2]: 3
  6. $[0][0][3]: 4
  7. $[0][1]: [5, 6, 7, 8]
  8. $[0][1][0]: 5
  9. $[0][1][1]: 6
  10. $[0][1][2]: 7
  11. $[0][1][3]: 8

Then we apply [::-1] selector to each these nodes. Applying slice to number will return empty node list, so I consider only arrays below:

$[0]: [[1, 2, 3, 4], [5, 6, 7, 8]]

  1. $[0][1]: [5, 6, 7, 8]
  2. $[0][0]: [1, 2, 3, 4]

$[0][0]: [1, 2, 3, 4]

  1. $[0][0][3]: 4
  2. $[0][0][2]: 3
  3. $[0][0][1]: 2
  4. $[0][0][0]: 1

$[0][1]: [5, 6, 7, 8]

  1. $[0][1][3]: 8
  2. $[0][1][2]: 7
  3. $[0][1][1]: 6
  4. $[0][1][0]: 5

The resulting node list is:

  1. $[0][1]: [5, 6, 7, 8]
  2. $[0][0]: [1, 2, 3, 4]
  3. $[0][0][3]: 4
  4. $[0][0][2]: 3
  5. $[0][0][1]: 2
  6. $[0][0][0]: 1
  7. $[0][1][3]: 8
  8. $[0][1][2]: 7
  9. $[0][1][1]: 6
  10. $[0][1][0]: 5

I didn't notice any problems following both "natural order" while applying .. and "reversed order" while applying [::-1], because they never apply simultaneously. Please show me if I missed something in my reasoning.

@remorhaz
Copy link
Contributor

remorhaz commented Sep 9, 2021

The idea of determinism in arrays itself seems important to me. For example, let some service provides array of some data in some garanteed order that is important for end user, and and another service wants to use JSONPath to filter them on some fancy condition.

If JSONPath provides determined order of results, then it's at least possible to solve the problem with a single query. If no - we'll have no easy way to do it. We will need to extract from JSON original order somehow (with alternative tool or manual parsing), and then use it to rebuild original order of filtered items. It will be much more convenient to receive results in a determined order.

@timbray
Copy link
Contributor

timbray commented Sep 9, 2021 via email

@glyn
Copy link
Collaborator Author

glyn commented Sep 9, 2021

Yes, I think we agree that array order should be preserved and that object order is not guaranteed, but I'd like this issue to focus specifically on the ordering of ...

glyn added a commit that referenced this issue Nov 10, 2021
@cabo cabo added the has PR label Nov 10, 2021
@glyn glyn closed this as completed in #134 Nov 10, 2021
@gregsdennis
Copy link
Collaborator

Added text indicating node-first / children-second ordering.

@glyn
Copy link
Collaborator Author

glyn commented Nov 11, 2021 via email

@gregsdennis
Copy link
Collaborator

Sure. That too. But I don't think that was ever in question.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants