Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Processing model #79

Merged
merged 10 commits into from
Mar 27, 2021
Merged

Processing model #79

merged 10 commits into from
Mar 27, 2021

Conversation

cabo
Copy link
Member

@cabo cabo commented Mar 23, 2021

Try to nail down the processing model some more, and raise four discussion points that we might be able to resolve now.

@cabo cabo marked this pull request as draft March 23, 2021 17:04
@cabo cabo mentioned this pull request Mar 23, 2021
@gregsdennis
Copy link
Collaborator

Seems to be related to #14.

@cabo
Copy link
Member Author

cabo commented Mar 23, 2021

Indeed, this tries to reflect the great discussion in #14.

Copy link
Collaborator

@gregsdennis gregsdennis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is coherent, and I like the direction. I've also added a few suggestions/comments.

draft-ietf-jsonpath-base.md Show resolved Hide resolved
draft-ietf-jsonpath-base.md Show resolved Hide resolved
draft-ietf-jsonpath-base.md Outdated Show resolved Hide resolved
draft-ietf-jsonpath-base.md Outdated Show resolved Hide resolved
draft-ietf-jsonpath-base.md Outdated Show resolved Hide resolved
draft-ietf-jsonpath-base.md Show resolved Hide resolved
draft-ietf-jsonpath-base.md Show resolved Hide resolved
Comment on lines 527 to 538
> Discussion: Do we allow selectors that have their own combine1
> variant, or can we always use the same?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely worth discussion, but I think a common "combine" mechanism is simplest to understand. Each selector defining its own forces the Path author to remember how each one behaves, which can get confusing fast.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good reason to stick with a common "combine1" model.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each selector can only select nodes from the subtree of the current node. So with each selector the number of potentially selectable nodes is reduced or stays constant at best. It can never increase ... so "combine" suggests "increase", is misleading and should either be "pick model" or "select model" or something similar.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, a selector operates on the nodes in the input node list.
The "combine1" function is about combining the outputs that each such operation provides.

So if you have [1], [2] as the nodelist (identifying nodes by their values here), and the selector is .[0], the output nodelists from each selector function invocation are 1 and 2 respectively. combine1 would create an output nodelist 1, 2. This is a bit boring, but more complex selectors can leave a nodelist for each node in the input nodelist, so we need to discuss how to combine them:

{a: {a: 1}}, {a: {a: 2}} as input nodelist, ..a as selector: The individual nodelists are {a: 1}, 1 and {a: 2}, 2. combine1 turns this into, say, {a: 1}, 1, {a: 2}, 2.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh well ... it's a simple "append" to the resulting nodelist. According to my mental model allowing only a single resulting nodelist, the sequence of ...

  • {a: 1}
  • 1
  • {a: 2}
  • 2

... would be added, where order is implementation dependent of course (determinism).

Copy link

@danielaparker danielaparker Mar 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@goessner wrote:

Oh well ... it's a simple "append" to the resulting nodelist. According to my mental model allowing only a single resulting nodelist, the sequence of ...

  • {a: 1}
  • 1
  • {a: 2}
  • 2

Just this. That corresponds to my "mental model", and directly to what many implementations do.

... would be added, where order is implementation dependent of course (determinism).

Implementation dependent, yes, but I think the "natural order" would satisfy the minimally restrictive conditions identified by @glyn in #60.

Copy link
Collaborator

@glyn glyn Mar 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, to spell this out, there is one object with more than one key/value pair and thus one source of non-determinism in this example (assuming the visitation ordering of .. will be tied down in the spec). So the other possible resultant nodelist would correspond to:

  • {a: 2}
  • 2
  • {a: 1}
  • 1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thread proposes an answer to the discussion point.
Maybe we can merge the PR first and then specifically address the discussion point.

Comment on lines 533 to 544
> Discussion: Is that a common requirement? Or can a selector also go
> up, or to the query Argument?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This touches on the function of @, $ within filter queries, and the idea of scope when @ is used in nested filter queries.

I think it's better left for another PR when we've explored these ideas and how they relate.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A selector can never go up, given the information the current node is holding alone, as it lacks parent node info.

Maintaining parent node info during selection process introduces a new – hard to overview – level of complexity and should be avoided.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree a selector can never go up.

@cabo cabo force-pushed the processing-model branch from ea80e86 to 2d5389f Compare March 24, 2021 09:40
Copy link

@danielaparker danielaparker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding "Data Item: A structure complying to the generic data model of JSON, i.e., composed of containers, namely JSON objects and arrays, and of atomic data, namely null, true, false, numbers, and text strings. Also called a JSON value.", this seems to be related to @ttimbrays posts here and here. Perhaps "Data Item" could be dropped and just use "JSON value", as @ttimbrays proposed?

Regarding "Argument: Short name for the JSON data item a JSONPath expression is applied to", this also seems to be related to @ttimbrays post here. Perhaps "Argument" could be replaced with "root value", as @ttimbray suggested?

Note that if a node is regarded as a value/location pair, than a JSON value cannot be considered as a "tree of nodes", or a "node" itself. We can define a "children" property on a JSON value, which can be considered its members or elements if such exist. But we cannot define a "location" property for a JSON value, it's not possible, and "location" seems to be an essential property of a node as understood in the processing model defined in this PR. See here for a related definition of "node" in the XQuery and XPath Data Model 3.1. JSONPath doesn't need a notion of "node" as complicated as XDM, but minimally it needs location, children, and value properties.

I don't think this observation presents a great difficulty for the PR, but I think that mentions of JSON values as "trees of nodes" and remarks that "the term node has the same meaning [as JSON value]" would be better expunged.

@cabo
Copy link
Member Author

cabo commented Mar 24, 2021

It seems these are all terminology comments.
Can we refrain from continuing the Brownian motion on terminology and focus on content for this PR?
Unless of course the terminology is wrong or not sufficiently sharp.
We can go back to terminology discussions after having made some progress.

@cabo cabo marked this pull request as ready for review March 24, 2021 10:59
@cabo
Copy link
Member Author

cabo commented Mar 24, 2021

I have collected these items in #84 so we don't forget where we were.

@cabo
Copy link
Member Author

cabo commented Mar 24, 2021

@cabo, can you at least define the properties of "node" for JSONPath? I think the discussion of the processing model depends on that, just as the discussion of the processing model in XPath depends on the properties of "node" defined in the XDM. I think the essential properties for JSONPath are value and location, but I would like to know your view.

I was thinking that line 183..185 were doing that. But since you insist, in eefc298 I have tried to clarify this paragraph and elevate it to its own definition.

@danielaparker
Copy link

danielaparker commented Mar 24, 2021

@cabo wrote:

I was thinking that line 183..185 were doing that. But since you insist, in eefc298 I have tried to clarify this paragraph and elevate it to its own definition.

Thank you. Your forbearance is appreciated :-)

I was hoping for something more along the lines of XQuery and XPath Data Model 3.1 wording:

"All nodes must satisfy the following general constraints:"

"Elements [Nodes] have the following properties:"

But I understand (and largely agree with) this sentence, which can be considered a working definition:

"A node can be viewed as a combination of a (1) JSON value and (2) its location in the argument; the latter can, if desired, be represented as an Output Path."

I (and no doubt @timbray) would prefer "root value" to "argument", and I would prefer "Normalized Path" to "Output Path" (why use any other than the easily constructed canonical one?) but these then become terminological or minor issues.

@cabo cabo force-pushed the processing-model branch from 953d245 to 0237a69 Compare March 24, 2021 21:19
@cabo cabo changed the base branch from master to main March 25, 2021 08:56
@glyn
Copy link
Collaborator

glyn commented Mar 25, 2021

@cabo This needs rebasing on main. Also the text includes some discussion points and I'm not sure whether your intention is that these be addressed before merging?

Comment on lines 494 to 497
> Discussion: We haven't decided yet whether there is only one
> *combine1* or a choice, with for instance simple concatenation and
> concatenation with duplicate removal, based on a definition of
> duplicate, are candidates.
Copy link
Collaborator

@glyn glyn Mar 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to think we can agree on one combine1. I haven't heard a strong argument against removing duplicate nodes.

Copy link
Collaborator

@glyn glyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with discussion points left inline. Better to make progress on the rest.

@glyn
Copy link
Collaborator

glyn commented Mar 26, 2021

@gregsdennis and @danielaparker I'd like to get this PR merged with the discussion points inline. That would be a nice stake in the ground. Do either of you have objections to that or are you comfortable with merging?

@cabo cabo force-pushed the processing-model branch from 0237a69 to 1755f1e Compare March 26, 2021 15:46
@cabo
Copy link
Member Author

cabo commented Mar 26, 2021

To make this easier, I rebased on main again, and I numbered the discussion points (D1 to D4).

Copy link
Collaborator

@gregsdennis gregsdennis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this for its purpose.

I'd like to deal with this wording at some point, but it's not crucial to this PR.

@glyn glyn merged commit d5f7cb9 into main Mar 27, 2021
@glyn glyn deleted the processing-model branch March 27, 2021 15:38
@gregsdennis
Copy link
Collaborator

gregsdennis commented Mar 27, 2021

@danielaparker the goal for putting a "processing model" in the spec is to have some algorithm to reference internally. It doesn't need to consider edge cases or have much detail. It just needs to be sufficient for other parts of the spec to use. It's not even intended as guidance. We want to leave as much as possible up to the implementor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants