Some observations on the 3 March 2021 draft #56

danielaparker · 2021-03-06T12:47:40Z

"dot-child-name" and "child"

The grammar defines a "dot-child-name" as an unquoted string in the dot notation, and a "child" as a quoted string in the bracket notation.

I find "dot-child-name" and "child" awkward when they appear in the body of the text, and would prefer "unquoted-string" (or "identifier", see below) and "quoted-string", particularly as section 3.6.2.3 begins with "A child is a quoted string." If it's a quoted string, why not call it a "quoted-string"? In other grammars, authors generally take care to use token names that are intuitive and natural to use in the body of the text. I'd like to see more of that in the JSONPath specification as well.

On a related point, the grammar defines a "dot-child-name" as an unquoted string. However, thanks to Burgmer's excellent comparisons, we know that a majority of implementations support dot notation with single quotes, and a significant minority also support dot notation with double quotes. It therefore seems natural to me to define:

identifier = unquoted-string / quoted-string 

quoted-string = single-quoted-string / double-quoted-string

union-element = quoted-string

selector = dot-child      
    
dot-child = "." identifier
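To make the proposal concrete, here is a minimal Python sketch of how a "dot-child" could be recognized under these rules. The character classes for unquoted and quoted strings, the backslash escape handling, and the name `parse_dot_child` are my own assumptions for illustration; none of them come from the draft.

```python
import re

# Hypothetical token patterns; the character classes are assumptions,
# not taken from the draft.
UNQUOTED = r"[A-Za-z_][A-Za-z0-9_]*"
SINGLE_QUOTED = r"'(?:[^'\\]|\\.)*'"
DOUBLE_QUOTED = r'"(?:[^"\\]|\\.)*"'

# identifier = unquoted-string / quoted-string
IDENTIFIER = f"(?:{UNQUOTED}|{SINGLE_QUOTED}|{DOUBLE_QUOTED})"

# dot-child = "." identifier
DOT_CHILD = re.compile(r"\.(" + IDENTIFIER + ")")


def parse_dot_child(text):
    """Return the name selected by a dot-child, or None if text is not one."""
    match = DOT_CHILD.fullmatch(text)
    if match is None:
        return None
    token = match.group(1)
    if token[0] in "'\"":
        # Strip the surrounding quotes and drop the backslash of each escape
        return re.sub(r"\\(.)", r"\1", token[1:-1])
    return token
```

With these assumptions, `.store`, `.'my key'` and `."my key"` are all accepted by the same rule, which is the point of defining "identifier" as either an unquoted or a quoted string.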

In section 3.6.1, when discussing the dot notation, the term "Union Child" is used, referring to section 3.6.2.3. That section, however, does not define "Union Child"; it defines "child". In my opinion, terms used in the body of the text should always be consistent with the tokens in the grammar, including with respect to case.

Normalized path expressions

The draft states, in section 1.3,

"Where a JSONPath processor uses JSONPath expressions as output paths, these will always be converted to normalized JSONPath expressions which employ the more general bracket-notation. [2] Bracket notation is more general than dot notation and can serve as a canonical form when a JSONPath processor uses JSONPath expressions as output paths."

Remarks:

Is the notation "[2]" intended to be a reference? If so, to what does it refer?
This paragraph uses the term "normalized JSONPath expressions", and is the only place in the draft where this exact expression occurs. "Normalized Path Expression" is used in Section 1.1.
The draft refers to this section in Section 1.1 when introducing the term "Normalized Path Expression", but the above text cannot be considered a definition. Intuitively, what is required is a path in the bracket notation with only non-negative indices and single-quoted names allowed; both restrictions are required for a canonical form.
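As an illustration of the intuition in the last remark, here is a minimal Python sketch that renders a path in bracket notation with single-quoted names and non-negative indices. The function name `normalized_path`, the representation of a path as a list of names and integers, and the escaping rule for backslashes and single quotes are assumptions of mine, not anything the draft specifies.

```python
def normalized_path(segments, root):
    """Render segments as a bracket-notation path, resolving negative indices.

    segments is a list of object member names (str) and array indices (int);
    root is the JSON value the path is applied to.
    """
    parts = ["$"]
    current = root
    for segment in segments:
        if isinstance(current, list):
            # Resolve a negative index against the actual array length
            index = segment if segment >= 0 else len(current) + segment
            parts.append(f"[{index}]")
            current = current[index]
        else:
            # Always single-quote names; the escaping rule is an assumption
            name = segment.replace("\\", "\\\\").replace("'", "\\'")
            parts.append(f"['{name}']")
            current = current[segment]
    return "".join(parts)
```

Note that resolving a negative index needs the array length, so in this sketch normalization depends on the document and not only on the path text.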

Wildcard with bracket notation

The draft gives examples of using a wildcard with the bracket notation, e.g. $.store.book[*].author, but it's not covered in the grammar.

One approach would be to treat it symmetrically as a union element, i.e.

union-element =/ "[" "*" "]"

Note that a query like

$.store.book[*,*]

doesn't make much sense, but it does no harm, and some implementations support that, e.g. Goessner.
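To show why such a query is harmless, here is a minimal Python sketch that treats the wildcard as just another union element. It assumes that each union element is evaluated independently and that the results are concatenated in order; the names `WILDCARD` and `select_union` are hypothetical, and the sketch makes no claim about what any particular implementation returns.

```python
# Hypothetical marker object standing in for the "*" union element
WILDCARD = object()


def select_union(value, elements):
    """Apply each union element to value and concatenate the selections."""
    selected = []
    for element in elements:
        if element is WILDCARD:
            if isinstance(value, dict):
                # Wildcard on an object selects every member value
                selected.extend(value.values())
            elif isinstance(value, list):
                # Wildcard on an array selects every item
                selected.extend(value)
        elif isinstance(value, dict) and element in value:
            # A name element selects the matching member, if present
            selected.append(value[element])
    return selected
```

Under these assumptions, a union of two wildcards simply selects every child twice; no special case is needed.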

Filters

Filters are largely unmentioned in the current draft, apart from examples.

Is it intended to also support filters and expressions as "union elements"?

According to https://cburgmer.github.io/json-path-comparison/results/union_with_filter.html, six out of 41 implementations support union with filter elements.

node

It's unclear what the draft means by "node", which occurs 60 times in the text. In section 3.2, it says "Each node holds a JSON value", but it doesn't say what else the node holds (a position or path to that point?). And then we have "root node which is the input document", which suggests the root node is a value. My own understanding of a node is a path/value pair. I think the draft needs to be clearer about this term, and to distinguish between the root and current nodes and the corresponding root and current values.
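To spell out what I mean by a path/value pair, here is a minimal Python sketch. The names `Node` and `children`, and the choice to represent a path as a tuple of names and indices, are my own assumptions for illustration and do not come from the draft.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Tuple, Union


@dataclass(frozen=True)
class Node:
    """A node as a path/value pair (one reading; the draft does not say this)."""
    path: Tuple[Union[str, int], ...]  # location of the value within the root
    value: Any                         # the JSON value found at that location


def children(node: Node) -> Iterator[Node]:
    """Yield the child nodes of an object or array node, extending the path."""
    if isinstance(node.value, dict):
        for name, child in node.value.items():
            yield Node(node.path + (name,), child)
    elif isinstance(node.value, list):
        for index, child in enumerate(node.value):
            yield Node(node.path + (index,), child)
```

In this reading, the root node is `Node((), document)`: the root value is the input document, and the root node pairs that value with the empty path. The same distinction applies to the current node and the current value.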

Data Item

Section 1.1 defines the term "Data Item" as follows:

"A structure complying to the generic data model of JSON, i.e., composed of containers, namely JSON objects and arrays, and of atomic data, namely null, true, false, numbers, and text strings. Also called a JSON value."

But if "Data Item" means the same as "JSON value", why introduce it at all? I note that the term "JSON value" occurs 13 times in the draft; "Data Item" occurs 15 times. I think there is some risk, in a document with multiple authors and multiple critics, that people have different preferred terms, and that as a compromise, all terms get used. But this is not helpful to implementers. For implementers, one term is best.

Position

In section 1.1, which introduces terminology, the draft defines Position as follows:

"A JSON data item identical to or nested within the JSON data item to which the query is applied to, expressed either by the value of that data item or by providing a Normalized Path Expression as a JSONPath Output Path."

Remarks

The expression "the JSON data item to which the query is applied to" is awkward, and could be substituted with "root", which is a term used 11 times in the draft from Section 1.3 on. I would suggest introducing "root" in the terminology section, and using it.
I don't understand what this sentence means. What does it mean to say that "Position" is the root or a nested data item and can be expressed by the value of that data item (as well as by a Normalized Path Expression)? Intuitively, my understanding of "Position" is that it is the location of an item within the root, and can be represented by a Normalized Path Expression.

General observation

If I compare the draft to the JMESPath specification, one thing that stands out is that in the JMESPath specification, every definition is expressed in terms of tokens that are defined in the grammar, and the names in the grammar correspond to intuition. This makes the JMESPath specification very clear. But that's not always the case in the draft. For example, the text about Normalized Path Expression uses terms such as "bracket notation" that are not found in the grammar, and the grammar uses names like "child" that cannot be easily understood if used bare without more context.

I think it would be helpful if the draft consistently described things in terms of tokens that are defined in the grammar, and if the names in the grammar were natural ones to use in the text. On that note, I would suggest that "child" is not a natural way to refer to a quoted string in brackets.

My own view is that describing Normalized Path Expression in terms of "union", "union element" and "child" would be awkward, which suggests to me that additional tokens might be introduced into the grammar to represent bracketed expressions with a single item.

danielaparker closed this as completed Mar 27, 2021

goessner mentioned this issue Apr 30, 2021

Filter Expressions #64

Closed
