[Initial Feedback] New like and match keywords #29
Comments
I agree that having a static published specification of syntax and semantics to reference makes this feasible, although it won't preclude an endless stream of requests for extension. Regardless, https://www.ietf.org/archive/id/draft-bormann-jsonpath-iregexp-02.html is a draft, and expired at that (although its latest successor, https://www.ietf.org/archive/id/draft-ietf-jsonpath-iregexp-04.html , is still active). Relying on draft documents is generally considered bad form, so would the plan be to pursue this only if draft-ietf-jsonpath-iregexp advances to RFC? |
Note also that extending |
The latest version is still a draft but appears more and more to be in line with a potential future accepted standard. But yes, the value is to rely on a shared, commonly agreed-upon specification, to avoid implementation-specific regex dialects. I think this is the closest we can get short of creating a spec ourselves, which I would not have the hubris to even attempt. |
I initially thought of an alternative that I think I would be in line with, but I did not include this in the initial feedback pitch. It would look like this:
expression =/ like-expression / match-expression
like-expression = expression "like" expression
match-expression = expression "match" expression
We could even – although I do not see why we should – restrict to:
expression =/ like-expression / match-expression
like-expression = expression "like" raw-string
match-expression = expression "match" raw-string
if we are worried about allowing dynamically created regular expressions. |
Maybe we could split this proposal into two.… Maybe the |
Maybe it's better to implement these like other string manipulation functions? E.g. |
Also this way we can add regex replacement functionality, like |
@hell-racer functions are definitely possible and are currently the main way to extend JMESPath. Nothing precludes shipping a high quality library of functions. However, extending using functions poses its own challenges, for instance, when JMESPath is embedded into a third-party tool. I think it makes sense to bring some key features into the core language that all library implementations must abide by. Hence my proposal using keywords instead. Although I’m certainly happy to listen to the pros and cons and hear about suggested alternatives.
At another level, although I would love JMESPath to have a built-in ability to search, capture, extract and replace text using regular expressions, I realize that programming languages have somewhat incompatible dialects of regex. I’m not comfortable specifying a function whose behaviour depends on whatever programming language a particular library happens to be written in. For instance, this would make it impossible to share a common suite of compliance tests that all implementations can rely on to assess compliance.
This proposal is deliberate and focuses on a narrow but universal use case, which is to match only. This happens to be in line with the I-Regexp initiative that produces a spec that is unambiguous and easy to follow. Maybe, with experience, we could foresee a future where that spec is extended to support capture groups and replacement.
So, to sum up, using functions rather than keywords is a matter of style. Irrespective of the syntax we choose, I think having a defined common spec for what a valid regex syntax is in JMESPath is very important. |
Regex is an entire language by itself, so including all its functionality into compliance tests is somewhat unnecessary. So, I think we'll have to include some lowest common denominator into compliance tests. Also, when a developer uses JmesPath in some project written in some language, he or she also almost certainly uses Regex in that same language too, so it would be strange if the Regex itself works differently than Regex in JmesPath - so it's understandable if Regex in JmesPath works the same way that language/framework does. Does it make sense? |
Another thing to keep in mind, if the |
Indeed, that is the exact point of defining a spec for an interoperable subset of the Regex language.
It makes total sense but I beg to disagree. The purpose of promoting a spec for JMESPath is so that all implementations – irrespective of their differences – can agree on some common semantics and behaviour for the language. I think the proposal is "elegant" – in all modesty – because:
This means that from an implementation perspective, the only thing that’s required is to:
That’s what I included in my reference implementations in Python and TypeScript.
from iregexp import check
from iregexp import toPCRE
import re
## check that the syntax is valid I-Regexp
succeeded = check('[aeiouy]*')
## returns a PCRE-compatible expression
regex = toPCRE('.*', anchor = True)
## compile once, then match against the compiled pattern
pattern = re.compile(regex)
pattern.match('aaaa') |
I think I disagree with this... changing syntax is a much bigger deal that breaks forward compatibility of old implementations. There should generally be a bias towards functions, which have already-established syntax and can be retroactively added.
💯 agree, and not just syntax but also semantics. |
@gibson042 understood. Although, for the record, my proposal does not break any syntax. It enables syntax that was previously not valid. As for breaking forward compatibility with old implementations I understand. But, this ship has sailed, I'm afraid, with the recent approval of the Although, to be fair, the previous design using a |
I'm not saying syntax should never change, rather that it should change only when the alternative is impossible or impractical (as was the case with |
I’ve come to the common understanding that For the record, I would like to propose the following signatures:
For the record, this post tracks the latest version of the spec: The assumption being that this proposal is subject to the linked-to specification attaining RFC status. |
The proposal for an interoperable regex syntax is nearing RFC 9485 status. 😉 |
I have been thinking that comparisons using new like and match keywords would be a natural extension.
The comparator rule would be extended like so:
Are these features that would be of interest?
Like
items[? foo like '**/*.json' ]
The like comparator would match simple SQL-like, wildcard-like or glob-like patterns.
I have investigated adding such contextual keywords to the language and found that it would probably be easier with lex/yacc-based implementations. Nevertheless, using the top-down parser approach, one simply has to account for those keywords between the nud() and led() calls in the main parsing loop.
This makes the parsing algorithm less "pure" but I’m pretty sure more knowledgeable people might come up with more elegant designs.
Match
The match comparator would match simple regular expressions.
items[? foo match 'ba[rz]' ]
I know that regular expressions have been a touchy subject in the past but I strongly believe we can make this work for JMESPath with:
true or false.
I have come up with a prototype for a reference implementation of a simple push-down automaton-based checker and the implementation is reasonably tight and compact.
I believe most languages in which a JMESPath implementation exists do indeed support the interoperable subset.
The idea from the standardization document referred to above is to:
The standardization document lists mappings for ECMAScript, PCRE, RE2, Go and Ruby dialects.
Once relying on such a compact library, the implementation is in fact really easy.
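As a rough illustration of how small the evaluation side could be, the sketch below maps an already-validated pattern onto Python's re module. It rests on several assumptions that are mine rather than the proposal's: that match means a whole-string match (as the anchor = True argument in the snippet from the thread suggests), that an unescaped dot outside a character class should not match newline or carriage return, and that brackets inside a character class are always escaped. The function names are hypothetical, and \p{...} escapes are not handled because the standard re module does not support them.

```python
import re


def iregexp_to_python(pattern):
    # Rewrite unescaped dots outside character classes so that they
    # exclude newline and carriage return (an assumed mapping rule).
    out = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Copy two-character escapes such as \. untouched.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        if ch == "." and not in_class:
            out.append(r"[^\n\r]")
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def match(value, pattern):
    # Assumption: the comparator is true only when the whole string matches.
    return re.fullmatch(iregexp_to_python(pattern), value) is not None
```

Under these assumptions, match("bar", "ba[rz]") is true and match("bard", "ba[rz]") is false, because the pattern has to cover the entire string. A real implementation would validate the pattern against the I-Regexp grammar first, which this sketch skips.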