LINQ-like regular expressions #471
Comments
Putting into 'Area-Language' for now (due to the LINQ reference). However, the particular RegExp example is probably more library-related than language-related. Removed Type-Defect label.
This comment was originally written by [email protected]
I have to disagree. Common regexp syntax is indeed very terse, but that is actually a good thing. Regexp is not cryptic at all; it is in fact precisely the string that you want to find (!!) with some special syntax for expressing that more variants are possible in certain places. A regular expression for matching the string "abc" is exactly this, "abc". A regular expression for matching one digit is "[0-9]", for one or more digits "[0-9]+" -- and I really wouldn't like to write "atLeast 1 number" instead. (And what exactly does "number" mean here? An integer or a decimal number? A digit only? Every Unicode codepoint that denotes a digit in some language, or 0-9 only?) A lot of people have trouble constructing and understanding regexps, sure, but a lot of people do not. And in my experience, people get used to the regexp syntax pretty quickly. Some alternative means of constructing regexps might be good, but maybe allowing comments in regexps, like Perl does, would be good enough (but that would allow regexp literals, I think).
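To make the two points in this comment concrete, here is a small sketch using Python's `re` module as a stand-in (the thread itself is about Dart, so treat this as an illustration, not as Dart behaviour). It shows that "digit" really is ambiguous (`[0-9]` is ASCII-only, while `\d` on a Python `str` pattern also accepts other Unicode decimal digits) and what Perl-style comments inside a pattern look like. The commented pattern is only one possible reading of the example from the original issue; the variable names are made up for this sketch.

```python
import re

# "[0-9]" is ASCII-only; "\d" on a str pattern also matches other
# Unicode decimal digits.
ASCII_DIGITS = re.compile(r"[0-9]+")
UNICODE_DIGITS = re.compile(r"\d+")

# The digits 1, 2, 3 written in Arabic-Indic script.
arabic_indic = "\u0661\u0662\u0663"

# Verbose mode ignores whitespace in the pattern and allows "#" comments,
# similar to Perl's /x flag.
COMMENTED = re.compile(
    r"""
    ^[A-Za-z]{3}        # starts with three ASCII letters
    (?:[0-9]{6}|blah)   # then six ASCII digits, or the literal text blah
    -{2,}               # then at least two hyphens
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    print(bool(ASCII_DIGITS.fullmatch(arabic_indic)))
    print(bool(UNICODE_DIGITS.fullmatch(arabic_indic)))
    print(bool(COMMENTED.match("abc123456--")))
```

The terse syntax stays, but each piece carries its own explanation, which addresses the "forget what it means a day later" complaint without a new language construct.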
This comment was originally written by [email protected]
"but a lot of people do not"
http://stackoverflow.com/questions/tagged/regex
It is in fact one of the top tags on Stack Overflow.
"And from my experience, people get used to the regexp syntax pretty quickly"
Yes, but that only applies if you are using regex every single day; otherwise you will forget the vast array of special codes it uses.
This comment was originally written by [email protected]
This should be a library (I imagine it using chaining), not a language construct. I highly disagree that your example is "easily readable". You're introducing new operators and precedence rules; your formatting suggests you read it one way, but I read it completely differently.
var regExp = (startsWith 3 letters
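The chaining approach suggested in this comment might look roughly like the sketch below. It is written in Python rather than Dart, and the `Pattern` class and all of its method names are hypothetical, invented for this illustration; no such library is referenced in the thread. The point it demonstrates is that a chained API has to make grouping explicit (here through `either`), so the precedence question raised above does not come up.

```python
import re


class Pattern:
    """Hypothetical chainable regex builder; all names are illustrative."""

    def __init__(self, source=""):
        self.source = source

    def _then(self, fragment):
        # Builders are immutable: each step returns a new Pattern.
        return Pattern(self.source + fragment)

    def start(self):
        return self._then("^")

    def letters(self, count):
        return self._then("[A-Za-z]{%d}" % count)

    def digits(self, count):
        return self._then("[0-9]{%d}" % count)

    def literal(self, text):
        return self._then(re.escape(text))

    def at_least(self, count, text):
        return self._then("(?:%s){%d,}" % (re.escape(text), count))

    def either(self, *alternatives):
        # Each alternative is a complete Pattern, so the grouping is explicit.
        joined = "|".join(alt.source for alt in alternatives)
        return self._then("(?:" + joined + ")")

    def compile(self):
        return re.compile(self.source)


# One possible reading of the example from the original issue:
# three letters, then (six digits or "blah"), then at least two hyphens.
reg_exp = (
    Pattern()
    .start()
    .letters(3)
    .either(Pattern().digits(6), Pattern().literal("blah"))
    .at_least(2, "-")
    .compile()
)
```

With this shape, `reg_exp.match("abc123456--")` succeeds and `reg_exp.match("abc123456-")` does not, and the other reading of the original example would simply be written with a different `either` call.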
This comment was originally written by [email protected]
These are minor complaints that are easy to fix. I meant "digits", not numbers, which clears up the first problem mentioned. As for precedence, some kind of convention where it is understood that each line starts a new section would work. The operators are English words that are already understood and easier to remember. Regex is an entire alphabet of strange codes that nobody understands, so you can hardly say regex isn't worse with respect to this issue. Your strong disagreement does not correspond to the issues you then brought up, which are minor issues about precedence. I am pretty sure you understood the operators even if they are "new". Assuming precedence was cleared up, and the digits issue was cleared up, any programmer could determine whether a string met the criteria in the expression. If you convert that to regex and ask people who don't use regex every day, they will not be able to tell you what it means.
This comment was originally written by [email protected]
I have now created a specification of how the language could look, if anyone is interested. All of the character classes and so on in traditional regular expressions can obviously just be represented as sets. There may be an object containing all predefined expressions such as:
class Pre {
or something like that. Here are some examples.
var regex = start "$"
Quantifiers can be expressed with the following syntax:
Given the example "googledartgoogledartgoogledart", we could have:
Greedy quantifier that matches the entire string:
var regex = start min 1 letters
Reluctant quantifier that matches the first "googledart":
var regex = start min 1 letters
Possessive quantifier:
var googleLetters = Set.from(["g", "o", "l", "e"]);
These are examples of lookaheads, even though they say "before". Dart and JavaScript do not have lookbehinds.
var regex = "google" before "dart";
var regex = "google" before not "dart";
Some other things:
var regex = 1 not digits;
var regex = start 1 (digits - ["0", "1", "2", "3"])
var regex = 3 letters
"or" applies to one line above and below if on a line by itself.
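For comparison, the quantifier and lookahead behaviour described in this comment can be shown with conventional regex syntax. The sketch below uses Python's `re` module as a stand-in, and the patterns are my own guesses at what the proposed expressions would mean, since the examples above are incomplete. The possessive case is left out because support for it varies between engines.

```python
import re

text = "googledartgoogledartgoogledart"

# Greedy: "+" takes as many letters as it can, so the match runs to the
# last "dart" and covers the entire string.
greedy = re.match(r"[a-z]+dart", text).group()

# Reluctant: "+?" takes as few letters as it can, so the match stops at
# the first "dart".
reluctant = re.match(r"[a-z]+?dart", text).group()

sample = "googledart googleplex"

# Positive lookahead ("google" before "dart"): matches "google" only where
# "dart" follows, without consuming "dart".
followed = re.search(r"google(?=dart)", sample)

# Negative lookahead ("google" before not "dart"): matches "google" only
# where "dart" does not follow.
not_followed = re.search(r"google(?!dart)", sample)

# A set difference such as digits - ["0", "1", "2", "3"] corresponds to a
# narrower character class.
high_digit = re.compile(r"^[4-9]")

if __name__ == "__main__":
    print(greedy == text, reluctant)
    print(followed.span(), not_followed.span())
```

The first lookahead matches the "google" at the start of `sample`; the second skips it and matches the "google" in "googleplex" instead.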
This comment was originally written by [email protected]
Actually, that "or" idea is terrible :) Also, could alternatively use
var regex = start min 3 letters
to remove the need for "inclusive" as a keyword.
May be suitable for a library - not a core language feature. Added WontFix label.
This issue was originally filed by [email protected]
Regular expressions are cryptic. Everybody has problems trying to remember how to construct a regular expression, and a day later they forget what the strange thing means. There is no argument about this. The best approach to solving this that I can think of is:
var regExp = startsWith 3 letters
andThen 6 numbers or "blah"
andThen atLeast 2 "-";
Easily readable.