-
Notifications
You must be signed in to change notification settings - Fork 20
repair regex in #quote-paragraph macro #197
base: master
Are you sure you want to change the base?
Conversation
P.S. On those screenshots, my themes-configuration was:
|
P.P.S. I was relying on my screenshot filenames to identiy what was what -- not realizing the filenames would get changed! So: The first screenshot is the view with the OLD regex; the next screenshot is the view with the NEW regex. |
Hello, as we have an issue with our CI, can you send the output of the tests launch? |
Sure: I ran the tests on both of the versions I suggested above -- the one with the outer-wrapping capturing group, and the one without. The results were the same for both, except that with the capturing group included, there was an extra "deprecated warning". Here are the full results, including that extra warning:
The warning included here but NOT included when I tested without the extra capturing group, is that last one:
If we're not really wanting to capture the match inside this group, then the above warning seems more than enough reason to omit that extra capturing group. (We might even want to make ALL the remaining groups non-capturing.) Please let me know, if you'd like me to change my PR code to omit that group. I'd like to add a tangential comment: My earlier screenshots had attribute-list-highlighting that was ... slightly incorrect. This was only because of the styling, not because of the markup: I was running, at the time, a custom grammar package of my own that 'include'd the main grammar files from your package (with just the regex tweaked); but I'd forgotten, as yet, to copy over your 'styles' folder! For the above tests, though, I edited my installed copy of your original package (doing it the proper way -- by editing the quote-paragraph grammar-file and re-generating the main file). So, here's a better screenshot, which I took at that time, showing the correct attribute-list-coloring: If you look close, you might notice that, within the block-attribute-list, the inline macro's opening bracket is marked up like the text surrounding it, but its closing bracket is not (separately) marked up at all, but appears as just text, like the opening and closing bracket of the block-attribute-list. I missed this anomaly at first: you have to look pretty close, even to spot it. So, I'm not worried about it -- it's a perfectly acceptable price to pay, IMHO (unless we can think of a neat fix for it). |
P.S. One way to fix that little anomaly might be to use a recursive regex -- which Ruby started supporting in v. 2.0, it seems. |
coffeescript is not ruby, it's not the same regex support. |
Hmm. Well, as far as I'm aware, JavaScript (must less CoffeeScript?) supports neither recursive regex nor regex with subroutine calls, which would ba another way to approach this ... |
Silly idea anyway, I guess: |
Fixes a typo (a wildcard '.' in the regex instead of the intended, literal \\.), and added a look-ahead. Also removed the extraneous, outer capturing group from my earlier tweak to the regex in #quote-paragraph.
I've suggested two changes in a regex in the file WIth those regex-fixes in place, I got the following test-run:
I may be wrong, but it seems to me that these regex-fixes may solve the problem! |
Hmm ... on second thought, that look-ahead is not QUITE robust enough: it could get fooled by an apostrophe ( So maybe something like this would do it:
I gave that a quick trial run, and it seems to work (on my limited sample): Haven't yet run the test again, though. Thoughts? |
OK, one more suggestion for beefing up the look-ahead, then I shut up: How about we also check, that any character just before the apostrophe (but, like the apostrophe, subsequent to the
On a quick test, this also seems to work... So with that, the whole regex becomes: Here's how Regex Buddy analyzes this regex (but please note that in Regex Buddy, on a Windows, CR/LF machine, just before any
|
Well, it didn't take me long to come up with an example breaking that. But WAIT A SECOND -- maybe the reason I'm having to work so hard to "beef up" the lookahead, is that I'm "looking ahead" for a too-limited set of things: Instead of checking to make sure my If so, then this look-ahead should solve the problem:
-- which puts us here, for the complete regex:
|
Test results for this last commit:
Screen-shot is same as before. |
It might be worth mentioning, that the pattern "alternate-branch-with-lookahead" that I applied to the last regex (to allow it to match a contained, inline macro's closing bracket) affords us an alternative fix for the first regex, as well -- one which would avoid lazy quantification (with the back-tracking it introduces). Here's the current state of the regex-replacement we've substituted into #quote-paragraph -- followed here by the alternate we could use, avoiding lazy quantification:
In trying both of them, my eyes didn't notice any difference in efficiency. But it MIGHT be that a lLITTLE less risk of runaway backtracking attaches to the latter regex ... ?? |
Just in case Regex Buddy's verbalization of that last regex would be helpful
Created with RegexBuddy |
I'm fine with giving this a try, and I appreciate the detailed analysis. I have one question, though. What is |
@ldez we should probably migrate to GitHub Action, any objection? |
@mojavelinux Lol. Yes, it's a bit silly in this context, isn't it? I'd just noticed that it's included in the whitespace category \s and I threw it in for completeness. But I'm happy to omit it. |
Just in case anyone's intrigued, at all, by my last suggestion:
I ran the specs on it:
The main arguments I can see for it (without doing some kind of efficiency-analysis above my pay-grade) are that it makes the regex-code somewhat easier to understand:
|
I'm definitely intrigued. And that sounds like great news. Is there anything else you think needs to be solved before this PR is considered ready? |
Nope. |
The pattern of our fix on #block-attribute-inner's regex, re-use on #quote-paragraph's regexes
It all looks ready, now, to me ... |
@mojavelinux I replied in the negative; but now I've got to back-pedal: Someone please correct me, if I'm wrong: but I don't believe we need the capturing group that I let stand in the regex for
Any reason we shouldn't delete that capturing group -- for efficiency? |
F.y.i., the spec-run on this deletion:
|
I'm happy to proceed. However, I'm not the one who merges to this repo. |
Description
The regex for the quote-paragraph rule is not as robust as it could be -- and also, either not as efficient as it could be, or else (if you're REALLY wanting to refer to captured groups) capturing one group incorrectly.
Here's the regex-segment that is in common between the regexes on lines 35 and 37 of the file ("quote-paragraph-grammar.cson"):
^\\[(quote|verse)((?:,|#|\\.|%)([^,\\]]+))*\\]$
The robustness problem is that as it stands, the regex will not match any block-attribute-list that contains, itself, an inline macro -- such as a URL macro (see screen-shot) or a bibliographic-citation macro (see the below syntax example). And no, it doesn't help, if you escape the closing bracket of the inline macro.
This problem is located in the character-group in the regex: In order to get the character-class not to run away with the rest of the quote-block, there was included in it the closing ']' of the block-attribute-list; but one could have achieved the same thing better by making the quantifier "lazy":
instead of
([^,\\]]+)
one could just write
([^,]+?)
With that change, the regex matches also these inline-macro-containing quote-block-attribute-lists (see screen shots).
As for the other problem: If you don't really need to capture any groups (i.e., you're not gong to refer to them, as is not done inside the present macro), then the regex will run more efficiently if the capturing groups are made non-capturing groups.
On the other hand, if we are wanting to capture those groups, then we need to fix the second capturing group:
((?:,|#|\\.|%)([^,\\]]+))*
Putting a quantifier on a capturing group, like this, leads to the problem which "Regex Buddy" explains thus:
I took the first, safer route for this problem, in my code-fix, in case we do want to capture:
((?:(?:,|#|\\.|%)([^,]+?))*)
If capturing is superfluous, though, then we can just drop the outermost (the capturing) group -- like so:
(?:(?:,|#|\\.|%)([^,]+?))*
Syntax example
Screenshots