Flags / regex-engine (PCRE?) -- transitioning patterns from regex101 to Python regex #454
Comments
The README explains that this module was written to be compatible with the re module and provide a superset of what re provides. It also explains the purpose of Neither the re module nor the regex module are based on PCRE, and I hadn't heard of PCRE2. PCRE was intended to be Perl Compatible Regular Expressions, but Perl has changed some of its regex behaviour since then, so PCRE isn't strictly compatible with Perl any more! When regex101.com says "Python" it means Python with the standard re module. As for the The In fact, in general, those uppercase flags look like they're for controlling features specific to certain engines.
No @MRBarnett, the FAQ says: "For Python, regex101 implements it on top of PCRE library, by suppressing features not available in Python." There's an issue requesting your module be added as a Python flavor. It's labeled for discussion because regex101's developer believes that Python users (like this issue's author) can choose its
Why should this regex implementation worry about some 3rd party website being broken? regex101 should be using actual Python re module, with the VM compiled to webassembly, to parse and execute python regular expressions, bug-for-bug. They should do the same for this regex module, since it's in fairly wide use as well. This is really a no-brainer if someone is serious about that stuff. Asking mrab-regex to adapt to regex101's brokenness is IMHO an absurd way to go.
Yes. That's on the third-party tool. And if their developer doesn't think it's their job - oh well. Regex101 can't really be used in most professional closed-source applications anyway because it's a server-based solution, and leaks your code to a third party by design.
Yes, but that is squarely on regex101. When they say "Python regex", they are not being frank with you. They "fake" it, they don't actually use a Python re engine like they should be. This one should be closed IMHO.
Please might you consider updating the README.md with some preliminary orientation?
It would be helpful to have a section that allows someone using a third-party tool such as regex101.com to test patterns and transition them into Python `regex` with confidence.
Currently code that is working on regex101 is failing on `regex`, and it's hard to figure out why.
I've tracked it down now.
https://regex101.com/r/HQuvtj/1 works (PCRE2 is set) but `regex` misses the second value.
Setting PCRE https://regex101.com/r/HQuvtj/2 ... now the regex101 output aligns with the `regex` output.
Fixing it for PCRE https://regex101.com/r/HQuvtj/3, now it works in Python
... gives:
Problems I was bumping into:
🔸 VERSION0/1 PCRE/PCRE2/Python
regex101 lets you select a flavour. Flavours include Python, PCRE and PCRE2.
`regex` doesn't specify if it is using PCRE, PCRE2 or something different. I think this should be right at the top of the README.md.
`regex.DEFAULT_VERSION, regex.VERSION0, regex.V0, regex.VERSION1, regex.V1` gives `(8192, 8192, 8192, 256, 256)`. I think it should be documented that `DEFAULT_VERSION` is `V0`. Also what do `V0` and `V1` correspond to?
Is `V0` PCRE and `V1` PCRE2?
Does Python's `re` use PCRE?
The author of this repo probably has a context that the reader
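For orientation: a comment on this issue notes that neither `re` nor `regex` is based on PCRE, so `V0` and `V1` would be behaviour modes of the module's own engine rather than PCRE versus PCRE2. Below is a hedged sketch, guarded so that it loads even where the third-party `regex` package is not installed. `version_summary` and `consonants` are hypothetical helper names, and the set-difference example reflects my understanding of the VERSION1 features described in the README, so treat it as an assumption to verify.

```python
try:
    import regex  # third-party package (pip install regex), not the stdlib re
except ImportError:
    regex = None  # lets the sketch load even where regex is not installed


def version_summary():
    """Report how the version constants relate, or None without regex."""
    if regex is None:
        return None
    return {
        "default_is_v0": regex.DEFAULT_VERSION == regex.VERSION0,
        "v0_is_version0": regex.V0 == regex.VERSION0,
        "v1_is_version1": regex.V1 == regex.VERSION1,
    }


def consonants(text):
    """Find lowercase consonants using a nested-set difference.

    Assumption: [[a-z]--[aeiou]] is only treated as a set difference
    when VERSION1 behaviour is selected, here via flags=regex.V1.
    """
    if regex is None:
        return None
    return regex.findall(r"[[a-z]--[aeiou]]", text, flags=regex.V1)


print(version_summary())
print(consonants("hello"))
```

The version can also be selected inside the pattern with `(?V0)` or `(?V1)`, which keeps the choice visible next to the pattern itself.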
So that's the first problem: to understand the situation regarding engines (PCRE/PCRE2/?) as respects `re` and `regex` and explain WHAT EXACTLY setting V0/V1 does.
🔸 FLAGS
Problem is, I don't know how to configure regex to ensure it's using the same engine as regex101 or the same flags.
regex101 offers:
Are these part of a regex standard?
`regex` offers:
If those one-letter flag names match these flags, it would be useful to have a table. Some are obvious, but not all.
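As a starting point for such a table, here is a minimal sketch for the standard `re` module only. The letters are the ones `re` accepts for inline use. How they line up with regex101's list, or with the extra flags in `regex`, is an assumption to verify, and `flags_from_letters` is a hypothetical helper, not part of either library.

```python
import re

# Inline flag letters accepted by the standard re module, e.g. (?mi).
INLINE_FLAGS = {
    "a": re.ASCII,
    "i": re.IGNORECASE,
    "L": re.LOCALE,      # only valid with bytes patterns
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "u": re.UNICODE,
    "x": re.VERBOSE,
}


def flags_from_letters(letters):
    """Combine one-letter flags into a value usable as re.compile(..., flags)."""
    combined = 0
    for letter in letters:
        if letter not in INLINE_FLAGS:
            raise ValueError(f"{letter!r} is not an inline flag in re")
        combined |= INLINE_FLAGS[letter]
    return combined


pattern = re.compile(r"^a=\d", flags_from_letters("mi"))
print(pattern.findall("a=1\nA=2"))  # ['a=1', 'A=2']
```

Passing `"g"` to this helper raises `ValueError`, because `re` has no such flag.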
Also I should be able to put `(?gmi)` at the start of my regex to set flags from within the pattern -- this would resolve ambiguity nicely. But `g` does not work.
🔸 🔸 🔸
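On the `g` point: in the standard `re` module, `g` is not an inline flag at all. "Global" is chosen by which function is called: `re.search` for the first match, `re.findall` or `re.finditer` for all of them. A small sketch with made-up sample text follows. I am assuming `regex` mirrors `re` in this respect, which is worth checking against its README.

```python
import re

text = "a=1\nA=2"            # made-up sample input
pattern = r"(?mi)^a=(\d)"    # m and i set inline; there is no g

# Without "global": only the first match is returned.
first = re.search(pattern, text).group(1)

# With "global": every non-overlapping match is returned.
every = re.findall(pattern, text)

# Putting g in the inline flag group is rejected by re at compile time.
try:
    re.compile(r"(?gmi)^a=(\d)")
    g_accepted = True
except re.error:
    g_accepted = False

print(first, every, g_accepted)
```

So a regex101 pattern tested with `/gmi` would translate to `(?mi)` plus a call to `findall` or `finditer`, rather than to a `g` flag.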
I'm submitting this in the hope that some flounder-time can be saved with some preamble in the README.md, explaining the mess we're in regarding regex standards.
I think it would help drive adoption of your library.