Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup regex compiler fixed quantifiers source #10843
Cleanup regex compiler fixed quantifiers source #10843
Changes from 17 commits
f439900
5d79332
8b7dad8
26be3d0
1c865e9
dc55200
9de0b60
962625a
5b18daf
de217de
303829a
1808665
6d12fc3
593cf0e
d8cea51
cfc0ced
84e0e86
fe25cb7
91f7c34
581da1f
043f195
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General regex question: if this is lazy, how does its behavior differ from matching exactly
n
repetitions? What would force it to match more repetitions?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I don't know. I think it depends on the previous character pattern. Here is an example with '.' as the repeat item:
Maybe there are better examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Bradley is asking how
{n}
is different from{n,}?
, not how{n,}
is different from{n,}?
. Here are the extra two cases that need to be added to your examples:The differences have to do with backtracking behavior and whether matching the entire regex requires that the lazy quantifier accept more characters. For example:
In this case, an exact requirement of
b{2}
won't match, because there are three. But the lazy quantifier says "OK, in that case I'll take some extra b characters and see if I can get it to match".There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're only reading at most 3 characters it is impossible to find
count > max_value
, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is kind of a weak link. If the
max_value
changes to a smaller 3-digit value, the check would need to be re-added. This way, this line should never need to change.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But would there ever be a reason to use a number that isn't the largest number representable by
max_read
digits? My understanding of the code was that the limitation was solely in place to control the width of the buffer reads.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'd like to consider limiting
max_value
to something like 255 in the future which would not changemax_read
but require thecount
check.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine with me. Just for my edification, what is the benefit of that choice? It doesn't have any performance implications unless someone actually requests a number that large at runtime, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am asking whether there is a difference between 255 and 999 if if a user never actually requests a number > 255. Based on
it sounds like the answer is yes? You allocate memory based on the maximum number of repetitions somewhere? In that case, I assume that the amount of memory increases stepwise as the maximum hits crosses powers of two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing that sophisticated. Just maybe store the value in a smaller variable (e.g. uint8).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it doesn’t affect runtime performance, why arbitrarily limit at 999 and not the max value for the size of the repetitions variable? There’s some awkwardness in explaining this arbitrary limit. If it were INT_MAX, I think the docs might not even need to mention it. After all, string columns also have a length limit, right? It might be impossible to reach a repeat limit of INT_MAX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is just an arbitrary string of numbers entered by a human that is being converted to an integer so some error checking will need to be done since the string of decimal digits could be any length. Is there some reason not to limit it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline with @davidwendt. I don't think it's worth blocking on this point so I'm fine with accepting a limit of 999. We can change it later if users need more.