Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex parser improvements and bug fixes #4087

Merged
merged 2 commits into from
Nov 12, 2021

Conversation

andygrove
Copy link
Contributor

Signed-off-by: Andy Grove [email protected]

This PR fixes some issues I found during more extensive testing with random inputs.

  • Character classes can now contain ] as the first character
  • Choices within groups are now being parsed correctly
  • Improved checks for base terms started with quantifier symbols

@andygrove andygrove added the bug Something isn't working label Nov 11, 2021
@andygrove andygrove self-assigned this Nov 11, 2021
RegexRepetition(RegexCharacterClass(negated = false,
ListBuffer(RegexCharacterRange('0', '9'))),SimpleQuantifier('+'))))),
RegexChar('|'),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we were detecting | as a character and not as a choice delimiter

private def parseInternal(): RegexAST = {
val term = parseTerm(() => peek().contains('|'))
private def parseUntil(until: () => Boolean): RegexAST = {
val term = parseTerm(() => until() || peek().contains('|'))
if (!eof() && peek().contains('|')) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Not needed as part of this PR, but I think peek() will return None at EOF, so this can be simplified to:

Suggested change
if (!eof() && peek().contains('|')) {
if (peek().contains('|')) {

Applies to a few other places where eof and peek are combined.

Comment on lines +94 to +95
while (!eof() && !until()
&& (peek().exists(ch => ch == '*' || ch == '+' || ch == '?')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per previous nit, should be able to simplify this to something like:

Suggested change
while (!eof() && !until()
&& (peek().exists(ch => ch == '*' || ch == '+' || ch == '?')
while (!until() && (peek().exists(ch => ch == '*' || ch == '+' || ch == '?')

@andygrove
Copy link
Contributor Author

build

@andygrove andygrove merged commit 4ee229a into NVIDIA:branch-21.12 Nov 12, 2021
@andygrove andygrove deleted the regex-fixes branch November 12, 2021 04:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants