Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reparse trailing escape #67

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

alexrutar
Copy link
Contributor

Two changes here:

First, fixes #66 to check for a trailing unescaped \.

The other fix, which is in the second commit and which I am slightly less sure about, is to replace an ASCII whitespace check with a is_whitespace check in Atom::new. This behaviour was previously changed in #65 for Pattern::(re)parse, is it also correct for this to take place in Atom::new? If this is incorrect I'll drop the second commit.

pascalkuthe
pascalkuthe previously approved these changes Dec 9, 2024
Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks!

@pascalkuthe
Copy link
Member

Ah actually the test seems to be falling

@alexrutar
Copy link
Contributor Author

Ok, great!

There is another bug still: escaping is for some reason not handled correctly at all in the presence of Unicode, e.g. Atom::parse("foö bar"). Will fix it right now and then force-push.

@alexrutar
Copy link
Contributor Author

Oops, force-pushed just as you added a review, maybe it was on the thing that I fixed in any case.

@alexrutar alexrutar force-pushed the fix-reparse-trailing-escape branch from 1a9e169 to 2989e66 Compare December 9, 2024 21:04
@alexrutar
Copy link
Contributor Author

Ok everything seems to be fine now

@alexrutar
Copy link
Contributor Author

Oh of course this is still is still wrong in the ASCII case now too and needs to be fixed there too. Sorry about that, should have read the surrounding code more carefully before pushing the PR!

@alexrutar alexrutar force-pushed the fix-reparse-trailing-escape branch 2 times, most recently from bedf70d to e87b446 Compare December 9, 2024 22:09
@alexrutar
Copy link
Contributor Author

Now the parsing code is changed even more. I added some tests with all of the edge cases that I found in the previous implementation, so hopefully the new one is more correct, but to be honest with manual parsing code like this without extremely good test coverage I am always a bit concerned.

  1. In the Unicode case, there was a \ push that had to be deferred until the end beginning of the subsequent loop, since it was causing incorrect handling of scapes in the presence of Unicode. This was causing invalid handling of e.g. foö\ bar.
  2. I've re-implemented the ASCII case with essentially the same code as the Unicode case, but with the usual ASCII shortcuts.

This also fixes an unrelated bug when parsing needles which contain
non-ASCI Unicode.
@alexrutar alexrutar force-pushed the fix-reparse-trailing-escape branch from e87b446 to 0dc00b5 Compare December 13, 2024 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(Potentially) incorrect handling of trailing \ in MultiPattern::reparse
2 participants