UTF8 RegexFix doesn't correct capture's text #170

chekoopa opened this issue Jul 17, 2019 · 2 comments

@chekoopa

convertMatchText in Text.RE.ZeInternals.Types.Match corrects the captures' offsets and lengths perfectly well, but it leaves capturedText untouched (at this very line you can see it is taken straight from the input). This may cause further problems when using the library, mostly with Text.RE.PCRE.Text.

The workaround is take (captureLength c) $ drop (captureOffset c) $ (captureSource c), but it's kind of lame. Incorporating similar code into RegexFix would make the fix transparent to users but may hurt performance.
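
For anyone who wants to apply the workaround in one place, here is a minimal sketch. It assumes a stand-in Capture record that only mirrors the four field names used above, so it compiles with base alone; it is not the library's own type. It also uses String, and the helper name fixCapturedText is hypothetical. With Data.Text the same idea would use T.take and T.drop.

```haskell
-- Stand-in for the library's Capture record (assumption: only the
-- field names from the workaround above are mirrored here).
data Capture a = Capture
  { captureSource :: a    -- the whole text that was searched
  , capturedText  :: a    -- the text of this capture (stale in this issue)
  , captureOffset :: Int  -- start of the capture, in characters
  , captureLength :: Int  -- length of the capture, in characters
  } deriving (Show, Eq)

-- Hypothetical helper: rebuild capturedText from the offset and
-- length, which are assumed to be already corrected.
fixCapturedText :: Capture String -> Capture String
fixCapturedText c = c { capturedText = slice }
  where
    slice = take (captureLength c) $ drop (captureOffset c) $ captureSource c
```

For example, a capture over the source "\380X" with offset 0, length 1 and a stale capturedText of "\380X" comes back with capturedText set to "\380". Mapping such a helper over every capture of a match costs an extra pass over the source per capture, which is the performance concern mentioned above.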

@cdornan

cdornan commented Jul 17, 2019

@chekoopa thanks for the clear analysis. I am far too busy to be able to work on this at the moment but will be amenable to carving out some time. The more demand there is the sooner I am likely to get to this so please shout if anybody needs this fixed.

@kt0d

kt0d commented May 24, 2020

>matchedText $ "żX" ?=~ [re|ż|]
Just "\380X"

It may be easy to work around if you just want one match, but I originally encountered this problem using (*=~/) (search and replace).
