Perl-style shorthands (like \d) not recognized, only POSIX ones (like [[:digit:]]) #36
Could you submit a small Haskell program demonstrating the problem? |
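Perhaps this will help; it's taken from my Stack Overflow question:

    module WordCount (wordCount) where

    import qualified Data.Char as C
    import qualified Data.List as L
    import Text.Regex.TDFA as R

    -- Count case-insensitive occurrences of numbers and words.
    -- Per this issue, regex-tdfa does not treat \d as a digit class,
    -- so the "\\d+" alternative never matches the digits.
    wordCount :: String -> [(String, Int)]
    wordCount xs = do
      let zs = R.getAllTextMatches (xs =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
      -- List monad: one (word, count) pair per group of equal lowercased matches.
      g <- L.group $ L.sort [map C.toLower w | w <- zs]
      return (head g, length g)
|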
What the others do:
Concerning It even says explicitly:
So, the easiest solution for you might be to use |
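Until the library supports them, one user-side workaround is to rewrite the Perl-style shorthands into POSIX bracket expressions before handing the pattern to regex-tdfa. The sketch below is a minimal version of that idea. `expandShorthands` is a hypothetical helper, not part of regex-tdfa; it assumes shorthands appear only outside bracket expressions (`[\d]` would be turned into nested brackets), and the mapping of `\w` to `[[:alnum:]_]` follows the usual Perl meaning but is an assumption here.

```haskell
-- Sketch: rewrite Perl-style shorthands into POSIX bracket expressions.
-- expandShorthands is a hypothetical helper, not a regex-tdfa function.
module Main (main) where

-- Assumption: shorthands only occur outside bracket expressions;
-- "[\d]" would become nested brackets, which is not what we want.
expandShorthands :: String -> String
expandShorthands ('\\' : c : rest) =
  case lookup c shorthands of
    Just posix -> posix ++ expandShorthands rest
    -- Other escapes (\b, \., \\ ...) are copied unchanged. The escaped
    -- character is consumed too, so "\\d" (literal backslash, then d) survives.
    Nothing -> '\\' : c : expandShorthands rest
  where
    shorthands =
      [ ('d', "[[:digit:]]"), ('D', "[^[:digit:]]")
      , ('w', "[[:alnum:]_]"), ('W', "[^[:alnum:]_]")
      , ('s', "[[:space:]]"), ('S', "[^[:space:]]")
      ]
expandShorthands (c : rest) = c : expandShorthands rest
expandShorthands [] = []

main :: IO ()
main = do
  putStrLn (expandShorthands "\\d+|\\b[a-zA-Z']+\\b") -- prints [[:digit:]]+|\b[a-zA-Z']+\b
  putStrLn (expandShorthands "\\\\d")                  -- prints \\d
```

The rewritten string can then be used with `=~` as before, e.g. `xs =~ expandShorthands "\\d+|\\b[a-zA-Z']+\\b"`. Doing the rewrite on the pattern string keeps the library untouched, at the cost of not understanding bracket expressions.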
I found this library while looking for a regex package, and saw it mentioned in the Haskell wiki and in a blog post that’s now part of the README. I compared various libraries based on their maintainability (last commit date) and popularity (GitHub stars, issues addressed promptly), and this one came out on top. Because of that, I’m indeed surprised that something as common as `\d` isn’t supported. I’m a Haskell beginner and don’t yet have the skills to start making PRs on a general-purpose library. |
Predefined character classes we could support are listed here: One could recognize them either directly in the parser:
Maybe it is better to handle them in the translation: regex-tdfa/lib/Text/Regex/TDFA/CorePattern.hs, lines 537 to 538 (at commit 95d47cb)
|
There already seems to be code for POSIX character classes: regex-tdfa/lib/Text/Regex/TDFA/TNFA.hs, lines 798 to 805 (at commit 95d47cb)
These can be given to the `Pattern`s `PAny` and `PAnyNot`: regex-tdfa/lib/Text/Regex/TDFA/Pattern.hs, lines 45 to 46 (at commit 95d47cb)
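The comments above suggest handling shorthands in the translation step: there is already code for POSIX character classes, and its results can be handed to `PAny` and `PAnyNot`. A simplified model of that idea, where each shorthand becomes either a positive or a negated character set, might look like the sketch below. `Node`, `desugarEscape`, and `matches` are hypothetical stand-ins, not regex-tdfa's real `Pattern` constructors or functions. The sets are ASCII-only by assumption, and the catch-all literal fallback ignores escapes such as `\b` that a real implementation would have to keep.

```haskell
-- Simplified, hypothetical model of desugaring a shorthand escape into a
-- positive or negated character set, in the spirit of PAny / PAnyNot.
-- None of these names are regex-tdfa's actual types or functions.
module Main (main) where

import qualified Data.Set as Set

data Node
  = Lit Char               -- an ordinary escaped character
  | AnyOf (Set.Set Char)   -- matches any character in the set (cf. PAny)
  | NoneOf (Set.Set Char)  -- matches any character not in the set (cf. PAnyNot)
  deriving Show

-- ASCII-only sets, assumed here for simplicity.
digitSet, wordSet, spaceSet :: Set.Set Char
digitSet = Set.fromList ['0' .. '9']
wordSet  = Set.fromList (['0' .. '9'] ++ ['a' .. 'z'] ++ ['A' .. 'Z'] ++ "_")
spaceSet = Set.fromList " \t\n\r\f\v"

desugarEscape :: Char -> Node
desugarEscape c = case c of
  'd' -> AnyOf digitSet
  'D' -> NoneOf digitSet
  'w' -> AnyOf wordSet
  'W' -> NoneOf wordSet
  's' -> AnyOf spaceSet
  'S' -> NoneOf spaceSet
  _   -> Lit c  -- assumed fallback: any other escape is taken literally

matches :: Node -> Char -> Bool
matches (Lit x) c    = x == c
matches (AnyOf s) c  = Set.member c s
matches (NoneOf s) c = Set.notMember c s

main :: IO ()
main = do
  print (map (matches (desugarEscape 'd')) "a1") -- prints [False,True]
  print (map (matches (desugarEscape 'S')) " x") -- prints [False,True]
```

Putting the uppercase variants on the negated side means no new matching machinery is needed in this model: `\D`, `\W`, and `\S` reuse the same sets as their lowercase counterparts.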
@asarkar: The syntax accepted by |
https://regex101.com/r/griuTm/1 shows |
No, supporting Perl-style regexes like |
I intend to use For example, given below is a question that I previously solved in Rust using a regex library. The pattern I used was They have a predefined list of packages they allow. If you're reluctant to make this change, and I'm not talking about
|
This exercise would be https://exercism.org/tracks/haskell/exercises/phone-number. Please bear with me; I still have trouble understanding the importance of supporting
Ok, but this should be fine, as
Please share the link to the PR if that's fine with you. Would supporting |
No, the PR's been merged.
The importance, at least to me, is brevity and conciseness. If, in your opinion, what I've said so far doesn't justify the change, I have nothing further to add to this discussion. Please make a decision and either proceed to implement this ticket or don't; I'm going to get my coat. |
Ok, thanks for your input, @asarkar! I need to balance convenience and stability. I'll leave this open and see if other users chime in. |
The pattern `\\d+|\\b[a-zA-Z']+\\b` fails to find the digits in the input "testing, 1, 2 testing". The regex is correct, as can be tested at https://regex101.com/r/griuTm/1. Changing the pattern to `\\b[0-9a-zA-Z']+\\b` works, but it changes the intent, because it would make an input like "123abc" valid. `\\b[0-9]+\\b|\\b[a-zA-Z']+\\b` works too.
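The issue title reports that POSIX classes such as `[[:digit:]]` are recognized. Assuming that, the reporter's `wordCount` can keep its original intent (numbers and words matched by separate alternatives, so "123abc" is not accepted as one token) by replacing only the `\d` alternative. This is a sketch; the `g@(w : _)` pattern in the comprehension simply avoids the partial `head` used in the original.

```haskell
-- The reporter's wordCount with \d replaced by the POSIX class [[:digit:]],
-- which the issue title says regex-tdfa does recognize.
module Main (main) where

import qualified Data.Char as C
import qualified Data.List as L
import Text.Regex.TDFA (getAllTextMatches, (=~))

wordCount :: String -> [(String, Int)]
wordCount xs =
  -- L.group never yields an empty list, so the g@(w : _) pattern always matches.
  [ (w, length g) | g@(w : _) <- L.group (L.sort (map (map C.toLower) zs)) ]
  where
    -- Numbers and words remain separate alternatives, as originally intended.
    zs = getAllTextMatches (xs =~ "[[:digit:]]+|\\b[a-zA-Z']+\\b") :: [String]

main :: IO ()
main = print (wordCount "testing, 1, 2 testing")
```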