Commit

Documentation fixes
Artyom Kazak committed May 8, 2019
1 parent d0d6b4b commit e3c15fb
Showing 5 changed files with 85 additions and 69 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -31,7 +31,7 @@ as a dependency and `import Text.Regex.TDFA.Text ()`.
 -- non-monadic
 <to-match-against> =~ <regex>

--- monadic, uses MonadFail on lack of match
+-- monadic, uses 'fail' on lack of match
 <to-match-against> =~~ <regex>
 ```

@@ -145,7 +145,7 @@ getAllSubmatches (a =~ b) :: [(Int, Int)] -- (index, length)

 regex-tdfa does not provide find-and-replace.

-## The Relevant Links
+## The relevant links

 This documentation is also available in [Text.Regex.TDFA haddock](http://hackage.haskell.org/package/regex-tdfa-1.2.3.2/docs/Text-Regex-TDFA.html).

@@ -169,7 +169,7 @@ import Text.Regex.TDFA
 >>> "(3 + 1)"
 ```

-## Known Bugs and Infelicities
+## Known bugs and infelicities

 * Regexes with large character classes combined with `{m,n}` are very slow and memory-hungry ([#14][]).

128 changes: 72 additions & 56 deletions Text/Regex/TDFA.hs
@@ -23,23 +23,25 @@ In modules where you need to use regexes:
 > import Text.Regex.TDFA
-Note that regex-tdfa does not provide support for Text by default.
+Note that regex-tdfa does not provide support for @Text@ by default.
 If you need this functionality, add <https://hackage.haskell.org/package/regex-tdfa-text regex-tdfa-text>
 as a dependency and @import Text.Regex.TDFA.Text ()@.
 = Basics
-> λ> emailRegex = "[a-zA-Z0-9+._-]+@[a-zA-Z-]+\\.[a-z]+"
-> λ> "my email is [email protected]" =~ emailRegex :: Bool
-> >>> True
->
-> -- non-monadic
-> <to-match-against> =~ <regex>
->
-> -- monadic, uses MonadFail on lack of match
-> <to-match-against> =~~ <regex>
-@(=~)@ and @(=~~)@ are polymorphic in their return type. This is so that
+@
+λ> let emailRegex = "[a-zA-Z0-9+.\_-]+\@[a-zA-Z-]+\\\\.[a-z]+"
+λ> "my email is [email protected]" '=~' emailRegex :: Bool
+>>> True
+/-- non-monadic/
+λ> \<to-match-against\> '=~' \<regex\>
+/-- monadic, uses 'fail' on lack of match/
+λ> \<to-match-against\> '=~~' \<regex\>
+@
+('=~') and ('=~~') are polymorphic in their return type. This is so that
 regex-tdfa can pick the most efficient way to give you your result based on
 what you need. For instance, if all you want is to check whether the regex
 matched or not, there's no need to allocate a result string. If you only want
@@ -53,60 +55,71 @@ type you want, especially if you're trying things out at the REPL.
 == Get the first match
-> -- returns empty string if no match
-> a =~ b :: String -- or ByteString, or Text...
->
-> λ> "alexis-de-tocqueville" =~ "[a-z]+" :: String
-> >>> "alexis"
->
-> λ> "alexis-de-tocqueville" =~ "[0-9]+" :: String
-> >>> ""
+@
+/-- returns empty string if no match/
+a '=~' b :: String /-- or ByteString, or Text.../
+λ> "alexis-de-tocqueville" '=~' "[a-z]+" :: String
+>>> "alexis"
+λ> "alexis-de-tocqueville" '=~' "[0-9]+" :: String
+>>> ""
+@
 == Check if it matched at all
-> a =~ b :: Bool
->
-> λ> "alexis-de-tocqueville" =~ "[a-z]+" :: Bool
-> >>> True
+@
+a '=~' b :: Bool
+λ> "alexis-de-tocqueville" '=~' "[a-z]+" :: Bool
+>>> True
+@
 == Get first match + text before/after
-> -- if no match, will just return whole
-> -- string in the first element of the tuple
-> a =~ b :: (String, String, String)
->
-> λ> "alexis-de-tocqueville" =~ "de" :: (String, String, String)
-> >>> ("alexis-", "de", "-tocqueville")
->
-> λ> "alexis-de-tocqueville" =~ "kant" :: (String, String, String)
-> >>> ("alexis-de-tocqueville", "", "")
+@
+/-- if no match, will just return whole/
+/-- string in the first element of the tuple/
+a =~ b :: (String, String, String)
+λ> "alexis-de-tocqueville" '=~' "de" :: (String, String, String)
+>>> ("alexis-", "de", "-tocqueville")
+λ> "alexis-de-tocqueville" '=~' "kant" :: (String, String, String)
+>>> ("alexis-de-tocqueville", "", "")
+@
 == Get first match + submatches
-> -- same as above, but also returns a list of /just/ submatches
-> -- submatch list is empty if regex doesn't match at all
-> a =~ b :: (String, String, String, [String])
->
-> λ> "div[attr=1234]" =~ "div\\[([a-z]+)=([^]]+)\\]" :: (String, String, String, [String])
-> >>> ("", "div[attr=1234]", "", ["attr","1234"])
+@
+/-- same as above, but also returns a list of just submatches./
+/-- submatch list is empty if regex doesn't match at all/
+a '=~' b :: (String, String, String, [String])
+λ> "div[attr=1234]" '=~' "div\\\\[([a-z]+)=([^]]+)\\\\]" :: (String, String, String, [String])
+>>> ("", "div[attr=1234]", "", ["attr","1234"])
+@
 == Get /all/ matches
-> -- can also return Data.Array instead of List
-> getAllTextMatches (a =~ b) :: [String]
->
-> λ> getAllTextMatches ("john anne yifan" =~ "[a-z]+") :: [String]
-> >>> ["john","anne","yifan"]
+@
+/-- can also return Data.Array instead of List/
+'getAllTextMatches' (a '=~' b) :: [String]
+λ> 'getAllTextMatches' ("john anne yifan" '=~' "[a-z]+") :: [String]
+>>> ["john","anne","yifan"]
+@
 = Feature support
 This package does provide captured parenthesized subexpressions.
 Depending on the text being searched this package supports Unicode.
-The [Char] and (Seq Char) text types support Unicode. The ByteString
-and ByteString.Lazy text types only support ASCII. It is possible to
-support utf8 encoded ByteString.Lazy by using regex-tdfa and
-regex-tdfa-utf8 packages together (required the utf8-string package).
+The @[Char]@ and @(Seq Char)@ text types support Unicode. The @ByteString@
+and @ByteString.Lazy@ text types only support ASCII. It is possible to
+support utf8 encoded @ByteString.Lazy@ by using regex-tdfa and
+<http://hackage.haskell.org/package/regex-tdfa-utf8 regex-tdfa-utf8>
+packages together (required the utf8-string package).
 As of version 1.1.1 the following GNU extensions are recognized, all
 anchors:
@@ -135,7 +148,8 @@ This package does not provide "basic" regular expressions. This
 package does not provide back references inside regular expressions.
 The package does not provide Perl style regular expressions. Please
-look at the regex-pcre and pcre-light packages instead.
+look at the <http://hackage.haskell.org/package/regex-pcre regex-pcre>
+and <http://hackage.haskell.org/package/pcre-light pcre-light> packages instead.
 This package does not provide find-and-replace.
@@ -145,13 +159,15 @@ If you find yourself writing a lot of regexes, take a look at
 <http://hackage.haskell.org/package/raw-strings-qq raw-strings-qq>. It'll
 let you write regexes without needing to escape all your backslashes.
-> {-# LANGUAGE QuasiQuotes #-}
->
-> import Text.RawString.QQ
-> import Text.Regex.TDFA
->
-> λ> "2 * (3 + 1) / 4" =~ [r|\([^)]+\)|] :: String
-> >>> "(3 + 1)"
+@
+\{\-\# LANGUAGE QuasiQuotes \#\-\}
+import Text.RawString.QQ
+import Text.Regex.TDFA
+λ> "2 * (3 + 1) / 4" '=~' [r|\\([^)]+\\)|] :: String
+>>> "(3 + 1)"
+@
 -}

6 changes: 3 additions & 3 deletions Text/Regex/TDFA/ByteString.hs
@@ -1,6 +1,6 @@
-{-|
+{-|
 This modules provides 'RegexMaker' and 'RegexLike' instances for using
-'ByteString' with the DFA backend ("Text.Regex.Lib.WrapDFAEngine" and
+@ByteString@ with the DFA backend ("Text.Regex.Lib.WrapDFAEngine" and
 "Text.Regex.Lazy.DFAEngineFPS"). This module is usually used via
 import "Text.Regex.TDFA".
@@ -44,7 +44,7 @@ instance RegexLike Regex B.ByteString where
 matchCount r s = length (matchAll r' s)
 where r' = r { regex_execOptions = (regex_execOptions r) {captureGroups = False} }
 matchTest = Tester.matchTest
-matchOnceText regex source =
+matchOnceText regex source =
 fmap (\ma -> let (o,l) = ma!0
 in (B.take o source
 ,fmap (\ol@(off,len) -> (B.take len (B.drop off source),ol)) ma
8 changes: 4 additions & 4 deletions Text/Regex/TDFA/ByteString/Lazy.hs
@@ -1,6 +1,6 @@
-{-|
+{-|
 This modules provides 'RegexMaker' and 'RegexLike' instances for using
-'ByteString' with the DFA backend ("Text.Regex.Lib.WrapDFAEngine" and
+@ByteString@ with the DFA backend ("Text.Regex.Lib.WrapDFAEngine" and
 "Text.Regex.Lazy.DFAEngineFPS"). This module is usually used via
 import "Text.Regex.TDFA".
@@ -45,7 +45,7 @@ instance RegexLike Regex L.ByteString where
 matchCount r s = length (matchAll r' s)
 where r' = r { regex_execOptions = (regex_execOptions r) {captureGroups = False} }
 matchTest = Tester.matchTest
-matchOnceText regex source =
+matchOnceText regex source =
 fmap (\ma ->
 let (o32,l32) = ma!0
 o = fi o32
@@ -64,7 +64,7 @@ instance RegexLike Regex L.ByteString where
 let (off0,len0) = x!0
 trans pair@(off32,len32) = (L.take (fi len32) (L.drop (fi (off32-i)) t),pair)
 t' = L.drop (fi (off0+len0-i)) t
-in amap trans x : seq t' (go (off0+len0) t' xs)
+in amap trans x : seq t' (go (off0+len0) t' xs)
 in go 0 source (matchAll regex source)

 fi :: (Integral a, Num b) => a -> b
6 changes: 3 additions & 3 deletions Text/Regex/TDFA/Sequence.hs
@@ -1,6 +1,6 @@
-{-|
+{-|
 This modules provides 'RegexMaker' and 'RegexLike' instances for using
-'ByteString' with the DFA backend ("Text.Regex.Lib.WrapDFAEngine" and
+@ByteString@ with the DFA backend ("Text.Regex.Lib.WrapDFAEngine" and
 "Text.Regex.Lazy.DFAEngineFPS"). This module is usually used via
 import "Text.Regex.TDFA".
@@ -49,7 +49,7 @@ instance RegexLike Regex (Seq Char) where
 matchCount r s = length (matchAll r' s)
 where r' = r { regex_execOptions = (regex_execOptions r) {captureGroups = False} }
 matchTest = Tester.matchTest
-matchOnceText regex source =
+matchOnceText regex source =
 fmap (\ma -> let (o,l) = ma!0
 in (before o source
 ,fmap (\ol -> (extract ol source,ol)) ma