Home
Welcome to the crocs docs!
This wiki describes the features and the benefits of using crocs.
In crocs, regexes become classes, and you use these classes to describe your patterns. Once a pattern is constructed, the Python class structure can be compiled to a regex string. crocs also generates possible hits for the pattern, so you get a good idea of which strings your pattern will match.
A regex pattern is a sequence of smaller patterns. The Join class joins patterns together to build a single one.
For instance, a plain string is itself a pattern:
from crocs.regex import Join
e = Join('a', 'b', 'c', 'd')
e.test()
e.hits()
The Join constructor accepts regex classes and glues them together into a single master pattern.
That gives:
from crocs.regex import Join
e = Join('a', 'b', 'c', 'd')
e.test()
Regex: abcd
Input: abcd
Group dict: {}
Group 0: abcd
Groups: ()
e.hits()
Match with:
abcd abcd abcd abcd abcd abcd abcd abcd abcd abcd
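The report above can be cross-checked with Python's standard re module alone. This is a minimal sketch that works on the regex string shown above, not on crocs itself; the variable names are made up for illustration.

```python
import re

# Regex string that the wiki shows for Join('a', 'b', 'c', 'd').
compiled = "abcd"

# Joining plain literals amounts to concatenating them.
concatenated = "".join(["a", "b", "c", "d"])

# Reproduce the three fields printed by e.test(): group 0, groups, group dict.
match = re.fullmatch(compiled, "abcd")
join_report = (match.group(0), match.groups(), match.groupdict())
print(join_report)
```

Since the pattern contains no groups, the groups tuple and the group dict are both empty, which agrees with the report.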
The regex wildcard character becomes a class in crocs, named X. It can be combined with the other classes to produce patterns.
from crocs.regex import Join, X
e = Join('a', X(), 'b')
e.test()
e.hits()
That produces:
Regex: a.b
Input: a.b
Group dict: {}
Group 0: a.b
Groups: ()
Match with:
a9b alb a|b aVb apb aqb arb aAb a[b a;b
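To see why every listed hit qualifies, the compiled string a.b can be run through the standard re module. This sketch only assumes the regex string shown above; it does not call crocs.

```python
import re

# Regex string that the wiki shows for Join('a', X(), 'b').
wildcard = re.compile(r"a.b")

# The hits listed above, split on whitespace.
hits = "a9b alb a|b aVb apb aqb arb aAb a[b a;b".split()

# Every hit has exactly one character between 'a' and 'b'.
all_hits_match = all(wildcard.fullmatch(h) is not None for h in hits)

# By default '.' does not match a newline, so this one is rejected.
newline_rejected = wildcard.fullmatch("a\nb") is None

print(all_hits_match, newline_rejected)
```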
Regex character sequences (ranges such as 0-9) are mapped to the Seq class. It receives two arguments that delimit the start and the end of the desired sequence.
from crocs.regex import Join, Include, Seq
e = Join('x', Include(Seq('0', '9')))
e.test()
e.hits()
That would give you:
Regex: x[0-9]
Input: x8
Group dict: {}
Group 0: x8
Groups: ()
Match with:
x8 x4 x6 x1 x0 x0 x2 x7 x4 x2
A simpler example makes this clearer:
from crocs.regex import Include, Seq
e = Include(Seq('a', 'z'))
e.test()
e.hits()
Which would output:
Regex: [a-z]
Input: t
Group dict: {}
Group 0: t
Groups: ()
e.hits()
Match with:
v o g q v p t x l f
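The two character sets above, [0-9] and [a-z], behave as follows in the standard re module. This is a sketch based only on the regex strings printed above; the variable names are illustrative.

```python
import re

# Regex strings that the wiki shows for the two Seq examples.
digit_after_x = re.compile(r"x[0-9]")
lowercase = re.compile(r"[a-z]")

# A character inside the range matches, one outside the range does not.
digit_ok = digit_after_x.fullmatch("x8") is not None
letter_rejected = digit_after_x.fullmatch("xa") is None

lower_ok = lowercase.fullmatch("t") is not None
upper_rejected = lowercase.fullmatch("T") is None

print(digit_ok, letter_rejected, lower_ok, upper_rejected)
```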
The Repeat class describes the number of times a given pattern has to occur in order to be considered a match.
The example below shows its usage.
from crocs.regex import Join, Repeat
e = Join('a', Repeat('b'), Repeat('cd'))
e.test()
e.hits()
That would output:
Regex: ab{0,}(cd){0,}
Input: abbbbbbcdcdcdcdcdcd
Group dict: {}
Group 0: abbbbbbcdcdcdcdcdcd
Groups: ('cd',)
Match with:
abbbbbbcdcdcdcd abbbbbbbbbbcdcdcdcdcdcdcd abbbbbcdcdcdcdcdcdcdcdcd
acdcdcdcdcdcdcdcd abbbbbbbbcd abbbcdcdcdcd abbbbbbbbbbcdcdcdcdcdcdcdcdcdcd
abbbbbcdcdcdcdcdcdcd abbbbbbbbbcdcdcdcdcdcdcd abbbbbbbcdcdcdcdcdcd
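The output above has two details worth checking by hand: {0,} means "zero or more", and a repeated group only keeps its last repetition. The sketch below uses the standard re module on the regex string shown above and does not depend on crocs.

```python
import re

# Regex string that the wiki shows for Join('a', Repeat('b'), Repeat('cd')).
repeat = re.compile(r"ab{0,}(cd){0,}")

# A repeated group reports only its last repetition.
groups_full = repeat.fullmatch("abbbbbbcdcdcdcdcdcd").groups()

# Zero repetitions are allowed, so a bare 'a' matches and the group is unset.
groups_min = repeat.fullmatch("a").groups()

# {0,} is the long spelling of the '*' quantifier.
star = re.compile(r"ab*(cd)*")
samples = ["abbbbbbcdcdcdcdcdcd", "acdcdcdcdcdcdcdcd", "abbbbbbbbcd"]
same_verdict = all(
    (repeat.fullmatch(s) is not None) == (star.fullmatch(s) is not None)
    for s in samples
)

print(groups_full, groups_min, same_verdict)
```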
The Group class groups patterns together, which makes it possible to apply other regex operators to the group as a whole when building new patterns. It also records what the group matched, so the matched data can be retrieved afterwards.
from crocs.regex import Join, Group, X
e = Join('a', Group('b', X()))
e.test()
e.hits()
That would output:
Regex: a(b.)
Input: ab&
Group dict: {}
Group 0: ab&
Groups: ('b&',)
Match with:
ab& ab& ab& ab& ab& ab& ab& ab& ab& ab&
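Both roles of a group, recording the matched text and serving as a unit for other operators, can be seen with the standard re module. The first pattern is the regex string shown above; the second, with a {2} quantifier on the group, is a hypothetical extension added here for illustration.

```python
import re

# Regex string that the wiki shows for Join('a', Group('b', X())).
m = re.fullmatch(r"a(b.)", "ab&")
whole = m.group(0)     # the entire match
captured = m.group(1)  # what the group recorded

# Hypothetical extension: a quantifier applied to the group as a whole.
m2 = re.fullmatch(r"a(b.){2}", "ab&b#")
last_repetition = m2.group(1)

print(whole, captured, last_repetition)
```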
Named groups are useful for keeping track of specific patterns that were matched. In crocs you can reference a named group in another regex pattern, which makes your regexes easier to debug.
from crocs.regex import Join, NamedGroup, X
e = Join('x', NamedGroup('foo', X()))
e.test()
e.hits()
That would output:
Regex: x(?P<foo>.)
Input: xo
Group dict: {'foo': 'o'}
Group 0: xo
Groups: ('o',)
e.hits()
Match with:
xK xS xt x{ x7 x3 xv xE xu xU
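The "Group dict" field in the report corresponds to groupdict() in the standard re module. A minimal sketch using only the regex string shown above:

```python
import re

# Regex string that the wiki shows for Join('x', NamedGroup('foo', X())).
m = re.fullmatch(r"x(?P<foo>.)", "xo")

by_name = m.group("foo")   # look the group up by its name
as_dict = m.groupdict()    # all named groups at once
by_position = m.groups()   # named groups are still numbered as well

print(by_name, as_dict, by_position)
```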
Group references are a powerful means of building certain patterns. They allow you to create a group and then reference the data it matched somewhere else.
from crocs.regex import Join, X, Group
n = Group('b', X(), 'c')
e = Join('a', n, n)
e.test()
e.hits()
That defines the group n and uses it twice in the pattern e. The first time the group n is mentioned it compiles to its primary regex form; the second time it compiles to a group reference.
Regex: a(b.c)\1
Input: ab{cb{c
Group dict: {}
Group 0: ab{cb{c
Groups: ('b{c',)
Match with:
ab{cb{c ab{cb{c ab{cb{c ab{cb{c ab{cb{c ab{cb{c ab{cb{c ab{cb{c ab{cb{c ab{cb{c
That basically means that whatever (b.c) matched has to occur again immediately to its right; the '\1' is the group reference.
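The effect of the group reference is easiest to see with one string that repeats the captured text and one that does not. The sketch below uses the standard re module on the regex string shown above; the second test string is made up for illustration.

```python
import re

# Regex string that the wiki shows for Join('a', n, n).
backref = re.compile(r"a(b.c)\1")

# The second half repeats exactly what the group captured: match.
repeated = backref.fullmatch("ab{cb{c")

# 'bxc' fits the shape b.c but differs from the captured 'b{c': no match.
different = backref.fullmatch("ab{cbxc")

print(repeated.group(1), different)
```

This is what separates a(b.c)\1 from a(b.c)(b.c): the latter would accept both strings.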
from crocs.regex import Join, ConsumeNext, X
e = ConsumeNext(Join('a', X(), 'b'), 'def')
e.test()
e.hits()
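The ConsumeNext constructor accepts the keyword neg, which selects between the positive and the negative form of the assertion. The example above uses the default and, as the compiled regex shows, produces a positive lookbehind, (?<=...).
Note also that Join instances can be nested to build larger patterns.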
That would output:
Regex: (?<=a.b)def
Input: ambdef
Group dict: {}
Group 0: def
Groups: ()
Match with:
aUbdef a@bdef ambdef a=bdef a&bdef ambdef a0bdef aMbdef a1bdef aIbdef
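The compiled string (?<=a.b)def uses the (?<=...) construct, which Python's re module documents as a positive lookbehind: the text before def is checked but not consumed. The sketch below verifies that with the standard library. The negative form (?<!...) is standard regex syntax; that crocs emits exactly this string for neg=True is an assumption here, not something the wiki shows.

```python
import re

# Regex string that the wiki shows for ConsumeNext(Join('a', X(), 'b'), 'def').
lookbehind = re.compile(r"(?<=a.b)def")

m = lookbehind.search("ambdef")
matched_text = m.group(0)  # only 'def' is part of the match
matched_span = m.span()    # the match starts after 'amb'

# Without a preceding a.b the assertion fails.
no_prefix = lookbehind.search("xyzdef")

# Assumed negative counterpart, written in standard regex syntax.
neg_lookbehind = re.compile(r"(?<!a.b)def")
neg_on_amb = neg_lookbehind.search("ambdef")
neg_on_xyz = neg_lookbehind.search("xyzdef")

print(matched_text, matched_span, no_prefix, neg_on_amb)
```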
ConsumeBack builds the corresponding lookahead assertion. It also accepts a neg argument; with neg=True, as shown below, it produces a negative lookahead.
from crocs.regex import Join, ConsumeBack
e = ConsumeBack('Isaac ', 'Asimov', neg=True)
e.test()
e.hits()
That would output:
Regex: Isaac\ (?!Asimov)
Input: Isaac e<&c2)
Group dict: {}
Group 0: Isaac
Groups: ()
Match with:
Isaac )nPSNn Isaac e>}@cC Isaac R(+SHX Isaac 5RK~^X
Isaac +2'b0- Isaac QWCa%k Isaac $ZDc9j
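The compiled string uses (?!...), which Python's re module documents as a negative lookahead: the match succeeds only if the text that follows is not Asimov, and that text is not consumed. A minimal check with the standard library, using the regex string and the sample input shown above:

```python
import re

# Regex string that the wiki shows for ConsumeBack('Isaac ', 'Asimov', neg=True).
neg_lookahead = re.compile(r"Isaac\ (?!Asimov)")

# Anything other than 'Asimov' may follow; only 'Isaac ' is consumed.
accepted = neg_lookahead.match("Isaac e<&c2)")
consumed = accepted.group(0)

# The one continuation the assertion forbids.
rejected = neg_lookahead.match("Isaac Asimov")

print(repr(consumed), rejected)
```

This explains why Group 0 in the report is only the text up to and including the space, while the random tail appears in Input but not in the match.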