Iury O. G. Figueiredo edited this page Mar 15, 2020 · 25 revisions

Welcome to the crocs docs!

This wiki describes the features and the benefits of using crocs.

Introduction

In crocs, regular expressions become classes, and you use these classes to describe your patterns. Once a pattern is constructed, the Python class structure can be compiled to a regex string. crocs also generates sample strings that the pattern matches, which gives you a concrete idea of which strings your pattern will accept.

Joining Patterns

A regex pattern is a sequence of smaller patterns. The Join class joins such patterns into a single one.

For instance, a plain string is itself a pattern:

from crocs.regex import Join
e = Join('a', 'b', 'c', 'd')
e.test()
e.hits()

The Join constructor accepts regex classes (and plain strings, as above) and glues them together to form a master pattern.

That gives:

from crocs.regex import Join
e = Join('a', 'b', 'c', 'd')
e.test()
Regex: abcd
Input: abcd
Group dict: {}
Group 0: abcd
Groups: ()
e.hits()
Match with:
 abcd abcd abcd abcd abcd abcd abcd abcd abcd abcd
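
The idea of joining smaller patterns into one regex string can be sketched in a few lines of plain Python. This is only an illustration of the concept, not how crocs is implemented: the names Literal, AnyChar, Sequence and to_regex are hypothetical and do not exist in crocs.

```python
import re


class Literal:
    """Hypothetical node for a plain string; not part of crocs."""

    def __init__(self, text):
        self.text = text

    def to_regex(self):
        # Escape the text so that it only matches itself.
        return re.escape(self.text)


class AnyChar:
    """Hypothetical node for the '.' wildcard; not part of crocs."""

    def to_regex(self):
        return '.'


class Sequence:
    """Hypothetical stand-in for the idea behind Join."""

    def __init__(self, *parts):
        # Plain strings are wrapped so every part exposes to_regex().
        self.parts = [Literal(p) if isinstance(p, str) else p for p in parts]

    def to_regex(self):
        # Concatenating the parts' regex strings yields the joined pattern.
        return ''.join(part.to_regex() for part in self.parts)


joined = Sequence('a', 'b', 'c', 'd').to_regex()
with_wildcard = Sequence('a', AnyChar(), 'b').to_regex()
print(joined)         # abcd
print(with_wildcard)  # a.b
```

Each node only knows how to render itself, so nesting one Sequence inside another works without extra code.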

Wildcard

In crocs, the regex wildcard character (.) becomes the class X. It can be combined with the other classes to build patterns.

from crocs.regex import Join, X

e = Join('a', X(), 'b')
e.test()
e.hits()

That produces:

Regex: a.b
Input: a.b
Group dict: {}
Group 0: a.b
Groups: ()
Match with:
 a9b alb a|b aVb apb aqb arb aAb a[b a;b
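
The compiled regex string can be checked independently of crocs with Python's standard re module. The sketch below takes the string a.b from the output above and tests it against the listed hits; the variable names are only illustrative.

```python
import re

# Regex string copied from the crocs output above.
pattern = re.compile(r'a.b')

# Every sample hit listed above should match in full.
hits = ['a9b', 'alb', 'a|b', 'aVb', 'apb', 'aqb', 'arb', 'aAb', 'a[b', 'a;b']
all_match = all(pattern.fullmatch(h) is not None for h in hits)

# The wildcard needs exactly one character, and by default
# that character cannot be a newline.
no_char = pattern.fullmatch('ab')
newline = pattern.fullmatch('a\nb')

print(all_match, no_char, newline)
```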

Include/Sequence

Regex sequences (character ranges such as 0-9) are mapped to the Seq class. It receives two arguments that delimit the start and end of the desired sequence. Wrapping a Seq in Include produces the corresponding character set, as the examples show.

from crocs.regex import Join, Include, Seq

e = Join('x', Include(Seq('0', '9')))
e.test()
e.hits()

That would give you:

Regex: x[0-9]
Input: x8
Group dict: {}
Group 0: x8
Groups: ()
Match with:
 x8 x4 x6 x1 x0 x0 x2 x7 x4 x2

A simpler example, using Include on its own:

from crocs.regex import Include, Seq

e = Include(Seq('a', 'z'))
e.test()
e.hits()

Which would output:

Regex: [a-z]
Input: t
Group dict: {}
Group 0: t
Groups: ()
e.hits()
Match with:
 v o g q v p t x l f
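
To see what the two compiled character sets accept and reject, the regex strings from the outputs above can be tried with the standard re module. This is a small sketch that does not use crocs itself; the candidate strings are made up for illustration.

```python
import re

# Regex strings copied from the crocs outputs above.
digit_after_x = re.compile(r'x[0-9]')
lowercase = re.compile(r'[a-z]')

# Only 'x' followed by a single digit is accepted.
candidates = ['x8', 'x0', 'xa', 'x', 'x42']
accepted = [s for s in candidates if digit_after_x.fullmatch(s)]

# The range a-z is case sensitive.
lower_ok = lowercase.fullmatch('t') is not None
upper_ok = lowercase.fullmatch('T') is not None

print(accepted, lower_ok, upper_ok)
```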

Repeat

The Repeat class describes the number of times a given pattern has to occur in order to match. In the example output in this section, a Repeat with no bounds compiles to {0,}, that is, zero or more occurrences.

The example below clarifies the usage.

from crocs.regex import Join, Repeat

e = Join('a', Repeat('b'), Repeat('cd'))
e.test()
e.hits()

Would output:

Regex: ab{0,}(cd){0,}
Input: abbbbbbcdcdcdcdcdcd
Group dict: {}
Group 0: abbbbbbcdcdcdcdcdcd
Groups: ('cd',)
Match with:
 abbbbbbcdcdcdcd abbbbbbbbbbcdcdcdcdcdcdcd abbbbbcdcdcdcdcdcdcdcdcd 
acdcdcdcdcdcdcdcd abbbbbbbbcd abbbcdcdcdcd abbbbbbbbbbcdcdcdcdcdcdcdcdcdcd 
abbbbbcdcdcdcdcdcdcd abbbbbbbbbcdcdcdcdcdcdcd abbbbbbbcdcdcdcdcdcd
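
Two details of this output are worth checking with the standard re module: {0,} behaves like the more common * quantifier, and a repeated group only keeps its last repetition, which is why Groups shows a single 'cd'. The sketch below uses the regex string from the output; the sample strings are chosen for illustration.

```python
import re

# Regex string copied from the crocs output above, and its shorthand form.
verbose = re.compile(r'ab{0,}(cd){0,}')
shorthand = re.compile(r'ab*(cd)*')

# Both spellings should agree on every sample.
samples = ['a', 'abbb', 'acdcd', 'abbbbbbcdcdcdcdcdcd', 'abcdc', 'b']
same = all(
    (verbose.fullmatch(s) is None) == (shorthand.fullmatch(s) is None)
    for s in samples
)

# A repeated group records only its last repetition.
last_group = verbose.fullmatch('abbbbbbcdcdcdcdcdcd').groups()

# With zero repetitions the group did not take part in the match.
empty_group = verbose.fullmatch('a').groups()

print(same, last_group, empty_group)
```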

Group

The Group class groups patterns together, which makes it possible to apply other regex operators to the group as a whole. It also records what the group matched, so that the data can be retrieved afterwards.

from crocs.regex import Join, Group, X

e = Join('a', Group('b', X()))
e.test()
e.hits()

That would output:

Regex: a(b.)
Input: ab&
Group dict: {}
Group 0: ab&
Groups: ('b&',)
Match with:
 ab& ab& ab& ab& ab& ab& ab& ab& ab& ab&
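
The difference between "Group 0" and "Groups" in the output can be reproduced with the standard re module, using the regex string a(b.) from above. This is a sketch outside crocs, for illustration only.

```python
import re

# Regex string and input copied from the crocs output above.
match = re.fullmatch(r'a(b.)', 'ab&')

whole = match.group(0)     # the entire matched text
captured = match.groups()  # only what the parenthesized groups recorded

print(whole, captured)
```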

Named Group

Named groups are useful for keeping track of specific patterns that were matched. In crocs you can reference a named group in another regex pattern. Named groups also make your regexes easier to debug.

from crocs.regex import Join, NamedGroup, X
e = Join('x', NamedGroup('foo', X()))
e.test()
e.hits()

Would output:

Regex: x(?P<foo>.)
Input: xo
Group dict: {'foo': 'o'}
Group 0: xo
Groups: ('o',)
e.hits()

Match with:
 xK xS xt x{ x7 x3 xv xE xu xU
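
The "Group dict" line corresponds to what Python's re module returns from groupdict(). A minimal check with the regex string from the output above, independent of crocs:

```python
import re

# Regex string and input copied from the crocs output above.
match = re.fullmatch(r'x(?P<foo>.)', 'xo')

by_name = match.groupdict()     # named groups as a dictionary
by_lookup = match.group('foo')  # a single named group
by_position = match.groups()    # named groups are still numbered

print(by_name, by_lookup, by_position)
```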

Group Reference

Group references are a powerful means of building certain patterns. They allow you to create a group and then reference the data it matched somewhere else.

from crocs.regex import Join, X, Group

n = Group('b', X(), 'c')
e = Join('a', n, n)
e.test()
e.hits()

That defines the group n and uses it twice in the pattern e. The first time the group n is mentioned, it compiles to its primary regex form; the second time, it compiles to a group reference.

Regex: a(b.c)\1
Input: ab{cb{c
Group dict: {}
Group 0: ab{cb{c
Groups: ('b{c',)
Match with:
 ab{cb{c ab{cb{c ab{cb{c ab{cb{c ab{cb{c ab{cb{c ab{cb{c ab{cb{c ab{cb{c ab{cb{c

That basically means that whatever is matched by (b.c) has to be matched again on the right side; the '\1' is the group reference.
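
The effect of the back-reference is easiest to see with one input that repeats the captured text and one that does not. The sketch below uses the regex string from the output with the standard re module; the failing input is made up for illustration.

```python
import re

# Regex string copied from the crocs output above.
pattern = re.compile(r'a(b.c)\1')

# The second half repeats exactly what the group captured.
repeated = pattern.fullmatch('ab{cb{c')

# Here the wildcard position differs ('{' versus 'x'), so \1 fails
# even though 'bxc' on its own would satisfy b.c.
different = pattern.fullmatch('ab{cbxc')

print(repeated.group(1), different)
```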

Consume Next

from crocs.regex import Join, ConsumeNext, X

e = ConsumeNext(Join('a', X(), 'b'), 'def')
e.test()
e.hits()

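ConsumeNext compiles to a lookbehind assertion: the second pattern ('def') is matched only where it is preceded by the first pattern, which is itself not part of the match. The constructor accepts the keyword neg, which selects between the positive and the negative form of the assertion.

Note also that the Join class can be nested inside other classes to build patterns.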

That would output:

Regex: (?<=a.b)def
Input: ambdef
Group dict: {}
Group 0: def
Groups: ()
Match with:
 aUbdef a@bdef ambdef a=bdef a&bdef ambdef a0bdef aMbdef a1bdef aIbdef
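
The output shows "Group 0: def" even though the input is ambdef, because the text inside (?<=...) is checked but not consumed. This can be verified with the standard re module. The negative form (?<!a.b)def used below is an assumption about what neg=True would compile to; it is not taken from the crocs output.

```python
import re

# Regex string copied from the crocs output above.
positive = re.compile(r'(?<=a.b)def')

found = positive.search('ambdef')
matched_text = found.group(0)  # only 'def' is part of the match
matched_span = found.span()    # the match starts after 'amb'

# Without the required prefix there is no match.
missing_prefix = positive.search('xyzdef')

# Assumed negative counterpart: 'def' NOT preceded by a.b
negative = re.compile(r'(?<!a.b)def')
neg_rejects = negative.search('ambdef')
neg_accepts = negative.search('xyzdef')

print(matched_text, matched_span, missing_prefix, neg_rejects)
```

Python's re module requires the pattern inside a lookbehind to have a fixed width, which a.b satisfies.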

Consume Back

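ConsumeBack compiles to a lookahead assertion: the first pattern is matched only where it is followed by the second, which is itself not part of the match. It also accepts a neg argument; with neg=True, as shown below, the result is a negative lookahead.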

from crocs.regex import Join, ConsumeBack

e = ConsumeBack('Isaac ', 'Asimov', neg=True)
e.test()
e.hits()

That would output:

Regex: Isaac\ (?!Asimov)
Input: Isaac e<&c2)
Group dict: {}
Group 0: Isaac 
Groups: ()
Match with:
 Isaac )nPSNn Isaac e>}@cC Isaac R(+SHX Isaac 5RK~^X 
Isaac +2'b0- Isaac QWCa%k Isaac $ZDc9j
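
The negative lookahead can be checked with the standard re module using the regex string from the output above. The two input strings are chosen for illustration.

```python
import re

# Regex string copied from the crocs output above.
pattern = re.compile(r'Isaac\ (?!Asimov)')

# 'Isaac ' followed by anything other than 'Asimov' matches,
# and the text after the space is not part of the match.
other = pattern.search('Isaac Newton')
other_text = other.group(0)

# 'Isaac ' followed by 'Asimov' is rejected.
asimov = pattern.search('Isaac Asimov')

print(repr(other_text), asimov)
```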