Regex examples

From Wikipedia: A regular expression (shortened as regex or regexp) is a sequence of characters that specifies a match pattern in text. Basically, you can use regex patterns to efficiently find fine-tuned substring matches in strings.

Each of the examples reads the text in sample.txt and searches for matches of the regex /^h.\D\S[a-z]{3,5}(...)\w+\d*$/m (although the exact way this regex is written differs between languages). Let's break down this regex:

/***/m is the general form of a regex literal with the multiline option enabled, where *** stands for the pattern itself
- Not every programming language accepts this literal form, but it is the conventional notation (Perl and JavaScript use it directly)
- The multiline option makes ^ and $ match at the start and end of each line of text, rather than only at the start and end of the entire string
^ means the start of a line, and h is a literal character; therefore, a match must begin at the start of a line with the letter h
. means any character except a newline, so any such character can follow h
\D means any non-digit, so the third character in a match must not be a digit
\S means any non-whitespace character, so the fourth character in a match must not be whitespace (a space, a tab or \t, or a newline or \n)
[a-z]{3,5} means that between 3 and 5 (inclusive) lowercase characters in the range a-z must follow the fourth character
(...) is what is being "captured" by this regex: the 3 characters (of any kind except newlines) following the [a-z]{3,5}
\w+ means one or more word characters (letters, digits, or underscores) must follow the captured characters in a match
\d* means zero or more digits may follow the above one or more word characters in a match
$ means the end of a line
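As a rough illustration of the breakdown above, here is a minimal Python sketch that applies the same pattern with the multiline option. The sample text and the helper name find_captures are invented for this example; they are not the contents of sample.txt or of the repository's regex.py.

```python
import re

# The first regex from this README; re.MULTILINE plays the role of the /m option.
PATTERN = re.compile(r"^h.\D\S[a-z]{3,5}(...)\w+\d*$", re.MULTILINE)

# Invented sample text (an assumption, NOT the repository's sample.txt).
SAMPLE = "hi--abcdemeister42\nnot a match\nho_zqrstumeiwork\n"


def find_captures(text):
    """Return (captured text, start index of the capture) for every match."""
    return [(m.group(1), m.start(1)) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    for captured, index in find_captures(SAMPLE):
        print(f"captured {captured!r} at index {index}")
```

In this sample the first and third lines match and the second does not, because it does not start with h. Note that \w also matches digits, so the greedy \w+ consumes the trailing 42 on the first line and \d* matches nothing there.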

The capture group should be "mei" for both lines in sample.txt, but the indices of the two matches differ.

The second regex is bob$, which looks for lines that end with "bob". This is used to replace "bob" at the end of a line with "alice".
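A minimal Python sketch of this replacement might look as follows. The function name replace_trailing_bob is hypothetical, and enabling the multiline option here is an assumption made so that $ matches at the end of every line rather than only at the end of the string.

```python
import re


def replace_trailing_bob(text):
    """Replace "bob" with "alice" wherever "bob" ends a line."""
    # re.MULTILINE (an assumption here) lets $ match at the end of each line.
    return re.sub(r"bob$", "alice", text, flags=re.MULTILINE)


if __name__ == "__main__":
    print(replace_trailing_bob("hello bob\nbob says hi\nbye bob"))
```

Only the occurrences at the end of a line are replaced; the "bob" that starts the second line is left alone.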

The third regex is hello[0-9]+world, which matches "hello", then one or more digits in the range 0-9, then "world", anywhere in the string.
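Because this pattern has no anchors, a search (rather than a full-string match) is the natural fit. A small Python sketch, with the hypothetical helper name find_hello_world:

```python
import re


def find_hello_world(text):
    """Return the first substring matching hello[0-9]+world, or None."""
    match = re.search(r"hello[0-9]+world", text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(find_hello_world("say hello123world now"))
    print(find_hello_world("helloworld"))  # no digits, so no match
```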

The fourth regex is ^he[^A-Z]*\d{2}wo\w{2,}$. Aside from the meanings outlined above:

[^A-Z]* means zero or more characters that are not in the range A-Z (uppercase letters)
\d{2} means exactly two digits
\w{2,} means two or more word characters
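To make these pieces concrete, here is a short Python sketch that checks a few invented strings against the fourth regex. The name matches_fourth and the test strings are assumptions for illustration only.

```python
import re

# The fourth regex from this README, applied to one line at a time.
FOURTH = re.compile(r"^he[^A-Z]*\d{2}wo\w{2,}$")


def matches_fourth(line):
    """Return True if the whole line matches the fourth regex."""
    return FOURTH.search(line) is not None


if __name__ == "__main__":
    for candidate in ["hello42world", "heLLo42world", "hello4world", "hello42wor"]:
        print(candidate, matches_fourth(candidate))
```

Of these candidates, only hello42world matches: the second contains uppercase letters before the digits, the third has a single digit, and the fourth has only one word character after "wo".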

The fifth regex is \W+, which means one or more non-word characters. This is used to split a string into substrings that exclude any non-word characters (e.g. punctuation marks and whitespace).
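A minimal Python sketch of this kind of split is shown below. The helper name split_words is hypothetical; filtering out empty strings is a design choice made here, since re.split returns an empty string when the input starts or ends with non-word characters.

```python
import re


def split_words(text):
    """Split text on runs of non-word characters, dropping empty pieces."""
    return [part for part in re.split(r"\W+", text) if part]


if __name__ == "__main__":
    print(split_words("Hello, world! It's 42."))
```

Note that the apostrophe is a non-word character, so "It's" is split into "It" and "s".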

Note: all of the examples except the C example use Perl-style regex syntax; the C example uses POSIX ERE (extended regular expressions) syntax. The same split exists in grep: grep -E takes POSIX ERE (GNU grep adds some extensions such as \w), while GNU grep -P takes Perl-compatible syntax. To use Perl regex syntax in C, the PCRE library is available (packaged as libpcre3 on Debian and Ubuntu).

To compile the C example on Windows (whose C runtime does not include regex.h), install libsystre for MinGW-w64: pacman -S mingw-w64-ucrt-x86_64-libsystre

chrislattman/test-regex

Folders and files

Latest commit

History

Repository files navigation

Regex examples

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages