This document describes how to build the rebar
tool and the regex engines
that it knows how to measure. For a guided tour of how to use rebar
,
see the TUTORIAL.
I've only tested rebar
on Linux. It should probably work on macOS too.
rebar
itself will probably work on Windows as well, but building at least
some of the regex engines may be a challenge. As will be discussed below, you
don't need to build all of the regex engines in order to use rebar
. It works
perfectly fine if a few cannot be built. Moreover, even if you can't build any
of them, you can still use rebar
to explore the benchmark data. (See the
tutorial linked above for a guided exploration.)
rebar
is the harness tool used to gather measurements and compare results.
The harness tool itself doesn't have any explicit knowledge about regex engines
or even benchmarks. It just knows how to read a directory full of TOML files,
and those TOML files define the benchmarks and tell rebar
how to run a
particular regex engine.
Unless you're building your own benchmark suite, it's likely that
rebar
will not be useful as a tool on its own. Instead, you'll want the
benchmark definitions and regex engine runner programs. You can get all of that
by simply cloning this repository:
$ git clone https://github.com/BurntSushi/rebar
$ cd rebar
Building the harness tool requires that Rust be installed. If you're on Linux,
you can probably install Rust via your distribution's package manager, but it
might give you a version of Rust that is too old to build rebar
. If so, you
might consider installing Rust via rustup
.
Once you have Rust installed, you should have a command called cargo
, which
we use to build rebar
:
$ cargo install --path .
On my system, this compiles the rebar
binary and puts it at
$HOME/.local/cargo/bin/rebar
, which is already in my PATH
. If you can't
find it, the binary will also be at ./target/release/rebar
. You can test that
rebar
is available by querying its version (the output you see may be a
little different than what is shown below):
$ rebar version
0.0.1 (rev b46c3a4aba)
The runner programs that execute regex searches can be built by rebar
with
the following command:
$ rebar build
It is almost certain that some regex engines will fail to build initially,
and this is usually because of a missing dependency. Below, we'll attempt to
explain how rebar
builds runner programs so that you can diagnose the issue
and install the missing dependency. Do note though that rebar
is perfectly
happy to run without building all of the regex engines. So you don't have to
get rebar build
working completely.
All regex engines that rebar
knows how to run are defined in
benchmarks/engines.toml
. Each entry in that file
provides the following details about each engine:
- Its name.
- The program dependencies the engine has.
- How to build the engine.
- How to get the version of the engine after it has been built.
- How to run the engine.
- How to remove the build artifacts.
rebar
does not provide a way to build every regex engine it benchmarks from
source. Some are done this way:
- The PCRE2 and RE2 regex engines have their source code vendored into this
repository. That means you don't need to have them separately installed.
rebar
knows how to build them from scratch. All you need are a C and C++ compiler, respectively. - All regex engines written in Rust are pinned to a specific version.
Most of the rest of the regex engines are instead pulled in through system
dependencies. For example, the javascript/v8
regex engine is benchmarked via
a Javascript program that is executed by your system's copy of Node. rebar
will not install or build Node for you. Instead, if Node is not installed,
building the javascript/v8
engine will simply fail. To see what each engine
requires, inspect its definition in benchmarks/engines.toml
.
With that said, most regex engines, even if they are pulled in from a system
dependencies, do require an actual program to be compiled. For example, the
dotnet
regex engine is bundled as part of your .NET system installation,
but this repository contains a .NET program that uses that bundled regex
engine. The program itself still needs to be compiled.
As a result, there are generally three classes of regex engine in rebar
:
- Regex engines that just need a system dependency to be installed and
otherwise do not require any building. For example,
javascript/v8
,perl
andpython/re
. These engines are still "built," but the process of building is just verifying that the programs can be executed. - Regex engines that derive their regex engine from a system dependency, but
still need a program compiled before it can be executed. For example,
dotnet
,java/hotspot
,icu
andgo/regexp
. - Regex engines that are either bundled or pinned such that they themselves
are built by
rebar
. For example,rust/regex
,re2
,pcre2
andregress
. Note that these still need system dependencies. For example, a Rust compiler, a C++ compiler and a C compiler.
There isn't really any significance to which class a regex engine falls into.
It's just what is most convenient. For example, it is feasible to build PCRE2
from scratch, but much more annoying to build all of Python just to use its
standard library re
module.
The downside of this convenience-based approach is that rebar
cannot
benchmark arbitrary versions of regex engines. That is, in many cases, you are
pretty much stuck with whatever version happens to be installed on your system.
This might make benchmark results incomparable from different environments.
Currently, rebar
doesn't have a great answer to this.
This section will give a brief lesson on how to use rebar build
by showing
a debugging session. For example, let's say I don't have .NET installed on
my system, but I want to benchmark its regex engine. I don't care about the
others, so I ask rebar
to just build the .NET runner programs:
$ rebar build -e dotnet
dotnet: dependency command failed: failed to run command and wait for output
note: a dependency that is required to build 'dotnet' could not be found, either because it isn't installed or because it didn't behave as expected
note: run `RUST_LOG=debug rebar build -e '^dotnet$'` to see more details
dotnet/compiled: dependency command failed: failed to run command and wait for output
dotnet/nobacktrack: dependency command failed: failed to run command and wait for output
The error messages above are telling you that rebar
can't find a program
dependency. Notice also that there are multiple regex engines that contain
the string dotnet
. This is because dotnet
has three different regex
engines that might be worth measuring: the default interpreter, a JIT and a
non-backtracking finite automata based engine. Generally speaking, since all
three are executed by the same runner program, if we can build one of them then
we'll be able to build all of them. So let's just focus on one. We'll also set
RUST_LOG=debug
so that we can get a little more information about what's
failing:
$ RUST_LOG=debug rebar build -e '^dotnet$'
[2023-04-15T18:20:18Z DEBUG rebar::util] running command: cd "engines/dotnet" && "/home/andrew/code/rust/rebar/engines/dotnet/bin/Release/net7.0/main" "version"
[2023-04-15T18:20:18Z DEBUG rebar::format::benchmarks] extracted version for engine 'dotnet' failed: failed to get version: failed to run command and wait for output: No such file or directory (os error 2)
[2023-04-15T18:20:18Z DEBUG rebar::util] running command: "dotnet" "--list-sdks"
dotnet: dependency command failed: failed to run command and wait for output
note: a dependency that is required to build 'dotnet' could not be found, either because it isn't installed or because it didn't behave as expected
note: run `RUST_LOG=debug rebar build -e '^dotnet$'` to see more details
So why isn't dotnet --list-sdks
working? Let's try it:
$ dotnet --list-sdks
zsh: command not found: dotnet
Ah, because .NET is not installed! So let's get it installed. You'll need to follow .NET installation instructions for your platform, but for me on Archlinux, I'll try this:
$ sudo pacman -S dotnet-host
Now let's try building again:
$ rebar build -e '^dotnet$'
dotnet: dependency command did not print expected output: could not find match for "(?m)^7\\." in output of "dotnet" "--list-sdks"
note: a dependency that is required to build 'dotnet' could not be found, either because it isn't installed or because it didn't behave as expected
note: run `RUST_LOG=debug rebar build -e '^dotnet$'` to see more details
That didn't work, so let's do what the error message suggests and re-run it with some debugging log messages displayed:
$ RUST_LOG=debug rebar build -e '^dotnet$'
[2023-04-11T14:27:18Z DEBUG rebar::util] running command: cd "engines/dotnet" && "/home/andrew/code/rust/rebar/engines/dotnet/bin/Release/net7.0/main" "version"
[2023-04-11T14:27:18Z DEBUG rebar::format::benchmarks] extracted version for engine 'dotnet' failed: failed to get version: failed to run command and wait for output: No such file or directory (os error 2)
[2023-04-11T14:27:18Z DEBUG rebar::util] running command: "dotnet" "--list-sdks"
dotnet: dependency command did not print expected output: could not find match for "(?m)^7\\." in output of "dotnet" "--list-sdks"
note: a dependency that is required to build 'dotnet' could not be found, either because it isn't installed or because it didn't behave as expected
note: run `RUST_LOG=debug rebar build -e '^dotnet$'` to see more details
[2023-04-11T14:27:18Z DEBUG rebar::cmd::build] output for dependency command "dotnet" "--list-sdks": <EMPTY>
The extra logging shows us rebar
is trying to run dotnet --list-sdks
, but
it's not getting the output it expects. Indeed, it claims that it gets no
output at all. Is that true?
$ dotnet --list-sdks
$
Yup. So we probably need to install more stuff. In this case, an SDK:
$ sudo pacman -S dotnet-sdk
And now let's try building again:
$ rebar build -e '^dotnet$'
dotnet: running: cd "engines/dotnet" && "dotnet" "build" "-c" "Release"
dotnet: build complete for version 7.0.3
Great! Now let's make sure it actually runs by executing rebar
's test suite
(masquerading as benchmarks). The -t
flag below tells rebar
to execute
the benchmark once and emit a success or fail message for each one. Since
rebar
verifies the result of each benchmark, the benchmark definition can
double as a test by just running it once.
$ rebar measure -f '^test/' -e '^dotnet$' -t
test/func/leftmost-first,dotnet,count-spans,7.0.3,OK
test/func/dollar-only-matches-end,dotnet,count,7.0.3,OK
test/func/non-greedy,dotnet,count,7.0.3,OK
test/model/count,dotnet,count,7.0.3,OK
[... snip ...]
All is well! Finally, let's manufacture a build error with the .NET runner
program to see what that looks like. Let's change the first line of
engines/dotnet/Main.cs
to this:
using Tystem;
Now let's see what happens when we try to build the runner program:
$ rebar build -e '^dotnet$'
dotnet: running: cd "engines/dotnet" && "dotnet" "build" "-c" "Release"
dotnet: build failed: command failed with ExitStatus(unix_wait_status(256)) but stderr is empty
note: run `RUST_LOG=debug rebar build -e '^dotnet$'` to see more details
That's not so helpful, so let's ask for debug messages:
$ RUST_LOG=debug rebar build -e '^dotnet$'
[2023-04-11T14:33:34Z DEBUG rebar::util] running command: cd "engines/dotnet" && "/home/andrew/code/rust/rebar/engines/dotnet/bin/Release/net7.0/main" "version"
[2023-04-11T14:33:34Z DEBUG rebar::util] running command: "dotnet" "--list-sdks"
dotnet: running: cd "engines/dotnet" && "dotnet" "build" "-c" "Release"
[2023-04-11T14:33:34Z DEBUG rebar::util] running command: cd "engines/dotnet" && "dotnet" "build" "-c" "Release"
[2023-04-11T14:33:35Z DEBUG rebar::util] command failed, exit status: ExitStatus(unix_wait_status(256))
[2023-04-11T14:33:35Z DEBUG rebar::util] stderr:
dotnet: build failed: command failed with ExitStatus(unix_wait_status(256)) but stderr is empty
note: run `RUST_LOG=debug rebar build -e '^dotnet$'` to see more details
That's not terribly helpful either. In particular, stderr
appears to be empty
(otherwise it would have been included here). So maybe the problems are being
printed to stdout
instead. Let's try to run the command that rebar
is
running ourselves:
$ cd engines/dotnet
$ dotnet build -c Release
MSBuild version 17.4.1+fedecea9d for .NET
Determining projects to restore...
All projects are up-to-date for restore.
/home/andrew/code/rust/rebar/engines/dotnet/Main.cs(1,7): error CS0246: The type or namespace name 'Tystem' could not be found (are you missing a using directive or an assembly reference?) [/home/andrew/code/rust/rebar/engines/dotnet/main.csproj]
Build FAILED.
/home/andrew/code/rust/rebar/engines/dotnet/Main.cs(1,7): error CS0246: The type or namespace name 'Tystem' could not be found (are you missing a using directive or an assembly reference?) [/home/andrew/code/rust/rebar/engines/dotnet/main.csproj]
0 Warning(s)
1 Error(s)
Time Elapsed 00:00:01.04
There we go. Now we can undo our manufactured error and rebuild it:
$ cd ../..
$ rebar build -e '^dotnet$'
dotnet: running: cd "engines/dotnet" && "dotnet" "build" "-c" "Release"
dotnet: build complete for version 7.0.3