This directory contains a Javascript runner program for benchmarking Javascript regular expressions. It currently supports only the irregexp engine inside of v8. (The irregexp engine is used by Google Chrome, Firefox and Node.js.)
Note that v8 has recently gained an experimental non-backtracking regex engine. It would be cool to measure that, but it's not clear what the best approach is there.
This program otherwise makes the following choices:
- It will throw an exception if given a haystack that contains invalid UTF-8. Namely, as far as I can tell, there is no way to use Javascript's regex engine on arbitrary bytes. Its API seems to suggest that it is only possible to run it on Javascript strings, which I understand to be sequences of UTF-16 code units.
- Like the `java/hotspot` regex engine, it looks like the JIT is potentially caching regex compilation in some way. Basically, after a number of iterations, regex compilation becomes stupidly fast. This is probably not a good model of the real world, and so, `javascript/v8` is not included in any of the compilation benchmarks. See also the discussion in the Java runner program's README. For anyone with more experience with v8 and irregexp, I would welcome feedback.
- Like the `regress` engine (another ECMAScript regex engine), this regex engine has no support for inline flags. So for example, syntax like `(?s:.)` or `(?i:abc)` is not allowed.
- Javascript's regex engine does okay with Unicode support, but only when Unicode mode is enabled, which is not the default. The biggest difference between non-Unicode mode and Unicode mode is probably that the former uses UTF-16 code units as the fundamental atom of matching, whereas the latter uses full codepoints. As a result, non-Unicode mode tends not to support codepoints outside of the BMP (Basic Multilingual Plane).
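
The first and last points above can be illustrated with a minimal sketch. It assumes a Node.js or browser environment where `TextDecoder` is available as a global. The helper name `decodeHaystack` is hypothetical, and this is not necessarily how the runner itself decodes its input; it only shows one way to reject invalid UTF-8 and how the `u` flag changes what `.` matches.

```javascript
// Hypothetical helper (not taken from the runner): strictly decode bytes as
// UTF-8, throwing a TypeError on invalid input instead of silently
// substituting U+FFFD replacement characters.
function decodeHaystack(bytes) {
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

// Valid UTF-8 encoding of U+1F600, a codepoint outside the BMP.
const haystack = decodeHaystack(new Uint8Array([0xf0, 0x9f, 0x98, 0x80]));

// The string holds one codepoint, stored as two UTF-16 code units.
const codeUnits = haystack.length;

// Without the `u` flag, `.` matches a single UTF-16 code unit, so it cannot
// cover the whole surrogate pair.
const matchesWithoutU = /^.$/.test(haystack);

// With the `u` flag, `.` matches a full codepoint.
const matchesWithU = /^.$/u.test(haystack);

// A lone 0xFF byte is never valid UTF-8, so strict decoding throws.
let invalidThrows = false;
try {
  decodeHaystack(new Uint8Array([0xff]));
} catch (err) {
  invalidThrows = err instanceof TypeError;
}
```

Here `codeUnits` is 2, the pattern only matches with the `u` flag, and the invalid byte is rejected rather than decoded.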