Skip to content

Performance Comparison

Koichi Akabe edited this page Dec 15, 2021 · 15 revisions

This wiki shows the experimental results of time and memory for daachorse and other pattern matching libraries.

Experiment Setup

Datasets

We used the following two types of datasets for our experiments:

Implementations

We compared the following Aho-Corasick implementations:

In addition, we compared the following pattern matching automaton implementations:

Machine

The following is the specification of the used machine:

  • CPU: Intel(R) Core(TM) i7-8086K CPU @ 4.00GHz
  • Memory: 64GiB
  • OS: CentOS Linux release 7.5.1804 (Core)

Experiment Result

The benchmark code can be found here.

Overlapping Match

We measured the elapsed time to perform find_overlapping() provided by the Aho-Corasick implementations. For fst and yada, we performed common prefix searches from all positions in the target text.

Library Word100K [ms] Unidic [ms]
aho-corasick (NFA) 14.2 36.0
aho-corasick (DFA) 3.8 23.2
daachorse 4.1 11.5
fst 34.1 56.5
yada 6.3 12.6

Non-Overlapping Shortest Match

We measured the elapsed time to perform find_iter() provided by the Aho-Corasick implementations.

Library Word100K [ms] Unidic [ms]
aho-corasick (NFA) 12.5 12.1
aho-corasick (DFA) 3.1 5.3
daachorse 3.8 2.0

Non-Overlapping Longest Match

We measured the elapsed time to perform leftmost_find_iter() (resp. find_iter()) with the MatchKind::LeftmostLongest specification provided by daachorse (resp. aho-corasick).

Library Word100K [ms] Unidic [ms]
aho-corasick (NFA) 12.1 28.8
aho-corasick (DFA) 3.8 17.3
daachorse 4.1 8.3

Non-Overlapping Leftmost-First Match

We measured the elapsed time to perform leftmost_find_iter() (resp. find_iter()) with the MatchKind:: LeftmostFirst specification provided by daachorse (resp. aho-corasick).

Library Word100K [ms] Unidic [ms]
aho-corasick (NFA) 12.1 13.8
aho-corasick (DFA) 3.4 5.6
daachorse 3.9 3.0

Memory usage

We measured the memory usage reported by heap_bytes() provided by the Aho-Corasick implementations. For fst and yada, we measured the data buffer size obtained by their constructors.

Library Word100K [MiB] Unidic [MiB]
aho-corasick (NFA) 12.8 63.5
aho-corasick (DFA) 763.8 2,664.2
daachorse 8.3 34.7
fst 1.0 2.8
yada 2.9 9.4
Clone this wiki locally