Author: | Daniel Walker |
---|---|
Version: | 0.1.2 |
Date: | 2023-01-05 |
The dam_lev
package implements the
Damerau–Levenshtein diff algorithm. That is,
it will take two sequences and determine the minimum number of transpositions, substitutions, insertions, and
deletions needed to transform the first sequence into the second.
The package exposes a single function, dam_lev.get_changes
. It takes two sequences (i.e., they must
implement the __len__
and __getitem__
methods) and returns a list of dam_lev.Mutation
objects. There are four subclasses of dam_lev.Mutation
corresponding to the four types of
transformations. For example,
diffs = dam_lev.get_changes('abcdef', 'bcedxy')
print(diffs) # [Deletion(at=0), Transposition(at=3), Substitution(at=5, at2=4), Insertion(at=6, at2=5)]
We see that the sequence of transformations is:
- Delete the item at index 0 (
'a'
) - Transpose the item at index 3 (
'd'
) with its successor - Substitute the item at index 5 (
'f'
) with the item from the second sequence at index 4 ('x'
) - Insert at index 6 the item from the second sequence at index 5 (
'y'
)
Note the index for the transposition. Even though, after the deletion, the 'd'
is at index 2, it's at
index 3 in the original version of the sequence. Likewise for the successive mutations.
You can also pass a callable as the key
keyword argument to dam_lev.get_changes
. Similar to
list.sort
, this callable will be used to compare the elements of the sequences. For example,
diffs = dam_lev.get_changes('aBc', 'AbC', key=str.upper)
print(diffs) # []