What is leftCovered and rightCovered in pairs.csv output? #1631

Answered by rien
bestchai asked this question in Q&A

These are the absolute numbers of shared fingerprints in the left and the right file, respectively. The two values are not necessarily equal, because a shared fingerprint can occur a different number of times in each file.
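To illustrate why the two counts can differ, here is a toy sketch in Python. It is not Dolos's implementation: the function name `covered_counts` and the string "fingerprints" are hypothetical, and it simply assumes that every occurrence of a shared fingerprint is counted in the file it occurs in.

```python
from collections import Counter

def covered_counts(left_fps, right_fps):
    """Return (left_covered, right_covered) for two lists of fingerprints.

    Hypothetical helper: counts, per file, the occurrences of every
    fingerprint that appears in both files.
    """
    left, right = Counter(left_fps), Counter(right_fps)
    shared = left.keys() & right.keys()  # fingerprints present in both files
    left_covered = sum(left[f] for f in shared)
    right_covered = sum(right[f] for f in shared)
    return left_covered, right_covered

# "a" occurs twice on the left but once on the right; "b" is the reverse case.
left_covered, right_covered = covered_counts(
    ["a", "a", "b", "c"],
    ["a", "b", "b", "b", "d"],
)
print(left_covered, right_covered)  # 3 4
```

Both files share the same two fingerprints (`a` and `b`), yet the left file covers 3 occurrences and the right file covers 4.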

A fingerprint is a sequence of $$k$$ consecutive tokens (a k-gram) from the syntax tree, selected out of a window of $$w$$ k-grams.
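As a rough sketch of how such a selection could work, the Python below hashes every k-gram and keeps the smallest hash in each window of $$w$$ k-grams, similar to the winnowing scheme. The answer does not state the selection rule, so the minimum-hash choice, the CRC32 hash, and the function names are all assumptions made for illustration.

```python
import zlib

def kgrams(tokens, k):
    """All runs of k consecutive tokens."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]

def fingerprints(tokens, k, w):
    """Select one k-gram hash per window of w k-grams (assumed rule: the minimum)."""
    hashes = [zlib.crc32(" ".join(g).encode()) for g in kgrams(tokens, k)]
    selected = {}  # position of the k-gram -> its hash, so repeats are stored once
    for start in range(max(len(hashes) - w + 1, 0)):
        window = hashes[start:start + w]
        # position of the smallest hash inside this window
        offset = min(range(len(window)), key=lambda i: window[i])
        selected[start + offset] = window[offset]
    return [selected[pos] for pos in sorted(selected)]

tokens = "def f ( x ) : return x".split()
print(len(kgrams(tokens, 3)), len(fingerprints(tokens, k=3, w=2)))
```

Because neighbouring windows often agree on the same minimum, the number of fingerprints is usually smaller than the number of k-grams. In this sketch an input with fewer than $$w$$ k-grams yields no fingerprints at all.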

The similarity between two source files a and b is computed as
$$sim(a,b) = \frac{S_a + S_b}{T_a + T_b}$$
with $$T_x$$ the total number of fingerprints in file $$x$$ and $$S_x$$ the number of fingerprints in file $$x$$ that also occur in the other file.
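The formula translates directly into a few lines of Python. This is a minimal sketch, not Dolos's own code: the function and parameter names are hypothetical, with `left_covered` and `right_covered` standing in for $$S_a$$ and $$S_b$$, and `left_total` and `right_total` for $$T_a$$ and $$T_b$$. Returning 0 when both files have no fingerprints is also an assumption.

```python
def similarity(left_covered, right_covered, left_total, right_total):
    """sim(a, b) = (S_a + S_b) / (T_a + T_b)."""
    total = left_total + right_total
    if total == 0:
        return 0.0  # assumption: two files without fingerprints are treated as dissimilar
    return (left_covered + right_covered) / total

# 3 of 4 fingerprints shared on the left, 4 of 5 on the right.
print(similarity(3, 4, 4, 5))  # 7/9, about 0.78
```

Note that the result weights both files by their size: a small file that is fully contained in a much larger one still gets a low similarity, because the large file's unshared fingerprints dominate the denominator.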

The naming is indeed a bit awkward. It was chosen a few years ago and we try to keep our API somewhat …
