Array comparisons vs Single value comparisons #2550

Answered by RobinL
p4pratikjain asked this question in Q&A
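It's because the array comparison is a filter condition, whereas the l.email=r.email comparison is an equi-join condition. An equi-join can be executed as a hash join, so only records that share a key are ever paired up. A filter condition forces the engine to generate every pair of records and then discard the ones that fail the test, which is far slower.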

You can read more about this here:
https://moj-analytical-services.github.io/splink/topic_guides/blocking/performance.html?h=equi#equi-join-conditions

In your situation I'd recommend sorting your arrays and blocking on the first element, with something like l.email[1] = r.email[1], rather than the array comparison. This won't capture all matches, so you could adjust the recall down a bit to account for that.
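
To make the trade-off concrete, here is a rough pure-Python sketch of the two strategies. It is not Splink code: the records, the function names and the email values are all made up for illustration, and `min(...)` stands in for "first element of the sorted array" (the SQL `email[1]` on a sorted array).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: each has an id and an array of emails.
records = [
    {"id": 1, "email": ["zed@x.com", "amy@x.com"]},
    {"id": 2, "email": ["amy@x.com"]},
    {"id": 3, "email": ["bob@x.com", "zed@x.com"]},
    {"id": 4, "email": ["cat@x.com"]},
]


def pairs_by_array_overlap(rows):
    """Filter condition: every pair is generated, then tested for overlap."""
    return {
        (a["id"], b["id"])
        for a, b in combinations(rows, 2)
        if set(a["email"]) & set(b["email"])
    }


def pairs_by_first_sorted_element(rows):
    """Equi-join style: bucket on one key, pair up only within a bucket."""
    buckets = defaultdict(list)
    for row in rows:
        if row["email"]:
            # min() is the first element of the sorted array
            buckets[min(row["email"])].append(row["id"])
    return {
        pair
        for ids in buckets.values()
        for pair in combinations(sorted(ids), 2)
    }


overlap_pairs = pairs_by_array_overlap(records)
first_element_pairs = pairs_by_first_sorted_element(records)

# Pairs the cheaper rule fails to find
missed = overlap_pairs - first_element_pairs
print(overlap_pairs, first_element_pairs, missed)
```

In this toy data the overlap test finds records 1 and 3 (they share zed@x.com), but the first-element rule does not, because their sorted arrays start with different values. That lost pair is the reason for lowering the recall.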

Answer selected by p4pratikjain