Replies: 1 comment 1 reply
-
Any update?
-
I have 3 key comparisons: company name, address, and location (postcode + town).
I have ended up with a system of more than 15 blocking rules plus custom comparison levels. Let me explain why, and what I may be doing wrong:
I always want company names to match to some degree. But naive training gives so much weight to matching address and location that records with completely different names still receive a very high match probability (99.9%).
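To make the effect concrete, here is a minimal sketch of the arithmetic, assuming a Fellegi-Sunter style model in which each comparison level contributes a match weight of log2(m/u) and the weights are added to the log2 prior odds. Every number below (the prior, the m and u values) is hypothetical and chosen only for illustration; none of them come from my trained model, and the helper names are made up.

```python
import math

def match_weight(m: float, u: float) -> float:
    """Match weight of one comparison level: log2 of its Bayes factor m/u."""
    return math.log2(m / u)

def match_probability(prior: float, weights: list[float]) -> float:
    """Add the weights to the log2 prior odds and convert back to a probability."""
    prior_odds = prior / (1 - prior)
    total_weight = math.log2(prior_odds) + sum(weights)
    odds = 2 ** total_weight
    return odds / (1 + odds)

# Hypothetical values, for illustration only.
prior = 1 / 10_000                                 # chance a random pair is a match
address_exact = match_weight(m=0.90, u=0.0001)     # strong positive weight
location_exact = match_weight(m=0.95, u=0.001)     # strong positive weight

# A weakly negative "else" level on name: the model believes 5% of true
# matches have names that fall into this level.
name_else_weak = match_weight(m=0.05, u=0.99)
p_weak = match_probability(prior, [name_else_weak, address_exact, location_exact])

# A strongly negative "else" level on name: only 0.05% of true matches land here.
name_else_strong = match_weight(m=0.0005, u=0.99)
p_strong = match_probability(prior, [name_else_strong, address_exact, location_exact])

print(f"weak name penalty:   {p_weak:.3f}")
print(f"strong name penalty: {p_strong:.3f}")
```

With these made-up numbers the address and location weights swamp the weak name penalty, and only a much more negative name "else" weight pulls the probability back down. That is the behaviour I am trying to get from training.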
So I decided to guard against this with blocking rules that require a minimum level of name similarity (hence the 15-20 blocking rules). But then, because matching names become common among the compared pairs, address matches get weights similar to or larger than name matches, and the "else" case for address becomes very large in magnitude (much larger than the one for name).
How can I "teach" or "guide" the algorithm to recognise that name matching is more important, and that address or location matches should mainly support a weak name match?
I attach my weights diagram. Essentially I would want the "else" cases reversed at least, and the positive matches on name to have a stronger effect. I think the imbalance is caused by the guided blocking rules, which make name matching appear far more common than it is, but what else can I do?
I think this kind of intuition-oriented guidance is a gap in the otherwise amazing documentation.