Penalize relationship scores in cases for each non-match #341
Comments
@paulalbert1 Would this affect publications with fewer authors?
Yes, but the effect would be minimal. If an article has four authors and no relationship matches, the decrement in score would be only -0.18 (3 non-target authors × -0.06). Because this would affect all such articles consistently, it wouldn't have a significant relative effect. The greatest effect would be on articles with a gazillion authors.
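A minimal Java sketch of the arithmetic in this reply, assuming the penalty applies to every non-target author who does not match a known relationship and using the -0.06 value proposed in the issue. The class and method names are hypothetical and are not taken from ReCiter.

```java
import java.util.Locale;

// Hypothetical sketch, not ReCiter code: summed non-match penalty before any floor.
public class NonMatchPenalty {

    // Per-non-match penalty suggested in the issue (an example value).
    static final double PENALTY_PER_NON_MATCH = -0.06;

    /**
     * @param totalAuthors        all authors on the article, including the target author
     * @param relationshipMatches non-target authors who match a known relationship
     * @return the summed penalty (zero or negative), before the minimum score is applied
     */
    static double penalty(int totalAuthors, int relationshipMatches) {
        int nonTargetAuthors = totalAuthors - 1; // exclude the target author
        int nonMatches = Math.max(0, nonTargetAuthors - relationshipMatches);
        return nonMatches * PENALTY_PER_NON_MATCH;
    }

    public static void main(String[] args) {
        // Four authors, no matches: 3 non-target authors.
        System.out.println(String.format(Locale.ROOT, "4 authors: %.2f", penalty(4, 0)));
        // Hundreds of authors: the unfloored penalty grows linearly with author count.
        System.out.println(String.format(Locale.ROOT, "300 authors: %.2f", penalty(300, 0)));
    }
}
```

The linear growth is why the issue also proposes a minimum total score: without it, very long author lists would dominate the penalty.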
OK, gotcha. Would you want to include this in the ReCiter 1.1 release? @paulalbert1
That would be great!
Problem
Some articles are scoring too highly, mostly because they have hundreds of coauthors and, by sheer chance, one or several of those coauthors match a known relationship.
For example, CWID=pas2026 and PMID=31031568.
Suggested fix
Penalize each relationship non-match by a small amount, say -0.06. In addition, set a minimum for the total relationship score, say -2.
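The suggested fix can be sketched in Java as follows: sum the scores of matched relationships, add the per-non-match penalty, then clamp the result at the minimum. This is a hypothetical illustration under those assumptions; the class name, method signature, and parameter names are invented and do not reflect ReCiter's actual feature generator.

```java
import java.util.List;

// Hypothetical sketch of the proposed relationship score, not ReCiter code.
public class RelationshipScore {

    /**
     * @param matchScores        score contributed by each matched relationship
     * @param nonMatches         number of coauthors with no relationship match
     * @param penaltyPerNonMatch penalty per non-match (negative, e.g. -0.06)
     * @param minimumTotal       floor for the total score (e.g. -2)
     */
    static double score(List<Double> matchScores, int nonMatches,
                        double penaltyPerNonMatch, double minimumTotal) {
        double total = 0.0;
        for (double s : matchScores) {
            total += s; // reward for each known relationship that matched
        }
        total += nonMatches * penaltyPerNonMatch; // accumulate the non-match penalty
        return Math.max(minimumTotal, total);     // never go below the floor
    }

    public static void main(String[] args) {
        double penalty = -0.06; // value suggested in the issue
        double floor = -2.0;    // value suggested in the issue
        System.out.println(score(List.of(2.2, 2.2), 51, penalty, floor));
        System.out.println(score(List.of(2.2, 2.2), 800, penalty, floor));
    }
}
```

Clamping with `Math.max` keeps the penalty proportional for moderate author counts while capping its influence on articles with very long author lists.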
Existing
Proposed
Add these values to application.properties
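The two tuning values could be read from a properties file roughly as below. This sketch uses plain `java.util.Properties`; the key names `relationship.nonMatchPenalty` and `relationship.minimumTotalScore` are hypothetical, because the actual keys added to ReCiter's `application.properties` are not shown in this issue.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: the key names are invented for illustration only.
public class RelationshipConfig {

    final double penaltyPerNonMatch;
    final double minimumTotalScore;

    RelationshipConfig(Properties props) {
        // Fall back to the values suggested in the issue when a key is absent.
        this.penaltyPerNonMatch = Double.parseDouble(
                props.getProperty("relationship.nonMatchPenalty", "-0.06"));
        this.minimumTotalScore = Double.parseDouble(
                props.getProperty("relationship.minimumTotalScore", "-2"));
    }

    static RelationshipConfig fromText(String text) {
        Properties props = new Properties();
        try {
            // key=value lines, the same syntax used by application.properties
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new RelationshipConfig(props);
    }

    public static void main(String[] args) {
        String text = "relationship.nonMatchPenalty=-0.06\n"
                    + "relationship.minimumTotalScore=-2\n";
        RelationshipConfig cfg = fromText(text);
        System.out.println(cfg.penaltyPerNonMatch + " " + cfg.minimumTotalScore);
    }
}
```

Keeping both numbers in configuration rather than in code would let the penalty and the floor be tuned without a rebuild.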
Change the feature generator output as follows.
Here, the revised relationship score would be: 2.2 + 2.2 + (51 * -0.06) = 1.34.
Now, suppose the number of coauthors were 800: 2.2 + 2.2 + (800 * -0.06) = -43.6. Because -43.6 is below the minimum of -2, we use -2.
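Both examples follow one formula. In the notation introduced here (not used in the original issue), $M$ is the set of matched relationships with scores $s_i$, $n_{\text{non}}$ is the number of non-matches, $p = -0.06$ is the per-non-match penalty, and $S_{\min} = -2$ is the minimum total score:

```latex
S_{\text{rel}} = \max\!\left(S_{\min},\; \sum_{i \in M} s_i + n_{\text{non}}\, p\right)

n_{\text{non}} = 51:\quad  S_{\text{rel}} = \max(-2,\; 2.2 + 2.2 + 51 \cdot (-0.06))  = \max(-2,\; 1.34)  = 1.34

n_{\text{non}} = 800:\quad S_{\text{rel}} = \max(-2,\; 2.2 + 2.2 + 800 \cdot (-0.06)) = \max(-2,\; -43.6) = -2
```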