Penalize relationship scores in cases for each non-match #341

Closed
paulalbert1 opened this issue May 16, 2019 · 4 comments
Comments

@paulalbert1
Contributor

paulalbert1 commented May 16, 2019

Problem

Some articles score too highly, mostly because they have hundreds of co-authors and, by sheer chance, one or more of those co-authors match a known relationship.

For example, CWID=pas2026 and PMID=31031568.

Suggested fix

Penalize each relationship non-match by a small amount, say -0.06. Then put a floor on the total relationship score, say -2, so the penalty cannot grow without bound.

Existing

        "relationshipEvidence": [
          {
            "relationshipNameArticle": {
              "firstName": "Erika L",
              "firstInitial": "E",
              "lastName": "Abramson"
            },
            "relationshipNameIdenity": {
              "firstName": "Erika",
              "firstInitial": "E",
              "lastName": "Abramson"
            },
            "relationshipType": [
              "Co-investigator"
            ],
            "relationshipMatchType": "verbose",
            "relationshipMatchingScore": 2.2,
            "relationshipVerboseMatchModifierScore": 0.6,
            "relationshipMatchModifierMentor": 0,
            "relationshipMatchModifierMentorSeniorAuthor": 0,
            "relationshipMatchModifierManager": 0,
            "relationshipMatchModifierManagerSeniorAuthor": 0
          },
          {
            "relationshipNameArticle": {
              "firstName": "Sameer",
              "firstInitial": "S",
              "lastName": "Malhotra"
            },
            "relationshipNameIdenity": {
              "firstName": "Sameer",
              "firstInitial": "S",
              "lastName": "Malhotra"
            }
          }
        ],

Proposed

Add these values to application.properties

strategy.knownrelationships.relationshipMinimumTotalScore=-2
strategy.knownrelationships.relationshipNonMatchScore=-0.06

Change the feature generator output as follows.

        "relationshipEvidence": [
          {
            "relationshipNonMatchCount": 51,
            "relationshipNonMatchScore": -0.06,
            "relationshipMinimumTotalScore": -2
          },
          {
            "relationshipNameArticle": {
              "firstName": "Erika L",
              "firstInitial": "E",
              "lastName": "Abramson"
            },
            "relationshipNameIdenity": {
              "firstName": "Erika",
              "firstInitial": "E",
              "lastName": "Abramson"
            },
            "relationshipType": [
              "Co-investigator"
            ],
            "relationshipMatchType": "verbose",
            "relationshipMatchingScore": 2.2,
            "relationshipVerboseMatchModifierScore": 0.6,
            "relationshipMatchModifierMentor": 0,
            "relationshipMatchModifierMentorSeniorAuthor": 0,
            "relationshipMatchModifierManager": 0,
            "relationshipMatchModifierManagerSeniorAuthor": 0
          },
          {
            "relationshipNameArticle": {
              "firstName": "Sameer",
              "firstInitial": "S",
              "lastName": "Malhotra"
            },
            "relationshipNameIdenity": {
              "firstName": "Sameer",
              "firstInitial": "S",
              "lastName": "Malhotra"
            },
            "relationshipType": [
              "Co-investigator"
            ],
            "relationshipMatchType": "verbose",
            "relationshipMatchingScore": 2.2,
            "relationshipVerboseMatchModifierScore": 0.6,
            "relationshipMatchModifierMentor": 0,
            "relationshipMatchModifierMentorSeniorAuthor": 0,
            "relationshipMatchModifierManager": 0,
            "relationshipMatchModifierManagerSeniorAuthor": 0
          }
        ],

Here, the revised relationship score would be 2.2 + 2.2 + (51 * -0.06) = 1.34.

Now suppose the article had 800 non-matching co-authors instead: 2.2 + 2.2 + (800 * -0.06) = -43.6. Because -43.6 < -2, the score is set to the minimum, -2.

  1. Compute the total score for the relationship matches.
  2. Compute the total penalty for the non-matches (relationshipNonMatchCount * relationshipNonMatchScore).
  3. Match score + non-match penalty = TotalProvisionalRelationshipScore.
  4. If TotalProvisionalRelationshipScore < relationshipMinimumTotalScore, use relationshipMinimumTotalScore. Otherwise, use TotalProvisionalRelationshipScore.
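
The four steps can be sketched roughly as follows. This is only an illustration in Python, not ReCiter's actual code: the function name `relationship_score` and its parameters are hypothetical, and the defaults simply mirror the proposed property values.

```python
def relationship_score(match_scores, non_match_count,
                       non_match_score=-0.06, minimum_total_score=-2.0):
    """Hypothetical sketch of the proposed scoring; not the ReCiter implementation."""
    # Step 1: total score for the relationship matches.
    positive = sum(match_scores)
    # Step 2: total penalty for the non-matches.
    negative = non_match_count * non_match_score
    # Step 3: provisional total.
    provisional = positive + negative
    # Step 4: never go below the configured minimum.
    return max(provisional, minimum_total_score)


# The two worked examples from this issue.
print(round(relationship_score([2.2, 2.2], 51), 2))   # 1.34
print(round(relationship_score([2.2, 2.2], 800), 2))  # -2.0
```

Using `max` for step 4 is equivalent to the if/else wording above.
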
@sarbajitdutta
Contributor

@paulalbert1 Would this affect publications with fewer authors?

@paulalbert1
Contributor Author

Yes, but the effect would be minimal. If you have four authors and no relationship matches, the score would change by only -0.18 (3 non-target authors * -0.06). Since this would affect all such articles consistently, it wouldn't have a significant relative effect. The greatest effect would be on the articles with a gazillion authors.
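
To put rough numbers on that, here is a small Python sketch. It assumes the proposed values (-0.06 per non-match, floor of -2) and that, with no relationship matches, every author other than the target counts as a non-match. The helper names are made up for illustration.

```python
NON_MATCH_SCORE = -0.06       # proposed relationshipNonMatchScore
MINIMUM_TOTAL_SCORE = -2.0    # proposed relationshipMinimumTotalScore


def non_match_penalty(author_count):
    """Penalty for an article with no relationship matches (hypothetical helper)."""
    # Assumption: the target author is excluded from the non-match count.
    non_matches = author_count - 1
    return max(non_matches * NON_MATCH_SCORE, MINIMUM_TOTAL_SCORE)


def first_floored_non_match_count():
    """Smallest non-match count whose raw penalty falls below the floor."""
    n = 0
    while n * NON_MATCH_SCORE >= MINIMUM_TOTAL_SCORE:
        n += 1
    return n


print(round(non_match_penalty(4), 2))     # -0.18
print(non_match_penalty(801))             # -2.0
print(first_floored_non_match_count())    # 34
```

So with these values, an article with no matches reaches the floor once it has 34 or more non-matching co-authors, and anything larger gets no extra penalty.
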

@sarbajitdutta
Contributor

OK, gotcha. Would you want to include this in ReCiter release 1.1? @paulalbert1

@paulalbert1
Contributor Author

That would be great!

@paulalbert1 changed the title from "Penalize relationship scores in cases where there a lot of non-matches" to "Penalize relationship scores in cases for each non-match" on May 30, 2019