Improve method for identifying target author #185

Closed
paulalbert1 opened this issue May 12, 2018 · 7 comments

Comments

@paulalbert1
Contributor

paulalbert1 commented May 12, 2018

Every paper - including those from the gold standard and those where the ReCiter algorithm opts not to make a match - should have one and only one target author.

All matching should be done in a case-insensitive way.
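A minimal sketch of what case-insensitive comparison could look like, written in Python purely for illustration. `same_name` is a hypothetical helper and not part of ReCiter.

```python
def same_name(a: str, b: str) -> bool:
    """Compare two name strings without regard to case or outer whitespace."""
    # str.casefold() is a more aggressive form of lower() meant for
    # caseless matching (for example, it maps the German sharp s to "ss").
    return a.strip().casefold() == b.strip().casefold()


print(same_name("El-menyar", "EL-MENYAR"))
print(same_name("Gupta", "Gupte"))
```

Case folding alone does not reconcile punctuation or accent differences; those need separate normalization.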

The output of this judgment should be:

  • done at an early stage in the algorithm
  • stored in the Analysis table (which hasn't been created yet)
  • used for analysis
  • used as a field to be output into the feature generator API.

Some of this work is being done by getCorrectAuthor.

Order of operations

  • Retrieve records from PubMed according to the range of retrieval strategies
  • Get complementary data from Scopus
  • Identify targetAuthor

Logic

Attempt email match

  • if 0 results, go to next.
  • if 1 result, stop.
  • if > 1 result (e.g., 28265069 for OS Andersen), assign remaining authors as false, go to next.
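The three outcomes above recur at every step of the rubric. Here is one possible reading of them, sketched in Python. It assumes that "assign remaining authors as false" means authors who did not match are dropped from further consideration. The function name `apply_step` and the `email` field are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Author = Dict[str, Any]


def apply_step(candidates: List[Author],
               is_match: Callable[[Author], bool]) -> Tuple[List[Author], bool]:
    """Apply one matching step; return (remaining candidates, done flag)."""
    matched = [a for a in candidates if is_match(a)]
    if not matched:
        return candidates, False   # 0 results: go to next, candidates unchanged
    if len(matched) == 1:
        return matched, True       # 1 result: stop, this is the target author
    return matched, False          # >1 results: keep only the matches, go to next


# Hypothetical data for the email step.
known_emails = {"jdoe@example.edu"}
authors = [
    {"lastName": "Doe", "email": "jdoe@example.edu"},
    {"lastName": "Roe", "email": "roe@example.org"},
]
remaining, done = apply_step(authors, lambda a: a["email"] in known_emails)
print(remaining, done)
```

Returning the narrowed candidate list means a later, weaker step can only choose among authors that already survived the stronger steps.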

Attempt strict last name, strict middle name, and strict first name match.

  • if 0 results, go to next.
  • if 1 result, stop.
  • if > 1 result, assign remaining authors as false, go to next.

Attempt strict last name, middle initial, and strict first name match.

  • if 0 results, go to next.
  • if 1 result, stop.
  • if > 1 result, assign remaining authors as false, go to next.

Attempt strict last name and strict first name match.

  • if 0 results, go to next.
  • if 1 result, stop.
  • if > 1 result, assign remaining authors as false, go to next.

Attempt strict last name and partial first name match, in which the article's first name is a substring of the identity's first name.

  • if 0 results, go to next.
  • if 1 result, stop.
  • if > 1 result, assign remaining authors as false, go to next.

Attempt strict last name and partial first name match, in which the identity's first name is a substring of the article's first name.

  • if 0 results, go to next.
  • if 1 result, stop.
  • if > 1 result, assign remaining authors as false, go to next.
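The two partial first name steps differ only in the direction of the substring test. A small Python sketch shows the difference; both function names are hypothetical. The "Ayman" / "Ayman A" pair is taken from the aae2001 record later in this thread, and "Jon" / "Jonathan" is a made-up example.

```python
def article_in_identity(article_first: str, identity_first: str) -> bool:
    """True when the article's first name occurs inside the identity's."""
    a, i = article_first.strip().casefold(), identity_first.strip().casefold()
    return bool(a) and a in i   # guard: an empty string is a substring of anything


def identity_in_article(article_first: str, identity_first: str) -> bool:
    """True when the identity's first name occurs inside the article's."""
    a, i = article_first.strip().casefold(), identity_first.strip().casefold()
    return bool(i) and i in a


print(article_in_identity("Jon", "Jonathan"))
print(identity_in_article("Ayman A", "Ayman"))
```

The empty-string guard matters because PubMed records can lack a forename, and an unguarded substring test would then match every identity.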

Attempt strict last name and first initial match.

  • if 0 results, go to next.
  • if 1 result, stop.
  • if > 1 result, assign remaining authors as false, go to next.

Attempt strict last name and middle initial to first initial, and first initial to middle initial match.

  • if 0 results, go to next.
  • if 1 result, stop.
  • if > 1 result, assign remaining authors as false, go to next.
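The wording of the swapped-initials step is ambiguous. The Python sketch below shows one possible reading: the article's first initial must equal the identity's middle initial, and the article's middle initial must equal the identity's first initial, which would catch authors who publish under their middle name. All names here are hypothetical, and the "Smith" example is invented.

```python
from typing import Tuple


def article_initials(fore_name: str) -> Tuple[str, str]:
    """Derive (first initial, middle initial) from a forename such as 'Ajay K'."""
    parts = fore_name.split()
    first = parts[0][0] if parts else ""
    middle = parts[1][0] if len(parts) > 1 else ""
    return first.casefold(), middle.casefold()


def swapped_initials_match(article_last: str, article_fore: str,
                           id_last: str, id_first_initial: str,
                           id_middle_initial: str) -> bool:
    """Strict last name, with first and middle initials swapped between records."""
    if article_last.strip().casefold() != id_last.strip().casefold():
        return False
    first, middle = article_initials(article_fore)
    if not first or not middle:
        return False   # both initials are required for a swap to be meaningful
    return (first == id_middle_initial.casefold()
            and middle == id_first_initial.casefold())


# Invented example: identity "J. Robert Smith" publishing as "Robert J Smith".
print(swapped_initials_match("Smith", "Robert J", "Smith", "J", "R"))
```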

Attempt partial last name and first initial match.

  • if 0 results, go to next.
  • if 1 result, stop.
  • if > 1 result, assign remaining authors as false, go to next.

Attempt strict last name match.

  • if 0 results, go to next.
  • if 1 result, stop.
  • if > 1 result, assign remaining authors as false, go to next.

Attempt strict first name and first initial of last name match.

  • if 0 results, go to next.
  • if 1 result, stop.
  • if > 1 result, assign remaining authors as false, go to next.

Attempt strict first name match.

  • if 0 results, go to next.
  • if 1 result, stop.
  • if > 1 result, assign remaining authors as false, go to next.

Attempt full last name match from article to partial last name from identity. (e.g., Somersan-Karakaya)

  • if 0 results, go to next.
  • if 1 result, stop.
  • if > 1 result, assign remaining authors as false, go to next.
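For compound surnames such as Somersan-Karakaya, one way to read this step is that the article's full last name must equal one component of the identity's last name. A Python sketch under that assumption; the function name is hypothetical, and splitting on hyphens and whitespace is a guess at what "partial" means here.

```python
import re


def article_last_matches_identity_part(article_last: str, identity_last: str) -> bool:
    """True when the article's last name equals one part of a compound identity name."""
    # Split the identity's last name on hyphens and whitespace.
    parts = [p for p in re.split(r"[\s\-]+", identity_last.casefold()) if p]
    return article_last.strip().casefold() in parts


print(article_last_matches_identity_part("Somersan", "Somersan-Karakaya"))
print(article_last_matches_identity_part("Somer", "Somersan-Karakaya"))
```

Comparing whole components rather than raw substrings avoids matching a short surname that merely happens to appear inside a longer one.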

If one of the names in Article is for a “Collective Author” (examples - this is for rbdevere), map to that.

  • if 0 results, assign all authors as false.
  • if 1 result, stop. Assign that author as true.
  • if > 1 result, assign ALL authors as false, go to next.

Assign all authors as false.
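Taken together, the rubric is an ordered cascade that flags at most one author and falls back to flagging none. The following Python sketch shows that control flow only; it is not ReCiter's implementation. `select_target_author`, the two example steps, and the identity values are hypothetical, and the article authors mirror the three "Gupta" entries quoted later in this thread.

```python
from typing import Any, Callable, Dict, List

Author = Dict[str, Any]
Step = Callable[[Author], bool]


def select_target_author(authors: List[Author], steps: List[Step]) -> List[Author]:
    """Run the steps in order; flag exactly one author, or none at all."""
    for author in authors:
        author["targetAuthor"] = False
    candidates = list(authors)
    for is_match in steps:
        matched = [a for a in candidates if is_match(a)]
        if len(matched) == 1:
            matched[0]["targetAuthor"] = True   # 1 result: stop
            return authors
        if len(matched) > 1:
            candidates = matched                # >1 results: narrow, go to next
        # 0 results: go to next with the same candidates
    return authors                              # cascade exhausted: all false


# Hypothetical identity values, chosen only to exercise the cascade.
identity = {"lastName": "Gupta", "firstName": "Ajay"}
steps = [
    lambda a: a["lastName"].casefold() == identity["lastName"].casefold(),
    lambda a: identity["firstName"].casefold() in a["firstName"].casefold(),
]
authors = [
    {"lastName": "Gupta", "firstName": "Ajay K"},
    {"lastName": "Gupta", "firstName": "Amit K"},
    {"lastName": "Gupta", "firstName": "Deepika"},
]
for a in select_target_author(authors, steps):
    print(a["firstName"], a["targetAuthor"])
```

Because the function returns as soon as a step yields a single match, it can never flag more than one author, whatever steps are supplied.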

paulalbert1 changed the title from "Create dedicated process for reliably identifying target author" to "Improve method for identifying target author" on May 12, 2018
@sarbajitdutta
Contributor

The last step, checking for a collective author, is ON HOLD because we need to bring in additional data from PubMed, namely the Contributor list.

@paulalbert1
Contributor Author

paulalbert1 commented Aug 9, 2018

@sarbajitdutta - For ses9022, I got this:

19:37:49.914 [http-nio-5000-exec-4] INFO  r.a.e.t.TargetAuthorSelection - There was no target author found for 29117776
19:37:49.947 [http-nio-5000-exec-4] INFO  r.a.e.t.TargetAuthorSelection - There was no target author found for 27398323
19:37:49.950 [http-nio-5000-exec-4] INFO  r.a.e.t.TargetAuthorSelection - There was no target author found for 28352671
19:37:49.967 [http-nio-5000-exec-4] INFO  r.a.e.t.TargetAuthorSelection - There was no target author found for 28287838


For ajg9004, I got this:

07:50:05.603 [http-nio-5000-exec-9] INFO  r.a.e.t.TargetAuthorSelection - 3 authors were marked as target author for article 29603699

This seems doable...

        {
          "rank": 1,
          "lastName": "Gupta",
          "firstName": "Ajay K",
          "initials": "A",
          "affiliations": null,
          "targetAuthor": true
        },
        {
          "rank": 4,
          "lastName": "Gupta",
          "firstName": "Amit K",
          "initials": "A",
          "affiliations": null,
          "targetAuthor": true
        },
        {
          "rank": 6,
          "lastName": "Gupta",
          "firstName": "Deepika",
          "initials": "D",
          "affiliations": null,
          "targetAuthor": true
        }

Is the above logic wrong??
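Both symptoms reported above (no target author, and several target authors) are violations of the "one and only one" rule, so they can be detected with a simple count. A Python sketch of such a check; `describe_violation` is a hypothetical helper whose messages mirror the log lines quoted in this comment.

```python
from typing import Any, Dict, List, Optional


def describe_violation(pmid: str, authors: List[Dict[str, Any]]) -> Optional[str]:
    """Return a message if the article does not have exactly one target author."""
    flagged = sum(1 for a in authors if a.get("targetAuthor") is True)
    if flagged == 0:
        return f"There was no target author found for {pmid}"
    if flagged > 1:
        return f"{flagged} authors were marked as target author for article {pmid}"
    return None   # exactly one target author: nothing to report


# The three authors flagged for article 29603699 in the output above.
authors = [
    {"rank": 1, "lastName": "Gupta", "firstName": "Ajay K", "targetAuthor": True},
    {"rank": 4, "lastName": "Gupta", "firstName": "Amit K", "targetAuthor": True},
    {"rank": 6, "lastName": "Gupta", "firstName": "Deepika", "targetAuthor": True},
]
print(describe_violation("29603699", authors))
```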

@paulalbert1
Contributor Author

I've noticed that email match is not very effective. We might consider commenting it out, or moving it down to position 6 in the rubric.

(Screenshot attached: Screen Shot 2019-04-01 at 7 39 57 AM)

@paulalbert1
Contributor Author

paulalbert1 commented Apr 14, 2019

No target author for aae2001

{
      "rank": 1,
      "lastName": "El Menyar",
      "firstName": "Ayman A",
      "initials": "A",
      "targetAuthor": false
    },


"primaryName": {
  "firstInitial": "A",
  "firstName": "Ayman",
  "lastName": "El-menyar",
  "middleInitial": "A",
  "middleName": "A."
},        

PMID = 15590364

aae2001
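In this record the article has "El Menyar" while the identity has "El-menyar", so a strict comparison fails on the space versus the hyphen as well as on case. One possible normalization is sketched below in Python. `sanitize_name` is a hypothetical helper, and removing all punctuation and whitespace is an assumption about what would be acceptable here.

```python
import re


def sanitize_name(name: str) -> str:
    """Casefold a name and drop whitespace, hyphens and other punctuation."""
    # [\W_]+ matches runs of characters that are not letters or digits;
    # letters from any script, including accented ones, are kept.
    return re.sub(r"[\W_]+", "", name.casefold())


print(sanitize_name("El Menyar"))
print(sanitize_name("El-menyar"))
```

With both sides sanitized this way, the strict last name steps would treat the two spellings as equal.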

@sarbajitdutta
Contributor

Add deAccent to the AuthorSanitizationUtil.
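For readers unfamiliar with the technique, accent stripping is commonly done by Unicode decomposition followed by removal of combining marks. A Python sketch of that general approach follows. `de_accent` is a hypothetical stand-in; how `deAccent` in `AuthorSanitizationUtil` is actually written is not shown in this thread.

```python
import unicodedata


def de_accent(text: str) -> str:
    """Remove diacritics by decomposing characters and dropping combining marks."""
    # NFD splits a character such as "é" into "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining accents.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(de_accent("José Núñez"))
```

Letters that have no canonical decomposition, such as "ø", pass through unchanged and would need an explicit mapping if they should also be folded.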

@paulalbert1
Contributor Author

Here's a tricky case
amr2011 - https://www.ncbi.nlm.nih.gov/pubmed/27922594

Should match to

<LastName>Rajadhyaksha</LastName>
<ForeName>Anjali M</ForeName>

@paulalbert1
Contributor Author

Fixed.
