Managing invalid PMID used in non-pubmed sources #31

Open
kermitt2 opened this issue May 22, 2019 · 4 comments

@kermitt2
Owner

http://localhost:8080/service/lookup?pmid=16262981 returns a 404.

However, this is the PMID given in the following record, so it does exist:
http://localhost:8080/service/lookup?pii=S0266462305050762

@kermitt2 kermitt2 assigned kermitt2 and unassigned kermitt2 May 22, 2019
@lfoppiano
Collaborator

It looks like it is not in the PMID source file:

(base) Johan:consolidationData lfoppiano$ zgrep 16262981 PMID_PMCID_DOI.csv.gz 
(base) Johan:consolidationData lfoppiano$ 
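
Note that `zgrep` matches the number anywhere in a line, including inside a DOI, so an exact check on the PMID column is a bit stricter. A minimal sketch, assuming the mapping file is a gzipped CSV with a header row and a column named `PMID` (the column name and the helper `pmid_in_mapping` are assumptions for illustration, not part of biblio-glutton):

```python
import csv
import gzip


def pmid_in_mapping(path, pmid, column="PMID"):
    """Return True if `pmid` is an exact value in `column` of a gzipped CSV."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)  # first row is taken as the header
        for row in reader:
            # Compare the whole cell, not a substring of the line.
            if (row.get(column) or "").strip() == pmid:
                return True
    return False
```

For example, `pmid_in_mapping("PMID_PMCID_DOI.csv.gz", "16262981")` would scan the same file as the `zgrep` call above, provided the header assumption holds.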

@kermitt2
Owner Author

kermitt2 commented May 22, 2019

But it is present in the ISTEX IDs file:

lopez@work:/mnt/data/biblio$ zgrep 16262981 ~/biblio-glutton/data/istex/istexIds.all.gz
{"corpusName":"cambridge","istexId":"CC91E0F1789978CE79D653533100BA315CA337B3","ark":["ark:/67375/6GQ-9RTTRBZ7-G"],"doi":["10.1017/S0266462305050762"],"pmid":["16262981"],"pii":["S0266462305050762"]}

So a PMID may be present in the ISTEX metadata as provided by the publishers but missing from the DOI/PMID mapping file. This one is actually not a valid PMID: https://www.ncbi.nlm.nih.gov/pubmed/16262981

This DOI does not correspond to a real article but to an index of a Cambridge journal.

I think it's fine for the lookup service not to map it to anything, since it's no longer a valid PMID. But I'm not sure how to deal with it in the full record.

@lfoppiano
Collaborator

Right... One thing I notice is that we do not use the PMID information from the ISTEX mapping, only the PMID file (which should be the authority here, shouldn't it?).
Perhaps we should complement the PMID lookup with information from the ISTEX mapping?
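
A rough sketch of what such a complement could look like, assuming both mappings are available as in-memory dictionaries. The function names, the dictionary shapes and the `"pmid-file"` / `"istex"` source labels are hypothetical and do not reflect the actual biblio-glutton code:

```python
import json


def index_istex_by_pmid(lines):
    """Build a PMID -> DOI index from ISTEX ID records (one JSON object per line)."""
    index = {}
    for line in lines:
        record = json.loads(line)
        dois = record.get("doi") or []
        for pmid in record.get("pmid") or []:
            if dois:
                # Keep the first DOI seen for a given PMID.
                index.setdefault(pmid, dois[0])
    return index


def lookup_by_pmid(pmid, pmid_to_doi, istex_pmid_to_doi):
    """Resolve a PMID, trying the PMID file first and ISTEX only as a fallback."""
    doi = pmid_to_doi.get(pmid)
    if doi is not None:
        return doi, "pmid-file"
    doi = istex_pmid_to_doi.get(pmid)
    if doi is not None:
        return doi, "istex"
    return None, None
```

Returning the source alongside the DOI keeps it visible when a match comes only from the ISTEX data, which is less authoritative than the PMID file.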

@kermitt2
Owner Author

Yes, I think it makes sense to complement the PMID lookup that way.

The main issue is that the publisher metadata are not well maintained. When a PMID becomes invalid at PubMed (which is the reference), it is apparently not removed from the publisher metadata, so we would get invalid mappings via the ISTEX data file.

The pubmed-glutton submodule maps DOI/PMID to MeSH classes using the official PubMed metadata dump files. One way to control the PMID information would be to exploit this additional mapping. I still need to do some more work on it, though.

So I would suggest waiting until I have reviewed the pubmed-glutton submodule. This module will produce an additional PMID/MeSH class mapping to enrich the biblio-glutton records, and we could use it to check that PMIDs are valid.
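
To illustrate the kind of control this could give, here is a minimal sketch that assumes the set of PMIDs found in the PubMed dump is available as a Python set. The `split_pmids` helper and the `valid_pmids` set are hypothetical, and the real pubmed-glutton output format may differ:

```python
def split_pmids(record, valid_pmids):
    """Return a copy of `record` keeping only PMIDs known to PubMed,
    together with the list of PMIDs that were dropped."""
    pmids = record.get("pmid") or []
    kept = [p for p in pmids if p in valid_pmids]
    dropped = [p for p in pmids if p not in valid_pmids]
    cleaned = dict(record)  # shallow copy, the input record is left untouched
    if kept:
        cleaned["pmid"] = kept
    else:
        # No valid PMID left: remove the field rather than keep an empty list.
        cleaned.pop("pmid", None)
    return cleaned, dropped
```

Applied to the ISTEX record above with a `valid_pmids` set that does not contain `16262981`, this would keep the DOI and PII but drop the stale PMID, and report it in `dropped` so it can be logged.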

@kermitt2 kermitt2 self-assigned this Sep 5, 2021
@kermitt2 kermitt2 changed the title from "Particular PMID lookup fails" to "Managing invalid PMID used in non-pubmed sources" Apr 12, 2022
@lfoppiano lfoppiano added this to the 0.3 milestone Sep 17, 2024