
Some PMIDs not accessible #3

Closed
kghbln opened this issue May 14, 2019 · 9 comments

kghbln (Contributor) commented May 14, 2019

Setup

  • MediaWiki | 1.27.5 (5eda979) 19:59, 20 September 2018
  • PHP | 5.6.40 (cgi-fcgi)
  • MySQL | 5.6.40-84.0-log
  • PubmedParser | 4.0.3 (0f11f3b) 06:13, 28 April 2019

Issue
I have a rather big wiki that uses the #pmid parser function quite intensively; however, in two cases I do not get any PMID info even though the IDs exist. These are "24803064" and "25394499". In contrast, IDs like "22986634" or "24947988", as well as many more, work perfectly. Thus I wanted to check whether this is an issue with the extension or with PubMed failing to provide data. Using e.g. {{#pmid:24803064|reload}} does not help.

bovender (Owner) commented

These IDs work fine over here. I suspect the data was incorrectly retrieved from Pubmed (maybe due to a network issue?). Because records are cached in the database, the faulty record keeps being served and you never get to see the correct data. You may use the [reload](https://www.mediawiki.org/wiki/Extension:PubmedParser#Configuration) flag to invalidate the cache: {{#pmid:24803064|reload}}. I'd remove the flag later on to avoid fetching the data from Pubmed every time the page is edited (on the other hand, it may not really matter, depending on how often the page is revised...).

kghbln (Contributor, Author) commented May 24, 2019

Thanks a lot for getting back to this. It appears that I only have the data for the other PMIDs because they were fetched at an earlier point. {{#pmid:24803064|reload}} does not help. As a matter of fact, the Semantic Cite extension is in use too, and it also fails to retrieve data from PubMed. I am not sure, but perhaps it is the old version of MediaWiki, or, even worse, the website has been blocked by PubMed from data retrieval. Closing since it is working for you.

kghbln closed this as completed May 24, 2019
bovender (Owner) commented

Still, it puzzles me why you cannot retrieve the data.

The API URL to fetch PMID 24803064 is:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=24803064&retmode=xml

It would be interesting to see what happens when you curl this URL on the server. (Don't forget to enclose the URL in single quotes.)

Depending on how curious you are, you might also have a look at the Pubmed table in your wiki's database; it should contain a record for 24803064.

kghbln (Contributor, Author) commented May 29, 2019

Thanks a lot for this tip.

Just did:

curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=24803064&retmode=xml'

and got

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://misuse.ncbi.nlm.nih.gov/error/abuse.shtml?db=pubmed&amp;id=24803064&amp;retmode=xml">here</a>.</p>
</body></html>

So the wiki was blocked by the PubMed data providers. My theory is that a user mistyped the name of the template used to fetch the data without noticing it, so the wiki made many unsuccessful calls, which led to the ban. Anyway, now we know that the extension is working.

bovender (Owner) commented Jun 2, 2019

Nonetheless, it would be better if the extension provided some sort of guidance when this problem occurs, rather than leaving it up to the end user to figure out what went wrong...

kghbln (Contributor, Author) commented Jun 10, 2019

Probably yes, this would be an enhancement. For the time being I have adapted the template so that it shows a direct web link to the respective PubMed entry in case no data can be fetched. Not as nice, but the entry can still be explored without much hassle.
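For illustration, a minimal sketch of such a fallback, assuming a hypothetical wrapper template that receives the PMID as its first parameter and assuming that #pmid produces empty output when no data could be fetched (both are assumptions, not documented behaviour of the extension):

{{#if: {{#pmid: {{{1|}}} }}
| {{#pmid: {{{1|}}} }}
| [https://www.ncbi.nlm.nih.gov/pubmed/{{{1|}}} PMID {{{1|}}} on PubMed]
}}

The second #pmid call in the "then" branch should be cheap, since the record would be served from the extension's database cache as described above.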

kghbln (Contributor, Author) commented Jun 10, 2019

Before I forget: the answer from PubMed was pretty general. However, I was able to figure out that the wiki probably fell victim to a proxy abuser. Nothing to be done about this.

bovender added a commit that referenced this issue Jul 16, 2019
bovender (Owner) commented

I implemented some rather crude but improved checking for empty XML data sets. Furthermore, XML that does not contain an article is no longer cached. This should improve the chances that temporary failures are automagically resolved after the next page edit.

Doing this, I realized that the extension requires some major refactoring. This is in part due to my very own spaghetti code, but also because (if I am not mistaken) Pubmed changed their API responses. If an article is not found, they now simply return a <PubmedArticleSet> with no <PubmedArticle> in it. On the other hand, if, as in your case, the API request failed, the API may not return XML at all, but HTML instead. The extension is currently not really dealing with this in an elegant way.

kghbln (Contributor, Author) commented Jul 17, 2019

Thanks a lot for poking the code. :)

> Furthermore, XML that does not contain an article is no longer cached.

I am not sure about the mechanics, but I understand that data is only fetched on edits, which is good. We just have to make sure that the extension does not frantically try to fetch data, which would result in the site being blocked again.

> but also because (if I am not mistaken) Pubmed changed their API responses.

Oh, man.

> On the other hand, if, as in your case, the API request failed, the API may not return XML at all, but HTML instead. The extension is currently not really dealing with this in an elegant way.

It would be great if it could. It would probably also help to suggest some template code that emits an error instead of going blank and leaving people wondering. I used the ParserFunctions extension to implement something like this.
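As a sketch of what that could look like, here is an error-emitting variant of the fallback above (same assumptions: a hypothetical wrapper template taking the PMID as its first parameter, and empty #pmid output on failure):

{{#if: {{#pmid: {{{1|}}} }}
| {{#pmid: {{{1|}}} }}
| <span class="error">PMID {{{1|}}}: no data could be retrieved from PubMed.</span>
}}

The span uses MediaWiki's standard "error" CSS class, so the message stands out instead of the citation silently going blank.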
