bug with painting mentions over PDF? #11

Open
jameshowison opened this issue Apr 5, 2023 · 4 comments

@jameshowison

I was playing around and noticed this issue: it seems the mentions are being painted on the wrong page, perhaps? Or maybe the underlying PDF changed?

https://cloud.science-miner.com/software_kb/frontend/document.html?id=611a11d8e8d4855847028e44

Hopefully this screenshot shows the issue (This is with Firefox 109.0.1 on Mac OS btw)

[Screenshot: Screen Shot 2023-04-05 at 12 13 07 PM]

@kermitt2
Collaborator

kermitt2 commented Apr 5, 2023

Indeed, in the current version this can happen when the underlying paper has changed. The GUI fetches the Open Access PDF at the URL returned by a call to the Unpaywall API, but if the version of the PDF has changed between the time the annotations were produced and stored in the KB and now, we get a position "mismatch" for the annotations.

For example, the annotations produced in 2021 were made on the preprint, and now Unpaywall points to the Gold version of the article.

I store a hash for the PDF originally annotated, so normally I could detect whether the fetched PDF is still the same or not - but this is not yet implemented. Then, if a PDF mismatch is detected, we could imagine a fallback solution, like trying alternative URLs to get another version of the PDF or re-annotating on the fly.
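
As an illustration of the detection-plus-fallback idea described above, here is a minimal sketch. It is not the project's implementation: the hash algorithm (SHA-256), the function names `pdf_digest` and `find_matching_pdf`, and the injected `fetch` callable are all assumptions made for the example.

```python
import hashlib
from typing import Callable, Iterable, Optional


def pdf_digest(pdf_bytes: bytes) -> str:
    """Hex digest of the PDF bytes (SHA-256 is an assumption; the KB may use another hash)."""
    return hashlib.sha256(pdf_bytes).hexdigest()


def find_matching_pdf(
    stored_digest: str,
    candidate_urls: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Optional[str]:
    """Return the first URL whose content matches the stored digest, else None.

    `fetch` is a hypothetical downloader supplied by the caller, so the
    matching logic stays independent of any particular HTTP client.
    """
    for url in candidate_urls:
        try:
            data = fetch(url)
        except Exception:
            # An unreachable candidate should not stop the search.
            continue
        if pdf_digest(data) == stored_digest:
            return url
    # No candidate matches: the caller could re-annotate on the fly
    # or warn the user that positions may be off.
    return None
```

A `None` result would be the signal to fall back to re-annotation, or at least to show a warning instead of painting annotations at stale positions.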

@jameshowison
Author

Makes sense. Maybe we should store something that enables us to hit the same version (if at all possible)? That's hard without storing the actual article (which we can't do), but perhaps we could retry the URL (or follow the metadata to the same version).

But also highlight that Unpaywall has a new version?

@kermitt2
Collaborator

kermitt2 commented Apr 5, 2023

> Maybe we should store something that enables us to hit the same version (if at all possible)?

I think I didn't store the original download URL when I did this first experiment, but it will be stored in the next version this year. It can help, but we can also expect URLs to change over time even for the same version of the PDF.

> But also highlight that unpaywall has a new version?

Yes, and Unpaywall prioritizes the gold version - the "right" older version might still be among the alternative URLs given by Unpaywall.

I must say I didn't really spend time on this at the time; it's only now, after two years, that the "live" downloaded PDF starts to differ from time to time and the problem becomes visible.
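
The point about the older version possibly surviving among Unpaywall's alternative URLs could be sketched as follows. This is only an illustration: the helper name `candidate_pdf_urls` is hypothetical, and the code assumes the record is the parsed JSON of an Unpaywall DOI lookup, with `best_oa_location` and `oa_locations` entries that each carry a `url_for_pdf` field.

```python
from typing import Any, Dict, List


def candidate_pdf_urls(record: Dict[str, Any]) -> List[str]:
    """List distinct PDF URLs from an Unpaywall record, preferred location first.

    Assumes `record` is the parsed JSON of an Unpaywall DOI lookup.
    """
    locations: List[Dict[str, Any]] = []
    best = record.get("best_oa_location")
    if best:
        locations.append(best)
    locations.extend(record.get("oa_locations") or [])

    urls: List[str] = []
    for loc in locations:
        url = loc.get("url_for_pdf")
        # Some locations only have a landing page, so url_for_pdf can be null.
        if url and url not in urls:
            urls.append(url)
    return urls
```

Each URL in the returned list could then be downloaded and compared against the hash stored for the originally annotated PDF.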

@jameshowison
Author

jameshowison commented Apr 5, 2023 via email
