Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Firefox extension support PDF navigation via included PDF viewer #13

Closed
wants to merge 9 commits into from

Conversation

cipri-tom
Copy link
Contributor

Hi,

As discussed in #4 , we cannot navigate in the PDF.
This is due to the embedded PDF not allowing history:
https://github.com/mozilla/pdf.js/blob/c791e01bfc280fb6abc476dece21c6a88d2340df/web/app.js#L525-L527

They do this on purpose: mozilla/pdf.js#8121

So the only solution is to have our own PDF viewer, and hope that some day Firefox will enable either content_scripts on PDF, or a WebExtension PDF that serves as API, which we could extend.

In this PR I add the original pdf.js repo, and link it to the extension code. Here's how it looks:

arxiv-utils-mod-10fps50.mp4

(this video does not play on firefox 😕 the irony... )

Maintainance

We will need to keep an eye out for new releases of PDF.js and merge them into the repo. I believe this shouldn't be too difficult: pull the newer tag from mozilla, and merge it into arxiv-utils-customisation branch ; then run gulp generic. We can do it by hand the first couple of times, and if it works well, we can add a GitHub action to do it automatically.

Next steps

A part I'm not very happy about is the linking from the dist all the way up to the root project. Maybe we can separate them better?

Finally, one part that I am totally missing is how to build the extension. I think we need to zip certain files, right ? If you have a doc about this, I can add the building scripts (zipping, etc).

What do you think ?

@j3soon j3soon self-requested a review December 29, 2022 18:01
Copy link
Owner

@j3soon j3soon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this feature.

If I understand correctly, the default pdf.js deliberately disables history, therefore we cannot use its CDN, and must include our modified pdf.js in arxiv-utils to re-enable history.

I think an additional fork of pdf.js is unnecessary and may be difficult to maintain. We can simplify this by including the official pdf.js repo as a submodule, and store our modifications in a git patch (i.e., diff file). When building the extension, we can run git apply before running gulp generic.

If you are using Linux, the extension is build by simply zipping the entire directory:

zip -r arxiv-utils-firefox.zip ./firefox

The official method for automated build seems to be using https://github.com/mozilla/web-ext, but I haven't look into it yet.


A few additional questions:

I cannot view the video you provided (even on Chrome). Maybe the video failed to upload?

I'm not quite sure what you mean by:

linking from the dist all the way up to the root project.

could you elaborate more on this?

Since we may not update pdf.js timely, I'm slightly concerned about the potential security threat of this approach. Maybe a better way to include this feature is to add an option in the settings page, and require users to explicitly enable the enhanced PDF viewer?

@cipri-tom
Copy link
Contributor Author

cipri-tom commented Dec 29, 2022

Thank you for the timely and in-depth review !

Here are my answers.


the default pdf.js deliberately disables history

when embedded in an iframe, yes. If not embedded, history is there.

therefore we cannot use its CDN

The reason we cannot use the CDN is different: mozilla's "vanilla" implementation does not allow PDFs from foreign origin. Their reasoning is here: mozilla/pdf.js#6916

So we have to modify their app.js: cipri-tom/pdf.js@83fa62f

If you load official dist, there is no easy option to overcome this limitation.

store our modifications in a git patch. When building the extension, we can run git apply

I like this approach ! I also wasn't too happy with modifying their code directly. Nor with having two repositories to maintain. I'll try your suggestion to see how this works.

OK if I add a Makefile to apply the patch, build the dist, and zip the files ? I'll try to not zip the whole pdf.js submodule, just the dist build, even minified.


cannot view the video

Probably due to the encoding, screen recorded on Mac. Just wanted to show what it looks like. Here it is
arxiv-utils-included-viewer

linking from the dist all the way up to the root project.

Of course you didn't see what I meant, since I forgot to push that change 🤦‍♂️🙈. Sorry.
I meant this: cipri-tom/pdf.js@4ab0f28#diff-4b300cea2df7f3f32d97ec5bd2e71eee41932f0fb52f31516f91dbff0c683409R481 (last modif)

But in addition, also forward linking to the viewer, this page is very deep, and the whole path will show in the address bar.
98c4fae#diff-ec159a73555541cc6f9e5ae23aa3c77adad2a8c0859efcecafc8c945492f4412

A possible solution would be to move the dist folder above pdf.js. Probably a setting or modif in their gulp building process. Or, to interfere as little as possible with their code, to flatten the files a bit in the proposed Makefile that patches, builds (gulp), then moves files, then zip.

P.S. I will attempt these changes tomorrow, or next year ! :) ✨ Happy New Year ! ✨

@j3soon
Copy link
Owner

j3soon commented Dec 30, 2022

Thank you for the detailed reply. I can see the GIF on Windows now.

I think writing a Makefile or shell script, and moving the dist folder after build sounds good. This method should work directly on Linux and Mac, and may work on Windows after intalling Cygwin/bash.

Happy New Year to you too. :)

@cipri-tom
Copy link
Contributor Author

Hi @j3soon

I've made the updating script. I have tried to follow best practices, and left comments and detailed status messages.

What do you think ?

It seems to work rather well. I was afraid that the patches would not apply well from one release to the next, but initially pdf.js was on 3.1.81, and now it's on v3.2.146, and it works well with both. So I think this method, with the patches, is quite future-proof, since our changes are to stable parts of the code.

However, just in case we need to make new changes, I've added the instructions on how to generate the patch again. Everything resides in update.sh.


The only thing that I'm not content about is that we can no longer load the extension based on the files directly, instead we have to build it after each modification and use the zip file. This is because we now extract the dist folder, as discussed. So the path to the title-managing code is not accurate when the files are not zipped.

I don't expect this to be a big nuissance, since we will probably have few updates to do, besides upgrading pdf.js. However, if the need arises, there are 2 solutions:

  1. use the src="../../../../pdfviewer.js" path. Surprisingly, it works, even though the ../../../ is way above the location of pdfviewer.js. This is probably because the browser normalizes the path and doesn't let you go above the root of the extension, and pdfviwer.js resides in the root
  2. copy the dist contents to the root directory before zipping. This way we can load the extension from disk, but we need to remember to run ./update.sh after the changes

Anyhow, we can decide between 1/2 later on, in case we really need to do serious development for the extension. For now, it is stable

@cipri-tom
Copy link
Contributor Author

@j3soon if you have some time to have a look this week, it would be great to advance / close this :). Then I'll be away for ~ 1 month.

Thank you !

@j3soon
Copy link
Owner

j3soon commented Jan 16, 2023

I have been busy lately, I'll review this when I'm available. Thanks.

@j3soon
Copy link
Owner

j3soon commented Feb 2, 2023

@cipri-tom This newer version looks much better. However, it seems like we still need to update the pdf.js version once in a while...

In another related repo, you proposed to re-use the official pdf.js page to achieve the same result (j3soon/firefox-enable-pdf-addons#1), which simply redirect to https://mozilla.github.io/pdf.js/web/viewer.html and utilize the ?file=... parameter.

This approach allows us to always use the latest pdf.js.

I wonder if this approach has any drawbacks?

@cipri-tom
Copy link
Contributor Author

Hi @j3soon !

I'm not sure Mozilla would be very happy if all PDFs are using their website. Generates traffic for which they pay, I assume. I like Mozilla and wouldn't want to do this without checking with them it's OK.

But maybe this way they'd take a decision sooner to fix these limitations around PDFs.

I do not think the updating of the PDF is so difficult. Until now it worked butter smooth. Maybe we can even set up a GitHub action so it is done automatically ?

@j3soon
Copy link
Owner

j3soon commented Feb 15, 2023

@cipri-tom The viewer provided by Mozilla is hosted on GitHub pages, so the traffic is covered by GitHub, not by Mozilla. The potential concern is hitting the bandwidth limit of GitHub pages, but I doubt that the current ~250 active Firefox users can read that much paper per month.

I prefer to keep things simple for now. We can add a experimental checkbox option in the settings page of arxiv-utils, if the checkbox is checked, all arxiv PDFs will be redirected to https://mozilla.github.io/pdf.js/web/viewer.html.

If this method works well, we may set this behavior as default, and keep an eye on the availability of the PDF viewer on GitHub pages. We may revisit your current implementation (1ffca94) in the future, in case the PDF viewer has other issues or being removed.

@j3soon
Copy link
Owner

j3soon commented Feb 15, 2023

@cipri-tom Regarding GitHub Actions: If you have experiences in using GitHub actions to automatically build/publish the extension, it would be very helpful if you can provide some guidance in #14.

Maybe we can work on these feature incrementally:

  1. Add an experimental feature that redirects to https://mozilla.github.io/pdf.js/web/viewer.html.
  2. Set up automated build & publish in Automated build and publish #14
  3. Reconsider whether to switch to our own pdf.js viewer (1ffca94)

and thanks again for the in-depth investigation of this issue!

@cipri-tom
Copy link
Contributor Author

Hi @j3soon !
This seems like a good plan ! Thanks for the thoughtful direction. I will start working on the redirection in the next days, when I get a bit of time.

I will close this for now, and as you say in point 3., we can reconsider later on. Just for reference, the correct commit would be: 0bda6bc because it uses the mozilla's repo, not a duplicate in my account.

Thank you for following through the investigation, it was fun and good learning for me ! :)

@cipri-tom cipri-tom closed this Feb 15, 2023
j3soon added a commit that referenced this pull request Apr 21, 2023
Fixes: #4
Related: #13

Co-authored-by: Ciprian Tomoiaga <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants