Export to PDF #126

Closed
aladwani opened this issue Aug 28, 2023 · 3 comments

@aladwani
Collaborator

aladwani commented Aug 28, 2023

Evaluate how much work it would be to implement PDF export functionality within Thilo.

@jimmylevell added the v1 (Issue needs to be resolved for first iteration) and clarification labels Sep 16, 2023
@jimmylevell
Collaborator

@carlobeltrame: do you have a recommendation for a possible integration (with respect to eCamp v3)? The goal is to provide an option to print chapters as PDF.

@carlobeltrame
Member

carlobeltrame commented Sep 16, 2023

At eCamp v3, we are still evaluating two completely different tech stacks, because each brings its own risks and downsides. I'll describe the two options below, with regard to how they could work in Thilo.

But first, a more fundamental question: What is the PDF export intended for, and is a PDF export really the best option for this use case? I see two possibilities:

  • The PDF is intended to be read and shared digitally, without requiring internet access. In that case, I would suggest you implement PWA offline functionality instead. This has been done before for cudeschin as well as pfadinamen.app. Any interactive elements (I assume there will be some, otherwise why use React) as well as advanced browser features such as reader view, screen reader compatibility, etc. won't work in a PDF. Additionally, with a PWA it will be easier to automatically update the content if some of it has changed in the meantime. Otherwise, scout groups will inevitably save and share old versions of the content.
  • The PDF is intended to be printed by the scout leaders themselves. In this case, all the limitations mentioned above also apply. Additionally, the layouting process is not free! To make a print document look good and readable, you either need a separate manual layout step (which you can't have if your users can export their own PDFs at any time), or your content creators have to be aware of the layouting process and always think about how their content will translate to print (which requires a lot of knowledge). In eCamp v3, printing is a very central requirement, so we have tried to at least give users some control over the layout of their content. Still, our lead designer is currently unhappy with the way the PDFs turn out (because users can't be expected to know layouting best practices), and is thinking about ways to automate good print layouts. This is a very hard problem!
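To illustrate the point about keeping offline content up to date: a minimal sketch of how an offline cache could work out which chapters changed, assuming each chapter URL comes with a revision string (for example a content hash). The `Manifest` shape, the `planCacheUpdate` name, and the example URLs are all hypothetical and not taken from Thilo, cudeschin, or pfadinamen.app; a real PWA would feed the result into the service worker's cache handling.

```typescript
// Hypothetical shape: maps a chapter URL to a revision string (e.g. a content hash).
type Manifest = Record<string, string>;

interface CacheUpdatePlan {
  toFetch: string[]; // new or changed URLs that need to be downloaded again
  toDelete: string[]; // cached URLs that no longer exist in the latest manifest
}

function has(manifest: Manifest, url: string): boolean {
  return Object.prototype.hasOwnProperty.call(manifest, url);
}

// Compares the manifest stored on the device with the latest one from the server.
function planCacheUpdate(cached: Manifest, latest: Manifest): CacheUpdatePlan {
  const toFetch = Object.keys(latest)
    .filter((url) => !has(cached, url) || cached[url] !== latest[url])
    .sort();
  const toDelete = Object.keys(cached)
    .filter((url) => !has(latest, url))
    .sort();
  return { toFetch, toDelete };
}

// Usage: chapter 2 changed, chapter 3 is new, the "old" chapter was removed.
const plan = planCacheUpdate(
  { "/chapters/1": "a1", "/chapters/2": "b1", "/chapters/old": "z9" },
  { "/chapters/1": "a1", "/chapters/2": "b2", "/chapters/3": "c1" },
);
console.log(plan.toFetch); // ["/chapters/2", "/chapters/3"]
console.log(plan.toDelete); // ["/chapters/old"]
```

With a saved PDF there is no equivalent step: the file stays at whatever revision it was exported from.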

If you still insist on creating a PDF export for the digital thilo, here is a comparison of the two PDF engines we are currently evaluating at eCamp v3:

| | Print Layout #1 | Print Layout #2 |
|---|---|---|
| Technologies used | Self-hosted Browserless (headless Chrome), using Chrome's PDF print features to print an HTML page generated on the server using nuxt | React-pdf rendering a component tree (Vue components in the case of eCamp v3, but you could use React components), inside a web worker to make sure the UI doesn't freeze |
| Open source and free-to-use toolchain | Yes, except if you want to use the managed browserless.io service | Yes |
| Where are the PDFs generated | Server-side | Client-side |
| Pros for Thilo | Could easily execute the advanced React component logic which you might already have for the frontend | No additional hosting effort, just serve the compiled JS code with the rest of the app |
| Cons for Thilo | Makes deployment and operation of the app way harder. The server-side HTML generator needs to be chosen (e.g. next.js) | Need to use separate React components which don't output an HTML tree with `<div>` / `<p>` elements etc., but react-pdf-specific primitives instead |
| How to host | Run the browserless Docker image in the cluster of your choice, or use the managed browserless.io service (which has issues when trying to use fonts other than Roboto). Either way, you also need to host the server-side HTML generator somewhere. | If you use the pre-made React components, web workers aren't included I think, so for PDFs larger than 1-3 pages I would recommend rolling your own web worker integration (and maybe contributing it back). |
| Debuggability | Can be hard because of the many systems involved. Maybe next.js is more easily debuggable than nuxt. | Can be hard because of the web worker. Locally, for debugging I always just switch to non-web-worker rendering. |
| Maintenance of the tools | While Chrome of course isn't focused on printing and PDF generation, there have been some people actively working on better print capabilities in recent years | Open source project with historically a single maintainer, one or two active contributors and a LOT of people just writing support request issues. Releases every few months, but not regularly. |
| Current problems at eCamp v3 | Scalability: Chrome uses a lot of resources, and depending on the peak loads this may skyrocket. Multi-column layouts might still have some issues with page breaking. | No real HTML table support, everything is based on a flexbox implementation. Also, React-pdf can be slow for very large documents. |
| Conclusion | Tries to solve problems using infrastructure | Tries to solve problems using code |

At Qualix, I use react-pdf, because that way the application can still run on PHP shared hosting.

Options we have tried and ruled out in the past:

  • tcpdf / fpdf / all the other variants of PHP *pdf libraries. These are all deprecated or badly maintained, and require hard-to-read code and manual layouting with lots of magic numbers
  • WeasyPrint, a Python HTML-to-PDF converter, produced significantly worse results compared to browserless
  • Just writing a print CSS stylesheet and asking users to use their browser's print-to-PDF functionality. This led to WILDLY differing PDFs, and unreasonably large page margins were necessary to make it usable on all major browsers
  • paged.js is intended to be a polyfill to even out these browser differences, but it can't handle page breaks in multi-column layouts or tables, and is not very well maintained

@jimmylevell added the v1.1 label and removed the v1 (Issue needs to be resolved for first iteration) label Sep 20, 2023
@bodobraegger
Collaborator

Thanks a lot for the detailed explanations and write-up @carlobeltrame.

We have also decided to implement Thilo as a PWA, in order to preserve the cross-chapter search functionality and for all the benefits you have listed.

As far as getting PDF output from the page where needed, the built-in browser PDF renderers do well enough with our current page styling, where we flagged all navigation to not show up in print media. The produced PDFs thus contain only the actual content. We can account for the browser and OS differences, as we also rely on the page display being reasonably similar and looking good across browsers and devices. We intentionally keep the content creators' options limited in order to ensure that it stays this way.
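As a rough illustration of hiding navigation in print media, here is a minimal sketch that builds an `@media print` rule from a list of selectors. The selector names and the `buildPrintStylesheet` helper are assumptions made for this example, not Thilo's actual markup or code; in practice the same rule could simply live in a static stylesheet.

```typescript
// Hypothetical selectors; the real Thilo markup may use different elements or classes.
const NAVIGATION_SELECTORS = ["nav", "header", "footer", ".search-bar"];

// Builds a print-only CSS rule that hides every given selector.
// Returns an empty string when there is nothing to hide.
function buildPrintStylesheet(selectors: string[]): string {
  if (selectors.length === 0) {
    return "";
  }
  return [
    "@media print {",
    `  ${selectors.join(", ")} {`,
    "    display: none !important;",
    "  }",
    "}",
  ].join("\n");
}

// Usage: the resulting text could be placed in a <style> element or a .css file.
console.log(buildPrintStylesheet(NAVIGATION_SELECTORS));
```

Everything outside the listed selectors is left to the browser's own print rendering, which is why the output can still differ between browsers and operating systems.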

I think with the limited resources available, implementing a dedicated print-to-PDF feature is out of scope for this project, so I am closing this issue for now. If more resources become available, we can always reopen it.
If we really need to produce print-quality PDFs, this project will require the same resources as the original print Thilo had, especially when it comes to design.

Thanks again!
