Export to PDF #126

aladwani · 2023-08-28T13:06:41Z

Evaluate how much work it would take to implement PDF export functionality within Thilo.

jimmylevell · 2023-09-16T08:10:48Z

@carlobeltrame: do you have a recommendation for a possible integration (with respect to eCamp v3)? The goal is to provide an option to print chapters as PDF.

carlobeltrame · 2023-09-16T10:33:21Z

At eCamp v3, we are still evaluating two completely different tech stacks, because each brings its own risks and downsides. I'll describe the two options below, with regard to how they could work in Thilo.

But first, a more fundamental question: What is the PDF export intended for, and is a PDF export really the best option for this use case? I see two possibilities:

1. The PDF is intended to be read and shared digitally, without requiring internet access. In that case, I would suggest you implement PWA offline functionality instead. This has been done before for cudeschin as well as pfadinamen.app. Any interactive elements (which I assume there will be, otherwise why use React) as well as advanced browser features such as reader view, screen reader compatibility, etc. won't work in a PDF. Additionally, with a PWA it will be easier to automatically update the content if some of it has changed in the meantime. With PDFs, scout groups will inevitably be saving and sharing old versions of the contents.
2. The PDF is intended to be printed by the scout leaders themselves. In this case, all the limitations mentioned above also apply. Additionally, the layout process is not free! To make a print document look good and readable, you either need a separate manual layout step (which you can't have if your users can export their own PDFs at any time), or your content creators have to be aware of the layout process and always need to think about how their contents will translate to the print domain (which requires a lot of knowledge). In eCamp v3, printing is a very central requirement, so we have tried to at least give the users the option to lay out their contents somewhat. But still, our lead designer is currently unhappy with the way the PDFs turn out (because users can't be expected to know layout best practices), and is thinking about ways to automate good print layouts. This is a very hard problem!

If you still insist on creating a PDF export for the digital Thilo, here is a comparison of the two PDF engines we are currently evaluating at eCamp v3:

| | Print Layout #1 | Print Layout #2 |
|---|---|---|
| Technologies used | Self-hosted Browserless (headless Chrome), using Chrome's PDF print features to print an HTML page generated on the server using nuxt | React-pdf rendering a component tree (Vue components in the case of eCamp v3, but you could use React components), inside a web worker to make sure the UI doesn't freeze |
| Open source and free-to-use toolchain | Yes, except if you want to use the managed browserless.io service | Yes |
| Where are the PDFs generated | Server-side | Client-side |
| Pros for Thilo | Could easily execute the advanced React component logic which you might already have for the frontend | No additional hosting effort, just serve the compiled JS code with the rest of the app |
| Cons for Thilo | Makes deployment and operation of the app way harder. The server-side HTML generator needs to be chosen (e.g. next.js) | Need to use separate React components which don't output an HTML tree with `<div>` / `<p>` elements etc., but react-pdf-specific primitives instead |
| How to host | Run the browserless Docker image in the cluster of your choice, or use the managed browserless.io service (which has issues when trying to use fonts other than Roboto). Either way, additionally host the server-side HTML generator somewhere. | If you use the pre-made React components, web workers aren't included I think, so for PDFs larger than 1-3 pages I would recommend rolling your own web worker integration (and maybe contributing it back). |
| Debuggability | Can be hard because of the many systems involved. Maybe next.js is more easily debuggable than nuxt. | Can be hard because of the web worker. Locally, for debugging I always just switch to non-web-worker rendering. |
| Maintenance of the tools | While Chrome of course isn't focused on printing and PDF generation, there have been some people actively working on better print capabilities in recent years | Open source project with historically a single maintainer, one or two active contributors and a LOT of people just writing support request issues. Releases every few months, but not regularly. |
| Current problems at eCamp v3 | Scalability: Chrome uses a lot of resources, and depending on the peak loads this may skyrocket. Multi-column layouts might still have some issues with page breaking. | No real HTML table support, everything is based on a flexbox implementation. Also, React-pdf can be slow for very large documents. |
| Conclusion | Tries to solve problems using infrastructure | Tries to solve problems using code |
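
To make the server-side option more concrete, here is a minimal sketch of how an app might ask a self-hosted browserless instance to print a server-rendered page. Everything here is illustrative: `buildPdfRequest`, the base URL and the page URL are hypothetical names, and the `/pdf` endpoint with a `url` plus Puppeteer-style `options` body is an assumption based on browserless' REST API, so check it against the version you deploy.

```typescript
// Assumed JSON body for browserless' /pdf REST endpoint.
// The options are passed through to Chrome's PDF printing (Puppeteer-style names).
interface PdfRequestBody {
  url: string;
  options: { format: string; printBackground: boolean };
}

interface PdfRequest {
  endpoint: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the HTTP request without sending it,
// so the request shape can be inspected and tested in isolation.
function buildPdfRequest(browserlessBaseUrl: string, pageUrl: string): PdfRequest {
  const body: PdfRequestBody = {
    url: pageUrl,
    options: { format: "A4", printBackground: true },
  };
  return {
    // Strip trailing slashes so the endpoint never contains "//pdf".
    endpoint: `${browserlessBaseUrl.replace(/\/+$/, "")}/pdf`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Placeholder URLs, not real deployments.
const pdfRequest = buildPdfRequest(
  "http://localhost:3000/",
  "https://thilo.example/print/chapter-1",
);
// Sending it would look roughly like:
//   const response = await fetch(pdfRequest.endpoint, pdfRequest.init);
//   const pdfBytes = await response.arrayBuffer();
```

Note that the page at `pageUrl` still has to be produced by the separately hosted server-side HTML generator mentioned in the table, which is where most of the operational cost of this option comes from.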

At Qualix, I use react-pdf, because that application is still runnable on PHP shared hosting.

Options we have tried and ruled out in the past:

- tcpdf / fpdf / all the other variants of PHP *pdf libraries. These are all deprecated or badly maintained, and require hard-to-read code and manual layout with lots of magic numbers.
- WeasyPrint, a Python HTML-to-PDF converter, produced significantly worse results compared to browserless.
- Just writing a print CSS stylesheet and asking users to use their browser's print-to-PDF functionality. This led to WILDLY differing PDFs, and unreasonably large page margins were necessary to make it usable on all major browsers.
- paged.js is intended to be a polyfill to even out these browser differences, but it can't handle page breaks in multi-column layouts or tables, and is not very well maintained.

bodobraegger · 2024-02-02T19:19:03Z

Thanks a lot for the detailed explanations and write-up @carlobeltrame.

We have also decided to implement Thilo as a PWA, in order to preserve the cross-chapter search functionality and for all the benefits you have listed.

As far as getting PDF output from the page where needed, the built-in browser PDF renderers do well enough with our current page styling, where we flagged all navigation to not show up in print media, so the produced PDFs only contain the actual content. We can account for the browser and OS differences, as we also rely on the page display being reasonably similar and looking good across browsers and devices. We intentionally keep the content creators' options limited in order to ensure that it stays this way.
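
As a rough illustration of hiding navigation in print media, the sketch below builds a print-only stylesheet from a list of selectors. The helper name `buildPrintStylesheet` and the selectors are hypothetical and not taken from the Thilo codebase; the only standard parts are the `@media print` rule and `display: none`.

```typescript
// Hypothetical helper: returns CSS that hides the given selectors when printing.
function buildPrintStylesheet(hiddenSelectors: string[]): string {
  // Nothing to hide: emit no rule rather than an invalid empty selector list.
  if (hiddenSelectors.length === 0) return "";
  const selectorList = hiddenSelectors.join(",\n  ");
  return `@media print {\n  ${selectorList} {\n    display: none !important;\n  }\n}\n`;
}

// Assumed selectors for navigation elements; adjust to the actual markup.
const printCss = buildPrintStylesheet(["nav", "header", ".sidebar"]);
```

The resulting string could be placed in a `<style>` element or a static stylesheet. The browser's own print-to-PDF dialog then does the rest, with the browser and OS differences mentioned above still applying.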

I think with the limited resources available, implementing a dedicated print-to-PDF feature is out of scope for this project, so I am closing this issue for now. If more resources become available, we can always reopen it.
If we really need to produce print-quality PDFs, this project will require the same resources as the original print Thilo had, especially when it comes to design.

Thanks again!

aladwani assigned aladwani and bodobraegger Aug 28, 2023

jimmylevell added the v1 (Issue needs to be resolved for first iteration) and clarification labels Sep 16, 2023

jimmylevell added the v1.1 label and removed the v1 (Issue needs to be resolved for first iteration) label Sep 20, 2023

bodobraegger closed this as completed Feb 2, 2024
