Slow with many elements for custom table element? #2032

Closed
imatwork opened this issue Sep 28, 2022 · 8 comments · Fixed by diegomura/react-pdf-site#116

Comments

@imatwork
Contributor

When I try rendering a large number of elements for a table in a PDF, @react-pdf takes a long time to render it. The time seems to be spent specifically in resolveDimensions and whatever Yoga is doing.

Here's an example of the timings I'm seeing:
110 rows (10 subheadings w/ 10 rows each), 26 columns - 2860 elements at 2.3s
510 rows (10 subheadings w/ 50 rows each), 26 columns - 13260 elements at 6.7s
1010 rows (10 subheadings w/ 100 rows each), 26 columns - 26260 elements at 21.7s
2010 rows (20 subheadings w/ 100 rows each), 26 columns - 52260 elements at 36.3s
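
To make the scaling easier to read, the timings above can be converted into an average cost per element. This is only a small arithmetic sketch over the four data points listed; the function name `perElementCost` is made up for illustration.

```javascript
// Timings copied from the list above: [rows, columns, seconds].
const measurements = [
  [110, 26, 2.3],
  [510, 26, 6.7],
  [1010, 26, 21.7],
  [2010, 26, 36.3],
];

// Derive the element count and the average cost per element for one run.
function perElementCost(rows, columns, seconds) {
  const elements = rows * columns;
  return { elements, msPerElement: (seconds * 1000) / elements };
}

for (const [rows, columns, seconds] of measurements) {
  const { elements, msPerElement } = perElementCost(rows, columns, seconds);
  console.log(`${elements} elements: ${msPerElement.toFixed(2)} ms per element`);
}
```

All four runs land between roughly 0.5 and 0.85 ms per element, so the cost looks approximately linear in the number of elements rather than dominated by a fixed startup cost.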

Now I can understand @react-pdf does a lot of stuff internally that might make it slow. The tables we're rendering at least have fixed widths for columns and rows, so we can calculate everything up front. Unfortunately, there was no way that I could lay out the table that would make it much faster.

I tried using fixed page sizes, turning off wrap on the page and all other elements, positioning with position: 'absolute'; top: N; left: M; styling, and using SVG elements to lay out the table. None of these got a 500 row, 26 column table rendered in under 5s.

Am I doing something wrong? Any insight on how I can speed up this table, for example by passing in the dimension values we have precalculated up front (x, y, width, height)?
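
For readers unfamiliar with what "precalculated up front" means here, the following is a minimal sketch of computing an absolute box for every cell when column widths and the row height are fixed. The helper `computeCellBoxes` and the box shape are hypothetical and are not part of react-pdf.

```javascript
// Hypothetical helper: compute an absolute box for every cell of a table
// whose column widths and row height are known up front.
function computeCellBoxes(rowCount, columnWidths, rowHeight, origin = { x: 0, y: 0 }) {
  // Running left offsets: column i starts where column i - 1 ends.
  const lefts = [];
  let left = origin.x;
  for (const width of columnWidths) {
    lefts.push(left);
    left += width;
  }

  const boxes = [];
  for (let row = 0; row < rowCount; row++) {
    for (let col = 0; col < columnWidths.length; col++) {
      boxes.push({
        row,
        col,
        left: lefts[col],
        top: origin.y + row * rowHeight,
        width: columnWidths[col],
        height: rowHeight,
      });
    }
  }
  return boxes;
}

// 500 rows x 26 columns, each column 22pt wide and each row 20pt tall.
const boxes = computeCellBoxes(500, new Array(26).fill(22), 20);
console.log(boxes.length); // 13000
```

Each box could then be turned into a style such as `{ position: 'absolute', left, top, width, height }`. As described in this issue, handing such styles to react-pdf did not by itself make rendering fast.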

Thanks

@carlobeltrame
Contributor

carlobeltrame commented Sep 29, 2022

Yoga is a flexbox implementation, and is used for calculating the dimensions and positions of all elements rendered into the pdf. It is written in C and is run as WebAssembly, or using asm.js if WebAssembly is not supported on your platform, so it is already optimized for performance. It is an integral part of react-pdf and there is currently no way to "skip" it. Since Yoga is a separate software project from react-pdf, you could try researching speedup options there. But there is a chance that there is simply no way to make it faster.

One idea though: You could try splitting up the table into multiple <Page> elements. I am not completely sure, but that might trigger react-pdf to call Yoga multiple times with smaller workloads, which might make the internal calculations easier.

Worst case you could always just use https://pdfkit.org/ directly. React-pdf uses that internally to render the layouted PDF. But then you lose the layouting and pagination features which react-pdf implements, so you will have to calculate the positions and dimensions of all texts and lines yourself.
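
As an illustration of what "calculate the positions yourself" involves for pagination, here is a minimal sketch that splits fixed-height rows into pages. The helper `paginateRows` and its parameters are assumptions made for this example; real tables with variable row heights or repeated headers would need more logic.

```javascript
// Hypothetical helper: split fixed-height rows into pages.
// All lengths are in PDF points; `margin` is applied at the top and bottom.
function paginateRows(rowCount, rowHeight, pageHeight, margin = 0) {
  const usableHeight = pageHeight - 2 * margin;
  const rowsPerPage = Math.floor(usableHeight / rowHeight);
  if (rowsPerPage < 1) {
    throw new Error('A single row does not fit on a page');
  }

  const pages = [];
  for (let firstRow = 0; firstRow < rowCount; firstRow += rowsPerPage) {
    pages.push({
      firstRow,
      rowCount: Math.min(rowsPerPage, rowCount - firstRow),
    });
  }
  return pages;
}

// 510 rows of 20pt on US Letter (792pt tall) with 36pt margins.
const pages = paginateRows(510, 20, 792, 36);
console.log(pages.length); // 15
```

With 720pt of usable height, 36 rows fit per page, so 510 rows need 15 pages and the last page holds the remaining 6 rows.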

@imatwork
Contributor Author

imatwork commented Sep 30, 2022

> One idea though: You could try splitting up the table into multiple <Page> elements. I am not completely sure, but that might trigger react-pdf to call Yoga multiple times with smaller workloads, which might make the internal calculations easier.

This was something I tried previously as well but no luck, didn't see a difference in timings

> Worst case you could always just use https://pdfkit.org/ directly. React-pdf uses that internally to render the layouted PDF. But then you lose the layouting and pagination features which react-pdf implements, so you will have to calculate the positions and dimensions of all texts and lines yourself.

I will look into this. Is there a way to do this for a single page and let the other pages layout normally? react-pdf has been really helpful, except for the performance issues with large tables we've had. Removing the tables from our largest report reduces the PDF generation from 47s to 3s.

> Yoga is a flexbox implementation, and is used for calculating the dimensions and positions of all elements rendered into the pdf. It is written in C and is run as WebAssembly, or using asm.js if WebAssembly is not supported on your platform, so it is already optimized for performance. It is an integral part of react-pdf and there is currently no way to "skip" it. Since Yoga is a separate software project from react-pdf, you could try researching speedup options there. But there is a chance that there is simply no way to make it faster.

Well, there should still be a way for it to do less work. Like a fast path? If I lay out 2000 elements each with fixed width, height, top and left and no text wrapping, Yoga should be able to look at those values passed into its abstract box model and not need to do a whole lot of extra stuff. It should be lightning fast for a case like that but that's not what I'm seeing.

I tried to make some test cases to illustrate this a bit better. They each have really unintuitive performance.

Position Absolute 42 Rows, 26 Cols: This one doesn't actually finish... If you bump the rows down to 39, it does finish in ~700ms. Maybe there is a bug in `resolvePageDimensions` when an element is absolutely positioned off the page. That's at least the function it spends most of its time in when I pause the debugger.
Position Absolute 500 Rows, 26 Cols w/ large static page size: 13000 elements, finishes in ~9s. Uses position absolute with top, left, width and height, flexWrap nowrap, page wrap={false} and size={[612,12000]}, so _everything_ is fixed/set in styles and shouldn't need any extra layout work, but it still takes 9s. The profile is below:

About 50% of the time is spent in resolvePagination but there's only one page with a fixed size of more than the elements I'm drawing. It also has wrap={false} so should not ever page break. I'm not sure why this is.

Flex box 500 Rows, 26 Cols: 13000 elements, finishes in ~7.5s. Uses one large column of fixed width/height Views, each with cells of fixed width and height. All use flexWrap: nowrap, wrap={false}. Everything but position is fixed/set in styles. Surprisingly, this is faster than the case above where everything is fixed.

The output of this breaks the table over 13 pages, but it had size="LETTER" and wrap={false}, so it should not break, right? A bug?

Here's the profile:

~50% in resolvePagination, but it should only be rendering one page.

These cases are not all that complex and should not be this slow, right? Especially when so many precalculated constraints are applied to each element (it already knows the width, height, etc in all cases) so it shouldn't have to measure the content, do word wrapping, etc to determine all the "hard"/expensive values. It should be able to take a fast path.

And specifically by fast path:

  • In resolveTextLayout, can't layoutText be skipped if node.attr.style.flexWrap === 'nowrap' (or sorry, I don't remember the exact property path of attributes on react-pdf's node, whatever the right path would be)?
  • In resolvePagination, can't paginate be skipped if page.attr.wrap === false?
  • In resolveDimensions, can't resolvePageDimensions be skipped if the dimensions of the node's children (padding, margin, width/height, etc) are all known ahead of time?
  • And maybe others?
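
The kind of checks such a fast path would need can be sketched as below. This is purely illustrative: the node shape `{ props, style, children }` and the function names are hypothetical and do not reflect react-pdf's real internal structures, and the sketch covers only the pagination and fixed-dimension conditions from the list above.

```javascript
// Hypothetical node shape for this sketch: { props, style, children }.
// This is NOT react-pdf's real internal node structure.

// True when the page opted out of wrapping, so no page breaks can occur.
function canSkipPagination(page) {
  return Boolean(page.props) && page.props.wrap === false;
}

// True when the node declares a numeric width and height in its style.
const BOX_KEYS = ['width', 'height'];
function hasFixedBox(node) {
  const style = node.style || {};
  return BOX_KEYS.every((key) => Number.isFinite(style[key]));
}

// True when the node and all of its descendants have fixed boxes.
function isFullyFixed(node) {
  return hasFixedBox(node) && (node.children || []).every((child) => isFullyFixed(child));
}

const cell = { props: {}, style: { width: 22, height: 20 }, children: [] };
const page = { props: { wrap: false }, style: { width: 612, height: 12000 }, children: [cell] };
console.log(canSkipPagination(page), isFullyFixed(page)); // true true
```

Note that walking the tree to verify these conditions has its own cost, so whether a fast path pays off would have to be measured.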

Thanks

@carlobeltrame
Contributor

I just had some issues with the layouting engine in a table-like use case, where the width of one column containing text would influence the calculation of the available height in an element in another column. This happened even when I set a fixed width style on the first column. The flexbox calculations might be more complicated than we can see from the outside, so that might be why in your tests the fixed width and height did not make a difference in performance.

I managed to fix my problem by setting the first column to a fixed width and additionally wrapping its contents inside a <View style={{ position: 'absolute', top: 0, left: 0, right: 0, bottom: 0 }}></View>. This way, the contents of the first column cannot influence the sizing of the remaining columns in any way. Maybe you could try wrapping the contents of your fixed-size table cells in similar absolutely positioned Views and see whether that impacts the performance?

@imatwork
Contributor Author

imatwork commented Oct 4, 2022

> I managed to fix my problem by setting the first column to a fixed width and additionally wrapping its contents inside a <View style={{ position: 'absolute', top: 0, left: 0, right: 0, bottom: 0 }}>. This way, the contents of the first column cannot influence the sizing of the remaining columns in any way.

Hmm, did you try that on the examples and it was faster? I saw no change in those stackblitz when doing that.

> And specifically by fast path... (from my last comment)

I implemented a few of these changes in my local branch of react-pdf and have begun seeing how they affect performance. One is also a bug fix. I've made some PRs for them:

@carlobeltrame
Contributor

In my case I was not concerned with performance. I was simply reporting my experiences with the layouting engine, and how I managed to "disable" parts of the layouting logic by encapsulating some components in absolutely positioned wrappers.

Nice to see you are getting around to proposing some fixes though :) Let's hope the maintainer can find some time to have a look at them.

@imatwork
Contributor Author

> Worst case you could always just use https://pdfkit.org/ directly. React-pdf uses that internally to render the layouted PDF. But then you lose the layouting and pagination features which react-pdf implements, so you will have to calculate the positions and dimensions of all texts and lines yourself.

So I tried this and my tables render in a negligible amount of time now. I am going to switch my efforts to working with this and exposing more of the pdfkit/PDFDocument internals from react-pdf so I can work more with the low level stuff while also using the high level API.
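
For anyone wanting to try the same route, a minimal sketch of drawing precomputed cells with PDFKit-style calls might look like this. The function `drawCells` and the cell shape `{ left, top, width, height, text }` are assumptions made for this example, not the code used in this issue. It takes the document as a parameter and relies only on `fontSize()`, `rect()`, `stroke()` and `text()`, which PDFKit's `PDFDocument` provides.

```javascript
// Draws precomputed cells onto a PDFKit-style document.
// The cell shape { left, top, width, height, text } is an assumption
// made for this sketch.
function drawCells(doc, cells, { fontSize = 8, padding = 2 } = {}) {
  doc.fontSize(fontSize);
  for (const cell of cells) {
    // Cell border.
    doc.rect(cell.left, cell.top, cell.width, cell.height).stroke();
    // Cell text, placed at a known position with line breaking disabled.
    doc.text(cell.text, cell.left + padding, cell.top + padding, {
      width: cell.width - 2 * padding,
      height: cell.height - 2 * padding,
      lineBreak: false,
    });
  }
  return doc;
}

// Possible usage with the real library (requires `npm install pdfkit`):
//
//   const PDFDocument = require('pdfkit');
//   const fs = require('fs');
//   const doc = new PDFDocument({ size: 'LETTER', margin: 0 });
//   doc.pipe(fs.createWriteStream('table.pdf'));
//   drawCells(doc, cells);
//   doc.end();
```

Because every position is supplied by the caller, no flexbox layout pass is involved; the trade-off is that text measurement, clipping and page breaks become the caller's responsibility.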

If anyone else wants to tackle the performance issues though, here are my latest findings:

I put together a table example in the packages/examples repo which you can find in my fork. I also added a timer to the examples so you can see how long a render takes, but Chrome/Firefox's profiler works better to check actual timings.
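
A render timer of this kind can be as simple as the sketch below. The helper `timeRender` is hypothetical and is not the timer added to the fork; it wraps any function that returns a value or a promise and reports wall-clock time.

```javascript
// Hypothetical helper: measure how long a sync or async render function takes.
async function timeRender(label, render) {
  const start = performance.now();
  const result = await render();
  const elapsedMs = performance.now() - start;
  console.log(`${label}: ${elapsedMs.toFixed(1)} ms`);
  return { result, elapsedMs };
}

// Example with a stand-in workload instead of a real PDF render.
timeRender('dummy render', () => {
  let sum = 0;
  for (let i = 0; i < 1e6; i++) sum += i;
  return sum;
});
```

The `render` callback would wrap whatever call produces the PDF in your setup. A wall-clock timer only gives a total, so the browser profiler remains the better tool for seeing which functions the time goes to.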

resolveDimensions takes a large amount of time but can't really be skipped because it traverses every node and calculates all dimensions (maybe you could make a layout with everything known ahead of time but I didn't want to mess with checking for that, seemed hacky). Here's the profile:

[image: profiler screenshot of resolveDimensions]

This mostly descends into Yoga, but what I didn't realize earlier, is that there are callbacks to react-pdf at some point (maybe the measure functions? I didn't look). You can see in the below profile though that Yoga eventually calls into layoutText.

[image: profiler screenshot showing Yoga calling into layoutText]

layoutText has a simple responsibility and gets called a lot, so I looked into optimizing it. What I noticed is that react-pdf/textkit gets passed a couple of fragments generated in getAttributedString, which are turned into runs with fromFragments() (it can generate excess fragments by not reusing old string fragments when possible). These runs then go through textkit's layoutEngine. Each of these runs generates 3 more runs in preprocessRuns, which are then flattened into a flat set of runs with unique attributes applied to each part. I started to optimize this but switched focus.

Applied to many simple text elements, the above causes a lot of unnecessary work.
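
One way to reduce the number of runs, sketched here only as an illustration, is to merge adjacent fragments that carry identical attributes before they are converted. The fragment shape `{ string, attributes }` with a flat attributes object is an assumption, `mergeFragments` is a hypothetical helper rather than a textkit function, and this is not necessarily the optimization that was started above.

```javascript
// Assumed fragment shape: { string, attributes }, where attributes is a
// flat object of primitive values. Hypothetical helper, not textkit API.

// Shallow equality of two flat attribute objects.
function sameAttributes(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return (
    keysA.length === keysB.length &&
    keysA.every((key) => Object.prototype.hasOwnProperty.call(b, key) && a[key] === b[key])
  );
}

// Merge neighbouring fragments whose attributes are identical.
function mergeFragments(fragments) {
  const merged = [];
  for (const fragment of fragments) {
    const last = merged[merged.length - 1];
    if (last && sameAttributes(last.attributes, fragment.attributes)) {
      last.string += fragment.string;
    } else {
      // Copy so the input fragments are never mutated.
      merged.push({ string: fragment.string, attributes: { ...fragment.attributes } });
    }
  }
  return merged;
}

const merged = mergeFragments([
  { string: 'Total', attributes: { fontSize: 8 } },
  { string: ': ', attributes: { fontSize: 8 } },
  { string: '42', attributes: { fontSize: 8, fontWeight: 700 } },
]);
console.log(merged.length); // 2
```

Fewer fragments going in would mean fewer runs to preprocess and flatten per text element, which matters when the same work is repeated for thousands of table cells.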

@imatwork
Contributor Author

I have been able to solve my problem by switching to pdfkit and rendering my table pages through there instead. This PR would allow me to do this in the main package.

@astahmer

related, created kind of the same issue but in the wrong repo (😅 )
