Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(dev/core#574) Prevent memory exhaustion when generating large PDFs #13232

Merged
merged 1 commit into from
Nov 15, 2019

Conversation

JO0st
Copy link
Contributor

@JO0st JO0st commented Dec 5, 2018

Overview

Creating large PDF crashes because of a memory error. This pull request lessens the memory impact of creating large PDFs.

Before

Creating large PDFs crashes with a memory error

After

Large pdfs get created

Technical Details

When creating a PDF, certain HTML tags get filtered out of the HTML before passing it to wkhtml2pdf. This was done using preg_replace. This method uses some memory, which sometimes is enough to make PHP run out of memory. mb_eregi_replace uses less memory.

Comments

The creation of a PDF sometimes still gives a memory error; however this is happening in the library snappy. This pull request should address that.

@civibot
Copy link

civibot bot commented Dec 5, 2018

(Standard links)

@civibot civibot bot added the master label Dec 5, 2018
@JO0st JO0st changed the title dev-core-574 dev-core-574-memory-error-large-pdfs Dec 5, 2018
@mattwire
Copy link
Contributor

mattwire commented Dec 5, 2018

A possible issue with this is I don't think it's guaranteed that PHP would have the mb* functions available. This has come up before and there has been consideration to require those functions but I can't remember the outcome @colemanw?

@JO0st
Copy link
Contributor Author

JO0st commented Dec 6, 2018

@mattwire according to the system administrator guide the multibyte php extension should be installed on the system. This php extension gives the mb* functions.
another option would be to do a check wehter the functions are available and use them if they are with a fallback to preg_replace. This is done here

@totten totten changed the title dev-core-574-memory-error-large-pdfs (dev/core#574) Prevent memory exhaustion when generating large PDFs Dec 6, 2018
@totten
Copy link
Member

totten commented Dec 7, 2018

In addition to the Sysadmin Guide, these three files all have validation checks for multibyte functionality (function_exists('mb_substr')):

  • install/index.php
  • CRM/Utils/Check/Component/Env.php
  • Civi/Install/Requirements.php

Not-with-standing some edge-cases involving old installations or polyfill functions, that means most systems should have it installed.

(Note: There was some discussion about an internalization extension dependency earlier this year -- looked it up and found #12633; however, that's a different extension -- iconv -- which doesn't have the same kind of pre-install validations.)

@totten
Copy link
Member

totten commented Dec 8, 2018

I just want to say... the patch looks small, but -- given how it relies on fairly obscure/non-obvious effects of PHP's runtime -- it probably took a bunch of research. Kudos on that!

Trying to keep it moving - here's a partial review:

(CiviCRM Review Template WORD-1.2)

  • General standards
  • Developer standards
    • (r-tech): Soft Pass: There's good point above about how this requires mbstring extension; but the install process already checks for it.
    • (r-code) Pass: Looks stylistically consistent. Added a small suggestion, but it's not critical/blocker for the PR.
    • (r-maint) Soft Pass: Generally, the best way to ensure that this kind of requirement is met in the future would be to include performance/load testing as part of the official suites. However, that's scope-creep, and we don't need to hold-up this fix.
    • (r-test) Pass: The reported is a common false-negative

@eileenmcnaughton
Copy link
Contributor

Hmm looks like this stalled due to only getting a partial review & now it's stale. Ideally we'd get a test but it needs rebasing at minimum

@JO0st
Copy link
Contributor Author

JO0st commented Mar 1, 2019

I have rebased the pull request on the current master

@eileenmcnaughton
Copy link
Contributor

@JO0st it still says there is a conflict

@JO0st
Copy link
Contributor Author

JO0st commented Mar 1, 2019

@JO0st it still says there is a conflict

used the wrong master, should be fixed now

@davejenx
Copy link
Contributor

@JO0st Thanks for this PR. I have been testing and have so far not found a scenario that runs out of memory without the patch but succeeds with it. I have mostly tested with the Constituent Summary report with varying numbers of rows and columns. Initially I got memory errors in DOMPDF, I then installed wkhtmltopdf and used this in all subsequent testing. Tested with memory limit 128MB.

With Constituent Summary, when memory ran out, it did so in a compiled template:
templates_c/en_GB/%%F5/F54/F5478576%%Table.tpl.php
at different lines depending on the number of columns.

In a different scenario memory ran out in Snappy as described at KnpLabs/snappy#335.

Also tried PDF letters & mailing labels, have not so far encountered memory errors with these, Currently importing more contacts to test further.

Could you give more info about your test scenario, that might assist with replicating the problem? E.g. what operation you are performing & details such as which report & which fields, with how many contacts, your memory limit etc.

Thanks,

Dave

@eileenmcnaughton
Copy link
Contributor

I think @alifrumin is working on upgrading snappy #14299 - not sure how that will fit in

@JO0st
Copy link
Contributor Author

JO0st commented Jun 5, 2019

@davejenx It happens on a custom report of ours. This contains a lot of information on a single "page" in the pages variable. With a 128MB memory limit it currently craps out with a report containing about 1000 fysical pages in a single $page.

@davejenx
Copy link
Contributor

@JO0st I've done more testing, concentrating on having a lot of info on a page but it always runs out of memory in files/civicrm/templates_c/en_GB/%%F5/F54/F5478576%%Table.tpl.php, with or without this PR. It's difficult to test the PR properly without a reproducible test case that the PR fixes.

In a scenario where we don't run out of memory, I tried creating the PDF with & without the patch and comparing the output. Visually they look the same, though the files themselves differ, e.g. fonts occurring in a different order. I don't suppose that's a real concern.

So on (r-run) I would say Soft Pass, because it reportedly fixes the reporter's case and appears to do no harm. A reproducible test case would get this up to a pass if it works.

I guess it's the merger's call how to weigh up the soft passes on (r-run) & (r-tech.)

@mlutfy
Copy link
Member

mlutfy commented Nov 15, 2019

I wonder how it improves performance, but I would say that this PR improves the readability of the code, and would be in favour of merging.

@mlutfy
Copy link
Member

mlutfy commented Nov 15, 2019

jenkins, test this please

@totten
Copy link
Member

totten commented Nov 15, 2019

(@mlutfy) I wonder how it improves performance, but I would say that this PR improves the readability of the code, and would be in favour of merging.

Ditto. This probably should have been merged after @davejenx's review/r-run. I agree that the change looks basically safe and feel inclined to take @JO0st's report that it improves the situation for some scenarios.

The main risk is that this maybe grew stale - but I think it's OK. Looks like Mathieu's new test-run passed. ✅ Also, I did a very light r-run on the autobuild test site, and the PDF generator still generated PDFs. ✅

Anyway, cheers @JO0st @davejenx @mlutfy etal for moving this along.

@totten totten merged commit 08ff628 into civicrm:master Nov 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants