Export fails on initial request with "ContentNotFoundError" from wkhtmltopdf #2

Open
robbieaverill opened this issue Jan 24, 2018 · 4 comments

Comments

@robbieaverill
Contributor

robbieaverill commented Jan 24, 2018

From #1:

Noted issue: clicking the "Export to PDF" link for a page initially fails with this message:

Fatal error: wkhtmltopdf failed: Exit with code 1 due to network error: ContentNotFoundError in /webroot/vendor/cwp/cwp-pdfexport/src/Extensions/PdfExportControllerExtension.php on line 164

Subsequent requests succeed and open the PDF.

I suspect this could be a macOS issue with piping the wkhtmltopdf output to stdout. We could write it to the PHP temp dir instead of piping to stdout.
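
A minimal sketch of the temp-file idea, written in Python for brevity rather than the module's PHP. `build_export_command` is a hypothetical helper, not part of the module. The only wkhtmltopdf behaviour assumed is its `wkhtmltopdf <input> <output>` invocation, where `-` as the output means stdout.

```python
import os
import tempfile


def build_export_command(url, binary="wkhtmltopdf"):
    """Return (argv, output_path) for rendering `url` into a temp file.

    Hypothetical helper: the output goes to a file in the system temp
    directory instead of "-" (stdout), so nothing is piped.
    """
    fd, output_path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)  # wkhtmltopdf would open the path itself
    return [binary, url, output_path], output_path
```

The caller would run the returned `argv`, read the PDF from `output_path`, and delete the file afterwards.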


To reproduce:

  • Ensure PDF export is enabled for pages
  • Load a page on the frontend
  • Click "Export to PDF" (note - I'm using the starter theme)
  • Observe this error
  • Refresh the screen and observe a valid download

You can clear the PDF generated on the initial load with vendor/bin/sake dev/tasks/CleanupGeneratedPdfBuildTask. Running this task after each attempt lets you reproduce the error on the next initial export.

@robbieaverill
Contributor Author

robbieaverill commented Jan 25, 2018

OK, piping to stdout isn't the issue. It looks like an error in wkhtmltopdf triggered by certain kinds of content in the HTML. We're not the only ones! See wkhtmltopdf/wkhtmltopdf#2051. I've updated the title to reflect this.

robbieaverill changed the title from "Modify exec command to write to temp file rather than stdout" to "Export fails on initial request with "ContentNotFoundError" from wkhtmltopdf" on Jan 25, 2018
@robbieaverill
Contributor Author

The cause is one or more of the following:

  1. a page has a URL without a protocol in it, e.g. <script src="//jquery.cdn.com/whatever.js"> - very common
  2. a page has a URL with a query string in the URL, e.g. <script src="http://mysite.com/javascript.js?m=1u983243"> - SS framework adds this to resources by default
  3. a page has a URL which is 404

We can "fix" the first two points with string manipulation, but we can't do anything about broken links in the page. It looks like a combination of one or more of those things causes the PDF generation to fail with an error message on the first attempt. It still generates a PDF but there's no guarantee that it would look the same as you'd expect it to.
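
The string manipulation for the first two points could look roughly like this. It's a sketch in Python rather than the module's PHP, and `normalise_url`, `normalise_html` and the `default_scheme` parameter are hypothetical names. It only touches `src`/`href` attributes inside `<script>` and `<link>` tags, so ordinary links keep their query strings.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Opening <script ...> and <link ...> tags only.
TAG = re.compile(r"<(?:script|link)\b[^>]*>", re.IGNORECASE)
# A quoted src= or href= attribute (but not e.g. data-src=).
ATTR_URL = re.compile(
    r"(?<![\w-])(?P<attr>(?:src|href)=)(?P<quote>[\"'])(?P<url>.*?)(?P=quote)",
    re.IGNORECASE,
)


def normalise_url(url, default_scheme="http"):
    """Add a scheme to protocol-less URLs and drop any query string."""
    if url.startswith("//"):
        url = f"{default_scheme}:{url}"
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))


def normalise_html(html, default_scheme="http"):
    """Rewrite resource URLs in script/link tags before PDF generation."""

    def fix_attr(match):
        url = normalise_url(match.group("url"), default_scheme)
        return f"{match.group('attr')}{match.group('quote')}{url}{match.group('quote')}"

    def fix_tag(match):
        return ATTR_URL.sub(fix_attr, match.group(0))

    return TAG.sub(fix_tag, html)
```

Regex-based rewriting is fragile against unusual markup, so this is only meant to show the shape of the workaround.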

I don't think we can fix this problem. There are 70 others on the wkhtmltopdf GitHub issue reporting the same problem, and it hasn't been acknowledged or fixed in 3 years, so I don't expect it will be. See wkhtmltopdf/wkhtmltopdf#2051.

Our options:

  1. ignore the errors and use whatever it DOES generate, and note this in the documentation (quick fix)
  2. use a different PDF generation library (more work)
  3. remove this functionality altogether and tell people to use their browsers to export PDFs (longer term this is the best solution)
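
Option 1 could be sketched like this, again in Python rather than the module's PHP. `run_tolerant` is a hypothetical helper: it accepts a non-zero exit code as long as a non-empty output file was produced, and hands back the error text so it can be logged instead of raised.

```python
import os
import subprocess


def run_tolerant(argv, output_path):
    """Run a PDF command and tolerate a non-zero exit if a file was written.

    Returns None on a clean exit, or the stderr text as a warning when the
    command failed but still produced output. Raises RuntimeError when no
    usable file exists afterwards.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    produced = os.path.isfile(output_path) and os.path.getsize(output_path) > 0
    if not produced:
        raise RuntimeError(
            f"no output generated (exit code {result.returncode}): "
            f"{result.stderr.strip()}"
        )
    if result.returncode != 0:
        return result.stderr.strip()  # e.g. the ContentNotFoundError message
    return None
```

The trade-off is the one noted above: the PDF exists, but nothing guarantees it looks as expected, so the warning should at least be logged.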

@robbieaverill
Contributor Author

Further - regression tested against CWP 1.x:

  • Relative URLs don't cause this
  • Protocol-less URLs don't cause this
  • Query string cache busting doesn't cause this
  • 404 links in content don't cause this
  • 404 links in script or stylesheet tags do cause this

Since it affects the CWP 1.x equivalent code as well, I'm going to reduce the impact slightly and treat it as non-blocking for release. The error is triggered by script and stylesheet references that don't exist, and we should be able to expect few of these, since those resources are generally critical for a website to function. 404s in content would be more concerning, but they don't trigger the error.
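
Since only missing script and stylesheet references trigger the failure, a page could be checked for them before export. A rough sketch in Python (not the module's PHP): `AssetCollector` and `find_missing_assets` are hypothetical names, and `exists` is a caller-supplied check, for example an HTTP HEAD request, which isn't implemented here.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect URLs from <script src> and <link rel="stylesheet" href> tags."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.assets.append(attrs["href"])


def find_missing_assets(html, exists):
    """Return script/stylesheet URLs for which `exists(url)` is falsy."""
    collector = AssetCollector()
    collector.feed(html)
    return [url for url in collector.assets if not exists(url)]
```

Links in page content (`<a href>`) are deliberately ignored, matching the regression results above.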

@robbieaverill
Contributor Author

For clarity, the affects/v3 label refers to the same functionality in the cwp/cwp module
