Possible memory leak following rendering #220
First off: awesome work with WeasyPrint, it's super great!

We're using WeasyPrint under Django and are experiencing a memory usage issue. If we make a big PDF (~360 pages with graphics), at the moment we call `document.render()` the memory usage jumps up a lot (to about 1.4G). The problem is that once the PDF is made and the request is finished, the web server is still hanging onto that slice of memory; it seems never to be released.

Is anyone aware of memory release issues using WeasyPrint? The relevant code is here:

Maybe we should be spawning a new process to run WeasyPrint in for such large PDFs? We've tried calling `gc.collect()` to no effect.

Any ideas would be appreciated, cheers

-ivan

Comments
WeasyPrint is known to use a lot of memory during rendering. Maybe that can be reduced, but I suspect not dramatically without deep refactoring. Now, memory not being freed is another story. These may be relevant: http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm It’s also possible that there is indeed a leak in WeasyPrint or one of its dependencies. Unfortunately, if that is the case, I don’t have an immediate idea of where to look, nor much time available to investigate. I’m interested in your findings if you want to look into it, though.
@ivanprice I've been using WeasyPrint to generate lots of big reports and documents; it can use a lot of memory, but it doesn't leak, at least for me. I've also spent a lot of time tracking memory usage in #384 and fixed all the memory leaks introduced by adding …
I'm having a similar issue with WeasyPrint running in a Celery task. I'm using the following line of code to render the PDF into a file-like object:

```python
content_file = ContentFile(HTML(string=html, encoding='utf-8').write_pdf())
```

The memory is never reclaimed after the task finishes, even though the WeasyPrint objects shouldn't even exist by that point. Any hints about where to even start looking into it?
As explained in the comment above, there's no reliable way to free the memory in Python, even with the garbage collector. The way to know whether there's a "real" memory leak is to call your rendering multiple times and see if the memory keeps growing. I've tried hard to hunt and kill memory leaks in #384, but I may have missed some of them, especially with tables (see #70). I'll try with big tables and see if I can reproduce. If you can provide a sample HTML file, it may help too.
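A minimal sketch of that repeated-rendering test (my own illustration, not from the thread), assuming the big_table.html sample from this issue; `resource` is Unix-only, and `ru_maxrss` is reported in kilobytes on Linux:

```python
import resource

import weasyprint

def max_rss_kb():
    # Peak resident set size of this process so far (kB on Linux).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

for i in range(5):
    weasyprint.HTML('/tmp/big_table.html', encoding='utf-8').write_pdf('/dev/null')
    print('run %d: peak RSS %d kB' % (i, max_rss_kb()))
```

A real leak shows up as the peak climbing on every run; a plateau after the first run just means the allocator is keeping the pages, as discussed further down.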
Here it is: Thanks for looking into it! Just for reference, I tried running the same function, which generates this PDF, in an IPython shell, and it also doesn't reclaim the memory after the function returns. However, on multiple runs it only grows slightly past that initial 1.4G, to about 1.5G. So the bulk of it probably isn't really a memory leak, but it's still odd that such a huge amount of memory doesn't get reclaimed.
The number of objects seems to increase substantially between the runs:
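One way to see which objects accumulate (a hypothetical debugging snippet of mine, not from the thread, reusing the same render call) is to count live objects by type around a render and diff the counts:

```python
import gc
from collections import Counter

import weasyprint

def objects_by_type():
    # Count every object the garbage collector tracks, keyed by type name.
    return Counter(type(o).__name__ for o in gc.get_objects())

before = objects_by_type()
weasyprint.HTML('/tmp/big_table.html', encoding='utf-8').write_pdf('/dev/null')
gc.collect()
after = objects_by_type()

# The ten types whose live-object counts grew the most across the run.
for name, delta in (after - before).most_common(10):
    print(name, '+%d' % delta)
```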
I tried this code with your sample:

```python
import gc
import weasyprint

print(len(gc.get_objects()))
for i in range(3):
    weasyprint.HTML('/tmp/big_table.html', encoding='utf-8').write_pdf('/dev/null')
    print(len(gc.get_objects()))
```

I get:
It looks normal. I'm using:
Note that using an interactive shell may introduce side effects. For example, the default Python shell keeps the result of the last command in a variable called `_`.
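For example (a quick Python 3 REPL illustration of mine; addresses will differ):

```python
>>> big = bytes(10**8)   # allocate ~100 MB
>>> memoryview(big)      # the echoed result is also saved in `_`
<memory at 0x7f...>
>>> del big              # does not free the buffer: the memoryview
>>> _                    # still held by `_` keeps it alive
<memory at 0x7f...>
```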
I can confirm that this minimal example doesn't increase the object count between the runs, so this might be something else. However, I can see that the memory is not being reclaimed anyway. The environment running this code for me is rather old: Ubuntu 12.04, Python 2.7.3, latest WeasyPrint, Cairo 1.10.2, Pango 1.30.0. I thought it might be better with more recent dependencies, so I ran it on my host with Fedora 25, Python 3.5.3, latest WeasyPrint, Cairo 1.14.8, Pango 1.40.5. In this case the total memory usage is somewhat better, but it still doesn't reclaim anything until process shutdown. I also tried it with Fedora's bundled WeasyPrint 0.22, and there the object count actually did increase, while using no more than 900M of RSS. It didn't reclaim the memory either, though.
When objects are garbage-collected the space they occupied becomes available for new Python objects but is not necessarily returned to the operating system, because of details of how the memory allocator works: http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm
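The effect is easy to observe (a Linux-only sketch of mine, reading RSS from /proc; the exact numbers are illustrative):

```python
import gc

def rss_kb():
    # Current resident set size, from the kernel's point of view (Linux-only).
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

print('baseline:', rss_kb(), 'kB')
data = [str(i) for i in range(2 * 10**6)]   # millions of small objects
print('allocated:', rss_kb(), 'kB')
del data
gc.collect()
print('after del + gc.collect():', rss_kb(), 'kB')  # often stays well above baseline
```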
@SimonSapin Thanks, I've read it, but this can actually be a problem in a memory-constrained environment (such as typical EC2 instances). Basically there seem to be two possible ways from here:

- run the rendering in a separate process that can be killed to give the memory back to the OS, or
- reduce WeasyPrint's memory footprint on large documents.
Killing the process is the only reliable way to get your memory back.
#70 is for you!
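A minimal sketch of that workaround (my own illustration; `render_in_subprocess` is a hypothetical helper, not part of WeasyPrint): do the rendering in a throwaway child process so that all of its memory goes back to the OS when it exits.

```python
import multiprocessing

import weasyprint

def _render(html_string, pdf_path):
    weasyprint.HTML(string=html_string, encoding='utf-8').write_pdf(pdf_path)

def render_in_subprocess(html_string, pdf_path):
    # The child process does the expensive render; joining it guarantees
    # the OS has reclaimed the child's memory before we return.
    p = multiprocessing.Process(target=_render, args=(html_string, pdf_path))
    p.start()
    p.join()
    if p.exitcode != 0:
        raise RuntimeError('rendering failed with exit code %s' % p.exitcode)
```

On platforms that use the spawn start method, the module must be importable without side effects (the usual `if __name__ == '__main__':` guard applies); the cost per call is starting an interpreter and re-importing WeasyPrint, which the fork start method on Linux mostly avoids.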
As there's no evidence that there's a memory leak (even though I tried hard many times to find one), I think that we can close the bug. Decreasing WeasyPrint's memory footprint with large tables is a good idea; #70 is open to track this improvement.
Thank you @liZe and @SimonSapin for looking into it!
@excieve FWIW we pass …