Memory leak on repeat requests to PDF rendered by server #258
We're running into an issue rendering PDFs on the server, where each rendered PDF seems to be creating a memory leak.
Here's a small demo Express server that demonstrates the issue:
https://gist.github.com/adamperelman/f80c6f8afcc90753187d
The memory allocated by the server increases with every request. |
I can confirm that when you call document.end() there is a memory leak detected by node-memwatch. It's even detected if you use memwatch.on('leak', function(){}). I haven't seen it leak until I call document.end(), though.
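(For reference, a minimal sketch of the kind of instrumentation being described, assuming the node-memwatch package; the file name and document contents are illustrative:)

```js
// Minimal sketch: watch for node-memwatch leak events around document.end().
var fs = require('fs');
var PDFDocument = require('pdfkit');
var memwatch = require('memwatch');

// Fires when the heap has grown steadily across several consecutive GCs.
memwatch.on('leak', function (info) {
  console.log('leak detected:', info);
});

var doc = new PDFDocument();
doc.pipe(fs.createWriteStream('out.pdf'));
doc.text('hello');
doc.end(); // per the report above, the leak is only observed after end()
```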
|
It doesn't seem obvious to me where the leak is occurring. Does it happen if you just make a bunch of empty documents every request rather than adding content? Trying to deduce where it may be happening. Could even be at the Node stream level, but that's less likely. |
I didn't notice it until I generated a large document and ran my computer out of memory, forcing lots of GCs... will try to look into it more. |
I haven't been able to isolate the problem. It is possible my problem is unrelated to PDFKit, but I'm still trying to figure it out. My statement above, that memwatch only shows the leak after calling document.end(), may also be a red herring: I think that is possibly just the only point where the JS event loop lets memwatch output its stats in my sample program. |
After a bit of poking around, it seems that the leak is occurring in addPage(). Here's a much simpler server that leaks memory every time a request is made:
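(The gist with that server isn't inlined in this thread; based on the description below, a server that builds a blank 1000-page document per request would look roughly like this sketch, with the port and page count as assumptions:)

```js
// Hypothetical reconstruction of the leaking demo server: each request
// builds a blank 1000-page PDF and streams it back to the client.
var http = require('http');
var PDFDocument = require('pdfkit');

http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'application/pdf' });
  var doc = new PDFDocument(); // the first page is created automatically
  for (var i = 0; i < 999; i++) {
    doc.addPage(); // removing this call made the leak disappear
  }
  doc.pipe(res);
  doc.end();
}).listen(3000);
```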
If I take out the call to addPage(), there's no leak, even if I add a bunch of text and graphics. |
Here's another interesting behavior: If I run the server above (which creates a blank 1000-page PDF doc each time), the heap shows the following pattern:
I'm not sure what would explain this behavior -- it seems like the first 3 requests leak memory, but subsequent requests don't. Could there be some kind of cache in PDFKit's global state that eventually starts filling up and purging old objects? I tried calling the global V8 garbage collector at the start of every request, to make sure this wasn't just a problem with lazily collected garbage, but that didn't change the behavior at all. |
No, there should be no global state anywhere within PDFKit. I can look into this more this weekend. In the meantime, have you tried disabling compression? You can do so when you create the document: var doc = new PDFDocument({ compress: false }); Just trying to weed out all possible external factors. |
Hello, I'm still trying to blame my own code for the issue, but I'm doing a comparison of heap snapshots using node-webkit-agent and I'm seeing a lot of retained closures. Below are some screenshots of what I'm looking at; in the final image I've highlighted the classes that I figure I'm leaking. ^ All of the leaked closures look to be of the same type. |
Okay... I'm not sure if the issue I'm reporting is the same issue as the OP any more, but after some diagnostics I believe that fs.createWriteStream is the problem for me. I removed PDFKit completely from my project and simply wrote out text data using the same stream I had piped into the PDF document.
Essentially, instead of writing through PDFKit, I called stream.write directly (no PDFKit involved).
I ran into the same issue: the program runs and eventually leaks enough memory to start thrashing the Node garbage collector. To further diagnose the issue, I replaced the stream in my program with a regular file descriptor and used it to write out my text instead; both variants are reconstructed in the sketch below.
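(The original snippets weren't preserved in this thread; here is a hedged sketch of the two variants being compared, with the payload text and loop count as illustrative assumptions:)

```js
// Hypothetical reconstruction of the diagnosis described above.
var fs = require('fs');

// Variant 1: push a large volume of text through fs.createWriteStream
// in a tight loop (no PDFKit involved) -- this reproduced the memory growth.
var stream = fs.createWriteStream('out.txt');
for (var i = 0; i < 1000000; i++) {
  stream.write('some text data\n');
}
stream.end();

// Variant 2: write through a raw file descriptor instead -- this ran fine.
var fd = fs.openSync('out2.txt', 'w');
for (var j = 0; j < 1000000; j++) {
  fs.writeSync(fd, 'some text data\n');
}
fs.closeSync(fd);
```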
The program ran successfully. In conclusion, I actually don't think my problem is caused by PDFKit itself, but rather by fs.createWriteStream. I'm looking at ways of working around this issue now. |
Which version of node are you running? You should report this bug to node if it hasn't already been fixed in a newer version. |
I'm running 0.10.29... I'm gonna try upgrading to 0.10.30 and see if it helps. |
To confirm: I'm seeing the same thing on Node 0.10.30. Going to try to make a minimal test case for fs.createWriteStream to make certain it's not me doing something weird. |
I've created a test case and filed a bug here: |
Okay, so I created a new file stream that doesn't have the issues I ran into: it'll still be slow if you don't set a sane growth pattern, but it doesn't die completely like fs.createWriteStream. Now that I can write out my PDF, I did notice an issue with compress: true vs compress: false, but I will open a new topic and post there. |
@devongovett, here's the simplest possible example I can come up with to reproduce the leak. There's no server or anything -- it's just creating a bunch of documents in a loop. If I turn compression off, the memory usage shrinks, but there's still a leak.
|
I'm trying to sort out a problem with compression turned on, so I will have to tackle {compress: true} later. However, I cannot confirm that PDFKit is leaking memory in your example. Here is my test: https://gist.github.com/ashelley/46a89bfa8dd09b60e79a I am using JMeter to simulate 100 concurrent users hitting the server. After 5000 requests, the server seems to be using just as much memory as when it started; node is able to garbage collect everything, no problem. Basically, I try not to read the memwatch stats output too deeply, because I think those numbers are confusing and may mislead me into thinking there is a problem when there is not. What I do to find a memory leak is put a high load on the server using JMeter and watch the output of this command:
Even after generating over 5 million pages, my server's memory usage never increases:
This is what it looks like after generating only approximately 10,000 pages:
Personally, I'd say it looks like there is no problem with compression turned off. I'm working on making it possible to test with compression turned on, but due to performance problems I need to solve first, I currently cannot test {compress: true}. EDIT: This is what the server looks like after generating over 10,000,000 pages:
|
Okay, here is the server code I am testing with: with 30 concurrent users generating a total of 1000 requests (1,000,000 pages), my node server floats at around 1 GB of RAM:
Note: going over 30 concurrent users simply eats too much memory for node to handle... you can squeeze about 40 concurrent users out of it and float at around 1.3 GB; 50 users pretty much maxes out node at around 1.6 GB. I decided to check out the 0.4.3 release of PDFKit (the last version before the streams API), and I could easily run up to 100 concurrent users... here is what that looked like (approx. 1000 requests, 1,000,000 pages):
EDIT: After 160,000,000 pages:
Note: I just copied and pasted this; at 1.2 GB it still garbage collects down to around 1 GB. |
```js
var fs = require('fs');
var PDFDocument = require('pdfkit');
var util = require('util');
var memwatch = require('memwatch');

function generatePDF(i) {
  var doc = new PDFDocument({ compress: false });
  doc.pipe(fs.createWriteStream('file' + i + '.pdf'));
  doc.end(); // returns immediately; nothing here waits for the stream to finish
}

for (var i = 0; i <= 10000; i++) {
  generatePDF(i);
  // console.log(util.inspect(process.memoryUsage()));
  var mem = process.memoryUsage();
  console.log('FINISHED ' + mem.heapUsed / 1000000 + 'MB/' + mem.heapTotal / 1000000 + 'MB - ' + mem.rss / 1000000 + 'MB');
  // global.gc();
  memwatch.gc();
}
```
This simple code is leaking and will crash with "too many open files" on '/.../..path to some font file'. |
Please check this code:
```js
var fs = require('fs');
var PDFDocument = require('pdfkit');
var util = require('util');

function generatePDF(i) {
  var doc = new PDFDocument({ compress: false });
  var wstream = fs.createWriteStream('file' + i + '.pdf');
  doc.pipe(wstream);
  doc.end();
  wstream.end(); // note: called right after doc.end(), without waiting for the pipe
}

for (var i = 0; i <= 10000; i++) {
  generatePDF(i);
  // console.log(util.inspect(process.memoryUsage()));
  var mem = process.memoryUsage();
  console.log('FINISHED ' + mem.heapUsed / 1000000 + 'MB/' + mem.heapTotal / 1000000 + 'MB - ' + mem.rss / 1000000 + 'MB');
}
```
It is a simple example of using PDFKit to generate many files. You can see how memory usage grows with each iteration. Can someone please tell me if this problem will be solved? It would be very helpful. Thank you. |
Update... even if you remove the write stream from my example (so the PDF is generated only in memory), there is still a huge memory leak, so it can't be caused by fs.createWriteStream. |
You are doing a great job, guys. |
Hello, are you sure this is a memory leak? The problem I found was related to a bug in Node streams: if you write past the stream's high-water mark, Node becomes very slow because it does a slice on an array on every write operation. It was also not easy at the time to fix PDFKit to make its code asynchronous: (note: the above issue links to the underlying issues with Node streams)
I have not worked on this project in a while, but I have a feeling some of this will be fixed in node 0.12, although making PDFKit async would also be a good thing (not sure if it is yet).
One thing to realize is that node will try to use as much memory as it can, and it's not going to GC memory it doesn't have to. Unless you can make node crash with this code by increasing the loop count and letting it run longer, you can't really treat increased memory usage as a bad thing: only V8 decides what it garbage collects, and chances are it won't collect at all until the JavaScript heap runs out of memory at around 1.8 GB or your computer runs out of memory (whichever is smaller).
EDIT: I'm not sure if this issue was linked in the above issue, so here it is: |
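(To illustrate the high-water-mark point above, here is a minimal sketch, not from the thread, of backpressure-aware writing: stop when write() returns false and resume on 'drain'. The file name, chunk contents, and count are illustrative.)

```js
// Minimal backpressure sketch: pause on a full buffer, resume on 'drain'.
var fs = require('fs');

function writeMany(stream, count, done) {
  var i = 0;
  function writeChunk() {
    var ok = true;
    while (i < count && ok) {
      i++;
      // write() returns false once the internal buffer
      // passes the stream's high-water mark
      ok = stream.write('line ' + i + '\n');
    }
    if (i < count) {
      // buffer is full; wait for it to flush before writing more
      stream.once('drain', writeChunk);
    } else {
      stream.end(done);
    }
  }
  writeChunk();
}

writeMany(fs.createWriteStream('out.txt'), 1000000, function () {
  console.log('done');
});
```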
Hello @TrashOut @miklacko @adamperelman, I've done some more work on looking at memory usage in PDFKit. The backlog is here: Specifically, relating to a memory leak, I've updated the test script described in the above issue to produce multiple documents. I configured it to use "the workaround", which requires this patch to be installed into PDFKit: Here is a link to the updated test script I used: https://gist.github.com/ashelley/bd15fbbb642ac45a1e47 I ran it overnight, constraining V8 to just 50 MB of heap space:
As you can see, the script completed, generating 2.0 GB of output (500 PDFs of 500 pages each). You can see the full output here: https://gist.github.com/ashelley/b3213e60278a7a1128aa Note that even though I restricted max_old_space_size to just 50 megabytes, performance was pretty consistent throughout the night, and it definitely did not OOM (which is what happens if you run this script without the workaround).
There was a consistent slowdown near the end (potentially due to gradual GC thrashing), but I can't confirm a memory leak caused by PDFKit... it could have just been the morning processes on my system or a problem with my own test script. I still want to say there is no memory leak in PDFKit, given that this test script was able to complete in full within a very restricted memory footprint. Hope this helps. |
Old thread, but I wanted to share my experience to possibly help the next person. I also thought PDFKit had a memory leak, but @ashelley is right: it doesn't. What I didn't realize was that, used wrong, PDFKit makes it easy to create a memory leak when you're generating multiple documents. So, let's assume you're doing something like this:
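(The original snippet isn't preserved in this thread; the pattern being described, generating documents back-to-back without waiting for each write to finish, presumably looked something like this sketch:)

```js
// Hypothetical reconstruction: generate many reports in a tight loop,
// never waiting for the previous document to finish writing.
var fs = require('fs');
var PDFDocument = require('pdfkit');

for (var i = 0; i < 1000; i++) {
  var doc = new PDFDocument();
  doc.pipe(fs.createWriteStream('report' + i + '.pdf'));
  doc.text('Report #' + i);
  doc.end(); // end() only *starts* flushing; the data is still buffered
}
```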
You'd probably be surprised to see that you're running out of memory. Isn't everything done when you call doc.end()? NO IT'S NOT! The solution? Do this instead:
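(Again a reconstruction, since the original code isn't shown: the fix being described waits for each write stream's 'finish' event before starting the next document.)

```js
// Hypothetical reconstruction: chain document creation off the write
// stream's 'finish' event so only one report is in memory at a time.
var fs = require('fs');
var PDFDocument = require('pdfkit');

function generateReport(i, total) {
  if (i >= total) return;
  var doc = new PDFDocument();
  var stream = fs.createWriteStream('report' + i + '.pdf');
  doc.pipe(stream);
  doc.text('Report #' + i);
  doc.end();
  // only start the next report once this one is fully flushed to disk
  stream.on('finish', function () {
    generateReport(i + 1, total);
  });
}

generateReport(0, 1000);
```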
The reports will still be created in order, but NodeJS will have a chance to write the first report to disk before creating the second report, and so on. |
Some mitigations were made in recent releases. Test with the current release and reopen if you're still having the issue.
I've been trying to track down a memory leak in my pdfkit-based project and stumbled on this thread. Currently, we are pinned to version 0.11.0 of pdfkit. I seem to have pinpointed the issue to the fonts mixin, but I am not sure how to properly fix it. You can reproduce it by following the steps below.
Running the first script, the memory continues to climb; running the second script, it stays relatively stable. I think the culprit may be the standard fonts: I've also experimented with using a TTF file (Roboto) as the default font, and memory usage remained relatively low. |
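(The scripts themselves weren't captured in this thread; based on the description, the pair being compared presumably looked something like this, with font names and loop counts as illustrative assumptions:)

```js
// Hypothetical reconstruction of the two scripts being compared.
var PDFDocument = require('pdfkit');

// Script 1: use a built-in standard font -- memory climbs.
function withStandardFont() {
  for (var i = 0; i < 10000; i++) {
    var doc = new PDFDocument();
    doc.font('Helvetica').text('hello'); // standard-font path in the fonts mixin
    doc.end();
  }
}

// Script 2: use an embedded TTF instead -- memory stays relatively stable.
function withTtfFont() {
  for (var i = 0; i < 10000; i++) {
    var doc = new PDFDocument();
    doc.font('Roboto-Regular.ttf').text('hello'); // hypothetical file path
    doc.end();
  }
}
```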
Was also experiencing a major memory leak while generating only roughly 60 documents (one-page invoices). I'm using a custom font, which I registered as described below. When I replaced the reference to my "normal" font definition, the memory was not leaking anymore. |
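(The registration snippet wasn't preserved; custom font registration in PDFKit typically looks like the following sketch, where the font name and file path are placeholders:)

```js
// Hypothetical example of registering and using a custom TTF font in PDFKit.
var fs = require('fs');
var PDFDocument = require('pdfkit');

var doc = new PDFDocument();
doc.pipe(fs.createWriteStream('invoice.pdf'));
doc.registerFont('normal', 'fonts/MyFont-Regular.ttf'); // placeholder path
doc.font('normal').text('Invoice #1234');
doc.end();
```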