Memory leak on repeat requests to PDF rendered by server #258

Closed
adamperelman opened this issue Jul 30, 2014 · 27 comments

@adamperelman

We're running into an issue rendering PDFs on the server, where each rendered PDF appears to leak memory.

Here's a small demo Express server that demonstrates the issue:
https://gist.github.com/adamperelman/f80c6f8afcc90753187d

The memory allocated by the server increases with every request.

@ashelley

I can confirm that when you call document.end() there is a memory leak detected by node-memwatch. It's even detected if you use memwatch.on('leak', function(){}).

I haven't seen it leak until I call document.end, though.

{ start: Wed Jul 30 2014 14:55:16 GMT-0700 (PDT),
  end: Wed Jul 30 2014 14:55:16 GMT-0700 (PDT),
  growth: 85262720,
  reason: 'heap growth over 5 consecutive GCs (0s) - -2147483648 bytes/hr' }
{ start: Wed Jul 30 2014 14:55:16 GMT-0700 (PDT),
  end: Wed Jul 30 2014 14:55:16 GMT-0700 (PDT),
  growth: 403146464,
  reason: 'heap growth over 5 consecutive GCs (0s) - -2147483648 bytes/hr' }

@devongovett
Member

It doesn't seem obvious to me where the leak is occurring. Does it happen if you just make a bunch of empty documents on every request, rather than adding content? I'm trying to deduce where it may be happening. It could even be at the Node stream level, but that's less likely.
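
For illustration, the empty-document test I have in mind would be something like this (a sketch; no content is added to the document at all):

var express = require('express');
var PDFDocument = require('pdfkit');
var router = express.Router();

router.get('/', function(req, res) {
  // Create and immediately finalize a completely empty document,
  // streaming the PDF straight back as the response.
  var doc = new PDFDocument();
  doc.pipe(res);
  doc.end();
});

module.exports = router;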

@ashelley

I didn't notice it until I generated a large document and ran my computer out of memory, forcing lots of GCs... will try to look into it more.

@ashelley

I haven't been able to isolate the problem. It is possible my problem is unrelated to pdfkit, but I'm still trying to figure it out. My statement above, that memwatch only shows the leak after calling document.end, may also be a red herring: I think that is possibly just the only chance the JS event loop gives memwatch to output the stats in my sample program.

@adamperelman
Author

After a bit of poking around, it seems that the leak is occurring in addPage().

Here's a much simpler server that leaks memory every time a request is made:

var express = require('express');
var PDFDocument = require('pdfkit');
var memwatch = require('memwatch');
var router = express.Router();

router.get('/', function(req, res) {
  var hd = new memwatch.HeapDiff();

  console.log("creating document...");

  var doc = new PDFDocument();
  for (var i = 0; i < 1000; i++) {
    doc.addPage();
  }
  doc.end();

  var diff = hd.end();
  console.log("diff:", diff);

  res.send(diff);
});

module.exports = router;

If I take out the call to addPage(), there's no leak, even if I add a bunch of text and graphics.

@adamperelman
Author

Here's another interesting behavior:

If I run the server above (which creates a blank 1000-page PDF doc each time), the heap shows the following pattern:

  • after 1 request: ~25MB of memory is used by the heap
  • after 2 requests: ~50MB is used
  • after 3 requests: ~75MB is used
  • after more than 3 requests: still ~75MB; the heap never seems to grow beyond that, even if I continue refreshing the page

I'm not sure what would explain this behavior -- it seems like the first 3 requests leak memory, but subsequent requests don't.

Could there be some kind of cache in the global state of PDFKit that eventually starts filling up and purging old objects?

I tried calling the global v8 garbage collector at the start of every request, to make sure this wasn't just a matter of lazily collected garbage, but that didn't change the behavior at all.
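
For reference, forcing the GC like that requires starting node with --expose-gc; a minimal sketch of the measurement loop I mean (same shape as the server above):

// Run with: node --expose-gc server.js
var express = require('express');
var PDFDocument = require('pdfkit');
var router = express.Router();

router.get('/', function(req, res) {
  global.gc(); // force a full V8 collection before measuring

  var doc = new PDFDocument();
  for (var i = 0; i < 1000; i++) {
    doc.addPage();
  }
  doc.end();

  res.send('heapUsed: ' + process.memoryUsage().heapUsed);
});

module.exports = router;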

@devongovett
Member

No, there should be no global state anywhere within PDFKit. addPage creates a PDFPage object each time it is called, which obviously allocates memory. However, this page object only stays around until it is written to the file, which happens whenever another page is added.

I can look into this more this weekend. In the meantime, have you tried disabling compression? You can do so when you create the document:

var doc = new PDFDocument({ compress: false });

Just trying to weed out all possible external factors.

@ashelley

Hello,

I'm still trying to blame my own code for the issue, but I'm doing a comparison of heap snapshots using node-webkit-agent and I'm seeing a lot of retained closures.

Below are some screenshots of what I'm looking at. In the final image I've highlighted the classes that I figure I'm leaking.

[screenshot: closure]

^ all of the closures leaked look to be of the same type

[screenshots: buffer, summary]

@ashelley

Okay....

I'm not sure the issue I'm reporting is the same as the OP's anymore, but after some diagnostics I believe that fs.createWriteStream is the problem for me.

I removed PDFKit completely from my project and simply wrote out text data using the same stream I had piped the PDF document into.

var stream = fs.createWriteStream('output.pdf');
doc.pipe(stream);

Essentially, instead of writing

doc.text() 

I called stream.write directly (no pdfkit involved)

stream.write() 

I ran into the same issue: the program runs and eventually leaks enough memory to start thrashing on the node garbage collector.

To further diagnose the issue I replaced the stream in my program with a regular file descriptor and used fs.writeSync on it to write out my text instead (a fuller sketch is at the end of this comment):

fs.writeSync(fd, buf, 0, buf.length)

The program ran successfully.

In conclusion, I actually don't think my problem is caused by pdfkit itself, but rather by fs.createWriteStream.

I'm looking at ways of working around this issue now.
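
For reference, a minimal sketch of the fd-based approach (the file name and data are illustrative):

var fs = require('fs');

// Open a plain file descriptor instead of a Writable stream.
var fd = fs.openSync('output.txt', 'w');
var buf = new Buffer('some text data\n');

for (var i = 0; i < 100000; i++) {
  // Synchronous write through the descriptor: no stream buffering,
  // so nothing accumulates in memory between writes.
  fs.writeSync(fd, buf, 0, buf.length);
}

fs.closeSync(fd);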

@devongovett
Member

Which version of node are you running? You should report this bug to node if it hasn't already been fixed in a newer version.

@ashelley

I'm running 10.29... I'm going to try upgrading to 10.30 and see if it helps.

@ashelley

To confirm: I'm seeing the same thing on node 10.30. Going to try to make a minimal test case for fs.createWriteStream, to make certain it's not me doing something weird.

@ashelley

ashelley commented Aug 1, 2014

I've created a test case and filed a bug here:

nodejs/node-v0.x-archive#8048

@ashelley

ashelley commented Aug 1, 2014

Okay, so I created a new file stream that doesn't have the issues I ran into:
https://github.com/ashelley/writable-filestream

It'll still be slow if you don't set a sane growth pattern, but it doesn't die completely like fs.createWriteStream. Now that I can write out my PDF, I did notice an issue with compress: true vs compress: false, but I will open a new topic and post there.

@adamperelman
Author

@devongovett, here's the simplest possible example I can come up with to reproduce the leak. There's no server or anything -- it's just creating a bunch of PDFDocuments in a loop.

If I turn compression off, the memory usage shrinks, but there's still a leak.

# Run with: coffee --nodejs --expose-gc demo.coffee

PDFDocument = require 'pdfkit'
memwatch = require 'memwatch'

for i in [1..20]
  # Force garbage collection.
  gc()

  # Capture a snapshot of the heap.
  hd = new memwatch.HeapDiff()

  # Create a doc with a bunch of blank pages.
  console.log "creating document #{i}"
  doc = new PDFDocument compress: false
  for j in [1..500]
    doc.addPage()
  doc.end()

  # Voila -- a leak!
  # The size of the heap increases by about 10mb
  # with each iteration of the loop (if compression is on),
  # or about 1.5mb per iteration (if compression is off).
  diff = hd.end()
  console.log diff

@ashelley

@adamperelman

I'm trying to sort out the problem with turning compression on, so I will have to tackle {compress: true} later. However, I cannot confirm that pdfkit is leaking memory in your example.

Here is my test:

https://gist.github.com/ashelley/46a89bfa8dd09b60e79a

I am using jmeter to simulate 100 concurrent users hitting the server. After 5000 requests, the server seems to be using just as much memory as when it started. It seems like node is able to garbage collect everything, no problem.

Basically, I try not to look at the memwatch stats output too deeply, because I think those numbers are confusing and might mislead me into thinking there is a problem when there is not.

What I do for finding a memory leak is put a high load on the server using jmeter and watch the output of this command:

top -p `pgrep node | tr "\\n" "," | sed 's/,$//'`

Even after generating over 5 million pages, my server never increases its memory usage:

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND                                                                                                              
14115 adam      20   0  706m  75m 5980 R  89.0  3.7   9:07.24 node 

This is what it looks like after generating approximately 10000 pages:

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND                                                                                                              
14170 adam      20   0  704m  73m 5980 R  96.9  3.7   0:13.30 node                                                                                                                 

Personally, I'd say it looks like there is no problem with compression turned off. I'm working on making it possible to test with compression turned on, but due to performance problems that I need to solve first, I currently cannot test {compress: true}.

EDIT:

This is what the server looks like after generating over 10000000 pages

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND                                                                                                              
14170 adam      20   0  705m  74m 5980 R  90.9  3.7  24:26.46 node 

@ashelley

@adamperelman

Okay, I think I can confirm that with {compress: true}, pdfkit doesn't have a memory leak.

Here is the server code I am testing with:
https://gist.github.com/ashelley/20b1da610efd46202286

With 30 concurrent users generating a total of 1000 requests (1000000 pages) my node server floats at around 1GB of ram:

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND                                                                                                                                                                                                          
20956 adam      20   0 1732m 1.0g 6024 R  88.6 53.0   6:58.94 node   

Note: going over 30 concurrent users simply eats too much memory for node to handle... you can squeeze about 40 concurrent users out of it and float at around 1.3GB; 50 users pretty much maxes out node at around 1.6GB.

I decided to check out the 0.4.3 release of pdfkit (the last version before the streams API) and I could easily run up to 100 concurrent users... here is what that looked like (approx. 1000 requests, 1000000 pages):

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND                                                                                                                                                                                                          
20462 adam      20   0 1575m 895m 6024 R  91.3 44.7   5:08.16 node

EDIT:

After 160000000 pages

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND                                                                                                                                                                                                          
21016 adam      20   0 1921m 1.2g 3284 R  75.9 62.2   1153:33 node        

Note: I just copy-pasted it at 1.2G; it still garbage collects down to around 1GB.

@miklacko

miklacko commented Sep 5, 2014

var fs = require('fs');
var PDFDocument = require('pdfkit');
var util = require('util');
var memwatch = require('memwatch');

function generatePDF (i) {
  var doc = new PDFDocument({ compress: false });
  doc.pipe(fs.createWriteStream('file' + i + '.pdf'));
  doc.end();
}

for (var i=0; i<=10000; i++) {
  generatePDF(i);
  // console.log(util.inspect(process.memoryUsage()));
  var mem = process.memoryUsage();
  console.log('FINISHED ' + mem.heapUsed/1000000 +'MB/'+mem.heapTotal/1000000+'MB  -  ' + mem.rss/1000000+'MB');
  //global.gc();
  memwatch.gc();
}

This simple code is leaking and will crash with 'too many open files' on '/.../..path to some font file'.
I think you open the font file and never close it, so it stacks up in memory.
Also check the console.log output of memory usage: it rises with each iteration.

@miklacko

miklacko commented Oct 1, 2014

Please check this code

var fs = require('fs');
var PDFDocument = require('pdfkit');
var util = require('util');

function generatePDF (i) {
  var doc = new PDFDocument({ compress: false });
  var wstream = fs.createWriteStream('file' + i + '.pdf');
  doc.pipe(wstream);
  doc.end();
  wstream.end();
}

for (var i=0; i<=10000; i++) {
  generatePDF(i);
  // console.log(util.inspect(process.memoryUsage()));
  var mem = process.memoryUsage();
  console.log('FINISHED ' + mem.heapUsed/1000000 +'MB/'+mem.heapTotal/1000000+'MB  -  ' + mem.rss/1000000+'MB');
}

It is a simple example of using pdfkit to generate many files. You can see how memory usage grows with each iteration.

Please, can someone tell me if this problem will be solved? It would be very helpful.

Thank you.

@miklacko

miklacko commented Oct 1, 2014

Update....

Even if you remove the writeStream from my example (so the PDF is generated only in memory), there is still a huge memory leak. So it can't be caused by fs.createWriteStream.
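
For reference, the in-memory variant looks roughly like this (a sketch; the document's readable side is consumed and discarded. Note that if the output is never consumed at all, the generated chunks simply accumulate in the stream's internal buffer, which by itself looks like growing memory):

var PDFDocument = require('pdfkit');

function generatePDF() {
  var doc = new PDFDocument({ compress: false });
  // Consume and discard the output so the PDF is generated
  // purely in memory and nothing is written to disk.
  doc.on('data', function () {});
  doc.end();
}

for (var i = 0; i <= 10000; i++) {
  generatePDF();
}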

@TrashOut

You are doing a great job, guys.
It's leaking as described above by Michal Klacko.
We are running on Heroku. Do you know how to fix it?

@ashelley

Hello,

Are you sure this is a memory leak?

The problem I found was related to a bug in node streams: if you write past the stream's high water mark, node becomes very slow, because it does a slice on an array on every write operation. It was also not easy at the time to change pdfkit to make its code asynchronous:

#269

(note: the above issue links to the underlying issues with node streams)

I have not worked on this project in a while, but I have a feeling some of this will be fixed in node 0.12, although making pdfkit async would also be a good thing. (Not sure if it is yet.)

One thing to realize is that node will try to use as much memory as it can, and it's not going to GC memory it doesn't have to. Unless you can make node crash with this code by increasing the loop count and letting it run longer, you can't really treat increased memory usage as a bad thing: only v8 decides what it's going to garbage collect, and chances are it won't garbage collect much at all until the JavaScript heap runs out of memory at around 1.8GB, or your computer runs out of memory (whichever is smaller).

EDIT:

I'm not sure if this issue was linked in the above issue so here it is:

nodejs/node-v0.x-archive#6623

@ashelley

Hello @TrashOut @miklacko @adamperelman

I've done some more work on looking at memory usage in PDFKit. The backlog is here:

#269 (comment)

Specifically, relating to a memory leak, I've updated the test script described in the above issue to produce multiple documents. I configured it to use "the workaround", which requires this patch to be installed into pdfkit:

ashelley@82bdfb6

Here is a link to the updated test script I used:

https://gist.github.com/ashelley/bd15fbbb642ac45a1e47

I ran it overnight, constraining v8 to just 50MB of heap space:

$ time node --max_old_space_size=50 pdfkit-perf-test-multiple-docs.js
...
real    666m45.933s
user    251m14.190s
sys     119m25.874s

As you can see, the script completed, generating 2.0GB of PDFs (500 × 500-page documents). You can see the full output here:

https://gist.github.com/ashelley/b3213e60278a7a1128aa

Note that even though I restricted max_old_space_size to just 50 megabytes, performance was pretty consistent throughout the night, and it definitely did not OOM (which is what happens if you try to run this script without my workaround):

$ time node --max_old_space_size=50 perf3-test1.js 
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted (core dumped)

There was a consistent slowdown near the end (potentially due to gradual GC thrashing), but I can't confirm there was a memory leak caused by PDFKit: it could just have been the morning processes on my system, or a problem with my own test script. I still want to say there is no memory leak in PDFKit, given that this test script was able to complete in full within a very restricted memory footprint.

Hope this helps

@rsshilli

rsshilli commented Oct 4, 2017

Old thread, but I wanted to share my experience to possibly help the next person. I also thought PDFKit had a memory leak, but @ashelley is right: it doesn't. What I didn't realize was that if you use it wrong, it's easy to create a memory leak when you're creating multiple documents.

So, let's assume you're doing something like this:

for(let report of allReports) {
    let doc = new PDFDocument({bufferPages: true});
    doc.pipe(fs.createWriteStream(createFilenameForReport(report)));
    writeReport(doc);
    doc.end();
}

You'd probably be surprised to see that you're running out of memory. Isn't everything done when you call doc.end()? NO IT'S NOT! doc.end() does not actually write everything to disk. It just flushes everything to the stream, which will only write it all to disk once the synchronous code is done (i.e. at the end of your program).

The solution? Do this instead:

let Promise = require("bluebird");
Promise.reduce(allReports, (ignored, report) => {
    return new Promise(resolve => {
        let doc = new PDFDocument({bufferPages: true});
        let stream = fs.createWriteStream(createFilenameForReport(report));
        doc.pipe(stream);
        writeReport(doc);
        stream.on("finish", () => {
          console.log("Finished writing to disk.  Only now can the memory be freed.");
          resolve();
        });
        doc.end();
    });
}, 0);

The reports will still be created in order, but NodeJS will have a chance to write the first report to disk before creating the second report, and so on.
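
For completeness, the same pattern without bluebird, using native promises and async/await (a sketch, assuming a Node version with async function support; writeReport and createFilenameForReport are the same placeholders as above):

async function writeAllReports(allReports) {
  for (const report of allReports) {
    await new Promise((resolve, reject) => {
      const doc = new PDFDocument({bufferPages: true});
      const stream = fs.createWriteStream(createFilenameForReport(report));
      doc.pipe(stream);
      writeReport(doc);
      // Resolve only once the bytes are actually on disk.
      stream.on("finish", resolve);
      stream.on("error", reject);
      doc.end();
    });
  }
}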

@blikblum
Member

blikblum commented Mar 3, 2019

Some mitigations were done in recent releases. Test with the current release and reopen if you're still having the issue.

@blikblum blikblum closed this as completed Mar 3, 2019
@fstrube

fstrube commented Mar 26, 2022

I've been trying to track down a memory leak in my pdfkit-based project and stumbled on this thread. Currently, we are pinned to version 0.11.0 of pdfkit.

I seem to have pinpointed the issue to the fonts mixin, but I am not sure how to properly fix it. You can reproduce it by following the steps below.

  1. Set the available memory for node:
export NODE_OPTIONS="--max-old-space-size=1024"
  2. Execute the following script:
        const PDFDocument = require('pdfkit');

        console.log('memory:', process.memoryUsage().heapUsed / 1024 / 1024, 'MB /', process.memoryUsage().heapTotal / 1024 / 1024, 'MB');

        for (let i = 0; i < 5000; i++) {
            new PDFDocument();
            console.log(i, 'memory:', process.memoryUsage().heapUsed / 1024 / 1024, 'MB /', process.memoryUsage().heapTotal / 1024 / 1024, 'MB');
        }
  3. Compare with the following script:
        const PDFDocument = require('pdfkit');

        console.log('memory:', process.memoryUsage().heapUsed / 1024 / 1024, 'MB /', process.memoryUsage().heapTotal / 1024 / 1024, 'MB');

        for (let i = 0; i < 5000; i++) {
            new PDFDocument({font: false});
            console.log(i, 'memory:', process.memoryUsage().heapUsed / 1024 / 1024, 'MB /', process.memoryUsage().heapTotal / 1024 / 1024, 'MB');
        }

Running the first script, the memory continues to climb. Running the second script, it stays relatively stable.

I think the culprit may be the standard fonts, as I've also experimented with using a TTF file (Roboto) as the default font, and memory usage remained relatively low.

@Glogo

Glogo commented Dec 11, 2022

I was also experiencing a major memory leak while generating only roughly 60 documents (one-page invoices).

I'm using a custom font which I registered like this: registerFont('normal', './calibri-regular.ttf'). It turned out the leak was caused by me not referencing this 'normal' font properly everywhere: in some places I used the Calibri font name directly, and that loaded the font again. For me this was happening while using pdfkit-table, in the prepareRow function.

When I replaced those references with my 'normal' font definition, the memory stopped leaking.
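
For reference, a minimal sketch of the pattern that fixed it for me (the font path is illustrative):

const PDFDocument = require('pdfkit');

const doc = new PDFDocument();

// Register the TTF once under a single name...
doc.registerFont('normal', './calibri-regular.ttf');

// ...and always reference it by that registered name.
doc.font('normal').text('Invoice #1');

// Referencing the font any other way in some places (e.g. by the
// family name 'Calibri') is what caused the font file to be loaded again.
doc.end();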
