
Performance slow with streams compared to old version #264

Closed
ashelley opened this issue Aug 8, 2014 · 10 comments

ashelley commented Aug 8, 2014

Okay,

So I've really been battling performance with pdfkit recently. You can read a bit about my problems here:

#258

I decided to download an older version of pdfkit that I've worked with before and knew to be fast. I picked an arbitrary version from around the time I last used pdfkit on a project. This means that versions after it are probably fast too, but I didn't want to dig deep; I just wanted to document the difference.

https://github.com/devongovett/pdfkit/tree/933aee7d6062c4f5c5a147c15ca3d999f224f0b9

I have a series of PDF documents that I create for my own testing purposes. I am generating each of these PDFs 3 times:

  1. current version of pdfkit with compression on
  2. current version of pdfkit with compression off
  3. old version 933aee with compression on (I think it was always on back then)

Note that I had to change my code slightly to work with the old version of pdfkit. This is the only change I'm making to my program before running the tests.

-        var PdfDocument = require('pdfkit');
+        var PdfDocument = require('/var/www/git/pdfkit');
...
-        this.doc.pipe(this.stream);
...
-        this.doc.end();
+        var self = this;
         this.stream.on('finish', function() {
             done(start);
         });
+        this.doc.output(function(result) {
+            self.stream.write(result, 'binary', function() {
+                self.stream.end();
+            });
+        });
     }

The test documents produced are visually identical in all scenarios.

Example 1:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 54 | 5 | 1148 | 1 |
| c/off | 54 | 3 | 1331 | 1 |
| 933aee | 40 | 5 | 928 | 1 |

Example 2:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 14 | 13 | 1536 | 1 |
| c/off | 16 | 12 | 7795 | 1 |
| 933aee | 14 | 1 | 1308 | 1 |

Example 3:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 26 | 27 | 1794 | 1 |
| c/off | 29 | 14 | 12810 | 1 |
| 933aee | 16 | 5 | 1558 | 1 |

Example 4:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 445 | 454 | 15181 | 3 |
| c/off | 406 | 14 | 306804 | 3 |
| 933aee | 144 | 165 | 14674 | 3 |

Example 5:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 148 | 197 | 9924 | 3 |
| c/off | 239 | 5 | 165274 | 3 |
| 933aee | 95 | 13 | 9448 | 3 |

Example 6:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 244 | 15 | 21931 | 8 |
| c/off | 216 | 5 | 239602 | 8 |
| 933aee | 80 | 20 | 21041 | 8 |

The problem seems to compound with large documents, to the point where it becomes unusable:

Example 7:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 44261 | 339 | 896822 | 469 |
| c/off | 27733 | 56 | 13606075 | 469 |
| 933aee | 12756 | 517 | 869524 | 469 |

Example 8:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 402392 (ouch) | 65087 | 2638032 | 416 |
| c/off | 59203 | 152 | 56262905 (ouch) | 416 |
| 933aee | 14531 | 62721 | 2590486 | 416 |

One thing to note about the above numbers is that "write ms" actually means the time from when we call doc.end() until we get the stream's 'finish' event. We can also see from the numbers that when compression is turned on the document gets output in some different way, so the problems caused by turning on compression are probably somewhere in that code.
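For reference, here is a rough sketch of how I'm collecting these timings (simplified; the output path and rendering calls are just placeholders, not my actual harness):

```js
// Hypothetical timing harness (illustrative only): "render ms" is the time
// spent issuing PDFKit calls, "write ms" is the time from doc.end() until
// the destination stream emits 'finish'.
var fs = require('fs');
var PDFDocument = require('pdfkit');

var doc = new PDFDocument();
var out = fs.createWriteStream('/tmp/example.pdf');
doc.pipe(out);

var renderStart = Date.now();
doc.text('hello world'); // ...all the rendering calls for the test document...
var writeStart = Date.now();

out.on('finish', function() {
  console.log('render ms:', writeStart - renderStart);
  console.log('write ms:', Date.now() - writeStart);
});

doc.end();
```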

The main issue with pdfkit as it stands right now is possibly this line, from https://github.com/devongovett/pdfkit/blob/master/lib/document.coffee:

  _write: (data) ->
    unless Buffer.isBuffer(data)
      data = new Buffer(data + '\n', 'binary') <----- 

I'm not sure if there is another way to do this in the code base yet, but because every .write operation appears to create a new Buffer, you can't really write a big document with pdfkit without turning off compression, and even with compression off it's still a lot slower than it used to be.
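To illustrate the kind of thing I have in mind (a rough sketch of the idea only, not a patch; ChunkCollector is a made-up name, not pdfkit code):

```js
// Rough sketch: collect chunks as strings and convert them to a single
// Buffer at flush time, instead of allocating a new Buffer for every write.
function ChunkCollector() {
  this.chunks = [];
}

ChunkCollector.prototype.write = function(data) {
  // just remember the chunk; no per-write Buffer allocation
  this.chunks.push(data + '\n');
};

ChunkCollector.prototype.flush = function() {
  // one allocation for the whole batch
  return new Buffer(this.chunks.join(''), 'binary');
};
```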

In summary, I think there are two separate issues:

  1. Something goes awry when compression is turned on.
  2. We need a way to not create a Buffer for every single write to the document.

Obviously any insight into this is definitely appreciated.

@ashelley (Author)

Looking into this more, I've instrumented my code to hack pdfkit's compression stream, detecting when its write state is over the high water mark, to see if that would speed things up.

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| no hack | 433082 | 2322 | 2638032 | 416 |
| hack | 124042 | 56 | 2638032 | 416 |
| 933aee (no hack) | 16542 | 16542 | 2590486 | 416 |

As we can see from the above table, by respecting the high water mark on the zlib compression stream we increase the speed with compression enabled.

Here is what my hack looks like:

function renderStuff() {
    // do some stuff
    // if the deflate stream has buffered more than its high water mark,
    // wait for 'drain' before rendering more
    if (doc.page.content.deflate._writableState.length > doc.page.content.deflate._writableState.highWaterMark) {
        doc.page.content.deflate.once('drain', function() {
            setImmediate(function() {
                renderStuff();
            });
        });
    } else {
        setImmediate(function() {
            renderStuff();
        });
    }
}

Without the hack it would look like this:

function renderStuff() {
    // do some stuff
    setImmediate(function() {
        renderStuff();
    });
}

I'll probably try to come up with some PDFKit-specific test cases now that I know I'm starting to get to the bottom of this.

One unfortunate thing is that I feel as if PDFKit might need a big refactor to enable the asynchronous code needed for these callbacks. I might also look at the old implementation to see if I can reuse its high-speed approach while still exposing the readable stream interface more easily than the current version does.
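For what it's worth, here is a rough sketch of what I mean by keeping buffered output while still exposing a readable stream (bufferToReadable is a made-up helper, not pdfkit code):

```js
// Hypothetical sketch: wrap an already-built document buffer in a Readable
// stream so callers can still do something like doc.pipe(dest).
var Readable = require('stream').Readable;

function bufferToReadable(buffer) {
  var readable = new Readable();
  readable._read = function() {}; // nothing to do; everything is pushed below
  readable.push(buffer);
  readable.push(null);            // signal end of stream
  return readable;
}
```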

@ashelley (Author)

I've updated my report on node to include a test suite for testing just the zlib compression stream. It seems to have the same issue. You can view it here:

nodejs/node-v0.x-archive#8048 (comment)

If someone could run this test and verify they are seeing the same issue, that would be much appreciated. Thank you.
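The gist of that test is just timing many small writes against a bare zlib deflate stream; a rough sketch of the shape of it (not the exact code in the linked comment):

```js
// Rough sketch: time many small writes into a standalone zlib deflate
// stream, from the first write until the stream emits 'finish'.
var zlib = require('zlib');
var fs = require('fs');

var deflate = zlib.createDeflate();
deflate.pipe(fs.createWriteStream('/dev/null'));

var start = Date.now();
var i = 0;

function writeNext() {
  if (i++ >= 100000) return deflate.end();
  deflate.write('a small chunk of pdf-like content\n');
  setImmediate(writeNext);
}

deflate.on('finish', function() {
  console.log('elapsed ms:', Date.now() - start);
});

writeNext();
```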

@ashelley (Author)

@devongovett

Here is a proposed workaround to deal with the node streams bug mentioned above:

https://gist.github.com/ashelley/6cbad3f489499996a95e

Basically we have to queue writes to the zlib stream when compression is turned on and defer compressing the chunks until we can do them all at once.
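In other words, the shape of it is roughly this (a simplified sketch of the idea, not the actual code in the gist):

```js
// Simplified sketch of the workaround idea: queue the uncompressed chunks
// while rendering, then compress them all in a single deflate call at the end.
var zlib = require('zlib');

var queued = [];

function queueWrite(data) {
  queued.push(Buffer.isBuffer(data) ? data : new Buffer(data + '\n', 'binary'));
}

function finishContent(callback) {
  zlib.deflate(Buffer.concat(queued), function(err, compressed) {
    if (err) return callback(err);
    queued = [];
    callback(null, compressed);
  });
}
```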

Here are some numbers to show the improvement:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| with patch | 28081 | 38 | 896822 | 469 |
| without patch | 51823 | 155 | 896822 | 469 |
| 933aee | 15957 | 551 | 869524 | 469 |

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| with patch | 44786 | 23 | 2638032 | 416 |
| without patch | 376554 | 3506 | 2638032 | 416 |
| 933aee | 19588 | 67346 | 2590486 | 416 |

Note that this doesn't fix overfilling the piped stream mentioned here:

#269

Also note that I think the true fix is that pdfkit should wait for the 'drain' event when needed on the zlib stream, but I think that patch is non-trivial.
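For reference, the standard backpressure pattern I have in mind looks roughly like this (a generic sketch, not pdfkit code):

```js
// Generic backpressure sketch: if write() returns false the stream has
// buffered past its high water mark, so wait for 'drain' before writing more.
function writeWithBackpressure(stream, chunk, next) {
  if (stream.write(chunk)) {
    setImmediate(next);
  } else {
    stream.once('drain', next);
  }
}
```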

@ashelley (Author)

I just ran into a memory leak situation without the above patch:

nodejs/node-v0.x-archive#6623 (comment)

Adding the patch lets my program finish running. No minimal test case yet.

@Kartoffelsalat

I took a look at machine performance and noticed that the memory and CPU used don't get released after rendering the PDF.

@floledermann

@devongovett any chance this may be fixed sometime? @ashelley did great work here analyzing the problem.

@ashelley (Author)

Hello @floledermann, I did some more recent work on this, discussed in this comment:
#269 (comment)

Also note that the underlying streams issue was recently discussed here as well, but I don't think there is a fix in stable yet:

nodejs/node-v0.x-archive#6623 (comment)

csvan commented Oct 26, 2016

Possibly related: bpampuch/pdfmake#280

The above also discusses a workaround by hacking Readable.

@shivamshekhar

Hi.

Is this issue fixed in any updated version of PDFkit? Or is a fix coming anytime soon?

blikblum (Member) commented Mar 3, 2019

Some performance-related work was done in recent releases. Please test again and reopen if this is still a problem.
