
Performance slow with streams compared to old version #264

Closed
ashelley opened this issue Aug 8, 2014 · 10 comments

ashelley commented Aug 8, 2014

Okay,

So I've really been battling performance with pdfkit recently. You can read a bit about my problems here:

#258

I decided to download an older version of pdfkit that I've worked with before and knew to be fast. I picked an arbitrary version from around the time I last used pdfkit on a project. This means that versions after it are probably fast too, but I didn't want to dig deep; I just wanted to document the difference.

https://github.com/devongovett/pdfkit/tree/933aee7d6062c4f5c5a147c15ca3d999f224f0b9

I have a series of PDF documents that I create for my own testing purposes. I am generating each of these PDFs 3 times:

  1. current version of pdfkit with compression on
  2. current version of pdfkit with compression off
  3. old version 933aee with compression on (I think it was always on back then)

Note that I had to change my code slightly to work with the old version of pdfkit. This is the only change I'm making to my program before running the tests.

-        var PdfDocument = require('pdfkit');
+        var PdfDocument = require('/var/www/git/pdfkit');
...
-        this.doc.pipe(this.stream);
...
-        this.doc.end();
+        var self = this;
         this.stream.on('finish', function() {
             done(start);
         });
+        this.doc.output(function(result) {
+            self.stream.write(result, 'binary', function() {
+                self.stream.end();
+            });
+        });
     }

The test documents produced are visually identical in all scenarios.

Example 1:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 54 | 5 | 1148 | 1 |
| c/off | 54 | 3 | 1331 | 1 |
| 933aee | 40 | 5 | 928 | 1 |

Example 2:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 14 | 13 | 1536 | 1 |
| c/off | 16 | 12 | 7795 | 1 |
| 933aee | 14 | 1 | 1308 | 1 |

Example 3:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 26 | 27 | 1794 | 1 |
| c/off | 29 | 14 | 12810 | 1 |
| 933aee | 16 | 5 | 1558 | 1 |

Example 4:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 445 | 454 | 15181 | 3 |
| c/off | 406 | 14 | 306804 | 3 |
| 933aee | 144 | 165 | 14674 | 3 |

Example 5:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 148 | 197 | 9924 | 3 |
| c/off | 239 | 5 | 165274 | 3 |
| 933aee | 95 | 13 | 9448 | 3 |

Example 6:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 244 | 15 | 21931 | 8 |
| c/off | 216 | 5 | 239602 | 8 |
| 933aee | 80 | 20 | 21041 | 8 |

The problem seems to compound with large documents, to the point where it becomes unusable:

Example 7:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 44261 | 339 | 896822 | 469 |
| c/off | 27733 | 56 | 13606075 | 469 |
| 933aee | 12756 | 517 | 869524 | 469 |

Example 8:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| c/on | 402392 (ouch) | 65087 | 2638032 | 416 |
| c/off | 59203 | 152 | 56262905 (ouch) | 416 |
| 933aee | 14531 | 62721 | 2590486 | 416 |

One thing to note about the above numbers is that "write ms" actually means the time from when we call doc.end() until we get the stream's 'finish' event. We can also see from the numbers that when compression is turned on the document gets output in some different way, so the problems caused by turning on compression are probably somewhere in that code.
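For reference, here is a rough sketch of how I'm collecting these timings (simplified; the output path and rendering calls are just placeholders, not my actual harness):

```js
// Hypothetical timing harness (illustrative only): "render ms" is the time
// spent issuing PDFKit calls, "write ms" is the time from doc.end() until
// the destination stream emits 'finish'.
var fs = require('fs');
var PDFDocument = require('pdfkit');

var doc = new PDFDocument();
var out = fs.createWriteStream('/tmp/example.pdf');
doc.pipe(out);

var renderStart = Date.now();
doc.text('hello world'); // ...all the rendering calls for the test document...
var writeStart = Date.now();

out.on('finish', function() {
  console.log('render ms:', writeStart - renderStart);
  console.log('write ms:', Date.now() - writeStart);
});

doc.end();
```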

The main issue with pdfkit as it stands right now is possibly this line, from https://github.com/devongovett/pdfkit/blob/master/lib/document.coffee:

  _write: (data) ->
    unless Buffer.isBuffer(data)
      data = new Buffer(data + '\n', 'binary') <----- 

I'm not sure if there is another way to do this in the code base yet, but because every .write operation appears to create a new Buffer, you can't really write a big document with pdfkit without turning off compression, and even with compression off it's still a lot slower than it used to be.
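To illustrate the kind of thing I have in mind (a rough sketch of the idea only, not a patch; ChunkCollector is a made-up name, not pdfkit code):

```js
// Rough sketch: collect chunks as strings and convert them to a single
// Buffer at flush time, instead of allocating a new Buffer for every write.
function ChunkCollector() {
  this.chunks = [];
}

ChunkCollector.prototype.write = function(data) {
  // just remember the chunk; no per-write Buffer allocation
  this.chunks.push(data + '\n');
};

ChunkCollector.prototype.flush = function() {
  // one allocation for the whole batch
  return new Buffer(this.chunks.join(''), 'binary');
};
```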

In summary, I think there are two separate issues:

  1. Something goes awry when compression is turned on.
  2. We need a way to not create a Buffer for every single write to the document.

Obviously any insight into this is definitely appreciated.

@ashelley (Author)

Looking into this more, I've instrumented my code to hack pdfkit's compression stream, detecting when its write state is over the high water mark, to see if that would speed things up.

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| no hack | 433082 | 2322 | 2638032 | 416 |
| hack | 124042 | 56 | 2638032 | 416 |
| 933aee (no hack) | 16542 | 16542 | 2590486 | 416 |

As we can see from the above table, by respecting the high water mark on the zlib compression stream we increase the speed with compression enabled.

Here is what my hack looks like:

function renderStuff() {
    // do some stuff
    // if the deflate stream has buffered more than its high water mark,
    // wait for 'drain' before rendering more
    if (doc.page.content.deflate._writableState.length > doc.page.content.deflate._writableState.highWaterMark) {
        doc.page.content.deflate.once('drain', function() {
            setImmediate(function() {
                renderStuff();
            });
        });
    } else {
        setImmediate(function() {
            renderStuff();
        });
    }
}

Without the hack it would look like this:

function renderStuff() {
    // do some stuff
    setImmediate(function() {
        renderStuff();
    });
}

I'll probably try to come up with some PDFKit-specific test cases now that I know I'm starting to get to the bottom of this.

One unfortunate thing is that I feel as if PDFKit might need a big refactor to enable the asynchronous code needed for these callbacks. I might also look at the old implementation to see if I can reuse its high-speed approach while still exposing the readable stream interface more easily than the current version does.
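For what it's worth, here is a rough sketch of what I mean by keeping buffered output while still exposing a readable stream (bufferToReadable is a made-up helper, not pdfkit code):

```js
// Hypothetical sketch: wrap an already-built document buffer in a Readable
// stream so callers can still do something like doc.pipe(dest).
var Readable = require('stream').Readable;

function bufferToReadable(buffer) {
  var readable = new Readable();
  readable._read = function() {}; // nothing to do; everything is pushed below
  readable.push(buffer);
  readable.push(null);            // signal end of stream
  return readable;
}
```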

@ashelley (Author)

I've updated my report on node to include a test suite for testing just the zlib compression stream. It seems to have the same issue. You can view it here:

nodejs/node-v0.x-archive#8048 (comment)

If someone could run this test and verify they are seeing the same issue, that would be much appreciated. Thank you.
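The gist of that test is just timing many small writes against a bare zlib deflate stream; a rough sketch of the shape of it (not the exact code in the linked comment):

```js
// Rough sketch: time many small writes into a standalone zlib deflate
// stream, from the first write until the stream emits 'finish'.
var zlib = require('zlib');
var fs = require('fs');

var deflate = zlib.createDeflate();
deflate.pipe(fs.createWriteStream('/dev/null'));

var start = Date.now();
var i = 0;

function writeNext() {
  if (i++ >= 100000) return deflate.end();
  deflate.write('a small chunk of pdf-like content\n');
  setImmediate(writeNext);
}

deflate.on('finish', function() {
  console.log('elapsed ms:', Date.now() - start);
});

writeNext();
```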

@ashelley (Author)

@devongovett

Here is a proposed workaround to deal with the node streams bug mentioned above:

https://gist.github.com/ashelley/6cbad3f489499996a95e

Basically we have to queue writes to the zlib stream when compression is turned on and defer compressing the chunks until we can do them all at once.
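In other words, the shape of it is roughly this (a simplified sketch of the idea, not the actual code in the gist):

```js
// Simplified sketch of the workaround idea: queue the uncompressed chunks
// while rendering, then compress them all in a single deflate call at the end.
var zlib = require('zlib');

var queued = [];

function queueWrite(data) {
  queued.push(Buffer.isBuffer(data) ? data : new Buffer(data + '\n', 'binary'));
}

function finishContent(callback) {
  zlib.deflate(Buffer.concat(queued), function(err, compressed) {
    if (err) return callback(err);
    queued = [];
    callback(null, compressed);
  });
}
```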

Here are some numbers to show the improvement:

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| with patch | 28081 | 38 | 896822 | 469 |
| without patch | 51823 | 155 | 896822 | 469 |
| 933aee | 15957 | 551 | 869524 | 469 |

| version | render ms | write ms | bytes | pages |
| --- | --- | --- | --- | --- |
| with patch | 44786 | 23 | 2638032 | 416 |
| without patch | 376554 | 3506 | 2638032 | 416 |
| 933aee | 19588 | 67346 | 2590486 | 416 |

Note that this doesn't fix overfilling the piped stream mentioned here:

#269

Also note that I think the true fix is that pdfkit should wait for the 'drain' event when needed on the zlib stream, but I think that patch is non-trivial.
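For reference, the standard backpressure pattern I have in mind looks roughly like this (a generic sketch, not pdfkit code):

```js
// Generic backpressure sketch: if write() returns false the stream has
// buffered past its high water mark, so wait for 'drain' before writing more.
function writeWithBackpressure(stream, chunk, next) {
  if (stream.write(chunk)) {
    setImmediate(next);
  } else {
    stream.once('drain', next);
  }
}
```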

@ashelley (Author)

I just ran into a memory leak situation without the above patch:

nodejs/node-v0.x-archive#6623 (comment)

Adding the patch lets my program finish running. No minimal test case yet.

@Kartoffelsalat

I took a look at machine performance and noticed that the memory and CPU used don't get released after rendering the PDF.

@floledermann

@devongovett any chance this may be fixed sometime? @ashelley did great work here analyzing the problem.

@ashelley (Author)

Hello @floledermann, I did some more recent work on this, discussed in this comment:
#269 (comment)

Also note that the underlying streams issue was recently discussed here as well, but I don't think there is a fix in stable yet:

nodejs/node-v0.x-archive#6623 (comment)

csvan commented Oct 26, 2016

Possibly related: bpampuch/pdfmake#280

The above also discusses a workaround by hacking Readable.

@shivamshekhar

Hi.

Is this issue fixed in any updated version of PDFkit? Or is a fix coming anytime soon?

blikblum (Member) commented Mar 3, 2019

Some performance-related work was done in recent releases. Please test again and reopen if this is still a problem.
