pdf files are HUGE #517

Closed
doutriaux1 opened this issue Jul 25, 2014 · 26 comments

@doutriaux1
Contributor

@aashish24 there must be a way for VTK to produce smaller PDFs, no?

@doutriaux1 doutriaux1 added this to the 2.0.0 milestone Jul 25, 2014
@aashish24
Contributor

I believe there is. I'll ask David Lonie, since he added that feature to VTK.

@williams13 williams13 modified the milestones: 2.1, 2.0.0 Sep 3, 2014
@aashish24 aashish24 assigned alliepiper and unassigned aashish24 Sep 16, 2014
@alliepiper
Contributor

Can someone point me towards a sample script to generate a problematic PDF? I have some ideas that might fix this...

@alliepiper
Contributor

I just grabbed a random boxfill script; hopefully this is a good representation of the issue? Also, "huge" isn't very descriptive -- how big are the files you are seeing, and how big are the references you're comparing them to?

I used:

import vcs, cdms2

cdmsfile = cdms2.open('clt.nc')   # sample cloud-cover dataset shipped with UV-CDAT
data = cdmsfile('clt')
x = vcs.init()
t = x.gettemplate('default')
x.plot(data, t)
x.pdf("/tmp/test.pdf")

The generated file is 453k and breaks down into the following:

Component             Size   Percentage
Text                  7k     1.5%
Continents (lines)    196k   43.3%
CLT data (polygons)   249k   55.0%

There's not a whole lot that can be done on the exporter side to tighten things up -- it only draws the visible primitives in the scene, and it already compresses the bulk of the data (the triangles) into a binary representation.

Suggestions to make them smaller:

  • The format is highly compressible: 90k (20% of 453k) with bzip2, 101k (22%) with gzip. Use one of these utilities to store the files when size matters (see the sketch after this list).
  • Use coarser resolution for continent data. This isn't likely to make a huge difference, though.
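
For what it's worth, a minimal sketch of that compression step from Python, using only the standard-library gzip and bz2 modules (the path is just the one from the example above):

import bz2
import gzip
import shutil

# Compress the exported PDF for storage or transfer; both codecs ship
# with Python, so no external tools are required.
with open("/tmp/test.pdf", "rb") as src, gzip.open("/tmp/test.pdf.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

with open("/tmp/test.pdf", "rb") as src, bz2.BZ2File("/tmp/test.pdf.bz2", "wb") as dst:
    shutil.copyfileobj(src, dst)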

@doutriaux1
Contributor Author

You're right, the size seems reasonable. I'll try to find the "complicated" example where the size was in the tens of MB.

@aashish24
Contributor

Closing it now. If we find that it's an issue again, then we will create a new one or reopen this one.

@aashish24 aashish24 reopened this Feb 2, 2015
@aashish24
Contributor

Re-opened just in case.

@doutriaux1 doutriaux1 modified the milestones: 2.2, 2.1 Feb 2, 2015
@alliepiper alliepiper removed their assignment Feb 2, 2015
@alliepiper
Contributor

Unassigning myself -- can't do much about this without an example ;)

@aashish24
Contributor

Right... assigning it to Charles for now, as we need to have an example.

@doutriaux1
Contributor Author

All right, data is at:
https://www.dropbox.com/home/HUGE%20PS
How to reproduce: run the session below. Notice the time (over 9 minutes) and the error messages.
Reassigning to @dlonie now that he has something to work with.
Thanks

In [1]: import cdms2,vcs

In [2]: x=vcs.init()

In [3]: f=cdms2.open("pr_TRMMtS2_Jul99-08.nc")

In [4]: s=f("prMAXt")

In [5]: x.plot(s)
Out[5]: <vcs.displayplot.Dp at 0x7f90791d39b0>

In [6]: import time

In [7]: a=time.time();x.posts
x.postscript      x.postscript_old  

In [7]: a=time.time();x.postscript("test_slow_and_big");print time.time()-a
/lgm/uvcdat/2015-01-26/lib/python2.7/site-packages/vcs/VTKPlots.py:1371: UserWarning: the right_margin keyword for postscript has been deprecated in 2.0 and is being ignored
  warnings.warn("the right_margin keyword for postscript has been deprecated in 2.0 and is being ignored")
/lgm/uvcdat/2015-01-26/lib/python2.7/site-packages/vcs/VTKPlots.py:1373: UserWarning: the left_margin keyword for postscript has been deprecated in 2.0 and is being ignored
  warnings.warn("the left_margin keyword for postscript has been deprecated in 2.0 and is being ignored")
/lgm/uvcdat/2015-01-26/lib/python2.7/site-packages/vcs/VTKPlots.py:1375: UserWarning: the top_margin keyword for postscript has been deprecated in 2.0 and is being ignored
  warnings.warn("the top_margin keyword for postscript has been deprecated in 2.0 and is being ignored")
GL2PS info: OpenGL feedback buffer overflow
GL2PS info: OpenGL feedback buffer overflow
GL2PS info: OpenGL feedback buffer overflow
GL2PS info: OpenGL feedback buffer overflow
GL2PS info: OpenGL feedback buffer overflow
GL2PS info: OpenGL feedback buffer overflow
GL2PS info: OpenGL feedback buffer overflow
GL2PS info: OpenGL feedback buffer overflow
GL2PS info: OpenGL feedback buffer overflow
GL2PS info: OpenGL feedback buffer overflow
546.508140087

In [8]: 

@doutriaux1 doutriaux1 assigned alliepiper and unassigned doutriaux1 Feb 2, 2015
@doutriaux1
Contributor Author

Also, the file is 54 MB.

@alliepiper
Contributor

Took a look -- some first thoughts:

Size

Not much we can do here. Postscript is a verbose text format, and this script is pushing a 1441x401 grid into it. The data gets rendered as a set of triangles, so for a grid that size we get 1440 x 400 = 576,000 quads, which are rendered as 2 x 576,000 = 1,152,000 triangles. If you open the postscript file with a text editor, you'll see a ton of lines that look like:

[x1] [y1] [x2] [y2] [x3] [y3] T

which is the postscript command to draw a triangle between the specified vertices. We can count them with grep -E ' T$' test_slow_and_big.ps | wc -l, which gives me 1,152,452 "draw triangle" commands, which includes the background, boxfill, and scalar bar.

Incidentally, the file consists of 1,157,874 lines, which means the boxfill triangles are roughly 99.5% of the file.

tl;dr If the data is this big, expect a large postscript file. If you want the file to be smaller for transfer/storage, consider compressing it with gzip (goes to 6.2MB) or bzip2 (goes to 4.8MB). If you want to reduce the size of the data in the file, it will need to be downsampled prior to exporting. But in any case, if we specify >1M triangles to go into the file, they will require a lot of space.
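
A rough sketch of that downsampling idea, assuming the cdms2/vcs workflow from the session above (the stride of 4 is an arbitrary choice for illustration, not a value from this thread):

import cdms2
import vcs

f = cdms2.open("pr_TRMMtS2_Jul99-08.nc")
s = f("prMAXt")

# Keep every 4th point along the last two (lat/lon) axes before plotting;
# this cuts the triangle count -- and hence the vector-file size -- by
# roughly a factor of 16.
s_small = s[..., ::4, ::4]

x = vcs.init()
x.plot(s_small)
x.postscript("test_downsampled")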

Time

The overflow warnings are the result of GL2PS using too small a memory buffer to hold the postscript instructions it's building, and having to reallocate memory. When the buffer is exhausted like this, the exporter prints the "overflow" warning, doubles the buffer size, and restarts the render, so most of the time that went into this export was spent in failed write attempts.

If we're going to be regularly exporting data this large, I can add an option to the VTK layer of the exporter to specify the initial buffer size -- this would potentially reduce the number of overflows and export attempts and speed things up by (in this particular case) a factor of ~10.

Big black square

For some reason, that postscript file is specifying every triangle as black, meaning that somehow the colors aren't being rendered in a way that makes it through to the OpenGL feedback buffer. I can look into this. However, be warned that once this is fixed, the file will get approximately 30% larger, since specifying colors for the triangles will need space, too.

In short:

  • The file size is appropriate for the volume of exported data and the verbosity of the format.
  • If this much data is going to be pushed into vector exporter regularly, I can make some tweaks to speed the export process up. Let me know if we need this.
  • The colors aren't making it through. I'll investigate.

@doutriaux1
Contributor Author

@dlonie thanks for looking into this. I agree the size might be PS-related, but the PDF is about 5 MB. I didn't try SVG.
About the black squares: I need to double check. They might be correct; it may be missing values.

@alliepiper
Contributor

PDF uses compression internally -- it basically takes the draw instructions and runs them through gzip or similar before storing them in big binary blobs in the file. That's why the PDF export is roughly the size of the gzip'd postscript file.

Postscript doesn't offer anything similar -- the format specifies raw text, and any compression has to happen externally, either by compressing manually with gzip/bzip2, or by setting vtkGL2PSExporter::CompressOn() to produce a gzip'd file automatically.
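
For reference, a minimal VTK-Python sketch of the CompressOn() route (assuming render_window is an existing vtkRenderWindow holding the scene; the file prefix is arbitrary):

from vtk import vtkGL2PSExporter

exporter = vtkGL2PSExporter()
exporter.SetRenderWindow(render_window)  # render_window: an existing vtkRenderWindow
exporter.SetFileFormatToPS()
exporter.CompressOn()                    # write a gzip-compressed .ps.gz directly
exporter.SetFilePrefix("test_compressed")
exporter.Write()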

SVG would be similar to postscript, maybe a little larger due to the XML markup overhead.

The image looks fine on-screen for me when rendered. I'm not sure why the color information is being lost between the rendering and the OpenGL feedback buffer, though. Very odd.

@alliepiper
Contributor

I dug around in our polydata rendering implementations, and the UVCDAT fork of VTK is still using the OpenGL fixed-function pipeline APIs for rendering, which should definitely work with GL2PS. I'll investigate more deeply after we bump VTK to current master, as the rendering backends have been reworked a lot in recent months and may include a fix.

@doutriaux1
Contributor Author

@dlonie on that note, @aashish24 stated on Monday that this VTK update should be done by today. Is that still valid? I'm really looking forward to seeing the bug fixes it's supposed to bring in.

@aashish24
Contributor

VTK is now updated. @dlonie, if you can fix/verify the color issue, then we can close this issue.

@doutriaux1
Contributor Author

Hmm... before we close this issue, can we implement the cache fix that @dlonie mentioned?

@alliepiper
Contributor

Cache fix?

@doutriaux1
Contributor Author

buffer fix, sorry multitasking here 😉

 I can add an option to the VTK layer of the exporter to specify the initial buffer size -- this would potentially reduce the number of overflows and export attempts and speed things up by (in this particular case) a factor of ~10.

@alliepiper
Contributor

Got the buffer changes in locally, and started debugging the color issue.

@doutriaux1 Can you confirm that the postscript file resulting from this script is missing the color data? I've confirmed that we're feeding the colors into OpenGL correctly, but for some reason the colors in the feedback buffer (which GL2PS uses to build up the postscript file) are all rgba=(0,0,0,0).

This makes me think it might be a driver bug on my machine, since the color info disappears while OpenGL is processing the scene.

@alliepiper
Contributor

I take that back -- the colors are coming out in the feedback buffer. gl2ps seems to be dropping them somewhere. I'll keep looking.

alliepiper pushed a commit that referenced this issue Feb 9, 2015
To prevent having to resize and rerender multiple times when a very large
amount of data is pushed through (see #517), start with a 50MB buffer.

This buffer size is sufficient to store the 1M triangles used in the #517
example.
@alliepiper
Contributor

@doutriaux1 @aashish24

#1016 fixes the buffer issue, and http://review.source.kitware.com/#/t/5443/ needs to be reviewed / merged into VTK.

Still looking into the color issue. It's very strange...

@alliepiper
Contributor

I think I've figured out what's going on with the colors.

  1. They were being removed by the combination of sorting / occlusion culling. It seems that the triangles end up being so small that GL2PS's sorting/culling algorithms were having difficulties determining overlap and depth properly.

  2. Disabling sorting stops the colored primitives from being culled, but the triangles are still so tiny that many of the viewers I tried still render a big black rectangle, due to how they sample/rasterize the image. Evince and okular both show a black rectangle, but acroread (after ps2pdf) and display (from ImageMagick) both show the colored triangles along with the black ones. Turning sorting off reduced the export time significantly as well.

Since we're now running into viewer limitations, I think this is as good as it gets from the export side of things, short of rasterizing the boxfill data into the vector image.
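
For anyone who wants to turn sorting off themselves, the relevant knob is the exporter's sort mode; a sketch along the same lines as the earlier one (render_window is again assumed to be an existing vtkRenderWindow):

from vtk import vtkGL2PSExporter

exporter = vtkGL2PSExporter()
exporter.SetRenderWindow(render_window)  # render_window: an existing vtkRenderWindow
exporter.SetFileFormatToPS()
exporter.SetSortToOff()                  # skip GL2PS primitive sorting/culling
exporter.SetFilePrefix("test_unsorted")
exporter.Write()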

@doutriaux1
Contributor Author

OK, should I try and approve this?

@alliepiper
Contributor

Sure, #1016 can go in whenever. The only difference it should make is faster export times 👍

@alliepiper
Contributor

Closing with a quote from the "Limitations" section of the GL2PS website (http://www.geuz.org/gl2ps/), which summarizes the issue well:

"Rendering large and/or complicated scenes is slow and/or can lead to large output files. This is normal: vector-based images are not destined to replace bitmap images. They just offer an alternative when high quality (especially for 2D and small 3D plots) and ease of manipulation (how do you change the scale, the labels or the colors in a bitmap picture long after the picture was produced, and without altering its quality?) are important."
