DOC: Compression of content streams (#1040)
MartinThoma authored Jun 29, 2022
1 parent 08c54d9 commit f2ffa7a
Showing 1 changed file with 18 additions and 9 deletions.
27 changes: 18 additions & 9 deletions docs/user/file-size.md
@@ -27,17 +27,17 @@ with open("smaller-new-file.pdf", "wb") as fp:
```

It depends on the PDF how well this works, but we have seen an 86% file
-reduction from 5.7 MB to 0.8 MB within a real PDF.
+reduction (from 5.7 MB to 0.8 MB) within a real PDF.


## Remove images


```python
-import PyPDF2
+from PyPDF2 import PdfReader, PdfWriter

-reader = PyPDF2.PdfReader("example.pdf")
-writer = PyPDF2.PdfWriter()
+reader = PdfReader("example.pdf")
+writer = PdfWriter()

for page in reader.pages:
    writer.add_page(page)
@@ -48,18 +48,27 @@ with open("out.pdf", "wb") as f:
    writer.write(f)
```

-## Compression
+## Loss-less Compression

+PyPDF2 supports the FlateDecode filter which uses the zlib/deflate compression
+method. It is a loss-less compression, meaning the resulting PDF looks exactly
+the same.
+
+Deflate compression can be applied to a page via [`page.compress_content_streams`](https://pypdf2.readthedocs.io/en/latest/modules/PageObject.html#PyPDF2._page.PageObject.compress_content_streams):
+
```python
-import PyPDF2
+from PyPDF2 import PdfReader, PdfWriter

-reader = PyPDF2.PdfReader("example.pdf")
-writer = PyPDF2.PdfWriter()
+reader = PdfReader("example.pdf")
+writer = PdfWriter()

for page in reader.pages:
-    page.compress_content_streams()
+    page.compress_content_streams() # This is CPU intensive!
    writer.add_page(page)

with open("out.pdf", "wb") as f:
    writer.write(f)
```

+Using this method, we have seen a reduction by 70% (from 11.8 MB to 3.5 MB)
+with a real PDF.
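
Note on the added text: the "loss-less" claim for zlib/deflate can be illustrated with Python's standard `zlib` module alone. This is a minimal sketch, not PyPDF2's implementation; the `raw` bytes are a hypothetical stand-in for a page content stream, and the reduction it prints depends entirely on that made-up input.

```python
import zlib

# Hypothetical stand-in for a content stream: repetitive PDF text operators.
raw = b"BT /F1 12 Tf 72 720 Td (Hello World) Tj ET\n" * 200

# Deflate-compress at the highest level, then decompress again.
compressed = zlib.compress(raw, 9)
restored = zlib.decompress(compressed)

# Loss-less: the round trip returns exactly the original bytes.
assert restored == raw

# Size reduction, computed the same way as the percentages quoted in the docs.
reduction = 100 * (1 - len(compressed) / len(raw))
print(f"{len(raw)} -> {len(compressed)} bytes ({reduction:.0f}% smaller)")
```

Highly repetitive input like this compresses far better than a typical PDF, so the printed percentage should not be read as a prediction for real files.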
