DOC: File size reduction
MartinThoma committed Jun 28, 2022
1 parent a89ff74 commit 2c8914e
Showing 1 changed file with 27 additions and 0 deletions.
27 changes: 27 additions & 0 deletions docs/user/file-size.md
@@ -3,6 +3,33 @@
There are multiple ways to reduce the size of a given PDF file. The easiest
one is to remove content (e.g. images) or pages.

## Removing duplication

Some PDF documents contain the same object multiple times. For example, if an
image appears three times in a PDF, it could be embedded three times.
Alternatively, it can be embedded once and referenced from each place it
appears.

Reading the file and writing it back out can remove such duplicates:

```python
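from PyPDF2 import PdfReader, PdfWriter

reader = PdfReader("big-old-file.pdf")
writer = PdfWriter()

# Copy every page into the new document
for page in reader.pages:
    writer.add_page(page)

# Carry over the document metadata, if the source file has any
if reader.metadata is not None:
    writer.add_metadata(reader.metadata)

with open("smaller-new-file.pdf", "wb") as fp:
    writer.write(fp)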
```

How well this works depends on the PDF, but we have seen a real file shrink
by 86%, from 5.7 MB to 0.8 MB.
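
To check how much a rewrite actually saved, you can compare the two file sizes.
The following is a minimal sketch, not part of PyPDF2: the helper name
`size_reduction_percent` is hypothetical, and the byte counts are rounded
stand-ins for the 5.7 MB and 0.8 MB figures above.

```python
def size_reduction_percent(original_bytes: int, reduced_bytes: int) -> float:
    """Return the size reduction as a percentage of the original size."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    return 100.0 * (original_bytes - reduced_bytes) / original_bytes


# Rounded stand-ins for the sizes quoted above (5.7 MB and 0.8 MB)
percent = size_reduction_percent(5_700_000, 800_000)
print(f"{percent:.0f}% smaller")
```

For real files, the byte counts could come from `os.path.getsize("big-old-file.pdf")`
and `os.path.getsize("smaller-new-file.pdf")`.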


## Remove images

