ENH: Add level parameter to compress_content_streams #2044

Merged (2 commits) on Aug 2, 2023

MartinThoma (Member) commented Jul 30, 2023

Provide more options / details on how to reduce the file size with compression.
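
A minimal sketch of how the new parameter might be used, assuming pypdf 3.15.0 or later is installed. The helper name `compress_pdf` and its arguments are hypothetical and only serve as illustration; the sketch also assumes that `level` follows zlib's convention (-1 for the default, 0 to 9 otherwise).

```python
def compress_pdf(src: str, dst: str, level: int = -1) -> None:
    """Copy src to dst and re-compress every page's content stream.

    Hypothetical helper, not part of pypdf.
    """
    # Assumption: level follows zlib's convention (-1 = default, 0..9).
    if not -1 <= level <= 9:
        raise ValueError(f"level must be between -1 and 9, got {level}")

    # Imported lazily so the argument check works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)

    # Compress the pages that now belong to the writer.
    for page in writer.pages:
        page.compress_content_streams(level=level)  # CPU intensive

    with open(dst, "wb") as fh:
        writer.write(fh)
```

Calling `compress_pdf("GeoTopo.pdf", "out-9.pdf", level=9)` would roughly correspond to one row of the experiment below, although the exact script used for the experiment is not shown in this PR.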

See #1910

A small experiment

Size (bytes)  File name (suffix = compression level)
------------  --------------------------------------
     5321132  GeoTopo.pdf
     9959402  out-0.pdf
     5976025  out-1.pdf
     5914204  out-2.pdf
     5885818  out-3.pdf
     5816263  out-4.pdf
     5762359  out-5.pdf
     5738259  out-6.pdf
     5731877  out-7.pdf
     5726121  out-8.pdf
     5725267  out-9.pdf

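Level 1 already gives a very good improvement: the output shrinks by about 40% compared to level 0 (9959402 to 5976025 bytes). Level 2 saves only about 1% more, which might not be worth the CPU cycles.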
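
The diminishing returns can be tried out without a PDF at hand. The following is a rough sketch that assumes the `level` argument ends up in zlib (the deflate implementation behind the FlateDecode filter); the synthetic `sample` data and the helper `sizes_by_level` are made up for illustration and are not taken from the experiment above.

```python
import zlib
from typing import Dict


def sizes_by_level(data: bytes) -> Dict[int, int]:
    """Return the compressed size in bytes for zlib levels 0 to 9."""
    return {level: len(zlib.compress(data, level)) for level in range(10)}


# Synthetic data that loosely resembles drawing operators in a content stream.
sample = "".join(
    f"{i % 600} {i % 800} m {i % 600 + 10} {i % 800 + 5} l S\n"
    for i in range(20000)
).encode("ascii")

if __name__ == "__main__":
    print(f"raw   {len(sample):>8}")
    for level, size in sizes_by_level(sample).items():
        print(f"lvl {level} {size:>8}")
```

Level 0 stores the data without compressing it, so its output is slightly larger than the input. The exact sizes for the other levels depend on the data and on the zlib build.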

Interestingly, the original (5321132 bytes) is smaller than the output of the best compression level (5725267 bytes at level 9) ⚠️
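
The observations above can be checked against the numbers in the table. This is only arithmetic on the reported file sizes; the names `sizes`, `ORIGINAL` and `saving_percent` are chosen for this sketch.

```python
# File sizes in bytes, copied from the table above.
ORIGINAL = 5321132  # GeoTopo.pdf
sizes = {
    0: 9959402,
    1: 5976025,
    2: 5914204,
    3: 5885818,
    4: 5816263,
    5: 5762359,
    6: 5738259,
    7: 5731877,
    8: 5726121,
    9: 5725267,
}


def saving_percent(before: int, after: int) -> float:
    """Size reduction from `before` to `after`, in percent of `before`."""
    return 100 * (before - after) / before


if __name__ == "__main__":
    for level in range(1, 10):
        step = saving_percent(sizes[level - 1], sizes[level])
        print(f"level {level - 1} -> {level}: {step:5.2f}% smaller")
    # A negative saving means the level 9 output is larger than the original.
    print(f"original -> level 9: {saving_percent(ORIGINAL, sizes[9]):.2f}%")
```

Going from level 0 to level 1 saves roughly 40%, every later step saves about 1% or less, and the level 9 output is still about 7.6% larger than the original file.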


codecov bot commented Jul 30, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (534c7b4) 94.17% compared to head (643533d) 94.17%.
Report is 1 commits behind head on main.

❗ Current head 643533d differs from pull request most recent head b247600. Consider uploading reports for the commit b247600 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2044   +/-   ##
=======================================
  Coverage   94.17%   94.17%           
=======================================
  Files          41       41           
  Lines        7332     7332           
  Branches     1441     1441           
=======================================
  Hits         6905     6905           
  Misses        266      266           
  Partials      161      161           
| Files Changed | Coverage Δ |
|---|---|
| pypdf/_page.py | 93.61% <100.00%> (ø) |
| pypdf/filters.py | 94.30% <100.00%> (ø) |
| pypdf/generic/_data_structures.py | 92.54% <100.00%> (ø) |


@MartinThoma MartinThoma merged commit e3707a1 into main Aug 2, 2023
@MartinThoma MartinThoma deleted the file-compression branch August 2, 2023 06:49
MartinThoma (Member, Author) commented

Thanks for the improvement suggestions @stefan6419846 and thank you for the review @pubpub-zz 🙏

MartinThoma added a commit that referenced this pull request Aug 6, 2023
### New Features (ENH)
-  Add `level` parameter to compress_content_streams (#2044)
-  Process /uniHHHH for text_extract (#2043)

### Bug Fixes (BUG)
-  Fix AnnotationBuilder.link (#2066)
-  JPX image without ColorSpace  (#2062)
-  Added check for field /Info when cloning reader document (#2055)
-  Fix indexed/CMYK images (#2039)

### Maintenance (MAINT)
-  Cryptography as primary dependency (#2053)

[Full Changelog](3.14.0...3.15.0)