BUG: Fix merge of a cropped page #879
Tracked in py-pdf#636. The smaller of the two boxes, the cropBox and the trimBox (== mediaBox by default), is used.
Codecov Report
@@           Coverage Diff           @@
##             main     #879   +/-   ##
=======================================
  Coverage   92.15%   92.16%
=======================================
  Files          24       24
  Lines        4948     4950    +2
  Branches     1024     1025    +1
=======================================
+ Hits         4560     4562    +2
  Misses        244      244
  Partials      144      144
@MartinThoma,
@pubpub-zz I simply haven't had the time to review it so far 😅
😳
I've just fixed the merge conflicts I caused. For the review, I still need to do the following:
This takes some time. Getting the 2.0.0-dev branch back into main has priority for me because it unblocks other changes. Still, I'm confident that this PR will be handled within the next 4 weeks.
I was reading "10.10.1 Page Boundaries":
There is also a pretty helpful image.
This PR changes the behavior of … It changes which rectangle is appended to the path (…). If the page that is about to be merged has a crop box that is smaller than its trim box in any dimension, the crop box is added; otherwise, the trim box is added.
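The selection rule described above can be sketched in plain Python. This is a minimal illustration, not the PR's actual code: boxes are assumed to be `(llx, lly, urx, ury)` tuples, and the helpers `box_size` and `choose_clip_box` are hypothetical names.

```python
from typing import Tuple

# Assumed representation: (lower-left x, lower-left y, upper-right x, upper-right y)
Box = Tuple[float, float, float, float]


def box_size(box: Box) -> Tuple[float, float]:
    """Return (width, height) of a box, tolerating swapped corners."""
    llx, lly, urx, ury = box
    return abs(urx - llx), abs(ury - lly)


def choose_clip_box(cropbox: Box, trimbox: Box) -> Box:
    """Hypothetical helper following the rule from the PR discussion:
    if the crop box is smaller than the trim box in any dimension,
    use the crop box; otherwise use the trim box."""
    crop_w, crop_h = box_size(cropbox)
    trim_w, trim_h = box_size(trimbox)
    if crop_w < trim_w or crop_h < trim_h:
        return cropbox
    return trimbox


# A 400 x 400 crop box on a page whose trim box equals a 612 x 792 media box
print(choose_clip_box((0, 0, 400, 400), (0, 0, 612, 792)))  # (0, 0, 400, 400)
# Crop box larger than the trim box in both dimensions: the trim box wins
print(choose_clip_box((0, 0, 612, 792), (36, 36, 576, 756)))  # (36, 36, 576, 756)
```

Note that this compares only widths and heights, not where the boxes sit on the page.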
@pubpub-zz Why don't we simply always take the cropbox?
Pdfjam also seems to use the cropbox, but the positioning is different (I have no clue what pdfjam does there; PyPDF2 pins the mediabox to the lower-left corner, i.e. it simply overlays both pages in the same coordinate system):
Pdftk also seems to use the cropbox, but its positioning differs from both PyPDF2 and pdfjam (it pins the merged page's cropbox to the upper-right corner):
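The tools differ mainly in where they place the overlay. As a rough sketch based only on the observations above (not on any documented behavior of PyPDF2 or pdftk), PyPDF2 applies no translation, while the pdftk result looks as if the merged page's cropbox was shifted so that its upper-right corner coincides with the upper-right corner of the base page. The helper names and the use of the base page's cropbox as the reference rectangle are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed (llx, lly, urx, ury)


def offset_same_coordinates(base: Box, overlay: Box) -> Tuple[float, float]:
    """PyPDF2-style overlay as described in the thread: no translation at all."""
    return 0.0, 0.0


def offset_pin_upper_right(base: Box, overlay: Box) -> Tuple[float, float]:
    """Hypothetical pdftk-like placement: translate the overlay box so that
    its upper-right corner lands on the base box's upper-right corner."""
    return base[2] - overlay[2], base[3] - overlay[3]


if __name__ == "__main__":
    base_cropbox = (0, 0, 612, 792)     # hypothetical US Letter page
    overlay_cropbox = (0, 0, 400, 400)  # the 400 x 400 crop from the check script
    print(offset_same_coordinates(base_cropbox, overlay_cropbox))  # (0.0, 0.0)
    print(offset_pin_upper_right(base_cropbox, overlay_cropbox))   # (212, 392)
```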
@MartinThoma
The part I'm really confused about: what happens if the crop box and the trim box are completely disjoint, i.e. they don't overlap at all? What result would you expect? It seems weird to me to switch from the trim box to the crop box just because one of the crop box's dimensions is smaller. What is the logic behind that?
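To make the disjoint case concrete: a size-only comparison can select a crop box that shares no area with the trim box. The sketch below uses hypothetical helpers and assumes normalized `(llx, lly, urx, ury)` tuples. Intersecting the two boxes is just one way to reason about the question, not something the PR implements.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed (llx, lly, urx, ury), llx <= urx, lly <= ury


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlapping rectangle of a and b, or None if they are disjoint."""
    llx, lly = max(a[0], b[0]), max(a[1], b[1])
    urx, ury = min(a[2], b[2]), min(a[3], b[3])
    if llx >= urx or lly >= ury:
        return None  # no common area (boxes that only touch count as disjoint)
    return llx, lly, urx, ury


def smaller_in_any_dimension(crop: Box, trim: Box) -> bool:
    """The size comparison from the PR description; positions are ignored."""
    return (crop[2] - crop[0]) < (trim[2] - trim[0]) or (crop[3] - crop[1]) < (trim[3] - trim[1])


if __name__ == "__main__":
    trim = (0, 0, 300, 300)
    crop = (400, 400, 500, 500)  # smaller, but entirely outside the trim box
    print(smaller_in_any_dimension(crop, trim))  # True: the crop box would be chosen
    print(intersect(crop, trim))                 # None: the boxes share no area
```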
By the way, this is how I checked the results:

from PyPDF2 import PdfReader, PdfWriter
from PyPDF2.generic import RectangleObject

# Inspect the page boundaries of the page that will be merged
reader = PdfReader("box.pdf")
crop_page = reader.pages[0]
print(crop_page.mediabox)
print(crop_page.cropbox)
print(crop_page.trimbox)

# Shrink the crop box to 400 x 400 and save the cropped page on its own
writer = PdfWriter()
crop_page.cropbox = RectangleObject((0, 0, 400, 400))
writer.add_page(crop_page)
with open("git-cropped.pdf", "wb") as fp:
    writer.write(fp)

# Overlay the cropped page onto another page and save the result
reader = PdfReader("crazyones.pdf")
crazy_page = reader.pages[0]
crazy_page.merge_page(crop_page)
writer = PdfWriter()
writer.add_page(crazy_page)
with open("merged-pypdf2-crop.pdf", "wb") as fp:
    writer.write(fp)
@pubpub-zz I'm sorry that I haven't merged this PR yet. I'm simply really uncertain whether it does the right thing. The test also currently passes without your change, which makes it hard for me to judge. Let's look at some examples:

Scenario 1
Assume the gray part is where we actually have content. The green rectangle is the cropbox and the red one is the trimbox. What should we use for …?

Scenario 2
The green rectangle is the cropbox and the red one is the trimbox. What should we use for …?

Scenario 3
The green rectangle is the trimbox and the red one is the cropbox. What should we use for …?
New Features (ENH):
- Add PdfReader.xfa attribute (#1026)

Bug Fixes (BUG):
- Wrong page inserted when PdfMerger.merge is done (#1063)
- Resolve IndirectObject when it refers to a free entry (#1054)

Developer Experience (DEV):
- Added {posargs} to tox.ini (#1055)

Maintenance (MAINT):
- Remove PyPDF2._utils.bytes_type (#1053)

Testing (TST):
- Scale page (indirect rect object) (#1057)
- Simplify pathlib PdfReader test (#1056)
- IndexError of VirtualList (#1052)
- Invalid XML in xmp information (#1051)
- No pycryptodome (#1050)
- Increase test coverage (#1045)

Code Style (STY):
- DOC of compress_content_streams (#1061)
- Minimize diff for #879 (#1049)

Full Changelog: 2.4.1...2.4.2