Broken characters when merging pages #1640

raphaelm · 2023-02-17T15:35:10Z

We are using PyPDF to implement an "n-up" feature in our application. With the upgrade from PyPDF 2.12.x to PyPDF 3.x and the fix for #1601, this now generally works again, but it breaks in funny ways with text.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-6.0.9-arch1-1-x86_64-with-glibc2.36

$ python -c "import pypdf;print(pypdf.__version__)"
3.4.1

Code + PDF

This is a minimal, complete example that shows the issue:

from decimal import Decimal
from pypdf import PdfWriter, PdfReader, Transformation
from pypdf.generic import RectangleObject

mm = 72.0 / 2.54 * 0.1

out_pdf = PdfWriter()
page_1 = out_pdf.add_blank_page(210 * mm, 297 * mm)

in_pdf_1 = PdfReader('badges_3vjrh_7LXDZ_1-1.pdf')
in_page_1 = in_pdf_1.pages[0]
page_1.merge_page(in_page_1)

in_pdf_2 = PdfReader('badges_3vjrh_7LXDZ_2-1.pdf')
in_page_2 = in_pdf_2.pages[0]
in_page_2.add_transformation(Transformation().translate(0, +150 * mm))
in_page_2.mediabox = RectangleObject((
    Decimal('%.5f' % (in_page_2.mediabox.left.as_numeric())),
    Decimal('%.5f' % (in_page_2.mediabox.bottom.as_numeric() + 150 * mm)),
    Decimal('%.5f' % (in_page_2.mediabox.right.as_numeric())),
    Decimal('%.5f' % (in_page_2.mediabox.top.as_numeric() + 150 * mm))
))
in_page_2.trimbox = in_page_2.mediabox
page_1.merge_page(in_page_2)

out_pdf.write('merge.pdf')
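
A quick sanity check of the unit arithmetic used in the script above: the factor `mm` converts millimetres to PDF points (72 per inch, 25.4 mm per inch). The name `MM_TO_PT` is only illustrative and does not appear in the script.

```python
# Illustrative constant name; the script above calls this factor `mm`.
MM_TO_PT = 72.0 / 25.4  # 1 inch = 72 pt = 25.4 mm

a4_width_pt = 210 * MM_TO_PT   # width of the blank output page
a4_height_pt = 297 * MM_TO_PT  # height of the blank output page
offset_pt = 150 * MM_TO_PT     # vertical shift applied to the second page

print(f"A4: {a4_width_pt:.2f} x {a4_height_pt:.2f} pt, offset: {offset_pt:.2f} pt")
```

So the blank page is roughly 595.28 x 841.89 pt, and the second input page and its boxes are shifted up by about 425.20 pt.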

Input files:
badges_3vjrh_7LXDZ_1-1.pdf
badges_3vjrh_7LXDZ_2-1.pdf

Expected output

The expected output is this. We can obtain this output by using e.g. PyPDF 2.12.1:
pypdf2.pdf

Actual output

The actual output from PyPDF3 is this:
pypdf3.pdf

Note that the first line now reads "Hans-Jörgen" instead of "Hans-Jürgen".

MartinThoma · 2023-02-17T17:41:29Z

> The actual output from PyPDF3 is this:
> pypdf3.pdf

I guess you mean pypdf==3.4.1, right?
I'm asking because PyPDF3 is a completely different project.

pubpub-zz · 2023-02-17T18:22:33Z

Error found: the code was not referencing the correct object. A funny effect, and quite tricky to locate.
I've opened a PR if you want to try it.
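
To illustrate the kind of mistake described here, below is a toy model of merging two pages' font resources. It is only a sketch under assumptions: plain dicts stand in for the `/Font` resource dictionaries, `merge_fonts` is a hypothetical helper, and none of this is pypdf's actual `merge_resources` code or the diff from the PR. When both pages use the same resource name for different fonts, the incoming font has to be stored under a new name, and that new name must be bound to the incoming page's font object.

```python
def merge_fonts(base, incoming):
    """Merge two font-resource dicts, renaming incoming names on conflict.

    Returns (merged, rename_map); rename_map tells the caller which names
    must be rewritten in the incoming page's content stream.
    """
    merged = dict(base)
    rename = {}
    for name, font in incoming.items():
        if name not in merged:
            merged[name] = font          # no conflict: copy as-is
        elif merged[name] == font:
            continue                     # same font already present
        else:
            # Conflict: find a free name for the incoming font.
            new_name, suffix = name, 0
            while new_name in merged:
                new_name = f"{name}-{suffix}"
                suffix += 1
            # The new name must point at the INCOMING font. Binding
            # merged[name] here instead would render the incoming text
            # with the other page's font, i.e. with the wrong glyphs.
            merged[new_name] = font
            rename[name] = new_name
    return merged, rename


# Placeholder strings stand in for real font objects.
page1_fonts = {"/F1": "font A"}
page2_fonts = {"/F1": "font B"}
merged, rename = merge_fonts(page1_fonts, page2_fonts)
print(merged, rename)
```

In this model, a wrong binding in the conflict branch would leave the text intact but draw it with a font whose character codes map to different glyphs, which is consistent with a symptom like "ü" turning into "ö".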

raphaelm · 2023-02-17T22:51:45Z

> I guess you mean pypdf==3.4.1, right?
> I'm asking because PyPDF3 is a completely different project.

Yes! Sorry.

> Error found: the code was not referencing the correct object. A funny effect, and quite tricky to locate.
> I've opened a PR if you want to try it.

It's amazing how quick you are ❤️ Happy to test the PR on Monday!

raphaelm · 2023-02-21T15:41:14Z

Yup, PR seems to work for me! :)

MartinThoma · 2023-02-25T05:44:02Z

The fix was just merged and will be in pypdf>3.4.1 (on PyPI this weekend).

MartinThoma · 2023-02-25T05:44:23Z

Thank you for reporting it! If you want I can add you as a contributor: https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html

raphaelm · 2023-02-28T21:11:41Z

Nah, that's fine, but thanks! :)

raphaelm added a commit to pretix/pretix that referenced this issue Feb 17, 2023

Temporarily work around pypdf bug py-pdf/pypdf#1640

093eaac

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Feb 17, 2023

BUG : invalid font pointed during merge_resources

f5dbbe2

fixed py-pdf#1640

pubpub-zz mentioned this issue Feb 17, 2023

BUG: Invalid font pointed during merge_resources #1641

Merged

MartinThoma closed this as completed in #1641 Feb 25, 2023

MartinThoma pushed a commit that referenced this issue Feb 25, 2023

BUG : Invalid font pointed during merge_resources (#1641)

5173238

Fixed #1640

MartinThoma added the is-bug label ("From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF") Mar 25, 2023
