Output pdf has wrong colour, incorrect translation of markup, and wrong scaling #1615

zain910128 · 2023-02-07T04:00:59Z

I am converting a PDF to A4 paper size, and the output PDF has several bugs:

1. Wrong colour of the markup: it has gone from red to black.
2. Incorrect translation of the markup: it is shifted slightly to the right, whereas it should exactly overlap the background grid.
3. Wrong scaling (see comment in the attached screenshot).
4. Lack of central alignment as per the transformation in the code. There should be an equal gap at the top and bottom of the page, but the output has empty space only at the top.
5. The thick and thin lines of the background grid have been interchanged (strangest bug).

The output file is:
graph_letter_output.pdf

A side by side comparison is:

I am using pypdf-3.4.0 in Google Colab.
The code used is:

from pypdf import PdfReader, PdfWriter, Transformation, PageObject, PaperSize
from pypdf.generic import RectangleObject

reader = PdfReader("input.pdf")
page = reader.pages[0]
writer = PdfWriter()

A4_w = PaperSize.A4.width
A4_h = PaperSize.A4.height

# resize page to fit *inside* A4
h = float(page.mediabox.height)
w = float(page.mediabox.width)
scale_factor = min(A4_h / h, A4_w / w)
print(scale_factor)

transform = (
    Transformation()
    .scale(scale_factor, scale_factor)
    .translate(A4_w / 2 - w * scale_factor / 2, A4_h / 2 - h * scale_factor / 2)
)
page.add_transformation(transform)

page.cropbox = RectangleObject((0, 0, A4_w, A4_h))

# merge the pages to fit inside A4

# prepare A4 blank page
page_A4 = PageObject.create_blank_page(width=A4_w, height=A4_h)
page.mediabox = page_A4.mediabox
page_A4.merge_page(page)

writer.add_page(page_A4)
writer.write("output.pdf")

Please let me know if I am doing something wrong.

pubpub-zz · 2023-02-07T20:59:17Z

I've had a quick look, and here are my conclusions:
a) You are using add_transformation, which only affects the page content and does not resize or reposition the annotations (the red shapes/curves are actually annotations, not part of the original page content). If you use add_transformation with pages that are part of a PdfWriter, this should be fixed.
b) You have extended the crop box, so more elements of the grid are shown.
c) About the color change: I see no difference in Acrobat Reader. 🤔😑
d) (From my test) this is the code I propose:

from pypdf import PdfReader, PdfWriter, Transformation, PageObject, PaperSize
from pypdf.generic import RectangleObject

reader = PdfReader("graph_letter.pdf")
page = reader.pages[0]
writer = PdfWriter()

A4_w = PaperSize.A4.width
A4_h = PaperSize.A4.height

# resize page to fit *inside* A4
h = float(page.mediabox.height)
w = float(page.mediabox.width)
scale_factor = min(A4_h / h, A4_w / w)
print(scale_factor, A4_w / 2 - w * scale_factor / 2, A4_h / 2 - h * scale_factor / 2)

# prepare A4 blank page
page_A4 = writer.add_blank_page(width=A4_w, height=A4_h)
page_A4.merge_transformed_page(
    page,
    Transformation()
    .scale(scale_factor, scale_factor)
    .translate(A4_w / 2 - w * scale_factor / 2, A4_h / 2 - h * scale_factor / 2),
)
writer.write("graph_letter_output.pdf")

Another issue detected during the test: the source Annots seem to be modified, so they can no longer be merged into a new page.
Thanks for the new PR to be written 😀
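
As a sanity check on the values printed by the snippet above, the scale factor and centering offsets can be worked out by hand. This is only a minimal sketch: `fit_centered` is a hypothetical helper (not part of pypdf), and the 612 x 792 pt source size is an assumption based on the file name `graph_letter.pdf` (US Letter). The 595 x 842 pt target matches `PaperSize.A4`.

```python
from typing import NamedTuple


class Fit(NamedTuple):
    scale: float  # uniform scale factor
    dx: float     # horizontal offset applied after scaling, in points
    dy: float     # vertical offset applied after scaling, in points


def fit_centered(src_w: float, src_h: float, dst_w: float, dst_h: float) -> Fit:
    """Scale a src_w x src_h page to fit inside dst_w x dst_h and center it.

    Hypothetical helper: it only reproduces the arithmetic from the thread.
    """
    # The smaller ratio is the limiting dimension, so the page fits on both axes.
    scale = min(dst_w / src_w, dst_h / src_h)
    # Split the leftover space equally on both sides of each axis.
    dx = (dst_w - src_w * scale) / 2
    dy = (dst_h - src_h * scale) / 2
    return Fit(scale, dx, dy)


# Assumed input: US Letter (612 x 792 pt) placed on A4 (595 x 842 pt).
fit = fit_centered(612, 792, 595, 842)
print(fit)
```

Under these assumptions the width is the limiting dimension, so the horizontal offset is zero and the leftover height is split into equal gaps of about 36 pt at the top and bottom. Because the translation is applied after the scaling, the offsets are expressed in output (A4) points and are not scaled again.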

zain910128 · 2023-02-13T02:48:52Z

Your suggested code works better now. Thank you.

However, the colour issue is still present.

Like you said, the colour is the same in Adobe Acrobat, but it is different in macOS Preview, in the Chrome browser (where the colour is white), and in Google Drive.

The input file does not have this problem, so the output file should not have it either.

pubpub-zz · 2023-02-13T17:11:38Z

from #1607 (comment) (@zain910128)
Sorry, I didn't get the chance to check earlier.
But I have checked it now, and the issue is partially resolved.

The original input file that I provided is now converted properly and looks fine in all PDF viewers.

Then I tried a new input file, which is very similar and has one extra page. Attached here for reference:
input.pdf

The output of this file has all the colours wrong in macOS Preview, Google Drive, and the browser's PDF viewer, but it looks fine in Adobe Acrobat.

So I think we have to reopen this issue.

This may be related to my other issue here:
#1615

pubpub-zz · 2023-02-15T22:12:00Z

The color issue has been understood and fixed: during the copy, the "/N" field of the ICCBased attribute was not copied correctly (it was wrongly ignored).
This is fixed in #1635.
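
To illustrate what a dropped "/N" entry looks like, here is a rough sketch that scans colour-space definitions for ICCBased entries without a valid "/N". It rests on assumptions: plain Python dicts and lists stand in for the PDF objects, `iccbased_problems` is a hypothetical helper rather than a pypdf function, and the example data is made up. The one fact relied on is that the PDF specification requires "/N" in an ICCBased stream dictionary, with a value of 1, 3, or 4.

```python
from typing import Any, Dict, List

# Allowed numbers of colour components for an ICCBased colour space.
VALID_N = {1, 3, 4}


def iccbased_problems(colorspaces: Dict[str, Any]) -> List[str]:
    """Report ICCBased colour spaces whose stream dictionary lacks a valid /N.

    Hypothetical helper; `colorspaces` mimics a /ColorSpace resource
    dictionary using plain Python objects.
    """
    problems = []
    for name, cs in colorspaces.items():
        # An ICCBased colour space is a two-element array: [/ICCBased, stream].
        if not (isinstance(cs, list) and len(cs) == 2 and cs[0] == "/ICCBased"):
            continue
        n = cs[1].get("/N")
        if n is None:
            problems.append(f"{name}: /N missing")
        elif n not in VALID_N:
            problems.append(f"{name}: /N has invalid value {n!r}")
    return problems


# Made-up example: the copy has lost /N, as described for the buggy clone.
source = {"/CS0": ["/ICCBased", {"/N": 3, "/Alternate": "/DeviceRGB"}]}
copied = {"/CS0": ["/ICCBased", {"/Alternate": "/DeviceRGB"}]}

print(iccbased_problems(source))
print(iccbased_problems(copied))
```

A check along these lines, run against the input and the output of a merge, would flag the kind of loss described here. Whether a given viewer tolerates the missing entry is up to that viewer.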

zain910128 · 2023-02-20T04:52:57Z

Is this fix available in the latest version if I do a pip install?

pubpub-zz · 2023-02-20T06:06:31Z

No, it still has to be merged and released.

MartinThoma · 2023-02-26T18:31:43Z

pypdf==3.5.0 was just released 🎉

pubpub-zz · 2023-02-26T18:35:05Z

pypdf==3.5.0 was just released 🎉

Yes, but the PR was not part of it... wait for the next version 😉

MartinThoma · 2023-02-26T18:36:30Z

Oh, damn 😅 🙈 I'm sorry 🙈

pubpub-zz added the is-bug label Feb 8, 2023

This was referenced Feb 11, 2023

BUG: Switch from trimbox to cropbox when merging pages #1622

Merged

Can not merge multiple times the same pages with annots #1623

Closed

zain910128 mentioned this issue Feb 13, 2023

Output PDF has loss of data #1607

Closed

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Feb 15, 2023

ROB : some attributes not copied

1f0f102

fixes py-pdf#1615 "/N" attributes wrongly ignored

pubpub-zz mentioned this issue Feb 15, 2023

ROB: Some attributes not copied in DictionaryObject._clone #1635

Merged

MartinThoma closed this as completed in #1635 Mar 5, 2023

MartinThoma pushed a commit that referenced this issue Mar 5, 2023

ROB: Some attributes not copied in DictionaryObject._clone (#1635)

39f52dc

* "/N" attributes wrongly ignored during copy process
* The object referenced via `src[field]` needs to be a dictionary

Fixes #1615, #1671
Fixes #1673
