Output pdf has wrong colour, incorrect translation of markup, and wrong scaling #1615

Closed
zain910128 opened this issue Feb 7, 2023 · 9 comments · Fixed by #1635
Labels
is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF

Comments

@zain910128

zain910128 commented Feb 7, 2023

I am converting a PDF to A4 paper size, and the output PDF has several bugs:

  1. Wrong colour of the markup: it has gone from red to black.
  2. Incorrect translation of the markup: it is shifted slightly to the right, whereas it should overlap the background grid exactly.
  3. Wrong scaling (see the comment in the attached screenshot).
  4. The page is not centred as the transformation in the code intends. There should be an equal gap at the top and bottom of the page, but the output has empty space only at the top.
  5. The thick and thin lines of the background grid have been interchanged (the strangest bug).

The input file is:
graph_letter.pdf

The output file is:
graph_letter_output.pdf

A side-by-side comparison is:
comparison

I am using pypdf-3.4.0 in Google Colab.
The code used is:

from pypdf import PdfReader, PdfWriter, Transformation, PageObject, PaperSize
from pypdf.generic import RectangleObject

reader = PdfReader("input.pdf")
page = reader.pages[0]
writer = PdfWriter()

A4_w = PaperSize.A4.width
A4_h = PaperSize.A4.height

# resize page to fit *inside* A4
h = float(page.mediabox.height)
w = float(page.mediabox.width)
scale_factor = min(A4_h / h, A4_w / w)
print(scale_factor)

transform = (
    Transformation()
    .scale(scale_factor, scale_factor)
    .translate(A4_w / 2 - w * scale_factor / 2, A4_h / 2 - h * scale_factor / 2)
)
page.add_transformation(transform)

page.cropbox = RectangleObject((0, 0, A4_w, A4_h))

# merge the pages to fit inside A4

# prepare A4 blank page
page_A4 = PageObject.create_blank_page(width=A4_w, height=A4_h)
page.mediabox = page_A4.mediabox
page_A4.merge_page(page)

writer.add_page(page_A4)
writer.write("output.pdf")

Please let me know if I am doing something wrong.
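The fit-and-centre arithmetic in the code above can be checked in isolation. A minimal pure-Python sketch, assuming the source page is US Letter (612 x 792 pt) and that A4 is 595 x 842 pt, the size pypdf's PaperSize.A4 is assumed to use here; fit_inside is a hypothetical helper, not a pypdf function:

```python
def fit_inside(src_w, src_h, dst_w, dst_h):
    """Hypothetical helper: return (scale, tx, ty) that scales a src_w x src_h
    page uniformly so it fits inside dst_w x dst_h, then centres it."""
    scale = min(dst_w / src_w, dst_h / src_h)
    # The offsets use the *scaled* size, because the translation is applied
    # after the scaling, as in Transformation().scale(...).translate(...).
    tx = (dst_w - src_w * scale) / 2
    ty = (dst_h - src_h * scale) / 2
    return scale, tx, ty


# Assumed sizes in PDF points: US Letter source, A4 target.
scale, tx, ty = fit_inside(612, 792, 595, 842)
print(f"scale={scale:.4f}, gap above/below={ty:.1f} pt")
# prints: scale=0.9722, gap above/below=36.0 pt
```

For these sizes the page is width-limited, so tx is 0 up to floating-point rounding and the expected result is equal 36 pt bands above and below the content, which is the centring described in point 4.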

@pubpub-zz
Collaborator

pubpub-zz commented Feb 7, 2023

I've had a quick look, and here are my conclusions:
a) You are using add_transformation, which only affects the page content and does not resize or reposition the annotations (the red shapes/curves are actually annotations, not part of the original page content). If you use add_transformation on pages that belong to a PdfWriter, this should be fixed.
b) You have extended the cropbox, so more of the grid is shown.
c) About the color change: I see no difference in Acrobat Reader. 🤔😑
d) Based on my tests, this is the code I propose:

from pypdf import PdfReader, PdfWriter, Transformation, PaperSize

reader = PdfReader("graph_letter.pdf")
page = reader.pages[0]
writer = PdfWriter()

A4_w = PaperSize.A4.width
A4_h = PaperSize.A4.height

# resize page to fit *inside* A4
h = float(page.mediabox.height)
w = float(page.mediabox.width)
scale_factor = min(A4_h / h, A4_w / w)
print(scale_factor, A4_w / 2 - w * scale_factor / 2, A4_h / 2 - h * scale_factor / 2)

# prepare A4 blank page, then merge the scaled and centred source page onto it
page_A4 = writer.add_blank_page(width=A4_w, height=A4_h)
page_A4.merge_transformed_page(
    page,
    Transformation().scale(scale_factor, scale_factor).translate(
        A4_w / 2 - w * scale_factor / 2,
        A4_h / 2 - h * scale_factor / 2,
    ),
)
writer.write("graph_letter_output.pdf")

Another issue detected during the test: the source /Annots seem to be modified, so they cannot be merged into a new page.
Thanks for giving me a new PR to write 😀
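Point a) explains the misplaced markup: each annotation stores its own /Rect in page coordinates, so a transformation applied only to the content stream leaves it where it was. A minimal pure-Python sketch of what also transforming an annotation's /Rect involves, assuming the usual PDF affine matrix (a, b, c, d, e, f) that maps (x, y) to (a*x + c*y + e, b*x + d*y + f). The helper names are hypothetical; this is not pypdf's implementation, and a complete fix would also have to map point lists such as /InkList, /QuadPoints or /Vertices:

```python
def scale_then_translate(sx, sy, tx, ty):
    """Hypothetical helper: matrix (a, b, c, d, e, f) that scales first,
    then translates, i.e. x' = sx*x + tx and y' = sy*y + ty."""
    return (sx, 0.0, 0.0, sy, tx, ty)


def apply(m, x, y):
    """Map one point through the affine matrix m = (a, b, c, d, e, f)."""
    a, b, c, d, e, f = m
    return a * x + c * y + e, b * x + d * y + f


def transform_rect(m, rect):
    """Map an annotation /Rect [llx, lly, urx, ury] through m and return the
    axis-aligned bounding box of its four transformed corners (this also
    stays valid if m contains a rotation)."""
    llx, lly, urx, ury = rect
    corners = [apply(m, x, y) for x in (llx, urx) for y in (lly, ury)]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return [min(xs), min(ys), max(xs), max(ys)]


m = scale_then_translate(0.5, 0.5, 10, 20)
print(transform_rect(m, [100, 100, 200, 300]))  # [60.0, 70.0, 110.0, 170.0]
```

Using the same matrix for the content and for every annotation rectangle keeps the markup aligned with the grid; transforming only one of them produces the kind of offset reported in point 2.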

pubpub-zz added the is-bug label on Feb 8, 2023
@zain910128
Author

Your suggested code works better now. Thank you.

However, the colour issue is still present.

As you said, the colour is unchanged in Adobe Acrobat, but it is different in macOS Preview, in Chrome's built-in PDF viewer (the colour is white) and in Google Drive.

The input file does not have this problem, so the output file should not have it either.

@pubpub-zz
Collaborator

pubpub-zz commented Feb 13, 2023

From #1607 (comment) by @zain910128:
Sorry, I didn't get the chance to check earlier.
I have checked it now, and the issue is partially resolved.

The original input file that I provided is now converted properly and looks fine in all PDF viewers.

Then I tried a new input file, which is very similar but has one extra page. It is attached here for reference:
input.pdf

The output for this file has all the colours wrong in macOS Preview, Google Drive and the browser's PDF viewer, but looks fine in Adobe Acrobat.

So I think we have to reopen this issue.

This may be related to my other issue here:
#1615

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Feb 15, 2023
fixes py-pdf#1615

"/N" attributes wrongly ignored
@pubpub-zz
Collaborator

The color issue has been understood and fixed: during the copy, the "/N" entry of the ICCBased color space was not copied (it was wrongly ignored).
This is fixed in #1635
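For context on why losing "/N" matters: in an ICCBased colour space, /N is a required entry of the profile stream dictionary giving the number of colour components (1, 3 or 4). It can be cross-checked against the data colour space signature stored at bytes 16 to 19 of the 128-byte ICC profile header. A minimal sketch, assuming the raw, already decompressed profile bytes are available; expected_n is a hypothetical helper, not a pypdf API, and the fix in #1635 simply keeps /N during the copy:

```python
# Components per ICC data colour space signature (common signatures only).
_COMPONENTS = {b"GRAY": 1, b"RGB ": 3, b"CMYK": 4, b"Lab ": 3}


def expected_n(profile: bytes) -> int:
    """Hypothetical helper: return the /N value implied by an ICC profile.

    Bytes 16-19 of the 128-byte ICC header hold the data colour space
    signature, e.g. b"RGB " (note the trailing space)."""
    if len(profile) < 128:
        raise ValueError("not an ICC profile: the header alone is 128 bytes")
    signature = profile[16:20]
    try:
        return _COMPONENTS[signature]
    except KeyError:
        raise ValueError(f"unsupported colour space {signature!r}") from None


# Synthetic header for illustration only: zeros except the signature.
fake_rgb_header = bytes(16) + b"RGB " + bytes(108)
print(expected_n(fake_rgb_header))  # 3
```

A check like this could flag an output ICCBased stream whose /N is missing or disagrees with its profile, which is the situation that produced the viewer-dependent colours in this issue.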

@zain910128
Author

Is this fix available in the latest version if I do a pip install?

@pubpub-zz
Collaborator

No, it still has to be merged and released.

@MartinThoma
Member

pypdf==3.5.0 was just released 🎉

@pubpub-zz
Collaborator

pypdf==3.5.0 was just released 🎉

Yes, but the PR was not part of it... wait for the next version 😉

@MartinThoma
Member

Oh, damn 😅 🙈 I'm sorry 🙈

MartinThoma pushed a commit that referenced this issue Mar 5, 2023
* "/N" attributes wrongly ignored during copy process
* The object referenced via `src[field]` needs to be a dictionary

Fixes #1615, #1671
Fixes #1673

3 participants