Alternative to add_transformation translate #1426

Closed
felle9900 opened this issue Nov 7, 2022 · 56 comments · Fixed by #1567

Comments

@felle9900

I'm trying to update an older PyPDF2 program I made that can step and repeat several PDF files onto a bigger PDF page.

But the current method seems to work in a rather primitive way:

page_box = reader.pages[0]
page_box.add_transformation(Transformation().rotate(0).translate(tx=50, ty=0))

It keeps translating by 50 pt between every PDF I'm placing. That's not what I'm looking for :)

Is there any way of doing it the old way, with dedicated x1, y1, x2, y2 values instead?
I used to use mergeRotatedTranslatedPage().

@pubpub-zz
Collaborator

you should have a look at #558 (comment)

@felle9900
Author

Well, #558 just mentions the code I've already listed.
It doesn't work in a loop.

@felle9900
Author

Maybe this can explain it a bit better:

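from PyPDF2 import PdfReader, PdfWriter, Transformation

def points(mm_value):
    # convert millimeters to PDF points (same factor as in the full script further down)
    return mm_value * 2.83464567

# x1, y1, x2, y2 of the impositions in millimeters. The real list has 16 sets of coords.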
all_coords = [[47.5, 42.5, 132.5, 97.5], [137.5, 42.5, 222.5, 97.5], [227.5, 42.5, 312.5, 97.5]]

# SRA3 sheet (450 x 320 millimeter)
reader_base = PdfReader("test_files/Blank_sheet_450x320.pdf")
page_base = reader_base.pages[0]

# businesscard to be placed many times on the big sheet.
reader = PdfReader("businesscard.pdf")

for coord in all_coords:
    page_box = reader.pages[0]
    x1 = int(points(coord[0])) # temp set as int to not upset adobe acrobat
    y1 = int(points(coord[1]))
    x2 = int(points(coord[2]))
    y2 = int(points(coord[3]))
    page_box.add_transformation(Transformation().rotate(0).translate(tx=x1, ty=y1))
    page_base.merge_page(page_box)

writer = PdfWriter()
writer.add_page(page_base)
with open("Merge_test.pdf", "wb") as fp:
    writer.write(fp)


@felle9900
Author

OK, I solved my problem.

My solution was to keep changing the translate(tx, ty) values on each loop iteration.

# First imposition
if i == 0:
    column = points(coord[0]) - media_trim_diff
    row = points(coord[1]) - media_trim_diff

# First imposition in a NEW row
elif i % COLUMNS == 0:
    column = -3 * (TRIM_WIDTH + GAP)
    row = TRIM_HEIGHT + GAP

# all the rest
else:
    column = TRIM_WIDTH + GAP
    row = points(0)

# page_box.add_transformation(Transformation().rotate(0).translate(tx=column, ty=row))

After that I move the trimbox because that's the only thing that does not get moved with the translate() automatically.

@felle9900
Author

For some reason it breaks if I want to use different page numbers to impose.

@pubpub-zz
Collaborator

Just note that add_transformation modifies page_box in place, so each transformation needs to be relative to the previous one.
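
A minimal sketch of what "relative to the previous one" means in practice, assuming a single source page that is transformed repeatedly. The helper name `relative_offsets` is hypothetical and not part of PyPDF2; it only converts absolute target positions into the per-step deltas that a cumulative translate(tx, ty) would need.

```python
def relative_offsets(absolute_positions):
    """Convert absolute (x, y) targets into step-by-step deltas.

    Hypothetical helper, not part of PyPDF2: because every
    add_transformation call stacks on top of the earlier ones, each
    translate() only has to move from the previous position to the next.
    """
    deltas = []
    prev_x, prev_y = 0.0, 0.0
    for x, y in absolute_positions:
        deltas.append((x - prev_x, y - prev_y))
        prev_x, prev_y = x, y
    return deltas


# Lower-left corners of the first three cards from the thread, in millimeters.
targets = [(47.5, 42.5), (137.5, 42.5), (227.5, 42.5)]
steps = relative_offsets(targets)
print(steps)  # [(47.5, 42.5), (90.0, 0.0), (90.0, 0.0)]
```

The deltas are still in millimeters here; they would need to be converted to points before being passed to translate().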

@pubpub-zz
Collaborator

For some reason it breaks if I want to use different page numbers to impose.

Can you please clarify?

@felle9900
Author

Yes, so I'm placing the same PDF (a business card) on a bigger PDF.
Everything is fine as long as it's the same page of the business card I'm placing.
The placement (translate) behaves as expected, and the trimbox also behaves correctly.

But if I mix the pages (not the same page of the PDF), the pages are moved way off, as if something is not being reset properly.

@felle9900
Author

from PyPDF2 import PdfReader, PdfWriter, Transformation

def mm(my_input):
    output = round(my_input / 72 * 25.4, 1)
    return int(output)

def points(my_input):
    output = my_input * 2.83464567
    return output


GAP = points(5)
COLUMNS = 4
ROWS = 4
TRIM_WIDTH = points(85)
TRIM_HEIGHT = points(55)

# x1, y1, x2, y2, scale, page_nr (index_nr)
all_coords = [
                [47.5, 42.5, 132.5, 97.5, 1, 0],
                [137.5, 42.5, 222.5, 97.5, 1, 0],
                [227.5, 42.5, 312.5, 97.5, 1, 0],
                [317.5, 42.5, 402.5, 97.5, 1, 0],
                [47.5, 102.5, 132.5, 157.5, 1, 0],
                [137.5, 102.5, 222.5, 157.5, 1, 0],
                [227.5, 102.5, 312.5, 157.5, 1, 0],
                [317.5, 102.5, 402.5, 157.5, 1, 0],
                [47.5, 162.5, 132.5, 217.5, 1, 0],
                [137.5, 162.5, 222.5, 217.5, 1, 0],
                [227.5, 162.5, 312.5, 217.5, 1, 0],
                [317.5, 162.5, 402.5, 217.5, 1, 0],
                [47.5, 222.5, 132.5, 277.5, 1, 0],
                [137.5, 222.5, 222.5, 277.5, 1, 0],
                [227.5, 222.5, 312.5, 277.5, 1, 0],
                [317.5, 222.5, 402.5, 277.5, 0, 0]
            ]

# big sheet
reader_base = PdfReader("test_files/Blank_sheet_450x320.pdf")
page_base = reader_base.pages[0]

# pdf to impose on the big sheet
reader = PdfReader("test_files/Mobildisko-visitkort.pdf")

# difference between the imposed mediabox and trimbox
media_trim_diff = float((reader.pages[0].mediabox.right - reader.pages[0].trimbox.right))

# trimbox needs to be expanded 2.5 mm on all 4 sides after being moved, so we can see the crop marks for cutting
trimbox_expanding = int(points(2.5))

for i, coord in enumerate(all_coords):
    page_box = reader.pages[0]

    x1 = points(coord[0])
    y1 = points(coord[1])
    x2 = points(coord[2])
    y2 = points(coord[3])

    # First imposition
    if i == 0:
        column = points(coord[0]) - media_trim_diff
        row = points(coord[1]) - media_trim_diff

    # First imposition in a NEW row
    elif i % COLUMNS == 0:
        column = -3 * (TRIM_WIDTH + GAP)
        row = TRIM_HEIGHT + GAP

    # all the rest
    else:
        column = TRIM_WIDTH + GAP
        row = points(0)

    # move the mediabox and most of the content so it is placed correctly; the visible box (trimbox) still needs to be moved separately
    page_box.add_transformation(Transformation().rotate(0).translate(tx=column, ty=row))

    # move the trimbox before the expanding
    if GAP == points(0):
        # This is currently not used/working atm
        print("GAP is 0")
        page_box.trimbox.left = x1# - (media_trim_diff / 2)
        page_box.trimbox.bottom = y1# - (media_trim_diff / 2)
        page_box.trimbox.right = x2# - (media_trim_diff / 2)
        page_box.trimbox.top = y2# - (media_trim_diff / 2)

    if GAP == points(5):
        # this is working
        # moving the trimbox
        print("GAP is 5 millimeter")
        page_box.trimbox.left = x1
        page_box.trimbox.bottom = y1
        page_box.trimbox.right = x2
        page_box.trimbox.top = y2

        # expanding the trimbox
        page_box.trimbox.left = float(page_box.trimbox.left - trimbox_expanding)
        page_box.trimbox.bottom = float(page_box.trimbox.bottom - trimbox_expanding)
        page_box.trimbox.right = float(page_box.trimbox.right + trimbox_expanding)
        page_box.trimbox.top = float(page_box.trimbox.top + trimbox_expanding)

    page_base.merge_page(page_box)

# Write the result back
writer = PdfWriter()
writer.add_page(page_base)
with open("Merged_translated_rotated.pdf", "wb") as fp:
    writer.write(fp)

@pubpub-zz
Collaborator

Can you please provide the files that fail, including the blank page?

@felle9900
Author

Here are the PDF files I use:
Blank_sheet_450x320.pdf
Mobildisko-visitkort.pdf

The code I posted above should work and create a 4-column, 4-row PDF.

If you change the following code:
page_box = reader.pages[0]

to this code:

if i % 2 == 0: # every 2nd loop
    page_box = reader.pages[0]
else:
    page_box = reader.pages[1]

It should now be messed up. When you look at the result in outline mode in Adobe Illustrator you can see that it places the correct PDF pages, but the placement (mediabox) is wrong.
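
One plausible reading of this symptom, not confirmed anywhere in the thread: each source page object accumulates its own transformations, so deltas computed against the previous placement are wrong as soon as consecutive placements use different pages. The sketch below tracks the last position separately for every source page. `per_page_offsets` is a hypothetical helper, not a PyPDF2 function.

```python
def per_page_offsets(placements):
    """Compute translate() deltas when several source pages are mixed.

    placements: list of (page_index, x, y) absolute targets.
    Returns a list of (page_index, dx, dy), where the delta is measured
    from where that particular page was placed last time.
    """
    last_position = {}  # page_index -> (x, y) of that page's last placement
    result = []
    for page_index, x, y in placements:
        prev_x, prev_y = last_position.get(page_index, (0.0, 0.0))
        result.append((page_index, x - prev_x, y - prev_y))
        last_position[page_index] = (x, y)
    return result


# Alternate between page 0 and page 1 along the first row (millimeters).
placements = [(0, 47.5, 42.5), (1, 137.5, 42.5), (0, 227.5, 42.5), (1, 317.5, 42.5)]
offsets = per_page_offsets(placements)
print(offsets)
```

With alternating pages, each page only moves every second slot, so its step is 180 mm rather than the 90 mm a single-page loop would use.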

@pubpub-zz
Collaborator

@MartinThoma / @MasterOdin,
Looking at this use case, reintroducing mergeTransformedPage (renamed to merge_transformed_page) sounds like the best option. Your opinion?

@MartinThoma
Member

Huh, interesting. I don't understand yet why the issue occurs. It sounds like a bug and thus it would be preferable to fix it. But re-introducing the old (working) functions as an intermediate solution would be OK to me.

We would need to document that issue for the new functions though

@felle9900
Author

Any news on this problem ?

@pubpub-zz
Collaborator

lost in the fifo... will come back on it this week-end

@felle9900
Author

Still no update?

@felle9900
Author

Could we please reintroduce the "mergeRotatedTranslatedPage" method and make it take normal coords and not the tx, ty?

The current functionality breaks when I try to rotate or try to mix page numbers.
Please, I'm stuck with the current methods. It used to work so well before.

@MartinThoma
Member

MartinThoma commented Jan 21, 2023

It's hard for me to understand the issue as the information is scattered in this thread.

Could you maybe adjust the first comment in this ticket to contain all the information?

A great bug ticket follows this pattern:

1. What I did (as short as possible, but complete - including the full code necessary to re-produce, the PDF used as input, and the versions of the all libraries being used)
2. What I wanted to achieve
3. What happened instead
4. For this issue: The latest version of PyPDF2 that worked as you expected with the same code as mentioned in (1)
5. Really awesome would be a test that fails for the new (broken) code and works with the old code

I'm open to a PR re-introducing mergeTransformedPage with the old way it worked for as long as this issue exists. But I need a way to check if it (still) exists so that we can deprecate it at some point.

@pubpub-zz
Collaborator

@MartinThoma
The PR is in progress and should come soon.

@pubpub-zz
Collaborator

@felle9900,
You should be able to test the PR.
Here is a code example:

import pypdf
r1=pypdf.PdfReader("resources/labeled-edges-center-image.pdf")
w = pypdf.PdfWriter()
r2=pypdf.PdfReader("resources/box.pdf")
w.append(r1)  # to add the page
w.pages[0].merge_transformed_page(r2.pages[0],pypdf.Transformation().scale(2).rotate(45).translate(100,100),False,False)
w.pages[0].merge_transformed_page(r2.pages[0],pypdf.Transformation().scale(2).rotate(45).translate(200,200),False,False)
w.write("output.pdf")

still some clean-up (mypy) and testing to be done

@felle9900
Author

I've just upgraded pypdf to version 3.3.0 to test that code.
It tells me: AttributeError: 'PageObject' object has no attribute 'merge_transformed_page'. Did you mean: 'mergeTransformedPage'?

Did I miss anything ? ( I used my own pdf files)

@pubpub-zz
Collaborator

You have to copy the modified files from the PR.

@felle9900
Author

Hmm, I can't see an easy way to download the 9 files. I'm not going to go through it manually, so I think I'll just wait for it to be implemented.
Thanks a lot for the work.

@MartinThoma
Member

MartinThoma commented Jan 29, 2023

@felle9900 It is implemented in #1567 . We just need somebody to check if it worked as expected.

You can do it like this:

# Go into a clean directory
mkdir issue-1426
cd issue-1426
#... add your script in the directory

# Create and load a virtual environment:
python -m venv venv
source venv/bin/activate

# get the modified code
git clone https://github.com/pubpub-zz/PyPDF2.git
cd PyPDF2
git checkout -b pubpub-zz-merge_trsf_page main
git pull git@github.com:pubpub-zz/PyPDF2.git merge_trsf_page

# Install the modified version
pip install -e .

# Execute your script

@felle9900
Author

I tried to follow along but the line:
git pull git@github.com:pubpub-zz/PyPDF2.git merge_trsf_page
produced an error in my terminal ending with:

git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.

@MartinThoma
Member

Uh, right, you need the HTTPS URL instead of the git (SSH) one.

@felle9900
Author

I used the venv and cloned PyPDF2 into the directory (should it not be pypdf, btw?).

Now the code doesn't recognize "pypdf", so I changed the imports to "PyPDF2", but then I get an error that PyPDF2 does not have a method called "merge_transformed_page".

@felle9900
Author

OK, now I got it working. Why is the placed PDF pulled in cropped to the trimbox? Shouldn't it import the whole mediabox size?

@felle9900
Author

There is a problem when I'm placing several impositions (business card PDF) on my big sheet PDF.
I'm looping over 20 coords I have in a list and calculating the tx and ty for each placement.
The tx and ty are correct, but something weird is happening where it's not updating correctly: only the first placement is correct, the rest are placed way off to the left of the sheet PDF, and the following placements on that row are placed on top of each other. It looks very much like the same error as before.

@pubpub-zz
Collaborator

I dislike the idea of adding extra parameters. The best approach for me is to adjust/modify the boxes in the source page before inserting.

@pubpub-zz
Collaborator

pubpub-zz commented Jan 30, 2023

I've successfully got this result (requires latest fix):
test card:
visitcard.pdf

the code

import pypdf

r = pypdf.PdfReader("visitcard.pdf")
w = pypdf.PdfWriter()
w.add_blank_page(pypdf.PaperSize.A6.width, pypdf.PaperSize.A6.height)
for x in range(4):
    for y in range(7):
        w.pages[0].merge_translated_page(
            r.pages[0],
            x * r.pages[0].trimbox[2],
            y * r.pages[0].trimbox[3],
            True,
            True,
        )
w.write("tt.pdf")

the output
tt.pdf

@felle9900
Author

OK, this is pretty cool. Could I use 450 x 320 mm instead of "A6" somehow?

I managed to crop my PDF the way I like (trimbox + 5 mm), and it displays correctly.

I also managed to test that it will place different page numbers from the "visitcard" (tested using randint(0, 1)).

But there is a big bug I can't get past:
merge_translated_page() uses the mediabox for the translation, even if I change it before translating.

If you swap out your "visitcard.pdf" with my card
Mobildisko-visitkort.pdf
tt.pdf

@felle9900
Author

At the current state you can't translate less than the mediabox. Maybe that is hardcoded somewhere?

@MartinThoma
Member

Ok this is pretty cool, Could I use 450 x 320 mm instead of "A6" somehow ?

Not really, as it all depends on the user_unit of the document. It's typically 1/72 inch, which is about 0.352778 mm.
That means the dimensions you need would be (in default user units): 450/0.352778 ≈ 1275.6 and 320/0.352778 ≈ 907.1.
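
The arithmetic above can be sketched as a small conversion helper, assuming the default user unit of 1/72 inch. `mm_to_points` is an illustrative name, not a pypdf function.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # default PDF user unit: 1/72 inch


def mm_to_points(mm):
    """Convert millimeters to default PDF user units (points)."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


sheet_width = mm_to_points(450)
sheet_height = mm_to_points(320)
print(round(sheet_width, 2), round(sheet_height, 2))  # 1275.59 907.09
```

These two values are what would be passed as width and height when creating the blank sheet, instead of the A6 dimensions.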

@felle9900
Author

Ok cool I got it, thanks.

What about the translate bug ?

@felle9900
Author

Take a look at this code; there's some weird stuff going on. Only the first imposition (bottom left) is cropped correctly.
The rest are placed correctly but the cropping is off.

from pypdf import PdfReader, PdfWriter, Transformation, PaperSize

def mm(my_input):
    output = round(my_input / 72 * 25.4, 1)
    return int(output)

def points(my_input):
    output = my_input * 2.83464567
    return output


GAP = points(5)
COLUMNS = 4
ROWS = 5
TRIM_WIDTH = points(85)
TRIM_HEIGHT = points(55)

all_page_numbers = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

sheet = PdfReader("test_files/Blank_sheet_450x320.pdf")
imposition = PdfReader("test_files/Mobildisko-visitkort.pdf")

# create write object (sheet)
write_object = PdfWriter()
#write_object.append(sheet)

write_object.add_blank_page(PaperSize.A6.width, PaperSize.A6.height)
#write_object.add_blank_page(points(650), points(320))

# difference between the imposition mediabox and trimbox
media_trim_diff = float((imposition.pages[0].mediabox.right - imposition.pages[0].trimbox.right))

# trimbox needs to be expanded 2.5 mm on all 4 sides after being moved, so we can see the crop marks for cutting
trimbox_expanding = int(points(2.5))

imposition_index = 0
for x in range(COLUMNS):
    for y in range(ROWS):

        page_nr = all_page_numbers[imposition_index]
        print("imposition_index:", imposition_index, "page_nr", page_nr)

        # expanding the trimbox
        imp_page = imposition.pages[page_nr]
        imp_page.trimbox.left = float(imp_page.trimbox.left - trimbox_expanding)
        imp_page.trimbox.bottom = float(imp_page.trimbox.bottom - trimbox_expanding)
        imp_page.trimbox.right = float(imp_page.trimbox.right + trimbox_expanding)
        imp_page.trimbox.top = float(imp_page.trimbox.top + trimbox_expanding)


        write_object.pages[0].merge_translated_page(
            imp_page,
            x * TRIM_WIDTH + trimbox_expanding,# x * imposition.pages[0].trimbox[2]
            y * TRIM_HEIGHT + trimbox_expanding,# y * imposition.pages[0].trimbox[3]
            True,
            True,
        )
        imposition_index += 1
write_object.write("tt.pdf")

@pubpub-zz
Collaborator

Ok this is pretty cool, Could I use 450 x 320 mm instead of "A6" somehow ?

Not really, as it all depends on the user_unit of the document. It's typically 1/72 inch, which is about 0.352778 mm. That means the dimensions you need would be (in default user units): 450/0.352778 ≈ 1275.6 and 320/0.352778 ≈ 907.1.

In the test code I produced, I set expand to True, so the boxes are expanded: I used A6 to start with, but the final size is much bigger.

@pubpub-zz
Collaborator

Take a look at this code; there's some weird stuff going on. Only the first imposition (bottom left) is cropped correctly.
The rest are placed correctly but the cropping is off.

I'm confused about your code: why are you changing the trimbox every cycle? You should modify it once and the box stays applied.

However, reviewing the code, I agree that there is something odd (even in the old code): the cropping is done based on the trimbox instead of the cropbox (which defines the clipping for display and printing).
@MartinThoma, before committing the change, can you give me your opinion about it?

@MartinThoma
Member

I'm sorry, I don't understand the question @pubpub-zz . What do you want to know?

@MartinThoma
Member

At the current state you can't translate less than the mediabox. Maybe that is hardcoded somewhere?
@felle9900 The transformations are not applied to the boxes (mediabox / trimbox / cropbox). That means if you translate the content out of the mediabox, you will no longer see the content.

This behavior is often confusing for people, but I'm uncertain about the best way to improve it. Maybe adding a parameter transform_boxes: bool=False to add_transformation? But what would you expect if a translation is happening?

A method fit_boxes_to_content() might be desirable.
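
To illustrate why translated content can disappear, here is a sketch in plain Python that moves a box by hand. `translate_box` and `overlaps` are hypothetical helpers; they are neither the proposed transform_boxes parameter nor an existing pypdf API, and the box values are made up for illustration.

```python
def translate_box(box, tx, ty):
    """Shift a (left, bottom, right, top) rectangle by (tx, ty)."""
    left, bottom, right, top = box
    return (left + tx, bottom + ty, right + tx, top + ty)


def overlaps(a, b):
    """True if rectangles a and b share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


mediabox = (0, 0, 240, 155)  # illustrative card-sized page, in points
content = mediabox           # assume the artwork fills the page

# Translating only the content by a full page width moves it out of view.
moved_content = translate_box(content, 240, 0)
visible_without_box_move = overlaps(mediabox, moved_content)

# Moving the box by the same amount brings the content back into view.
moved_mediabox = translate_box(mediabox, 240, 0)
visible_with_box_move = overlaps(moved_mediabox, moved_content)

print(visible_without_box_move, visible_with_box_move)  # False True
```

This only covers pure translations; rotation or scaling would need the box corners to be transformed and re-bounded, which is part of why the question of expected behavior is not trivial.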

@pubpub-zz
Collaborator

pubpub-zz commented Jan 31, 2023

I'm sorry, I don't understand the question @pubpub-zz . What do you want to know?

Currently merge_transformed_page crops the content to the trimbox, whereas the PDF reference states that the cropping should be done based on the cropbox. For me, merge_transformed_page is buggy. Do you confirm my analysis?

@pubpub-zz
Collaborator

pubpub-zz commented Jan 31, 2023

A method fit_boxes_to_content() might be desirable.

This may be very tough to implement...😕

@felle9900
Author

The reason I'm adjusting the trimbox on each cycle is that it can only be done on a specific page of the PDF, and that page can be any page number on every cycle.

But I was thinking of doing a separate loop that crops the business card pages so all of the boxes (mediabox/trimbox/cropbox) are removed. Then the script might work, because it just has to place those pages right next to each other. I'm going to test that out.

@felle9900
Author

I did a test using a cropped file (cropbox = trimbox + 5 mm); the file was cropped in Acrobat.

It works like a charm, see the PDF. I remember I tried this a long time ago but ran into a problem because the old PyPDF2 would not respect the cropping of the PDF file it had made itself (weird).

I'm going to test that part now.
tt.pdf

@felle9900
Author

It worked :) :) :)

from pypdf import PdfReader, PdfWriter, Transformation, PaperSize

def mm(my_input):
    output = round(my_input / 72 * 25.4, 1)
    return int(output)

def points(my_input):
    output = my_input * 2.83464567
    return output


GAP = points(5)
COLUMNS = 4
ROWS = 5
TRIM_WIDTH = points(85)
TRIM_HEIGHT = points(55)

all_page_numbers = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

#sheet = PdfReader("test_files/Blank_sheet_450x320.pdf")
imposition = PdfReader("test_files/Mobildisko-visitkort.pdf")

# create write object (sheet)
write_object = PdfWriter()
#write_object.append(sheet)

write_object.add_blank_page(PaperSize.A6.width, PaperSize.A6.height)
#write_object.add_blank_page(points(650), points(320))

# difference between the imposition mediabox and trimbox
media_trim_diff = float((imposition.pages[0].mediabox.right - imposition.pages[0].trimbox.right))

# trimbox needs to be expanded 2.5 mm on all 4 sides after being moved, so we can see the crop marks for cutting
trimbox_expanding = int(points(2.5))

# expanding the trimbox
for page in imposition.pages:
    page.trimbox.left = float(page.trimbox.left - trimbox_expanding)
    page.trimbox.bottom = float(page.trimbox.bottom - trimbox_expanding)
    page.trimbox.right = float(page.trimbox.right + trimbox_expanding)
    page.trimbox.top = float(page.trimbox.top + trimbox_expanding)

imposition_index = 0
for x in range(COLUMNS):
    for y in range(ROWS):

        page_nr = all_page_numbers[imposition_index]
        print("imposition_index:", imposition_index, "page_nr", page_nr)

        imp_page = imposition.pages[page_nr]

        write_object.pages[0].merge_translated_page(
            imp_page,
            x * points(90),
            y * points(60),
            True,
            True,
        )
        imposition_index += 1
write_object.write("tt.pdf")

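[Mobildisko-visitkort.pdf](https://github.com/py-pdf/pypdf/files/10550608/Mobildisko-visitkort.pdf)
[tt.pdf](https://github.com/py-pdf/pypdf/files/10550607/tt.pdf)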

@felle9900
Author

The only thing I would need is being able to place this centered on a 420 x 320 pdf.
But that could be achieved by a separate function
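
Such a separate function could look roughly like the sketch below. It is an assumption about how the centering might be computed, not code from the thread: `centered_grid_origin` is a hypothetical helper, and the 90 x 60 mm cell size is taken from the working script above.

```python
def centered_grid_origin(sheet_w, sheet_h, columns, rows, cell_w, cell_h):
    """Return the lower-left corner that centers a grid on a sheet.

    All arguments share one unit (millimeters here).
    """
    grid_w = columns * cell_w
    grid_h = rows * cell_h
    if grid_w > sheet_w or grid_h > sheet_h:
        raise ValueError("grid does not fit on the sheet")
    return ((sheet_w - grid_w) / 2, (sheet_h - grid_h) / 2)


# 4 x 5 cards of 90 x 60 mm on a 420 x 320 mm sheet.
origin_x, origin_y = centered_grid_origin(420, 320, 4, 5, 90, 60)
positions = [
    (origin_x + col * 90, origin_y + row * 60)
    for col in range(4)
    for row in range(5)
]
print(origin_x, origin_y)  # 30.0 10.0
```

Each position is still in millimeters and would have to be converted to points before being used as the tx and ty of a placement.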

@pubpub-zz
Collaborator

I'm sorry, I don't understand the question @pubpub-zz . What do you want to know?

Currently merge_transformed_page crops the content to the trimbox, whereas the PDF reference states that the cropping should be done based on the cropbox. For me, merge_transformed_page is buggy. Do you confirm my analysis?

@MartinThoma
This point remained open

@MartinThoma MartinThoma reopened this Feb 5, 2023
@pubpub-zz
Collaborator

@MartinThoma what do you think about using CropBox instead of TrimBox ?

@MartinThoma
Member

@pubpub-zz I guess you are referring to this piece:

rect = page2.trimbox
page2content.operations.insert(
    0,
    (
        map(
            FloatObject,
            [
                rect.left,
                rect.bottom,
                rect.width,
                rect.height,
            ],
        ),
        "re",
    ),
)

It's similar to our discussion in #879 , right?

What does that part of the code actually do?

@MartinThoma
Member

From the description I tend to agree: cropbox sounds more reasonable than trimbox

MartinThoma added a commit that referenced this issue Feb 11, 2023
While the old behavior can be considered a bug, people might rely on trimbox being used.
To allow them to switch back from cropbox to trimbox, they can set

    pypdf._page.MERGE_CROP_BOX = "trimbox"

See discussions in #879 and #1426
MartinThoma pushed a commit that referenced this issue Feb 25, 2023
Several issues could have been avoided if the example in this PR existed before, e.g. #1630, #1426

Co-authored-by: Louis <[email protected]>