DOC: Improve merging docs #2247

Merged
merged 1 commit into from
Oct 10, 2023
121 changes: 66 additions & 55 deletions docs/user/merging-pdfs.md
@@ -29,108 +29,119 @@ input1 = open("document1.pdf", "rb")
input2 = open("document2.pdf", "rb")
input3 = open("document3.pdf", "rb")

# add the first 3 pages of input1 document to output
# Add the first 3 pages of input1 document to output
merger.append(fileobj=input1, pages=(0, 3))

# insert the first page of input2 into the output beginning after the second page
# Insert the first page of input2 into the output beginning after the second page
merger.merge(position=2, fileobj=input2, pages=(0, 1))

# append entire input3 document to the end of the output document
# Append entire input3 document to the end of the output document
merger.append(input3)

# Write to an output PDF document
output = open("document-output.pdf", "wb")
merger.write(output)

# Close File Descriptors
# Close file descriptors
merger.close()
output.close()
```

## append
`append` has been slighlty extended in `PdfWriter`.

see [pdfWriter.append](../modules/PdfWriter.html#pypdf.PdfWriter.append) for more details
`append` has been slightly extended in `PdfWriter`. See [PdfWriter.append](../modules/PdfWriter.html#pypdf.PdfWriter.append) for more details.

**parameters:**
### Examples

*fileobj*: PdfReader or filename to merge
*outline_item*: string of a outline/bookmark pointing to the beginning of the inserted file.
if None, or omitted, no bookmark will be added.
*pages*: pages to merge ; you can also provide a list of pages to merge
None(default) means that the full document will be merged.
*import_outline*: import/ignore the pertinent outlines from the source (default True)
*excluded_fields*: list of keys to be ignored for the imported objects;
if "/Annots" is part of the list, the annotation will be ignored
if "/B" is part of the list, the articles will be ignored

examples:

`writer.append("source.pdf",(0,10)) # append the first 10 pages of source.pdf`

`writer.append(reader,"page 1 and 10",[0,9]) #append first and 10th page from reader and create an outline)`
```python
# Append the first 10 pages of source.pdf
writer.append("source.pdf", (0, 10))

During the merging, the relevant named destination will also imported.
# Append the first and 10th page from reader and create an outline
writer.append(reader, "page 1 and 10", [0, 9])
```

If you want to insert pages in the middle of the destination, use merge (which provides (insert) position)
During merging, the relevant named destinations will also be imported.

You can now insert the same page multiple times. You can also insert the same page many time at once with a list:
If you want to insert pages in the middle of the destination, use `merge`, which takes an insertion position.
You can insert the same page multiple times, if necessary even using a list-based syntax:

eg:
`writer.append(reader,[0,1,0,2,0])`
will insert the pages (1), (2), with page (0) before, in the middle and after
```python
writer.append(reader, [0, 1, 0, 2, 0])
```
will insert the pages 1 and 2 with page 0 before, in the middle and after.

## add_page / insert_page
It is recommended to use `append` or `merge` instead

It is recommended to use `append` or `merge` instead.

## Merging forms
When Merging forms, some form fields may have the same names, preventing access
to some data.

When merging forms, some form fields may have the same names, preventing access to some data.

A grouping field should be added before adding the source PDF to prevent that.
The original fields will be identified by adding the group name.

For example, after calling `reader.add_form_topname("form1")`, the field
previously named "field1" will now identified as "form1.field1" when calling
previously named `field1` will now be identified as `form1.field1` when calling
`reader.get_form_text_fields(True)` or `reader.get_fields()`.

After that, you can append the input PDF completely or partially using
`writer.append` or `writer.merge`. If you insert a set of pages, only those
fields will be listed.

## reset_translation
During the cloning, if an object has been already cloned, it will not be cloned again,
a pointer this previously cloned object is returned. because of that, if you add/merge a page that has
been already added, the same object will be added the second time. If later you modify any of these two page,
both pages can be modified independantly.

To reset, call `writer.reset_translation(reader)`
During cloning, if an object has been already cloned, it will not be cloned again, and a pointer
to this previously cloned object is returned instead. Because of that, if you add/merge a page that has
already been added, the same object will be added the second time. If you modify any of these two pages later,
both pages can be modified independently.

To reset, call `writer.reset_translation(reader)`.

## Advanced cloning
In order to prevent side effect between pages/objects and all objects linked are linked during merging.

This process will be automatically applied if you use PdfWriter.append/merge/add_page/insert_page.
If you want to clone an object before attaching it "manually", use clone function of any PdfObject:
eg:
To prevent side effects between pages/objects, all linked objects are cloned during the merge.

This process will be automatically applied if you use `PdfWriter.append/merge/add_page/insert_page`.
If you want to clone an object before attaching it "manually", use the `clone` method of any *PdfObject*:

```python
cloned_object = object.clone(writer)
```

If you try to clone an object already belonging to the writer, it will return the same object:

```python
assert cloned_object == object.clone(writer)
```

`cloned_object = object.clone(writer)`
The same holds true if you try to clone an object twice. It will return the previously cloned object:

if you try clone an object already belonging to writer, it will return the same object
```python
assert object.clone(writer) == object.clone(writer)
```

`cloned_object == object.clone(writer) # -> returns True`
Please note that if you clone an object, you will clone all the objects below as well,
including the objects pointed by *IndirectObject*. Due to this, if you clone a page that
includes some articles (`"/B"`), not only the first article, but also all the chained articles
and the pages where those articles can be read will be copied.
It means that you may copy lots of objects which will be saved in the output PDF as well.

the same, if you try to clone twice an object it will return the previously cloned object
In order to prevent this, you can provide the list of fields in the dictionaries to be ignored:

`object.clone(writer) == object.clone(writer) # -> returns True`
```python
new_page = writer.add_page(reader.pages[0], excluded_fields=["/B"])
```

Also, note that if you clone an object, you will clone all the objects below
including the objects pointed by IndirectObject. because of that if you clone
a page that includes some articles ("/B"),
not only the first article, but also all the chained articles, and the pages
where those articles can be read will be copied.
It means that you may copy lots of objects, that will be saved in the output pdf.
### Merging rotated pages

In order to prevent, that you can provide the list of defined the fields in the dictionaries to be ignored:
If you are working with rotated pages, you might want to call `transfer_rotation_to_content()` on the page
before merging to avoid wrongly rotated results:

eg:
`new_page = writer.add_page(reader.pages[0],excluded_fields=["/B"])`
```python
for page in writer.pages:
    if page.rotation != 0:
        page.transfer_rotation_to_content()
    page.merge_page(background, over=False)
```