diff --git a/docs/user/merging-pdfs.md b/docs/user/merging-pdfs.md
index 797e1f510..afb640cc2 100644
--- a/docs/user/merging-pdfs.md
+++ b/docs/user/merging-pdfs.md
@@ -29,69 +29,61 @@ input1 = open("document1.pdf", "rb")
 input2 = open("document2.pdf", "rb")
 input3 = open("document3.pdf", "rb")
 
-# add the first 3 pages of input1 document to output
+# Add the first 3 pages of input1 document to output
 merger.append(fileobj=input1, pages=(0, 3))
 
-# insert the first page of input2 into the output beginning after the second page
+# Insert the first page of input2 into the output beginning after the second page
 merger.merge(position=2, fileobj=input2, pages=(0, 1))
 
-# append entire input3 document to the end of the output document
+# Append entire input3 document to the end of the output document
 merger.append(input3)
 
 # Write to an output PDF document
 output = open("document-output.pdf", "wb")
 merger.write(output)
 
-# Close File Descriptors
+# Close file descriptors
 merger.close()
 output.close()
 ```
 
 ## append
 
-`append` has been slighlty extended in `PdfWriter`.
-see [pdfWriter.append](../modules/PdfWriter.html#pypdf.PdfWriter.append) for more details
+`append` has been slightly extended in `PdfWriter`. See [PdfWriter.append](../modules/PdfWriter.html#pypdf.PdfWriter.append) for more details.
 
-**parameters:**
+### Examples
 
-*fileobj*: PdfReader or filename to merge
-*outline_item*: string of a outline/bookmark pointing to the beginning of the inserted file.
- if None, or omitted, no bookmark will be added.
-*pages*: pages to merge ; you can also provide a list of pages to merge
- None(default) means that the full document will be merged.
-*import_outline*: import/ignore the pertinent outlines from the source (default True)
-*excluded_fields*: list of keys to be ignored for the imported objects;
- if "/Annots" is part of the list, the annotation will be ignored
- if "/B" is part of the list, the articles will be ignored
-
-examples:
-
-`writer.append("source.pdf",(0,10)) # append the first 10 pages of source.pdf`
-
-`writer.append(reader,"page 1 and 10",[0,9]) #append first and 10th page from reader and create an outline)`
+```python
+# Append the first 10 pages of source.pdf
+writer.append("source.pdf", (0, 10))
 
-During the merging, the relevant named destination will also imported.
+# Append the first and 10th page from reader and create an outline
+writer.append(reader, "page 1 and 10", [0, 9])
+```
 
-If you want to insert pages in the middle of the destination, use merge (which provides (insert) position)
+During merging, the relevant named destinations will also be imported.
 
-You can now insert the same page multiple times. You can also insert the same page many time at once with a list:
+If you want to insert pages in the middle of the destination, use `merge` (which provides an insertion position).
+You can insert the same page multiple times, if necessary even using a list-based syntax:
 
-eg:
-`writer.append(reader,[0,1,0,2,0])`
-will insert the pages (1), (2), with page (0) before, in the middle and after
+```python
+writer.append(reader, [0, 1, 0, 2, 0])
+```
+will insert the pages 1 and 2 with page 0 before, in the middle and after.
 
 ## add_page / insert_page
-It is recommended to use `append` or `merge` instead
+
+It is recommended to use `append` or `merge` instead.
 
 ## Merging forms
-When Merging forms, some form fields may have the same names, preventing access
-to some data.
+
+When merging forms, some form fields may have the same names, preventing access to some data.
 
 A grouping field should be added before adding the source PDF to prevent that.
 The original fields will be identified by adding the group name.
 
 For example, after calling `reader.add_form_topname("form1")`, the field
-previously named "field1" will now identified as "form1.field1" when calling
+previously named `field1` will now be identified as `form1.field1` when calling
 `reader.get_form_text_fields(True)` or `reader.get_fields()`.
 
 After that, you can append the input PDF completely or partially using
@@ -99,38 +91,57 @@ After that, you can append the input PDF completely or partially using
 fields will be listed.
 
 ## reset_translation
 
-During the cloning, if an object has been already cloned, it will not be cloned again,
- a pointer this previously cloned object is returned. because of that, if you add/merge a page that has
- been already added, the same object will be added the second time. If later you modify any of these two page,
- both pages can be modified independantly.
-To reset, call `writer.reset_translation(reader)`
+During cloning, if an object has been already cloned, it will not be cloned again, and a pointer
+to this previously cloned object is returned instead. Because of that, if you add/merge a page that has
+already been added, the same object will be added the second time. If you modify any of these two pages later,
+both pages can be modified independently.
+
+To reset, call `writer.reset_translation(reader)`.
 
 ## Advanced cloning
 
-In order to prevent side effect between pages/objects and all objects linked are linked during merging.
-This process will be automatically applied if you use PdfWriter.append/merge/add_page/insert_page.
-If you want to clone an object before attaching it "manually", use clone function of any PdfObject:
-eg:
+In order to prevent side effects between pages/objects, all linked objects are cloned during the merge.
+
+This process will be automatically applied if you use `PdfWriter.append/merge/add_page/insert_page`.
+If you want to clone an object before attaching it "manually", use the `clone` method of any *PdfObject*:
+
+```python
+cloned_object = object.clone(writer)
+```
+
+If you try to clone an object already belonging to the writer, it will return the same object:
+
+```python
+assert cloned_object == object.clone(writer)
+```
 
-`cloned_object = object.clone(writer)`
+The same holds true if you try to clone an object twice. It will return the previously cloned object:
 
-if you try clone an object already belonging to writer, it will return the same object
+```python
+assert object.clone(writer) == object.clone(writer)
+```
 
-`cloned_object == object.clone(writer) # -> returns True`
+Please note that if you clone an object, you will clone all the objects below as well,
+including the objects pointed to by *IndirectObject*. Due to this, if you clone a page that
+includes some articles (`"/B"`), not only the first article, but also all the chained articles
+and the pages where those articles can be read will be copied.
+It means that you may copy lots of objects which will be saved in the output PDF as well.
 
-the same, if you try to clone twice an object it will return the previously cloned object
+In order to prevent this, you can provide the list of fields in the dictionaries to be ignored:
 
-`object.clone(writer) == object.clone(writer) # -> returns True`
+```python
+new_page = writer.add_page(reader.pages[0], excluded_fields=["/B"])
+```
 
-Also, note that if you clone an object, you will clone all the objects below
-including the objects pointed by IndirectObject. because of that if you clone
-a page that includes some articles ("/B"),
-not only the first article, but also all the chained articles, and the pages
-where those articles can be read will be copied.
-It means that you may copy lots of objects, that will be saved in the output pdf.
+### Merging rotated pages
 
-In order to prevent, that you can provide the list of defined the fields in the dictionaries to be ignored:
+If you are working with rotated pages, you might want to call `transfer_rotation_to_content()` on the page
+before merging to avoid wrongly rotated results:
 
-eg:
-`new_page = writer.add_page(reader.pages[0],excluded_fields=["/B"])`
+```python
+for page in writer.pages:
+    if page.rotation != 0:
+        page.transfer_rotation_to_content()
+    page.merge_page(background, over=False)
+```
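
Outside the patch itself, the caching behaviour that the `reset_translation` section describes can be illustrated with a minimal pure-Python sketch. It does not use pypdf: `SketchWriter`, its `clone` method, and its argument-free `reset_translation` are hypothetical stand-ins invented for this illustration, and pypdf's real implementation may differ.

```python
class SketchWriter:
    """Hypothetical stand-in for a writer; NOT pypdf's implementation."""

    def __init__(self):
        # Maps id(source object) -> the clone already made for this writer.
        self._translation = {}

    def clone(self, obj):
        """Return the cached clone of obj, creating it on first use."""
        key = id(obj)
        if key not in self._translation:
            # A shallow dict copy stands in for a real deep clone.
            self._translation[key] = dict(obj)
        return self._translation[key]

    def reset_translation(self):
        """Forget earlier clones so that the next clone() makes a new copy."""
        self._translation.clear()


source_page = {"/Type": "/Page", "/Rotate": 0}
writer = SketchWriter()

first = writer.clone(source_page)
second = writer.clone(source_page)  # cache hit: the very same object as `first`

writer.reset_translation()
third = writer.clone(source_page)   # cache was cleared: a separate copy

# Changing `second` also changes `first`, but leaves `third` untouched.
second["/Rotate"] = 90
```

In this sketch the two clones made before the reset are one shared object, while the clone made after the reset is a separate copy, which is presumably the situation in which two copies of a page can be edited independently.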