Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: "with" support for PdfMerger and PdfWriter #1193

Merged
merged 46 commits into from
Aug 3, 2022
Merged
Show file tree
Hide file tree
Changes from 41 commits
Commits
Show all changes
46 commits
Select commit Hold shift + click to select a range
a2b7b55
Add `with ... as ...` usage (#1108)
JianzhengLuo Jul 15, 2022
2a88bd5
Update PyPDF2/_merger.py according to @MasterOdin's suggestions
JianzhengLuo Jul 15, 2022
b6d0351
Update PyPDF2/_merger.py according to @MasterOdin's suggestions
JianzhengLuo Jul 15, 2022
80813aa
Update PyPDF2/_merger.py according to @MasterOdin's suggestions
JianzhengLuo Jul 16, 2022
e6ec1f6
Update PyPDF2/_merger.py according to @MasterOdin's suggestions
JianzhengLuo Jul 16, 2022
386be5b
Update PyPDF2/_merger.py according to @MasterOdin's suggestions
JianzhengLuo Jul 16, 2022
31786bb
Sorry, I forgot to run before committing, so didn't notice that `Trac…
JianzhengLuo Jul 16, 2022
ae1bec0
Modify the wrong closing place that cause.
JianzhengLuo Jul 16, 2022
0b7c64e
Merge branch 'main' into add-with-as-usage-#1108
JianzhengLuo Jul 17, 2022
b695113
Modify PyPDF2/_writer.py according to @MasterOdin's suggestions
JianzhengLuo Jul 18, 2022
58797c2
Modify PyPDF2/_writer.py according to @MasterOdin's suggestions
JianzhengLuo Jul 18, 2022
1caa9ec
Modify PyPDF2/_writer.py according to @MasterOdin's suggestions
JianzhengLuo Jul 18, 2022
4fbe3cc
Modify PyPDF2/_writer.py according to @MasterOdin's suggestions
JianzhengLuo Jul 18, 2022
d598e8a
Modify PyPDF2/_writer.py according to @MasterOdin's suggestions
JianzhengLuo Jul 18, 2022
7ecf9ff
Merge branch 'add-with-as-usage-#1108' of https://github.com/Jianzhen…
JianzhengLuo Jul 18, 2022
562ebc7
Merge branch 'main' into add-with-as-usage-#1108
Jul 19, 2022
519dad1
Fix accident
JianzhengLuo Jul 21, 2022
336053a
Merge branch 'add-with-as-usage-#1108' of https://github.com/Jianzhen…
JianzhengLuo Jul 21, 2022
90af68a
Fix error raising while using half traditional usage
JianzhengLuo Jul 21, 2022
0f67658
Add a unit test (Problems still exist, please help.)
JianzhengLuo Jul 21, 2022
a6f973e
Update PyPDF2/_merger.py according to @MartinThoma's suggestions
JianzhengLuo Jul 21, 2022
580ad8c
Modify PyPDF2/_merger.py according to flake8
JianzhengLuo Jul 21, 2022
bf264bc
Modify to compatible with the existing usage
JianzhengLuo Jul 21, 2022
c705300
Modify to compatible with the with .. as ... usage
JianzhengLuo Jul 21, 2022
2940589
Removed a meaningless annotation
JianzhengLuo Jul 21, 2022
cda38ac
Fixed wrong test function names
JianzhengLuo Jul 21, 2022
0b18198
Fixed those I can fix only.
JianzhengLuo Jul 22, 2022
62f970f
Removed useless `else` section
JianzhengLuo Jul 22, 2022
d85c91b
Renamed argument name `fileobj` back to `stream` to keep the existing…
JianzhengLuo Jul 22, 2022
5cd3f3e
Modified annoation for `PdfWriter().write()` and `PdfWriter().write_s…
JianzhengLuo Jul 22, 2022
a7346ef
Switched `fileobj` and `strict` in initializing `PdfWriter()` to keep…
JianzhengLuo Jul 23, 2022
d3e62ee
Modified annoation to make the it match the arguments
JianzhengLuo Jul 23, 2022
59be94d
Modified undefined name `fileobj`
JianzhengLuo Jul 24, 2022
092f209
Merge branch 'main' into add-with-as-usage-#1108
MartinThoma Aug 1, 2022
da7a239
Fixes
MartinThoma Aug 1, 2022
fa8880f
os.path.join
MartinThoma Aug 1, 2022
626d064
Fix issues
MartinThoma Aug 1, 2022
6af6969
Add type hint
MartinThoma Aug 1, 2022
73c2ffb
Test one more line
MartinThoma Aug 1, 2022
5e6f663
Update comments
MartinThoma Aug 1, 2022
8ffda32
Add JianzhengLuo to contributors list
MartinThoma Aug 1, 2022
2888353
Allow using Path
MartinThoma Aug 2, 2022
aa2f64a
Remove print
MartinThoma Aug 2, 2022
d731281
Apply suggestions from code review
MartinThoma Aug 2, 2022
698821e
More tests
MartinThoma Aug 2, 2022
5464bd0
Use tmp_path fixture
MartinThoma Aug 2, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CONTRIBUTORS.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ history and [GitHubs 'Contributors' feature](https://github.com/py-pdf/PyPDF2/gr

## Contributors to the pyPdf / PyPDF2 project

* [JianzhengLuo](https://github.com/JianzhengLuo)
* [Karvonen, Harry](https://github.com/Hatell/)
* [KourFrost](https://github.com/KourFrost)
* [Lightup1](https://github.com/Lightup1)
Expand Down
41 changes: 34 additions & 7 deletions PyPDF2/_merger.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,18 @@

from io import BytesIO, FileIO, IOBase
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Tuple, Union, cast
from types import TracebackType
from typing import (
Any,
Dict,
Iterable,
List,
Optional,
Tuple,
Type,
Union,
cast,
)

from ._encryption import Encryption
from ._page import PageObject
Expand Down Expand Up @@ -84,18 +95,38 @@ class PdfMerger:
:param bool strict: Determines whether user should be warned of all
problems and also causes some correctable problems to be fatal.
Defaults to ``False``.
:param fileobj: Output file. Can be a filename or any kind of
file-like object.
"""

@deprecate_bookmark(bookmarks="outline")
def __init__(self, strict: bool = False) -> None:
def __init__(self, strict: bool = False, fileobj: StrByteType = "") -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should fileobj be allowed to be path-like following from #1190?

Additionally, why not make the type Optional[StrByteType] and just have it default to None instead of ""?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to allow Path objects 👍

What is the advantage of using None?

self.inputs: List[Tuple[Any, PdfReader, bool]] = []
self.pages: List[Any] = []
self.output: Optional[PdfWriter] = PdfWriter()
self.outline: OutlineType = []
self.named_dests: List[Any] = []
self.id_count = 0
self.fileobj = fileobj
self.strict = strict

def __enter__(self) -> "PdfMerger":
# There is nothing to do.
return self

def __exit__(
self,
exc_type: Optional[Type[BaseException]],
exc: Optional[BaseException],
traceback: Optional[TracebackType],
) -> None:
"""Write to the fileobj and close the merger."""
if self.fileobj:
print(f"write to {self.fileobj}")
self.write(self.fileobj)
print("nope")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these prints?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.close()

@deprecate_bookmark(bookmark="outline_item", import_bookmarks="import_outline")
def merge(
self,
Expand Down Expand Up @@ -263,10 +294,6 @@ def write(self, fileobj: StrByteType) -> None:
"""
if self.output is None:
raise RuntimeError(ERR_CLOSED_WRITER)
my_file = False
if isinstance(fileobj, str):
fileobj = FileIO(fileobj, "wb")
my_file = True

# Add pages to the PdfWriter
# The commented out line below was replaced with the two lines below it
Expand All @@ -285,7 +312,7 @@ def write(self, fileobj: StrByteType) -> None:
self._write_outline()

# Write the output to the file
self.output.write(fileobj)
my_file, fileobj = self.output.write(fileobj)

if my_file:
fileobj.close()
MartinThoma marked this conversation as resolved.
Show resolved Hide resolved
Expand Down
58 changes: 50 additions & 8 deletions PyPDF2/_writer.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,8 @@
import time
import uuid
from hashlib import md5
from io import BufferedReader, BufferedWriter, BytesIO, FileIO
from types import TracebackType
from typing import (
Any,
Callable,
Expand All @@ -44,6 +46,7 @@
List,
Optional,
Tuple,
Type,
Union,
cast,
)
Expand All @@ -52,6 +55,7 @@
from ._reader import PdfReader
from ._security import _alg33, _alg34, _alg35
from ._utils import (
StrByteType,
StreamType,
_get_max_pdf_version_header,
b_,
Expand Down Expand Up @@ -121,7 +125,7 @@ class PdfWriter:
class (typically :class:`PdfReader<PyPDF2.PdfReader>`).
"""

def __init__(self) -> None:
def __init__(self, fileobj: StrByteType = "") -> None:
MartinThoma marked this conversation as resolved.
Show resolved Hide resolved
self._header = b"%PDF-1.3"
self._objects: List[Optional[PdfObject]] = [] # array of indirect objects
self._idnum_hash: Dict[bytes, IndirectObject] = {}
Expand Down Expand Up @@ -158,6 +162,23 @@ def __init__(self) -> None:
)
self._root: Optional[IndirectObject] = None
self._root_object = root
self.fileobj = fileobj
self.with_as_usage = False

def __enter__(self) -> "PdfWriter":
"""Store that writer is initialized by 'with'."""
self.with_as_usage = True
return self

def __exit__(
self,
exc_type: Optional[Type[BaseException]],
exc: Optional[BaseException],
traceback: Optional[TracebackType],
) -> None:
"""Write data to the fileobj."""
if self.fileobj:
self.write(self.fileobj)

@property
def pdf_header(self) -> bytes:
Expand Down Expand Up @@ -771,13 +792,7 @@ def encrypt(
self._encrypt = self._add_object(encrypt)
self._encrypt_key = key

def write(self, stream: StreamType) -> None:
"""
Write the collection of pages added to this object out as a PDF file.

:param stream: An object to write the file to. The object must support
the write method and the tell method, similar to a file object.
"""
def write_stream(self, stream: StreamType) -> None:
if hasattr(stream, "mode") and "b" not in stream.mode:
logger_warning(
f"File <{stream.name}> to write to is not in binary mode. " # type: ignore
Expand All @@ -803,6 +818,33 @@ def write(self, stream: StreamType) -> None:
self._write_trailer(stream)
stream.write(b_(f"\nstartxref\n{xref_location}\n%%EOF\n")) # eof

def write(
self, stream: StrByteType
) -> Tuple[bool, Union[FileIO, BytesIO, BufferedReader, BufferedWriter]]:
"""
Write the collection of pages added to this object out as a PDF file.

:param stream: An object to write the file to. The object can support
the write method and the tell method, similar to a file object, or
be a file path, just like the fileobj, just named it stream to keep
existing workflow.
"""
my_file = False

if stream == "":
raise ValueError(f"Output(stream={stream}) is empty.")

if isinstance(stream, str):
stream = FileIO(stream, "wb")
my_file = True

self.write_stream(stream)

if self.with_as_usage:
stream.close()

return my_file, stream

def _write_header(self, stream: StreamType) -> List[int]:
object_positions = []
stream.write(self.pdf_header + b"\n")
Expand Down
53 changes: 46 additions & 7 deletions tests/test_merger.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,14 +18,12 @@
sys.path.append(str(PROJECT_ROOT))


def test_merge():
def merger_operate(merger):
pdf_path = RESOURCE_ROOT / "crazyones.pdf"
outline = RESOURCE_ROOT / "pdflatex-outline.pdf"
pdf_forms = RESOURCE_ROOT / "pdflatex-forms.pdf"
pdf_pw = RESOURCE_ROOT / "libreoffice-writer-password.pdf"

merger = PyPDF2.PdfMerger()

# string path:
merger.append(pdf_path)
merger.append(outline)
Expand Down Expand Up @@ -95,10 +93,8 @@ def test_merge():
merger.set_page_layout("/SinglePage")
merger.set_page_mode("/UseThumbs")

tmp_path = "dont_commit_merged.pdf"
merger.write(tmp_path)
merger.close()

def check_outline(tmp_path):
# Check if outline is correct
reader = PyPDF2.PdfReader(tmp_path)
assert [el.title for el in reader.outline if isinstance(el, Destination)] == [
Expand All @@ -117,7 +113,50 @@ def test_merge():

# TODO: There seem to be no destinations for those links?

# Clean up

tmp_path = "dont_commit_merged.pdf"


def test_merger_operations_by_traditional_usage():
# Arrange
merger = PdfMerger()
merger_operate(merger)

# Act
merger.write(tmp_path)
merger.close()

# Assert
check_outline(tmp_path)

# cleanup
os.remove(tmp_path)
MartinThoma marked this conversation as resolved.
Show resolved Hide resolved


def test_merger_operations_by_semi_traditional_usage():
with PdfMerger() as merger:
# Arrange
merger_operate(merger)
# Act
merger.write(tmp_path)

# Assert
assert os.path.isfile(tmp_path)
check_outline(tmp_path)

# cleanup
os.remove(tmp_path)


def test_merger_operation_by_new_usage():
with PdfMerger(fileobj=tmp_path) as merger:
merger_operate(merger)

# Assert
assert os.path.isfile(tmp_path)
check_outline(tmp_path)

# cleanup
os.remove(tmp_path)


Expand Down
68 changes: 57 additions & 11 deletions tests/test_writer.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,20 +26,16 @@ def test_writer_clone():
assert len(writer.pages) == 4


def test_writer_operations():
def writer_operate(writer):
"""
This test just checks if the operation throws an exception.

This should be done way more thoroughly: It should be checked if the
output is as expected.
To test the writer that initialized by each of the four usages.
"""
pdf_path = RESOURCE_ROOT / "crazyones.pdf"
pdf_outline_path = RESOURCE_ROOT / "pdflatex-outline.pdf"

reader = PdfReader(pdf_path)
reader_outline = PdfReader(pdf_outline_path)

writer = PdfWriter()
page = reader.pages[0]
with pytest.raises(PageSizeNotDefinedError) as exc:
writer.add_blank_page()
Expand Down Expand Up @@ -91,17 +87,57 @@ def test_writer_operations():

writer.add_attachment("foobar.gif", b"foobarcontent")

# finally, write "output" to PyPDF2-output.pdf
tmp_path = "dont_commit_writer.pdf"
with open(tmp_path, "wb") as output_stream:
writer.write(output_stream)

# Check that every key in _idnum_hash is correct
objects_hash = [o.hash_value() for o in writer._objects]
for k, v in writer._idnum_hash.items():
assert v.pdf == writer
assert k in objects_hash, "Missing %s" % v


tmp_path = "dont_commit_writer.pdf"


def test_writer_operations_by_totally_traditional_usage():
writer = PdfWriter()

writer_operate(writer)

# finally, write "output" to PyPDF2-output.pdf
with open(tmp_path, "wb") as output_stream:
writer.write(output_stream)

# cleanup
os.remove(tmp_path)
MartinThoma marked this conversation as resolved.
Show resolved Hide resolved


def test_writer_operations_by_semi_traditional_usage():
with PdfWriter() as writer:
writer_operate(writer)

# finally, write "output" to PyPDF2-output.pdf
with open(tmp_path, "wb") as output_stream:
writer.write(output_stream)

# cleanup
os.remove(tmp_path)


def test_writer_operations_by_semi_new_traditional_usage():
with PdfWriter() as writer:
writer_operate(writer)

# finally, write "output" to PyPDF2-output.pdf
writer.write(tmp_path)

# cleanup
os.remove(tmp_path)


def test_writer_operation_by_totally_new_usage():
# This includes write "output" to PyPDF2-output.pdf
with PdfWriter(tmp_path) as writer:
writer_operate(writer)

# cleanup
os.remove(tmp_path)

Expand Down Expand Up @@ -656,3 +692,13 @@ def test_colors_in_outline_item():

# Cleanup
os.remove(target) # remove for testing


def test_write_empty_stream():
reader = PdfReader(EXTERNAL_ROOT / "004-pdflatex-4-pages/pdflatex-4-pages.pdf")
writer = PdfWriter()
writer.clone_document_from_reader(reader)

with pytest.raises(ValueError) as exc:
writer.write("")
assert exc.value.args[0] == "Output(stream=) is empty."