Corrupted PDF in 1.26 when older versions throw "ValueError: I/O operation on closed file" #263

Closed
TheFrostyboss opened this issue May 19, 2016 · 1 comment

TheFrostyboss commented May 19, 2016

I was using v1.26 to splice and merge pages from different PDFs, and most of the output pages were invariably corrupted (e.g. the reader would display a blank page and say "There is an error displaying this page", or a page would keep the structure of the original PDF but have portions populated with random text).

When I rolled back to 1.24-1.25, the same code produced a "ValueError: I/O operation on closed file", presumably for trying to write out pages from PDFs I had closed after reading them in.

Debugging this kind of failure in the current version is extremely difficult, especially for someone not familiar with PyPDF2. I would always rather get an error that points me in the right direction than silently corrupted output.

The original code:

from PyPDF2 import PdfFileReader, PdfFileWriter

full_path = r"path_to_pdf.pdf"
full_path2 = full_path.replace(".pdf", "2.pdf")
output_pdf = PdfFileWriter()
with open(full_path, "rb") as f:
    open_pdf = PdfFileReader(f)
    last_page = len(open_pdf.pages) - 1
    output_pdf.addPage(open_pdf.getPage(last_page))
# f is closed when the with block ends, before output_pdf.write() runs

outputStream = open(full_path2, "wb")
output_pdf.write(outputStream)
outputStream.close()

To resolve the error, I replaced the with/open statement with a plain open call, so that the input file is still open when the output is written:

f = open(full_path, "rb")
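
The same fix can also be written with the input kept inside a with block that spans the write. This is only a minimal sketch, assuming the legacy PyPDF2 1.x names used in this issue (PdfFileReader, PdfFileWriter, addPage, getPage, getNumPages); the function name copy_last_page and its parameters are hypothetical. The import sits inside the function only so that the snippet loads without PyPDF2 installed.

```python
def copy_last_page(src_path, dst_path):
    """Copy the last page of src_path into a new PDF at dst_path.

    Hypothetical helper; assumes the legacy PyPDF2 1.x API.
    """
    # Deferred import: legacy class names from PyPDF2 1.x.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    writer = PdfFileWriter()
    # Both files stay open until the end of this block, so the input
    # stream is still readable while writer.write() runs.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        reader = PdfFileReader(src)
        last_index = reader.getNumPages() - 1
        writer.addPage(reader.getPage(last_index))
        writer.write(dst)
```

Compared with the bare open call, this arrangement also closes the input file once the write has finished.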

mstamy2 (Collaborator) commented May 19, 2016

Commit 26e5077 should allow a graceful exit with an error, as in versions prior to v1.26.0.

I suppose it's a little counterintuitive that input files must remain open during the write process...
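
For the multi-file case from the original report, one way to keep every input open until the write completes is contextlib.ExitStack. The sketch below uses only the standard library: LazyPage is a hypothetical stand-in for a page object that reads from its source stream only at write time, and merge_first_chunks is likewise an illustrative name, not part of PyPDF2.

```python
import contextlib
import io
import os
import tempfile


class LazyPage:
    """Hypothetical stand-in for a page that reads its bytes only when written."""

    def __init__(self, stream, offset, length):
        self.stream = stream
        self.offset = offset
        self.length = length

    def write_to(self, out):
        # seek() raises ValueError if the source stream is already closed.
        self.stream.seek(self.offset)
        out.write(self.stream.read(self.length))


def merge_first_chunks(paths, out, length=4):
    """Copy the first `length` bytes of each input file into `out`."""
    with contextlib.ExitStack() as stack:
        pages = [
            LazyPage(stack.enter_context(open(p, "rb")), 0, length)
            for p in paths
        ]
        # Every input is still open here, so the deferred reads succeed.
        for page in pages:
            page.write_to(out)
    # All inputs are closed once the ExitStack block exits.


# Usage with two throwaway files standing in for input PDFs.
tmpdir = tempfile.mkdtemp()
paths = []
for name, data in [("a.bin", b"AAAAxxxx"), ("b.bin", b"BBBByyyy")]:
    path = os.path.join(tmpdir, name)
    with open(path, "wb") as fh:
        fh.write(data)
    paths.append(path)

merged = io.BytesIO()
merge_first_chunks(paths, merged)
result = merged.getvalue()
```

The point of the design is that the write happens inside the block that owns the input handles; moving the write loop after the block would reproduce the closed-file failure.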

hannal added a commit to hannal/PyPDF2 that referenced this issue Jul 5, 2016
* commit '036789a4664e3f572292bc7dceec10f08b7dbf62':
  Write binary data comment
  Python 3 type fixes in LZWDecode
  Appropriate error message for closed file, warn when returning null object, resolves py-pdf#263
  Read Indirect Objects with a sign, fixes py-pdf#248
  Version 1.26.0 update
  Fix a bug in _readInlineImage. We were looking for the operation EI and Q, but were not checking to ensure that there was whitespace between EI and Q.  Accordingly, any image that had EIQ in its ascii encoded data would trigger the end of the image, and cause errors.
  Remove extraneous zeros from the standard formatting.
  Remove extraneous zeros from the standard formatting.
  Ignore xref table zero index error if self.strict = False
  Working around unresolved objects and returning NullObject instead of raising a ValueError.
  Python 3 compatibility with inline images
  Python2/3 compatibility on merging pages with eps img into single page
  Adding unit tests for addJS.
  Parameterized JavaScript.
  Added convenience method for retrieving form text fields