Stream has ended unexpectedly error on certain PDF files #99
Comments
The PDF file that triggered this error can be found here: https://drive.google.com/file/d/0B_P1mlgsZIJpRjNnQkxCenUzTkU/edit?usp=sharing |
Thank you for the detailed bug report! I will try to track down the issue soon, though I will be unavailable for the following week. |
Hello, |
Thanks for the response. I am using |
|
Thanks! I set |
Hello, I have recently been getting a similar error. Can you please post an example of how and where to implement the fix? Thank you! |
Hello, just to give an update: I manually edited the pdf.py file to set strict = False (I was hoping not to do it this way, as I don't want to run into issues later on when I upgrade). After running the script again with strict set to False, it splits the PDFs with no problem, but it still reports: PdfReadWarning: Invalid stream (index 77) within object 1444 0: Stream has ended unexpectedly [pdf.py:1162] Any ideas? |
Well, you don't have to change When in strict mode, PyPDF2 quits when encountering this stream error and throws a Ignoring the error doesn't seem to harm the output in any way (as you noticed), so we need to investigate why the error is thrown at all (maybe PyPDF2 is too strict on slightly 'irregular' PDFs?). Or maybe the error is significant but the output PDFs haven't displayed any symptoms? Hope that made a little sense. |
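For readers asking where the setting goes, here is a minimal sketch. It assumes a PyPDF2 1.x/2.x release in which PdfFileReader accepts a strict keyword argument; the helper name open_lenient is hypothetical and only serves the illustration.

```python
def open_lenient(path):
    """Read a PDF with strict mode off and return its page count.

    Hypothetical helper; assumes PdfFileReader(stream, strict=...) as in
    PyPDF2 1.x/2.x.
    """
    # Imported lazily so the sketch can be loaded without PyPDF2 installed.
    from PyPDF2 import PdfFileReader

    with open(path, "rb") as fh:
        # strict=False asks PyPDF2 to warn about recoverable problems
        # instead of raising, so pdf.py itself does not need to be edited.
        reader = PdfFileReader(fh, strict=False)
        return reader.getNumPages()
```

Passing the flag at the call site keeps the change in your own script, so it should survive a library upgrade.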
Maybe this works for PdfFileReader but not with PdfFileMerger. I tried it, and also passed it in like this: merger.append(PdfFileReader(open(os.path.join(files_dir, f), "rb"), strict = False))
I also set def __init__(self, stream, strict=False, warndest = None, overwriteWarnings = True): at line 891 of pdf.py and it doesn't work. Any solution? |
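A possible alternative for the merger case, offered as a sketch rather than a confirmed fix: it assumes a PyPDF2 release whose PdfFileMerger constructor accepts a strict argument (older releases may not). My understanding, which I have not checked against every version, is that the merger builds its own reader for each appended input, so a strict flag set on a PdfFileReader handed to append may not carry over. The helper name merge_lenient and its parameters are hypothetical.

```python
import os


def merge_lenient(files_dir, output_path):
    """Merge every .pdf in files_dir into output_path with strict mode off.

    Hypothetical helper; assumes PdfFileMerger(strict=...) is available.
    """
    # Imported lazily so the sketch can be loaded without PyPDF2 installed.
    from PyPDF2 import PdfFileMerger

    merger = PdfFileMerger(strict=False)  # set the flag on the merger itself
    handles = []
    try:
        for name in sorted(os.listdir(files_dir)):
            if name.lower().endswith(".pdf"):
                fh = open(os.path.join(files_dir, name), "rb")
                handles.append(fh)  # inputs must stay open until write()
                merger.append(fh)
        with open(output_path, "wb") as out:
            merger.write(out)
    finally:
        merger.close()
        for fh in handles:
            fh.close()
```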
relaxes when dealing with slightly dodgy PDF's. py-pdf/pypdf#99
Try merging your PDFs by using the 'append' and 'merge' functionality of PyPDF2 instead. I faced the same issue, and the following approach worked for me:
from PyPDF2 import PdfFileMerger
merger = PdfFileMerger()
input1 = open("file1.pdf", "rb")
input2 = open("file2.pdf", "rb")
# add the first 3 pages of first file to output
merger.append(fileobj = input1, pages = (0,3))
# insert the first page of second file into the output beginning after the second page
merger.merge(position = 2, fileobj = input2, pages = (0,1))
# Write to an output PDF document
output = open("document-output.pdf", "wb")
merger.write(output)
Remove the 'pages' argument in the 'append' and 'merge' calls to merge whole files instead of specific pages. |
I just started to experience this issue when calling The only difference between my case and the above is that it only happens after I've run the code once. I'm calling this from within ArcGIS (mapping software). I have to close the software and re-open it to get the first successful run. This seems to indicate that something is being held onto after the first run... but again, it just started happening. I realize this probably doesn't help you move towards a fix; I'm just reporting to up the user count for this. Edit - "fix":
|
@appurwar the error is returned no matter whether append or merge is used. The problem here seems to be the format of the PDF that is being appended, so it's not PyPDF2's fault. A sensible workaround seems to be to reformat the PDF in some other way before passing it to PyPDF2 once this is detected. strict=False doesn't fix this either; I came here after the error happened with strict=False on. |
I'm closing this issue now as it seems to be mostly about using |
Thanks @mstamy2. Setting strict=False when reading the PDF with PdfFileReader() works great, and the rewritten or merged file is not harmed. However, if the PDF file has been altered by some workaround that affects its structure, the same error occurs even with strict=False. This is not a problem with this package. |
Hello Traceback (most recent call last): |
Which version of PyPDF2 do you use? |
Version 2.0.0 |
@Eslafif, |
Updated, and the same error still exists |
And can you provide the PDF file and the page? Without them, no analysis can be done. |
This is the page that gives the error |
@Eslafif |
Tested, and it gives the same error |
Can you share the output, please? |
|
You are not using my code and the file you've provided. Can you tell me what the result is with my program, please? |
Traceback (most recent call last): |
This is with your code. The only difference is that the file is big, so I only attached the page with the problem. |
@Eslafif, meanwhile, looking at #454, I may have found a fix. As a patch, can you modify generic.py at line 495 as follows:
        if tok.isdigit():
            # "The number ddd may consist of one, two, or three
            # octal digits; high-order overflow shall be ignored.
            # Three octal digits shall be used, with leading zeros
            # as needed, if the next character of the string is also
            # a digit." (PDF reference 7.3.4.2, p 16)
            for _ in range(2):
                ntok = stream.read(1)
                if ntok.isdigit():
                    tok += ntok
                else:
                    stream.seek(-1, 1)  # <--- to be added
                    break
            tok = b_(chr(int(tok, base=8)))
I would like to confirm the fix before releasing the PR. |
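To see why the added seek matters, here is a standalone sketch of the same look-ahead logic using only the standard library. It is not PyPDF2's code: the function name read_octal_escape is hypothetical, the EOF guard is my own addition, and masking with 0xFF is my reading of "high-order overflow shall be ignored". The point is that the byte read while looking for more octal digits has to be put back when it is not a digit. Otherwise a character such as the closing ")" of the string is swallowed, which plausibly leads the parser to run past the end of the stream.

```python
import io


def read_octal_escape(stream, first_digit):
    """Parse a \\ddd escape whose first digit was already consumed.

    Hypothetical helper mirroring the patched loop, not the library code.
    """
    tok = first_digit
    for _ in range(2):
        ntok = stream.read(1)
        if ntok.isdigit():
            tok += ntok
        else:
            # Un-read the byte that does not belong to the escape.
            # (Guarded so nothing is rewound at end of stream.)
            if ntok:
                stream.seek(-1, io.SEEK_CUR)
            break
    # Keep only the low byte, ignoring high-order overflow.
    return bytes([int(tok, 8) & 0xFF])


# A two-digit escape followed directly by the closing parenthesis.
stream = io.BytesIO(b"53)")
value = read_octal_escape(stream, stream.read(1))
next_byte = stream.read(1)  # the ")" is still available to the caller
```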
@Eslafif, |
@MartinThoma, |
I'm closing the issue as I believe it's solved. If anybody still has this issue with the latest PyPDF2 version, please let us know. |
|
Output: |
@mmariani3 |
ah! Works like a charm now |
We process dozens of PDF files per day in our automated script that uses PyPDF2 version 1.21 as part of its process. A few files have been failing with the error pasted below. I can provide the PDF file that is having this error, just let me know how you would like me to send it. Thanks!