ValueError: invalid literal for int() with base 10: #183
The sample file at http://bugs.fi/media/afl/pypdf2/pypdf2-afl-invalid-literal-int-with-base-10.pdf (SHA1 9d25406c4a3c9f5ea61bc96f9251d2f7f186ebf7), together with the following Python code, demonstrates this issue and can be used as a reproducer. The file was fuzzed with American fuzzy lop and https://bitbucket.org/jwilk/python-afl.
import PyPDF2 as pyPdf

with open('pypdf2-afl-invalid-literal-int-with-base-10.pdf', 'rb') as f:
    reader = pyPdf.PdfFileReader(f)
    print("document1.pdf has %d pages." % reader.getNumPages())
Did you find a way to work around this issue?
Unfortunately, not all the … They generally just indicate a parsing error, and they occur frequently when the file deviates from the PDF standard in some way. The good news is that parsing errors aren't terribly difficult to track down, provided I can access the file that triggers them. That said, if anyone would like to submit a PDF, I would be happy to take a look (the link in the second comment is broken).
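As one way to start tracking down such a parsing error, the sketch below scans the classic cross-reference (xref) table of a PDF for lines that are neither a subsection header ("first count") nor a well-formed entry (10-digit offset, 5-digit generation, "n" or "f"). This assumes the failing int() call is reading the xref table, which is only a guess; the actual traceback would show where PyPDF2 fails. find_bad_xref_lines is a hypothetical helper, not part of PyPDF2, and it ignores files that use cross-reference streams instead of a classic table.

```python
from __future__ import annotations

import re

# Classic xref entry: 10-digit byte offset, 5-digit generation, 'n' or 'f'.
ENTRY = re.compile(rb"\d{10} \d{5} [nf]")
# Subsection header: first object number and entry count.
SUBSECTION = re.compile(rb"\d+ \d+")
# The "xref" keyword, but not the tail of "startxref".
XREF_KEYWORD = re.compile(rb"(?<![A-Za-z])xref")


def find_bad_xref_lines(data: bytes) -> list[bytes]:
    """Return lines of the first classic xref table that are not well-formed."""
    match = XREF_KEYWORD.search(data)
    if match is None:
        return []  # no classic table (the file may use an xref stream)
    end = data.find(b"trailer", match.end())
    if end == -1:
        end = len(data)
    bad = []
    for line in data[match.end():end].splitlines():
        line = line.strip()  # drops the 2-byte end-of-line padding
        if not line:
            continue
        if ENTRY.fullmatch(line) or SUBSECTION.fullmatch(line):
            continue
        bad.append(line)  # int() on a field of this line would likely fail
    return bad


sample = (
    b"%PDF-1.4\n"
    b"xref\n0 3\n"
    b"0000000000 65535 f \n"
    b"0000000009 00000 n \n"
    b"00000002pGF 00000 n \n"
    b"trailer\n<< /Size 3 >>\nstartxref\n9\n%%EOF\n"
)
print(find_bad_xref_lines(sample))
```

For a real file, pass the bytes read from disk, for example find_bad_xref_lines(open(path, "rb").read()).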
The link is working for me (I'm the owner of that site).
It seems that PDF is invalid (it can't be opened by any conforming reader), so PyPDF2 would be expected to fail when reading it. That said, the behavior is misleading, because the file seems to be read successfully; the expected result would be a … If we can find conforming PDFs (i.e., ones that open in Adobe, Foxit, etc.) that exhibit the …
I have a file from HSBC that I can open manually but cannot open with this library. I'm happy to PM it to you, @mstamy2, if you're interested.
I also ran into this with PDFShuffler and tickets from DB. How can I investigate this further?
The same error arises when trying to access the numPages attribute of this file, and it also occurs with other methods such as obj.getPage(0). I'm using PyPDF2 version 1.26.0, installed from conda on Anaconda3.
It looks like the header doesn't have these values in integer format. However, the file opens normally in Foxit and Adobe Reader.
Assignment Animas_No_Provisions.pdf is one such PDF that fails. Can anyone take a look and suggest a workaround?
I am also using PyPDF2 version 1.26.0, and the same error occurred.
Added a potential workaround (an ugly monkey patch) in #164.
Same problem here. Could anyone please help solve this?
Has anyone tried this one?
I was getting similar errors. Opening the PDF in Adobe Reader showed me that the file's PDF version was 1.5. After I opened it in Microsoft Word and saved it as a PDF again, it was saved as version 1.7, and the issue stopped occurring with the 1.7 version of the PDF.
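To check which version a file declares without opening it in Adobe Reader, a small stdlib-only sketch can read the %PDF-M.m header near the start of the file. pdf_header_version is a hypothetical helper; it ignores the /Version entry in the document catalog, which can override the header in PDF 1.4 and later. Whether the version number itself is what matters here, rather than the file simply being rewritten, is not established by these reports.

```python
from __future__ import annotations

import re

# The header looks like "%PDF-1.5" and should appear within the first 1024 bytes.
HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_header_version(data: bytes) -> tuple[int, int] | None:
    """Return (major, minor) from the PDF header, or None if there is none."""
    match = HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(pdf_header_version(b"%PDF-1.5\n%\xe2\xe3\xcf\xd3\n"))
```

For a file on disk, pass its first bytes, for example pdf_header_version(open(path, "rb").read(1024)).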
This solution worked for me: https://stackoverflow.com/questions/26242952/pypdf-2-decrypt-not-working. I had to use qpdf to decrypt the file before trying to open it in Python.
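A minimal sketch of scripting the qpdf step from Python, assuming the qpdf command-line tool is installed and on PATH. It runs qpdf --decrypt, which writes a decrypted copy and leaves the source file untouched; files protected by a user (open) password would additionally need qpdf's --password option. The function names are hypothetical. qpdf exits with status 3 when it succeeds with warnings, so the sketch accepts 0 or 3.

```python
import shutil
import subprocess


def qpdf_decrypt_command(src: str, dst: str) -> list:
    """Build the qpdf command that writes a decrypted copy of src to dst."""
    return ["qpdf", "--decrypt", src, dst]


def decrypt_with_qpdf(src: str, dst: str) -> None:
    """Run qpdf; raise if it is missing or reports an error."""
    if shutil.which("qpdf") is None:
        raise FileNotFoundError("qpdf executable not found on PATH")
    cmd = qpdf_decrypt_command(src, dst)
    result = subprocess.run(cmd)
    if result.returncode not in (0, 3):  # 3 = success with warnings
        raise subprocess.CalledProcessError(result.returncode, cmd)
```

After decrypt_with_qpdf("in.pdf", "decrypted.pdf"), the decrypted copy can be opened with PyPDF2 as usual.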
I had this issue, and it was fixed by opening the PDF in Adobe and saving it as a new document. The version went from 1.5 to 1.6, and then the issue went away.
I got the same error, and this worked for me: install the pikepdf package. pikepdf is a Python library for creating, manipulating, and repairing PDFs; it provides a Pythonic wrapper around QPDF, a C++ PDF content transformation library. After installing it, run this code:
import pikepdf
from PyPDF2 import PdfFileReader

pdf_address = "input.pdf"  # placeholder: path to the failing PDF
try:
    inputpdf = PdfFileReader(open(pdf_address, 'rb'))
except ValueError:
    # Let pikepdf (QPDF) rewrite the file in place, then retry with PyPDF2
    with pikepdf.open(pdf_address, allow_overwriting_input=True) as pdf:
        pdf.save(pdf_address)
    inputpdf = PdfFileReader(open(pdf_address, 'rb'))
PyPDF2 has had lots of updates since April 2022. I'm closing this issue now, as I suspect that it's solved. If you still encounter it with a recent PyPDF2 version, please let me know.
I was able to recreate this error with the following code:
from PyPDF2 import PdfFileMerger, PdfFileReader, PdfFileWriter
from PyPDF2.generic import AnnotationBuilder
import io
PATH_TO_PDF = "./Generator.pdf"
merger = PdfFileMerger(strict=False)
with open(PATH_TO_PDF, "rb") as pdf:
    old = io.BytesIO(pdf.read())
reader = PdfFileReader(old)
writer = PdfFileWriter()
for page in reader.pages:
    writer.add_page(page)
annotation = AnnotationBuilder.link(rect=[0,0,100,100], target_page_index=0, fit='/Fit', fit_args=(123,))
writer.add_annotation(page_number=1, annotation=annotation)
writer.write(old)
merger.append(old)
In my testing, it appears to break only when annotations are added to some PDFs with a version number <= 1.4. EDIT: stack trace
@austinwarnock,
from PyPDF2 import PdfFileMerger, PdfFileReader, PdfFileWriter
from PyPDF2.generic import AnnotationBuilder
import io
PATH_TO_PDF = "./Generator.pdf"
merger = PdfFileMerger(strict=False)
with open(PATH_TO_PDF, "rb") as pdf:
    old = io.BytesIO(pdf.read())
reader = PdfFileReader(old)
writer = PdfFileWriter()
for page in reader.pages:
    writer.add_page(page)
annotation = AnnotationBuilder.link(rect=[0,0,100,100], target_page_index=0, fit='/Fit', fit_args=(123,))
writer.add_annotation(page_number=1, annotation=annotation)
# Write the result into a fresh buffer instead of back into `old`
new = io.BytesIO()
writer.write(new)
merger.append(new)
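The difference between the two snippets is where the writer's output goes. The stdlib-only sketch below illustrates the general hazard of reusing one BytesIO for both input and output: once something has moved the stream position, a write lands in the middle of the old bytes and silently mixes them with the new ones. The byte strings are stand-ins, not real PDF data, and the exact position PyPDF2's reader leaves the stream at is not assumed here.

```python
import io

original = b"ORIGINAL-PDF-BYTES"  # stand-in for the input PDF bytes

# Reusing one buffer: a "reader" consumes part of it, then the "writer"
# output is written at the current position, overwriting old data.
shared = io.BytesIO(original)
shared.read(5)        # simulate a reader moving the stream position
shared.write(b"NEW")  # output lands at offset 5, inside the old bytes
mixed = shared.getvalue()

# Writing into a fresh buffer keeps input and output separate.
fresh = io.BytesIO()
fresh.write(b"NEW")
clean = fresh.getvalue()

print(mixed)  # b'ORIGINEW-PDF-BYTES'
print(clean)  # b'NEW'
```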
Thank you for investigating it @pubpub-zz ❤️
Using the latest version: PyPDF2-1.24.tar.gz
With code:
ValueError: invalid literal for int() with base 10: '2pGF'
Lines of the PDF:
line 143 - >>
line 144 - endobj
line 145 - 16 0 obj <</Length 8905 /Filter[/A85 /Fl]>> stream
line 146 - Gb![snip]2bGF[snip]J~>
If I take the full string (Gb![snip]2bGF[snip]J~) into Python and decode it with a85decode, I get the proper byte array.
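The stream above declares /Filter [/A85 /Fl], meaning ASCII85Decode is applied first and FlateDecode second. A stdlib-only sketch of that decoding order, using base64.a85decode and zlib.decompress, looks like the following. decode_a85_flate is a hypothetical helper for checking a stream by hand, not PyPDF2's implementation; it assumes the stream data ends with the "~>" end-of-data marker and ignores any /DecodeParms.

```python
import base64
import zlib


def decode_a85_flate(stream: bytes) -> bytes:
    """Apply /A85 (ASCII85Decode), then /Fl (FlateDecode), in filter order."""
    data = stream.strip()
    # PDF ASCII85 data ends with the end-of-data marker "~>".
    if data.endswith(b"~>"):
        data = data[:-2]
    raw = base64.a85decode(data)  # whitespace such as line breaks is ignored
    return zlib.decompress(raw)


# Round trip: build an A85+Flate stream with line breaks, then decode it.
payload = b"BT /F1 12 Tf (Hello) Tj ET"
encoded = base64.a85encode(zlib.compress(payload), wrapcol=20) + b"~>"
print(decode_a85_flate(encoded))
```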