Commit

BUG: Use 1MB as offset for readNextEndLine (py-pdf#321)
Search for the “%%EOF” marker in the last 1 MB of the file instead of the last 1 KB.

This fixes reading Selenium-generated PDF files, whose “%%EOF” marker was not found within the previous 1 KB search window.

Closes py-pdf#177
Closes py-pdf#442
Closes py-pdf#480
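A minimal alternative sketch of the search described above, assuming the last 1 MB of the file fits comfortably in memory: read that tail once and locate the final marker with bytes.rfind. The names find_eof_offset and EOF_SEARCH_WINDOW are hypothetical, ValueError stands in for PdfReadError, and this is not how PyPDF2 does it (its readNextEndLine walks backwards through the stream line by line).

```python
import io

# Size of the tail window searched for the end-of-file marker (1 MB, as in this commit).
EOF_SEARCH_WINDOW = 1024 * 1024


def find_eof_offset(stream, window=EOF_SEARCH_WINDOW):
    """Return the absolute offset of the last b"%%EOF" in the final `window` bytes.

    Hypothetical helper: reads the tail into memory and uses bytes.rfind,
    a simpler technique than a backward line-by-line scan.
    """
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    if size == 0:
        raise ValueError("Cannot read an empty file")
    # Never start before the beginning of the stream for small files.
    start = max(0, size - window)
    stream.seek(start)
    tail = stream.read(size - start)
    pos = tail.rfind(b"%%EOF")
    if pos == -1:
        raise ValueError("EOF marker not found")
    return start + pos
```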
akolpakov authored and VictorCarlquist committed Apr 29, 2022
1 parent fe3ea7f commit 6e86c22
Showing 1 changed file with 3 additions and 3 deletions.
PyPDF2/pdf.py: 6 changes (3 additions, 3 deletions)
```diff
@@ -1821,12 +1821,12 @@ def read(self, stream):
         absoluteEndFilePos = stream.tell() + 1
         if not stream.tell():
             raise PdfReadError('Cannot read an empty file')
-        last1K = absoluteEndFilePos - 1024 # offset of last 1024 bytes of stream
+        last1M = stream.tell() - 1024 * 1024 + 1 # offset of last MB of stream
         line = b_('')
         while line[:5] != b_("%%EOF"):
-            if stream.tell() < last1K:
+            if stream.tell() < last1M:
                 raise PdfReadError("EOF marker not found")
-            line = self.readNextEndLine(stream, last1K)
+            line = self.readNextEndLine(stream)
             if debug: print(" line:",line)
 
         # find startxref entry - the location of the xref table
```
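The patched loop reads lines backwards from the end of the stream until one starts with “%%EOF”, and gives up once the read position drops below last1M. The sketch below mirrors that control flow outside PyPDF2. The helpers read_prev_line and find_eof_marker are hypothetical stand-ins, not PyPDF2's readNextEndLine, and ValueError replaces PdfReadError so the snippet has no dependencies. The limit here is size - window; the diff's last1M works out to the same value assuming stream.tell() there equals size - 1 (suggested by absoluteEndFilePos = stream.tell() + 1).

```python
import io

EOL_BYTES = (b"\r", b"\n")


def read_prev_line(stream):
    """Hypothetical helper: return the line that ends before stream.tell().

    Trailing CR/LF bytes are skipped first; the stream is left positioned at
    the first byte of the returned line, so repeated calls walk backwards.
    """
    pos = stream.tell()
    # Skip end-of-line bytes immediately before the current position.
    while pos > 0:
        stream.seek(pos - 1)
        if stream.read(1) not in EOL_BYTES:
            break
        pos -= 1
    end = pos
    # Walk back to the previous end-of-line byte or the start of the stream.
    while pos > 0:
        stream.seek(pos - 1)
        if stream.read(1) in EOL_BYTES:
            break
        pos -= 1
    stream.seek(pos)
    line = stream.read(end - pos)
    stream.seek(pos)  # leave the stream at the start of this line
    return line


def find_eof_marker(stream, window=1024 * 1024):
    """Return the offset of the last line (scanning from the end) starting with b"%%EOF".

    The scan gives up once the read position falls more than `window`
    bytes before the end of the stream, or reaches the start of the stream.
    """
    stream.seek(0, io.SEEK_END)
    if stream.tell() == 0:
        raise ValueError("Cannot read an empty file")
    limit = stream.tell() - window  # lowest position the scan may continue from
    line = b""
    while line[:5] != b"%%EOF":
        if stream.tell() == 0 or stream.tell() < limit:
            raise ValueError("EOF marker not found")
        line = read_prev_line(stream)
    return stream.tell()
```

With 2,400 bytes of trailing data after the marker, a window of 1024 bytes raises, while the default 1 MB window finds the marker, which is the behaviour change this commit makes.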
