IndexError: list index out of range #1278

DL6ER · 2022-08-26T07:54:58Z

See #1269 for further details.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.4.0-122-generic-x86_64-with-glibc2.29

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.10.3

Code + PDF

This is a minimal, complete example that shows the issue:

from PyPDF2 import PdfReader

with open("Work Flow From Check to QA.pdf", "rb") as f:
  reader = PdfReader(f, strict=False)
  content = " ".join([page.extract_text() for page in reader.pages])

PDF used above: Work Flow From Check to QA.pdf

Traceback

This is the complete Traceback I see:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    content = " ".join([page.extract_text() for page in reader.pages])
  File "test.py", line 4, in <listcomp>
    content = " ".join([page.extract_text() for page in reader.pages])
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_page.py", line 1510, in extract_text
    return self._extract_text(
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_page.py", line 1444, in _extract_text
    process_operation(operator, operands)
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_page.py", line 1258, in process_operation
    float(operands[5]),
IndexError: list index out of range
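
To find which page triggers the error, one option is to extract text page by page and record failures instead of aborting on the first one. The following is a minimal sketch, not part of the original report: the helper name `extract_text_per_page` is hypothetical, and it only assumes that each page object has an `extract_text()` method, as PyPDF2 pages do.

```python
def extract_text_per_page(pages):
    """Extract text from each page, collecting IndexErrors instead of raising.

    Returns (texts, failures), where failures maps the zero-based page index
    to the exception raised for that page. Failed pages contribute "".
    """
    texts = []
    failures = {}
    for index, page in enumerate(pages):
        try:
            texts.append(page.extract_text())
        except IndexError as exc:
            # Record the failing page and keep going with the rest.
            failures[index] = exc
            texts.append("")
    return texts, failures


# Assumed usage with PyPDF2 (requires the PDF from the report):
# from PyPDF2 import PdfReader
# with open("Work Flow From Check to QA.pdf", "rb") as f:
#     reader = PdfReader(f, strict=False)
#     texts, failures = extract_text_per_page(reader.pages)
#     print(sorted(failures))
```

This only localizes the problem and lets the remaining pages be processed; it does not fix the underlying parsing issue.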


pubpub-zz · 2022-08-26T10:27:53Z

for local ref
Work Flow From Check to QA.pdf

fix py-pdf#1278

pubpub-zz · 2022-08-26T10:36:33Z

The page contains an array of content streams. They have to be reassembled with line breaks added between them. PR #1281 will fix this issue.
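
As an illustration of why the separator matters, here is a toy sketch. It is not PyPDF2's parser: the whitespace tokenizer, the `OPERATORS` set, and the sample streams are all assumptions made up for this example. It shows one plausible way that joining streams without a line break can leave a `Tm` operator with only five operands, which would make `operands[5]` fail as in the traceback.

```python
# Toy illustration only: this is NOT PyPDF2's content stream parser.
OPERATORS = {b"BT", b"ET", b"Tm"}


def parse_operations(data: bytes):
    """Split a content stream into (operator, operands) pairs by whitespace."""
    operations = []
    operands = []
    for token in data.split():
        if token in OPERATORS:
            operations.append((token, operands))
            operands = []
        else:
            operands.append(token)
    return operations


# Hypothetical page content split across two streams at a token boundary.
streams = [b"BT 1 0 0 1 50", b"700 Tm ET"]

naive = parse_operations(b"".join(streams))    # no separator between streams
fixed = parse_operations(b"\n".join(streams))  # line break between streams

tm_naive = [operands for op, operands in naive if op == b"Tm"][0]
tm_fixed = [operands for op, operands in fixed if op == b"Tm"][0]

print(len(tm_naive))  # 5: "50" and "700" were fused into "50700"
print(len(tm_fixed))  # 6: operands[5] exists
```

With the naive join, the last token of the first stream and the first token of the second are fused, so the operator receives one operand too few.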

Closes #1278

MartinThoma · 2022-08-27T11:39:05Z

Thank you for reporting the issue, @DL6ER! I'll release a PyPDF2 version with the fix to PyPI on Sunday.

MartinThoma · 2022-08-27T11:39:55Z

We value good error reports @DL6ER! I can add you to https://pypdf2.readthedocs.io/en/latest/meta/CONTRIBUTORS.html if you want :-)

DL6ER · 2022-08-27T11:45:13Z

@MartinThoma Sure, you can add me if you want. I will "contribute" some more issues over the next days ;-)

MartinThoma · 2022-08-27T12:10:47Z

You're added :-) (It might take ~5 minutes until the docs refresh)

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Aug 26, 2022

ROB : add required line separators in content stream arrays

72f8f2d

fix py-pdf#1278

pubpub-zz mentioned this issue Aug 26, 2022

ROB: Add required line separators in ContentStream ArrayObjects #1281

Merged

MartinThoma closed this as completed in #1281 Aug 27, 2022

MartinThoma pushed a commit that referenced this issue Aug 27, 2022

ROB: Add required line separators in ContentStream ArrayObjects (#1281)

c819acb

Closes #1278

MartinThoma added the is-robustness-issue (From a user's perspective, this is about robustness) and PdfReader (The PdfReader component is affected) labels Aug 27, 2022
