AssertionError: "a > b" was NOT fulfilled in parse_to_unicode #990

Closed
MartinThoma opened this issue Jun 14, 2022 · 4 comments

Labels
- Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
- is-robustness-issue: From a users perspective, this is about robustness
- workflow-text-extraction: From a users perspective, text extraction is the affected feature/workflow

Comments

@MartinThoma
Member

When trying to extract the text from a PDF, I get an exception.

Environment

$ python -m platform
Linux-5.4.0-113-generic-x86_64-with-glibc2.31

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.2.0

MCVE: Code + PDF

This is a minimal, complete example that shows the issue with 923767.pdf:

from PyPDF2 import PdfReader
reader = PdfReader("923767.pdf")
reader.pages[0].extract_text()

gives

  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1301, in extract_text
    return self._extract_text(self, self.pdf, space_width, PG.CONTENTS)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1124, in _extract_text
    cmaps[f] = build_char_map(f, space_width, obj)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_cmap.py", line 21, in build_char_map
    map_dict, space_code, int_entry = parse_to_unicode(ft, space_code)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_cmap.py", line 225, in parse_to_unicode
    assert a > b
AssertionError
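
Until the assert is fixed, a caller can keep processing the remaining pages by catching the exception per page. This is only a minimal sketch of a workaround, not part of PyPDF2: `extract_text_tolerant` is a hypothetical helper name, and it assumes nothing about a page beyond the `extract_text()` method used in the MCVE above.

```python
from typing import Any, Iterable, List, Optional


def extract_text_tolerant(pages: Iterable[Any]) -> List[Optional[str]]:
    """Return the text of each page, or None where extraction raised AssertionError.

    Hypothetical helper: `pages` is any iterable of objects that expose
    an `extract_text()` method, such as `PdfReader(...).pages`.
    """
    texts: List[Optional[str]] = []
    for page in pages:
        try:
            texts.append(page.extract_text())
        except AssertionError:
            # An assert failing inside the library (as in the traceback
            # above) surfaces here; record the page as unreadable.
            texts.append(None)
    return texts


# Usage with PyPDF2 (not executed here, needs the library and the PDF):
#   from PyPDF2 import PdfReader
#   reader = PdfReader("923767.pdf")
#   texts = extract_text_tolerant(reader.pages)
```

Pages that fail show up as `None`, so the caller can tell an unreadable page apart from a page that simply has no text.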

MartinThoma added the Has MCVE, is-robustness-issue, and workflow-text-extraction labels Jun 14, 2022
MartinThoma self-assigned this Jun 14, 2022

@MartinThoma
Member Author

Other examples:

@pubpub-zz
Collaborator

pubpub-zz commented Jun 14, 2022

The error is in the assert itself. It is an extra check that does not improve performance, so I propose removing it (PR #995).

@pubpub-zz
Collaborator

@MartinThoma
This issue should be closed too

@MartinThoma
Member Author

Closed by #995 :-)
