Traceback (most recent call last):
  File "...\prueba_pdf\test.py", line 6, in <module>
    text = page.extract_text()
  File "...\prueba_pdf\venv\lib\site-packages\pypdf\_page.py", line 2284, in extract_text
    return self._extract_text(
  File "...\prueba_pdf\venv\lib\site-packages\pypdf\_page.py", line 1903, in _extract_text
    cmaps[f] = build_char_map(f, space_width, obj)
  File "...\prueba_pdf\venv\lib\site-packages\pypdf\_cmap.py", line 29, in build_char_map
    font_subtype, font_halfspace, font_encoding, font_map = build_char_map_from_dict(
  File "...\prueba_pdf\venv\lib\site-packages\pypdf\_cmap.py", line 54, in build_char_map_from_dict
    map_dict, space_code, int_entry = parse_to_unicode(ft, space_code)
  File "...\prueba_pdf\venv\lib\site-packages\pypdf\_cmap.py", line 224, in parse_to_unicode
    return type1_alternative(ft, map_dict, space_code, int_entry)
  File "...\prueba_pdf\venv\lib\site-packages\pypdf\_cmap.py", line 481, in type1_alternative
    if words[3] != b"put":
IndexError: list index out of range
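The failing line indexes `words[3]` without first checking how many tokens the list holds. The sketch below illustrates that failure mode in isolation. It assumes `words` comes from splitting one line of the embedded font program on whitespace, which is a guess about the surrounding pypdf code; `is_put_entry` is a hypothetical helper, not part of pypdf.

```python
def is_put_entry(line: bytes) -> bool:
    """Return True if the fourth whitespace-separated token of `line` is b"put".

    Hypothetical helper for illustration only; not pypdf's implementation.
    """
    words = line.split()
    # Check the length first: words[3] alone raises IndexError on short lines.
    return len(words) > 3 and words[3] == b"put"


# A line with fewer than four tokens reproduces the error from the traceback
# when it is indexed without a length check.
short_line = b"dup 32"
try:
    short_line.split()[3]
    unguarded_failed = False
except IndexError:
    unguarded_failed = True
```

A length check of this kind lets a short or malformed line be skipped instead of aborting text extraction for the whole page.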
I recently ran into a particular kind of PDF file from which I cannot extract text: calling `page.extract_text()` raises an `IndexError` inside pypdf.
Environment
Which environment were you using when you encountered the problem?
Code + PDF
This is a minimal, complete example that shows the issue:
Sample PDF file can be found here:
example.pdf
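Until the exception is handled inside the library, one possible workaround is to catch it per page so that a single problematic font does not stop the whole run. This is a minimal sketch, not the script from this report: `extract_text_per_page` is a hypothetical helper, and it only assumes that `reader` exposes `.pages` whose items have an `extract_text()` method, as pypdf's `PdfReader` does.

```python
from typing import Any, List, Optional


def extract_text_per_page(reader: Any) -> List[Optional[str]]:
    """Extract text page by page, recording None where extraction fails.

    Hypothetical helper: `reader` is any object with a `.pages` iterable
    whose items provide `extract_text()`.
    """
    results: List[Optional[str]] = []
    for page in reader.pages:
        try:
            results.append(page.extract_text())
        except IndexError:
            # The error seen in this report; mark the page as unreadable.
            results.append(None)
    return results
```

With pypdf installed, this could be called as `extract_text_per_page(PdfReader("example.pdf"))`. Pages that hit the error come back as `None`, which hides the symptom but does not fix the underlying parsing problem.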
Traceback
This is the complete Traceback I see: