pypdf.errors.PdfReadError: Could not read Boolean object #2082

Closed
isriam opened this issue Aug 11, 2023 · 1 comment · Fixed by #2083
Comments

@isriam
isriam commented Aug 11, 2023

I'm getting an error when I try to read a PDF from a specific company.

pycharm

test.pdf

  File "C:\Users\jerem\PycharmProjects\testing\main.py", line 16, in <module>
    main()
  File "C:\Users\jerem\PycharmProjects\testing\main.py", line 8, in main
    text = pypdf.PdfReader('test.pdf')
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python310\lib\site-packages\pypdf\_reader.py", line 318, in __init__
    self.read(stream)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python310\lib\site-packages\pypdf\_reader.py", line 1548, in read
    self._read_xref_tables_and_trailers(stream, startxref, xref_issue_nr)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python310\lib\site-packages\pypdf\_reader.py", line 1758, in _read_xref_tables_and_trailers
    startxref = self._read_xref(stream)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python310\lib\site-packages\pypdf\_reader.py", line 1794, in _read_xref
    self._read_standard_xref_table(stream)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python310\lib\site-packages\pypdf\_reader.py", line 1657, in _read_standard_xref_table
    size = cast(int, read_object(stream, self))
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python310\lib\site-packages\pypdf\generic\_data_structures.py", line 1229, in read_object
    return BooleanObject.read_from_stream(stream)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python310\lib\site-packages\pypdf\generic\_base.py", line 257, in read_from_stream
    raise PdfReadError("Could not read Boolean object")
pypdf.errors.PdfReadError: Could not read Boolean object

Process finished with exit code 1

################CODE################

import pypdf


def main():
    text = pypdf.PdfReader('test.pdf')  # FAILS HERE
    out = text.pages[0].extract_text()


if __name__ == '__main__':
    pdf_file = 'test.pdf'
    main()

@pubpub-zz
Collaborator

Your PDF shows an uncommon quirk: the xref keyword is not followed by any separator. Acrobat Reader seems to accept this. I've added a change to make pypdf more robust to it.
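
To illustrate the quirk described above, here is a minimal sketch of a parser that accepts the xref keyword whether or not a separator follows it. It is not pypdf's API and not the actual fix from the linked pull request. The helper name `parse_xref_header` and the regular expression are hypothetical, and the sketch assumes the keyword is followed by the usual "first object number, entry count" pair of a standard cross-reference subsection.

```python
import re
from typing import Tuple

# Hypothetical pattern for illustration only (not taken from pypdf):
# the "xref" keyword, optional whitespace, then "<first object> <count>".
# Using \s* instead of \s+ after the keyword is what tolerates "xref0 6".
XREF_HEADER = re.compile(rb"xref\s*(\d+)\s+(\d+)")


def parse_xref_header(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (first_object_number, entry_count) for the xref section at offset."""
    match = XREF_HEADER.match(data, offset)
    if match is None:
        raise ValueError("no xref section header at offset %d" % offset)
    return int(match.group(1)), int(match.group(2))


# Usual layout: a line break separates the keyword from the numbers.
print(parse_xref_header(b"xref\n0 6\n"))
# Layout reported in this issue: no separator after the keyword.
print(parse_xref_header(b"xref0 6\n"))
```

Both calls yield the same pair, which is the behaviour a lenient reader would want here. A strict reader that requires whitespace after the keyword would reject the second input.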
