PdfReadError: File has not been decrypted #1088

MartinThoma · 2022-07-10T09:19:26Z

I was trying to read metadata from a PDF that looks unencrypted. It is actually encrypted, but with an empty password:

$ pdfinfo example.pdf

...
Encrypted:      yes (print:yes copy:no change:no addNotes:no algorithm:RC4)
...
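The `print:yes copy:no change:no addNotes:no` part of that line comes from the document's /P permission flags. Below is a small sketch that decodes a /P value into the same four labels. It assumes the 1-based bit positions from the user access permissions table in ISO 32000-1 (bit 3 print, bit 4 modify, bit 5 copy/extract, bit 6 annotations). The function name `decode_permissions` and the example value -60 are hypothetical and were not read from this file.

```python
# 1-based bit positions of the /P permission flags (assumed from ISO 32000-1),
# keyed by the labels pdfinfo prints.
PERMISSION_BITS = {
    "print": 3,
    "change": 4,
    "copy": 5,
    "addNotes": 6,
}


def decode_permissions(p: int) -> dict:
    """Return which operations a (signed or unsigned) 32-bit /P value grants."""
    return {name: bool(p & (1 << (bit - 1))) for name, bit in PERMISSION_BITS.items()}


# Hypothetical /P with every bit set except 1, 2, 4, 5 and 6:
print(decode_permissions(-60))
```

For `-60`, only the print bit among these four is set, which matches the `print:yes copy:no change:no addNotes:no` pattern above. Python's `&` on negative ints behaves like two's complement, so the signed value stored in the PDF works directly.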

Environment

$ python -m platform
Linux-5.4.0-121-generic-x86_64-with-glibc2.31

# Seen first (inclusive)
$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.4.2

# Seen last (inclusive)
$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.8.0

Code + PDF

The PDF: pdf/9bc3765eb6426bb34139d419a6e1f79e.pdf

>>> from PyPDF2 import PdfReader
>>> reader = PdfReader("pdf/9bc3765eb6426bb34139d419a6e1f79e.pdf")

>>> len(reader.pages)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1469, in __len__
    return self.length_function()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 389, in _get_num_pages
    return self.trailer[TK.ROOT]["/Pages"]["/Count"]  # type: ignore
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 680, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 252, in get_object
    obj = self.pdf.get_object(self)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1090, in get_object
    raise PdfReadError("File has not been decrypted")
PyPDF2.errors.PdfReadError: File has not been decrypted

>>> reader.metadata
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 319, in metadata
    obj = self.trailer[TK.INFO]
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 680, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 252, in get_object
    obj = self.pdf.get_object(self)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1090, in get_object
    raise PdfReadError("File has not been decrypted")
PyPDF2.errors.PdfReadError: File has not been decrypted


MartinThoma · 2022-07-10T09:21:56Z

Related to #416 and #991

MartinThoma · 2022-07-10T09:33:12Z

Another example: https://corpora.tika.apache.org/base/docs/govdocs1/942/942303.pdf

MatthiasValvekens · 2022-07-13T16:35:40Z

That file is encrypted, but with an empty user password :) (AKA "mild obfuscation" rather than encryption, but hey). Seems to be 128-bit RC4.

I don't remember offhand how PyPDF2 handles this case, but it could be that a .decrypt('') call on the reader is all that is needed.
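An empty user password amounts to obfuscation because, with the standard security handler (revisions 2 and 3, RC4), the file key is derived from the padded user password plus values stored in the file itself: /O, /P and the first /ID element. The sketch below roughly follows Algorithms 2, 4, 5 and 6 of ISO 32000-1. It is not PyPDF2's implementation, and all function names are hypothetical. It assumes /O, /P, /U and /ID[0] have already been extracted (parsing is not shown) and ignores the revision 4 `/EncryptMetadata false` case.

```python
import hashlib

# 32-byte padding string of the PDF standard security handler.
PAD = bytes([
    0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41,
    0x64, 0x00, 0x4E, 0x56, 0xFF, 0xFA, 0x01, 0x08,
    0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80,
    0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A,
])


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key scheduling, then XOR with the keystream (symmetric)."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def compute_key(password: bytes, o_entry: bytes, p_entry: int, id0: bytes,
                revision: int = 3, key_len: int = 16) -> bytes:
    """Algorithm 2: MD5 over padded password, /O, /P (4 bytes LE) and /ID[0]."""
    padded = (password + PAD)[:32]  # an empty password pads to exactly PAD
    p_bytes = (p_entry & 0xFFFFFFFF).to_bytes(4, "little")
    digest = hashlib.md5(padded + o_entry + p_bytes + id0).digest()
    if revision >= 3:
        # Re-hash the first key_len bytes 50 times.
        for _ in range(50):
            digest = hashlib.md5(digest[:key_len]).digest()
    return digest[:key_len]


def compute_u(key: bytes, id0: bytes, revision: int = 3) -> bytes:
    """Algorithm 4 (revision 2) or Algorithm 5 (revision 3+) for the /U value."""
    if revision == 2:
        return rc4(key, PAD)  # 32 bytes
    u = rc4(key, hashlib.md5(PAD + id0).digest())
    for i in range(1, 20):
        # 19 extra passes, each with every key byte XORed with the counter.
        u = rc4(bytes(b ^ i for b in key), u)
    return u  # 16 bytes; the stored /U pads this to 32 with arbitrary bytes


def check_user_password(password: bytes, o_entry: bytes, p_entry: int,
                        id0: bytes, u_entry: bytes, revision: int = 3,
                        key_len: int = 16) -> bool:
    """Algorithm 6: recompute /U and compare (first 16 bytes for revision 3+)."""
    key = compute_key(password, o_entry, p_entry, id0, revision, key_len)
    n = 32 if revision == 2 else 16
    return compute_u(key, id0, revision)[:n] == u_entry[:n]
```

With `password=b""`, the padded input is `PAD` itself, so anyone holding the file can derive the 128-bit key (`key_len=16`) from the file's own entries. This is why viewers simply try the empty password first.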

MartinThoma · 2022-07-17T08:03:56Z

@MatthiasValvekens When I specify PdfReader(stream, password=""), I get PyPDF2.errors.PdfReadError: Wrong password.

MartinThoma · 2022-07-17T08:07:20Z

Seeing that basically every PDF viewer automatically tries the empty password, I think PyPDF2 should do the same. From a user's perspective, this is very confusing.

See #1088
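The viewer behaviour described above comes down to "try the empty password before asking the user". Below is a library-agnostic sketch of that pattern. `unlock` and the `try_password` callable are hypothetical names. The callable is assumed to return True when a password is accepted.

```python
from typing import Callable, Iterable, Optional


def unlock(try_password: Callable[[str], bool],
           candidates: Iterable[str] = ()) -> Optional[str]:
    """Try the empty user password first, then any caller-supplied candidates.

    Returns the password that was accepted, or None if none was.
    """
    for candidate in ("", *candidates):
        if try_password(candidate):
            return candidate
    return None


# A stand-in checker that only accepts the empty password:
print(unlock(lambda pw: pw == ""))
```

With PyPDF2, `try_password` could wrap `reader.decrypt(pw)`, which returns 0 when the password is not accepted. On the versions reported in this issue (2.4.2 to 2.8.0), however, the empty password was rejected for this file, so the fallback alone would not have helped until the decryption bug was fixed.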

pubpub-zz · 2022-07-20T19:25:45Z

Seeing that basically every PDF viewer automatically tries the empty password, I think PyPDF2 should do the same. From a user's perspective, this is very confusing.

I agree with your proposal. However, there definitely seems to be a problem with the decoder: I did some tests with pdfminer.six, and the empty password does work there.

xilopaint · 2022-07-24T12:28:26Z

Seeing that basically every PDF viewer automatically tries the empty password, I think PyPDF2 should do the same.

Hope to see this implemented soon!

Closes #1088

MartinThoma added the is-bug, workflow-encryption and PdfReader labels Jul 10, 2022

MartinThoma added the Has MCVE label Jul 10, 2022

MartinThoma added a commit that referenced this issue Jul 17, 2022

TST: Add xfail for decryption fail

7362938

See #1088

MartinThoma mentioned this issue Jul 17, 2022

TST: Add xfail for decryption fail #1125

Merged

MartinThoma added a commit that referenced this issue Jul 17, 2022

TST: Add xfail for decryption fail (#1125)

cd87bbb

See #1088

MartinThoma added the MCVE in Tests label and removed the Has MCVE label Jul 17, 2022

MartinThoma mentioned this issue Jul 25, 2022

BUG: u_hash in AlgV4.compute_key #1170

Merged

MartinThoma closed this as completed in #1170 Jul 25, 2022

MartinThoma pushed a commit that referenced this issue Jul 25, 2022

BUG: u_hash in AlgV4.compute_key (#1170)

3b73b34

Closes #1088

naourass mentioned this issue Feb 3, 2023

BUG: Fix arabic extraction test #1597

Closed

MartinThoma mentioned this issue Feb 10, 2023

BUG: Text extraction not working with one glyph to char sequence #1620

Merged
