PyCryptoDome padding issue, AES encryption CBC mode #1221

Closed
bchandos opened this issue Aug 10, 2022 · 5 comments · Fixed by #1469
Labels
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
PdfReader: The PdfReader component is affected
workflow-encryption: From a user's perspective, encryption is the affected feature/workflow

Comments

@bchandos

This is not a fully qualified bug report, because I lack a reproducible example for a number of reasons. However, I originally posted in issue 416 about a decryption issue I was experiencing with a file generated by the Acrobat Sign product. It is a v1.7 PDF with AES 128-bit encryption. That issue notes a merged fix (#1015).

I downloaded PyPDF2 2.10.0, which then gave a new error about missing PyCryptoDome, which I then installed (v3.15.0). Running my test case again, I received the following error:

ValueError: Data must be padded to 16 byte boundary in CBC mode

I know little about PDF specs, and even less about encryption. However, using the PyCryptoDome docs, I did find that the following code addition to _encryption.py alleviated this issue for my file:

86a87,89
>             if len(data) % 16:
>                 from Crypto.Util.Padding import pad
>                 data = pad(data, 16)

Again, lacking a reproducible test case, I'm not sure how useful this is, but I wanted to share my findings in case someone with access to Acrobat Sign can generate a file that demonstrates the same behavior.
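
For readers unfamiliar with the padding step, here is a minimal pure-Python sketch of what the guard in the diff above does to the data length. It is illustrative only: `pkcs7_pad` and `pad_if_unaligned` are hypothetical helper names, not functions from PyPDF2 or PyCryptoDome, and the sketch assumes the default PKCS#7 style of `Crypto.Util.Padding.pad`. It says nothing about how the eventual fix in the library works.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append PKCS#7 padding. Always adds between 1 and block_size bytes,
    so already-aligned input grows by one full block."""
    pad_len = block_size - len(data) % block_size
    return data + bytes([pad_len]) * pad_len


def pad_if_unaligned(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Hypothetical mirror of the guard in the diff: pad only when the
    length is not already a multiple of the block size."""
    if len(data) % block_size:
        data = pkcs7_pad(data, block_size)
    return data


if __name__ == "__main__":
    for n in (15, 16, 17, 32):
        padded = pad_if_unaligned(b"\x00" * n)
        print(n, "->", len(padded))
```

The `if len(data) % 16` check matters because PKCS#7 padding on its own would append a whole extra block to input that is already aligned. Note also that padding data before decryption only makes the length acceptable to CBC mode; the bytes recovered from the padded tail are not meaningful plaintext, so this is best read as a workaround rather than a confirmed fix.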

@MartinThoma
Member

Thank you for sharing ❤️

@MartinThoma
Member

I did find that the following code addition to _encryption.py alleviated this issue for my file:

So the mentioned lines fixed your problem? You could read the decrypted file properly?

@MartinThoma MartinThoma added is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF PdfReader The PdfReader component is affected labels Aug 10, 2022
@MartinThoma
Member

The code you use is something like this, I guess:

from PyPDF2 import PdfReader

reader = PdfReader("private-and-encrypted.pdf", password="example")
print(reader.pages[0].extract_text())  # text is extracted per page; PdfReader itself has no extract_text()

@bchandos
Author

So the mentioned lines fixed your problem? You could read the decrypted file properly?

That is correct. PdfFileMerger() can now successfully decrypt the file and merge it with others, which is my only need. (By which I mean, I have no want or need for encryption - it was a byproduct of a user-submitted file - and so I only care that PyPDF2 can process it and output a viewable PDF.)

@bchandos
Author

bchandos commented Aug 10, 2022

The code you use is something like this, I guess:

from PyPDF2 import PdfReader

reader = PdfReader("private-and-encrypted.pdf", password="example")
print(reader.pages[0].extract_text())  # text is extracted per page; PdfReader itself has no extract_text()

Here is my full test code, FWIW. Again, I'm sorry I can't provide the file - it contains PII and I don't have access to Acrobat Sign to try to generate an example. The file doesn't require a password and can be opened in desktop reader software. It was only PyPDF2 that was having a problem with it.

import io
import PyPDF2

file_list = ['test_pdfs/encrypted_pdf.pdf']

merger = PyPDF2.PdfFileMerger()
for f in file_list:
    with open(f, 'rb') as p:
        merger.append(fileobj=p)
output = io.BytesIO()
merger.write(output)
output.seek(0)
with open('test_output.pdf', 'wb') as o:
    o.write(output.read())

@MartinThoma MartinThoma added the workflow-encryption From a users perspective, encryption is the affected feature/workflow label Sep 14, 2022
alper111 added a commit to alper111/PyPDF2 that referenced this issue Nov 11, 2022
See py-pdf#1221 for details.
MartinThoma added a commit that referenced this issue Dec 4, 2022
Fixes #1221

Credit goes to Alper Ahmetoglu for the fix

Co-authored-by: Alper Ahmetoglu <[email protected]>
MartinThoma added a commit that referenced this issue Dec 10, 2022
Fixes #1221

Credit goes to Alper Ahmetoglu for the fix

Co-authored-by: Alper Ahmetoglu <[email protected]>