-
Notifications
You must be signed in to change notification settings - Fork 361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG + 1] Add basic support for encrypted PDF files #180
Conversation
Update API and CLI to accept ASCII passwords to decrypt PDFs encrypted by algorithm code 1 or 2 (limited by support from PyPDF2). Update documentation and unit tests accordingly. Example document health_protected.pdf generated as follows: qpdf --encrypt userpass ownerpass 128 -- health.pdf health_protected.pdf Issue atlanhq#162
# Conflicts: # camelot/handlers.py # tests/test_cli.py
Codecov Report
@@ Coverage Diff @@
## master #180 +/- ##
==========================================
- Coverage 84.17% 84.12% -0.05%
==========================================
Files 13 13
Lines 1226 1235 +9
Branches 294 297 +3
==========================================
+ Hits 1032 1039 +7
- Misses 150 151 +1
- Partials 44 45 +1
Continue to review full report at Codecov.
|
@rbares This is a great PR! Apart from the changes I requested, LGTM. |
Explicitly check passwords for None rather than falsey. Correct read_pdf documentation for Owner/User password. Issue atlanhq#162
@@ -83,8 +92,8 @@ def _save_page(self, filename, page, temp): | |||
""" | |||
with open(filename, 'rb') as fileobj: | |||
infile = PdfFileReader(fileobj, strict=False) | |||
if infile.isEncrypted: | |||
infile.decrypt('') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rbares Should we change the default value for password to an empty string for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And remove the is not None
check whenever infile.isEncrypted
is True?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha! Was initially unsure why .decrypt('')
was being used. I see from py-pdf/pypdf#51 that there are situations where normal PDFs are detected as encrypted, so defaulting to empty string seems sensible. It seems more awkward workarounds are still needed in some cases: py-pdf/pypdf#245 , but that feeds into the discussion around which library to use going forward.
Attempting to decrypt with None leads to TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
rather than the more helpful PdfReadError: File has not been decrypted
, so I would be inclined to keep the None checks throughout.
Are you happy to change the defaults or shall I?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attempting to decrypt with None leads to TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' rather than the more helpful PdfReadError: File has not been decrypted, so I would be inclined to keep the None checks throughout.
But the TypeError
won't be there if we change the default value of password from None
to an empty string right? I can change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless someone passes it through explicitly to mean no password is provided, which could be seen as cleaner use of an API. Would almost be nicer to change None to empty string if encrypted rather than making that visible at the interface, but will defer to your judgement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is what I was proposing, I think password=None
at the interface level makes sense, we can set password to an empty string when None is passed. Changing it.
self.filename = filename | ||
if not filename.lower().endswith('.pdf'): | ||
raise NotImplementedError("File format not supported") | ||
self.pages = self._get_pages(self.filename, pages) | ||
if password is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rbares What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
6ed532c
to
8c7c83b
Compare
@rbares Thanks for the contribution! |
@vinayak-mehta my pleasure, thanks for review! |
Update API and CLI to accept ASCII passwords to decrypt PDFs
encrypted by algorithm code 1 or 2 (limited by support from PyPDF2).
Update documentation and unit tests accordingly.
Example document health_protected.pdf generated as follows:
qpdf --encrypt userpass ownerpass 128 -- health.pdf health_protected.pdf
Partially resolves issue #162, may want to keep open to discuss additional support