Query - is there a way to bypass security restrictions on a pdf? #53

Closed
Rob1080 opened this issue Jan 8, 2014 · 18 comments
Labels
nf-security Non-functional change: Security

Comments

@Rob1080
Contributor

Rob1080 commented Jan 8, 2014

I have a PDF that has security restrictions. I need to merge some content into the secured PDF. I don't need the PDF to be secured after the merge.
When I open the file and check isEncrypted, it returns True.
When I try decrypt with an empty string, a NotImplementedError is raised: "only algorithm code 1 and 2 are supported".

The restrictions on the file are shown below.
[Image: restrictions]

At the moment, to bypass the restrictions on the file, I print the pdf to images and create a new pdf with those images. This isn't ideal as the file size becomes large and the content isn't as crisp.

Is there a better way?

@mstamy2
Collaborator

mstamy2 commented Jan 8, 2014

For PyPDF2 to be able to decrypt a file, you need either the owner password or the user password. However, PyPDF2 does not seem to have the algorithm necessary to decrypt your PDF anyway.

Perhaps you could try saving the document as a new file (if you have permission) in Adobe Reader or similar and then it may be possible for you to access security settings, where you can turn security off.

You may also be able to upload the file to Google Drive. The upload settings should be on 'convert text and images from uploaded PDF'. Once uploaded, you can open the file with Microsoft Word and convert it back to a PDF (with no restrictions).

I haven't had to use any of these methods myself, so I don't know for sure if they'll work, but hopefully you can resolve this. There are also several 3rd party applications that claim to remove restrictions from PDFs.

@Rob1080
Contributor Author

Rob1080 commented Jan 11, 2014

Thanks. I need to do this programmatically, as I don't always have access to the file (users upload the files). Another service, similar to the one I'm building, seems to handle this case fine (without converting to images). I'll give the Google Drive SDK a go and see if I can get this to work.

@Rob1080 Rob1080 closed this as completed Jan 11, 2014
@g-cassie

@Rob1080 Did you ever resolve this? I am dealing with the same problem.

@cdrrensc

I'd be very interested in a solution too.

@g-cassie

I did some testing last night with a few secured PDFs. There is another screen in Adobe that shows the "Encryption Level". PDFs that show "128-bit AES" in this field raise NotImplementedError as noted above. PDFs that show "128-bit RC4" can be decrypted (using an empty string). All the PDFs I was working with could be viewed in Preview or Adobe Reader without entering a password.
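
For anyone who wants to check this without opening Adobe, here is a minimal sketch that labels the encryption from the PDF's /Encrypt dictionary. It assumes the dictionary has already been read into a plain Python dict (in PyPDF2 1.x, reader.trailer['/Encrypt'] should give something dict-like, but I have not checked that against every version). The function name describe_encryption is made up for illustration. The /V values follow the PDF specification, and the error message quoted above suggests PyPDF2 only handles /V 1 and 2.

```python
def describe_encryption(encrypt):
    """Return a short label for a PDF /Encrypt dictionary given as a plain dict."""
    v = encrypt.get("/V", 0)
    if v == 1:
        # /V 1: RC4 with a fixed 40-bit key
        return "40-bit RC4"
    if v == 2:
        # /V 2: RC4 with the key length (in bits) stored under /Length
        return "{}-bit RC4".format(encrypt.get("/Length", 40))
    if v == 4:
        # /V 4: the algorithm is named by the crypt filter used for streams
        stream_filter = encrypt.get("/StmF", "/Identity")
        method = encrypt.get("/CF", {}).get(stream_filter, {}).get("/CFM", "/None")
        labels = {"/AESV2": "128-bit AES", "/V2": "RC4 (crypt filter)"}
        return labels.get(method, "unknown crypt filter method")
    if v == 5:
        # /V 5: AES with a 256-bit key
        return "256-bit AES"
    return "unknown"


# Example dictionaries (hypothetical values, shaped like real /Encrypt entries)
rc4_doc = {"/V": 2, "/R": 3, "/Length": 128}
aes_doc = {"/V": 4, "/R": 4, "/StmF": "/StdCF",
           "/CF": {"/StdCF": {"/CFM": "/AESV2"}}}
print(describe_encryption(rc4_doc))
print(describe_encryption(aes_doc))
```

If the label comes back as AES, the empty-string decrypt in PyPDF2 1.x is expected to fail the way described in this thread.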

@cdrrensc

Yeah, my case is 128-bit AES encryption, so I can view the file (in Adobe or Chrome) but not edit or merge it.

@g-cassie

I spent a couple of hours yesterday trying to hack away at this but wasn't able to make progress. PDF.js is an open source library that implements support for this; you can see the code at the link below. I think a lot of the code could be replaced by using the PyCrypto library. However, there are still some things I don't understand: namely, how to expand the encryption key to the correct length, and what exactly to decrypt (obviously not the whole file, as some of it is unencrypted).
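
On those two open questions, a sketch based on my reading of the general encryption algorithm in the PDF 1.7 specification (this is not PyPDF2 or PDF.js code, and object_key is a name I made up): only strings and streams are encrypted, and each object gets its own key. That key is the MD5 of the file key, the low 3 bytes of the object number, the low 2 bytes of the generation number, and, for AES, the literal bytes "sAlT". The digest is truncated to min(n + 5, 16) bytes, where n is the file key length. For AES the first 16 bytes of each string or stream are the CBC initialization vector. Deriving the file key itself from the password is a separate step not shown here.

```python
import hashlib
import struct


def object_key(file_key, obj_num, gen_num, aes=False):
    """Derive the per-object key for one indirect object (encryption /V 1 to 4)."""
    material = (
        file_key
        + struct.pack("<I", obj_num)[:3]   # low-order 3 bytes of the object number
        + struct.pack("<H", gen_num)       # low-order 2 bytes of the generation number
    )
    if aes:
        material += b"sAlT"                # extra salt used only for AES
    # Keep at most 16 bytes, and no more than (file key length + 5)
    return hashlib.md5(material).digest()[: min(len(file_key) + 5, 16)]


# Hypothetical 128-bit file key, object 12, generation 0
key = object_key(b"\x00" * 16, 12, 0, aes=True)
print(len(key))
```

With a key like this, each stream could then be passed to an AES-CBC or RC4 routine from a crypto library; that part is left out because it depends on which library is used.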

https://github.com/mozilla/pdf.js/blob/master/src/core/crypto.js

@lrehmann

I used qpdf and the following hack-y code to decrypt the documents if PyPDF2 fails at decryption.

import os
import PyPDF2
from PyPDF2 import PdfFileWriter, PdfFileReader
filename=raw_input('\nFilename:')

# PDFs are binary files, so open them in 'rb' mode
fp = open(filename, 'rb')
pdfFile = PdfFileReader(fp)
if pdfFile.isEncrypted:
    try:
        pdfFile.decrypt('')
        print 'File Decrypted (PyPDF2)'
    except NotImplementedError:
        # PyPDF2 does not support this encryption algorithm, so fall back to qpdf
        fp.close()
        command="cp "+filename+" temp.pdf; qpdf --password='' --decrypt temp.pdf "+filename
        os.system(command)
        print 'File Decrypted (qpdf)'
        #re-open the decrypted file
        fp = open(filename, 'rb')
        pdfFile = PdfFileReader(fp)
else:
    print 'File Not Encrypted'
#dostuff with pdfFile here

Be careful not to accept any user input for the filename in an application using this code. If you do, be sure to sanitize it before os.system executes it.
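
To make the warning concrete, here is a small sketch of one way to sanitize the filename if the shell command string is kept: quote it with shlex.quote (Python 3.3+; on Python 2 the equivalent is pipes.quote). The build_command helper and the hostile filename are made up for illustration.

```python
import shlex


def build_command(filename):
    """Build the same shell command as above, with the filename safely quoted."""
    quoted = shlex.quote(filename)
    return "cp " + quoted + " temp.pdf; qpdf --password='' --decrypt temp.pdf " + quoted


# Without quoting, the shell would run everything after the ';' as a second command.
hostile = "x.pdf; rm -rf ~"
print(build_command(hostile))
```

Quoting keeps the whole filename as a single shell word. Avoiding the shell entirely is still the more robust option.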

@claird
Contributor

claird commented Oct 8, 2015

Hello,

Thank you for your question, comments, and/or concerns regarding PyPDF2.
All of our PyPDF2 users are very important to us. Unfortunately at this
present time we are in the process of regrouping our office. Your
questions are very important to us but regretfully we will not be able to
address any issues regarding PyPDF2 until later this fall. However, if
this is an urgent matter and you need immediate assistance please respond
to this email and we will try our best to accommodate your needs sooner.
My apologies for the inconvenience. Thank you for your patience.

Sincerely,

Selma Kishwar

Phaseit, Inc.

[email protected]


@ssokolow

@lrehmann

command="cp "+filename+" temp.pdf; qpdf --password='' --decrypt temp.pdf "+filename
os.system(command)

Here's a very heavily commented explanation of how (and why) to do that subprocess call safely:

import os, shutil, tempfile

# Things from the subprocess module don't rely on the shell unless you
# explicitly ask for it and can accept a pre-split list of arguments,
# making calling subprocesses much safer.
# (If you really do need to split quoted stuff, use shlex.split() instead)
from subprocess import check_call

# [...]

    # There are a lot of ways to mess up creating temporary files in a way
    # that's free of race conditions, so just use mkdtemp() to safely
    # create a temporary folder that only we have permission to work inside
    # (We ask for it to be made in the same folder as filename because /tmp
    #  might be on a different drive, which would make the final overwrite
    #  into a slow "copy and delete" rather than a fast os.rename())
    # (This happens before the try block so that the cleanup code never runs
    #  with tempdir undefined if mkdtemp() itself fails)
    tempdir = tempfile.mkdtemp(dir=os.path.dirname(os.path.abspath(filename)))

    # Use try/finally to ensure our cleanup code gets run
    try:
        # Have qpdf write its output to a file inside the temporary folder
        temp_out = os.path.join(tempdir, 'qpdf_out.pdf')

        # Avoid the shell when possible and integrate with Python errors
        # (check_call() raises subprocess.CalledProcessError on nonzero exit)
        check_call(['qpdf', "--password=", '--decrypt', filename, temp_out])

        # I'm not sure if a qpdf failure could leave the file in a halfway
        # state, so write to a temporary file and then use os.rename to
        # overwrite the original atomically.
        # (We use shutil.move instead of os.rename so it'll fall back to a copy
        #  operation if the dir= argument to mkdtemp() gets removed)
        shutil.move(temp_out, filename)
        print('File Decrypted (qpdf)')
    finally:
        # Delete all temporary files
        shutil.rmtree(tempdir)

@unitedkartik

Any workaround for Windows?
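
A possible sketch for Windows, untested there: the parts of the earlier snippets that are Unix-specific are the cp command and the shell string, and neither is needed if qpdf is called directly with a list of arguments. This assumes a qpdf executable is installed and on PATH. The qpdf_decrypt function and its qpdf_name parameter are names made up for this example. The qpdf flags are the same ones used earlier in this thread.

```python
import shutil
import subprocess


def qpdf_decrypt(src_path, dst_path, qpdf_name="qpdf"):
    """Write a decrypted copy of src_path to dst_path using the qpdf executable."""
    # shutil.which() searches PATH and, on Windows, also tries extensions such as .exe
    qpdf = shutil.which(qpdf_name)
    if qpdf is None:
        raise FileNotFoundError("could not find %r on PATH" % qpdf_name)
    # A list of arguments means no shell is involved, so paths need no quoting
    subprocess.check_call([qpdf, "--password=", "--decrypt", src_path, dst_path])
```

Usage would look like qpdf_decrypt("secured.pdf", "open.pdf"), after which the output file can be loaded as usual.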

@pythonhacker

The qpdf workaround looks really useful. Thanks!

@robsco-git

robsco-git commented Jan 21, 2019

Here is my workaround inspired by @ssokolow above. My setup requires files to be read from and output as bytes objects in the context of Python 3.6 and PyPDF2==1.26.0. Just thought I'd post it here in case it's useful to anyone:

import io
import os
import subprocess
import tempfile

from PyPDF2 import PdfFileReader

# document_data holds the contents of the PDF as a bytes object
reader = PdfFileReader(io.BytesIO(document_data))
if reader.isEncrypted:
    try:
        reader.decrypt("")
    except NotImplementedError:
        # Decrypt the PDF with qpdf

        temp_file = tempfile.NamedTemporaryFile(delete=False)
        temp_file.write(document_data)
        temp_file_name = temp_file.name
        temp_file.close()

        decrypted_filename = f"{temp_file_name}.decrypted"

        # Pass the arguments as a list so that no shell is involved
        command = ["qpdf", "--password=", "--decrypt", temp_file_name, decrypted_filename]

        try:
            # Raises subprocess.CalledProcessError if qpdf exits with an error
            subprocess.check_call(command, timeout=300)

            with open(decrypted_filename, "rb") as f:
                decrypted_document_data = f.read()

            reader = PdfFileReader(
                io.BytesIO(decrypted_document_data)
            )

        finally:
            os.unlink(temp_file_name)
            try:
                # decrypted_filename may or may not have been created
                os.unlink(decrypted_filename)
            except FileNotFoundError:
                pass

@andrewisplinghoff

    temp_dir = tempfile.TemporaryDirectory()
        temp_dir.cleanup()

@robsco-git Thanks. You can also get rid of temp_dir; it is not used in your code anyway.

@robsco-git

robsco-git commented Jan 25, 2019

Fixed: must have forgotten to delete that part 😅

@czyzby

czyzby commented Apr 11, 2019

The qpdf solution does not work on some PDFs. Even though the qpdf output can be opened with the usual PDF viewers and does not seem to be corrupted in any way, PyPDF2 fails to parse it correctly. It basically returns gibberish, even though I double-checked that the decrypted PDFs are loaded according to the documentation.

@robsco-git

robsco-git commented Apr 11, 2019

@czyzby I eventually moved to pikepdf. It uses QPDF at its core.
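
For reference, a minimal sketch of what the pikepdf route can look like. This is not the code from the comment above; strip_restrictions and its parameters are names chosen for illustration. It assumes pikepdf is installed and that the file opens with an empty user password, as the PDFs in this thread do.

```python
def strip_restrictions(src_path, dst_path, password=""):
    """Open a PDF with pikepdf and save an unencrypted copy."""
    import pikepdf  # third-party: pip install pikepdf

    # pikepdf.open() raises pikepdf.PasswordError if the password is wrong
    with pikepdf.open(src_path, password=password) as pdf:
        # Saving without an encryption argument writes the file unencrypted
        pdf.save(dst_path)
```

Calling strip_restrictions("secured.pdf", "open.pdf") should leave a copy that can be merged or edited without the qpdf subprocess step.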

polyglot-jones pushed a commit to polyglot-jones/PyPDF2 that referenced this issue Aug 11, 2020