Query - is there a way to bypass security restrictions on a pdf? #53

Rob1080 · 2014-01-08T11:17:58Z

I have a pdf that has security restrictions. I need to merge some content into the secured pdf. I don't need the pdf to be secured after the merge.
When I open the file and check isEncrypted, it returns true.
When I try to decrypt with an empty string, a NotImplementedError is raised: "only algorithm code 1 and 2 are supported".

The restrictions on the file are shown below.

At the moment, to bypass the restrictions on the file, I print the pdf to images and create a new pdf with those images. This isn't ideal as the file size becomes large and the content isn't as crisp.

Is there a better way?

mstamy2 · 2014-01-08T22:48:49Z

For PyPDF2 to be able to decrypt a file, you need either the owner password or the user password. However, PyPDF2 does not seem to have the algorithm necessary to decrypt your PDF anyway.

Perhaps you could try saving the document as a new file (if you have permission) in Adobe Reader or similar and then it may be possible for you to access security settings, where you can turn security off.

You may also be able to upload the file to Google Drive. The upload settings should be on 'convert text and images from uploaded PDF'. Once uploaded, you can open the file with Microsoft Word and convert it back to a PDF (with no restrictions).

I haven't had to use any of these methods myself, so I don't know for sure if they'll work, but hopefully you can resolve this. There are also several 3rd party applications that claim to remove restrictions from PDFs.

Rob1080 · 2014-01-11T10:50:15Z

Thanks. I need to do this programmatically, as I don't always have access to the file (users upload the files). Another service, similar to the one I'm building, seems to handle this case fine (without converting to images). I'll give the Google Drive SDK a go and see if I can get this to work.

g-cassie · 2015-05-26T21:07:16Z

@Rob1080 Did you ever resolve this? I am dealing with the same problem.

cdrrensc · 2015-05-27T11:48:02Z

I'd be very interested in a solution too.

g-cassie · 2015-05-27T14:50:22Z

I did some testing last night with a few secured PDFs. There is another screen you can get to in Adobe Reader which shows the "Encryption Level". PDFs that show "128-bit AES" in this field raise NotImplementedError as noted above. PDFs that show "128-bit RC4" can be decrypted (using an empty string). All the PDFs I was working with could be viewed in Preview or Adobe Reader without entering a password.

cdrrensc · 2015-05-27T14:56:12Z

Yeah, I have the case of 128b AES encryption so I can view it (in adobe or chrome) but not edit/merge it
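
For context on the two cases reported above: the "algorithm code" in the error message appears to be the /V entry of the PDF's /Encrypt dictionary. The following is a minimal sketch, assuming the meanings given for /V, /Length and /CFM in the PDF 1.7 specification (ISO 32000-1). Both function names are hypothetical helpers for illustration and are not part of PyPDF2.

```python
def describe_encryption(v, length=40, cfm=None):
    """Map /Encrypt dictionary entries to a rough human-readable label.

    v      -- the /V (algorithm code) entry
    length -- the /Length entry in bits (40 is the default in the spec)
    cfm    -- the /CFM name of the crypt filter in use, relevant when v == 4
    """
    if v == 1:
        return "40-bit RC4"
    if v == 2:
        return "%d-bit RC4" % length
    if v == 4:
        # With /V 4 the cipher is chosen by a crypt filter.
        if cfm == "/AESV2":
            return "128-bit AES"
        if cfm == "/V2":
            return "RC4 (crypt filter)"
        return "crypt filter (unknown method)"
    if v == 5:
        return "256-bit AES"
    return "unknown"


def old_pypdf2_can_decrypt(v):
    # Assumption taken from the error text quoted in this thread:
    # only algorithm codes 1 and 2 are handled.
    return v in (1, 2)
```

Under these assumptions, a "128-bit RC4" file would carry /V 2 with /Length 128, while a "128-bit AES" file would carry /V 4, which matches the behaviour described above.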

g-cassie · 2015-05-28T13:18:01Z

I spent a couple of hours yesterday trying to hack away at this but wasn't able to make progress. PDF.js is an open source library that implements support for this. You can see the code at the link below. I think a lot of the code could be replaced by using the PyCrypto library. However, there are still some things I don't understand: namely, how to expand the encryption key to the correct length, and what exactly to decrypt (obviously not the whole file, as some of it is unencrypted).

https://github.com/mozilla/pdf.js/blob/master/src/core/crypto.js
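
On the two open questions above, here is a minimal sketch of the key derivation, assuming it follows Algorithm 1 and Algorithm 2 of the PDF 1.7 specification (ISO 32000-1, section 7.6) for standard security handler revisions 2 to 4. The function and parameter names are made up for illustration; this is not PyPDF2 or PDF.js code, and it stops before the actual AES or RC4 step.

```python
import hashlib
import struct

# Fixed 32-byte padding string from the PDF 1.7 spec (Algorithm 2).
PAD = bytes([
    0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41,
    0x64, 0x00, 0x4E, 0x56, 0xFF, 0xFA, 0x01, 0x08,
    0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80,
    0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A,
])


def file_key(password, o_entry, p_entry, first_id, key_bits=128,
             revision=4, encrypt_metadata=True):
    """Derive the document-wide key from the user password (Algorithm 2)."""
    n = key_bits // 8
    # Pad or truncate the password to exactly 32 bytes.
    md5 = hashlib.md5((password + PAD)[:32])
    md5.update(o_entry)                                   # /O string
    md5.update(struct.pack("<I", p_entry & 0xFFFFFFFF))   # /P, unsigned LE
    md5.update(first_id)                                  # first /ID element
    if revision >= 4 and not encrypt_metadata:
        md5.update(b"\xff\xff\xff\xff")
    digest = md5.digest()
    if revision >= 3:
        # Key strengthening: re-hash the first n bytes 50 times.
        for _ in range(50):
            digest = hashlib.md5(digest[:n]).digest()
    return digest[:n]


def object_key(key, obj_num, gen_num, aes=True):
    """Derive the key for one indirect object (Algorithm 1)."""
    data = key
    data += struct.pack("<I", obj_num)[:3]   # low 3 bytes of object number
    data += struct.pack("<I", gen_num)[:2]   # low 2 bytes of generation
    if aes:
        data += b"sAlT"                      # extra salt used for AES only
    return hashlib.md5(data).digest()[:min(len(key) + 5, 16)]
```

As I read the specification, only strings and stream data are encrypted, each with the key of the indirect object that contains them, which is why the rest of the file stays readable. For AES, the first 16 bytes of each encrypted string or stream are the CBC initialization vector.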

lrehmann · 2015-08-24T21:11:31Z

I used qpdf and the following hack-y code to decrypt the documents if PyPDF2 fails at decryption.

import os
from PyPDF2 import PdfFileReader
filename=raw_input('\nFilename:')

# PDFs are binary files, so open in binary mode
fp = open(filename, 'rb')
pdfFile = PdfFileReader(fp)
if pdfFile.isEncrypted:
    try:
        pdfFile.decrypt('')
        print 'File Decrypted (PyPDF2)'
    except Exception:
        # PyPDF2 could not decrypt the file; fall back to qpdf
        fp.close()
        command="cp "+filename+" temp.pdf; qpdf --password='' --decrypt temp.pdf "+filename
        os.system(command)
        print 'File Decrypted (qpdf)'
        #re-open the decrypted file
        fp = open(filename, 'rb')
        pdfFile = PdfFileReader(fp)
else:
    print 'File Not Encrypted'
#dostuff with pdfFile here

Be careful not to accept user input for the filename in an application that uses this code. If you do, be sure to sanitize it before os.system executes it.

claird · 2015-10-08T17:37:08Z

Hello,

Thank you for your question, comments, and/or concerns regarding PyPDF2.
All of our PyPDF2 users are very important to us. Unfortunately at this
present time we are in the process of regrouping our office. Your
questions are very important to us but regretfully we will not be able to
address any issues regarding PyPDF2 until later this fall. However, if
this is an urgent matter and you need immediate assistance please respond
to this email and we will try our best to accommodate your needs sooner.
My apologies for the inconvenience. Thank you for your patience.

Sincerely,

Selma Kishwar

Phaseit, Inc.

[email protected]

ssokolow · 2017-01-17T03:19:29Z

@lrehmann

command="cp "+filename+" temp.pdf; qpdf --password='' --decrypt temp.pdf "+filename
os.system(command)

Here's a very heavily commented explanation of how (and why) to do that subprocess call safely:

import os, shutil, tempfile

# Things from the subprocess module don't rely on the shell unless you
# explicitly ask for it and can accept a pre-split list of arguments,
# making calling subprocesses much safer.
# (If you really do need to split quoted stuff, use shlex.split() instead)
from subprocess import check_call

# [...]

    # Use try/finally to ensure our cleanup code gets run
    try:
        # There are a lot of ways to mess up creating temporary files in a way
        # that's free of race conditions, so just use mkdtemp() to safely
        # create a temporary folder that only we have permission to work inside
        # (We ask for it to be made in the same folder as filename because /tmp
        #  might be on a different drive, which would make the final overwrite
        #  into a slow "copy and delete" rather than a fast os.rename())
        tempdir = tempfile.mkdtemp(dir=os.path.dirname(filename))

        # I'm not sure if a qpdf failure could leave the file in a halfway
        # state, so have it write to a temporary file instead of reading from one
        temp_out = os.path.join(tempdir, 'qpdf_out.pdf')

        # Avoid the shell when possible and integrate with Python errors
        # (check_call() raises subprocess.CalledProcessError on nonzero exit)
        check_call(['qpdf', "--password=", '--decrypt', filename, temp_out])

        # I'm not sure if a qpdf failure could leave the file in a halfway
        # state, so write to a temporary file and then use os.rename to
        # overwrite the original atomically.
        # (We use shutil.move instead of os.rename so it'll fall back to a copy
        #  operation if the dir= argument to mkdtemp() gets removed)
        shutil.move(temp_out, filename)
        print 'File Decrypted (qpdf)'
    finally:
        # Delete all temporary files
        shutil.rmtree(tempdir)

unitedkartik · 2018-02-23T09:09:02Z

Any workaround for Windows?
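
One possibility is that the qpdf approach carries over to Windows unchanged once the shell is out of the picture. The following is a Python 3 sketch, assuming qpdf is installed and its executable is on PATH. The helper names `qpdf_command` and `decrypt_in_place` are hypothetical, and only the qpdf options already used in this thread appear.

```python
import os
import shutil
import subprocess
import tempfile


def qpdf_command(src, dst, password=""):
    """Build the argument list; list form needs no shell quoting."""
    return ["qpdf", "--password=" + password, "--decrypt", src, dst]


def decrypt_in_place(filename, password=""):
    """Decrypt filename with qpdf, replacing the original on success."""
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf executable not found on PATH")
    # Work next to the original so the final replace stays on one drive.
    directory = os.path.dirname(os.path.abspath(filename))
    tempdir = tempfile.mkdtemp(dir=directory)
    try:
        temp_out = os.path.join(tempdir, "qpdf_out.pdf")
        # Raises subprocess.CalledProcessError on a nonzero exit status.
        subprocess.check_call(qpdf_command(filename, temp_out, password))
        # os.replace overwrites an existing file on Windows and POSIX.
        os.replace(temp_out, filename)
    finally:
        shutil.rmtree(tempdir, ignore_errors=True)
```

Because the arguments are passed as a list and no shell is started, nothing here depends on `cp`, `;` or POSIX quoting.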

pythonhacker · 2018-07-19T07:42:08Z

The qpdf workaround looks really useful. Thanks!

robsco-git · 2019-01-21T09:09:37Z

Here is my workaround inspired by @ssokolow above. My setup requires files to be read from and output as bytes objects in the context of Python 3.6 and PyPDF2==1.26.0. Just thought I'd post it here in case it's useful to anyone:

import io
import os
import subprocess
import tempfile

from PyPDF2 import PdfFileReader

# document_data holds the raw bytes of the (possibly encrypted) PDF
reader = PdfFileReader(io.BytesIO(document_data))
if reader.isEncrypted:
    try:
        reader.decrypt("")
    except NotImplementedError:
        # Decrypt the PDF with qpdf

        temp_file = tempfile.NamedTemporaryFile(delete=False)
        temp_file.write(document_data)
        temp_file_name = temp_file.name
        temp_file.close()

        decrypted_filename = f"{temp_file_name}.decrypted"

        # Pass the arguments as a list so that no shell is involved
        command = [
            "qpdf", "--password=", "--decrypt",
            temp_file_name, decrypted_filename,
        ]

        try:
            subprocess.check_call(command, cwd="/tmp", timeout=300)

            with open(decrypted_filename, "rb") as f:
                decrypted_document_data = f.read()

            reader = PdfFileReader(
                io.BytesIO(decrypted_document_data)
            )

        finally:
            os.unlink(temp_file_name)
            try:
                # decrypted_filename may or may not have been created
                os.unlink(decrypted_filename)
            except FileNotFoundError:
                pass

andrewisplinghoff · 2019-01-25T16:29:37Z

    temp_dir = tempfile.TemporaryDirectory()

        temp_dir.cleanup()

@robsco-git Thanks, you can further get rid of temp_dir, it is not used anyway in your code.

robsco-git · 2019-01-25T16:52:23Z

    temp_dir = tempfile.TemporaryDirectory()
        temp_dir.cleanup()
@robsco-git Thanks, you can further get rid of temp_dir, it is not used anyway in your code.

Fixed: must have forgotten to delete that part 😅

czyzby · 2019-04-11T20:50:05Z

The qpdf solution does not work on some PDFs. Even though the qpdf output can be opened with the usual PDF viewers and does not seem to be corrupted in any way, PyPDF2 fails to parse it correctly. It basically returns gibberish, even though I double-checked that the decrypted PDFs are loaded according to the documentation.

robsco-git · 2019-04-11T21:25:18Z

@czyzby I eventually moved to pikepdf. It uses QPDF at its core.
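
For anyone taking the same route, here is a rough sketch of what this might look like with pikepdf, working on bytes objects as in the earlier workaround. It assumes pikepdf is installed and that saving without an `encryption` argument writes an unencrypted file, which is how I understand its documentation. `strip_restrictions` is a hypothetical helper name.

```python
import io


def strip_restrictions(document_data, password=""):
    """Return an unencrypted copy of a PDF given as bytes."""
    # Imported lazily so this module still loads without pikepdf installed.
    import pikepdf  # third-party package

    out = io.BytesIO()
    # An empty password is enough for files that only carry restrictions.
    with pikepdf.open(io.BytesIO(document_data), password=password) as pdf:
        pdf.save(out)
    return out.getvalue()
```

The returned bytes could then be handed on for merging, with no temporary files or subprocess calls involved.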

Rob1080 closed this as completed Jan 11, 2014

gbarnabic mentioned this issue Feb 27, 2016

Not able to open a PDF with security - Adobe opens it fine without password entry #249

Closed

This was referenced Nov 14, 2017

PyPDF2 can't decrypt PDF files with Acrobat 6.0 or higher password security compatibility #378

Closed

PdfReadError: Could not read Boolean object #377

Closed

polyglot-jones pushed a commit to polyglot-jones/PyPDF2 that referenced this issue Aug 11, 2020

Merge pull request py-pdf#53 from kurtmckee/test-filters-more

6115cc1
