Additional changes for #488
doc/Usage.md:

  - Moved description of `setIgnoreEncryption` option to doc/CustomConfig.md
  - Added brief "PDF encryption" section

doc/CustomConfig.md: added `setIgnoreEncryption` option and section to describe it.

src/Smalot/PdfParser/Config.php: Doc comment for Config::setIgnoreEncryption()

Added tests/PHPUnit/Integration/EncryptionTest.php

Added samples/not_really_encrypted.pdf (thanks to @parijke who
originally created this as test.pdf)

See #653
unixnut committed Nov 22, 2023
1 parent 44916ca commit a513ccb
Showing 5 changed files with 92 additions and 11 deletions.
15 changes: 15 additions & 0 deletions doc/CustomConfig.md
@@ -21,6 +21,7 @@ The `Config` class has the following options:
|--------------------------|---------|-----------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
| `setDecodeMemoryLimit` | Integer | `0` | If parsing fails because of memory exhaustion, you can set a lower memory limit for decoding operations. |
| `setFontSpaceLimit` | Integer | `-50` | Changing font space limit can be helpful when `Parser::getText()` returns a text with too many spaces. |
| `setIgnoreEncryption` | Boolean | `false` | Read PDFs that are not encrypted but have the encryption flag set. |
| `setHorizontalOffset` | String | ` ` | When words are broken up or when the structure of a table is not preserved, you may get better results when adapting `setHorizontalOffset`. |
| `setPdfWhitespaces` | String | `\0\t\n\f\r ` | |
| `setPdfWhitespacesRegex` | String | `[\0\t\n\f\r ]` | |
@@ -63,3 +64,17 @@ $config->setFontSpaceLimit(-60);
$parser = new \Smalot\PdfParser\Parser([], $config);
$pdf = $parser->parseFile('document.pdf');
```

## option setIgnoreEncryption

In some cases PDF files may be internally marked as encrypted even though the content is not encrypted and can be read.
This can be caused by the PDF being created by a tool that does not properly set the encryption flag.
If you are sure that the PDF is not encrypted, you can ignore the encryption flag by setting the `ignoreEncryption` flag to `true` in the `Config` object.

```php
$config = new \Smalot\PdfParser\Config();
$config->setIgnoreEncryption(true);

$parser = new \Smalot\PdfParser\Parser([], $config);
$pdf = $parser->parseFile('document.pdf');
```
19 changes: 8 additions & 11 deletions doc/Usage.md
@@ -230,16 +230,13 @@ foreach ($pages as $page) {
}
```

-## Ignoring PDF encryption
+## PDF encryption

-In some cases PDF files may be internally marked as encrypted even though the content is not encrypted and can be read.
-This can be caused by the PDF being created by a tool that does not properly set the encryption flag.
-If you are sure that the PDF is not encrypted, you can ignore the encryption flag by setting the `ignoreEncryption` flag to `true` in the `Config` object.
-
-```php
-$config = new \Smalot\PdfParser\Config();
-$config->setIgnoreEncryption(true);
-
-$parser = new \Smalot\PdfParser\Parser([], $config);
-$pdf = $parser->parseFile('document.pdf');
+This library cannot currently read encrypted PDF files, i.e. those with
+a read password. Attempting to do so produces this error:
```
+Exception: Secured pdf file are currently not supported.
+```
+
+See `setIgnoreEncryption` option in [CustomConfig.md](CustomConfig.md)
+for how to override the check in specific cases.
Binary file added samples/not_really_encrypted.pdf
Binary file not shown.
5 changes: 5 additions & 0 deletions src/Smalot/PdfParser/Config.php
@@ -164,6 +164,11 @@ public function getIgnoreEncryption(): bool
        return $this->ignoreEncryption;
    }

    /**
     * @warning This is a workaround, don't rely on it, may change in the
     * future. Further information is in the following PR:
     * https://github.com/smalot/pdfparser/pull/653
     */
    public function setIgnoreEncryption(bool $ignoreEncryption): void
    {
        $this->ignoreEncryption = $ignoreEncryption;
64 changes: 64 additions & 0 deletions tests/PHPUnit/Integration/EncryptionTest.php
@@ -0,0 +1,64 @@
<?php

/**
 * @file This file is part of the PdfParser library.
 *
 * @author Alastair Irvine <[email protected]>
 *
 * @date 2023-11-22
 *
 * @license LGPLv3
 *
 * @url <https://github.com/smalot/pdfparser>
 *
 * PdfParser is a pdf library written in PHP, extraction oriented.
 * Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public License
 * along with this program.
 * If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace PHPUnitTests\Integration;

use PHPUnitTests\TestCase;
use Smalot\PdfParser\Config;
use Smalot\PdfParser\Parser;

class EncryptionTest extends TestCase
{
    public function testNoIgnoreEncryption(): void
    {
        $parser = new Parser();

        $filename = $this->rootDir.'/samples/not_really_encrypted.pdf';
        $threw = false;
        try {
            $document = $parser->parseFile($filename);
        } catch (\Exception $e) {
            $threw = true;
        }
        $this->assertTrue($threw);
    }

    public function testIgnoreEncryption(): void
    {
        $config = new Config();
        $config->setIgnoreEncryption(true);
        $parser = new Parser([], $config);

        $filename = $this->rootDir.'/samples/not_really_encrypted.pdf';
        $document = $parser->parseFile($filename);
        $this->assertTrue(true);
    }
}
