revived #257: Properly decode ANSI encodings (#349)
* Get correct default font

* Create header elements with their respective class

* Properly decode ANSI encodings

* allow for line breaks when splitting xrefs for id and position

* extend TestCase.php with functionality to "catch" E_NOTICE and E_WARNING

* added test case for this fix

* only reset error handler when the current handler is the handler we had set before

* work around for failing CI build with PHP 5.6

* added comment and link to the workaround getting the current error handler

* removed unnecessary ini_set call

* remove error level constant name before error message

* restore error from the error handler itself, to prevent PHPUnit's "THE ERROR HANDLER HAS CHANGED!" message

* reverse the changes made to the TestCase class and the code in the test case depending on it

* simplified test case, now checking if object has been parsed correctly

* code linting

* applied linting

* handle failed font lookup

* look up unfiltered font resource name first, then fall back to filtered resource name

* added unit test for #202 bugfix, code linting

* fallback for decoding single-byte ANSI characters that are not in the lookup table

* added test file and unit test for international unicode characters

* don't double-encode strings already in UTF-8

* code linting

* removed remnants from old decodeContent() function signature

* parseHeaderElement() should not return a PDFObject

* some minor changes as requested by the review

* keep $unicode as deprecated parameter in decodeContent function signature

* forgot to add default value for $unicode to make it optional

* added proper doc blocks to PostScriptGlyphs.php

* return array from PostScriptGlyphs::getGlyphs() directly instead of using and parsing JSON

* changed @deprecated to parameter description

Co-authored-by: Dāvis Mosāns <[email protected]>
Connum and Dāvis Mosāns authored Oct 8, 2020
1 parent fdbbb5c commit 1f4056d
Showing 11 changed files with 1,231 additions and 98 deletions.
Binary file added samples/InternationalChars.pdf
Binary file not shown.
Binary file added samples/bugs/Issue202.pdf
Binary file not shown.
1 change: 1 addition & 0 deletions src/Smalot/PdfParser/Document.php
@@ -77,6 +77,7 @@ public function init()

         // Propagate init to objects.
         foreach ($this->objects as $object) {
+            $object->getHeader()->init();
             $object->init();
         }
     }
9 changes: 4 additions & 5 deletions src/Smalot/PdfParser/Encoding.php
@@ -31,6 +31,7 @@
 namespace Smalot\PdfParser;

 use Smalot\PdfParser\Element\ElementNumeric;
+use Smalot\PdfParser\Encoding\PostScriptGlyphs;

 /**
  * Class Encoding
@@ -95,12 +96,10 @@ public function init()
                 ++$code;
             }

-            // Build final mapping (custom => standard).
-            $table = array_flip(array_reverse($this->encoding, true));

             $this->mapping = $this->encoding;
             foreach ($this->differences as $code => $difference) {
                 /* @var string $difference */
-                $this->mapping[$code] = (isset($table[$difference]) ? $table[$difference] : Font::MISSING);
+                $this->mapping[$code] = $difference;
             }
         }
     }
@@ -129,6 +128,6 @@ public function translateChar($dec)
             $dec = $this->mapping[$dec];
         }

-        return $dec;
+        return PostScriptGlyphs::getCodePoint($dec);
     }
 }
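
The Encoding.php change above appears to move glyph-name resolution out of init() and into translateChar(): entries from the Differences array are stored by glyph name, and the name is turned into a code point only at lookup time. The following is a minimal sketch of that idea, not the library's implementation. The function glyphToCodePoint(), its three-entry table, and the pass-through behaviour for unknown input are assumptions made for illustration; the real PostScriptGlyphs::getCodePoint() is not shown in this diff.

```php
<?php

/**
 * Hypothetical stand-in for PostScriptGlyphs::getCodePoint().
 * Resolves a PostScript glyph name to a Unicode code point and
 * returns the input unchanged when the name is unknown (assumed fallback).
 */
function glyphToCodePoint($glyph)
{
    // Tiny illustrative excerpt; the real glyph list is far larger.
    $table = [
        'A' => 0x0041,
        'adieresis' => 0x00E4,
        'Euro' => 0x20AC,
    ];

    return isset($table[$glyph]) ? $table[$glyph] : $glyph;
}

/**
 * Mirrors the shape of Encoding::translateChar() after this commit:
 * look up the byte in the mapping, then resolve the glyph name.
 */
function translateChar(array $mapping, $dec)
{
    if (isset($mapping[$dec])) {
        $dec = $mapping[$dec];
    }

    return glyphToCodePoint($dec);
}

// Assumed base encoding: byte value => glyph name.
$mapping = [65 => 'A'];

// Assumed Differences entries: byte value => glyph name.
$differences = [128 => 'Euro', 228 => 'adieresis'];

// As in the new init(): store the glyph name directly, without
// translating it back to a code of the base encoding first.
foreach ($differences as $code => $glyphName) {
    $mapping[$code] = $glyphName;
}

printf("U+%04X\n", translateChar($mapping, 128));
```

Keeping glyph names in the mapping means a Differences entry no longer has to exist in the base encoding to be usable, which is what the removed array_flip() lookup and its Font::MISSING fallback required.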