calculateTextWidth throws an error for some fonts #570
Comments
Can you provide us the PDF? |
Unfortunately the files are Bank Statements, I will need to find a way to remove the elements with sensitive information. Is there other information about the font I could provide in the meantime? |
The most helpful thing would be PHP code that triggers the error. Below is a rough (untested) example. Please have a look.
/*
* $elements must contain faulty data to trigger the error.
* $header->getDetails() is used inside "calculateTextWidth".
* If it doesn't return an array with the key "Widths", the error occurs.
*
* You can build $elements yourself or you place var_dump near
* https://github.com/smalot/pdfparser/blob/master/src/Smalot/PdfParser/Font.php#L278
* and use that.
*/
$elements = [
'Name' => '...',
'Type' => '...',
'Encoding' => '...',
// 'Widths' => '...' <=== must be missing to trigger the error
];
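$header = new \Smalot\PdfParser\Header($elements);
$font = new \Smalot\PdfParser\Font(new \Smalot\PdfParser\Document(), $header);
// The second parameter is taken by reference, so a literal null cannot be passed; omit it instead.
$font->calculateTextWidth(''); // call this to trigger the error |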
@benlongstaff Ignore my last comment. I realized it is more of a basis for a unit test that first triggers the error and, once it is fixed, makes sure it doesn't happen again. We would need two things to fix it:
Would you take the time to prepare a pull request? I will lead/assist you until it's merged. Does the PDF specification allow no |
Fortunately, the PDF for issue #592 has this font-without-Widths problem as well, and we already have permission to use it: /samples/bugs/Issue592.pdf
The key question is: what do we want PdfParser to do in this case? Return zero (0)? Something like this (in Font.php):
/**
* Calculate text width with data from header 'Widths'. If width of character is not found then character is added to missing array.
*/
public function calculateTextWidth(string $text, array &$missing = null): ?float
{
    $index_map = array_flip($this->table);
    $details = $this->getDetails();

    // If 'Widths' is not defined for this font, return 0
    // See: https://github.com/smalot/pdfparser/issues/570
    if (!isset($details['Widths'])) {
        return 0;
    }

    $widths = $details['Widths'];
... |
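To make the trade-off concrete, here is a minimal standalone sketch of the guard being discussed. It is not PdfParser code: the function name `textWidthFromDetails`, the `$charToCode` map, and the assumed shape of the details array (`FirstChar`, `LastChar`, optional `Widths`) are all hypothetical and chosen for illustration only. When `Widths` is absent, this sketch reports every character as missing and returns null, which a `?float` return type permits; returning 0 as proposed above would be an equally small change.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (NOT part of PdfParser): sums glyph widths for $text.
 *
 * @param array      $details    Assumed keys: 'FirstChar', optional 'Widths' (zero-indexed list).
 * @param array      $charToCode Assumed map from a character to its character code.
 * @param array|null $missing    Receives the characters whose width could not be found.
 */
function textWidthFromDetails(array $details, array $charToCode, string $text, ?array &$missing = null): ?float
{
    $missing = [];
    // Split into UTF-8 characters without relying on the mbstring extension.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Guard: without 'Widths' there is nothing to sum, so avoid the
    // "Undefined array key" notice and report all characters as missing.
    if (!isset($details['Widths'], $details['FirstChar'])) {
        $missing = $chars;

        return null;
    }

    $widths = $details['Widths'];
    $first = (int) $details['FirstChar'];
    $width = 0.0;

    foreach ($chars as $char) {
        // 'Widths' is zero-indexed, so shift the character code by 'FirstChar'.
        $index = isset($charToCode[$char]) ? $charToCode[$char] - $first : -1;
        if (!isset($widths[$index])) {
            $missing[] = $char;
            continue;
        }
        $width += (float) $widths[$index];
    }

    return $width;
}

// Usage with made-up data: 'Z' has no entry in the map, so it ends up in $missing.
$details = ['FirstChar' => 65, 'LastChar' => 67, 'Widths' => [600, 650, 700]];
$map = ['A' => 65, 'B' => 66, 'C' => 67];
$width = textWidthFromDetails($details, $map, 'ABZ', $missing);
```

Returning null lets a caller distinguish "no width data at all" from "a measured width of zero", at the cost of a null check at every call site; which behaviour is preferable for the library is exactly the open question in this thread.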
I suggest |
This function doesn't seem to be used by any other function in PdfParser after running a quick search, so I think returning |
I also have the same issue: a font with no Widths generates a PHP Notice and fails to calculate the text width. The following code triggers the PHP Notice using the mentioned PDF sample.
<?php
require_once __DIR__.'/pdfparser/alt_autoload.php-dist';
$config = new \Smalot\PdfParser\Config();
$config->setDataTmFontInfoHasToBeIncluded(true);
$parser = new \Smalot\PdfParser\Parser(array(), $config);
$pdf = $parser->parseFile('/tmp/doc.pdf');
$pages = $pdf->getPages();
$lastpage = end($pages);
$data = $lastpage->getDataTm();
echo "Items:".PHP_EOL;
$current_text = null;
foreach ($data as $item) {
    if (is_array($item)) {
        $text = $item[1];
        if ($text != $current_text) {
            echo "- '$text'".PHP_EOL;
            $font = $lastpage->getFont($item[2]);
            echo " font: ".$font->getName()." (".$font->getType().")"." size: ".$item[3].PHP_EOL;
            $missing = array();
            echo " text width: ".$font->calculateTextWidth($text, $missing)." (missing: ".implode(',', $missing).")".PHP_EOL;
            $current_text = $text;
        }
    }
}
PS: this code needs the fix for issue #629 in order to detect the font properly.
Is there something I can do when generating the PDF to fix this issue in the PDF? I have (a little) control over the PDF generation. I am mainly interested in making the text width calculation work rather than in preventing a PHP Notice. Thanks again for your software and to the contributors. |
Undefined array key "Widths"
in vendor/smalot/pdfparser/src/Smalot/PdfParser/Font.php:279
Not all fonts have the widths data in the font header e.g. $font->getDetails() returns.
vs the expected