Check for wrong line-endings when getting xref (#635)
If the `xref` keyword is not found at the specified offset, replace Windows `\r\n` line endings with Unix-style `\n` and try again. If that succeeds, convert the line endings in the PDF data and proceed as normal. Otherwise, continue on to the `decodeXrefStream()` method.

Fixes parsing of the existing test suite file **/samples/bugs/Issue95_ANSI.pdf**, whose test would normally be skipped because of the `@group linux-only` flag. Remove this flag, as all assertions in the `testDecodeText()` function now pass in any environment.
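The fallback described above can be sketched in isolation as follows. This is a minimal illustration, not the library's code: `xrefKeywordAt()` and `locateXref()` are hypothetical helper names, and the returned array shape is an assumption made for the example. It checks for the keyword with `substr()` and a strict `===` comparison, so an out-of-range offset simply fails the check.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of PdfParser): true when the literal
 * keyword "xref" starts exactly at byte offset $startxref.
 */
function xrefKeywordAt(string $pdfData, int $startxref): bool
{
    // Guard against negative offsets, which substr() would count from the end.
    return $startxref >= 0 && substr($pdfData, $startxref, 4) === 'xref';
}

/**
 * Hypothetical helper: try the data as-is, then retry with CRLF
 * converted to LF. The 'normalized' flag tells the caller whether the
 * converted copy should be used for all further parsing.
 */
function locateXref(string $pdfData, int $startxref): array
{
    if (xrefKeywordAt($pdfData, $startxref)) {
        return ['found' => true, 'normalized' => false, 'data' => $pdfData];
    }

    // Offsets may have been computed when the file still used "\n".
    $unixData = str_replace("\r\n", "\n", $pdfData);
    if (xrefKeywordAt($unixData, $startxref)) {
        return ['found' => true, 'normalized' => true, 'data' => $unixData];
    }

    // Neither variant matched; a caller would fall back to stream decoding.
    return ['found' => false, 'normalized' => false, 'data' => $pdfData];
}

// Toy data: the offset is computed on LF data, then the data is
// converted to CRLF, as if the file had been re-saved on Windows.
$unixPdf = "%PDF-1.4\n1 0 obj\n<<>>\nendobj\nxref\n0 1\n";
$startxref = strpos($unixPdf, 'xref');
$windowsPdf = str_replace("\n", "\r\n", $unixPdf);

$result = locateXref($windowsPdf, $startxref);
```

In the toy data, four line breaks precede the keyword, so the CRLF copy places `xref` four bytes later than the recorded offset. That shift is why the first check fails and the normalized retry succeeds.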
GreyWyvern authored Aug 24, 2023
1 parent 81482e8 commit 53538eb
Showing 2 changed files with 15 additions and 5 deletions.
17 changes: 15 additions & 2 deletions src/Smalot/PdfParser/RawData/RawDataParser.php
@@ -901,8 +901,15 @@ protected function getXrefData(string $pdfData, int $offset = 0, array $xref = [
             // Cross-Reference
             $xref = $this->decodeXref($pdfData, $startxref, $xref);
         } else {
-            // Cross-Reference Stream
-            $xref = $this->decodeXrefStream($pdfData, $startxref, $xref);
+            // Check if the $pdfData might have the wrong line-endings
+            $pdfDataUnix = str_replace("\r\n", "\n", $pdfData);
+            if ($startxref < \strlen($pdfDataUnix) && strpos($pdfDataUnix, 'xref', $startxref) == $startxref) {
+                // Return Unix-line-ending flag
+                $xref = ['Unix' => true];
+            } else {
+                // Cross-Reference Stream
+                $xref = $this->decodeXrefStream($pdfData, $startxref, $xref);
+            }
         }
         if (empty($xref)) {
             throw new \Exception('Unable to find xref');
@@ -937,6 +944,12 @@ public function parseData(string $data): array
         // get xref and trailer data
         $xref = $this->getXrefData($pdfData);

+        // If we found Unix line-endings
+        if (isset($xref['Unix'])) {
+            $pdfData = str_replace("\r\n", "\n", $pdfData);
+            $xref = $this->getXrefData($pdfData);
+        }
+
         // parse all document objects
         $objects = [];
         foreach ($xref['xref'] as $obj => $offset) {
3 changes: 0 additions & 3 deletions tests/PHPUnit/Integration/FontTest.php
@@ -294,9 +294,6 @@ public function testDecodeUnicode(): void
         $this->assertEquals('AB', Font::decodeUnicode("\xFE\xFF\x00A\x00B"));
     }

-    /**
-     * @group linux-only
-     */
     public function testDecodeText(): void
     {
         $filename = $this->rootDir.'/samples/Document1_pdfcreator_nocompressed.pdf';