If no dc:format XMP tag, merge metadata
Previously, `extractXMPMetadata()` merged found XMP metadata with the other document details only if a `dc:format` tag with an `application/pdf` MIME-type value was present.

If the tag doesn't exist, merge the metadata anyway. If it _does_ exist, _then_ check to see if it has the `application/pdf` MIME-type.
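
The new rule can be sketched in isolation as below. This is only an illustration: `shouldMergeXmpMetadata()` is a hypothetical helper, not part of the PdfParser API, and the sample arrays are made up to cover the three cases.

```php
<?php
// Hypothetical helper mirroring the condition changed in this commit;
// it is not a PdfParser function.
function shouldMergeXmpMetadata(array $metadata): bool
{
    // Merge when dc:format is absent, or when it is present and names a PDF.
    return !isset($metadata['dc:format'])
        || 'application/pdf' == $metadata['dc:format'];
}

var_dump(shouldMergeXmpMetadata(['dc:creator' => 'PdfParser']));      // tag missing
var_dump(shouldMergeXmpMetadata(['dc:format' => 'application/pdf'])); // tag names a PDF
var_dump(shouldMergeXmpMetadata(['dc:format' => 'image/jpeg']));      // tag names another type
```

Under the old condition the first case would have been rejected; the other two behave as before.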
GreyWyvern committed Jun 24, 2024
1 parent a3e213d commit f28160e
Showing 2 changed files with 32 additions and 1 deletion.
2 changes: 1 addition & 1 deletion src/Smalot/PdfParser/Document.php
@@ -287,7 +287,7 @@ public function extractXMPMetadata(string $content): void
}

// Only use this metadata if it's referring to a PDF
- if (isset($metadata['dc:format']) && 'application/pdf' == $metadata['dc:format']) {
+ if (!isset($metadata['dc:format']) || 'application/pdf' == $metadata['dc:format']) {
// According to the XMP specifications: 'Conflict resolution
// for separate packets that describe the same resource is
// beyond the scope of this document.' - Section 6.1
31 changes: 31 additions & 0 deletions tests/PHPUnit/Integration/DocumentTest.php
@@ -232,4 +232,35 @@ public function testGetPagesMissingCatalog(): void
$document = $this->getDocumentInstance();
$document->getPages();
}

public function testExtractXMPMetadata(): void
{
$document = $this->getDocumentInstance();

// Check that XMP metadata is parsed even if missing a
// dc:format tag
// See: https://github.com/smalot/pdfparser/issues/721
$content = '<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c015 84.159810, 2016/09/10-02:41:30">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description>
<dc:creator>
<rdf:Seq>
<rdf:li>PdfParser</rdf:li>
</rdf:Seq>
</dc:creator>
<xmp:CreateDate>2018-02-07T11:51:44-05:00</xmp:CreateDate>
<xmp:ModifyDate>2019-10-23T09:56:01-04:00</xmp:ModifyDate>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>';

$document->extractXMPMetadata($content);
$document->init();
$details = $document->getDetails();

$this->assertEquals(4, \count($details));
$this->assertEquals('PdfParser', $details['dc:creator']);
$this->assertEquals('2019-10-23T09:56:01-04:00', $details['xmp:modifydate']);
}
}
