[api-minor] Change the "dc:creator" Metadata field to an Array #12838
calixteman commented Jan 8, 2021 • edited
- add scripting support for doc.info.authors
- doc.info.metadata is the raw string with xml code
- aims to fix issue #12619 ("Doc.info in scripting API being currently implemented is not spec-compliant")
This is a breaking API-change, as evident by the updated unit-tests; please note that it's not correct to just change an existing API in this way!
What's the correct way to break an API?
Can you please link the relevant section of the XMP specification here? Given that what we currently do is apparently wrong, the question, I suppose, is how often third-party users rely on the Metadata and how badly we'd break them if we simply made the change. Thinking about this a bit, my hunch is that the Metadata probably isn't depended upon a whole lot in practice; hence we might be OK with this small breaking change if we clearly label it as such (so that it's visible in the release notes). /cc @timvandermeij What's your opinion, can we change the existing "dc:creator" Metadata field in this way?
https://wwwimages2.adobe.com/content/dam/acom/en/devnet/xmp/pdfs/XMP%20SDK%20Release%20cc-2016-08/XMPSpecificationPart1.pdf#page=33
+1
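For readers following the spec discussion: the linked XMP specification lists dc:creator as an ordered array, which is serialized as an rdf:Seq of rdf:li items. The sketch below only illustrates what "an Array" means for that field. The helper name extractCreators, the sample XML, and the regex-based extraction are assumptions made for this example; this is not how pdf.js parses metadata.

```javascript
// Hypothetical helper, NOT the pdf.js implementation: collects every
// <rdf:li> item found inside <dc:creator> into an array of strings.
// A regex is used only to keep the sketch self-contained; a real parser
// would walk the XML tree instead.
function extractCreators(xmp) {
  const creator = /<dc:creator>([\s\S]*?)<\/dc:creator>/.exec(xmp);
  if (!creator) {
    return [];
  }
  const items = creator[1].matchAll(/<rdf:li[^>]*>([\s\S]*?)<\/rdf:li>/g);
  return Array.from(items, match => match[1].trim());
}

// Made-up sample packet with two creators.
const sample = `
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>
    <rdf:Seq>
      <rdf:li>First Author</rdf:li>
      <rdf:li>Second Author</rdf:li>
    </rdf:Seq>
  </dc:creator>
</rdf:Description>`;

const creators = extractCreators(sample);
console.log(creators);
```

Returning one entry per rdf:li keeps the order of the sequence and lets a consumer tell two authors apart from one author with a long name.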
We might be OK with changing the format of "dc:creator", see #12838 (comment)
However, I've also added a couple of comments on the overall implementation here.
src/display/api.js
Outdated
@@ -2705,6 +2705,7 @@ class WorkerTransport {
  .then(results => {
  return {
  info: results[0],
+ rawMetadata: results[1],
Can you please remove this property, and instead do something like this:
- Add this._data = data; after line 34 (the closing }) of pdf.js/src/display/metadata.js in c0a6d6c.
- Add a new method, e.g. getRaw() { return this._data; }, after line 126 (the closing }) of pdf.js/src/display/metadata.js in c0a6d6c.
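The two steps above could look roughly like the following. This is a minimal sketch and not the actual contents of src/display/metadata.js: the XML parsing is elided, and the _metadataMap field is an assumption used only so that get has something to read from.

```javascript
// Simplified stand-in for the Metadata class; parsing is intentionally elided.
class Metadata {
  constructor(data) {
    // Assumed internal storage for parsed entries (hypothetical name).
    this._metadataMap = new Map();
    // Step 1: keep a reference to the unparsed XML string.
    this._data = data;
  }

  // Mirrors the behaviour described in the review: falls back to null.
  get(name) {
    return this._metadataMap.get(name) ?? null;
  }

  // Step 2: expose the raw string through a method instead of a separate
  // rawMetadata property on the API result.
  getRaw() {
    return this._data;
  }
}

const metadata = new Metadata("<x:xmpmeta>...</x:xmpmeta>");
const raw = metadata.getRaw();
console.log(raw);
```

Keeping the raw string on the Metadata instance means callers need only one object, and no extra field has to be threaded through the API and the viewer.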
src/display/metadata.js
Outdated
- if (desc.childNodes[j].nodeName.toLowerCase() !== "#text") {
- const entry = desc.childNodes[j];
+ const entry = desc.childNodes[j];
+ if (entry.nodeName.toLowerCase() !== "#text") {
Given the overall indentation level and the size of the code below, things would be slightly more readable with this condition inverted like so:
- if (entry.nodeName.toLowerCase() !== "#text") {
+ if (entry.nodeName.toLowerCase() === "#text") {
+ continue;
+ }
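To show the effect of the inverted condition in isolation, here is a small runnable sketch. The childNodes array and the entries map are made-up stand-ins for the DOM nodes and storage used in metadata.js; only the shape of the loop is the point.

```javascript
// Hypothetical stand-ins for the child nodes of an rdf:Description element.
const childNodes = [
  { nodeName: "#text", textContent: "\n  " },
  { nodeName: "dc:title", textContent: " Example " },
  { nodeName: "#text", textContent: "\n  " },
  { nodeName: "dc:format", textContent: "application/pdf" },
];

const entries = new Map();
for (let j = 0; j < childNodes.length; j++) {
  const entry = childNodes[j];
  // Guard clause: skip whitespace-only text nodes early, so the remaining
  // body of the loop sits one indentation level shallower.
  if (entry.nodeName.toLowerCase() === "#text") {
    continue;
  }
  entries.set(entry.nodeName.toLowerCase(), entry.textContent.trim());
}

console.log([...entries.keys()]);
```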
web/app.js
Outdated
@@ -1765,6 +1767,7 @@ const PDFViewerApplication = {
  return; // The document was closed while the metadata resolved.
  }
  this.documentInfo = info;
+ this.rawMetadata = rawMetadata;
With the suggested changes in src/display/api.js and src/display/metadata.js, this is no longer needed.
web/app.js
Outdated
@@ -1655,7 +1655,8 @@ const PDFViewerApplication = {
  baseURL: this.baseUrl,
  filesize: this._contentLength,
  filename: this._docFilename,
- metadata: this.metadata,
+ metadata: this.rawMetadata,
With the suggested changes in src/display/api.js and src/display/metadata.js, this needs to be
- metadata: this.rawMetadata,
+ metadata: this.metadata?.getRaw(),
web/app.js
Outdated
@@ -1655,7 +1655,8 @@ const PDFViewerApplication = {
  baseURL: this.baseUrl,
  filesize: this._contentLength,
  filename: this._docFilename,
- metadata: this.metadata,
+ metadata: this.rawMetadata,
+ authors: this.metadata.get("authors") || null,
Note that this.metadata can be null, hence this can throw!
Furthermore, the get method already falls back to returning null.
All in all, this line should read
- authors: this.metadata.get("authors") || null,
+ authors: this.metadata?.get("authors"),
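The difference between the two forms can be checked with a short sketch. The buildDocInfo helper and the fake metadata object are hypothetical and exist only for this example. Note that when metadata is null, optional chaining evaluates to undefined rather than null.

```javascript
// Hypothetical helper that mimics building the scripting docInfo fields.
function buildDocInfo(metadata) {
  return {
    // Optional chaining short-circuits to undefined when metadata is
    // null or undefined, instead of throwing a TypeError.
    metadata: metadata?.getRaw(),
    authors: metadata?.get("authors"),
  };
}

// Fake Metadata-like object (an assumption, not the real class).
const fakeMetadata = {
  getRaw: () => "<x:xmpmeta/>",
  get: name => (name === "authors" ? ["First Author"] : null),
};

const withoutMetadata = buildDocInfo(null);
const withMetadata = buildDocInfo(fakeMetadata);

// The unguarded form from the original diff throws for a null receiver.
let threw = false;
try {
  const metadata = null;
  metadata.get("authors");
} catch (e) {
  threw = e instanceof TypeError;
}

console.log(withoutMetadata, withMetadata, threw);
```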
If I'm reading https://wwwimages2.adobe.com/content/dam/acom/en/devnet/xmp/pdfs/XMP%20SDK%20Release%20cc-2016-08/XMPSpecificationPart1.pdf#page=33 correctly,
With the final comments addressed, this looks good to me :-)
src/display/metadata.js
Outdated
const nodeName = rdf ? rdf.nodeName : null;
if (!rdf || nodeName !== "rdf:rdf" || !rdf.hasChildNodes()) {
With the lower-casing now happening in the XML-parser, you can inline the nodeName check here instead:
- const nodeName = rdf ? rdf.nodeName : null;
- if (!rdf || nodeName !== "rdf:rdf" || !rdf.hasChildNodes()) {
+ if (!rdf || rdf.nodeName !== "rdf:rdf" || !rdf.hasChildNodes()) {
test/test_manifest.json
Outdated
{ "id": "js-authors",
"file": "pdfs/js-authors.pdf",
"md5": "e892f290ae209286a29d3c17dfeab226",
"rounds": 1,
"type": "eq"
},
How much value does this eq test bring, since the functionality that you're adding here doesn't really show up in a reference-test as far as I can tell?
/botio integrationtest
From: Bot.io (Linux m4)
Received
Command cmd_integrationtest from @calixteman received. Current queue size: 0
Live output at: http://54.67.70.0:8877/662eeeee2cf4552/output.txt
From: Bot.io (Windows)
Received
Command cmd_integrationtest from @calixteman received. Current queue size: 0
Live output at: http://3.101.106.178:8877/198062624320a24/output.txt
From: Bot.io (Linux m4)
Success
Full output at http://54.67.70.0:8877/662eeeee2cf4552/output.txt
Total script time: 3.01 mins
From: Bot.io (Windows)
Success
Full output at http://3.101.106.178:8877/198062624320a24/output.txt
Total script time: 3.81 mins