Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api-minor] Change the "dc:creator" Metadata field to an Array #12838

Merged
merged 1 commit into from
Jan 12, 2021

Conversation

calixteman
Copy link
Contributor

@calixteman calixteman commented Jan 8, 2021

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking API-change, as evident by the updated unit-tests; please note that it's not correct to just change an existing API in this way!

@calixteman
Copy link
Contributor Author

What's the correct way to break an api ?
Do you prefer I let dc:creator as it worked and so I can add a new property authors for example ?
Anyway according to xmp spec it seems that dc:creator is expected to be an array.

@Snuffleupagus
Copy link
Collaborator

Anyway according to xmp spec it seems that dc:creator is expected to be an array.

Can you please link relevant section of the XMP specification here?
If we're explicitly breaking an existing specification, I suppose that changes things slightly as far as I'm concerned.

Given that what we currently do is apparently wrong, the question here I suppose is how often we expect that the Metadata is used by third-party users and how badly we'd break them if we simply made the change?

Thinking about this a bit, my hunch would be that the Metadata probably isn't depended upon a whole lot in practice; hence we might be OK with this small breaking changing if we clearly label it as such (so that it's visible in the release notes).
I.e. you should update the PR title, and commit message, to start with something like:
[api-minor] Change the "dc:creator" Metadata field to an Array; add scripting support for doc.info.authors

/cc @timvandermeij What's your opinion, can we change the existing "dc:creator" Metadata field in this way?

@calixteman
Copy link
Contributor Author

calixteman commented Jan 9, 2021

Anyway according to xmp spec it seems that dc:creator is expected to be an array.

Can you please link relevant section of the XMP specification here?

https://wwwimages2.adobe.com/content/dam/acom/en/devnet/xmp/pdfs/XMP%20SDK%20Release%20cc-2016-08/XMPSpecificationPart1.pdf#page=33
According to this doc, dc:creator should be a Seq but I got a Bag using Acrobat DC to generate the pdf file in this patch.

If we're explicitly breaking an existing specification, I suppose that changes things slightly as far as I'm concerned.

Given that what we currently do is apparently wrong, the question here I suppose is how often we expect that the Metadata is used by third-party users and how badly we'd break them if we simply made the change?

Thinking about this a bit, my hunch would be that the Metadata probably isn't depended upon a whole lot in practice; hence we might be OK with this small breaking changing if we clearly label it as such (so that it's visible in the release notes).
I.e. you should update the PR title, and commit message, to start with something like:
[api-minor] Change the "dc:creator" Metadata field to an Array; add scripting support for doc.info.authors

+1

/cc @timvandermeij What's your opinion, can we change the existing "dc:creator" Metadata field in this way?

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be OK with changing the format of "dc:creator", see #12838 (comment)

However, I've also added a couple of comments on the overall implementation here.

@@ -2705,6 +2705,7 @@ class WorkerTransport {
.then(results => {
return {
info: results[0],
rawMetadata: results[1],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please remove this property, and instead do something like this:

  1. Add this._data = data; after line
  2. Add a new method, e.g.
    getRaw() {
      return this._data;
    }
    after line

if (desc.childNodes[j].nodeName.toLowerCase() !== "#text") {
const entry = desc.childNodes[j];
const entry = desc.childNodes[j];
if (entry.nodeName.toLowerCase() !== "#text") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the overall indentation level and the size of the code below, things would be slightly more readable with this condition inverted like so:

Suggested change
if (entry.nodeName.toLowerCase() !== "#text") {
if (entry.nodeName.toLowerCase() === "#text") {
continue;
}

web/app.js Outdated
@@ -1765,6 +1767,7 @@ const PDFViewerApplication = {
return; // The document was closed while the metadata resolved.
}
this.documentInfo = info;
this.rawMetadata = rawMetadata;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the suggested changes in src/display/api.js and src/display/metadata.js, this is no longer needed.

web/app.js Outdated
@@ -1655,7 +1655,8 @@ const PDFViewerApplication = {
baseURL: this.baseUrl,
filesize: this._contentLength,
filename: this._docFilename,
metadata: this.metadata,
metadata: this.rawMetadata,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the suggested changes in src/display/api.js and src/display/metadata.js, this needs to be

Suggested change
metadata: this.rawMetadata,
metadata: this.metadata?.getRaw(),

web/app.js Outdated
@@ -1655,7 +1655,8 @@ const PDFViewerApplication = {
baseURL: this.baseUrl,
filesize: this._contentLength,
filename: this._docFilename,
metadata: this.metadata,
metadata: this.rawMetadata,
authors: this.metadata.get("authors") || null,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this.metadata can be null, hence this can throw!
Furthermore, the get method already fallback to returning null.

All in all, this line should read

Suggested change
authors: this.metadata.get("authors") || null,
authors: this.metadata?.get("authors"),

@timvandermeij
Copy link
Contributor

If I'm reading https://wwwimages2.adobe.com/content/dam/acom/en/devnet/xmp/pdfs/XMP%20SDK%20Release%20cc-2016-08/XMPSpecificationPart1.pdf#page=33 correctly, dc:creator should indeed be a list (even though the name implies otherwise, since I'd have expected dc:creators in that case, but it's not the first time this happens in the spec), according to the "XMP usage is a list of creators" part. While it's indeed an API change, the behavior before was incorrect, so it's more like a bugfix to me, hence I'm OK with this change but only if it's clearly indicated as [api-minor].

@calixteman calixteman changed the title JS -- support doc.info.authors [api-minor] Change the "dc:creator" Metadata field to an Array Jan 11, 2021
Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the final comments addressed, this looks good to me :-)

Comment on lines 122 to 123
const nodeName = rdf ? rdf.nodeName : null;
if (!rdf || nodeName !== "rdf:rdf" || !rdf.hasChildNodes()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the lower-casing now happening in the XML-parser, you can inline the nodeName check here instead:

Suggested change
const nodeName = rdf ? rdf.nodeName : null;
if (!rdf || nodeName !== "rdf:rdf" || !rdf.hasChildNodes()) {
if (!rdf || rdf.nodeName !== "rdf:rdf" || !rdf.hasChildNodes()) {

Comment on lines 3216 to 3221
{ "id": "js-authors",
"file": "pdfs/js-authors.pdf",
"md5": "e892f290ae209286a29d3c17dfeab226",
"rounds": 1,
"type": "eq"
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much value does this eq test bring, since the functionality that you're adding here doesn't really show up in a reference-test as far as I can tell?

 - add scripting support for doc.info.authors
 - doc.info.metadata is the raw string with xml code
@calixteman
Copy link
Contributor Author

/botio integrationtest

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_integrationtest from @calixteman received. Current queue size: 0

Live output at: http://54.67.70.0:8877/662eeeee2cf4552/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_integrationtest from @calixteman received. Current queue size: 0

Live output at: http://3.101.106.178:8877/198062624320a24/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/662eeeee2cf4552/output.txt

Total script time: 3.01 mins

  • Integration Tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://3.101.106.178:8877/198062624320a24/output.txt

Total script time: 3.81 mins

  • Integration Tests: Passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants