Multiple authors squished together in dc:creator field #12990

Closed
crystalfp opened this issue Feb 14, 2021 · 3 comments · Fixed by #12993

Comments

@crystalfp

Attach (recommended) or Link to PDF file here:
http://mariovalle.name/SciViz/1-s2.0-S0097849310001846-main.pdf

Configuration:

  • Web browser and its version: Using pdf.js from Node.js through npm package pdfjs-dist
  • Operating system and its version: Windows 10 64-bit
  • PDF.js version: 2.6.347
  • Is a browser extension: No

Steps to reproduce the problem:

  1. Load the file in Acrobat.
  2. In the document properties, look at the Author field: it is "Samuel Silva; Beatriz Sousa Santos; Joaquim Madeira".
  3. Look at Additional metadata > Advanced > dc:creator. It is marked as a seq container and contains the three authors.
  4. Using the code below gives "Author": "Samuel Silva", that is, only the first one, and "dc:creator": "Samuel SilvaBeatriz Sousa SantosJoaquim Madeira", that is, the three authors squashed together without separators.

What is the expected behavior? (add screenshot)
Both fields should contain the three authors as a single string with some sort of separator between them. Note that other PDF files that contain multiple authors already have them as a single string with the names separated by commas.

The code used:

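var pdfjsLib = require("pdfjs-dist/es5/build/pdf.js");
var pdfPath = "http://mariovalle.name/SciViz/1-s2.0-S0097849310001846-main.pdf";

// Load the document, then print the /Info dictionary and the XMP metadata.
var loadingTask = pdfjsLib.getDocument(pdfPath);
loadingTask.promise
  .then(function (doc) {
    return doc.getMetadata();
  })
  .then(function (data) {
    // Entries from the /Info dictionary (e.g. "Author", "Keywords").
    console.log(JSON.stringify(data.info, null, 2));
    // XMP metadata (e.g. "dc:creator"), if the document has any.
    if (data.metadata) {
      console.log(JSON.stringify(data.metadata.getAll(), null, 2));
    }
  })
  .catch(function (err) {
    console.error("Error: " + err);
  });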
@crystalfp changed the title from 'Remove ";" in Author field' to 'Multiple authors squished together in dc:creator field' on Feb 14, 2021
@Snuffleupagus
Collaborator

Snuffleupagus commented Feb 14, 2021

> Using the code below gives: "Author": "Samuel Silva" that is, only the first one.
console.log(JSON.stringify(data.info, null, 2));

That data comes from the Author field in the /Info dictionary of the PDF document, and that entry only contains one name; hence this part works as expected.

> And "dc:creator": "Samuel SilvaBeatriz Sousa SantosJoaquim Madeira", that is the three authors squashed together without separators.
console.log(JSON.stringify(data.metadata.getAll(), null, 2));

Starting with PDF.js version 2.7.570 (see https://github.com/mozilla/pdf.js/releases/tag/v2.7.570), the dc:creator entry is returned as an Array (fixed by PR #12838); hence this issue is already fixed.
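
For anyone who wants the single separated string asked for in the report, here is a minimal sketch of how the value could be normalized on the caller's side. It assumes only what is stated in this thread: older releases return dc:creator as one string, and 2.7.570 and later return an Array. The helper name joinMetadataValues and the "; " separator are hypothetical choices for illustration and are not part of PDF.js.

```javascript
// Hypothetical helper (not part of PDF.js): accept a metadata value that is
// either a single string or an array of strings, and return one display
// string with an explicit separator between the entries.
function joinMetadataValues(value, separator = "; ") {
  if (value === undefined || value === null) {
    return "";
  }
  // Wrap a plain string so both shapes are handled the same way.
  const items = Array.isArray(value) ? value : [value];
  return items
    .map((item) => String(item).trim())
    .filter((item) => item.length > 0)
    .join(separator);
}

// Sample values taken from the report above.
const creators = ["Samuel Silva", "Beatriz Sousa Santos", "Joaquim Madeira"];
console.log(joinMetadataValues(creators));
// prints "Samuel Silva; Beatriz Sousa Santos; Joaquim Madeira"
```

With the reporter's script, the input would be the "dc:creator" entry of the object returned by data.metadata.getAll(). Note that a string that was already concatenated by an older release cannot be split again this way; the helper only formats what it is given.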

@crystalfp
Author

Perfect! When will it be available as an npm package?
Less important, but the same problem seems to happen with the "dc:subject" field in the provided file. In this case the "Keywords" field is correct.

@Snuffleupagus
Collaborator

Snuffleupagus commented Feb 14, 2021

> Perfect! When it will be available as npm package?

https://www.npmjs.com/package/pdfjs-dist?activeTab=versions

> but the same problem seems to happen with "dc:subject" field in the provided file.

See the PR linked above.


For future reference: Please open separate issues for different bugs, since having multiple different problems reported in one issue can make tracking things somewhat difficult in general.

Also, if possible, try and test against the development version of PDF.js, see e.g. https://github.com/mozilla/pdf.js#online-demo, since we generally don't backport fixes to already existing releases.
