the pdf.js cannot load the .cmap file in node.js #8881

Closed
WeiFei365 opened this issue Sep 7, 2017 · 1 comment

Comments

@WeiFei365

Link to PDF file (or attach file here): https://github.com/WeiFei365/pdfjs-node-cmap/blob/master/test.pdf

Configuration:

  • Web browser and its version: Chrome 64bit 60.0.3112.113
  • Operating system and its version: macOS 10.12.6
  • PDF.js version: 1.8.619
  • Is an extension: NO

Steps to reproduce the problem:

  1. First download the source code at: https://github.com/WeiFei365/pdfjs-node-cmap
  2. After the dependencies are installed, run the commands below:
node index.js
node index-mine.js

My terminal shows: e2

  3. These two commands give me two different strings. I think the second one is correct, because it matches the output I get when I open the PDF file with pdf.js in Chrome.

  4. When I debug index.js in Visual Studio Code, I see some error information; the screenshot is attached: e1. For solving this, I found an issue #8064 among Issues. But as far as I can tell from debugging, the program does not correctly detect that it is running in Node.js, and PDFJS.cMapUrl is null. I think there is something wrong in the program.

  5. In index-mine.js (see line 12), I mock an XMLHttpRequest class to get the string content I expected. But I'm not sure whether my solution has any flaws.

What is the expected behavior? (add screenshot)

The following string content is expected to be output:

'268新華人壽保險股份有限公司  2015 年年度報告第十四節附件合併財務報表附註(續)截至2015年12月31日止年度(除特別標註外,金額單位為人民幣百萬元)38 資產負債表日後事項(1) 利潤分配根據2016年3月29日董事會通過的2015年度利潤分配方案,本公司擬向全體股東派發現金股利人民幣873百萬元,按已發行股份計算每股人民幣0.28元(含稅)。上述利潤分配方案尚待股東大會批准。(2) 籌建新華卓越養老保險股份有限公司2015年4月23日,保監會批復同意本公司和本公司附屬公司資產管理公司共同發起籌建新華卓越養老保險股份有限公司,註冊資本人民幣5億元,註冊地北京市,截至本財務報表批准報出日,籌建工作仍在進行中。(3) 發行資本補充債券本公司於2016年3月4日召開的2016年度第一次臨時股東大會審議批准的《關於公司2016年資本補充債券募集方案的議案(修訂)》,同意本公司2016年發行總額不超過人民幣50億元或不超過人民幣50億元等值美元的資本補充債券。本公司2016年資本補充債券發行事宜尚待監管部門批准。39 合併財務報表批准本合併財務報表於2016年3月29日經本公司董事會審議通過並批准報出。'

What went wrong? (add screenshot)

pdf.js does not correctly detect that it is running in Node.js, so I use XMLHttpRequest to load the file 'Adobe-CNS1-UCS2.bcmap'. But XMLHttpRequest is a DOM object, so I have to mock it.

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):

Please download my source code: https://github.com/WeiFei365/pdfjs-node-cmap. Run the two files, index.js and index-mine.js, separately, then check the output in the terminal.
Alternatively, if you have Visual Studio Code installed, you can debug index.js directly and check the error information.

Thank you very much.

@Snuffleupagus
Collaborator

For solving this, I found an issue #8064 among Issues.

I'm afraid that you may have misunderstood the purpose of that PR. What it does is allow you to provide a custom factory for reading CMap files when using PDF.js in, for example, Node.js (so that you don't have to try to mock e.g. XMLHttpRequest); please refer to

pdf.js/src/display/api.js

Lines 135 to 138 in 9b14f8e

* @property {Object} CMapReaderFactory - (optional) The factory that will be
* used when reading built-in CMap files. Providing a custom factory is useful
* for environments without `XMLHttpRequest` support, such as e.g. Node.js.
* The default value is {DOMCMapReaderFactory}.

The correct way to use this is to define, in your own code, e.g. a NodeCMapReaderFactory; please see

class NodeCMapReaderFactory {
  constructor({ baseUrl = null, isCompressed = false, }) {
    this.baseUrl = baseUrl;
    this.isCompressed = isCompressed;
  }

  fetch({ name, }) {
    if (!name) {
      return Promise.reject(new Error('CMap name must be specified.'));
    }
    return new Promise((resolve, reject) => {
      let url = this.baseUrl + name + (this.isCompressed ? '.bcmap' : '');
      let fs = require('fs');
      fs.readFile(url, (error, data) => {
        if (error || !data) {
          reject(new Error('Unable to load ' +
                           (this.isCompressed ? 'binary ' : '') +
                           'CMap at: ' + url));
          return;
        }
        resolve({
          cMapData: new Uint8Array(data),
          compressionType: this.isCompressed ?
            CMapCompressionType.BINARY : CMapCompressionType.NONE,
        });
      });
    });
  }
}
for an example of how such a thing could look. You then call the API like this (based on your code):

let loadingTask = pdfjsLib.getDocument({
  data: pdfData,
  CMapReaderFactory: NodeCMapReaderFactory,
});
loadingTask.promise.then((pdfDocument) => {
  // Your code here...
});

Provided that you've also set PDFJS.cMapUrl and PDFJS.cMapPacked correctly, this should now work as intended; closing as answered.
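
One detail worth checking when setting those values: a factory of this kind builds the file path by plain concatenation (base + name + '.bcmap') and reads it with fs, so the base presumably needs to be a filesystem path that ends in a path separator. The following is a minimal standalone sketch of that behaviour, not the pdf.js implementation. It is an async/await variant that assumes Node.js 10+ (for fs.promises); the local CMapCompressionType object is a stand-in whose values are assumed to mirror the pdf.js constant, and the temporary directory, the Fake-CMap file name and the demo() helper are hypothetical names used only for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Local stand-in for the pdf.js constant; the numeric values are an assumption.
const CMapCompressionType = { NONE: 0, BINARY: 1 };

class NodeCMapReaderFactory {
  constructor({ baseUrl = null, isCompressed = false }) {
    this.baseUrl = baseUrl;
    this.isCompressed = isCompressed;
  }

  // An async method always returns a Promise, so a thrown error
  // surfaces to the caller as a rejection.
  async fetch({ name }) {
    if (!name) {
      throw new Error('CMap name must be specified.');
    }
    // Plain concatenation: baseUrl must already end in a path separator.
    const url = this.baseUrl + name + (this.isCompressed ? '.bcmap' : '');
    let data;
    try {
      data = await fs.promises.readFile(url);
    } catch (error) {
      throw new Error('Unable to load ' +
                      (this.isCompressed ? 'binary ' : '') +
                      'CMap at: ' + url);
    }
    return {
      cMapData: new Uint8Array(data),
      compressionType: this.isCompressed ?
        CMapCompressionType.BINARY : CMapCompressionType.NONE,
    };
  }
}

// Hypothetical demo: write a fake packed CMap into a temporary directory,
// then read it back through the factory.
function demo() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cmaps-'));
  fs.writeFileSync(path.join(dir, 'Fake-CMap.bcmap'), Buffer.from([1, 2, 3]));
  const factory = new NodeCMapReaderFactory({
    baseUrl: dir + path.sep, // note the trailing separator
    isCompressed: true,
  });
  return factory.fetch({ name: 'Fake-CMap' });
}
```

If the trailing separator were left off, the directory name and the CMap name would be joined into a single path component, the read would fail, and the caller would see the "Unable to load binary CMap at: ..." rejection.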
