Inconsistent fontName value #11610

Closed
umutesen opened this issue Feb 17, 2020 · 4 comments

@umutesen

Attach (recommended) or Link to PDF file here:

Configuration:

  • Web browser and its version: Chrome Version 80.0.3987.100 (Official Build) (64-bit)
  • Operating system and its version: Windows 7
  • PDF.js version: pdfjs-dist ^2.2.228 with Angular 6
  • Is a browser extension: No

Steps to reproduce the problem:

  1. Read the PDF lines of the same document multiple times:
public async readPdfLines(pdfUrl: string): Promise<any[]> {
    const pdf = await pdfjsLib.getDocument(pdfUrl).promise;
    const lines: any[] = [];

    // Pages are 1-indexed; numPages is the public page count
    // (used here instead of the private _pdfInfo field).
    for (let i = 1; i <= pdf.numPages; i++) {
      const page = await pdf.getPage(i);
      const textContent = await page.getTextContent();
      // Each item carries the text plus fontName, height, etc.
      textContent.items.forEach(line => lines.push(line));
    }

    return lines;
  }

this.downloadedPdfLines = await this.pdfReader.readPdfLines(url);

  2. The font name associated with each line is inconsistent: the first read produces g_d0_f8, the second g_d2_f26, etc. I have to add a new || case each time the document is read again.

     // First line of each cover in PDF
     const firstLines = this.downloadedPdfLines
         .filter(x => (x.fontName === "g_d0_f8" ||
             x.fontName === "g_d1_f17" ||
             x.fontName === "g_d2_f26" ||
             x.fontName === "g_d3_f35" ||
             x.fontName === "g_d4_f44" ||
             x.fontName === "g_d5_f53" ||
             x.fontName === "g_d6_f62") && x.height === 11);
    

What is the expected behavior? (add screenshot)
The font name should always be the same for the same line of text.

@Snuffleupagus
Collaborator

Snuffleupagus commented Feb 17, 2020

font name should always be the same for the same line of text

That unfortunately cannot be done, since font names must be unique for every getDocument call to prevent errors if multiple documents are ever opened in parallel. I.e. the {n} part in g_d{n}_f1 will always be unique for each getDocument call, and this is thus working as expected.
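
To make the naming scheme concrete, here is a minimal TypeScript sketch that splits such a font name into its two numeric parts. The g_d{n}_f{m} pattern is inferred only from the names quoted in this thread; it is an internal detail of PDF.js that may change, and parseFontName is a hypothetical helper, not part of the PDF.js API.

```typescript
// Hypothetical helper, not part of PDF.js. The pattern is an assumption
// based on the names seen in this issue (e.g. "g_d0_f8", "g_d2_f26").
interface FontNameParts {
  docId: number;  // the {n} part: unique for each getDocument call
  fontId: number; // the number after "_f"; in this thread it also varies between reads
}

function parseFontName(fontName: string): FontNameParts | null {
  const match = /^g_d(\d+)_f(\d+)$/.exec(fontName);
  if (match === null) {
    return null; // name does not follow the assumed pattern
  }
  return { docId: Number(match[1]), fontId: Number(match[2]) };
}
```

Because docId changes with every getDocument call, comparing the full name against a fixed string can only match a single read of the document.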

@umutesen
Author

Thank you for the clarification. Is there a way to disable this feature? In my case, there is no requirement to open documents in parallel.

@Snuffleupagus
Collaborator

Thank you for clarification, is there a way to disable this feature?

No; and such an option isn't something that would be good to add either[1]. Please note that the current unique font names were added to address specific (and somewhat recurring) bugs with broken fonts.

Ninja-edit: However, I suppose that it may be possible to change things such that at least the _fxx-part of the font name would be more consistent for consecutive getDocument calls.


[1] It'd risk adding unnecessary complexity in the code, and you'd basically add a foot-gun which users may inadvertently use to break font rendering for themselves.
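
Since the generated names cannot be pinned, one possible workaround on the caller's side is to stop hard-coding them and instead label fonts by the order in which they first appear in a read. This is only a sketch under assumptions: labelFontsByFirstUse and the font_{k} labels are hypothetical, and it assumes the same document yields its text items in the same order on every read, which this thread does not confirm.

```typescript
// Minimal shape of a text item; only the fields used in this thread.
interface TextLine {
  fontName: string;
  height: number;
}

// Hypothetical helper: maps each distinct fontName to "font_0", "font_1", ...
// in order of first appearance, so the label does not depend on the
// generated g_d{n}_f{m} name.
function labelFontsByFirstUse<T extends TextLine>(
  lines: T[],
): Array<T & { fontLabel: string }> {
  const labels = new Map<string, string>();
  return lines.map((line) => {
    let label = labels.get(line.fontName);
    if (label === undefined) {
      label = `font_${labels.size}`;
      labels.set(line.fontName, label);
    }
    return { ...line, fontLabel: label };
  });
}
```

With such labels, a filter like the one in the report could compare x.fontLabel against a single value (whichever label the cover font receives for that document) together with x.height === 11, rather than listing every generated name.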

@timvandermeij
Contributor

You could change that for a custom deployment, but it's not something we recommend or support.
