
Per-element metadata from HTML -> PDF -> HTML (via pdf.js) #2279

Closed
jambudipa opened this issue Oct 21, 2024 · 2 comments

@jambudipa

I need to carry some metadata (possibly just an ID) from the source HTML through to the PDF generated by WeasyPrint, so that it ends up addressable in the HTML rendered by pdf.js (more specifically, react-pdf).

So, for example, if I have this element in my source HTML:

<p class="x00-chapter-title---toc-level" id="contents">Contents</p>

I would like to be able to see that id when the PDF is rendered in the browser.

It could be any element, really; I imagined a data-id would do the trick. I saw this issue and its corresponding solution, which come close to what I need; perhaps I could fork it?

@jambudipa

So I changed my element to this:

<p class="x00-chapter-title---toc-level" id="my-id">Contents</p>

Using qpdf, I was able to produce a human-readable version of the PDF, and was happy to find this:

<<
  /Names <<
    /Dests <<
      /Names [
        (my-id)
        [
          25 0 R
          /XYZ
          67.25
          810.889736
          0
        ]
      ]
    >>
  >>
>>

...which gives me hope!

But now I am not sure how to get pdf.js to expose these details, or even what they mean; presumably they are coordinates on the page.
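For reference, the array follows the PDF specification's explicit-destination syntax `[page /XYZ left top zoom]`: `25 0 R` is the target page object, 67.25 and 810.889736 are the left and top coordinates in PDF user space (points of 1/72 inch by default, origin at the bottom-left of the page, so the distance from the top edge is `pageHeight - top`), and a zoom of 0 means "keep the current zoom". The sketch below decodes such an array. `describeDestination` is a hypothetical helper, not a pdf.js function, and it assumes the shape pdf.js uses for `getDestinations()` results: the page as a `{ num, gen }` reference and the type as an object like `{ name: "XYZ" }`.

```javascript
// Hypothetical helper: turn an explicit destination array into named fields.
// Field order follows the PDF spec's explicit-destination forms, e.g.
// [page /XYZ left top zoom], [page /FitH top], [page /FitR left bottom right top].
function describeDestination(dest) {
  const [page, type, ...args] = dest;
  // pdf.js is assumed to represent the PDF name /XYZ as { name: "XYZ" }
  const kind = type && typeof type === "object" ? type.name : String(type);
  // Per the spec, null means "keep the current value"; for zoom, 0 means the same
  const val = (v) => (typeof v === "number" ? v : null);
  switch (kind) {
    case "XYZ":
      return { page, kind, left: val(args[0]), top: val(args[1]), zoom: val(args[2]) || null };
    case "FitH":
    case "FitBH":
      return { page, kind, top: val(args[0]) };
    case "FitV":
    case "FitBV":
      return { page, kind, left: val(args[0]) };
    case "FitR":
      return { page, kind, left: args[0], bottom: args[1], right: args[2], top: args[3] };
    default: // "Fit" and "FitB" carry no coordinates
      return { page, kind };
  }
}

// The destination from the qpdf dump above, in pdf.js form
const dest = [{ num: 25, gen: 0 }, { name: "XYZ" }, 67.25, 810.889736, 0];
const info = describeDestination(dest);
// info.kind is "XYZ", info.left 67.25, info.top 810.889736, info.zoom null (0 = unchanged)
```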

Maybe I will ask on the pdf.js GitHub...

@jambudipa

OK, I managed to coax GPT-4o into giving me the answer:

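// Assumes `pdf` is a loaded pdf.js document (PDFDocumentProxy) and this runs inside an async function
const page = await pdf.getPage(pageNum);
const pageRef = page.ref; // Object reference of the page, e.g. { num: 25, gen: 0 } for "25 0 R"
const objectNumber = pageRef.num;
const generationNumber = pageRef.gen;

// Get all named destinations, e.g. { "my-id": [{ num: 25, gen: 0 }, { name: "XYZ" }, 67.25, 810.889736, 0] }
const destinations = await pdf.getDestinations();

// Keep only the destinations that point at this page, with their PDF coordinates
const idsOnPage = Object.entries(destinations)
  .filter(([, dest]) => Array.isArray(dest) && dest[0]?.num === objectNumber && dest[0]?.gen === generationNumber)
  .map(([id, dest]) => ({ id, left: dest[2], top: dest[3] }));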
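To make those IDs addressable in the browser, one possible approach (a sketch; neither pdf.js nor react-pdf does this for you) is to convert each destination's coordinates with the page viewport and drop an empty, absolutely positioned marker carrying the ID into the element that wraps the rendered page. It assumes `destinations` comes from `pdf.getDestinations()`, `pageRef` from `page.ref`, and `viewport` from `page.getViewport({ scale })` at the same scale the page is displayed at. `anchorsForPage` and `addAnchorElements` are hypothetical names, and only the `/XYZ` destination form is handled.

```javascript
// Hypothetical helper: find the named destinations that target one page and
// convert their /XYZ coordinates into viewport pixels.
// `viewport` is assumed to be a pdf.js PageViewport, whose convertToViewportPoint()
// maps PDF user space (bottom-left origin) to viewport space (top-left origin, scaled).
function anchorsForPage(destinations, pageRef, viewport) {
  const anchors = [];
  for (const [id, dest] of Object.entries(destinations)) {
    if (!Array.isArray(dest)) continue;
    const [ref, type, left, top] = dest;
    if (!ref || ref.num !== pageRef.num || ref.gen !== pageRef.gen) continue;
    if (!type || type.name !== "XYZ") continue; // other destination forms are skipped
    if (typeof left !== "number" || typeof top !== "number") continue; // null = "unchanged"
    const [x, y] = viewport.convertToViewportPoint(left, top);
    anchors.push({ id, x, y });
  }
  return anchors;
}

// Hypothetical: add an empty <span data-id="..."> per anchor. `container` should be
// the element wrapping the rendered page, styled position: relative, so the
// markers line up with the canvas and text layer.
function addAnchorElements(container, anchors) {
  for (const { id, x, y } of anchors) {
    const marker = document.createElement("span");
    marker.dataset.id = id;
    marker.style.position = "absolute";
    marker.style.left = `${x}px`;
    marker.style.top = `${y}px`;
    container.appendChild(marker);
  }
}
```

After calling `addAnchorElements(wrapper, anchorsForPage(destinations, page.ref, viewport))`, `document.querySelector('[data-id="my-id"]')` should find the marker for the `<p id="my-id">` from the source HTML, e.g. to scroll to it. Wiring this into react-pdf's page wrapper is not shown here.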
