Hello @Hopding,
I am trying to extract the plain text from existing PDF documents and came across this library after it turned out that pdfjs-dist is not as portable as my project needs.
Could you share a few quick pointers on the best approach to finding the text nodes and extracting their values with your library?
I have already browsed the API docs, but while they cover creating and extending PDFs very extensively, information on processing existing PDFs is rather scarce. I am guessing that I should iterate over all pages and then descend into the `.node` trees? I tried that but quickly ran into another problem: most of the types (`PDFDict`, `PDFObject`, ...) in these trees seem to be missing from the d.ts file, which makes the drill-down pretty cumbersome and leaves me unsure about the actual chances of success.
Thanks in advance.
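For context, here is a minimal sketch of what such a drill-down ultimately has to do, assuming a page's content stream has already been located and decoded into a string (that step depends on the library and is not shown here). In a PDF content stream, text is shown by the `Tj`, `TJ`, `'` and `"` operators, each preceded by its string operands. The helper names `readLiteral` and `extractShownText` are hypothetical and not part of any library. Hex strings, octal escapes, and font encodings are not handled, so the result is only readable for simple documents.

```typescript
// Hypothetical helpers: scan an already-decoded content stream for shown text.

const ESCAPES = new Map<string, string>([
  ["n", "\n"], ["r", "\r"], ["t", "\t"], ["b", "\b"], ["f", "\f"],
]);

/** Read a literal string whose opening "(" is at `start`; `end` is the index after its ")". */
function readLiteral(src: string, start: number): { text: string; end: number } {
  let depth = 0;
  let text = "";
  let i = start;
  while (i < src.length) {
    const ch = src.charAt(i);
    if (ch === "\\") {
      // Simple escapes only; octal escapes are passed through undecoded.
      const next = src.charAt(i + 1);
      text += ESCAPES.get(next) ?? next;
      i += 2;
      continue;
    }
    if (ch === "(") {
      depth++;
      if (depth > 1) text += ch; // balanced inner parentheses are part of the string
    } else if (ch === ")") {
      depth--;
      if (depth === 0) return { text, end: i + 1 };
      text += ch;
    } else {
      text += ch;
    }
    i++;
  }
  return { text, end: src.length }; // unterminated string: return what was read
}

/** Return one entry per text-showing operator, in stream order. */
function extractShownText(content: string): string[] {
  const shown: string[] = [];
  let pending: string[] = []; // string operands seen since the last operator
  let i = 0;
  while (i < content.length) {
    const ch = content.charAt(i);
    if (ch === "(") {
      const { text, end } = readLiteral(content, i);
      pending.push(text);
      i = end;
    } else if (ch === "<") {
      // Hex strings (and dictionary brackets) are skipped, not decoded.
      const close = content.indexOf(">", i);
      i = close === -1 ? content.length : close + 1;
    } else if (/[A-Za-z'"*]/.test(ch)) {
      // Read an operator (or the tail of a name such as /F1) and check for text operators.
      let j = i;
      while (j < content.length && /[A-Za-z0-9'"*]/.test(content.charAt(j))) j++;
      const op = content.slice(i, j);
      const showsText = op === "Tj" || op === "TJ" || op === "'" || op === '"';
      if (showsText && pending.length > 0) shown.push(pending.join(""));
      pending = []; // operands never carry over to the next operator
      i = j;
    } else {
      i++; // whitespace, numbers, array brackets, name slashes
    }
  }
  return shown;
}

// Usage: a tiny hand-written content stream.
const sample = "BT /F1 12 Tf 72 712 Td (Hello) Tj [(Wor) -20 (ld)] TJ ET";
console.log(extractShownText(sample)); // the two shown strings: "Hello" and "World"
```

The strings inside a `TJ` array are joined without spaces, because the numbers between them are kerning adjustments rather than word breaks. Recovering real word and line boundaries would additionally require tracking the text positioning operators.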