Use pdf "/ActualText" feature #494
Never mind Arabic, I can't reliably copy/paste out of a PDF in Latin-alphabet languages! I've heard of this feature in PDFs before but never played around with it. How widespread is reader support? Do you happen to know of a chart somewhere that shows which readers do or don't support PDF features like this?
I haven’t been able to find much documentation on it. Based on my own testing and one other user’s, I can currently report:
Adobe Reader DC – works
Foxit Reader – doesn’t work
Foxit (linux version) – doesn’t work
qpdfview (linux) – doesn’t work
evince (linux) – doesn’t work
So it looks like it isn’t supported by many readers. On the other hand, I’m assuming that Adobe accounts for the major share of PDF reader users.
--Malachi
(Email reply to Caleb Maclennan’s comment of Tuesday, November 21, 2017.)
There is some support for this through the …
Edit: Evince supports this now.
Edit 2: The only problem with enabling \XeTeXgenerateactualtext is that …

I know I'm posting this 5 years later, but… I was writing a game list (PDF) through LaTeX (using a script to find the games …
Without specifying \XeTeXgenerateactualtext=1 in the .tex file, any text …
After generating the PDF with the setting active, Evince (as of now) has actual …
PS: I see no reason why a feature like this shouldn't be turned on by default.
Long story short:
For my use case, plain dashes were not being copied, and I didn't like the text turning … So I ultimately used the ascii package and replaced all dashes with …
In qpdfview, you can press …
The text was copied correctly in qpdfview both with and without \XeTeXgenerateactualtext=1. So it does look like this is purely a PDF-viewer issue (much like the old question of which CSS features a browser supports), and not related to LaTeX, SILE, XeLaTeX, etc.
See somewhat related discussion #1927.
For the record, I experimented with bringing directly … Then search (and copy) work well in Evince (before, it would fail on the fi ligature). But when selecting the text, it shows ugly things. It might be an Evince-only problem (using v46.0): Okular (using v24.05.2) doesn't have this problem (it also failed to find/copy the fi ligature without the change, but with the suggested code change everything seems fine). So I'm unsure whether it's a PDF-viewer problem or there's some deeper issue in this …
N.B. The "naive" patch:
The PDF standard defines an /ActualText entry, which lets you attach the Unicode text to the glyphs that normally appear in the PDF. This is wonderful for Arabic and other non-Latin scripts, whose text has never been reliably copy-pasteable out of PDFs.
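To make the mechanism concrete, here is a minimal Python sketch of what a PDF writer could emit. It assumes a content stream that paints glyphs by ID (for example a CID font with Identity-H encoding, where a ligature glyph cannot be mapped back to its characters on its own); the glyph IDs `0123` and `0456` and the helper names are hypothetical, not part of XeTeX or SILE. The replacement text goes in a marked-content property list between `BDC` and `EMC`, and a PDF text string in UTF-16BE has to start with the FE FF byte-order mark.

```python
def pdf_text_string_hex(text: str) -> str:
    """Encode a string as a PDF hex text string: FE FF byte-order mark + UTF-16BE."""
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


def with_actual_text(glyph_ops: str, text: str) -> str:
    """Wrap glyph-painting operators in a /Span marked-content sequence whose
    /ActualText carries the Unicode text a viewer should use for copy and search."""
    return (
        f"/Span << /ActualText {pdf_text_string_hex(text)} >> BDC\n"
        f"{glyph_ops}\n"
        "EMC"
    )


if __name__ == "__main__":
    # Hypothetical glyph 0x0123: an "fi" ligature in a Latin font.
    print(with_actual_text("<0123> Tj", "fi"))
    # Hypothetical glyph 0x0456: the Arabic lam-alef ligature (U+0644 U+0627).
    print(with_actual_text("<0456> Tj", "\u0644\u0627"))
```

Wrapping each ligature or multi-glyph cluster this way leaves rendering untouched; only viewers that honor /ActualText use the replacement text when copying or searching.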
XeTeX added the \XeTeXgenerateactualtext parameter a year or so ago; setting \XeTeXgenerateactualtext=1 makes the PDFs it produces include the ActualText data.
Is it possible to add a similar feature to SILE?