"?" at end of title produces a "?`" ligature (upside-down question mark) in latex with "lang: de" #54

Closed
korakinos opened this issue Feb 18, 2021 · 14 comments

Comments

@korakinos

korakinos commented Feb 18, 2021

$ pandoc --version
pandoc 2.11.4
Compiled with pandoc-types 1.22, texmath 0.12.1, skylighting 0.10.2,
citeproc 0.3.0.5, ipynb 0.1.0.1

With the new citeproc, if a title in the bibliography ends with a question mark ("?") and lang is set to German in the YAML metadata block (I tested de, de-DE and de-DE-1996), the question mark and the closing double quotes are written to LaTeX as "?``". LaTeX interprets this as a "?`" ligature (upside-down question mark "¿") followed by a single "`", so the PDF shows "¿" plus a closing single quote.

Example:

---
lang: de

references:
- type: article
  id: test
  author: David Graeber
  title: "What's the Point If We Can't Have Fun?"

nocite: test

---

Command to compile to pdf: pandoc --output="temp.pdf" --citeproc "temp.md"

Output pdf: temp.pdf

Depending on the CSL used, the same happens in citations within the text. It does not happen with lang: en.
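
To illustrate why "?``" goes wrong, here is a minimal Python sketch of left-to-right ligature substitution. It is a toy model only, not TeX's actual algorithm and not code from pandoc or citeproc: the table `LIGATURES` and the function `apply_ligatures` are names invented for this illustration, and treating `{}` as an invisible ligature breaker is a simplification.

```python
# Toy model of a few TeX input ligatures (an assumption-laden sketch,
# not TeX's real font machinery).
LIGATURES = {
    "?`": "¿",   # inverted question mark
    "!`": "¡",   # inverted exclamation mark
    "``": "“",   # double quote written as two backticks
    "''": "”",   # double quote written as two apostrophes
}

def apply_ligatures(tex: str) -> str:
    """Scan left to right and replace the first matching pair at each position."""
    out = []
    i = 0
    while i < len(tex):
        pair = tex[i:i + 2]
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(tex[i])
            i += 1
    # An empty group prints nothing; it only keeps its neighbours apart.
    return "".join(out).replace("{}", "")

# "?" wins the first backtick, so only a single backtick is left over.
print(apply_ligatures("Have Fun?``"))
# With an empty group after "?", the two backticks stay together.
print(apply_ligatures("Have Fun?{}``"))
```

Because the scan is left to right, the "?" consumes the first backtick before the two backticks can combine, which matches the "¿" plus single quote seen in the PDF.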

Background / non-working workaround

This issue is somewhat similar to jgm/pandoc#5407. Back then a viable workaround was for me to insert an invisible space ("word joiner", unicode U+2060) after the "?". This time (and on a different computer), however, this doesn't work for me. With pdflatex, it leads to an error:

$ pandoc --pdf-engine=pdflatex --output="temp.pdf" --citeproc "temp.md"
Error producing PDF.
! Package inputenc Error: Unicode character ⁠ (U+2060)
(inputenc)                not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.90 ...`What's the Point If We Can't Have Fun?⁠

Try running pandoc with --pdf-engine=xelatex.

With xelatex, it leads to a warning and faulty output because the font doesn't have the U+2060 character and replaces it with a full stop ("."):

$ pandoc --pdf-engine=xelatex --output="temp.pdf" --citeproc "temp.md"
[WARNING] Missing character: There is no ⁠ (U+2060) in font [lmroman10-regular]:mapping=tex-text;!

Apart from the default font, I tried mainfont: Linux Libertine O and mainfont: Noto Serif. The latter got rid of the warning message, but still led to the unwanted full stop in the output.

@jgm
Owner

jgm commented Feb 18, 2021

Are you perhaps using a custom template (maybe in your user data directory)?

With --pdf-engine=xelatex and your original input I get the correct quote character.
Yes, there is a full stop, but that is completely independent; it has nothing to do with the ligature or U+2060 character.
The full stop may be a citeproc library issue; the other issue, if there were one, would be a pandoc issue.

With --pdf-engine=pdflatex I get

% pandoc z.md -C --pdf-engine=pdflatex -o z2.pdf
Error producing PDF.
! Package inputenc Error: Unicode character ⁠ (U+2060)
(inputenc)                not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.90 ...�What's the Point If We Can't Have Fun?⁠

Try running pandoc with --pdf-engine=xelatex.

@jgm
Owner

jgm commented Feb 18, 2021

Or does your input (above) already contain the U+2060?

@korakinos
Author

> Or does your input (above) already contain the U+2060?

Indeed it does! My mistake, sorry. That's the trouble with invisible characters: They are easily overlooked. I changed my opening post and deleted the U+2060 after the question mark.

So the original issue stands. And no, I am quite sure that I am not using a custom template (the corresponding subdirectory of my user data directory is empty).
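
Since invisible characters like U+2060 are so easy to miss, a small check can help before filing or compiling. The following is a minimal Python sketch using only the standard `unicodedata` module; the function name `find_invisible` is made up for this illustration, and flagging everything in Unicode category "Cf" (format characters) is an assumption about what counts as "invisible".

```python
import unicodedata

def find_invisible(text: str):
    """Return (index, code point, name) for every format character (category Cf)."""
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits

# A title with a word joiner hidden after the question mark.
title = "Have Fun?\u2060"
for index, codepoint, name in find_invisible(title):
    print(f"position {index}: {codepoint} {name}")
```

To check a whole input file, the same function could be called on the result of `open("temp.md", encoding="utf-8").read()`.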

@jgm
Owner

jgm commented Feb 18, 2021

Odd, there's code in the LaTeX writer that should be inserting a {} after the ?. I need to see why this isn't happening.

@jgm
Owner

jgm commented Feb 18, 2021

OK, I see why it isn't working (the escape routine only looks at things in the same element).

It seems to me a better approach would be to disable the language-specific ligatures in babel.
Never mind, I see that this is not specific to babel.
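
The "same element" limitation can be pictured with a small Python sketch. This is not the LaTeX writer's code (which is Haskell); `break_ligatures` and the two-item `elements` list are hypothetical stand-ins, assuming the "?" ends one element and the closing quote ligature starts the next.

```python
import re

def break_ligatures(s: str) -> str:
    """Insert an empty group between ? or ! and a following backtick."""
    return re.sub(r"([?!])`", r"\1{}`", s)

# Assumption for illustration: the title text and the closing quote
# arrive as two separate elements.
elements = ["What's the Point If We Can't Have Fun?", "``"]

# Escaping each element on its own never sees "?" next to "`".
per_element = "".join(break_ligatures(e) for e in elements)

# Escaping the joined output does see the pair and separates it.
whole = break_ligatures("".join(elements))

print(per_element)
print(whole)
```

In this model the per-element pass leaves "?``" untouched, while a pass over the concatenated text yields "?{}``".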

@jgm
Owner

jgm commented Feb 18, 2021

One option suggested here is

\usepackage{microtype}
\DisableLigatures[?,!]{encoding=T1}

@jgm
Owner

jgm commented Feb 18, 2021

Another option would be for us to generate curly quotes instead of ``.

You can already force that to happen by specifying -t latex-smart (see manual), so that might be a solution for you.

jgm added a commit to jgm/pandoc that referenced this issue Feb 18, 2021
These are often triggered by accident in languages that
use ` `` ` for end quote (e.g. German).

See jgm/citeproc#54.
@jgm
Owner

jgm commented Feb 19, 2021

Actually I think the best approach is to have pandoc automatically disable smart when creating a PDF via LaTeX.
This will prevent all problems of this kind and doesn't seem to have any drawbacks.

jgm added a commit to jgm/pandoc that referenced this issue Feb 19, 2021
This is to prevent accidental creation of ligatures like
`` ?` `` and `` !` `` (especially in languages with quotations
like German), and similar ligature issues.

See jgm/citeproc#54.
@jgm
Owner

jgm commented Feb 19, 2021

For the full stop issue: the problem lies with punctuationInsideQuotes in Citeproc.Pandoc.
This moves punctuation inside Quoted elements (generated by pandoc's smart quote parser), but when the quotes are just regular characters, the punctuation isn't adjusted. This could be improved, so leaving this issue open.
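
As a rough picture of the behaviour described here, the following Python sketch models inlines as either plain strings or quoted nodes. It is not the implementation in Citeproc.Pandoc: the classes `Str` and `Quoted` and the function `punctuation_inside_quotes` only mirror the names for illustration, and dropping a full stop that collides with "?" or "!" is an assumption of this sketch.

```python
from dataclasses import dataclass

@dataclass
class Str:
    text: str       # plain text, possibly containing literal quote characters

@dataclass
class Quoted:
    text: str       # text inside a structured quotation node

def punctuation_inside_quotes(inlines):
    """Move a leading '.' or ',' from a Str into the Quoted node before it."""
    out = []
    for node in inlines:
        prev = out[-1] if out else None
        if (isinstance(prev, Quoted) and isinstance(node, Str)
                and node.text[:1] in {".", ","}):
            mark, rest = node.text[0], node.text[1:]
            # Assumption: skip the mark if the quote already ends in punctuation.
            if not prev.text.endswith(("?", "!", ".", ",")):
                out[-1] = Quoted(prev.text + mark)
            if rest:
                out.append(Str(rest))
        else:
            out.append(node)
    return out

# Structured quote: the trailing full stop is absorbed.
print(punctuation_inside_quotes([Quoted("Have Fun?"), Str(".")]))
# Quotes as regular characters: nothing is recognised, the full stop stays.
print(punctuation_inside_quotes([Str("“Have Fun?”"), Str(".")]))
```

The second call shows the case from this issue: with literal quote characters there is no quoted node to adjust, so the extra full stop survives.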

@korakinos
Author

> Another option would be for us to generate curly quotes instead of ``.
>
> You can already force that to happen by specifying -t latex-smart (see manual), so that might be a solution for you.

That works, great! As always, thank you so much for your time and effort. Pandoc makes my life better, and filing an issue for it is always a pleasure thanks to you (even if I often open it in the wrong repo…).

(By the way, if anyone else is, like me, confused by being unable to find a "latex-smart" output type in the manual: "latex-smart" is notation for "output type 'latex' without (minus) the extension 'smart'".)
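
For readers who trip over the same notation, here is a minimal Python sketch that splits such a format string into its parts. It is not pandoc's parser; `parse_format_spec` is a hypothetical helper, and the assumption that format and extension names consist of letters, digits and underscores is mine.

```python
import re

def parse_format_spec(spec: str):
    """Split e.g. 'latex-smart' into (format, enabled extensions, disabled extensions)."""
    m = re.match(r"^([A-Za-z0-9_]+)((?:[+-][A-Za-z0-9_]+)*)$", spec)
    if m is None:
        raise ValueError(f"not a format specification: {spec!r}")
    base = m.group(1)
    modifiers = re.findall(r"([+-])([A-Za-z0-9_]+)", m.group(2))
    enabled = [name for sign, name in modifiers if sign == "+"]
    disabled = [name for sign, name in modifiers if sign == "-"]
    return base, enabled, disabled

# 'latex' with the 'smart' extension switched off.
print(parse_format_spec("latex-smart"))
```

Read this way, "-t latex-smart" selects the ordinary latex output format and only removes one extension from it.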

@korakinos
Author

> Actually I think the best approach is to have pandoc automatically disable smart when creating a PDF via LaTeX.
> This will prevent all problems of this kind and doesn't seem to have any drawbacks.

Apart from ligatures, the pandoc manual says this about the smart extension:

> Nonbreaking spaces are inserted after certain abbreviations, such as “Mr.”

Isn't this a drawback?

@jgm
Owner

jgm commented Feb 22, 2021

That's only for Markdown input, not LaTeX output.
For LaTeX output, the only effect of disabling smart is that unicode quote characters are used instead of the quote ligatures.

@korakinos
Author

No objections then!

@jgm
Owner

jgm commented Jul 28, 2022

This has been fixed by changes in pandoc.

jgm closed this as completed Jul 28, 2022