HTML>latex hides word after a tilde ~ but HTML>md>latex won't #964

Closed
florianm opened this issue Aug 28, 2013 · 2 comments

@florianm

Tested only on Ubuntu 12.04 with pandoc 1.9.1.1 (compiled with citeproc-hs 0.3.4, texmath 0.6.0.3, highlighting-kate 0.5.0.5)

TL;DR: Converting HTML directly to LaTeX drops any alphanumeric word that immediately follows a tilde (no whitespace in between).
Converting the same HTML document to markdown first, and then to LaTeX, preserves the words following a tilde.

Example: test.html

<html><body>
<h1>First chapter</h1>
<p>The word after a tilde ~ will be missing. Example: ~can't ~touch ~this.</p>
<p>One little tilde sat on a wall. ~ Two little tildes had a bad fall. ~~ Three little tildes just wanted a hug. ~~~ Four little tildes show it's a bug. ~~~~</p>
</body></html>

Converting to markdown

$ pandoc test.html -o fromhtml.md

creates fromhtml.md:

First chapter
=============

The word after a tilde \~ will be missing. Example: \~can't \~touch
\~this.

One little tilde sat on a wall. \~ Two little tildes had a bad fall.
\~\~ Three little tildes just wanted a hug. \~\~\~ Four little tildes
show it's a bug. \~\~\~\~

Converting that to latex

$ pandoc fromhtml.md -o frommd.tex

creates frommd.tex:

\section{First chapter}

The word after a tilde \ensuremath{\sim} will be missing. Example:
\ensuremath{\sim}can't \ensuremath{\sim}touch \ensuremath{\sim}this.

One little tilde sat on a wall. \ensuremath{\sim} Two little tildes had
a bad fall. \ensuremath{\sim}\ensuremath{\sim} Three little tildes just
wanted a hug. \ensuremath{\sim}\ensuremath{\sim}\ensuremath{\sim} Four
little tildes show it's a bug.
\ensuremath{\sim}\ensuremath{\sim}\ensuremath{\sim}\ensuremath{\sim}

Note that the words following each tilde, as well as runs of consecutive tildes, are preserved.

Converting the original HTML directly to LaTeX, however, drops the words that follow a tilde and collapses consecutive tildes into one:

$ pandoc test.html -o fromhtml.tex

The resulting latex file fromhtml.tex:

\section{First chapter}

The word after a tilde \ensuremath{\sim} will be missing. Example:
\ensuremath{\sim}'t \ensuremath{\sim} \ensuremath{\sim}.

One little tilde sat on a wall. \ensuremath{\sim} Two little tildes had
a bad fall. \ensuremath{\sim} Three little tildes just wanted a hug.
\ensuremath{\sim} Four little tildes show it's a bug. \ensuremath{\sim}
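
To check output files for this problem without reading them by eye, a minimal shell sketch follows. It assumes the file names used in this report (frommd.tex, fromhtml.tex) and that a surviving word sits directly after `\ensuremath{\sim}`, as in the output shown here. The function name check_tilde_words is made up for illustration.

```shell
#!/bin/sh
# check_tilde_words FILE (hypothetical helper): succeeds only when every word
# that directly follows a tilde in test.html is still present right after
# \ensuremath{\sim} in FILE.
check_tilde_words() {
    for word in "can't" touch this; do
        # -F matches the pattern as a fixed string, so the backslashes and
        # braces are taken literally.
        grep -qF "\\ensuremath{\\sim}$word" "$1" || return 1
    done
}

# Report on whichever of the two output files exist in the current directory.
for f in frommd.tex fromhtml.tex; do
    [ -f "$f" ] || continue
    if check_tilde_words "$f"; then
        echo "$f: words after tildes preserved"
    else
        echo "$f: words after tildes missing"
    fi
done
```

Run against the two files quoted in this report, the check passes for frommd.tex and fails for fromhtml.tex, because the latter contains `\ensuremath{\sim}'t` where `\ensuremath{\sim}can't` is expected.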

Why does pandoc produce two different results here, given that (as I understand it) it converts any input format into its own markdown dialect and then into the requested output format?
I am aware that tildes have a special meaning in markdown, but since the ones in my example come from HTML, they do not seem to be escaped properly.

@jgm
Owner

jgm commented Aug 28, 2013

This was a bug in pandoc 1.9.1.1. But we are now on 1.11.1, which works fine on your input!
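
For readers unsure which version they have, the following shell sketch compares the installed pandoc against 1.11.1. It assumes the version number is the second field on the first line of `pandoc --version` and that `sort` supports `-V` (as GNU coreutils does). The helper names version_ge and pandoc_version are hypothetical.

```shell
#!/bin/sh
# version_ge A B (hypothetical helper): succeeds when dotted version A >= B.
# Assumes a sort that supports -V (version ordering), e.g. GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# pandoc_version (hypothetical helper): prints the second field of the first
# line of `pandoc --version`, e.g. "pandoc 1.9.1.1" -> "1.9.1.1".
# Prints nothing when pandoc is not on the PATH.
pandoc_version() {
    command -v pandoc >/dev/null 2>&1 || return 0
    pandoc --version | head -n 1 | awk '{print $2}'
}

installed=$(pandoc_version)
if [ -z "$installed" ]; then
    echo "pandoc not found"
elif version_ge "$installed" 1.11.1; then
    echo "pandoc $installed: at or above 1.11.1"
else
    echo "pandoc $installed: older than 1.11.1, the tilde bug may apply"
fi
```

Plain string comparison would get this wrong, since "1.9.1.1" sorts after "1.11.1" lexically. That is why the sketch relies on version ordering.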

jgm closed this as completed Aug 28, 2013
@florianm
Author

Thanks for the fast answer, John!

harupong added a commit to harupong/progit that referenced this issue Dec 6, 2013
Pandoc version 1.9.1.1, which is installed with `apt-get` on Ubuntu 12.04,
has a [bug](jgm/pandoc#964) that hides a word
after tilde.  It is fixed with Pandoc version 1.11.1.
harupong added a commit to harupong/progit that referenced this issue Dec 9, 2013