Combining accents not working in Unicode converter (and more) #207

Closed
oscargus opened this issue Oct 4, 2015 · 16 comments

Comments

@oscargus
Contributor

oscargus commented Oct 4, 2015

Just to remember for myself (or anyone else):

The Unicode to LaTeX converter does not handle combining accents: for example, "a\u0301" (á) is not converted into {\'{a}} as it should be. Interestingly, the Java compiler also seems to render "\u0301a" as á, but "a\u0301e" gives áe, so clearly the combining accent should be applied to the preceding character.
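
To make the expected behavior concrete, here is a minimal Python sketch (not JabRef's Java code). The `COMBINING_TO_LATEX` table and the `unicode_to_latex` function are hypothetical names invented for this illustration, and the table covers only a handful of accents. It assumes at most one known combining mark per base character.

```python
import unicodedata

# Hypothetical, deliberately tiny mapping from combining marks
# to LaTeX accent commands (illustration only).
COMBINING_TO_LATEX = {
    "\u0300": "`",   # combining grave accent
    "\u0301": "'",   # combining acute accent
    "\u0302": "^",   # combining circumflex accent
    "\u0303": "~",   # combining tilde
    "\u0308": '"',   # combining diaeresis
}


def unicode_to_latex(text: str) -> str:
    """Rewrite base + combining accent pairs as {\\<accent>{<base>}}."""
    # NFD splits precomposed characters such as U+00E1 into "a" + U+0301,
    # so both spellings go through the same code path.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        accent = COMBINING_TO_LATEX.get(ch)
        if accent is not None and out:
            # The accent applies to the preceding character.
            base = out.pop()
            out.append("{\\" + accent + "{" + base + "}}")
        else:
            out.append(ch)
    # Recompose anything the table did not cover.
    return unicodedata.normalize("NFC", "".join(out))


print(unicode_to_latex("a\u0301e"))  # {\'{a}}e
```

Stacked accents and marks missing from the table are not handled by this sketch.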

Double combining accents are not supported in either the HTML or the Unicode to LaTeX conversion. While some of the codes are there, the behavior is not correct. For example, b͞c (in Unicode, "b\u035Ec") should be converted to something clever, probably not {\textdoublemacron{b}}c, which is the current result from HTML.
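
As a rough illustration of why double accents need special handling, the Python sketch below only detects them and reports the two characters they span. It does not propose a LaTeX output. The function name `find_double_accents` is made up for this example. The detection relies on the Unicode canonical combining classes 233 (double below) and 234 (double above), which cover marks such as U+035E.

```python
import unicodedata

# Canonical combining classes 233 ("double below") and 234 ("double above")
# mark diacritics that span two base characters, e.g. U+035E.
DOUBLE_CLASSES = {233, 234}


def find_double_accents(text: str):
    """Return (left, mark_name, right) for each double combining mark."""
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.combining(ch) in DOUBLE_CLASSES:
            # A double mark sits between the two characters it spans.
            left = text[i - 1] if i > 0 else ""
            right = text[i + 1] if i + 1 < len(text) else ""
            hits.append((left, unicodedata.name(ch), right))
    return hits


print(find_double_accents("b\u035Ec"))
# [('b', 'COMBINING DOUBLE MACRON', 'c')]
```

A converter would need to consume both neighbors at once, which is why wrapping only the preceding character, as for single accents, gives the wrong result.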

@oscargus
Contributor Author

oscargus commented Oct 7, 2015

How complete should this functionality really be? When I look around, I notice that there are huge conversion lists which could be implemented. On the other hand, how likely is it that e.g. ⬲ (HTML: ⬲, Unicode: "\u2B32") shows up in an imported field?

Should we keep it relevant or complete?

@oscargus
Contributor Author

oscargus commented Oct 7, 2015

The old list was pretty much constructed by actively searching in IEEE Xplore for entries that were likely to end up with math symbols etc. That covered most of the HTML entries at least. I guess one can do something similar with Medline to cover even more symbols that might occur.

@koppor
Member

koppor commented Oct 7, 2015

This seems somehow related to #161, which can now be closed, can't it? If I get you right, latex2utf8 and LaTeX::Encode now support fewer characters.

I would use those tools as a benchmark and not do more encoding.

Since biber is a perfect replacement for bibtex, I'd even suggest putting as little effort as possible into this and encouraging users to switch to biblatex. The idea of #161 was only to gain a lot with little effort. Now, it sounds like much effort for little gain 🙈

@ThomasA
Member

ThomasA commented Oct 7, 2015

Careful with Biber... Biblatex is great, but some journals do not
accept LaTeX submissions using Biblatex - only BibTeX. As far as I know,
using \usepackage[utf8]{inputenc} and bibtex8 instead of bibtex
can get you a fair bit of the way there.

@koppor
Member

koppor commented Oct 7, 2015

See #160 😇. OK, we'll need the converter for BW. Pushing publishers is far from easy. See the thread at tetex-extra: llncs package, especially https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=31897#47.

@ThomasA
Member

ThomasA commented Oct 7, 2015

As far as I know most, if not all, of IEEE's journals use the IEEEtran class and require the use of a BibTeX-compatible .bib-file or a finished thebibliography environment (which Biber/Biblatex cannot produce). IEEE do not actually use LaTeX to format the papers but have a proprietary system that converts the LaTeX to some XML format, so it would probably be very difficult to convince them to change that.

@oscargus
Contributor Author

oscargus commented Oct 7, 2015

I also publish in IEEE, so I also see a clear use case for having a converter. :-)

@oscargus
Contributor Author

Single combining accents will work for Unicode once #808 is merged.

Double combining accents still to be solved.

@oscargus
Contributor Author

And with #1581 single combining accents are working for LaTeX to Unicode.
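
For reference, the reverse direction can be sketched in a few lines of Python. This is an illustration only, not the code from #1581. The names `LATEX_TO_COMBINING`, `ACCENT` and `latex_to_unicode` are hypothetical, and only the `{\'{a}}` spelling with a small set of accents on ASCII letters is recognized.

```python
import re
import unicodedata

# Hypothetical, deliberately small mapping from LaTeX accent commands
# to Unicode combining marks (illustration only).
LATEX_TO_COMBINING = {
    "`": "\u0300",   # grave
    "'": "\u0301",   # acute
    "^": "\u0302",   # circumflex
    "~": "\u0303",   # tilde
    '"': "\u0308",   # diaeresis
}

# Matches only the fully braced form, e.g. {\'{a}}.
ACCENT = re.compile(r"""\{\\([`'^~"])\{([A-Za-z])\}\}""")


def latex_to_unicode(text: str) -> str:
    """Replace {\\<accent>{<letter>}} with the accented character."""
    def repl(match):
        # Put the combining mark after the base letter, then compose (NFC)
        # so that a precomposed character is used where one exists.
        pair = match.group(2) + LATEX_TO_COMBINING[match.group(1)]
        return unicodedata.normalize("NFC", pair)
    return ACCENT.sub(repl, text)


print(latex_to_unicode("{\\'{a}}e"))  # áe
```

Where Unicode has no precomposed character for a pair, NFC leaves the base letter followed by the combining mark.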

@lenhard
Member

lenhard commented Jul 26, 2016

@oscargus: So what is missing to mark this issue as closed?

@oscargus
Contributor Author

oscargus commented Jul 26, 2016

With some sort of priority:

High

  • Single combining accents LaTeX -> HTML (doable)

Medium

  • Double combining accents LaTeX -> Unicode (doable)
  • Double combining accents LaTeX -> HTML (doable)

Very low

  • Double combining accents Unicode -> LaTeX (hard and more rarely needed I would guess)
  • Double combining accents Unicode -> HTML (hard and more rarely needed I would guess)

@lenhard
Member

lenhard commented Jan 13, 2017

Single combining accents LaTeX -> Unicode seem to have experienced problems, see: #2458

#2464 should fix at least some of the issues that relate to '

@lenhard
Member

lenhard commented Feb 10, 2017

With the replacement of our internal conversion logic with latex2unicode, the combining accents should work now. Hence, I am closing this issue.

Feel free to reopen if problems with combining accents reappear.

@lenhard lenhard closed this as completed Feb 10, 2017
@lenhard
Member

lenhard commented Feb 10, 2017

And revisiting this issue, I see that it does not only concern the LaTeX to unicode conversion, but also the HTML to unicode conversion and the unicode to LaTeX conversion, so I guess I have to reopen. Things missing:

  • Double combining accents LaTeX -> HTML
  • Double combining accents Unicode -> LaTeX
  • Double combining accents Unicode -> HTML

@lenhard lenhard reopened this Feb 10, 2017
@lenhard
Member

lenhard commented Feb 10, 2017

Double combining accents are tested now: 9eef09c

@lenhard
Member

lenhard commented Jul 18, 2017

Some time has passed again and we have made more improvements in terms of unicode conversion.

Not much has happened in this issue, though. I'll close it now, because I think it makes more sense to open new issues when someone actually misses a specific conversion. We will probably never be complete with regard to conversion anyway, so we should just track the conversion problems that people actually face.

@lenhard lenhard closed this as completed Jul 18, 2017