Combining accents not working in Unicode converter (and more) #207

oscargus · 2015-10-04T10:00:59Z

Just a reminder for myself (or anyone else):

The Unicode to LaTeX converter does not handle combining accents, i.e., "a\u0301" = á is not converted into {\'{a}} as it should be. Interestingly enough, the Java compiler seems to render "\u0301a" as á as well, but "a\u0301e" = áe, so the combining accent should clearly be applied to the preceding character.
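A minimal sketch of the expected behavior, in Python rather than the converter's own code: each known combining accent is attached to the character before it. The table `COMBINING_TO_LATEX` and the function `unicode_to_latex` are hypothetical names for illustration, and the table covers only a handful of accents.

```python
import unicodedata

# Hypothetical mapping from combining marks to LaTeX accent commands.
# Only a few entries are shown; a real table would be much larger.
COMBINING_TO_LATEX = {
    "\u0300": "`",   # combining grave accent
    "\u0301": "'",   # combining acute accent
    "\u0302": "^",   # combining circumflex accent
    "\u0303": "~",   # combining tilde
    "\u0308": '"',   # combining diaeresis
}


def unicode_to_latex(text: str) -> str:
    """Attach each known combining accent to the letter before it."""
    # NFD splits precomposed letters (á) into base + combining mark,
    # so both spellings take the same path below.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        command = COMBINING_TO_LATEX.get(ch)
        # Only wrap when the previous item is a single plain letter.
        if command and out and len(out[-1]) == 1 and out[-1].isalpha():
            base = out.pop()
            out.append("{\\" + command + "{" + base + "}}")
        else:
            out.append(ch)
    # Recompose anything that was decomposed but not converted (e.g. ç).
    return unicodedata.normalize("NFC", "".join(out))
```

With this sketch, `unicode_to_latex("a\u0301e")` yields `{\'{a}}e`, and a precomposed á takes the same path because of the initial NFD step.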

Double combining accents are not supported in either the HTML or the Unicode to LaTeX conversion. While some of the codes are there, the behavior is not correct. For example, "b\u035Cc" (b͞c) should be converted to something clever, probably not {\textdoublemacron{b}}c, which is the current result from HTML.
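To illustrate why double accents need different handling, here is a small Python sketch that only detects them and reports the two characters each one spans. It relies on the Unicode canonical combining classes 233 (double below) and 234 (double above); the function name `find_double_accents` is made up for this example, and which LaTeX output to emit for a match is deliberately left open.

```python
import unicodedata

# Combining class 233 = "double below", 234 = "double above" in the Unicode
# character database; these marks sit between the two characters they span.
DOUBLE_CLASSES = {233, 234}


def find_double_accents(text: str):
    """Return (left, mark_name, right) for each double combining mark."""
    spans = []
    for i, ch in enumerate(text):
        if unicodedata.combining(ch) in DOUBLE_CLASSES:
            left = text[i - 1] if i > 0 else ""
            right = text[i + 1] if i + 1 < len(text) else ""
            spans.append((left, unicodedata.name(ch), right))
    return spans
```

For a string such as `"b\u035ec"` (using U+035E, the combining double macron), the sketch reports that the mark covers both `b` and `c`, which a converter that wraps only the preceding character cannot express.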

oscargus · 2015-10-07T07:01:53Z

How complete should this functionality really be? When I look around, I notice that there are huge conversion lists which could be implemented. On the other hand, how likely is it that e.g. ⬲ (HTML: &#x2B32;, Unicode: "\u2B32") shows up in an imported field?

Should we keep it relevant or complete?

oscargus · 2015-10-07T07:06:13Z

The old list was pretty much constructed by actively searching in IEEE Xplore for entries that were likely to end up with math symbols etc. That covered most of the HTML entries at least. I guess one can do something similar with Medline to cover even more symbols that might occur.

koppor · 2015-10-07T07:13:48Z

This seems somehow related to #161, which can now be closed, can't it? If I get you right, latex2utf8 and LaTeX::Encode now support fewer characters.

I would use those tools as a benchmark and not do more encoding.

Since biber is a perfect replacement for bibtex, I'd even suggest putting in as little effort as possible here and encouraging users to switch to biblatex. The idea of #161 was only to gain a lot with little effort. Now, it sounds like much effort for little gain 🙈

ThomasA · 2015-10-07T07:37:05Z

Careful with Biber... Biblatex is great, but some journals do not accept LaTeX submissions using Biblatex - only BibTeX. As far as I know, using \usepackage[utf8]{inputenc} and bibtex8 instead of bibtex can get you a fair bit of the way there.

koppor · 2015-10-07T07:44:22Z

See #160 😇. OK, we'll need the converter for BW. Pushing publishers is far from easy. See the thread at tetex-extra: llncs package, especially https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=31897#47.

ThomasA · 2015-10-07T08:10:26Z

As far as I know most, if not all, of IEEE's journals use the IEEEtran class and require the use of a BibTeX-compatible .bib-file or a finished thebibliography environment (which Biber/Biblatex cannot produce). IEEE do not actually use LaTeX to format the papers but have a proprietary system that converts the LaTeX to some XML format, so it would probably be very difficult to convince them to change that.

oscargus · 2015-10-07T12:11:31Z

I also publish in IEEE, so I also see a clear use case for having a converter. :-)

oscargus · 2016-02-16T08:38:20Z

Single combining accents will work for Unicode once #808 is merged.

Double combining accents still to be solved.

oscargus · 2016-07-24T20:34:30Z

And with #1581 single combining accents are working for LaTeX to Unicode.
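For the reverse direction, a rough Python sketch (not the code from #1581): the accent command is mapped back to its combining mark, and NFC normalization composes it with the base letter where a precomposed character exists. `LATEX_TO_COMBINING`, `ACCENT`, and `latex_to_unicode` are hypothetical names, and only the `{\'{a}}` and `\'{a}` forms are handled.

```python
import re
import unicodedata

# Hypothetical table mapping LaTeX accent commands to combining marks.
LATEX_TO_COMBINING = {
    "`": "\u0300",   # grave
    "'": "\u0301",   # acute
    "^": "\u0302",   # circumflex
    "~": "\u0303",   # tilde
    '"': "\u0308",   # diaeresis
}

# Matches \'{a} and {\'{a}}; the closing outer brace is only required
# when the opening outer brace was present (conditional group).
ACCENT = re.compile(r"""(\{)?\\([`'^~"])\{([A-Za-z])\}(?(1)\})""")


def latex_to_unicode(text: str) -> str:
    """Replace single-letter accent commands with accented characters."""
    def replace(match):
        base = match.group(3)
        mark = LATEX_TO_COMBINING[match.group(2)]
        # NFC composes base + mark when a precomposed form exists;
        # otherwise the combining sequence is kept as is.
        return unicodedata.normalize("NFC", base + mark)

    return ACCENT.sub(replace, text)
```

Letters without a precomposed form, such as q with an acute accent, simply stay as a base letter followed by the combining mark.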

lenhard · 2016-07-26T07:51:43Z

@oscargus: So what is missing to mark this issue as closed?

oscargus · 2016-07-26T08:21:56Z

In rough order of priority:

High

Single combining accents LaTeX -> HTML (doable)

Medium

Double combining accents LaTeX -> Unicode (doable)
Double combining accents LaTeX -> HTML (doable)

Very low

Double combining accents Unicode -> LaTeX (hard, and I would guess more rarely needed)
Double combining accents Unicode -> HTML (hard, and I would guess more rarely needed)

lenhard · 2017-01-13T16:44:05Z

Single combining accents LaTeX -> Unicode seem to have experienced problems, see: #2458

#2464 should fix at least some of the issues that relate to '

lenhard · 2017-02-10T09:10:26Z

With the replacement of our internal conversion logic with latex2unicode, the combining accents should work now. Hence, I am closing this issue.

Feel free to reopen if problems with combining accents reappear.

lenhard · 2017-02-10T10:18:07Z

And revisiting this issue, I see that it does not only concern the LaTeX to unicode conversion, but also the HTML to unicode conversion and the unicode to LaTeX conversion, so I guess I have to reopen. Things missing:

Double combining accents LaTeX -> HTML
Double combining accents Unicode -> LaTeX
Double combining accents Unicode -> HTML

lenhard · 2017-02-10T10:52:04Z

Double combining accents are tested now: 9eef09c

lenhard · 2017-07-18T14:45:44Z

Some time has passed again and we have made more improvements in terms of Unicode conversion.

Not much has happened in this issue, though. I'll close it now, because I think it makes more sense to open new issues when someone actually misses a specific conversion. We will probably never be complete with regard to conversion anyway, so we should just track the conversion problems that people actually face.

lenhard added the type: enhancement label Jul 29, 2016

Siedlerchr mentioned this issue Jan 13, 2017: Switch to latex2unicode lib instead of own handling #2465 (Closed)

lenhard mentioned this issue Feb 9, 2017: Switch to Latex2unicode #2532 (Merged, 3 tasks)

lenhard closed this as completed Feb 10, 2017

lenhard reopened this Feb 10, 2017

lenhard closed this as completed Jul 18, 2017
