Keep unknown commands #9

Closed
stefan-kolb opened this issue Apr 24, 2018 · 6 comments · Fixed by #13
Comments

stefan-kolb commented Apr 24, 2018

Hi @tomtung
You recently introduced the feature that unknown commands are preserved.
However, arguments in braces are pulled out of their braces, and empty braces are stripped entirely.
We discussed this at the JabRef dev call and came to the conclusion that it would be beneficial if the commands are just kept as they are if they are unknown.
I added a few tests that show the intended behavior.
Could you help us implement this in the Scala code?
Or in general, do you agree with our thoughts?

Best regards,

Stefan and the JabRef team

  test("Unknown commands") {
    LaTeX2Unicode.convert("\\this \\is \\alpha test") shouldBe "\\this \\is α test"
    LaTeX2Unicode.convert("\\unknown command") shouldBe "\\unknown command"
    LaTeX2Unicode.convert("\\unknown{} empty params") shouldBe "\\unknown{} empty params"
    LaTeX2Unicode.convert("\\unknown{cmd}") shouldBe "\\unknown{cmd}"
  }

tomtung (Owner) commented May 29, 2018

Parsing-wise this is a little tricky, because braces are generally not kept in the output unless they are escaped (the same goes for consecutive spaces). Creating a special case here can be quite messy.

The problem is, even if we add special cases to preserve braces, the arity of an unknown command is, by definition, unknown, so we still don't know how many of the brace groups that follow should be kept. E.g. the output for \this{is}{a}{test} could be {\this}isatest or \this{is}atest or \this{is}{a}test or \this{is}{a}{test}, and there's no good way to know which is "correct".
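
The four candidates above can be enumerated mechanically. The following is only a sketch to make the ambiguity concrete: `ArityAmbiguity`, `render` and `candidates` are hypothetical names, not part of latex2unicode, and the rule "arity 0 wraps the command itself in braces" is taken from the examples in this comment.

```scala
object ArityAmbiguity {
  // Hypothetical helper: render a command under the assumption that its
  // first `arity` brace groups are arguments and the rest is plain text.
  def render(name: String, groups: List[String], arity: Int): String = {
    val (args, rest) = groups.splitAt(arity)
    // With arity 0 the command is wrapped in braces so it does not
    // fuse with the text that follows it.
    val head = if (arity == 0) "{\\" + name + "}" else "\\" + name
    head + args.map(a => "{" + a + "}").mkString + rest.mkString
  }

  // One candidate output per possible arity, from 0 up to the number of groups.
  def candidates(name: String, groups: List[String]): List[String] =
    (0 to groups.length).toList.map(render(name, groups, _))
}
```

Calling `ArityAmbiguity.candidates("this", List("is", "a", "test"))` yields one output per assumed arity, and nothing in the input says which one the author meant.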

From my understanding, your problem is with the output when an unknown command is not followed by a space. E.g. \test{abc} becomes \testabc, as if \testabc were a single command. What about simply adding braces around the unknown command itself? In that case, the behavior would be

  test("Unknown commands") {
    LaTeX2Unicode.convert("\\this \\is \\alpha test") shouldBe "{\\this} {\\is} α test"
    LaTeX2Unicode.convert("\\unknown command") shouldBe "{\\unknown} command"
    LaTeX2Unicode.convert("\\unknown{} empty params") shouldBe "{\\unknown} empty params"
    LaTeX2Unicode.convert("\\unknown{cmd}") shouldBe "{\\unknown}cmd"
  }
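
A toy version of this "wrap the unknown command" rule, assuming a two-entry symbol table and a regex tokenizer instead of the library's real parser. `WrapUnknown`, `known` and `token` are illustrative names only, and escaped braces such as \{ are not handled.

```scala
import scala.util.matching.Regex

object WrapUnknown {
  // Toy symbol table (assumption); the real library knows far more commands.
  val known: Map[String, String] = Map("alpha" -> "α", "beta" -> "β")

  // Matches either a control word (name captured in group 1) or a single brace.
  private val token: Regex = """\\([A-Za-z]+)|[{}]""".r

  def convert(input: String): String =
    token.replaceAllIn(input, m => {
      val out = Option(m.group(1)) match {
        // Known command: emit its symbol. Unknown: wrap the command in braces.
        case Some(name) => known.getOrElse(name, "{\\" + name + "}")
        // A brace from the input: dropped, as unescaped braces usually are.
        case None => ""
      }
      // Backslashes and dollar signs are special in replacement strings.
      Regex.quoteReplacement(out)
    })
}
```

Under these assumptions the sketch reproduces the four expectations in the test above, for example `WrapUnknown.convert("\\unknown{cmd}")` gives `{\unknown}cmd`.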

koppor commented Jun 1, 2018

I think we can go with the assumption that \this{is}{a}{test} denotes a command with three parameters. Would that be possible? All other interpretations feel strange and are IMHO not common in LaTeX usage.
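
A rough sketch of that assumption, in which every brace group directly attached to an unknown command is treated as one of its arguments and kept verbatim. `KeepAllGroups` and its one-entry symbol table are hypothetical, nested braces are not supported, and this is not how latex2unicode parses input.

```scala
import scala.util.matching.Regex

object KeepAllGroups {
  // Toy symbol table (assumption).
  val known: Map[String, String] = Map("alpha" -> "α")

  // A control word followed by zero or more directly attached,
  // non-nested brace groups (all groups captured together in group 2).
  private val commandWithGroups: Regex =
    """\\([A-Za-z]+)((?:\{[^{}]*\})*)""".r

  def convert(input: String): String =
    commandWithGroups.replaceAllIn(input, m => {
      val out = known.get(m.group(1)) match {
        // Known command: emit the symbol and unwrap any attached groups.
        case Some(symbol) =>
          symbol + m.group(2).filterNot(c => c == '{' || c == '}')
        // Unknown command: keep the command and all its groups as written.
        case None => m.matched
      }
      Regex.quoteReplacement(out)
    })
}
```

With this rule `KeepAllGroups.convert("\\this{is}{a}{test}")` returns the input unchanged, which matches the tests proposed in the opening comment.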

tomtung (Owner) commented Jun 2, 2018

Not sure if I can agree.. For example, if \${\bf 1MM} should translate to $𝟏𝐌𝐌 (note the unicode boldface), why should \unknown_cmd{\bf 1MM} be translated to \unknown_cmd{𝟏𝐌𝐌} instead of {\unknown_cmd}𝟏𝐌𝐌?

Generally, I think it's really tricky to come up with well-defined rules that can guess the arity of an unknown command while sensibly covering all these edge cases. Even with such rules, context-dependent arity-guessing would add a lot of parsing complexity, likely with a performance penalty, too.

koppor commented Jun 3, 2018

I agree that it is very tricky. In our use case, we just need symbol replacement. Thus, all TeX commands that do not directly produce symbols can be left untranslated. This especially includes superscript, italics and boldface. Converting superscripts in particular causes huge issues (see JabRef/jabref#3644 and JabRef/jabref#2596).

Is it possible to introduce a flag (second conversion method?) to replace symbols only? That would help us very much and leave the other features intact without any issues with unknown commands.

tomtung (Owner) commented Jun 3, 2018

I think that's a separate issue. From my understanding, the issue here is that LaTeX generally doesn't keep unescaped brackets {}, so it's very tricky to decide when to keep them for unknown commands without knowing the arity of the command.

Alternatively, maybe we can disregard the standard LaTeX behavior and have an option of always keeping the brackets in outputs?
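
What such an option could look like, purely as a sketch: the `keepBraces` parameter and the `BraceOption` object are hypothetical and do not describe the API that was eventually merged.

```scala
import scala.util.matching.Regex

object BraceOption {
  // Toy symbol table (assumption).
  val known: Map[String, String] = Map("alpha" -> "α")

  // Matches either a control word (name captured in group 1) or a single brace.
  private val token: Regex = """\\([A-Za-z]+)|[{}]""".r

  // keepBraces = false drops unescaped braces; true copies them through.
  def convert(input: String, keepBraces: Boolean): String =
    token.replaceAllIn(input, m => {
      val out = Option(m.group(1)) match {
        // Known commands become symbols; unknown ones are left as written.
        case Some(name) => known.getOrElse(name, m.matched)
        // A brace from the input: kept only when the option is on.
        case None => if (keepBraces) m.matched else ""
      }
      Regex.quoteReplacement(out)
    })
}
```

With `keepBraces = false` the sketch shows the original problem, since `\unknown{cmd}` collapses to `\unknowncmd`. With `keepBraces = true` the same input comes back unchanged, and no arity guessing is needed.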

koppor commented Jun 3, 2018

+1 for the option to always keep the brackets in outputs!

tobiasdiez added a commit to tobiasdiez/latex2unicode that referenced this issue May 17, 2019
Preserve arguments for unknown commands
@tobiasdiez tobiasdiez mentioned this issue May 17, 2019
tomtung pushed a commit that referenced this issue Jun 3, 2019
Preserve arguments for unknown commands