No hyphenation for lang: es #8684

Closed
memeplex opened this issue Feb 12, 2024 · 14 comments
Labels
bug Something isn't working

@memeplex

Bug description

When using the frontmatter:

---
lang: es
---

I expect a PDF rendered using LaTeX to have proper Spanish hyphenation (or any hyphenation whatsoever), as described in https://quarto.org/docs/authoring/language.html. That's not the case.

The documentation states:

Document language plays a role in Pandoc’s processing of most formats, and controls hyphenation in PDF output when using LaTeX (through babel and polyglossia) or ConTeXt.

Quarto’s built-in PDF compilation engine handles running LaTeX multiple times to resolve index and bibliography entries, and also performs automatic LaTeX package installation.

From this I infer that the proper behavior would be to install the required packages automatically when lang: es is set, or to ship them preinstalled in TinyTeX. Instead, I had to manually install hyphen-spanish using tlmgr.

Steps to reproduce

prueba.qmd

---
lang: es
---

automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente automáticamente
quarto uninstall tinytex
quarto install tinytex
quarto render prueba.qmd -t pdf
[screenshot of the rendered PDF]
tlmgr update --self
tlmgr install hyphen-spanish
quarto render prueba.qmd -t pdf
[screenshot of the rendered PDF]

Expected behavior

The documented behavior: everything is properly autoinstalled so that hyphenation works for the document's language.

Actual behavior

No hyphenation for Spanish.

Your environment

macOS Sonoma

Quarto check output

Quarto 1.4.549
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.11: OK
      Dart Sass version 1.69.5: OK
      Deno version 1.37.2: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.4.549
      Path: /Users/carlos/.venvs/base/lib/python3.11/site-packages/quarto_cli/bin

[✓] Checking tools....................OK
      TinyTeX: v2024.02
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /Users/carlos/Library/TinyTeX/bin/universal-darwin
      Version: 2023

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.11.7
      Path: /Users/carlos/.venvs/base/bin/python3
      Jupyter: 5.7.1
      Kernels: python3

[✓] Checking Jupyter engine render....OK

[✓] Checking R installation...........OK
      Version: 4.3.2
      Path: /opt/homebrew/Cellar/r/4.3.2/lib/R
      LibPaths:
        - /Users/carlos/.rlibs/base
        - /Users/carlos/Documents/Util
        - /opt/homebrew/lib/R/4.3/site-library
        - /opt/homebrew/Cellar/r/4.3.2/lib/R/library
      knitr: 1.45
      rmarkdown: 2.25

[✓] Checking Knitr engine render......OK
@memeplex memeplex added the bug Something isn't working label Feb 12, 2024
@memeplex
Author

Notice that some special handling was added to tinytex for cases like this: rstudio/tinytex@0f20074

Perhaps it wasn't ported to Quarto's engine.

@cscheid cscheid added this to the Future milestone Feb 12, 2024
@dragonstyle
Collaborator

dragonstyle commented Feb 12, 2024

We do support parsing of Babel warnings, but none are emitted in the example that you provided, so there isn't really a reasonable way for us to automatically detect this if the LaTeX engine isn't notifying us of any issue.

https://github.com/quarto-dev/quarto-cli/blob/078cac0de0a42bca9cfa788b17f160128193286c/src/command/render/latexmk/pdf.ts#L181C38-L181C65

@memeplex
Author

Could you add some note in the documentation, please? The problem I see here is that all is documented as if it worked out of the box but when it doesn't it's hard to figure out why and what to do, because of the additional layers involved. It's even hard to figure out where the latex installation is located.

Maybe some note after:

Document language plays a role in Pandoc’s processing of most formats, and controls hyphenation in PDF output when using LaTeX

explaining that sometimes it's not possible to automatically install missing hyphenation rules and if you see that hyphenation is not working for language L, then you can find where tinytex is installed with quarto tools info tinytex and run tlmgr install hyphen-L.

Moreover, basictex, which is about the same size as tinytex, doesn't have this issue. Perhaps it's possible to include more language support out of the box without significantly increasing the size of the installation. Currently a simple \usepackage[spanish]{babel} fails, even with hyphen-spanish installed.

Honestly, Spanish hyphenation working OOB (or at least with clear instructions about how to enable it) seems quite a basic requirement to me to just close this like that.

@cscheid
Collaborator

cscheid commented Feb 12, 2024

The problem I see here is that all is documented as if it worked out of the box

Let me jump right here. No, it's not documented as such. Since you have been creating a relatively large number of issues here, let me ask you to read documentation carefully. Our documentation states:

"The following languages currently have full translations available:"

It doesn't say anything about hyphenation support for those languages.

Moreover, basictex, which is about the same size as tinytex, doesn't have this issue

Honestly, Spanish hyphenation working OOB (or at least with clear instructions about how to enable it) seems quite a basic requirement to me to just close this like that.

I'm a native speaker of Portuguese, so I understand the desire and value of internationalization. But I think Quarto does an excellent job given our time constraints. If such fine-grained control over your latex output is the utmost priority for you, and you consider basictex to be sufficiently superior, you might want to consider using a system that supports basictex out of the box, and maybe Quarto isn't the system for you right now.

@cscheid
Collaborator

cscheid commented Feb 12, 2024

Is that a problem?

It's not a problem. But because we look at every single new issue that's filed, and you're filing more issues than the median user, I'm asking you to also be more careful than the median user. In a real sense, it causes a degradation of our ability to attend to the other issues in the repository.

@memeplex
Author

Since you have been creating a relatively large number of issues here

Is that a problem? I know one of them was a mistake and I already apologized for it, but in general I'm careful and pay attention to the details. But if you feel I'm bothering you, I'll just stop reporting, no problem.

let me ask you to read documentation carefully

I did. Obviously I might have missed many things because it's a long document. Now, please let me quote another part of the documentation:

Document language plays a role in Pandoc’s processing of most formats, and controls hyphenation in PDF output when using LaTeX

For example, this document specifies the use of French:

---
title: "My Document"
lang: fr    
---

This will result in the use of French translations as well as the application of other language specific rules to document processing.

I might be wrong, but it's hard for me not to read there that hyphenation is intended to work OOB.

I have analyzed the logs with and without hyphen-spanish installed and there are some differences that seem relevant:

❯ diff test.log test-no-hyphen.log 
1c1
< This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) (preloaded format=pdflatex 2024.2.12)  12 FEB 2024 13:18
---
> This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) (preloaded format=pdflatex 2024.2.12)  12 FEB 2024 13:16
411c411
< \l@unhyphenated=\language7
---
> \l@unhyphenated=\language6
419c419
< \l@nil=\language8
---
> \l@nil=\language7
424a425,426
> Package babel Info: Hyphen rules for 'spanish' set to \l@nil
> (babel)             (\language7). Reported on input line 128.
602,603c604,605
<  18504 strings out of 476150
<  326015 string characters out of 5793133
---
>  18502 strings out of 476162
>  326018 string characters out of 5793733
606c608
<  562543 words of font info for 44 fonts, out of 8000000 for 9000
---
>  562294 words of font info for 41 fonts, out of 8000000 for 9000
610c612
< Output written on test.pdf (1 page, 26218 bytes).
---
> Output written on test.pdf (1 page, 26113 bytes).

@memeplex
Author

Just in case it's not clear, this is part of the log only when hyphen-spanish is not installed:

> Package babel Info: Hyphen rules for 'spanish' set to \l@nil
> (babel)             (\language7). Reported on input line 128.

@memeplex
Author

memeplex commented Feb 12, 2024

Looking at the logs, there are two issues with

export function findMissingHyphenationFiles(logText: string) {
:

  1. babelWarningRegex = /^Package babel Warning:/m should match Info as well, i.e. (Warning|Info) instead of just Warning.
  2. languageRegex = /^\(babel\).* language (\S+).*$/m, but the only matching lines are
    (babel)             option. I'll load 'nil'. Reported on input line 4342.
    (babel)             from babel-es.ini. Reported on input line 128.
    (babel)             (\language7). Reported on input line 128.
    
    so you will get language7 instead of spanish

They probably changed the way it's reported, or perhaps the tex source you were generating was different at the time. But it's now possible to identify the message and parse the language from the same line Package babel Info: Hyphen rules for 'spanish' set to \l@nil.
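
The single-line detection described above could be sketched roughly as follows. This is only an illustration and not the code in quarto-cli: the function name `findNilHyphenLanguages` and the regular expression are assumptions made for this example. Only the log message format is taken from the logs quoted in this thread.

```typescript
// Hypothetical helper (not the quarto-cli implementation): collect the
// languages whose hyphenation rules babel reports as set to \l@nil.
function findNilHyphenLanguages(logText: string): string[] {
  // Accept both Info and Warning, and read the language from the same line.
  const regex =
    /^Package babel (?:Info|Warning): Hyphen rules for '([^']+)' set to \\l@nil/gm;
  const languages: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = regex.exec(logText)) !== null) {
    const lang = match[1].toLowerCase();
    // Report each language only once, even if the log repeats the message.
    if (languages.indexOf(lang) === -1) {
      languages.push(lang);
    }
  }
  return languages;
}

// Log excerpt as quoted in this issue.
const sampleLog = [
  "Package babel Info: Hyphen rules for 'spanish' set to \\l@nil",
  "(babel)             (\\language7). Reported on input line 128.",
].join("\n");

console.log(findNilHyphenLanguages(sampleLog));
```

For the excerpt above the function returns a one-element array containing `spanish`, which a caller could then map to a package name such as hyphen-spanish.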

@mcanouil
Collaborator

Did you look at and try what has been discussed in the following discussion?

@memeplex
Author

Thanks for the link. Disregarding subtleties related to German rules, the log they show there is similar to mine:

Package babel Info: Importing data for ngerman
(babel)             from babel-de.ini. Reported on input line 131.
Package babel Info: Hyphen rules for 'ngerman' set to \l@nil
(babel)             (\language7). Reported on input line 131.

\l@nil is an empty element, so I believe no hyphenation rules are found.

Yes, I did try what they suggest: I installed hyphen-spanish, though I did so because it had already been suggested in rstudio/tinytex#97.

But this issue is about the automatic handling of missing hyphenation packages. I believe there is room for improvement now that we realize there is a message in the log that's easy to match and parse, but that is not being matched by findMissingHyphenationFiles, probably because of a change in the way it is reported.

@dragonstyle dragonstyle reopened this Feb 12, 2024
@dragonstyle
Collaborator

I agree with the assessment that this is a condition we could and should detect - thanks for the additional debugging. I'll get a fix on the way.

@memeplex
Author

Thanks!

@mcanouil mcanouil modified the milestones: Future, v1.5 Feb 12, 2024
@memeplex
Author

@dragonstyle looking at your patch, wouldn't it be preferable to define filterLang outside of both parsing alternatives, and run hyphen-${filterLang(language.toLowerCase())} at the end no matter which one matched the log? I'm not sure whether it applies to the warning case, but nevertheless there is no hyphen-ngerman package, so it seems innocuous.
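
A rough sketch of that refactoring, under stated assumptions: `filterLang` is named in the patch, but its body is not shown in this thread, so the mapping below (ngerman to german) is illustrative only. `hyphenPackageFor` is a hypothetical name introduced for this example.

```typescript
// Hypothetical sketch: the real body of filterLang is not shown in this thread.
// Assumption: babel's "ngerman" is covered by the "german" hyphenation package.
function filterLang(lang: string): string {
  return lang === "ngerman" ? "german" : lang;
}

// Build the package name in one place, regardless of which log pattern
// (Warning or Info) produced the language name.
function hyphenPackageFor(language: string): string {
  return `hyphen-${filterLang(language.toLowerCase())}`;
}

console.log(hyphenPackageFor("ngerman"));
console.log(hyphenPackageFor("Spanish"));
```

Applying the filter once at the end keeps both parsing branches free of package-naming details, and languages that need no special handling pass through unchanged.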

@dragonstyle
Collaborator

Indeed - I'll tidy up.
