Apostrophes wreak havoc on hyphenation #283
This is actually essentially the same bug as #265. The problem is about how text is being segmented; SILE isn't seeing "words" any more, only nodes after the text has been shaped by Harfbuzz and then formed into nodes by
Maybe we should pass through quotes just like we do with combining marks. I have a patch which would potentially do this, but I am worried that it would break other languages by allowing strange hyphenation points:
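As a rough illustration of that idea only (a Python sketch, not SILE's Lua code and not the patch mentioned above), word segmentation can treat an apostrophe as part of a word whenever it sits between two letters. The names `APOSTROPHES`, `WORD_RE` and `split_words` are hypothetical, and the sketch assumes precomposed (NFC) input:

```python
import re

# Assumption for this sketch: these two characters count as word-internal
# apostrophes. U+0027 APOSTROPHE and U+2019 RIGHT SINGLE QUOTATION MARK.
APOSTROPHES = "'\u2019"

# A "word" is a run of letters, optionally continued by an apostrophe that is
# immediately followed by more letters. [^\W\d_] matches any Unicode letter.
WORD_RE = re.compile(rf"[^\W\d_]+(?:[{APOSTROPHES}][^\W\d_]+)*")


def split_words(text):
    """Return the words in text, keeping letter-apostrophe-letter runs whole."""
    return WORD_RE.findall(text)


if __name__ == "__main__":
    print(split_words("Ankara'dan \u0130stanbul\u2019a gitti"))
```

With this rule a hyphenator would receive `Ankara'dan` as one unit rather than `Ankara` and `dan`. Quotation marks at the edges of a word are still dropped, which is where the worry about other languages comes in: any language that uses these characters differently would need its own rule.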
Can you push that to a test branch? I'd be interested in trying such a change. I have an idea we might need to look at some language specific tweaks in this department. Another day, another book, another issue that I think is related to this: my verse references are breaking on figure/en-dashes; e.g.:
I have a non-breaking space after the book name, but I would like to inhibit (or at least severely penalize) breaks before and after Unfortunately this isn't quite standard Turkish usage. According to the official Turkish Language Institute people, Turkish doesn't have a figure dash or en-dash at all. It has either a hyphen or a long (em) dash. However, using hyphens for everything -including insertions like this one- quickly gets messy, and many publishers use a figure dash as well. I'm one of those who think the figure dash adds something over reusing the glyph used for hyphenation, so I'd like to be able to define rules for its usage that stop it from being a line-breaking character. This seems to me to be about the same problem as I'm having with apostrophes, only in reverse.
I'm not going to push it to a test branch, because it will break English, but here's the patch:
Hey, that looks more like it! Is there a general way to control which side of the apostrophe to break on? I'd typically like to keep it on the trailing line before the hyphen instead of bumping it to the new line, but since there are break points on both sides of the apostrophe it does whatever is best for the word spacing, and sometimes they stay and sometimes they don't.
Travis doesn't (yet) have the Libertinus fork version of the font used in this test. Also, in order to trigger the widest range of failures possible, the original test used a very specific page and font size combination. This adaptation allows it to work on A5 paper by setting the corresponding font size that trips up the most break points.
I'm wondering, given that this seems to be a small number of words in Turkish, whether another possible solution is to use the Unicode WORD JOINER character?
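For reference, a minimal Python sketch of what that suggestion would mean for the source text: wrapping each word-internal apostrophe in U+2060 WORD JOINER. The helper name `glue_apostrophes` is hypothetical, and whether a given typesetter (SILE included) honours the joiner when choosing break points is an assumption this sketch does not check:

```python
import re

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER

# An apostrophe (U+0027 or U+2019) with a letter on both sides.
# [^\W\d_] matches any Unicode letter.
INNER_APOSTROPHE = re.compile(r"(?<=[^\W\d_])(['\u2019])(?=[^\W\d_])")


def glue_apostrophes(text):
    """Surround each word-internal apostrophe with WORD JOINER characters."""
    return INNER_APOSTROPHE.sub(
        lambda m: WORD_JOINER + m.group(1) + WORD_JOINER, text
    )


if __name__ == "__main__":
    # repr() makes the otherwise invisible joiners visible.
    print(repr(glue_apostrophes("Ankara'dan")))
```

Apostrophes that open or close a quotation are left alone because they do not have a letter on both sides.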
@r12a The list of possible words is by no means small (I could find hundreds of words to use as examples), and one of the principles I'm working from is that clean source text shouldn't need special treatment to be typeset. Obviously in the case of language exceptions having control characters like that might be acceptable, but we're not talking about exceptions here; this is the rule.
Ok, thanks for clarifying.
This is related to #265, but that case is somewhat specific to a language anomaly and needing a way to set up exceptions. But there is a more general problem.
Basically any time apostrophes get involved everything goes to pot. Interestingly, Unicode right single quotation marks fail in a different way than straight apostrophes. Here is an MWE:
This is especially puzzling to me because there is no shortage of hyphenation points in either of these words, nor are they different for the different quote styles: