Drop characters when hyphenating #355
Comments
I'll need to come back to this, but I think the answer will be to replace the apostrophe with a discretionary.
At what stage in the processing would you envision this being implemented?
If I replace all the apostrophes in my input text with discretionaries, then none of the text that gets run through pushBack() has any apostrophes at all.
The Hyphen hyphenation library supports what it calls “non-standard hyphenation”, where changes are applied to the text when it is hyphenated. It might be what you are looking for. This was requested in #277, but SILE will either need to use Hyphen instead of its own code, or support the enhanced hyphenation patterns.
Small poke. I've got another book going to press next week and it's turning up a rather large number of these errors (hyphenation at apostrophes not removing the apostrophe). If you've had any ideas about where in the process this should be implemented, I'd be all up for trying it in the next few days.
So, this works:

\begin[papersize=a6]{document}
\language[main=tr]
\font[size=30pt]
Müjdesi\discretionary[prebreak="-",replacement="'"]nin
Müjdesi\discretionary[prebreak="-",replacement="'"]nin
\end{document}

At the moment we don't have a way for hyphenation dictionaries to do that clever stuff. If it's a small number of obvious cases, I would suggest either using an input filter or defining a command for them. The bigger fix is obviously to support enhanced hyphenation patterns, as Khaled suggests. I will try to implement that over the next day or two.
To be removed when [upstream issue][1] is fixed. [1]: sile-typesetter/sile#355
I don't know what to say. Either my testing two years ago was flawed (without an MWE that's entirely possible) or something else has changed in SILE since then. Doing it by hand or for specific words would be a nightmare given the hundreds of words involved. I was able to get my current book looking better by preprocessing the Markdown source. This is a lot easier than trying to set it up in SILE using an input filter, simply by virtue of brute-force access to the raw text.
The substitution skips all headings, matches apostrophes that are both preceded and followed by a letter, and replaces them with an inline SILE command (apostrophe hack) defined as follows:

SILE.registerCommand("ah", function ()
  SILE.call("discretionary", { prebreak = "-", replacement = "’" })
end)

That will get me by for this book (and the output looks a lot better), but it also has some downsides. For whatever reason it completely changes the outcome of the line breaker, even in cases where apostrophes do not fall at a possible break point. Lines that had a word with an apostrophe even in the middle of the line are being broken at different points, and more emergency stretching is involved. I suspect the usual penalty weight imposed by hyphens is causing the math to work out differently, but I'm not sure exactly how. The change is certainly not for the better, so a proper fix to the hyphenation system to understand more advanced patterns is certainly still in order. Thanks for the input.
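The preprocessing step described above can be sketched roughly as follows. This is not the original script (L28 implies it was a Perl regexp, and it does not appear in this thread). It is a minimal Python sketch that assumes ATX-style `#` headings and assumes the hack is written into the source as `\ah{}`. The function name `add_apostrophe_hack` is hypothetical.

```python
import re

# An apostrophe (straight or typographic) with a letter on both sides.
# [^\W\d_] matches any Unicode letter in Python's re module.
INTRA_WORD_APOSTROPHE = re.compile(r"(?<=[^\W\d_])['’](?=[^\W\d_])")


def add_apostrophe_hack(markdown: str, command: str = r"\ah{}") -> str:
    """Replace intra-word apostrophes with an inline command, skipping headings.

    Assumption: headings are lines starting with '#', and `command` is how
    the inline SILE command is spelled in the source. Both are illustrative.
    """
    out = []
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith("#"):
            # Leave headings untouched.
            out.append(line)
        else:
            # A function is used as the replacement so that the backslash in
            # `command` is not interpreted as a regex escape sequence.
            out.append(INTRA_WORD_APOSTROPHE.sub(lambda m: command, line))
    return "".join(out)
```

Quotation marks at word boundaries are left alone because they lack a letter on one side. Note that this still inserts the command at every intra-word apostrophe, whether or not a break is plausible there.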
Well, if that’s not a hyphenation point, don’t put a discretionary there! Maybe your Perl regexp needs adjusting? I am halfway through implementing libhyphen support, but since I can’t find extended hyphenation patterns for Turkish, I am not sure how useful it will actually be for you...
Adding the hack isn't the problem. If I don't do anything at all, SILE makes the same mistake, only worse, because it also doesn't make the replacement. In other news, the workaround I've been using for years no longer works in v0.14+. I'm not sure why yet, but introducing manual discretionary nodes now breaks alignment completely.
This is not going to be easy to solve. I started tinkering with the hyphenator and got something working, only to run into another problem more visible than the one I was trying to solve. If you set up correct patterns for hyphenation around intra-word
That single letter should not have been allowed to break because it should have been considered the end of a word. My hack actually avoided this problem because it dropped a command in between two words so the tokenizer always treated them separately. Bah!
So the feature added at my request in #265 to set up custom hyphenation patterns is great for words that are exceptions to the normal rules, but I'm finding it leaves me hanging when it comes to the actual rules.
I've recently discovered that there is more than one style guide for how to hyphenate words with internal apostrophes in Turkish. My personal preference is to place the hyphenation points after the apostrophe, but this is not what all publishing houses want to see. Many of them require the apostrophe to be dropped in the event that the word gets hyphenated at that point. For example
might normally hyphenate as
However, when it is used as a proper noun, the suffixes (after the third person possessive) are set apart with an apostrophe:
But if a hyphenation point at that position is actually used, the apostrophe should go away.
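To make the requested behaviour concrete, here is a small Python sketch of the two renderings of a word with an internal apostrophe. The function name `render_word` and its parameters are hypothetical. `prebreak` and `replacement` simply mirror the options of the `discretionary` command used elsewhere in this thread, and the sample word is the one from that example.

```python
def render_word(stem, suffix, broken, prebreak="-", replacement="'"):
    """Return the pieces of a stem + apostrophe + suffix word.

    If the line breaks at the apostrophe, the apostrophe is dropped and the
    stem ends with `prebreak`. Otherwise the word is kept whole, with
    `replacement` (the apostrophe) between stem and suffix.
    """
    if broken:
        # The first piece ends the line, the second starts the next one.
        return [stem + prebreak, suffix]
    return [stem + replacement + suffix]


print(render_word("Müjdesi", "nin", broken=False))
print(render_word("Müjdesi", "nin", broken=True))
```

This only illustrates the desired output. It says nothing about how the hyphenator or line breaker would decide that the break is taken.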
The problem here is twofold.