When attempting to produce PDF reports in Thai, in which spaces are used as punctuation and not as word separators (word boundaries are not visible; they can be represented by zero-width spaces), line breaks do not appear to function as intended. Instead of occurring between Thai words, they often occur at a space belonging to an incidental English word or a phrase that contains spaces.
So, while Thai text should wrap using the default wrapmode of WORD, it's not happening as intended and results in incorrectly broken text:
Error details
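I think this occurs because the line_break.py code specifically looks for spaces (' ') (fpdf2/fpdf/line_break.py, line 20 in f0bd468) when defining space_break_hint (lines 452 to 460 in f0bd468),
and consequently, when lines are checked for appropriate terminating characters, the first check is against a space (line 671 in f0bd468). This tests for ' ', but no wrap is made for zero-width spaces: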
if character == SPACE:  # must come first, always drop a current space.
Perhaps a simple solution would be to test for inclusion in a list to see if character is in the list of space or zero-width space:
if character in [SPACE, ZWS]:
Then I think either scenario should result in a wrap?
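To make the idea concrete, here is a minimal, self-contained sketch of a greedy wrapper that treats both characters as break opportunities. It is not fpdf2's implementation: the `wrap` function, the `BREAKING_CHARS` tuple and the character-count width measure are assumptions made purely for illustration (a real renderer measures glyph widths).

```python
SPACE = " "
ZWS = "\u200b"  # zero-width space (U+200B)
BREAKING_CHARS = (SPACE, ZWS)  # hypothetical name, not taken from fpdf2


def wrap(text, max_chars, breaking_chars=BREAKING_CHARS):
    """Greedy wrap: break at the last breaking character seen so far.

    Width is approximated by counting characters other than ZWS;
    this is an illustration-only stand-in for real glyph metrics.
    """
    lines = []
    current = ""      # text accumulated for the current line
    visible = 0       # approximate width of `current` (ZWS counts as zero)
    last_break = -1   # index in `current` of the latest breaking character
    for character in text:
        if character in breaking_chars:
            last_break = len(current)
        current += character
        if character != ZWS:
            visible += 1
        if visible > max_chars and last_break >= 0:
            # Emit the text before the breaking character and drop the
            # breaking character itself.
            lines.append(current[:last_break].replace(ZWS, ""))
            current = current[last_break + 1:]
            visible = sum(1 for c in current if c != ZWS)
            last_break = max(current.rfind(b) for b in breaking_chars)
    if current:
        lines.append(current.replace(ZWS, ""))
    return lines


sample = "ab\u200bcd\u200bef"
print(wrap(sample, 4))                            # breaks at a zero-width space
print(wrap(sample, 4, breaking_chars=(SPACE,)))   # no break opportunity found
```

With only SPACE in the list, the sample never finds a break opportunity and overflows as a single line, which mirrors the behaviour described above; adding ZWS lets the line break between the invisible word boundaries.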
Minimal code
from fpdf import FPDF

pdf = FPDF()
# using font from FPDF font pack https://github.com/reingart/pyfpdf/releases/download/binary/fpdf_unicode_font_pack.zip
font_path = 'configuration/fonts/fpdf_unicode_font_pack/Waree.ttf'
pdf.add_font(fname=font_path)
pdf.set_font('Waree', size=12)
pdf.add_page()
pdf.write(8, u"Thai (ideally wouldn't wrap after the space after 1000'): นโยบายสาธารณะมีความสำคัญต่อการสนับสนุนการออกแบบและการสร้างชุมชนและเมืองสุขภาพดีและยั่งยืน รายการตรวจสอบนโยบายความท้าทาย 1,000 เมืองสำหรับใช้เพื่อประเมินการมีอยู่และคุณภาพของนโยบายที่สอดคล้องกับหลักฐานและหลักการสำหรับเมืองที่มีสุขภาพดีและยั่งยืน")
pdf.output("unicode.pdf")
Here is the example output generated from the above code that illustrates the issue:
Just so it's clearer, the Thai text there does contain word wrap indicators in the form of zero-width spaces (U+200B). Ideally the fpdf2 output would be more like what is displayed in the browser. You can view the above text with hyphens instead to see that there would be other wrapping opportunities.
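A quick way to check a string for these invisible break opportunities is to count the U+200B characters and substitute a visible marker for them. The helper names and the short two-word sample below are illustrative assumptions, not part of the report code:

```python
ZWS = "\u200b"  # zero-width space (U+200B)


def count_break_opportunities(text):
    """Return how many zero-width spaces the text contains."""
    return text.count(ZWS)


def show_break_opportunities(text, marker="-"):
    """Replace each zero-width space with a visible marker."""
    return text.replace(ZWS, marker)


# Illustrative sample: two Thai words joined by an explicit zero-width space.
sample = "นโยบาย\u200bสาธารณะ"
print(count_break_opportunities(sample))
print(show_break_opportunities(sample))
```

If the count is zero for a given string, the zero-width spaces were probably lost somewhere upstream (for example when copying the text), and no space-based wrapping change could help.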
Caveat
I cannot read Thai myself, but am working with others who do, and they advised me of this problem. I understand that others have used fpdf2 with Thai (as indicated in issues I've searched, and the documentation). Perhaps there are other ways to get correct word wrapping working for Thai? I couldn't figure it out or find a solution in the documentation or issues, so I thought I'd check in first. If others think it's worth pursuing, I could attempt to make a code edit for this.
carlhiggs changed the title from "Word wrapping for Thai text (zero-width space) may not work as intended // wrapmode==WORD should also use zero-width space" to "Word wrapping for Thai text (zero-width space) may not work as intended // wrapmode==WORD should also use zero-width space?" on Jun 3, 2024
Ideally we should implement the Unicode line breaking algorithm in fpdf2 to produce results similar to document editors and browsers.
Sounds great @andersonhc; that's probably beyond what I'd have capacity to assist with right now, but after writing the above I made a quick sketch and tested whether my suggestion would work for now; I believe it does. I went ahead and made a pull request. The Unicode line breaking algorithm sounds like an ideal solution, but if this pull request could work for now, at least for my purposes, I think it would address the issue.
Hope the pull request is useful, at least as a short/medium-term solution.