The PDF file shows #### for global character #174

Closed
AbnerSui opened this issue Feb 8, 2018 · 10 comments

Comments

@AbnerSui

AbnerSui commented Feb 8, 2018

Hi,
I tried the openhtmltopdf sandbox, but it renders global characters as #.

@danfickle
Owner

Hi @AbnerSui

The sandbox doesn’t include CJK fonts, so you’ll have to set up a project on your computer. #129 outlines how to use CJK fonts.

@vipcxj
Contributor

vipcxj commented Feb 13, 2018

The solution in the tutorial works very well. Perhaps this works too:

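 // Needs java.awt.Font, java.awt.GraphicsEnvironment and java.io.File;
 // Font.createFont throws FontFormatException and IOException.
 GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
 ge.registerFont(Font.createFont(Font.TRUETYPE_FONT, new File("A.ttf")));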

I haven't tested it yet.

@AbnerSui
Author

AbnerSui commented Feb 22, 2018

@danfickle, thanks for the reply. I have two questions:

  1. Global characters such as 中文字符 ("Chinese characters") aren't tied to any particular font, so why should I use a CJK font rather than some other font?
  2. If multiple fonts are used in the HTML, do I need to add all of them to the PdfRendererBuilder?

@AbnerSui
Author

Hi @danfickle, sorry to bother you, but I need to support multiple languages, and a CJK font alone doesn't seem to be enough. May I ask when you plan to support global characters? I can accept losing the font style, but I can't accept # showing up. Thank you so much for building such a great library.

@rototor
Contributor

rototor commented Feb 23, 2018

@AbnerSui If you need many different languages, you just need to declare the font-family accordingly and register all the fonts you need, e.g.:

* {
font-family: Arial, "Malgun Gothic", MingLiU, "MS Gothic", "Microsoft JhengHei", "Noto Sans", Helvetica, sans-serif, serif;
}

And of course you must register all these font files in the builder. If you register the files for all styles, you also get the different styles. These are the names of the Windows font files for these fonts:

	static final String[] malgun = new String[] { "malgun.ttf", "malgunbd.ttf" };
	static final String[] mingliu = new String[] { "mingliu.ttc" };
	static final String[] msgothic = new String[] { "msgothic.ttc" };
	static final String[] msj = new String[] { "msjh.ttf", "msjhbd.ttf" };
	// Only on Win10+, not Win 7
	static final String[] msj_ttc = new String[] { "msjh.ttc", "msjhbd.ttc" };

You must locate all these fonts (in %WINDIR%/fonts or wherever they are) and register them in the builder.
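Locating the files is plain Java. The sketch below assumes the fonts sit in %WINDIR%\Fonts (with C:\Windows as a guessed fallback when WINDIR is unset) and only checks which of the file names listed above actually exist, warning about missing ones such as msjh.ttc on Windows 7. The class and method names are hypothetical, and the registration call itself is left as a comment because the exact `useFont` overload depends on the openhtmltopdf version in use.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class WindowsFontLocator {
    /**
     * Returns the entries of fileNames that exist as regular files in fontDir,
     * printing a warning for each one that is missing.
     */
    static List<Path> locate(Path fontDir, String... fileNames) {
        List<Path> found = new ArrayList<>();
        for (String name : fileNames) {
            Path candidate = fontDir.resolve(name);
            if (Files.isRegularFile(candidate)) {
                found.add(candidate);
            } else {
                System.err.println("Font file not found: " + candidate);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumption: fonts live in %WINDIR%\Fonts; fall back to C:\Windows if WINDIR is unset.
        String windir = System.getenv().getOrDefault("WINDIR", "C:\\Windows");
        Path fontDir = Paths.get(windir, "Fonts");

        List<Path> malgun = locate(fontDir, "malgun.ttf", "malgunbd.ttf");
        List<Path> msgothic = locate(fontDir, "msgothic.ttc");

        // Each path found would then be registered with the builder via useFont(...),
        // using the family name that appears in the CSS, e.g. "Malgun Gothic".
        System.out.println("Malgun Gothic files: " + malgun);
        System.out.println("MS Gothic files: " + msgothic);
    }
}
```

Checking for existence up front makes a missing font show up as a clear warning instead of silently falling through to the next family in the font-family list.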

Note: The sandbox is outdated, so don't test this with it.

Using different glyphs and fonts works fine for me.

@AbnerSui
Author

AbnerSui commented Feb 24, 2018

Hi @rototor, thank you for the reply.
I ran into something weird. My HTML looks like this:

<html>
<head>
<title></title>
<style type="text/css">html{font-family:sans-serif,simfang;}</style>
</head>
<body>
<span style="font-family:sans-serif;font-weight:bold;">中文字符</span><br />
<span style="font-family:simfang;font-weight:bold;">中文字符</span><br />
中文字符<br />
</body>
</html>

My Java code looks like this:

PdfRendererBuilder builder = new PdfRendererBuilder();
builder.useUnicodeBidiSplitter(new ICUBidiSplitter.ICUBidiSplitterFactory());
builder.useUnicodeBidiReorderer(new ICUBidiReorderer());
builder.defaultTextDirection(PdfRendererBuilder.TextDirection.LTR);
builder.useFont(() -> {
    try {
        return new FileInputStream("simfang.ttf");
    } catch (FileNotFoundException e) {
        mLog.error("simfang not found");
        return null;
    }
}, "simfang", 400, PdfRendererBuilder.FontStyle.NORMAL, true);
builder.useFont(() -> {
    try {
        return new FileInputStream("sans-serif.ttf");
    } catch (FileNotFoundException e) {
        mLog.error("sans-serif not found");
        return null;
    }
}, "sans-serif", 400, PdfRendererBuilder.FontStyle.NORMAL, true);
OutputStream os = new FileOutputStream("test.pdf");
org.w3c.dom.Document doc = html5ParseDocument(htmlString, htmlFileInputStream);
builder.withW3cDocument(doc, htmlFileInputStream);
builder.toStream(os);
builder.run();

But the PDF output is weird:

####
中文字符
中文字符

The first line becomes ####, but the style (font-weight:bold) is still applied.
The second line shows correctly, but loses the style (font-weight:bold).
The third line shows correctly.

@rototor
Contributor

rototor commented Feb 25, 2018

That's to be expected. Only the fonts you specify in font-family are used, in the order you specify them.

sans-serif is the built-in default font. It has font styles and so on, but it does not have glyphs for Chinese characters; usually it covers only a subset of all possible characters.

The font-family logic works as follows: for each character, a glyph is looked up. If a font does not exist, that font family is skipped (that's how font-family is mostly used on the web). A font is also skipped if it does not have a glyph for the given character. When a font is skipped, the next font in the font-family list is tried, until one with a matching glyph is found. If no font provides a glyph, OpenHtmlToPDF falls back to printing #, so that you can see something is missing.
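As a rough illustration of that lookup order, here is a self-contained Java model. It is not openhtmltopdf's code: a "font" is reduced to a hypothetical map from family name to the set of code points it has glyphs for, and the class and method names are made up. With only sans-serif in the list, the Chinese characters resolve to #, which mirrors the first line of the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FontFallbackSketch {
    /** Marker used when no font in the list has a glyph, like the # described above. */
    static final String MISSING = "#";

    /**
     * For each code point of text, returns the first family in fontFamily that is
     * registered and has a glyph for that code point, or MISSING if none does.
     */
    static List<String> resolve(String text, List<String> fontFamily,
                                Map<String, Set<Integer>> registered) {
        List<String> result = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            String chosen = MISSING;
            for (String family : fontFamily) {
                Set<Integer> glyphs = registered.get(family);
                if (glyphs == null) {
                    continue; // font not registered: skip this family
                }
                if (glyphs.contains(cp)) {
                    chosen = family; // first font with a glyph wins
                    break;
                }
                // font exists but lacks this glyph: try the next family
            }
            result.add(chosen);
        });
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical coverage: "sans-serif" has only Latin letters,
        // "simfang" has U+4E2D and U+6587.
        Map<String, Set<Integer>> fonts = Map.of(
                "sans-serif", Set.of((int) 'A', (int) 'B'),
                "simfang", Set.of(0x4E2D, 0x6587));

        // span with font-family: sans-serif only
        System.out.println(resolve("\u4e2d\u6587", List.of("sans-serif"), fonts));
        // prints [#, #]

        // inherited font-family: sans-serif, simfang
        System.out.println(resolve("A\u4e2d", List.of("sans-serif", "simfang"), fonts));
        // prints [sans-serif, simfang]
    }
}
```

The decision is made per character, so a single text run can mix several fonts; that is why listing a broad-coverage font near the end of font-family helps.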

The second and third lines should be the same, shouldn't they? Can you provide a screenshot of how this looks? (You can paste screenshots directly into the comment field on GitHub.)

You did not register a bold version of simfang (one registered with font weight 700; you only registered weight 400), so of course the text won't display as bold. You need a font file for each font variation you want. E.g. in the JDK 9 font folder there are different versions of the LucidaBright font:

LucidaBrightRegular.ttf
LucidaBrightItalic.ttf 
LucidaBrightDemiItalic.ttf
LucidaBrightDemiBold.ttf 

If you want to use them, you need to register all of them and specify the correct weight and style for each one. It does not matter whether you register a file through the API or through the standard CSS @font-face rule (see e.g. https://www.w3schools.com/cssref/css3_pr_font-face_rule.asp).
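To illustrate why each variation needs its own file, here is a toy Java registry keyed by family, weight and style. It is a simplified model, not openhtmltopdf's matching code: the `Style` enum, the closest-weight rule and the file name `simfang-bold.ttf` are all hypothetical. It does reproduce what was observed earlier in this thread, where a bold request for simfang fell back to the regular face.

```java
import java.util.ArrayList;
import java.util.List;

public class FontVariantSketch {
    // Hypothetical style enum, not the library's own type.
    enum Style { NORMAL, ITALIC }

    // One registered font file: family, CSS weight (400 = normal, 700 = bold) and style.
    static final class Face {
        final String family;
        final int weight;
        final Style style;
        final String file;

        Face(String family, int weight, Style style, String file) {
            this.family = family;
            this.weight = weight;
            this.style = style;
            this.file = file;
        }
    }

    private final List<Face> faces = new ArrayList<>();

    void register(String family, int weight, Style style, String file) {
        faces.add(new Face(family, weight, style, file));
    }

    /**
     * Simplified matching: among faces with the same family and style, return the file
     * whose weight is closest to the requested one (earliest registered wins ties).
     * Nothing is synthesized, so without a 700 face a bold request gets the regular file.
     */
    String match(String family, int weight, Style style) {
        Face best = null;
        for (Face f : faces) {
            if (!f.family.equals(family) || f.style != style) {
                continue;
            }
            if (best == null || Math.abs(f.weight - weight) < Math.abs(best.weight - weight)) {
                best = f;
            }
        }
        return best == null ? null : best.file;
    }

    public static void main(String[] args) {
        FontVariantSketch fonts = new FontVariantSketch();
        fonts.register("simfang", 400, Style.NORMAL, "simfang.ttf");
        System.out.println(fonts.match("simfang", 700, Style.NORMAL)); // prints simfang.ttf

        fonts.register("simfang", 700, Style.NORMAL, "simfang-bold.ttf"); // hypothetical file
        System.out.println(fonts.match("simfang", 700, Style.NORMAL)); // prints simfang-bold.ttf
    }
}
```

The point of the model is that "bold" is only as good as the files you registered: the weight and style you pass at registration time are what the lookup compares against.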

In Java2D (Graphics2D etc.) you can let Java synthesize font variations. Any typographer would call them dead ugly (as opposed to properly hand-crafted bold or italic fonts), but the API lets you draw a font in styles it does not actually provide. The glyphs are then just stretched (for bold) or skewed (for italic).

PDF, on the other hand, does not offer this. If you don't have a font file for a given style, the text won't show up as, e.g., bold. You can always use the built-in PDF fonts (e.g. Helvetica), which provide all font variations, but when using custom fonts you need to supply every variation yourself.

@AbnerSui
Author

@rototor, thanks very much. Your reply helped me understand the library's behavior better.
Thank you again for offering me so much help and building such a great library.
Finally, it would be awesome if someday the library could support all languages without showing #.
Looking forward to it.

@AbnerSui
Author

Reopening for other users; the comments may help them.

AbnerSui reopened this Feb 28, 2018
@AbnerSui
Author

We found a font from the Windows operating system, arialuni.ttf (Arial Unicode MS), which covers almost all languages in the world. Hope this workaround helps other users.
