From 02f271928fd9c542d1828bfe01860326bedcdc28 Mon Sep 17 00:00:00 2001
From: Dan Fickle
Date: Sat, 25 Jun 2016 21:14:10 +1000
Subject: [PATCH] For #8 - Move integration information to separate document and cleanup README.

---
 README.md                 | 231 ++---------------------------------
 docs/integration-guide.md | 243 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 251 insertions(+), 223 deletions(-)
 create mode 100644 docs/integration-guide.md

diff --git a/README.md b/README.md
index 04804da71..39007e750 100644
--- a/README.md
+++ b/README.md
@@ -3,8 +3,14 @@ OPEN HTML TO PDF
 
 OVERVIEW
 ========
-Open HTML to PDF is a pure-Java library for rendering arbitrary well-formed XML
-(or XHTML) using CSS 2.1 for layout and formatting, output to PDF and images.
+Open HTML to PDF is a pure-Java library for rendering arbitrary well-formed XML/XHTML (and even HTML5)
+using CSS 2.1 for layout and formatting, output to PDF and images.
+
+GETTING STARTED
+========
+[Integration guide](docs/integration-guide.md) - get maven artifacts and code to get started.
+
+You could also try the browser example at ````/openhtmltopdf-examples/src/main/java/com/openhtmltopdf/demo/browser/BrowserStartup.java````
 
 LICENSE
 ========
@@ -19,227 +25,6 @@ Open HTML to PDF uses a couple of FOSS packages to get the job done. A list
 of these, along with the license they each have, is listed in the
 LICENSE file in our distribution.
 
-GETTING OPEN HTML TO PDF
-========
-New releases of Open HTML to PDF will be distributed through Maven. Search maven for [com.openhtmltopdf](http://mvnrepository.com/artifact/com.openhtmltopdf). Current maven release is ````0.0.1-RC3````.
-
-GETTING STARTED
-========
-There is a large amount of sample code under the openhtmltopdf-examples directory (integration guide and template guide to come).
-You could try the browser example at ````/openhtmltopdf-examples/src/main/java/com/openhtmltopdf/demo/browser/BrowserStartup.java````
-
-SIMPLE USAGE
-========
-Add these to your maven dependencies section:
-````xml
-
-<properties>
-  <openhtml.version>0.0.1-RC3</openhtml.version>
-</properties>
-
-<dependency>
-  <groupId>com.openhtmltopdf</groupId>
-  <artifactId>openhtmltopdf-core</artifactId>
-  <version>${openhtml.version}</version>
-</dependency>
-<dependency>
-  <groupId>com.openhtmltopdf</groupId>
-  <artifactId>openhtmltopdf-pdfbox</artifactId>
-  <version>${openhtml.version}</version>
-</dependency>
-
-<dependency>
-  <groupId>com.openhtmltopdf</groupId>
-  <artifactId>openhtmltopdf-rtl-support</artifactId>
-  <version>${openhtml.version}</version>
-</dependency>
- ````
- Then you can use this code:
- ````java
-import java.io.FileOutputStream;
-import java.io.IOException;
-import java.io.OutputStream;
-import com.openhtmltopdf.bidi.support.ICUBidiReorderer;
-import com.openhtmltopdf.bidi.support.ICUBidiSplitter;
-import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;
-import com.openhtmltopdf.pdfboxout.PdfRendererBuilder.TextDirection;
-
-public class SimpleUsage
-{
-    public static void main(String[] args)
-    {
-        new SimpleUsage().exportToPdfBox("file:///Users/user/path-to/document.xhtml", "/Users/user/path-to/output.pdf");
-    }
-
-    public void exportToPdfBox(String url, String out)
-    {
-        OutputStream os = null;
-
-        try {
-            os = new FileOutputStream(out);
-
-            try {
-                // There are more options on the builder than shown below.
-                PdfRendererBuilder builder = new PdfRendererBuilder();
-
-                // The following three lines are optional. Leave them out if you do not need
-                // RTL or bi-directional text layout.
-                builder.useBidiSplitter(new ICUBidiSplitter.ICUBidiSplitterFactory());
-                builder.useBidiReorderer(new ICUBidiReorderer());
-                builder.defaultTextDirection(TextDirection.LTR);
-
-                builder.withUri(url);
-                builder.toStream(os);
-                builder.run();
-
-            } catch (Exception e) {
-                e.printStackTrace();
-                // LOG exception
-            } finally {
-                try {
-                    os.close();
-                } catch (IOException e) {
-                    // swallow
-                }
-            }
-        }
-        catch (IOException e1) {
-            e.printStackTrace();
-            // LOG exception.
-        }
-    }
-}
-````
-
-HTML5 PARSER
-============
-While Open HTML to PDF works with a standard w3c DOM, the project provides a converter from the Jsoup HTML5 parser provided Document to
-a w3c DOM Document. This allows you to parse and use HTML5, rather than the default strict XML required by the project. To use the converter, add this
-dependency:
-````xml
-<dependency>
-  <groupId>com.openhtmltopdf</groupId>
-  <artifactId>openhtmltopdf-jsoup-dom-converter</artifactId>
-  <version>${openhtml.version}</version>
-</dependency>
-````
-Then you can use one of the ````Jsoup.parse```` methods to parse HTML5 and ````DOMBuilder.jsoup2DOM```` to convert the Jsoup document to a w3c DOM one.
-````java
-    public org.w3c.dom.Document html5ParseDocument(String urlStr, int timeoutMs) throws IOException
-    {
-        URL url = new URL(urlStr);
-        org.jsoup.nodes.Document doc;
-
-        if (url.getProtocol().equalsIgnoreCase("file")) {
-            doc = Jsoup.parse(new File(url.getPath()), "UTF-8");
-        }
-        else {
-            doc = Jsoup.parse(url, timeoutMs);
-        }
-
-        return DOMBuilder.jsoup2DOM(doc);
-    }
-````
-Then you can set the renderer document with ````builder.withW3cDocument(doc, url)```` in place of ````builder.withUri(url)````.
-
-PLUGGABLE HTTP CLIENT
-=================
-Open HTML to PDF makes it simple to plugin an external client for HTTP and HTTPS requests. In fact this is recommended if you are using
-HTTP/HTTPS resources, as the built-in Java client is showing its age. For example, to use the excellent [OkHttp](http://square.github.io/okhttp/) library is
-as simple as adding the following code:
-````java
-    public static class OkHttpStreamFactory implements HttpStreamFactory {
-        private final OkHttpClient client = new OkHttpClient();
-
-        @Override
-        public HttpStream getUrl(String url) {
-            Request request = new Request.Builder()
-                .url(url)
-                .build();
-
-            try {
-                final Response response = client.newCall(request).execute();
-
-                return new HttpStream() {
-                    @Override
-                    public InputStream getStream() {
-                        return response.body().byteStream();
-                    }
-
-                    @Override
-                    public Reader getReader() {
-                        return response.body().charStream();
-                    }
-                };
-            }
-            catch (IOException e) {
-                e.printStackTrace();
-            }
-
-            return null;
-        }
-    }
-````
-Then use ````builder.useHttpStreamImplementation(new OkHttpStreamFactory())````.
-
-CACHE BETWEEN RUNS
-=======
-By default, Open HTML to PDF should not cache anything between runs. However, it allows the user to plugin an external cache. It should
-be noted that the URI received by the cache is already resolved (see below). Here is a simple external cache:
-````java
-    public static class SimpleCache implements FSCache {
-        private final Map<FSCacheKey, Object> cache = new HashMap<>();
-
-        @Override
-        public Object get(FSCacheKey cacheKey) {
-            Object obj = cache.get(cacheKey);
-            System.out.println("Requesting: " + cacheKey.getUri() + " of type: " + cacheKey.getClazz().getName() + ", got it: " + (obj != null));
-            return obj;
-        }
-
-        @Override
-        public void put(FSCacheKey cacheKey, Object obj) {
-            System.out.println("Putting: " + cacheKey.getUri() + " of type: " + cacheKey.getClazz().getName());
-            cache.put(cacheKey, obj);
-        }
-    }
-````
-Of course, you may want to customize your cache by inspecting the URI or class name contained by cache key. Once you have a cache, you can set it
-on the builder with ````builder.useCache(cache)````.
-
-URI RESOLVER
-=======
-By default, the code attempts to resolve relative URIs by using the document URI as a base URI. Absolute URIs are returned unchanged. If you wish to plugin your
-own resolver, you can. This can not only resolve relative URIs but also resolve URIs in a private address space or even reject a URI. To use an external resolver
-implement ````FSUriResolver```` and use it with ````builder.useUriResolver(new MyResolver())````.
-
-LOGGING
-=======
-Three options are provided by Open HTML to PDF. The default is to use java.util.logging. If you prefer to output using log4j or slf4j, adapters are provided:
-````xml
-
-<dependency>
-  <groupId>com.openhtmltopdf</groupId>
-  <artifactId>openhtmltopdf-slf4j</artifactId>
-  <version>${openhtml.version}</version>
-</dependency>
-
-<dependency>
-  <groupId>com.openhtmltopdf</groupId>
-  <artifactId>openhtmltopdf-log4j</artifactId>
-  <version>${openhtml.version}</version>
-</dependency>
-````
-Then at the start of your code, before calling any Open HTML to PDF methods, use this code:
-````java
-    XRLog.setLoggingEnabled(true);
-
-    // For slf4j:
-    XRLog.setLoggerImpl(new Slf4jLogger());
-    // or for log4j 1.2.17:
-    XRLog.setLoggerImpl(new Log4JXRLogger());
-````
-
 CREDITS
 ========
 Open HTML to PDF is based on [Flying-saucer](https://github.com/flyingsaucerproject/flyingsaucer). Credit goes to the contributors of that project. Code will also be used from [neoFlyingSaucer](https://github.com/danfickle/neoflyingsaucer)
diff --git a/docs/integration-guide.md b/docs/integration-guide.md
new file mode 100644
index 000000000..e3be570b6
--- /dev/null
+++ b/docs/integration-guide.md
@@ -0,0 +1,243 @@
+OPEN HTML TO PDF
+---------
+
+GETTING OPEN HTML TO PDF
+========
+New releases of Open HTML to PDF will be distributed through Maven. Search maven for [com.openhtmltopdf](http://mvnrepository.com/artifact/com.openhtmltopdf).
+Current maven release is ````0.0.1-RC3````. If you would like to be notified of new releases, please subscribe to the [Maven issue](https://github.com/danfickle/openhtmltopdf/issues/7).
+
+MAVEN ARTIFACTS
+========
+Add these to your maven dependencies section as needed:
+````xml
+
+<properties>
+  <openhtml.version>0.0.1-RC3</openhtml.version>
+</properties>
+
+
+<dependency>
+  <groupId>com.openhtmltopdf</groupId>
+  <artifactId>openhtmltopdf-core</artifactId>
+  <version>${openhtml.version}</version>
+</dependency>
+
+
+<dependency>
+  <groupId>com.openhtmltopdf</groupId>
+  <artifactId>openhtmltopdf-pdfbox</artifactId>
+  <version>${openhtml.version}</version>
+</dependency>
+
+
+<dependency>
+  <groupId>com.openhtmltopdf</groupId>
+  <artifactId>openhtmltopdf-rtl-support</artifactId>
+  <version>${openhtml.version}</version>
+</dependency>
+
+
+<dependency>
+  <groupId>com.openhtmltopdf</groupId>
+  <artifactId>openhtmltopdf-jsoup-dom-converter</artifactId>
+  <version>${openhtml.version}</version>
+</dependency>
+
+
+<dependency>
+  <groupId>com.openhtmltopdf</groupId>
+  <artifactId>openhtmltopdf-slf4j</artifactId>
+  <version>${openhtml.version}</version>
+</dependency>
+
+
+<dependency>
+  <groupId>com.openhtmltopdf</groupId>
+  <artifactId>openhtmltopdf-log4j</artifactId>
+  <version>${openhtml.version}</version>
+</dependency>
+````
+
+MINIMAL USAGE
+========
+Most of the options available for PDF output are settable on the [PdfRendererBuilder](https://github.com/danfickle/openhtmltopdf/blob/open-dev-v1/openhtmltopdf-pdfbox/src/main/java/com/openhtmltopdf/pdfboxout/PdfRendererBuilder.java) builder class. This shows the minimal possible configuration to output a PDF from an XHTML document.
+
+````java
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;
+import com.openhtmltopdf.pdfboxout.PdfRendererBuilder.TextDirection;
+
+public class SimpleUsage
+{
+    public static void main(String[] args)
+    {
+        new SimpleUsage().exportToPdfBox("file:///Users/user/path-to/document.xhtml", "/Users/user/path-to/output.pdf");
+    }
+
+    public void exportToPdfBox(String url, String out)
+    {
+        OutputStream os = null;
+
+        try {
+            os = new FileOutputStream(out);
+
+            try {
+                // There are more options on the builder than shown below.
+                PdfRendererBuilder builder = new PdfRendererBuilder();
+
+                builder.withUri(url);
+                builder.toStream(os);
+                builder.run();
+
+            } catch (Exception e) {
+                e.printStackTrace();
+                // LOG exception
+            } finally {
+                try {
+                    os.close();
+                } catch (IOException e) {
+                    // swallow
+                }
+            }
+        }
+        catch (IOException e1) {
+            e1.printStackTrace();
+            // LOG exception.
+        }
+    }
+}
+````
+
+SUPPORT FOR BI-DIRECTIONAL (RTL) AND SHAPED TEXT
+========
+````java
+// Add these imports (and remember the rtl-support maven module).
+import com.openhtmltopdf.bidi.support.ICUBidiReorderer;
+import com.openhtmltopdf.bidi.support.ICUBidiSplitter;
+
+// Then call on the builder.
+builder.useBidiSplitter(new ICUBidiSplitter.ICUBidiSplitterFactory());
+builder.useBidiReorderer(new ICUBidiReorderer());
+builder.defaultTextDirection(TextDirection.LTR); // OR RTL
+````
+
+HTML5 PARSER SUPPORT
+============
+While Open HTML to PDF works with a standard w3c DOM, the project provides a converter from the Document produced by the Jsoup HTML5 parser to
+a w3c DOM Document. This allows you to parse and use HTML5, rather than the default strict XML required by the project.
+
+Add the jsoup-dom-converter maven module, then use one of the ````Jsoup.parse```` methods to parse HTML5 and ````DOMBuilder.jsoup2DOM```` to convert the Jsoup document to a w3c DOM one.
+````java
+    public org.w3c.dom.Document html5ParseDocument(String urlStr, int timeoutMs) throws IOException
+    {
+        URL url = new URL(urlStr);
+        org.jsoup.nodes.Document doc;
+
+        if (url.getProtocol().equalsIgnoreCase("file")) {
+            doc = Jsoup.parse(new File(url.getPath()), "UTF-8");
+        }
+        else {
+            doc = Jsoup.parse(url, timeoutMs);
+        }
+
+        return DOMBuilder.jsoup2DOM(doc);
+    }
+````
+Then you can set the renderer document with ````builder.withW3cDocument(doc, url)```` in place of ````builder.withUri(url)````.
+
+PLUGGABLE HTTP CLIENT
+=================
+Open HTML to PDF makes it simple to plug in an external client for HTTP and HTTPS requests. In fact this is recommended if you are using
+HTTP/HTTPS resources, as the built-in Java client is showing its age. For example, using the excellent [OkHttp](http://square.github.io/okhttp/) library is
+as simple as adding the following code:
+````java
+    public static class OkHttpStreamFactory implements HttpStreamFactory {
+        private final OkHttpClient client = new OkHttpClient();
+
+        @Override
+        public HttpStream getUrl(String url) {
+            Request request = new Request.Builder()
+                .url(url)
+                .build();
+
+            try {
+                final Response response = client.newCall(request).execute();
+
+                return new HttpStream() {
+                    @Override
+                    public InputStream getStream() {
+                        return response.body().byteStream();
+                    }
+
+                    @Override
+                    public Reader getReader() {
+                        return response.body().charStream();
+                    }
+                };
+            }
+            catch (IOException e) {
+                e.printStackTrace();
+            }
+
+            return null;
+        }
+    }
+````
+Then use ````builder.useHttpStreamImplementation(new OkHttpStreamFactory())````.
+The library should close the reader or stream when it is finished with it.
+
+CACHE BETWEEN RUNS
+=======
+By default, Open HTML to PDF should not cache anything between runs. However, it allows the user to plug in an external cache. It should
+be noted that the URI received by the cache is already resolved (see below). Here is a simple external cache:
+````java
+    public static class SimpleCache implements FSCache {
+        private final Map<FSCacheKey, Object> cache = new HashMap<>();
+
+        @Override
+        public Object get(FSCacheKey cacheKey) {
+            Object obj = cache.get(cacheKey);
+            System.out.println("Requesting: " + cacheKey.getUri() + " of type: " + cacheKey.getClazz().getName() + ", got it: " + (obj != null));
+            return obj;
+        }
+
+        @Override
+        public void put(FSCacheKey cacheKey, Object obj) {
+            System.out.println("Putting: " + cacheKey.getUri() + " of type: " + cacheKey.getClazz().getName());
+            cache.put(cacheKey, obj);
+        }
+    }
+````
+Of course, you may want to customize your cache by inspecting the URI or class name contained in the cache key. Once you have a cache, you can set it
+on the builder with ````builder.useCache(cache)````.
+
+URI RESOLVER
+=======
+By default, the code attempts to resolve relative URIs by using the document URI as a base URI. Absolute URIs are returned unchanged. If you wish to plug in your
+own resolver, you can. A custom resolver can not only resolve relative URIs but also resolve URIs in a private address space or even reject a URI. To use an external resolver,
+implement ````FSUriResolver```` and use it with ````builder.useUriResolver(new MyResolver())````.
+
+LOGGING
+=======
+Three options are provided by Open HTML to PDF. The default is to use java.util.logging. If you prefer to output using log4j or slf4j, adapters are provided.
+Add the appropriate maven module, then at the start of your code, before calling any Open HTML to PDF methods, use this code:
+````java
+    XRLog.setLoggingEnabled(true);
+
+    // For slf4j:
+    XRLog.setLoggerImpl(new Slf4jLogger());
+    // or for log4j 1.2.17:
+    XRLog.setLoggerImpl(new Log4JXRLogger());
+````
+
+COMING SOON
+=======
++ SVG support.
++ Loads more (stay tuned).
+
+FINALLY
+=======
+Thanks for using openhtmltopdf and please feel free to file any issues you are having trouble with.
+
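The URI RESOLVER section of the new guide describes resolver behaviour only in prose: relative URIs are resolved against the document URI, absolute URIs pass through unchanged, and a custom resolver may reject a URI. A rough, library-independent sketch of that logic might look like the following. The class name `SketchUriResolver`, the `resolve(baseUri, uri)` signature, the http/https-only policy, and the use of `null` to signal rejection are all assumptions made for illustration; the patch does not show the actual `FSUriResolver` method signature, so this class deliberately does not implement it.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, NOT part of Open HTML to PDF. It only illustrates
// the resolve / pass-through / reject behaviour described in the guide.
public class SketchUriResolver {

    // Returns the resolved URI as a string, or null to reject it.
    // Assumption: rejection is signalled with null, and only http/https
    // URIs are accepted (an example policy, not the library default).
    public static String resolve(String baseUri, String uri) {
        if (uri == null || uri.isEmpty()) {
            return null;
        }
        try {
            URI candidate = new URI(uri);
            URI resolved;
            if (candidate.isAbsolute()) {
                // Absolute URIs are returned unchanged.
                resolved = candidate;
            } else if (baseUri != null) {
                // Relative URIs are resolved against the document URI.
                resolved = new URI(baseUri).resolve(candidate);
            } else {
                return null; // relative URI with no base to resolve against
            }
            String scheme = resolved.getScheme();
            boolean allowed = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            return allowed ? resolved.toString() : null;
        } catch (URISyntaxException e) {
            return null; // malformed input is rejected
        }
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.xhtml";
        System.out.println(resolve(base, "img/a.png"));
        System.out.println(resolve(base, "https://cdn.example.org/x.css"));
        System.out.println(resolve(base, "ftp://host/file"));
    }
}
```

A real resolver would wrap logic like this in a class implementing ````FSUriResolver```` and pass it to ````builder.useUriResolver(...)````, as the guide describes; check the interface in the library source for the exact method to override.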