Using usePdfAConformance resulting in missing fonts and other attributes #326

Closed
mattstjean opened this issue Feb 12, 2019 · 8 comments

@mattstjean

mattstjean commented Feb 12, 2019

Before you read this: I only just saw the disclaimer "Note: This is pre-release documentation. PDF/UA support will be released with RC-18." So if this isn't supported yet, I apologize for the issue. Is there a timeline for RC-18? The project I'm working on requires that the PDFs be compliant.

Summary
When I add the line:
builder.usePdfAConformance(PdfAConformance.PDFA_1_A);
my PDF renders blank, which I assume is due to the now-missing fonts.

I also have a problem when I include the line:
builder.useFastMode();
It causes the PDF to lose the Author attribute (only when used together with usePdfAConformance).

Let me know if there is anything additional I can provide to get help with this.

Background

  • useFastMode() without usePdfAConformance

    • PDF is not tagged
    • Contains title, author, subject, description, and fonts
    • PDF does not contain language
    • Content is properly displayed
  • usePdfAConformance(PdfAConformance.PDFA_1_A) without useFastMode()

    • PDF is tagged
    • Displays a compliance notice when opened with Adobe Reader
    • Contains title, subject, description
    • PDF does not contain author, language, or fonts
    • Content is not displayed
  • useFastMode() and usePdfAConformance(PdfAConformance.PDFA_1_A)

    • PDF is not tagged (unexpected, PDF should be tagged)
    • Does not display a compliance notice when opened with Adobe Reader (unexpected, PDF should be claiming compliance)
    • Contains title, author, subject, description
    • Does not contain fonts or language
    • Content is not displayed
  • Neither useFastMode() nor usePdfAConformance(PdfAConformance.PDFA_1_A)

    • PDF is not tagged
    • Contains title, author, subject, description, fonts
    • No language
    • Content is displayed

Application Info

  • Generate HTML using freemarker to merge data with HTML template (resulting HTML is a string and not a file)
  • Generate PDF, I have tried this two ways based on examples I've found. I return a byte array because this is part of a webservice that receives JSON data and returns a PDF representation of the data.

Everything has been working perfectly; I've only run into issues when trying to make the application produce compliant PDFs.

Implementation 1

public byte[] generatePdf(final String html) throws Exception {
        System.out.println("in generate pdf");
        PdfRendererBuilder builder = new PdfRendererBuilder();
        builder.useFastMode();
        builder.usePdfAConformance(PdfAConformance.PDFA_1_A);

        Map<String, String> fonts = FontHelper.getFonts(true);
        fonts.forEach( (k, v) -> {
            if (k.contains("Bold") && k.contains("Italic")) {
                builder.useFont(new File(k), v, 700, FontStyle.ITALIC, true);
            } else if (k.contains("Bold")) {
                builder.useFont(new File(k), v, 700, FontStyle.NORMAL, true);
            } else if (k.contains("Italic")) {
                builder.useFont(new File(k), v, 400, FontStyle.ITALIC, true);
            } else if (k.contains("Regular")) {
                builder.useFont(new File(k), v, 400, FontStyle.NORMAL, true);
            }
        });

        builder.withHtmlContent(html, "/");
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        builder.toStream(outputStream);

        builder.run();

        outputStream.close();
        return outputStream.toByteArray();
    }

Implementation 2

public byte[] generatePdf(final String html) throws IOException {
        System.out.println("in generate pdf");
        PdfRendererBuilder builder = new PdfRendererBuilder();

        builder.withHtmlContent(html, "/");
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        builder.toStream(outputStream);
        try (PdfBoxRenderer pdfBoxRenderer = builder.buildPdfRenderer()) {
            pdfBoxRenderer.layout();
            pdfBoxRenderer.createPDF();
            pdfBoxRenderer.close();
        }
        outputStream.close();
        return outputStream.toByteArray();
    }

I have been doing this with a simplified HTML template until I get it to work before I switch back to my real template:

<html lang="EN-US">
    <head>
        <title>Example Title</title>
        <meta name="subject" content="Example Subject" />
        <meta name="author" content="Example Author" />
        <meta name="description" content="Example Description"/>

        <bookmarks>
            <bookmark name="First" href="#first" />
            <bookmark name="Second" href="#second" />
            <bookmark name="Third" href="#third" />
            <bookmark name="Fourth" href="#fourth" />
        </bookmarks>

        <style>
            .noto {
                font-family: "Noto Sans";
            }
            body {
                font-family: "Noto Sans";
            }
        </style>
    </head>
    <body>
        <h1> Title </h1>
        <h2> Subtitle </h2>
        <h3 id="first">Section 1 - First</h3>
        <div>asdoasok</div>
        <h3 id="second">Section 2 - Second</h3>
        <div>asodaokasd</div>
        <h3 id="third">Section 3 - Third</h3>
        <div>asdaklsdpkasd</div>
        <h3 id="fourth">Section 4 - Fourth</h3>
        <div>asodjasojaosdj</div>
    </body>
</html>
@mattstjean
Author

In case you're interested in my Font strategy...it's essentially a copy-and-paste of one of the examples.

My fonts are located in my classpath: "project-root/src/main/resources/fonts"

public static Map<String, String> getFonts(boolean showErrors) {

        Map<String, String> fonts = new HashMap<String, String>();

        File fod = new File("src/main/resources/fonts");
        
        List<File> fontFiles = new ArrayList<File>();

        if (fod.isDirectory()) {
            fontFiles.addAll(Arrays.asList(fod.listFiles(new FilenameFilter(){
                public boolean accept(File file, String s) {
                    return s.endsWith(".ttf");
                }
            })));
        } else {
            fontFiles.add(fod);
        }

        System.out.println("Font files: " + fontFiles);

        List<String> errors = new ArrayList<String>();
        for (Iterator<File> fit = fontFiles.iterator(); fit.hasNext();) {
            File f = (File) fit.next();
            Font awtf = null;
            try {
                awtf = Font.createFont(Font.TRUETYPE_FONT, f);
            } catch (FontFormatException e) {
                log.error("Trying to load font via AWT: " + e.getMessage());
            } catch (IOException e) {
                log.error("Trying to load font via AWT: " + e.getMessage());
            }
            try {
                log.info("Font located at " + f.getPath() + "\n" +
                         " family name (reported by AWT): " + awtf.getFamily());
                fonts.put(f.getPath(), awtf.getFamily());
            } catch (RuntimeException e) {
                if (e.getMessage().contains("not a valid TTF or OTF file.")) {
                    errors.add(e.getMessage());
                } else if (e.getMessage().contains("Table 'OS/2' does not exist")) {
                    errors.add(e.getMessage());
                } else if (e.getMessage().contains("licensing restrictions.")) {
                    errors.add(e.getMessage());
                } else {
                    throw e;
                }
            }
        }
        if (errors.size() > 0) {
            if (showErrors) {
                log.error("Errors were reported on reading some font files.");
                for (Iterator<String> eit = errors.iterator(); eit.hasNext();) {
                    log.error(eit.next());
                }
            } else {
                log.error("Errors were reported on reading some font files. Pass true as an argument to show them, and re-call");
            }
        }

        return fonts;
    }
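
A note on the helper above: if Font.createFont throws, awtf stays null, the following awtf.getFamily() call raises a NullPointerException, and e.getMessage() in the catch block may itself be null. Below is a minimal sketch of a variant that skips unreadable fonts instead. The names FontScanner and scan are made up for illustration, and unlike the original it does not treat a non-directory path as a single font file.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of openhtmltopdf.
final class FontScanner {
    private FontScanner() {}

    /** Maps each loadable .ttf file path in dir to the family name AWT reports. */
    static Map<String, String> scan(File dir) {
        Map<String, String> fonts = new LinkedHashMap<>();
        File[] candidates = dir.listFiles((d, name) -> name.toLowerCase().endsWith(".ttf"));
        if (candidates == null) {
            // listFiles returns null when dir is not a directory or cannot be read.
            return fonts;
        }
        for (File f : candidates) {
            try {
                Font font = Font.createFont(Font.TRUETYPE_FONT, f);
                fonts.put(f.getPath(), font.getFamily());
            } catch (FontFormatException | IOException e) {
                // Skip this file and keep going; the font is never dereferenced when loading fails.
                System.err.println("Skipping " + f + ": " + e.getMessage());
            }
        }
        return fonts;
    }
}
```

The returned map has the same shape as the one produced by getFonts (file path to family name), so the forEach loop in Implementation 1 could consume it unchanged.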

@danfickle
Owner

Hi @mattstjean,

Thanks for the detailed write-up!

With regard to fonts, I think you're falling victim to #324. Either the font is not registered under that name (Noto Sans), or an exception is being thrown when PDFBox loads it and is silently discarded.

You could put something like this in a main method to check whether it is throwing:

try (PDDocument doc = new PDDocument()) {
     // Throws if PDFBox cannot read the font file.
     PDType0Font.load(doc, new File("/path/to/font.ttf"));
} catch (Exception e) {
     e.printStackTrace();
}

As to the rest, I've just added a PDF/A testing module using VeraPDF. I used the following code to create the PDF:

        byte[] pdfBytes;
        
        try (PDDocument doc = new PDDocument()) {
            PdfRendererBuilder builder = new PdfRendererBuilder();
            builder.usePDDocument(doc);
            builder.useFastMode();
            //builder.testMode(true);
            builder.usePdfAConformance(conform);
            builder.useFont(new File("target/test/artefacts/Karla-Bold.ttf"), "TestFont");
            builder.withHtmlContent(html, PdfATester.class.getResource("/html/").toString());
    
            try (PdfBoxRenderer renderer = builder.buildPdfRenderer()) {
                renderer.createPDFWithoutClosing();
            }
    
            try (InputStream colorProfile = PdfATester.class.getResourceAsStream("/colorspaces/sRGB.icc")) {
                PDOutputIntent oi = new PDOutputIntent(doc, colorProfile); 
                oi.setInfo("sRGB IEC61966-2.1"); 
                oi.setOutputCondition("sRGB IEC61966-2.1"); 
                oi.setOutputConditionIdentifier("sRGB IEC61966-2.1"); 
                oi.setRegistryName("http://www.color.org"); 
                doc.getDocumentCatalog().addOutputIntent(oi);
            }
        
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            doc.save(baos);
            pdfBytes = baos.toByteArray();
        }

Note: I got the color space file from:
https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/resources/org/apache/pdfbox/resources/pdfa/

The test reports the following problems:

DISTINCT ERRORS(all-in-one--1a) (4): [
    An annotation dictionary shall contain the F key. The F key’s Print flag bit shall be set to 1 and its Hidden, Invisible and NoView flag bits shall be set to 0
    root/document[0]/pages[0](9 0 obj PDPage)/annots[1](17 0 obj PDAnnot)
    If a document information dictionary does appear at a document, then all of its entries that have analogous properties in predefined XMP schemas, shall also be embedded in the file in XMP form with equivalent values.
    root
    If an Image dictionary contains the Interpolate key, its value shall be false
    root/document[0]/pages[0](9 0 obj PDPage)/contentStream[0](14 0 obj PDContentStream)/operators[203]/xObject[0](23 0 obj PDXImage)
    An XObject dictionary shall not contain the SMask key
    root/document[0]/pages[0](9 0 obj PDPage)/contentStream[0](14 0 obj PDContentStream)/operators[203]/xObject[0](23 0 obj PDXImage)
]

The XMP issue is probably why the author is going missing. They all appear simple to fix, except for the SMask issue: SMask is used to implement transparency in images. I guess for now, we could advise people not to use transparent PNGs?
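
For anyone who wants to follow the "no transparent PNGs" advice programmatically, here is a rough sketch using only the JDK. It checks whether a decoded image carries an alpha channel and, if so, paints it onto an opaque background. The class and method names (AlphaImages, hasAlpha, flattenOnto) are invented for this example, and whether an alpha channel is exactly what triggers the SMask entry in the renderer is an assumption.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of openhtmltopdf or PDFBox.
final class AlphaImages {
    private AlphaImages() {}

    /** True if the decoded image's color model has an alpha channel. */
    static boolean hasAlpha(BufferedImage image) {
        return image.getColorModel().hasAlpha();
    }

    /** Returns an opaque RGB copy of the image painted over the given background. */
    static BufferedImage flattenOnto(BufferedImage image, Color background) {
        BufferedImage opaque = new BufferedImage(
                image.getWidth(), image.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = opaque.createGraphics();
        try {
            g.setColor(background);
            g.fillRect(0, 0, opaque.getWidth(), opaque.getHeight());
            g.drawImage(image, 0, 0, null);
        } finally {
            g.dispose();
        }
        return opaque;
    }
}
```

An image read with javax.imageio.ImageIO.read could be passed through hasAlpha first, and only flattened and re-encoded when it returns true.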

In addition, PDF/A1a requires proper tagging. Fortunately, I've just implemented that for PDF/UA so that shouldn't be hard to get working.

danfickle added a commit that referenced this issue Mar 3, 2019
danfickle added a commit that referenced this issue Mar 3, 2019
Unfortunately, we are using some structure types introduced in PDF 1.5 so we are not PDF/A1a compliant, at least when using tables.
@danfickle
Owner

UPDATE:

We are now compliant with PDF/A standards 1 and 2, except for PDF/A1a when using tables. This is because we are using the TFoot, TBody and THead structure types which were only introduced with PDF standard 1.5 (PDF/A1 is based on PDF 1.4).

So I'll have to find a way to factor out their use and then I can finally release RC-18.

Additionally, I forgot that we have a builder method for supplying the color profile, so the updated code for PDF/A conformance is something like:

            PdfRendererBuilder builder = new PdfRendererBuilder();
            builder.useFastMode();
            //builder.testMode(true);
            builder.usePdfAConformance(conform);
            builder.useFont(new File("target/test/artefacts/Karla-Bold.ttf"), "TestFont");
            builder.withHtmlContent(html, PdfATester.class.getResource("/html/").toString());
    
            try (InputStream colorProfile = PdfATester.class.getResourceAsStream("/colorspaces/sRGB.icc")) {
                byte[] colorProfileBytes = IOUtils.toByteArray(colorProfile);
                builder.useColorProfile(colorProfileBytes);
            }
        
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            builder.toStream(baos);
            builder.run();

danfickle added a commit that referenced this issue Mar 4, 2019
Had to move the thead, tbody and tfoot rows directly as children of the table when using PDF version 1.4 or less.
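
To illustrate the idea in that commit message, here is a toy sketch of hoisting table rows out of THead, TBody and TFoot groups when the target PDF version is below 1.5. It uses a made-up StructNode class rather than the project's real structure tree types, so treat it as a picture of the technique and not as the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a PDF structure element.
final class StructNode {
    final String type;
    final List<StructNode> children = new ArrayList<>();

    StructNode(String type, StructNode... kids) {
        this.type = type;
        for (StructNode k : kids) {
            children.add(k);
        }
    }
}

final class TableFlattener {
    // Row-group structure types that only exist from PDF 1.5 onward.
    private static final Set<String> ROW_GROUPS = Set.of("THead", "TBody", "TFoot");

    private TableFlattener() {}

    /** Makes rows direct children of the table when pdfVersion is below 1.5. */
    static void flatten(StructNode table, float pdfVersion) {
        if (pdfVersion >= 1.5f) {
            return; // row groups are allowed, leave the tree alone
        }
        List<StructNode> rows = new ArrayList<>();
        for (StructNode child : table.children) {
            if (ROW_GROUPS.contains(child.type)) {
                rows.addAll(child.children); // hoist the group's rows
            } else {
                rows.add(child);
            }
        }
        table.children.clear();
        table.children.addAll(rows);
    }
}
```

Row order is preserved, so header rows still come first; only the intermediate grouping elements disappear.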
@danfickle
Owner

We're now PDF/A1a compliant as well. I've written up some guidelines for PDF/A compliance in the wiki.

Note in the example, the addition of this line:

builder.usePdfVersion(conform.getPart() == 1 ? 1.4f : 1.5f);

I think we can now close this issue. I'll release RC18 this week. Please re-open if you find any more issues with PDF/A. Thanks @mattstjean.

@jaapspiering

jaapspiering commented Mar 4, 2019

> Please re-open if you find any more issues with PDF/A. Thanks @mattstjean.

@danfickle: The initial issue (missing fonts) still seems to occur if we use builder.usePdfUaAccessbility(true).
However, since this bug was about PDF/A compliance and not necessarily PDF/UA, should that be a separate bug?

@danfickle
Owner

Hi @mattstjean,

We can discuss it here. First, just making sure you know that src/main/resources will not exist as a directory once your project is packaged into a jar?
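
Since resources packed into a jar are not files on disk, one possible workaround is to read the font as a classpath stream and copy it to a temporary file. The sketch below uses only the JDK; the class name ClasspathFonts and the method copyToTempFile are invented for this example, and the resource name is whatever path the font has on your classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of openhtmltopdf.
final class ClasspathFonts {
    private ClasspathFonts() {}

    /**
     * Copies a classpath resource (for example "fonts/NotoSans-Regular.ttf")
     * to a temporary file and returns its path. Works for resources inside a jar.
     */
    static Path copyToTempFile(String resourceName, String suffix) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourceName);
            }
            Path tmp = Files.createTempFile("font-", suffix);
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        }
    }
}
```

The resulting path can be turned into a File with toFile() and passed to the File-based builder.useFont overload already used in this thread.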

@mattstjean
Author

mattstjean commented Apr 8, 2019

Thank you for all of the help, @danfickle . Sorry about the delay in responding, I've been very busy and wanted to try it out before responding.

I figured out my main issue with the fonts.

The first fix was to load them properly (I had been trying all sorts of variants because I wasn't sure why it wasn't working). I landed on:

ClassLoader classLoader = getClass().getClassLoader();
File regFile = new File(classLoader.getResource("fonts/NotoSans-Regular.ttf").getFile());
builder.useFont(regFile, "noto", 400, FontStyle.NORMAL, true);

Then I hit a snag and needed a second fix that wasn't as obvious to me. It was actually caused by the way I had the page counter set up. It's not in my initial example above because I added it after the fonts worked. The way I had it was:

@bottom-right {
    content: 'Page ' counter(page) ' of ' counter(pages);
}

I had it like that and then like:

@bottom-right {
    content: 'Page ' counter(page) ' of ' counter(pages);
    font-family: 'noto', sans-serif;
    font-size: 12;
}

Neither of those worked, and I was getting a lot of errors saying "Font list empty" or something similar. When I changed it to

@bottom-right {
    font-family: 'noto', sans-serif;
    font-size: 12;
    content: 'Page ' counter(page) ' of ' counter(pages);
}

it worked. In your html examples you have it that way too, so when I was doing a manual diff between my html and yours - I finally figured it out.


I am now having an issue where the document language isn't getting set. When I run the Adobe Acrobat Pro DC accessibility full check, it reports 2 failures:

Primary language | Failed | Text language is specified
Title | Failed | Document title is showing in title bar

I'm not too worried about the title, because when I look at the document properties it does have a value for title. What I'm trying to figure out is why the language is not getting set. I get the same 2 failures when I run the check on a PDF generated from your all-in-one.html test file.

<html lang="EN-US">
    <head>
        <title>Summary</title>
        <meta name="subject" content="Summary" />
        <meta name="author" content="Business" />
        <meta name="description" content="Request Summary"/>

Let me know if I should open a different issue.

@danfickle
Owner

Closing in favor of #347. The order of properties situation is bizarre. Not sure what is happening.
