Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

only warn on xfa if an acroform is not also present #12254

Closed
wants to merge 1 commit into from
Closed

only warn on xfa if an acroform is not also present #12254

wants to merge 1 commit into from

Conversation

escapewindow
Copy link
Contributor

No description provided.

@timvandermeij
Copy link
Contributor

timvandermeij commented Aug 20, 2020

(Related to #12217)

Unfortunately I don't think this works. If XFA is present, by definition AcroForm is too since XFA is inside the AcroForm dictionary, so the condition in this statement can never happen. Refer to

pdf.js/src/core/document.js

Lines 582 to 598 in 7edc5cb

// Check if AcroForms are present in the document.
try {
this.acroForm = this.catalog.catDict.get("AcroForm");
if (this.acroForm) {
this.xfa = this.acroForm.get("XFA");
const fields = this.acroForm.get("Fields");
if ((!Array.isArray(fields) || fields.length === 0) && !this.xfa) {
this.acroForm = null; // No fields and no XFA, so it's not a form.
}
}
} catch (ex) {
if (ex instanceof MissingDataException) {
throw ex;
}
info("Cannot fetch AcroForm entry; assuming no AcroForms are present");
this.acroForm = null;
}
for the XFA/AcroForm detection.

I'm not really sure what the exact relation is between XFA and AcroForms. I have always seen forms with either only AcroForm or with AcroForm and XFA, but in the combined cases I have yet to find a PDF that wasn't fillable when we ignore XFA (which we do now). That's why I think XFA is only an extension that we can ignore, but I don't know if there are real-life examples of files where an AcroForm form cannot be filled in properly because XFA features are missing (for example to expand/show hidden form fields based on a selection of another widget).

@brendandahl Do you perhaps know more about how XFA relates to AcroForm? My feeling tells me that we can safely ignore XFA completely and only log a warning for it (so no fallback bar) because I haven't seen any cases where XFA was required for filling in the form, but I'm not entirely sure. On the other hand, if such a file were to surface, we can always reconsider.

TLDR; I would say we only log a warning if XFA is present and nothing else (no fallback bar).

@escapewindow
Copy link
Contributor Author

Hm, maybe I could check for this.xfa && !this.acroForm.get("fields") (or the check (!Array.isArray(fields) || fields.length === 0), rather)? Removing the fallback bar is easy too.

@timvandermeij
Copy link
Contributor

If it can happen that XFA is present and the AcroForm dictionary contains no fields, then that might indeed be better since the user then correctly gets notified that filling the work definitely won't work. The question is: does the specification even allow that?

@escapewindow
Copy link
Contributor Author

I'm not sure -- I think @brendandahl said most xfa forms have acroforms, but some don't. The spec seems to say xfa forms should have acroforms. I pushed the fallback bar removal only change; I can push the "fallback bar if there's an xfa without acroform fields" change as well if wanted.

@timvandermeij
Copy link
Contributor

timvandermeij commented Aug 20, 2020

I can't find it directly in the specification, but Apache PDFBox does handle the case where XFA is present and the AcroForm fields are missing; see https://github.com/apache/pdfbox/blob/e3150a327c4291654451fce149f3665fa294e498/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/interactive/form/PDAcroForm.java#L645-L653. They call it "dynamic XFA" and I found that Wikipedia also mentions this at https://en.wikipedia.org/wiki/XFA.

I think the conclusion is:

  • Dynamic XFA: AcroForm -> XFA entry is present and AcroForm -> Fields is an empty array. This is unsupported and should show the fallback bar. I haven't found any viewer other that Adobe's viewers that can handle this, and it doesn't seem common at all.
  • Static XFA: AcroForm -> XFA entry is present and AcroForm -> Fields is a non-empty array. This is supported since we can use the AcroForm data and ignore the XFA data since it should not be important (if we ever find a file where it is, we can reconsider, but for now I don't have any reason to believe otherwise). From Wikipedia: "In a static form the form’s appearance and layout is fixed, regardless of the field content. Any unfilled fields are present in the form."

@escapewindow
Copy link
Contributor Author

I tried https://gist.github.com/escapewindow/2e13c19e1ba47e25c09570e234c7f509 , and got

Uncaught (in promise) TypeError: can't access property "get", this.pdfDocument.acroForm is undefined
    _initializeMetadata http://localhost:8888/web/app.js:1422
    load http://localhost:8888/web/app.js:1316
    open http://localhost:8888/web/app.js:832
57d4621c-0c4c-bf46-ac7a-86620ceec23c:1422:22
    _initializeMetadata http://localhost:8888/web/app.js:1455
    InterpretGeneratorResume self-hosted:1478
    AsyncFunctionNext self-hosted:684
    (Async: async)
    load http://localhost:8888/web/app.js:1316
    open http://localhost:8888/web/app.js:832

This is odd to me -- I'd guess if info.IsAcroFormPresent is set, that pdfDocument.acroForm would be set... either that's not true, or I have a bug I'm overlooking.

I may poke at changing docInfo to add an isDynamicXFAPresent ?

@timvandermeij
Copy link
Contributor

The AcroForm fields information is not exposed because it's internal functionality (only the getAnnotations API call gives out annotation information in a structured way). I'll try to give this some more thought over the weekend since the AcroForm/XFA logic isn't completely trivial now and it would be great if we can simplify it to also make the checks here much easier. I'll get back to you on this soon.

@escapewindow
Copy link
Contributor Author

Attempted IsDynamicXFAPresent in 7cd9848 . Previous attempts in https://github.com/mozilla/pdf.js/compare/master...escapewindow:broken-xfa-acroform?expand=1 .

The AcroForm fields information is not exposed because it's internal functionality (only the getAnnotations API call gives out annotation information in a structured way). I'll try to give this some more thought over the weekend since the AcroForm/XFA logic isn't completely trivial now and it would be great if we can simplify it to also make the checks here much easier. I'll get back to you on this soon.

Thanks!

@escapewindow
Copy link
Contributor Author

bdahl
aki: #1773 (comment) is an example where the viewer shows something totally different if XFA isn't supported
aki
boo
so we should have a warning but no fallback bar? or keep the current behavior?
bdahl
i guess for that type, show the fallback bar right away. i'm not actually sure how to detect the above type of PDF though
aki
yeah

@timvandermeij
Copy link
Contributor

I have spent the majority of today figuring out how this works exactly and it's a bit more difficult than I initially thought to cover all cases, but I did find a solution that covers all known cases including the one mentioned just above. I have made #12271 with the full solution.

@escapewindow
Copy link
Contributor Author

Closing in favor of #12271.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants