Support XML Forms Architecture (XFA) forms #2373

Closed
abevoelker opened this issue Nov 12, 2012 · 19 comments

@abevoelker

XFA forms are XML-based forms created with Adobe's LiveCycle Designer tool. They offer enhanced features over the old AcroForms method of creating fillable PDF forms, such as growable text fields (which can overflow across page boundaries) and "Rich Text," a subset of XHTML and CSS for styling text. There is also support for running JavaScript and FormCalc (a proprietary scripting language) scripts that manipulate data through a "Scripting Object Model" (SOM, which looks sort of like XPath). Other features include network interaction with servers (e.g. form submission) using HTTP or WSDL/SOAP. I believe you can also embed regular PDF documents inside XFA.

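To give a rough feel for what a SOM-style reference looks like, here is a toy resolver for a simplified subset: dotted names with an optional `[n]` index that picks among same-named siblings (index 0 when omitted). The node shape (`name`, `children`, `value`) and the function name `resolveSomLike` are made up for illustration; this is not the XFA specification's full SOM grammar and not code from any real XFA implementation.

```javascript
// Toy resolver for a simplified, SOM-like path such as "form1.page[1].total".
// The { name, children, value } node shape is a hypothetical stand-in.
function resolveSomLike(root, path) {
  let node = root;
  for (const step of path.split(".")) {
    // A step is a name, optionally followed by an index like [2].
    const match = /^([A-Za-z_]\w*)(?:\[(\d+)\])?$/.exec(step);
    if (!match) {
      throw new SyntaxError(`Unsupported step: ${step}`);
    }
    const name = match[1];
    const index = match[2] === undefined ? 0 : Number(match[2]);
    // The index counts only siblings that share the same name.
    const sameName = (node.children || []).filter(child => child.name === name);
    if (index >= sameName.length) {
      return null; // nothing at that position
    }
    node = sameName[index];
  }
  return node;
}

const form = {
  name: "root",
  children: [
    {
      name: "form1",
      children: [
        { name: "page", children: [{ name: "total", value: "10" }] },
        { name: "page", children: [{ name: "total", value: "25" }] },
      ],
    },
  ],
};

console.log(resolveSomLike(form, "form1.page[1].total").value); // prints "25"
console.log(resolveSomLike(form, "form1.page.total").value); // prints "10"
```
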
I don't pretend to understand XFA (the standard is here), but I have had to deal with it recently, as it seems to be the direction Adobe is pushing. Until very recently (with the latest version of iText), the only reliable way of manipulating these forms was Adobe's LiveCycle Server product (epic $). So far, the only renderer I've used that can even display them is Adobe Acrobat.

I see XFA has been mentioned on the Mozilla wiki and on the mailing list and it looks like there is some trepidation in supporting it as it is still a proprietary standard. If this is still the case, does that mean that pull requests would also not be accepted relating to XFA?

I can totally understand why you would not want to support a proprietary and complicated standard, but thought I would bring it up anyway as it's frustrating being locked into Adobe Reader and LiveCycle Server products. Thanks!

@nacengineer

+1

@brendandahl
Contributor

Since this isn't currently part of the ISO standard there are no plans to implement it. Until it is in the ISO standard, it would probably be best if the support was added in some kind of "extension" to pdf.js in a separate repo.

If someone is serious and wants to put the work into adding it to mainstream pdf.js, they should ping me and I could look into the implications.

@brendensoares

I would be interested in working on an extension to support this. It seems pdf2json has already written an extension for pdf.js. Any comments on this?

@fbender

fbender commented Nov 16, 2015

This should be revisited. Even if XFA is not a de jure standard, it unfortunately is a de facto standard (along with AcroForms) with wide distribution and (medium) renderer support. Since browsers are deprecating NPAPI plugins, it will no longer be possible (for a lot of existing content, and content still being generated today) to fill out PDF forms within the browser, leaving users annoyed and/or websites force-downloading PDF files to be opened with another PDF reader. Supporting forms in PDF (both XFA and AcroForms) is a unique feature that has the potential to influence the market. Issue #1459 contains further reasoning.

@wanghaisheng

any update?

@timvandermeij
Contributor

Currently we're working on AcroForm support (in #7613). XFA support is not planned, but I think the AcroForms work should definitely help to simplify the implementation for XFA support.

@Vinmj

Vinmj commented Jan 5, 2018

Any updates on supporting LiveCycle PDF Documents?

timvandermeij added a commit to timvandermeij/pdf.js that referenced this issue Aug 23, 2020
…unit tests

Dynamic XFA is when the form's elements are not fixed, which is not
supported by PDF.js and tracked in mozilla#2373. If that is used, the fallback
bar should be triggered. We do support AcroForm and static XFA now, the
latter being when the form's elements are fixed, basically meaning that
it's equal to AcroForm (we have not found documents where the XFA
features were required for filling out the document).

This commit makes sure that detection of the three form types is in one
place, easy to understand and covered by unit tests so that our logic is
clearly documented, should we ever want to make additional changes for
this.
timvandermeij added a commit to timvandermeij/pdf.js that referenced this issue Aug 23, 2020
Good form type detection is important to get reliable telemetry and to
only show the fallback bar if a form cannot be filled out by the user.

PDF.js only supports AcroForm data, so XFA data is explicitly unsupported
(tracked in issue mozilla#2373). However, the previous form type detection
couldn't separate AcroForm and XFA well enough, causing form type
telemetry to be incorrect sometimes and the fallback bar to be shown for
forms that could in fact be filled out by the user.

The solution in this commit is found by studying the specification and
the form documents that are available to us. In a nutshell the rules are:

- There is XFA data if the `XFA` entry is a non-empty array or stream.
- There is AcroForm data if the `Fields` entry is a non-empty array and
  it doesn't consist of only digital signatures.

The digital signatures part was not handled in the old code, causing a
document with only XFA data to also be marked as having AcroForm data.
Moreover, the old code didn't check all the data types.

Now that AcroForm and XFA can be distinguished, the viewer is configured
to only show the fallback bar for documents that only have XFA data. If
a document also has AcroForm data, the viewer can use that to render the
form. We have not found documents where the XFA data was necessary in
that case.

Finally, we include unit tests to ensure that all cases are covered and
move the form type detection out of the `parse` function so that it's
only executed if the document information is actually requested
(potentially making initial parsing a tiny bit faster).
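
The two rules in this commit message, plus the fallback decision, can be sketched roughly as follows. This is a simplified model rather than the PDF.js source: the plain-object shapes, the `kind: "stream"` marker, the `fieldType === "Sig"` check for signature fields, and the names `detectFormInfo` and `shouldShowFallback` are all assumptions made for illustration.

```javascript
// Hypothetical stand-in for a PDF stream object: { kind: "stream", bytes: Uint8Array }.
function isNonEmptyStream(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    value.kind === "stream" &&
    value.bytes instanceof Uint8Array &&
    value.bytes.length > 0
  );
}

// acroFormDict models the document's AcroForm dictionary as a plain object.
function detectFormInfo(acroFormDict) {
  const info = { hasAcroForm: false, hasXfa: false };
  if (!acroFormDict) {
    return info;
  }

  // Rule 1: XFA data exists if `XFA` is a non-empty array or stream.
  const xfa = acroFormDict.XFA;
  info.hasXfa = (Array.isArray(xfa) && xfa.length > 0) || isNonEmptyStream(xfa);

  // Rule 2: AcroForm data exists if `Fields` is a non-empty array that
  // does not consist of only signature fields (assumed to be marked "Sig").
  const fields = acroFormDict.Fields;
  const hasFields = Array.isArray(fields) && fields.length > 0;
  const onlySignatures =
    hasFields && fields.every(field => Boolean(field) && field.fieldType === "Sig");
  info.hasAcroForm = hasFields && !onlySignatures;

  return info;
}

// The fallback bar is only needed when XFA is the sole usable form data.
function shouldShowFallback(info) {
  return info.hasXfa && !info.hasAcroForm;
}

const signedXfaDoc = { XFA: ["template", {}], Fields: [{ fieldType: "Sig" }] };
console.log(shouldShowFallback(detectFormInfo(signedXfaDoc))); // prints true
```

A document that carries both kinds of data yields `hasXfa` and `hasAcroForm` both true, so no fallback is shown and the AcroForm data is used for rendering.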
timvandermeij added a commit to timvandermeij/pdf.js that referenced this issue Aug 23, 2020
Good form type detection is important to get reliable telemetry and to
only show the fallback bar if a form cannot be filled out by the user.

PDF.js only supports AcroForm data, so XFA data is explicitly unsupported
(tracked in issue mozilla#2373). However, the previous form type detection
couldn't separate AcroForm and XFA well enough, causing form type
telemetry to be incorrect sometimes and the fallback bar to be shown for
forms that could in fact be filled out by the user.

The solution in this commit is found by studying the specification and
the form documents that are available to us. In a nutshell the rules are:

- There is XFA data if the `XFA` entry is a non-empty array or stream.
- There is AcroForm data if the `Fields` entry is a non-empty array and
  it doesn't consist of only invisible digital document signatures.

The digital signatures part was not handled in the old code, causing a
document with only XFA data to also be marked as having AcroForm data.
Moreover, the old code didn't check all the data types.

Now that AcroForm and XFA can be distinguished, the viewer is configured
to only show the fallback bar for documents that only have XFA data. If
a document also has AcroForm data, the viewer can use that to render the
form. We have not found documents where the XFA data was necessary in
that case.

Finally, we include unit tests to ensure that all cases are covered and
move the form type detection out of the `parse` function so that it's
only executed if the document information is actually requested
(potentially making initial parsing a tiny bit faster).
timvandermeij added a commit to timvandermeij/pdf.js that referenced this issue Aug 23, 2020
Good form type detection is important to get reliable telemetry and to
only show the fallback bar if a form cannot be filled out by the user.

PDF.js only supports AcroForm data, so XFA data is explicitly unsupported
(tracked in issue mozilla#2373). However, the previous form type detection
couldn't separate AcroForm and XFA well enough, causing form type
telemetry to be incorrect sometimes and the fallback bar to be shown for
forms that could in fact be filled out by the user.

The solution in this commit is found by studying the specification and
the form documents that are available to us. In a nutshell the rules are:

- There is XFA data if the `XFA` entry is a non-empty array or stream.
- There is AcroForm data if the `Fields` entry is a non-empty array and
  it doesn't consist of only document signatures.

The document signatures part was not handled in the old code, causing a
document with only XFA data to also be marked as having AcroForm data.
Moreover, the old code didn't check all the data types.

Now that AcroForm and XFA can be distinguished, the viewer is configured
to only show the fallback bar for documents that only have XFA data. If
a document also has AcroForm data, the viewer can use that to render the
form. We have not found documents where the XFA data was necessary in
that case.

Finally, we include unit tests to ensure that all cases are covered and
move the form type detection out of the `parse` function so that it's
only executed if the document information is actually requested
(potentially making initial parsing a tiny bit faster).
timvandermeij added a commit to timvandermeij/pdf.js that referenced this issue Aug 24, 2020
timvandermeij added a commit to timvandermeij/pdf.js that referenced this issue Aug 25, 2020
@nekohayo

In case this might somehow be useful, perhaps you could use PDFium's XFA implementation code as inspiration for an implementation in pdfjs? I have no idea if it actually works, but in any case it's probably better than nothing: https://github.com/chromium/pdfium/tree/master/xfa

@ludakhris

I hate to keep asking about this but has there been any progress on this???

@timvandermeij
Contributor

XFA support is under development and still experimental, but it is progressing.

@ludakhris

Thanks so much, @timvandermeij. Is there a timeframe or a pull request that interested parties can follow?

@timvandermeij
Contributor

The list of merged PRs is https://github.com/mozilla/pdf.js/pulls?q=is%3Apr+xfa+is%3Aclosed.

@marco-c
Contributor

marco-c commented Jun 11, 2021

I think we can now close this, as the main XFA-related work has landed.

@marco-c marco-c closed this as completed Jun 11, 2021
@garyvdm

garyvdm commented Jun 29, 2021

Thanks for all the hard work that went into this!

In what version of Firefox is this expected to land?

@marco-c
Contributor

marco-c commented Jun 29, 2021

Firefox 91

@marco-c
Contributor

marco-c commented Jun 29, 2021

You can already test it in Firefox Nightly, by setting the pdfjs.enableXfa preference to true.
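
For anyone who would rather not flip it by hand in about:config, the same preference can presumably be set from a `user.js` file in the Nightly profile directory. A minimal fragment, assuming the preference name above and the standard `user_pref` syntax that Firefox reads at startup:

```javascript
// user.js in the Firefox Nightly profile directory.
// Enables the experimental XFA rendering in the built-in PDF viewer.
user_pref("pdfjs.enableXfa", true);
```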

@garyvdm

garyvdm commented Jun 29, 2021

Got it working. Thanks!

@keyhan

keyhan commented Nov 14, 2021

Is it possible now to save the filled input in a separate XFA file, or is it merged into the PDF? Or is it not saved at all, like in Acrobat Reader?

@marco-c
Contributor

marco-c commented Nov 16, 2021

@keyhan the filled data should be saved as part of the PDF.
