Updated pdf fields don't show up when page is written #355

Closed
segevmalool opened this issue Jun 13, 2017 · 78 comments · Fixed by #412 or #1639
Labels: help wanted (We appreciate help everywhere - this one might be an easy start!), is-bug (From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF), workflow-forms (From a user's perspective, forms is the affected feature/workflow)

Comments

@segevmalool

segevmalool commented Jun 13, 2017

I'd like to use PyPDF2 to fill out a pdf form. So far, everything is going smoothly, including updating the field text. But when I write the pdf to a file, there is apparently no change in the form. Running this code:

import datetime as dt
from PyPDF2 import PdfFileReader, PdfFileWriter
import re

form701 = PdfFileReader('ABC701LG.pdf')
page = form701.getPage(0)
filled = PdfFileWriter()

#removing extraneous fields
r = re.compile('^[0-9]')
fields = sorted(list(filter(r.match, form701.getFields().keys())), key = lambda x: int(x[0:2]))

filled.addPage(page)
filled.updatePageFormFieldValues(filled.getPage(0), 
                                 {fields[0]: 'some filled in text'})

print(filled.getPage(0)['/Annots'][0].getObject()['/T'])
print(filled.getPage(0)['/Annots'][0].getObject()['/V'])

with open('test.pdf','wb') as fp:
    filled.write(fp)

prints text:

1 EFFECTIVE DATE OF THIS SCHEDULE <i.e. the field name>
some filled in text

But when I open up test.pdf, there is no added text on the page! Help!

@mwhit74
Contributor

mwhit74 commented Aug 22, 2017

I am having this same issue. The data does not show up in Adobe Reader unless you activate the field. The data does show up in Bluebeam but if you print, flatten, or push the pdf to a studio session all the data is lost.

When the file is opened in Bluebeam it automatically thinks that the user has made changes, denoted by the asterisk next to the file name in the tab.

If you export the fdf file from Bluebeam all the data is in the fdf file in the proper place.

If you change any attribute of the field in Bluebeam or Adobe, it will recognize the text in that field. It will print correctly and flatten correctly. I am not sure if it will push to the Bluebeam studio but I assume it will. You can also just copy and paste the text in the field back into that field and it will render correctly.

I have not found any help after googling around all day. I think it is an issue with PyPDF2 not "redrawing" the PDF correctly.

I have contacted Bluebeam support and they have returned saying essentially that it is not on their end.

@mwhit74
Contributor

mwhit74 commented Sep 6, 2017

Ok I think I have narrowed this down some by just comparing two different pdfs.

For reference, I am trying to read a pdf that was originally created by Bluebeam, use the updatePageFormFieldValues() function in PyPDF2 to push a bunch of data from a database into the form fields, and save. At some point we want to flatten these, and that is when it all goes wrong in Bluebeam. In Adobe it is messed up from the start, in that you don't see any values in the form fields until you scroll over them with the mouse.

It appears there is a problem with the stream object that follows the object(s) representing the text form field. See below.

This is a sample output from a pdf generated by PyPDF2 for a text form field:

26 0 obj<</Subtype/Widget/M(D:20160512102729-05'00')/NM(OEGVASQHFKGZPSZW)/MK<</IF<</A[0 0]>>>>/F 4/C[1 0 0]/Rect[227.157 346.3074 438.2147 380.0766]/V(Marshall CYG)/Type/Annot/FT/Tx/AP<</N 27 0 R>>/DA(0 0 0 rg /Helv 12 Tf)/T(Owner Group)/BS 29 0 R/Q 0/P 3 0 R>>
endobj
27 0 obj<</Type/XObject/Matrix[1 0 0 1 0 0]/Resources<</ProcSet[/PDF/Text]/Font<</Helv 28 0 R>>>>/Length 41/FormType 1/BBox[0 0 211.0577 33.76923]/Subtype/Form>>
stream
0 0 211.0577 33.76923 re W n /Tx BMC EMC 
endstream
endobj
28 0 

And if I back up and edit the same base file in Bluebeam, the output from that pdf for a text form field looks like this (I think the border object can be ignored):

16 0 obj<</Type/Annot/P 5 0 R/F 4/C[1 0 0]/Subtype/Widget/Q 0/FT/Tx/T(Owner Group)/MK<</IF<</A[0 0]>>>>/DA(0 0 0 rg /Helv 12 Tf)/AP<</N 18 0 R>>/M(D:20170906125217-05'00')/Rect[227.157 346.3074 438.2147 380.0766]/NM(OEGVASQHFKGZPSZW)/BS 17 0 R/V(Marshall CYG)>>
endobj
17 0 obj<</W 1/S/S/Type/Border>>
endobj
18 0 obj<</Type/XObject/Subtype/Form/FormType 1/BBox[0 0 211.0577 33.7692]/Resources<</ProcSet[/PDF/Text]/Font<</Helv 12 0 R>>>>/Matrix[1 0 0 1 0 0]/Length 106>>
stream
0 0 211.0577 33.7692 re W n /Tx BMC BT 0 0 0 rg /Helv 12 Tf 1 0 0 1 2 12.6486 Tm (Marshall CYG) Tj ET EMC 
endstream

Ok so the biggest difference here is the stream object at the end. The value /V(Marshall CYG) gets updated in the first object of each pdf, objects 26 and 16 respectively. However the stream object in the PyPDF2 generated pdf does not get updated and the stream object from Bluebeam does get updated.

In testing this theory I made a copy of the PyPDF2 pdf and manually edited the stream object in a text editor. I opened this new file in Bluebeam and flattened it. It worked. This also appears to work in Adobe Reader.

Now how to fix....
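
Comparing the two streams above, the only thing missing from the PyPDF2 output is the BT ... ET text block between /Tx BMC and EMC. As a rough sketch of what a regenerated stream would need to contain, the helper below rebuilds the Bluebeam-style content from a value, the field's /DA string, and the BBox size. The function name, the x/y text offsets, and the number formatting are assumptions made for illustration; this is not PyPDF2 code, and it ignores font encodings, alignment (/Q), and multiline fields.

```python
def build_text_appearance_stream(value, default_appearance, width, height,
                                 x_offset=2, y_offset=0):
    """Build the content of a text-field appearance stream (illustrative only)."""

    def num(x):
        # Print numbers without trailing zeros, e.g. 2.0 -> "2".
        return f"{x:.4f}".rstrip("0").rstrip(".")

    # Backslashes and parentheses must be escaped inside a PDF literal string.
    escaped = (value.replace("\\", "\\\\")
                    .replace("(", "\\(")
                    .replace(")", "\\)"))

    return (
        f"0 0 {num(width)} {num(height)} re W n "        # clip to the field box
        f"/Tx BMC BT {default_appearance} "               # font and color from /DA
        f"1 0 0 1 {num(x_offset)} {num(y_offset)} Tm "    # text position
        f"({escaped}) Tj ET EMC"                          # draw the value
    )


# Values taken from the Bluebeam object 18 shown above.
stream = build_text_appearance_stream(
    "Marshall CYG", "0 0 0 rg /Helv 12 Tf", 211.0577, 33.7692, y_offset=12.6486)
print(stream)
```

Whatever writes this back into the PDF also has to update the stream's /Length entry, which is why hand-editing in a text editor only works if the length is fixed up as well (or the viewer is forgiving).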

@ademidun
Contributor

A potential solution seems to be setting the Need Appearances flag.
Not yet sure how to implement in pypdf2 but these 2 links may provide some clues:
https://stackoverflow.com/questions/12198742/pdf-form-text-hidden-unless-clicked
https://forums.adobe.com/thread/305250

@ademidun
Contributor

ademidun commented Dec 20, 2017

Okay, I think I have figured it out. If you read section 12.7.2 (page 431) of the PDF 1.7 specification, you will see that you need to set the NeedAppearances flag of the AcroForm.

reader = PdfFileReader(open(infile, "rb"), strict=False)

if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)}
    )
writer = PdfFileWriter()

if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)}
    )

@Tromar44

Tromar44 commented Jan 24, 2018

ademidun - Can you elaborate on your suggested solution above? I too am having problems with pdf forms, edited with PyPDF2, not showing field values without clicking in the field. With the code example below, how do you "set the NeedAppearances flag of the Acroform"?

from PyPDF2 import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
input = PdfFileReader(open("myInputPdf.pdf", "rb"))

field_dictionary = {'Make': 'Toyota', 'Model': 'Tacoma'}

for pageNum in range(input.numPages):
    pageObj = input.getPage(pageNum)
    output.addPage(pageObj)
    output.updatePageFormFieldValues(pageObj, field_dictionary)

outputStream = open("myOutputPdf.pdf", "wb")
output.write(outputStream)

I tried adding in your IF statements but two problems arise: 1) NameObject and BooleanObject are not defined within my PdfFileReader "input" variable (I do not know how to do that) and 2) "/AcroForm" is not found within the PdfFileWriter object (my "output" variable).

Thanks for any help!

@ademidun
Contributor

@Tromar44 Preamble, make sure your form is interactive. E.g. The pdf must already have editable fields.

  1. Sorry forgot to mention you will have to import them:
    from PyPDF2.generic import BooleanObject, NameObject, IndirectObject
  2. Are you sure you are using output._root_object["/AcroForm"] or output.trailer["/Root"]["/AcroForm"] to access the "/AcroForm" key, and not just doing output["/AcroForm"]?

@Tromar44

Tromar44 commented Jan 25, 2018

@ademidun I thank you very much for your help but unfortunately I'm still not having any luck. To be clear, my simple test pdf form does have two editable fields and the script will populate them with "Toyota" and "Tacoma" respectively but those values are not visible unless I click on the field in the form (they become invisible again after the field loses focus). Here is the rewritten code that includes your suggestions and the results of running the code in inline comments.

from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject

infile = "myInputPdf.pdf"
outfile = "myOutputPdf.pdf"

reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]: # result: following "IF code is executed
    print(True)
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

writer = PdfFileWriter()
if "/AcroForm" in writer._root_object: # result: False - following "IF" code is NOT executed
    print(True)
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

if "/AcroForm" in writer._root_object["/AcroForm"]: # result: "KeyError: '/AcroForm'
    print(True)
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

if "/AcroForm" in writer.trailer["/Root"]["/AcroForm"]:  # result: AttributeError: 'PdfFileWriter' object has no attribute 'trailer'
    print(True)
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}

writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)

outputStream = open(outfile, "wb")
writer.write(outputStream)

I would definitely appreciate any more suggestions that you may have! Thank you very much!

@ademidun
Contributor

It may also be a browser issue. I don't have the links anymore but I remember reading about some issues when opening/creating a PDF on Preview on Mac or viewing it in the browser vs. using an Adobe app etc. Maybe if you google things like "form fields only showing on click" or "form fields only active on click using preview mac".

I also recommend reading the PDF spec link I posted. It's a bit dense, but a combination of all these should point you in the right direction.

@ademidun
Contributor

ademidun commented Jan 25, 2018

@Tromar44 Okay, I also found this snippet from my code, maybe it will help:

def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)
            })

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        # del writer._root_object["/AcroForm"]['NeedAppearances']
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

@Tromar44

Tromar44 commented Jan 25, 2018

@ademidun That worked perfectly (I'd high five you right now if I could)! Thank you very much! For anyone else interested, the following worked for me:

from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject

def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

infile = "input.pdf"
outfile = "output.pdf"

reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

writer = PdfFileWriter()
set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}

writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)

with open(outfile, "wb") as fp:
    writer.write(fp)

@kissmett

kissmett commented Feb 2, 2018

@ademidun you great!!!

@caver456

caver456 commented Feb 8, 2018

Just stumbled upon this solution - great work! A couple of issues I noticed - can you reproduce them? - won't have time to send test case details for a couple of days yet if you need them; we had been using the good-ol fdfgen-then-pdftk-subprocess-call method but would like to get away from the external pdftk dependency so pypdf2 is great:

  • text field values show on the generated pdf, but checkbox field values (populated with True or False) don't seem to show up
  • there are some vertical shifting issues in pypdf2 output as compared to pdftk output - a few of the fields get bumped up or down on the generated pdf

mwhit74 added a commit to mwhit74/PyPDF2 that referenced this issue Mar 30, 2018
Borrowed code from ademidun in the comment history and inserted it into the
proper location in the pdf.py module.

Made some changes to the function to make it a method of the class.

It appears to work. I don't have a huge test suite set up to check it.
@shurshilov

output.pdf
The fix does not work for every field in this file. For example, the first field (for the phone) does not show its value, while for some reason the second one and a few more fields do. So the fix is not working.

@saipawan999

Hi, I am facing the same issue. I have also tried setting NeedAppearances to true. When I edit the pdf using PyPDF2, some fields display correctly and some display only after I click on the field. Please help me out with this issue, as it is blocking my work.
Thank you

brandon-rhodes added a commit to brandon-rhodes/luca that referenced this issue Dec 11, 2018
The fields were showing up anyway, but not checkboxes, so I wanted to
advance to the full state of the art before trying more tweaking.
From: py-pdf/pypdf#355
@fvw222

fvw222 commented Feb 15, 2019

The code works great, but only for PDFs with one page. I tried splitting my PDF into several one-page files and looping through them. This worked great, but when I merged them back together, the click-to-reveal-text problem reemerged. The problem lies in the .addPage command of the PdfFileWriter.

for page_number in range(pdf.total_pages):
    pdf2.addPage(pdf.getPage(page_number))
    pdf2.updatePageFormFieldValues(pdf2.getPage(page_number), field_dictionary)

When I enter this and try to save, I get an error message: "TypeError: argument should be integer or None, not 'NullObject'". It seems that .addPage does not append to the file writer but treats each page as a separate object. Does someone have a solution for this?

Problem solved:
I figured out the problem was that I was running a protected PDF. I manually split the PDF and manually recombined it, and now it works great. The solution is often right in front of your nose.

@aatish29

aatish29 commented Feb 19, 2019

Hi All,

Thanks for your help.

I was able to make the text fields of the PDF form visible using PyPDF2, but I still could not figure out how to make the checkboxes of the PDF form visible (NeedAppearances).

Tried with this logic:
catalog = writer._root_object
if '/AcroForm' in catalog:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

Thanks in advance.

@karnh

karnh commented Mar 25, 2019

I found answer for checkboxes issue at https://stackoverflow.com/questions/35538851/how-to-check-uncheck-checkboxes-in-a-pdf-with-python-preferably-pypdf2.

def updateCheckboxValues(page, fields):

    for j in range(0, len(page['/Annots'])):
        writer_annot = page['/Annots'][j].getObject()
        for field in fields:
            if writer_annot.get('/T') == field:
                writer_annot.update({
                    NameObject("/V"): NameObject(fields[field]),
                    NameObject("/AS"): NameObject(fields[field])
                })

And as the comment says checked value could be anything depending on how the form was created. It was present in '/AP' for me. Which I extracted using list(writer_annot.get('/AP').get('/N').keys())[0].
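
One caveat with taking the first key of '/AP' '/N': the unchecked state is conventionally named /Off, and nothing guarantees it is listed second. A small sketch of a safer lookup is below; the helper name is made up for illustration, and plain strings stand in for the name objects you would get from PyPDF2.

```python
def checkbox_on_state(normal_appearance_keys):
    """Return the first appearance state that is not '/Off', or None."""
    for key in normal_appearance_keys:
        if str(key) != "/Off":
            return str(key)
    return None


# '/Off' happens to come first here, so keys()[0] would pick the wrong state.
print(checkbox_on_state(["/Off", "/Yes"]))
```

With the function above, the value passed as fields[field] would be the result of checkbox_on_state(writer_annot['/AP']['/N'].keys()) to check the box, or '/Off' to uncheck it.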

@madornetto

madornetto commented Mar 26, 2019

ok, I have implemented the above and it works on my pdf forms however once the form has been updated by the python it can't be run through the code a second time, as getFormFields returns an empty list. If I open the updated pdf in Adobe and add a space to the end of a form field value and save, run the code on the form again, getFormFields returns the correct list.

@ghost

ghost commented Apr 11, 2019

I am having the same problem: fields not visible fixed by above-mentioned set_need_appearances_writer() approach but getFormFields/pdftk dump_data_fields does not see them.

In addition, it looks like my fonts somehow get messed up: one of the fields is actually a barcode font. But, after going through PyPDF2 to make a copy with updated fields, the field that uses the barcode font in the original copy now uses one of the other fonts.

@willingham

I'm experiencing the same click-to-reveal-text issue. Here are a few interesting things I have noticed.

  • When using some of the irs forms e.g. https://www.irs.gov/pub/irs-pdf/f1095c.pdf, the issue doesn't happen.
  • When creating forms with PDFElement's 'Form Field Recognition' feature, the issue doesn't happen.
  • When manually adding fields using PDFElement, the issue happens sometimes.

@mjl

mjl commented Oct 8, 2019

it can't be run through the code a second time, as getFormFields returns an empty list.

For reference, I just stumbled on the same issue. The problem is that the generated pdf does not have an /AcroForm, and the easiest solution is probably to copy it over from the source file like this:

trailer = reader.trailer["/Root"]["/AcroForm"]
writer._root_object.update({
        NameObject('/AcroForm'): trailer
    })

@Nivatius

@mjl can you elaborate how to implement those lines?
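
For what it's worth, here is one way those lines could slot into the fill script used earlier in this thread. It is only a sketch: the function name and arguments are invented for illustration, it uses the old PyPDF2 names that appear above (PdfFileReader, PdfFileWriter, addPage, updatePageFormFieldValues), it handles only the first page, and it has not been checked against every PyPDF2 release.

```python
def fill_and_keep_acroform(infile, outfile, field_values):
    """Fill page 0 and copy /AcroForm from the source so fields stay discoverable."""
    # Imported here so the sketch can be loaded without PyPDF2 installed.
    from PyPDF2 import PdfFileReader, PdfFileWriter
    from PyPDF2.generic import NameObject

    # The source file has to stay open until the writer has written its output.
    with open(infile, "rb") as src:
        reader = PdfFileReader(src, strict=False)
        writer = PdfFileWriter()

        writer.addPage(reader.getPage(0))
        writer.updatePageFormFieldValues(writer.getPage(0), field_values)

        # The copy step from the comment above: reuse the reader's /AcroForm.
        if "/AcroForm" in reader.trailer["/Root"]:
            writer._root_object.update({
                NameObject("/AcroForm"): reader.trailer["/Root"]["/AcroForm"]
            })

        with open(outfile, "wb") as dst:
            writer.write(dst)
```

A call would look like fill_and_keep_acroform("input.pdf", "output.pdf", {"Make": "Toyota"}), with the file and field names taken from the earlier example.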

@zoiiieee

Has anyone figured out a solution to set /NeedAppearances for a pdf with multiple pages?

@sstamand

sstamand commented Jan 30, 2020

To include multiple pages in the output PDF, I added the remaining pages from the template to the output file:

if "/AcroForm" in pdf2._root_object:
    pdf2._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
    pdf2.addPage(pdf.getPage(0))
    pdf2.updatePageFormFieldValues(pdf2.getPage(0), student_data)
    # the added lines: copy the remaining template pages
    pdf2.addPage(pdf.getPage(1))
    pdf2.addPage(pdf.getPage(2))
    with open(cs_output, "wb") as outputStream:
        pdf2.write(outputStream)

@brzGatsu

Looking forward to your findings. We are currently stuck with pdftk due to this bug... would love to switch to PyPDF once the issue has been resolved. Thanks for taking a look at it!

@cryzed
Contributor

cryzed commented Feb 19, 2023

I read through the PDF docs and took a look at pdftk's source code: implementing appearance streams from scratch is possible, but quite tedious (especially if you want to support most common features). I think I'll go with the pdftk-route myself and use it until it becomes unsupported. If that ever happens, I'll take another look at it or hope that PDF is a dead format by that time.

However, I'll reopen my pull request -- the bug that prevents the proper creation of /Root/AcroForm does exist, and is fixed by my PR. With this at least, it's possible to render the first page correctly in Adobe Reader and all pages in most other PDF readers, without all these workarounds.

@brzGatsu

brzGatsu commented Feb 19, 2023

If I understand correctly, with your PR we could split our pdf into single pages, fill the forms individually and then merge them again? Would that work for Acrobat?

@pubpub-zz
Collaborator

@brzGatsu
I would have expected 'PdfWriter.append()' to provide some capability to correctly split documents with fields. Can you confirm?

@cryzed
Contributor

cryzed commented Feb 19, 2023

@brzGatsu no, that won't work. The issue is that a PDF reader is supposed to render the appearance streams for all annotations if /Root/AcroForm/NeedAppearances is set, when the document is opened. This rendering only happens at runtime (when Adobe Reader displays the file) and is not persisted, so you can't just split the pages and merge them later.

MartinThoma pushed a commit that referenced this issue Mar 5, 2023
PdfWriter.set_need_appearances_writer() (whether called directly or indirectly by PdfWriter.update_page_form_field_values()) fails to create the /Root/AcroForm object correctly when it doesn't already exist in the PdfWriter object.

See #355 (comment) for more details.

Fixes #355
@csears123

I am also experiencing this issue from Adobe Reader, where the NeedAppearances flag is only allowing the first page of the PDF to view the text in the fillable field, as @cryzed documented. On the second page if I click into the field the text becomes visible, only with the cursor focus.
Really hoping there is a solution to set the appearance-stream for every field if that is the best and most reliable method. I'll try the example above from @codigovision.

I haven't used pdftk but I will also explore that as an alternative.

@pubpub-zz pubpub-zz self-assigned this Mar 10, 2023
@csears123

The example below of adding/updating the appearance stream seemed to work for a 2-page PDF with fillable fields:
#355 (comment)
However, the issue persists when merging another fillable PDF form into a single PDF output. The first 2 pages from the original PDF still work correctly (after updating the appearance streams), but none of the fields on the 3rd page (from a different PDF) show their text; it is hidden behind the input until I click into that field (using Adobe Reader).
After a little more debugging, it seems the writer annotations on the 3rd page do not have an 'AP' attribute to begin with, and the call below returns None:
ap = writer_annot.get(AnnotationDictionaryAttributes.AP)
I'm not sure how to add the missing 'AP' appearance streams; it seems complicated.
I also ended up testing pdftk and it just worked on the first try, with no workarounds, issues, or bugs that needed addressing. I'll probably be scrapping pypdf for now, unless this critical issue is resolved.
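
In case it helps others debug the same symptom, a quick way to list which widgets have no normal appearance stream is sketched below. Plain dicts stand in for annotation dictionaries here, and the helper name is made up; with pypdf you would first resolve each entry of the page's /Annots to its dictionary, which is not shown and is left as an assumption.

```python
def widgets_missing_appearance(annotations):
    """Return the /T names of widget annotations that lack an /AP /N entry."""
    missing = []
    for annot in annotations:
        if annot.get("/Subtype") != "/Widget":
            continue  # links, comments, etc. are not form widgets
        ap = annot.get("/AP")
        if not ap or "/N" not in ap:
            # /T can be absent on a kid widget whose name lives on its parent.
            missing.append(annot.get("/T"))
    return missing


page_annots = [
    {"/Subtype": "/Widget", "/T": "Make", "/AP": {"/N": "stream 1"}},
    {"/Subtype": "/Widget", "/T": "Model"},
    {"/Subtype": "/Link"},
]
print(widgets_missing_appearance(page_annots))
```

Fields reported by a check like this are the ones a viewer has to render on its own, so they are the likely candidates for the click-to-reveal behavior.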

@binury

binury commented Apr 29, 2023

Confirming: this is still an issue. Filled annotations do not display as expected in MacOS Preview/{Mobile,} Safari. They do render in Chrome & Acrobat

@michael-hoang

Has anyone tried doing this for PyPDF2 v3.0.1?

@pubpub-zz
Collaborator

@michael-hoang
PyPDF2 is no longer supported. You have to upgrade to the latest version of pypdf.

@pubpub-zz
Collaborator

@binury
If the annotation is displayed in Chrome and Acrobat but not in MacOS Preview/Mobile, the issue is more likely in the latter programs. You may have to investigate it on your own. At least technically, I have no means to help you.

@binury

binury commented May 3, 2023

@binury

If the annotation is displayed in Chrome and Acrobat but not in MacOS Preview/Mobile, the issue is more likely in the latter programs. You may have to investigate it on your own. At least technically, I have no means to help you.

FWIW, there is a working implementation available currently (in pdftk, of course, as mentioned in a previous comment) that does not exhibit the same inconsistent appearance between readers.

I think it's a stretch to claim Pypdf fill is working if the PDFs won't be displayed correctly when viewed on an iPhone…
Sure Acrobat is technically the official PDF viewer and the most spec-compliant...
But nobody is going to use this if it means writing off a huge majority of users viewing the documents on their mobile phones. Not to mention MacOS users.

In any case… no help needed. just wanted to leave a comment earlier to let you guys know that it's broken still.

In lieu of having access to iOS for testing… There is also a working implementation of creating default appearances in a Node lib called pdf-annot. As a reference that may shed some light on why the default appearances in pypdf aren't working.

@alenards

alenards commented Jun 6, 2023

@binury - I'm having trouble finding a library named pdf-annot on npm.

Is it this: https://www.npmjs.com/package/ts-pdf-annot

@pubpub-zz
Collaborator

@binury - I'm having trouble finding a library named pdf-annot on npm.

Is it this: https://www.npmjs.com/package/ts-pdf-annot

Your library seems to be JavaScript. I do not think there is a link with pypdf (python)

@pubpub-zz
Collaborator

@binury

If the annotation is displayed in Chrome and Acrobat but not in MacOS Preview/Mobile, the issue is more likely in the latter programs. You may have to investigate it on your own. At least technically, I have no means to help you.

FWIW, there is a working implementation available currently (in pdftk, of course, as mentioned in a previous comment) that does not exhibit the same inconsistent appearance between readers.

I think it's a stretch to claim Pypdf fill is working if the PDFs won't be displayed correctly when viewed on an iPhone…
Sure Acrobat is technically the official PDF viewer and the most spec-compliant...
But nobody is going to use this if it means writing off a huge majority of users viewing the documents on their mobile phones. Not to mention MacOS users.

In any case… no help needed. just wanted to leave a comment earlier to let you guys know that it's broken still.

In lieu of having access to iOS for testing… There is also a working implementation of creating default appearances in a Node lib called pdf-annot. As a reference that may shed some light on why the default appearances in pypdf aren't working.

@binury
A PR is under submission to improve field rendering if you want to have a try

@alenards

alenards commented Jun 6, 2023

@pubpub-zz - I think @binury was offering that the referenced library's filled-in text fields show as expected without having to do these workarounds with /Root/AcroForm, adjusting the annotations, or setting /NeedAppearances. If the handling in that library is correct, it might help resolve the need for workarounds on the PdfWriter or these other approaches.

I'm a bit frustrated that the readthedocs for this library make it look like "filling out forms" works here without these workarounds noted. I was incredibly pumped up to see that the latest release for pypdf was on June 4. My use case is filling out forms and having them reliably render in PDF tools (macOS Preview being one of them).

My testing in ipython and macOS Preview led to the reported behavior (the values were there, but only when I clicked into the fields, and they would disappear after focus was lost). I'll try calling PdfWriter.set_need_appearances_writer() directly. But I'm probably going to have to look for another library in another language.

@pubpub-zz
Collaborator

@pubpub-zz - I think @binury was offering that the referenced library's filled-in text fields show as expected without having to do these workarounds with /Root/AcroForm, adjusting the annotations, or setting /NeedAppearances. If the handling in that library is correct, it might help resolve the need for workarounds on the PdfWriter or these other approaches.

I'm a bit frustrated that the readthedocs for this library make it look like "filling out forms" works here without these workarounds noted. I was incredibly pumped up to see that the latest release for pypdf was on June 4. My use case is filling out forms and having them reliably render in PDF tools (macOS Preview being one of them).

My testing in ipython and macOS Preview led to the reported behavior (the values were there, but only when I clicked into the fields, and they would disappear after focus was lost). I'll try calling PdfWriter.set_need_appearances_writer() directly. But I'm probably going to have to look for another library in another language.

Thanks for the comment: I may have read the message too quickly.
I understand your position and agree with it. The NeedAppearances workaround is not the best. As said above, I've produced PR #1864, which generates the display. It is a first release, but if you can test it, that would be great.

@alenards

alenards commented Jun 6, 2023

@pubpub-zz - I definitely appreciate the effort for all the folx keeping pypdf maintained. All of the PyPDF2, PyPDF4, all that is dizzying; so relieved to see this library active.

I'll see if I can look at #1864 - and I'll comment there on that PR thread.

Thanks again.

@thomasweiland93

Hello all, I have taken a look at #1864 and tested it with a PDF from my company. Unfortunately, the appearance doesn't look correct on iPhones etc.

The problem might be the following structure in my pdf:

Parent (writer_parent_annot)
{'/DA': '/MyriadPro-Regular 9 Tf 0 0.290 0.439 rg', '/FT': '/Tx', '/Kids': [IndirectObject(88, 0, 2973937844032), IndirectObject(85, 0, 2973937844032)], '/T': '08-Mail2', '/V': '[[email protected]]'}
Child (writer_annot)
{'/AP': {'/N': IndirectObject(89, 0, 2973937844032)}, '/F': 4, '/MK': {}, '/P': IndirectObject(49, 0, 2973937844032), '/Parent': IndirectObject(87, 0, 2973937844032), '/Rect': [246.47300000000001, 232.13200000000001, 513.09299999999996, 220.27699999999999], '/Subtype': '/Widget', '/Type': '/Annot'}

In this case the code only runs the else branch of update_page_form_field_values and sets AA.AS to /Off.

To get a correct view in the iPhone viewer I made some small changes in _writer.py (but just a messy fix for my current pdf).

I used the /DA, /FT and /V from writer_parent_annot and the rest from writer_annot.

Is this a known issue?
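
To illustrate the idea of taking /DA, /FT and /V from the parent when the widget itself lacks them, here is a minimal sketch that walks the /Parent chain. It is not the change made in _writer.py; the helper name is hypothetical, plain dicts stand in for the PDF dictionaries, and the /V value is a placeholder.

```python
def inherited_value(annotation, key, max_depth=32):
    """Look up key on the widget, then on its /Parent chain; None if absent."""
    node = annotation
    for _ in range(max_depth):  # guard against a cyclic /Parent chain
        if node is None:
            return None
        if key in node:
            return node[key]
        node = node.get("/Parent")
    return None


# Shaped like the parent/child pair shown above.
parent = {"/DA": "/MyriadPro-Regular 9 Tf 0 0.290 0.439 rg", "/FT": "/Tx",
          "/T": "08-Mail2", "/V": "placeholder value"}
child = {"/Parent": parent, "/Subtype": "/Widget", "/Type": "/Annot", "/F": 4}

merged = {key: inherited_value(child, key)
          for key in ("/DA", "/FT", "/V", "/Subtype")}
print(merged)
```

In a real file the /Parent entry is an indirect reference, so it would have to be resolved to its dictionary before this lookup can follow it.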

@Chrisd204

@ademidun That worked perfectly (I'd high five you right now if I could)! Thank you very much! For anyone else interested, the following worked for me:

from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject

def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

infile = "input.pdf"
outfile = "output.pdf"

reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

writer = PdfFileWriter()
set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}

writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)

with open(outfile, "wb") as fp:
    writer.write(fp)

This solution works!

@RomHartmann

RomHartmann commented Jan 23, 2024

This thread is crazy long, with a lot of old versions and red herrings.

As of now, this works for me with pypdf==3.17.4:

import pypdf
from pypdf import generic as pypdf_generic

# ... load file
reader = pypdf.PdfReader(file)
writer = pypdf.PdfWriter()

writer.set_need_appearances_writer()

for page_nr, page in enumerate(reader.pages):
    form_fields = page.get('/Annots')
    if form_fields:
        for field in form_fields.get_object():
            field_object = field.get_object()

            # any other logic
            field_object.update({
                pypdf_generic.NameObject('/V'): pypdf_generic.create_string_object(field_value)
            })
    writer.add_page(page)

# create your output file or stream
writer.write(output_file)

Conditions of my test:

  • Single-page PDF
  • Only text fields

@caver456

Thanks @RomHartmann, that definitely got closer.

In the end, as someone else pointed out, flattening is only part of the answer, and relying on NeedAppearances didn't quite do the trick, so modifying the stream directly gave much better results. Here's a Stack Overflow question spelling out these specific symptoms (not sure if they are exactly the same symptoms everyone else has been experiencing):

Flattened filled PDF form is 'of invalid format' on Android, and shows blank fields in Chrome extension

and the solution (for our use case, at least) that basically references another solution at https://stackoverflow.com/a/73655665/3577105 - thanks to @JeremyM4n for sure.

@WMiller256

WMiller256 commented Oct 15, 2024

@ademidun That worked perfectly (I'd high five you right now if I could)! Thank you very much! For anyone else interested, the following worked for me:

from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject

def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

infile = "input.pdf"
outfile = "output.pdf"

reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

writer = PdfFileWriter()
set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}

writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)

with open(outfile, "wb") as fp:
    writer.write(fp)

For future users: this may corrupt the PDF file. If that is the case for you, one possible solution is to move the lines

set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

to after the page-adding is complete, i.e.

from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject

def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

infile = "input.pdf"
outfile = "output.pdf"

reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

writer = PdfFileWriter()
field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}
writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)

set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

with open(outfile, "wb") as fp:
    writer.write(fp)
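
One thing that does change with the ordering is what IndirectObject(len(writer._objects), 0, writer) refers to. The toy model below is a rough sketch of that effect, assuming object numbers are 1-based positions in the writer's object list; `ToyWriter` and its methods are hypothetical stand-ins, not PyPDF2 code, and this is not a confirmed explanation of the corruption.

```python
class ToyWriter:
    """Hypothetical stand-in for a PDF writer's object table (not PyPDF2 code)."""

    def __init__(self):
        self.objects = []

    def add_object(self, obj):
        self.objects.append(obj)
        return len(self.objects)  # assumed 1-based object number

    def resolve(self, idnum):
        return self.objects[idnum - 1]


writer = ToyWriter()
writer.add_object({"/Type": "/Pages"})

# Reference built before the page is added: number == current object count.
early_ref = len(writer.objects)

writer.add_object({"/Type": "/Page"})

# Same expression evaluated after the page is added.
late_ref = len(writer.objects)

early_target = writer.resolve(early_ref)
late_target = writer.resolve(late_ref)
print(early_target, late_target)
```

In this model the same expression points at a different existing object depending on when it runs, which is why it is worth checking what /AcroForm actually resolves to in the written file.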

@stefan6419846
Collaborator

While this surely is an old issue, I recommend switching to the maintained pypdf instead, which might already solve this out of the box.
