Updated pdf fields don't show up when page is written #355

segevmalool · 2017-06-13T17:29:51Z

I'd like to use PyPDF2 to fill out a pdf form. So far, everything is going smoothly, including updating the field text. But when I write the pdf to a file, there is apparently no change in the form. Running this code:

import datetime as dt
from PyPDF2 import PdfFileReader, PdfFileWriter
import re

form701 = PdfFileReader('ABC701LG.pdf')
page = form701.getPage(0)
filled = PdfFileWriter()

#removing extraneous fields
r = re.compile('^[0-9]')
fields = sorted(list(filter(r.match, form701.getFields().keys())), key = lambda x: int(x[0:2]))

filled.addPage(page)
filled.updatePageFormFieldValues(filled.getPage(0), 
                                 {fields[0]: 'some filled in text'})

print(filled.getPage(0)['/Annots'][0].getObject()['/T'])
print(filled.getPage(0)['/Annots'][0].getObject()['/V'])

with open('test.pdf','wb') as fp:
    filled.write(fp)

prints text:

1 EFFECTIVE DATE OF THIS SCHEDULE <i.e. the field name>
some filled in text

But when I open up test.pdf, there is no added text on the page! Help!

mwhit74 · 2017-08-22T20:52:05Z

I am having this same issue. The data does not show up in Adobe Reader unless you activate the field. The data does show up in Bluebeam but if you print, flatten, or push the pdf to a studio session all the data is lost.

When the file is opened in Bluebeam it automatically thinks that the user has made changes, denoted by the asterisk next to the file name in the tab.

If you export the fdf file from Bluebeam all the data is in the fdf file in the proper place.

If you change any attribute of the field in Bluebeam or Adobe, it will recognize the text in that field. It will print correctly and flatten correctly. I am not sure if it will push to the Bluebeam studio but I assume it will. You can also just copy and paste the text in the field back into that field and it will render correctly.

I have not found any help after googling around all day. I think it is an issue with PyPDF2 not "redrawing" the PDF correctly.

I have contacted Bluebeam support and they have returned saying essentially that it is not on their end.

mwhit74 · 2017-09-06T18:03:01Z

Ok I think I have narrowed this down some by just comparing two different pdfs.

For reference I am trying to read a pdf that was originally created by Bluebeam, use the updatePageFormFieldValues() function in PyPDF2 to push a bunch of data from a database into the form fields, and save. At some point we want to flatten these and that is when it all goes wrong in Bluebeam. In Adobe it is messed up from the start in that you don't see any values in the form fields until you scroll over them with the mouse.

It appears there is a problem with the stream object that follows the object(s) representing the text form field. See below.

This is a sample output from a pdf generated by PyPDF2 for a text form field:

26 0 obj<</Subtype/Widget/M(D:20160512102729-05'00')/NM(OEGVASQHFKGZPSZW)/MK<</IF<</A[0 0]>>>>/F 4/C[1 0 0]/Rect[227.157 346.3074 438.2147 380.0766]/V(Marshall CYG)/Type/Annot/FT/Tx/AP<</N 27 0 R>>/DA(0 0 0 rg /Helv 12 Tf)/T(Owner Group)/BS 29 0 R/Q 0/P 3 0 R>>
endobj
27 0 obj<</Type/XObject/Matrix[1 0 0 1 0 0]/Resources<</ProcSet[/PDF/Text]/Font<</Helv 28 0 R>>>>/Length 41/FormType 1/BBox[0 0 211.0577 33.76923]/Subtype/Form>>
stream
0 0 211.0577 33.76923 re W n /Tx BMC EMC 
endstream
endobj
28 0

And if I back up and edit the same base file in Bluebeam, the output from that pdf for a text form field looks like this (I think the border object can be ignored):

16 0 obj<</Type/Annot/P 5 0 R/F 4/C[1 0 0]/Subtype/Widget/Q 0/FT/Tx/T(Owner Group)/MK<</IF<</A[0 0]>>>>/DA(0 0 0 rg /Helv 12 Tf)/AP<</N 18 0 R>>/M(D:20170906125217-05'00')/Rect[227.157 346.3074 438.2147 380.0766]/NM(OEGVASQHFKGZPSZW)/BS 17 0 R/V(Marshall CYG)>>
endobj
17 0 obj<</W 1/S/S/Type/Border>>
endobj
18 0 obj<</Type/XObject/Subtype/Form/FormType 1/BBox[0 0 211.0577 33.7692]/Resources<</ProcSet[/PDF/Text]/Font<</Helv 12 0 R>>>>/Matrix[1 0 0 1 0 0]/Length 106>>
stream
0 0 211.0577 33.7692 re W n /Tx BMC BT 0 0 0 rg /Helv 12 Tf 1 0 0 1 2 12.6486 Tm (Marshall CYG) Tj ET EMC 
endstream

Ok so the biggest difference here is the stream object at the end. The value /V(Marshall CYG) gets updated in the first object of each pdf, objects 26 and 16 respectively. However the stream object in the PyPDF2 generated pdf does not get updated and the stream object from Bluebeam does get updated.

In testing this theory I made a copy of the PyPDF2 pdf and manually edited the stream object in a text editor. I opened this new file in Bluebeam and flattened it. It worked. This also appears to work in Adobe Reader.
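
The manual edit described above amounts to putting the text-drawing operators back into the appearance stream. As a rough illustration only, the sketch below rebuilds the content bytes of the Bluebeam stream shown earlier from the field value, the /BBox size and the /DA string. The name build_text_appearance and the default text offsets are assumptions made for this example; this is not a PyPDF2 function, and it neither updates the stream's /Length nor handles text outside Latin-1.

```python
def _num(value: float) -> str:
    """Format a number without trailing zeros, as in the object dumps above."""
    return f"{value:.4f}".rstrip("0").rstrip(".")


def _escape(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_text_appearance(value, width, height, da, x=2.0, y=2.0) -> bytes:
    """Build content-stream bytes that clip to the box and draw `value`.

    `da` is the field's /DA string (colour and font), e.g. "0 0 0 rg /Helv 12 Tf".
    `x` and `y` are the text offsets inside the box (hypothetical defaults).
    """
    parts = [
        f"0 0 {_num(width)} {_num(height)} re W n",  # clip to the field box
        "/Tx BMC",                                   # begin marked content
        "BT",                                        # begin text object
        da,                                          # colour and font from /DA
        f"1 0 0 1 {_num(x)} {_num(y)} Tm",           # position the text
        f"({_escape(value)}) Tj",                    # show the field value
        "ET",
        "EMC",
    ]
    return " ".join(parts).encode("latin-1")
```

Called with the values from the dumps above, build_text_appearance("Marshall CYG", 211.0577, 33.7692, "0 0 0 rg /Helv 12 Tf", 2, 12.6486) reproduces the Bluebeam stream content without its trailing space.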

Now how to fix....

ademidun · 2017-12-16T04:41:45Z

A potential solution seems to be setting the Need Appearances flag.
Not yet sure how to implement in pypdf2 but these 2 links may provide some clues:
https://stackoverflow.com/questions/12198742/pdf-form-text-hidden-unless-clicked
https://forums.adobe.com/thread/305250

ademidun · 2017-12-20T23:56:41Z

Okay, I think I have figured it out. If you read section 12.7.2 (page 431) of the PDF 1.7 specification, you will see that you need to set the NeedAppearances flag of the Acroform.

reader = PdfFileReader(open(infile, "rb"), strict=False)

if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)}
    )
writer = PdfFileWriter()

if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)}
    )

Tromar44 · 2018-01-24T23:31:21Z

ademidun - Can you elaborate on your suggested solution above? I too am having problems with pdf forms, edited with PyPDF2, not showing field values without clicking in the field. With the code example below, how do you "set the NeedAppearances flag of the Acroform"?

from PyPDF2 import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
input = PdfFileReader(open("myInputPdf.pdf", "rb"))

field_dictionary = {'Make': 'Toyota', 'Model': 'Tacoma'}

for pageNum in range(input.numPages):
    pageObj = input.getPage(pageNum)
    output.addPage(pageObj)
    output.updatePageFormFieldValues(pageObj, field_dictionary)

outputStream = open("myOutputPdf.pdf", "wb")
output.write(outputStream)

I tried adding in your IF statements but two problems arise: 1) NameObject and BooleanObject are not defined within my PdfFileReader "input" variable (I do not know how to do that) and 2) "/AcroForm" is not found within the PdfFileWriter object (my "output" variable).

Thanks for any help!

ademidun · 2018-01-25T00:09:27Z

@Tromar44 Preamble, make sure your form is interactive. E.g. The pdf must already have editable fields.

Sorry forgot to mention you will have to import them:
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject
Are you sure you are using output._root_object["/AcroForm"] or output.trailer["/Root"]["/AcroForm"] to access the "/AcroForm" key, and not just doing output["/AcroForm"]?

Tromar44 · 2018-01-25T18:59:35Z

@ademidun I thank you very much for your help but unfortunately I'm still not having any luck. To be clear, my simple test pdf form does have two editable fields and the script will populate them with "Toyota" and "Tacoma" respectively but those values are not visible unless I click on the field in the form (they become invisible again after the field loses focus). Here is the rewritten code that includes your suggestions and the results of running the code in inline comments.

from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject

infile = "myInputPdf.pdf"
outfile = "myOutputPdf.pdf"

reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:  # result: True - following "IF" code is executed
    print(True)
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

writer = PdfFileWriter()
if "/AcroForm" in writer._root_object: # result: False - following "IF" code is NOT executed
    print(True)
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

if "/AcroForm" in writer._root_object["/AcroForm"]:  # result: KeyError: '/AcroForm'
    print(True)
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

if "/AcroForm" in writer.trailer["/Root"]["/AcroForm"]:  # result: AttributeError: 'PdfFileWriter' object has no attribute 'trailer'
    print(True)
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}

writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)

outputStream = open(outfile, "wb")
writer.write(outputStream)

I would definitely appreciate any more suggestions that you may have! Thank you very much!

ademidun · 2018-01-25T19:10:44Z

It may also be a browser issue. I don't have the links anymore but I remember reading about some issues when opening/creating a PDF on Preview on Mac or viewing it in the browser vs. using an Adobe app etc. Maybe if you google things like "form fields only showing on click" or "form fields only active on click using preview mac".

I also recommend reading the PDF spec link I posted. It's a bit dense, but a combination of all these should get you in the right direction.

ademidun · 2018-01-25T19:14:51Z

@Tromar44 Okay, I also found this snippet from my code, maybe it will help:

def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)
            })

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        # del writer._root_object["/AcroForm"]['NeedAppearances']
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

Tromar44 · 2018-01-25T19:35:41Z

@ademidun That worked perfectly (I'd high five you right now if I could)! Thank you very much! For anyone else interested, the following worked for me:

from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject

def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

infile = "input.pdf"
outfile = "output.pdf"

reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

writer = PdfFileWriter()
set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}

writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)

with open(outfile, "wb") as fp:
    writer.write(fp)

kissmett · 2018-02-02T08:51:20Z

@ademidun you are great!!!

caver456 · 2018-02-08T14:26:50Z

Just stumbled upon this solution - great work! A couple of issues I noticed - can you reproduce them? - won't have time to send test case details for a couple of days yet if you need them; we had been using the good-ol fdfgen-then-pdftk-subprocess-call method but would like to get away from the external pdftk dependency so pypdf2 is great:

text field values show on the generated pdf, but checkbox field values (populated with True or False) don't seem to show up
there are some vertical shifting issues in pypdf2 output as compared to pdftk output - a few of the fields get bumped up or down on the generated pdf

Borrowed code from ademidun in the comment history and inserted it into the proper location in the pdf.py module. Made some changes to the function to make it a method of the class. It appears to work. I don't have a huge test suite set up to check it.

shurshilov · 2018-04-09T12:26:11Z

output.pdf
The fix does not work for some fields in this file. For example, the first field (for the phone) does not work, while the second one and a few other fields do work for some reason, so the fix is not fully working.

saipawan999 · 2018-10-28T11:56:42Z

Hi, I am facing the same issue. I have also tried setting NeedAppearances to true. When I edit the PDF using PyPDF2, some fields display correctly and some display only after I click on the field. Please help me out with this issue, as it is blocking my work.
Thank you

The fields were showing up anyway, but not checkboxes, so I wanted to advance to the full state of the art before trying more tweaking. From: py-pdf/pypdf#355

fvw222 · 2019-02-15T12:03:26Z

The code works great, but only for PDFs with one page. I tried splitting my PDF into several one-page files and looped through them. This worked great, but when I merged them back together, the click-to-reveal-text problem reemerged. The problem lies in the .addPage command of the PdfFileWriter.

for page_number in range(pdf.total_pages):
    pdf2.addPage(pdf.getPage(page_number))
    pdf2.updatePageFormFieldValues(pdf2.getPage(page_number), field_dictionary)

When I enter this and try to save, I get an error message: "TypeError: argument should be integer or None, not 'NullObject'". It seems that .addPage does not append to the file writer but treats each page as a separate object. Does someone have a solution for this?

Problem solved:
I figured out the problem was that I was running it on a protected PDF. I manually split the PDF and manually recombined it, and now it works great. The solution is often right in front of your nose.

aatish29 · 2019-02-19T10:26:33Z

Hi All,

Thanks for your help.

I was able to make the text fields of the PDF form visible using PyPDF2, but I still could not figure out how to make the checkboxes of the PDF form visible (NeedAppearances).

Tried with this logic :
catalog = writer._root_object
if '/AcroForm' in catalog:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

Thanks in advance.

karnh · 2019-03-25T15:41:52Z

I found answer for checkboxes issue at https://stackoverflow.com/questions/35538851/how-to-check-uncheck-checkboxes-in-a-pdf-with-python-preferably-pypdf2.

def updateCheckboxValues(page, fields):

    for j in range(0, len(page['/Annots'])):
        writer_annot = page['/Annots'][j].getObject()
        for field in fields:
            if writer_annot.get('/T') == field:
                writer_annot.update({
                    NameObject("/V"): NameObject(fields[field]),
                    NameObject("/AS"): NameObject(fields[field])
                })

As the comment there says, the checked value can be anything, depending on how the form was created. For me it was present in '/AP', and I extracted it using list(writer_annot.get('/AP').get('/N').keys())[0].
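
Taking the first key of /AP /N can return /Off when that state happens to be listed first. A slightly more defensive lookup, written as a sketch against plain dict-like annotations, could skip /Off explicitly. The name checkbox_on_state is made up for this example, and it should only be called for checkbox widgets, since for text fields /N is a stream rather than a dictionary of states.

```python
from typing import Mapping, Optional


def checkbox_on_state(annot: Mapping) -> Optional[str]:
    """Return the name of the 'checked' appearance state, or None if unknown.

    `annot` is a resolved widget annotation dictionary (e.g. the result of
    page['/Annots'][j].getObject() in the loop above).
    """
    if "/AP" not in annot:
        return None
    appearance = annot["/AP"]
    if "/N" not in appearance:
        return None
    # The normal appearance of a checkbox maps state names to streams,
    # typically /Off plus one form-specific "on" name.
    for state in appearance["/N"].keys():
        if str(state) != "/Off":
            return str(state)
    return None
```

The returned name is what would be passed as the value for both /V and /AS in updateCheckboxValues() above.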

madornetto · 2019-03-26T18:37:25Z

ok, I have implemented the above and it works on my pdf forms however once the form has been updated by the python it can't be run through the code a second time, as getFormFields returns an empty list. If I open the updated pdf in Adobe and add a space to the end of a form field value and save, run the code on the form again, getFormFields returns the correct list.

ghost · 2019-04-11T21:51:19Z

I am having the same problem: fields not visible fixed by above-mentioned set_need_appearances_writer() approach but getFormFields/pdftk dump_data_fields does not see them.

In addition, it looks like my fonts somehow get messed up: one of the fields is actually a barcode font. But, after going through PyPDF2 to make a copy with updated fields, the field that uses the barcode font in the original copy now uses one of the other fonts.

willingham · 2019-10-03T16:02:55Z

I'm experiencing the same click-to-reveal-text issue. Here are a few interesting things I have noticed.

When using some of the irs forms e.g. https://www.irs.gov/pub/irs-pdf/f1095c.pdf, the issue doesn't happen.
When creating forms with PDFElement's 'Form Field Recognition' feature, the issue doesn't happen.
When manually adding fields using PDFElement, the issue happens sometimes.

mjl · 2019-10-08T14:06:15Z

it can't be run through the code a second time, as getFormFields returns an empty list.

For reference, I just stumbled on the same issue. The problem is that the generated pdf does not have an /AcroForm, and the easiest solution is probably to copy it over from the source file like this:

trailer = reader.trailer["/Root"]["/AcroForm"]
writer._root_object.update({
        NameObject('/AcroForm'): trailer
    })
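
For readers wondering where these lines fit, here is a minimal sketch of a complete fill-and-save flow using the old PyPDF2 1.x API that appears elsewhere in this thread. The function name fill_form and its parameters are made up for this example, and whether copying /AcroForm is enough depends on the PDF and the viewer.

```python
def fill_form(infile, outfile, values):
    """Copy all pages, fill the text fields, and carry /AcroForm over to the output."""
    # Imported lazily so that defining the function does not require PyPDF2.
    from PyPDF2 import PdfFileReader, PdfFileWriter
    from PyPDF2.generic import BooleanObject, NameObject

    with open(infile, "rb") as src:
        reader = PdfFileReader(src, strict=False)
        writer = PdfFileWriter()

        for page_number in range(reader.numPages):
            writer.addPage(reader.getPage(page_number))
            page = writer.getPage(page_number)
            if "/Annots" in page:  # pages without annotations have no fields
                writer.updatePageFormFieldValues(page, values)

        catalog = reader.trailer["/Root"]
        if "/AcroForm" in catalog:
            acroform = catalog["/AcroForm"]
            # Ask the viewer to regenerate the field appearances on open.
            acroform.update({NameObject("/NeedAppearances"): BooleanObject(True)})
            # Copy the form dictionary from the source file into the output.
            writer._root_object.update({NameObject("/AcroForm"): acroform})

        # Write while the source file is still open: the writer reads from it.
        with open(outfile, "wb") as dst:
            writer.write(dst)
```

A call such as fill_form("input.pdf", "output.pdf", {"Make": "Toyota", "Model": "Tacoma"}) would then mirror the examples posted earlier.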

Nivatius · 2019-10-19T13:12:43Z

@mjl can you elaborate how to implement those lines?

zoiiieee · 2020-01-19T11:20:30Z

Has anyone figured out a solution to set /NeedAppearances for a pdf with multiple pages?

sstamand · 2020-01-30T16:04:25Z

To include multiple pages in the output PDF, I added the pages from the template to the output file:

if "/AcroForm" in pdf2._root_object:
    pdf2._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
    pdf2.addPage(pdf.getPage(0))
    pdf2.updatePageFormFieldValues(pdf2.getPage(0), student_data)
    # the added lines: append the remaining template pages
    pdf2.addPage(pdf.getPage(1))
    pdf2.addPage(pdf.getPage(2))
    outputStream = open(cs_output, "wb")
    pdf2.write(outputStream)
    outputStream.close()

brzGatsu · 2023-02-18T23:21:58Z

Looking forward to your findings. We are currently stuck with pdftk due to this bug... would love to switch to PyPDF once the issue has been resolved. Thanks for taking a look at it!

cryzed · 2023-02-19T10:29:35Z

I read through the PDF docs and took a look at pdftk's source code: implementing appearance streams from scratch is possible, but quite tedious (especially if you want to support most common features). I think I'll go with the pdftk-route myself and use it until it becomes unsupported. If that ever happens, I'll take another look at it or hope that PDF is a dead format by that time.

However, I'll reopen my pull request -- the bug that prevents the proper creation of /Root/AcroForm does exist, and is fixed by my PR. With this at least, it's possible to render the first page correctly in Adobe Reader and all pages in most other PDF readers, without all these workarounds.

brzGatsu · 2023-02-19T11:03:00Z

If I understand correctly, with your PR we could split our pdf into single pages, fill the forms individually and then merge them again? Would that work for Acrobat?

pubpub-zz · 2023-02-19T11:20:24Z

@brzGatsu
I would have expected 'PdfWriter.append()' to provide some capability to correctly split documents with fields. Can you confirm it?

cryzed · 2023-02-19T11:37:28Z

@brzGatsu no, that won't work. The issue is that a PDF reader is supposed to render the appearance streams for all annotations if /Root/AcroForm/NeedAppearances is set, when the document is opened. This rendering only happens at runtime (when Adobe Reader displays the file) and is not persisted, so you can't just split the pages and merge them later.

PdfWriter.set_need_appearances_writer() (whether called directly or indirectly by PdfWriter.update_page_form_field_values()) fails to create the /Root/AcroForm object correctly when it doesn't already exist in the PdfWriter object. See #355 (comment) for more details. Fixes #355

csears123 · 2023-03-10T15:55:21Z

I am also experiencing this issue from Adobe Reader, where the NeedAppearances flag is only allowing the first page of the PDF to view the text in the fillable field, as @cryzed documented. On the second page if I click into the field the text becomes visible, only with the cursor focus.
Really hoping there is a solution to set the appearance-stream for every field if that is the best and most reliable method. I'll try the example above from @codigovision.

I haven't used pdftk but I will also explore that as an alternative.

csears123 · 2023-03-20T19:55:36Z

The example below of adding/updating the appearance-stream seemed to work for a 2-page PDF with fillable fields:
#355 (comment)
However the issue persists when merging another fillable PDF form into a single PDF output. The first 2 pages with the original PDF are still working correctly (after updating the appearance-streams), but all the fields on the 3rd page (from a different PDF) do not show the text, it is hidden behind the input until I click into that field (using Adobe Reader).
Doing a little more debugging, it seems the writer annotations on the 3rd page do not have an 'AP' attribute to begin with, and the function below returns 'None':
ap = writer_annot.get(AnnotationDictionaryAttributes.AP)
Not sure how to add the missing 'AP' appearance streams; it seems complicated.
I also ended up testing pdftk and it just worked first try, no workarounds, issues, or bugs that needed addressing. I'll probably be scrapping pypdf for now, unless this critical issue is resolved.
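
To find out which fields are affected before deciding on a workaround, a small diagnostic along these lines might help. It is only a sketch over plain dict-like annotation dictionaries; the name fields_missing_appearance is hypothetical and not part of pypdf.

```python
from typing import Iterable, List, Mapping


def fields_missing_appearance(annotations: Iterable[Mapping]) -> List[str]:
    """Return the /T names of widget annotations that have no /AP /N entry."""
    missing = []
    for annot in annotations:
        subtype = annot["/Subtype"] if "/Subtype" in annot else None
        if subtype != "/Widget":
            continue  # not a form field widget
        has_normal = "/AP" in annot and "/N" in annot["/AP"]
        if not has_normal:
            name = annot["/T"] if "/T" in annot else "<unnamed>"
            missing.append(str(name))
    return missing
```

The input would be the resolved annotation dictionaries of one page, i.e. each entry of the page's /Annots array after resolving the indirect reference.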

binury · 2023-04-29T19:27:48Z

Confirming: this is still an issue. Filled annotations do not display as expected in MacOS Preview/{Mobile,} Safari. They do render in Chrome & Acrobat

michael-hoang · 2023-05-03T07:38:59Z

Has anyone tried doing this for PyPDF2 v3.0.1?

pubpub-zz · 2023-05-03T07:58:35Z

@michael-hoang
PyPDF2 is no longer supported. You have to upgrade to the latest version of pypdf.

pubpub-zz · 2023-05-03T08:05:01Z

@binury
If the annotation is displayed in Chrome and Acrobat but not in macOS Preview/Mobile Safari, the issue is more likely in the latter programs. You may have to investigate on your own; technically, at least, I have no means to help you.

binury · 2023-05-03T17:31:28Z

FWIW there is a working implementation available currently (in pdftk, of course, as mentioned in a previous comment) that does not exhibit the same inconsistent appearance between readers.

I think it's a stretch to claim pypdf fill is working if the PDFs won't be displayed correctly when viewed on an iPhone…
Sure Acrobat is technically the official PDF viewer and the most spec-compliant...
But nobody is going to use this if it means writing off a huge majority of users viewing the documents on their mobile phones. Not to mention MacOS users.

In any case… no help needed. just wanted to leave a comment earlier to let you guys know that it's broken still.

In lieu of having access to iOS for testing… There is also a working implementation of creating default appearances in a Node lib called pdf-annot. As a reference that may shed some light on why the default appearances in pypdf aren't working.

alenards · 2023-06-06T00:11:16Z

@binury - I'm having trouble finding a library named pdf-annot on npm.

Is it this: https://www.npmjs.com/package/ts-pdf-annot

pubpub-zz · 2023-06-06T04:38:52Z

Your library seems to be JavaScript. I do not think there is a link with pypdf (python)

pubpub-zz · 2023-06-06T04:42:52Z

@binury
A PR to improve field rendering has been submitted, if you want to give it a try.

alenards · 2023-06-06T12:22:16Z

@pubpub-zz - I think @binury was pointing out that the referenced library's filled-in text fields show up as expected without having to do these workarounds with /Root/AcroForm, adjusting the annotations, or setting /NeedAppearances. If the handling in that library is correct, it might help resolve the need for workarounds on the PdfWriter or these other approaches.

I'm a bit frustrated that the readthedocs for this library make it look like "filling out forms" work here without these workarounds noted. I was incredibly pumped up to see that the latest release for pypdf was on June 4. My use case is filling out forms and having them reliably render in PDF tools (macOS Preview being one of them).

My testing in ipython and macOS Preview led to the reported behavior (the values were there, but only when I clicked into the fields, and they disappeared again after focus was lost). I'll try calling PdfWriter.set_need_appearances_writer() directly. But I'm probably going to have to look for another library in another language.

pubpub-zz · 2023-06-06T14:06:43Z

Thanks for the comment: I may have read the message too quickly.
I understand your position and agree with it. The NeedAppearances workaround is not the best. As said above, I've produced PR #1864, which generates the display. It is a first release, but if you can test it, that would be great.

alenards · 2023-06-06T15:28:58Z

@pubpub-zz - I definitely appreciate the effort for all the folx keeping pypdf maintained. All of the PyPDF2, PyPDF4, all that is dizzying; so relieved to see this library active.

I'll see if I can look at #1864 - and I'll comment there on that PR thread.

Thanks again.

thomasweiland93 · 2023-07-14T15:43:12Z

Hello everyone, I have taken a look at #1864 and tested it with a PDF from my company. Unfortunately, the appearance doesn't look correct on iPhones etc.

The problem might be the following structure in my pdf:

Parent (writer_parent_annot)
{'/DA': '/MyriadPro-Regular 9 Tf 0 0.290 0.439 rg', '/FT': '/Tx', '/Kids': [IndirectObject(88, 0, 2973937844032), IndirectObject(85, 0, 2973937844032)], '/T': '08-Mail2', '/V': '[[email protected]]'}
Child (writer_annot)
{'/AP': {'/N': IndirectObject(89, 0, 2973937844032)}, '/F': 4, '/MK': {}, '/P': IndirectObject(49, 0, 2973937844032), '/Parent': IndirectObject(87, 0, 2973937844032), '/Rect': [246.47300000000001, 232.13200000000001, 513.09299999999996, 220.27699999999999], '/Subtype': '/Widget', '/Type': '/Annot'}

In this case the code only runs the else branch of update_page_form_field_values and sets AA.AS to /Off.

To get a correct view in the iPhone viewer I made some small changes in _writer.py (but it is just a messy fix for my current pdf).

I used the /DA, /FT and /V from the writer_parent_annot and the rest from the writer_annot.

Is this a known issue?
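
The parent/child split above reflects that field attributes such as /FT, /V and /DA are inheritable: when a widget does not carry them, they come from an ancestor reached through /Parent. A minimal sketch of such a lookup over dict-like objects follows; the helper name inherited and the depth limit are assumptions for this example, not the actual change made in _writer.py.

```python
def inherited(annot, key, default=None, max_depth=32):
    """Look up `key` on the annotation, then on its /Parent chain.

    `max_depth` guards against malformed files with circular /Parent links.
    """
    node = annot
    for _ in range(max_depth):
        if key in node:
            return node[key]
        if "/Parent" not in node:
            return default
        node = node["/Parent"]  # move one level up the field tree
    return default
```

For the structure shown above, inherited(writer_annot, "/DA") would return the parent's /DA string, while keys present on the widget itself, such as /Rect, are returned directly.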

Chrisd204 · 2024-01-20T18:18:36Z

The solution that @Tromar44 posted above (set_need_appearances_writer() followed by updatePageFormFieldValues()) works!

RomHartmann · 2024-01-23T19:32:37Z

This thread is crazy long, with a lot of old versions and red herrings.

As of now, this works for me for pypdf==3.17.4

import pypdf
from pypdf import generic as pypdf_generic

# ... load file
reader = pypdf.PdfReader(file)
writer = pypdf.PdfWriter()

writer.set_need_appearances_writer()

for page_nr, page in enumerate(reader.pages):
    form_fields = page.get('/Annots')
    if form_fields:
        for field in form_fields.get_object():
            field_object = field.get_object()

            # any other logic
            field_object.update({
                pypdf_generic.NameObject('/V'): pypdf_generic.create_string_object(field_value)
            })
    writer.add_page(page)

# create your output file or stream
writer.write(output_file)

Conditions of my test: a single-page PDF with only text fields.

caver456 · 2024-01-25T15:33:33Z

Thanks @RomHartmann, that definitely got closer.

In the end, as someone else pointed out, flattening is only part of the answer. Relying on NeedAppearances didn't quite do the trick; modifying the stream directly gave much better results. Here's a Stack Overflow question spelling out these specific symptoms (not sure if they are exactly the same symptoms everyone else has been experiencing):

Flattened filled PDF form is 'of invalid format' on Android, and shows blank fields in Chrome extension

and the solution (for our use case, at least) that basically references another solution at https://stackoverflow.com/a/73655665/3577105 - thanks to @JeremyM4n for sure.

WMiller256 · 2024-10-15T04:41:48Z

@ademidun That worked perfectly (I'd high five you right now if I could)! Thank you very much! For anyone else interested, the following worked for me:

from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject

def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

infile = "input.pdf"
outfile = "output.pdf"

reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

writer = PdfFileWriter()
set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}

writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)

with open(outfile, "wb") as fp:
    writer.write(fp)

For future users: this may corrupt the PDF file. If that is the case for you, one possible solution is to move the lines

set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

to after all pages have been added, i.e.

from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject

def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

infile = "input.pdf"
outfile = "output.pdf"

reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

writer = PdfFileWriter()
field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}
writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)

set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

with open(outfile, "wb") as fp:
    writer.write(fp)
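
If it is unclear whether the flag reached the output file at all, one rough check is to scan the written bytes for it. This is only a heuristic sketch using the standard library: `need_appearances_flags` is a hypothetical helper name, and the scan assumes the form dictionary is stored uncompressed with a literal `true` or `false` value, so a flag inside a compressed object stream or behind an indirect reference would be missed.

```python
import re
from typing import List

# Matches the key followed by a literal PDF boolean, e.g. "/NeedAppearances true".
_FLAG = re.compile(rb"/NeedAppearances\s+(true|false)")


def need_appearances_flags(pdf_bytes: bytes) -> List[bool]:
    """Return every literal /NeedAppearances value found in the raw bytes."""
    return [match.group(1) == b"true" for match in _FLAG.finditer(pdf_bytes)]
```

For example, `need_appearances_flags(open("output.pdf", "rb").read())` returning an empty list suggests the flag was not written in plain form, which is worth ruling out before looking at the viewer.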

stefan6419846 · 2024-10-15T07:37:35Z

While this surely is an old issue, I recommend switching to the maintained pypdf instead, which might already solve this out of the box.
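
A minimal sketch of what that switch might look like for the snippets above, assuming pypdf 3.x or later. `fill_form` is a hypothetical helper name, not part of pypdf. `PdfWriter.append` and `PdfWriter.update_page_form_field_values` are pypdf methods, but their optional parameters differ between releases, so check them against your installed version. The reader and writer are passed in by the caller, so the sketch itself imports nothing from pypdf.

```python
from typing import Any, Mapping


def fill_form(reader: Any, writer: Any, values: Mapping[str, str]) -> Any:
    """Copy the pages of `reader` into `writer` and fill matching form fields.

    `reader` and `writer` are assumed to be pypdf.PdfReader and
    pypdf.PdfWriter instances; this helper is hypothetical.
    """
    # Copy the pages from the reader into the writer.
    writer.append(reader)
    for page in writer.pages:
        # Pages without annotations carry no form widgets to update.
        if "/Annots" in page:
            writer.update_page_form_field_values(page, dict(values))
    return writer
```

With pypdf installed, a call might look like `writer = fill_form(PdfReader("input.pdf"), PdfWriter(), {"Make": "Toyota", "Model": "Tacoma"})`, followed by `writer.write(fp)` as in the snippets above.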

mwhit74 mentioned this issue Sep 6, 2017

Fields created by updatePageFormFieldValues do not show in Adobe Acrobat and Foxit #300

Closed

mwhit74 mentioned this issue Mar 30, 2018

Rebooting PyPDF2 Maintenance #385

Closed

mwhit74 mentioned this issue Mar 30, 2018

BUG: Updated pdf fields don't show up when page is written #412

Merged

mwhit74 mentioned this issue Aug 7, 2018

After using updatePageFormFieldValues PyPDF2 cannot read fields with getFormTextFields #441

Closed

cryzed mentioned this issue Feb 16, 2023

BUG: Write /Root/AcroForm in set_need_appearances_writer #1639

Merged

MartinThoma closed this as completed in #1639 Mar 5, 2023

pubpub-zz self-assigned this Mar 10, 2023

csears123 mentioned this issue Mar 21, 2023

Update Text Field Values Generates Text Behind The Text Field #1618

Closed

brzGatsu mentioned this issue Jun 26, 2023

Forms: appearance generation broken for autosized fields, fields with center alignment and unicode text #1919

Open
