Updated pdf fields don't show up when page is written #355
Comments
I am having this same issue. The data does not show up in Adobe Reader unless you activate the field. The data does show up in Bluebeam but if you print, flatten, or push the pdf to a studio session all the data is lost. When the file is opened in Bluebeam it automatically thinks that the user has made changes, denoted by the asterisk next to the file name in the tab. If you export the fdf file from Bluebeam all the data is in the fdf file in the proper place. If you change any attribute of the field in Bluebeam or Adobe, it will recognize the text in that field. It will print correctly and flatten correctly. I am not sure if it will push to the Bluebeam studio but I assume it will. You can also just copy and paste the text in the field back into that field and it will render correctly. I have not found any help after googling around all day. I think it is an issue with PyPDF2 not "redrawing" the PDF correctly. I have contacted Bluebeam support and they have returned saying essentially that it is not on their end. |
Ok, I think I have narrowed this down some by comparing two different PDFs. For reference, I am trying to read a PDF that was originally created by Bluebeam, use the updatePageFormFieldValues() function in PyPDF2 to push a bunch of data from a database into the form fields, and save. At some point we want to flatten these, and that is when it all goes wrong in Bluebeam. In Adobe it is messed up from the start, in that you don't see any values in the form fields until you scroll over them with the mouse. It appears there is a problem with the stream object that follows the object(s) representing the text form field. See below. This is a sample output from a PDF generated by PyPDF2 for a text form field:
And if I back up and edit the same based file in Bluebeam the output from that pdf for a text form field looks like this (I think the border object can be ignored):
Ok, so the biggest difference here is the stream object at the end. The value /V(Marshall CYG) gets updated in the first object of each PDF, objects 26 and 16 respectively. However, the stream object in the PyPDF2-generated PDF does not get updated, while the stream object from Bluebeam does. To test this theory I made a copy of the PyPDF2 PDF and manually edited the stream object in a text editor. I opened this new file in Bluebeam and flattened it. It worked. This also appears to work in Adobe Reader. Now how to fix.... |
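To make the comparison above concrete: the stale part is the field's appearance stream, which holds the drawing operators a viewer uses when the field is not focused. Below is a minimal sketch of what a regenerated stream for a single-line text field could contain. The font name /Helv, the 12 pt size and the "2 2 Td" offset are assumptions for illustration; in a real file they would come from the field's /DA string and /Rect, and the helper names are hypothetical.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_text_appearance(value: str, font: str = "/Helv", size: float = 12) -> str:
    """Return content-stream operators that draw `value` as one line of text.

    The /Tx BMC ... EMC wrapper marks the variable-text part of a text field's
    appearance stream. Font, size and offset are assumed values (see lead-in).
    """
    return (
        "/Tx BMC\n"
        "q\n"
        "BT\n"
        f"{font} {size:g} Tf\n"
        "2 2 Td\n"
        f"({escape_pdf_string(value)}) Tj\n"
        "ET\n"
        "Q\n"
        "EMC"
    )


if __name__ == "__main__":
    # The value from the comparison above; the stream must show the same text as /V.
    print(build_text_appearance("Marshall CYG"))
```

If /V changes but this stream is left untouched, viewers that trust the stream keep drawing the old (or empty) text, which matches the symptoms described in this thread.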
A potential solution seems to be setting the Need Appearances flag. |
Okay, I think I have figured it out. If you read section 12.7.2 (page 431) of the PDF 1.7 specification, you will see that you need to set the NeedAppearances flag of the AcroForm.
reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)}
    )
writer = PdfFileWriter()
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)}
    ) |
ademidun - Can you elaborate on your suggested solution above? I too am having problems with PDF forms, edited with PyPDF2, not showing field values without clicking in the field. With the code example below, how do you "set the NeedAppearances flag of the Acroform"?
from PyPDF2 import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
input = PdfFileReader(open("myInputPdf.pdf", "rb"))
field_dictionary = {'Make': 'Toyota', 'Model': 'Tacoma'}
for pageNum in range(input.numPages):
    pageObj = input.getPage(pageNum)
    output.addPage(pageObj)
    output.updatePageFormFieldValues(pageObj, field_dictionary)
outputStream = open("myOutputPdf.pdf", "wb")
output.write(outputStream)
I tried adding in your IF statements but two problems arise: 1) NameObject and BooleanObject are not defined within my PdfFileReader "input" variable (I do not know how to do that) and 2) "/AcroForm" is not found within the PdfFileWriter object (my "output" variable). Thanks for any help! |
@Tromar44 As a preamble, make sure your form is interactive, i.e. the PDF must already have editable fields.
|
@ademidun I thank you very much for your help but unfortunately I'm still not having any luck. To be clear, my simple test PDF form does have two editable fields and the script will populate them with "Toyota" and "Tacoma" respectively, but those values are not visible unless I click on the field in the form (they become invisible again after the field loses focus). Here is the rewritten code that includes your suggestions, with the results of running the code in inline comments.
from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject
infile = "myInputPdf.pdf"
outfile = "myOutputPdf.pdf"
reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:  # result: True - following "IF" code is executed
    print(True)
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
writer = PdfFileWriter()
if "/AcroForm" in writer._root_object:  # result: False - following "IF" code is NOT executed
    print(True)
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
if "/AcroForm" in writer._root_object["/AcroForm"]:  # result: KeyError: '/AcroForm'
    print(True)
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
if "/AcroForm" in writer.trailer["/Root"]["/AcroForm"]:  # result: AttributeError: 'PdfFileWriter' object has no attribute 'trailer'
    print(True)
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}
writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)
outputStream = open(outfile, "wb")
writer.write(outputStream)
I would definitely appreciate any more suggestions that you may have! Thank you very much! |
It may also be a viewer issue. I don't have the links anymore, but I remember reading about some issues when opening or creating a PDF in Preview on Mac or viewing it in the browser vs. using an Adobe app etc. Maybe try googling things like "form fields only showing on click" or "form fields only active on click using preview mac". I also recommend reading the PDF spec link I posted; it's a bit dense, but a combination of all these should get you going in the right direction. |
@Tromar44 Okay, I also found this snippet from my code, maybe it will help:
def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)
            })
        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        # del writer._root_object["/AcroForm"]['NeedAppearances']
        return writer
    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer |
@ademidun That worked perfectly (I'd high five you right now if I could)! Thank you very much! For anyone else interested, the following worked for me:
from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject
def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer
    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer
infile = "input.pdf"
outfile = "output.pdf"
reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
writer = PdfFileWriter()
set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}
writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)
with open(outfile, "wb") as fp:
    writer.write(fp) |
@ademidun you great!!! |
Just stumbled upon this solution - great work! A couple of issues I noticed - can you reproduce them? - won't have time to send test case details for a couple of days yet if you need them; we had been using the good-ol fdfgen-then-pdftk-subprocess-call method but would like to get away from the external pdftk dependency so pypdf2 is great:
|
Borrowed code from ademidun in the comment history and inserted it into the proper location in the pdf.py module. Made some changes to the function to make it a method of the class. It appears to work. I don't have a huge test suite set up to check it.
output.pdf |
Hi, I am facing the same issue. I have also tried setting NeedAppearances to true. When I edit a PDF using PyPDF2, some fields display correctly and some display only after I click on the field. Please help me out with this issue, as it is blocking my work. |
The fields were showing up anyway, but not checkboxes, so I wanted to advance to the full state of the art before trying more tweaking. From: py-pdf/pypdf#355
The code works great, but only for PDFs with one page. I tried splitting my PDF into several one-page files and looping through them. This worked great, but when I merged them back together, the click-to-reveal-text problem reemerged. The problem lies in the .addPage command of the PdfFileWriter.
When I enter this and try to save, I get an error message: "TypeError: argument should be integer or None, not 'NullObject'" It seems that .addPage does not append to the PdfFileWriter but treats each page as a separate object. Does someone have a solution for this? Problem solved: |
Hi all, thanks for your help. I was able to view the text fields of the PDF form using PyPDF2, but I still could not figure out how to make the checkboxes of the PDF form visible (NeedAppearances). Tried with this logic : Thanks in advance. |
I found an answer for the checkbox issue at https://stackoverflow.com/questions/35538851/how-to-check-uncheck-checkboxes-in-a-pdf-with-python-preferably-pypdf2.
def updateCheckboxValues(page, fields):
    for j in range(0, len(page['/Annots'])):
        writer_annot = page['/Annots'][j].getObject()
        for field in fields:
            if writer_annot.get('/T') == field:
                writer_annot.update({
                    NameObject("/V"): NameObject(fields[field]),
                    NameObject("/AS"): NameObject(fields[field])
                })
And as the comment says, the checked value could be anything depending on how the form was created. It was present in '/AP' for me. Which I extracted using |
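The extraction code in the comment above did not survive, so the following is only a sketch of one way to list a checkbox's "checked" state names, not the commenter's original code. It assumes the usual layout in which the widget's '/AP' dictionary has a normal-appearance entry '/N' whose keys are the state names (for example '/Yes' and '/Off'); the helper name checkbox_on_states is hypothetical. It only uses dictionary-style access, so it works on an annotation dictionary obtained from page['/Annots'][j].getObject() as well as on plain dicts.

```python
def checkbox_on_states(annotation):
    """Return the appearance state names of a checkbox widget other than /Off.

    `annotation` is a dict-like widget annotation. Only meaningful for
    checkboxes and radio buttons, where /AP -> /N maps state names to streams.
    """
    if "/AP" not in annotation:
        return []
    appearance = annotation["/AP"]
    if "/N" not in appearance:
        return []
    # Every key except /Off is a state that shows the box as checked.
    return [str(name) for name in appearance["/N"] if name != "/Off"]


if __name__ == "__main__":
    # Plain-dict stand-in for a checkbox annotation (illustrative values only).
    fake_checkbox = {"/T": "Agree", "/AP": {"/N": {"/Yes": None, "/Off": None}}}
    print(checkbox_on_states(fake_checkbox))
```

The returned name is what would be passed as the field value to updateCheckboxValues, so that '/V' and '/AS' match a state that actually exists in the form.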
ok, I have implemented the above and it works on my pdf forms however once the form has been updated by the python it can't be run through the code a second time, as getFormFields returns an empty list. If I open the updated pdf in Adobe and add a space to the end of a form field value and save, run the code on the form again, getFormFields returns the correct list. |
I am having the same problem: fields not visible fixed by above-mentioned set_need_appearances_writer() approach but getFormFields/pdftk dump_data_fields does not see them. In addition, it looks like my fonts somehow get messed up: one of the fields is actually a barcode font. But, after going through PyPDF2 to make a copy with updated fields, the field that uses the barcode font in the original copy now uses one of the other fonts. |
I'm experiencing the same click-to-reveal-text issue. Here are a few interesting things I have noticed.
|
For reference, I just stumbled on the same issue. The problem is that the generated pdf does not have an /AcroForm, and the easiest solution is probably to copy it over from the source file like this:
|
@mjl can you elaborate how to implement those lines? |
anyone figure out a solution to set /NeedAppearance for a pdf with multiple pages? |
To include multiple pages in the output PDF, I added the remaining pages from the template to the output file....
if "/AcroForm" in pdf2._root_object:
    pdf2._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
pdf2.addPage(pdf.getPage(0))
pdf2.updatePageFormFieldValues(pdf2.getPage(0), student_data)
# added lines: copy the remaining template pages
pdf2.addPage(pdf.getPage(1))
pdf2.addPage(pdf.getPage(2))
outputStream = open(cs_output, "wb")
pdf2.write(outputStream)
outputStream.close() |
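The snippet above fills fields on the first page only and copies the other pages unchanged. A minimal sketch of the same idea for every page follows, using only the PyPDF2 1.x calls that already appear in this thread (numPages, getPage, addPage, updatePageFormFieldValues). The function name fill_all_pages is hypothetical, and whether a viewer then displays the values still depends on NeedAppearances or the appearance streams, as discussed above.

```python
def fill_all_pages(reader, writer, field_values):
    """Copy every page from `reader` into `writer` and fill fields on each page.

    `reader` and `writer` are expected to behave like PyPDF2's PdfFileReader
    and PdfFileWriter; `field_values` maps field names to text values.
    """
    for page_number in range(reader.numPages):
        writer.addPage(reader.getPage(page_number))
        page = writer.getPage(page_number)
        # updatePageFormFieldValues walks page['/Annots'], so pages without
        # annotations are skipped to avoid a KeyError.
        if "/Annots" in page:
            writer.updatePageFormFieldValues(page, field_values)
    return writer
```

With the names from the comment above, this would be called as fill_all_pages(pdf, pdf2, student_data) before writing pdf2 to the output file.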
Looking forward to your findings. We are currently stuck with pdftk due to this bug... would love to switch to PyPDF once the issue has been resolved. Thanks for taking a look at it! |
I read through the PDF docs and took a look at However, I'll reopen my pull request -- the bug that prevents the proper creation of |
If I understand correctly, with your PR we could split our pdf into single pages, fill the forms individually and then merge them again? Would that work for Acrobat? |
@brzGatsu |
@brzGatsu no, that won't work. The issue is that a PDF reader is supposed to render the appearance streams for all annotations if |
PdfWriter.set_need_appearances_writer() (whether called directly or indirectly by PdfWriter.update_page_form_field_values()) fails to create the /Root/AcroForm object correctly when it doesn't already exist in the PdfWriter object. See #355 (comment) for more details. Fixes #355
I am also experiencing this issue from Adobe Reader, where the NeedAppearances flag is only allowing the first page of the PDF to view the text in the fillable field, as @cryzed documented. On the second page if I click into the field the text becomes visible, only with the cursor focus. I haven't used pdftk but I will also explore that as an alternative. |
The example below of adding/updating the appearance-stream seemed to work for a 2-page PDF with fillable fields: |
Confirming: this is still an issue. Filled annotations do not display as expected in MacOS Preview/{Mobile,} Safari. They do render in Chrome & Acrobat |
Has anyone tried doing this for PyPDF2 v3.0.1? |
@michael-hoang |
@binury |
FWIW, there is a working implementation available currently (in pdftk, of course, as mentioned in a previous comment) that does not exhibit the same inconsistent appearance between readers. I think it's a stretch to claim pypdf fill is working if the PDFs won't be displayed correctly when viewed on an iPhone… In any case… no help needed. I just wanted to leave a comment earlier to let you guys know that it's still broken. In lieu of having access to iOS for testing… There is also a working implementation of creating default appearances in a Node lib called pdf-annot, as a reference that may shed some light on why the default appearances in pypdf aren't working. |
@binury - I'm having trouble filling only a library named Is it this: https://www.npmjs.com/package/ts-pdf-annot |
Your library seems to be JavaScript. I do not think there is a link with pypdf (python) |
@binury |
@pubpub-zz - I think @binury was offering that the referenced library's filled in textfields shown as expected without having to do these workarounds with I'm a bit frustrated that the readthedocs for this library make it look like "filling out forms" work here without these workarounds noted. I was incredibly pumped up to see that the latest release for My testing in |
Thanks for the comment : I may have read too quickly the message |
@pubpub-zz - I definitely appreciate the effort for all the folx keeping I'll see if I can look at #1864 - and I'll comment there on that PR thread. Thanks again. |
Hello all, I have taken a look at #1864 and tested it with a PDF from my company. Unfortunately, the appearance doesn't look correct on iPhones etc. The problem might be the following structure in my PDF: Parent (writer_parent_annot) In this case the code only runs the else branch of update_page_form_field_values and sets AA.AS to /Off. To get a correct view in the iPhone viewer I made some small changes in _writer.py (just a messy fix for my current PDF): I used the /DA, /FT and /V from writer_parent_annot and the rest from writer_annot. Is this a known issue? |
This solution works! |
This thread is crazy long, with a lot of old versions and red herrings. As of now, this works for me for
Conditions of my test:
|
Thanks @RomHartmann, that definitely got closer. In the end, as someone else pointed out, flattening is only part of the answer, and relying on NeedAppearances didn't quite do the trick, so modifying the stream directly gave much better results. Here's a stackoverflow question spelling out these specific symptoms (not sure if they are the exact same symptoms as everyone else has been experiencing): and the solution (for our use case, at least) that basically references another solution at https://stackoverflow.com/a/73655665/3577105 - thanks to @JeremyM4n for sure. |
For future users: this may corrupt the PDF file. If that is the case for you, one possible solution is to move the lines
set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
to after the page-adding is complete, i.e.
from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import BooleanObject, NameObject, IndirectObject
def set_need_appearances_writer(writer: PdfFileWriter):
    # See 12.7.2 and 7.7.2 for more information: http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer
    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer
infile = "input.pdf"
outfile = "output.pdf"
reader = PdfFileReader(open(infile, "rb"), strict=False)
if "/AcroForm" in reader.trailer["/Root"]:
    reader.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
writer = PdfFileWriter()
field_dictionary = {"Make": "Toyota", "Model": "Tacoma"}
writer.addPage(reader.getPage(0))
writer.updatePageFormFieldValues(writer.getPage(0), field_dictionary)
set_need_appearances_writer(writer)
if "/AcroForm" in writer._root_object:
    writer._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})
with open(outfile, "wb") as fp:
    writer.write(fp) |
While this surely is an old issue, I recommend switching to the maintained pypdf instead, which might already solve this out of the box. |
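For readers following that recommendation, here is a hedged sketch of what form filling could look like with pypdf. It assumes a recent pypdf release in which PdfWriter.append() and the auto_regenerate parameter of update_page_form_field_values() are available; older releases may differ. The function name fill_form_with_pypdf is hypothetical, and the file names in the usage note are placeholders.

```python
def fill_form_with_pypdf(infile, outfile, field_values):
    """Fill text fields on every page of `infile` and write the result to `outfile`.

    Sketch only: assumes a recent pypdf version (see the note above).
    The import is local so that the function can be defined without pypdf installed.
    """
    from pypdf import PdfReader, PdfWriter  # requires: pip install pypdf

    reader = PdfReader(infile)
    writer = PdfWriter()
    writer.append(reader)  # copy all pages into the writer
    for page in writer.pages:
        # Only pages that carry annotations can contain form widgets.
        if "/Annots" in page:
            writer.update_page_form_field_values(
                page, field_values, auto_regenerate=False
            )
    with open(outfile, "wb") as fp:
        writer.write(fp)
```

A call such as fill_form_with_pypdf("input.pdf", "output.pdf", {"Make": "Toyota", "Model": "Tacoma"}) mirrors the PyPDF2 examples earlier in the thread. Results in viewers such as macOS Preview or iOS should still be checked, since several comments above report differences between readers.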
I'd like to use PyPDF2 to fill out a pdf form. So far, everything is going smoothly, including updating the field text. But when I write the pdf to a file, there is apparently no change in the form. Running this code:
prints text:
But when I open up test.pdf, there is no added text on the page! Help!