"KeyError: 0" when merging PDF page that has content-stream-inline images #196

Closed
yourcelf opened this issue Apr 25, 2015 · 2 comments · Fixed by #665
Labels
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
key-error: Could be a bug, but also a robustness issue
PdfMerger: The PdfMerger component is affected

Comments

@yourcelf

Attempting to merge in a PDF page which has an image stored inline in the content stream raises an error. Here's an example which generates such a PDF with reportlab:

import PyPDF2
from reportlab.pdfgen import canvas

pdf1 = PyPDF2.PdfFileReader(open("test.pdf", 'rb'))

c = canvas.Canvas("watermark.pdf")
c.drawInlineImage("watermark.png", 200, 300, 100, 100)
c.showPage()
c.save()

watermark = PyPDF2.PdfFileReader(open("watermark.pdf", 'rb'))
pdf1.getPage(0).mergePage(watermark.getPage(0))

Error:

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    pdf1.getPage(0).mergePage(watermark.getPage(0))
  File "/venv/local/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2013, in mergePage
    self._mergePage(page2)
  File "/venv/local/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2058, in _mergePage
    page2Content, rename, self.pdf)
  File "/venv/local/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1963, in _contentStreamRename
    op = operands[i]
KeyError: 0

It looks like _contentStreamRename doesn't expect to see a data object in the content stream.
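A minimal sketch of how that indexing pattern can fail, using plain Python stand-ins rather than PyPDF2's real objects. It assumes the operands of an inline-image operation arrive as a dict instead of a list; the key names and values below are hypothetical and only illustrate the shape.

```python
# Hypothetical stand-ins; these are not PyPDF2's real data structures.
list_operands = ["/Im0"]                                   # ordinary operation: positional operands
dict_operands = {"settings": {"/W": 100}, "data": b"..."}  # assumed shape for an inline image

def index_all(operands):
    """Mimic the pattern from the traceback: operands[i] for i in range(len(operands))."""
    return [operands[i] for i in range(len(operands))]

print(index_all(list_operands))      # positional indexing works for a list

try:
    index_all(dict_operands)         # a dict has no key 0, so operands[0] fails
except KeyError as exc:
    print("KeyError:", exc)          # prints "KeyError: 0"
```

Positional indexing with `range(len(...))` is only valid for sequences, so any dict-shaped operands without integer keys would produce the same `KeyError: 0`.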

It's easy to work around this by replacing canvas.drawInlineImage with canvas.drawImage -- but the inline variant is a valid PDF that may occur in the wild.

@schurlix

Hi there, it seems I have a similar problem: it also occurs
in _contentStreamRename in pdf.py with a KeyError. My script
takes a bunch of input files and always places 6 input pages on
one page of the output file. There is no problem if I use only
one input file with many pages, but more than one input file
throws the error below.

In the case of exactly one input file, operands in
_contentStreamRename is a list and there is no problem. In
the case of more than one input file, operands is a dict, and
my patch iterates over the dict's keys to reach each value.

At the very bottom of my post you'll find the patch that fixed
the problem for me, but I am really not sure whether it breaks
other things. Anyway, here you go:

the call and the traceback:

$ python ../../bin/mypdf.py C150334053445EUR2015* urxn.pdf
Traceback (most recent call last):
  File "../../bin/mypdf.py", line 42, in <module>
    out.schurlimerge (page)
  File "../../bin/mypdf.py", line 36, in schurlimerge
    self.newpage.mergeRotatedScaledTranslatedPage (page, 90, 2/3.0, offset_x,offset_y)
  File "/usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.py", line 2462, in mergeRotatedScaledTranslatedPage
    ctm[2][0], ctm[2][1]], expand)
  File "/usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.py", line 2299, in mergeTransformedPage
    PageObject._addTransformationMatrix(page2Content, page2.pdf, ctm), ctm, expand)
  File "/usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.py", line 2255, in _mergePage
    page2Content, rename, self.pdf)
  File "/usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.py", line 2160, in _contentStreamRename
    op = operands[i]
KeyError: 0

the script:

#!/usr/bin/env python

import PyPDF2
import sys 

ppmm = 2.83465
a4xmm = 210 
a4ymm = 297 
a4xp = a4xmm * ppmm
a4yp = a4ymm * ppmm

def pages (filenames):
   for filename in filenames:
      inpdf = PyPDF2.PdfFileReader(file(filename,"rb"))
      for i in range (inpdf.numPages):
         yield (inpdf.getPage (i))

class Writer:

   def __init__ (self, outfile):
      self.outfile = outfile
      self.curpagenum = 0 
      self.writer = PyPDF2.pdf.PdfFileWriter ()
      self.newpage = None

   def write (self):
      self.writer.write (file (self.outfile, "wb"))

   def schurlimerge (self, page):
      if self.curpagenum % 6 == 0:
         if self.newpage: self.newpage.update ()
         self.newpage = self.writer.addBlankPage(a4xp, a4yp)
      if self.curpagenum % 2 == 0: offset_y = 0 
      else: offset_y = a4yp / 2 
      offset_x = a4xp / 3.0 * ((self.curpagenum / 2) % 3) + a4xp / 3.0 
      self.newpage.mergeRotatedScaledTranslatedPage (page, 90, 2/3.0, offset_x,offset_y)
      self.curpagenum += 1

out = Writer (sys.argv [-1])

for page in pages (sys.argv [1:-1]):
   out.schurlimerge (page)

out.write ()
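For readers following the layout arithmetic, here is a rough Python 3 sketch of the 6-up placement used in schurlimerge. The helper name slot_offsets and the constant names are hypothetical. The one deliberate change is `//` in place of `/`: the script was written for Python 2, where `/` on integers truncates, whereas in Python 3 it would yield fractional column indices.

```python
# Sketch of the 6-up placement arithmetic from the script above, in Python 3.
PPMM = 2.83465            # points per millimetre, as in the script
A4_X = 210 * PPMM         # A4 width in points
A4_Y = 297 * PPMM         # A4 height in points

def slot_offsets(n):
    """Return (offset_x, offset_y) for the n-th input page (hypothetical helper)."""
    column = (n // 2) % 3                        # `//` reproduces Python 2 integer division
    offset_x = A4_X / 3.0 * column + A4_X / 3.0  # shifted one column right, as in the script
    offset_y = 0 if n % 2 == 0 else A4_Y / 2     # even pages at the bottom, odd pages halfway up
    return offset_x, offset_y

for n in range(6):
    print(n, slot_offsets(n))
```

Pages 0 and 1 share the first column, 2 and 3 the second, 4 and 5 the third; page 6 wraps back to the first column on a fresh output page.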

my patch:

Index: usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.py
===================================================================
--- usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.py (revision 2224)
+++ usr/local/lib/python2.7/dist-packages/PyPDF2/pdf.py (revision 2223)
@@ -43,6 +43,7 @@
 __maintainer_email = "[email protected]"

 import string
+import types
 import math
 import struct
 import sys
@@ -2156,10 +2157,18 @@
             return stream
         stream = ContentStream(stream, pdf)
         for operands, operator in stream.operations:
-            for i in range(len(operands)):
-                op = operands[i]
-                if isinstance(op, NameObject):
-                    operands[i] = rename.get(op,op)
+            if type (operands) == types.ListType:
+                for i in range(len(operands)):
+                    op = operands[i]
+                    if isinstance(op, NameObject):
+                        operands[i] = rename.get(op,op)
+            elif type (operands) == types.DictType:
+                for i in operands:
+                    op = operands[i]
+                    if isinstance(op, NameObject):
+                        operands[i] = rename.get(op,op)
+            else:
+                raise KeyError ("type of operands is %s" % type (operands))
         return stream
     _contentStreamRename = staticmethod(_contentStreamRename)
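
The same idea can be sketched for Python 3, where types.ListType and types.DictType no longer exist. This is an illustration only, not the change that was eventually merged: NameObject here is a hypothetical str subclass standing in for PyPDF2's class, rename_operands is a made-up helper name, and the fallback raises TypeError rather than the KeyError used in the patch above.

```python
class NameObject(str):
    """Hypothetical stand-in for PyPDF2's NameObject (modelled as a str subclass)."""

def rename_operands(operands, rename):
    """Replace NameObject operands in place, for list-shaped or dict-shaped operands."""
    if isinstance(operands, list):
        keys = range(len(operands))    # positional indices
    elif isinstance(operands, dict):
        keys = list(operands)          # snapshot of the dict's keys
    else:
        raise TypeError("unsupported operands type: %s" % type(operands).__name__)
    for key in keys:
        op = operands[key]
        if isinstance(op, NameObject):
            operands[key] = rename.get(op, op)   # unchanged if there is no mapping
    return operands

rename = {NameObject("/Im0"): NameObject("/Im0-renamed")}
print(rename_operands([NameObject("/Im0"), 1.0], rename))
print(rename_operands({"settings": NameObject("/Im0"), "data": b"..."}, rename))
```

Using isinstance instead of comparing type(...) also accepts subclasses of list and dict, which the exact type comparison in the patch would reject.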

@josephernest

josephernest commented May 20, 2018

@mstamy2 Please include @schurlix's patch; it works and it solves an annoying problem ;)

At the time of writing (20180520_1659), pip install pypdf2 on Python 2.7 (64-bit) didn't include it.

@MartinThoma MartinThoma added is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF PdfReader The PdfReader component is affected labels Apr 7, 2022
@py-pdf py-pdf deleted a comment from claird Apr 7, 2022
@MartinThoma MartinThoma added PdfMerger The PdfMerger component is affected and removed PdfReader The PdfReader component is affected labels Apr 7, 2022
MartinThoma added a commit that referenced this issue Apr 7, 2022
Appeared when merging PDFs that have content-stream-inline images

This patch was provided by Georg Graf:
#196 (comment)
Thank you!

Closes #196
@MartinThoma MartinThoma added the key-error Could be a bug, but also a robustness issue label Aug 19, 2023