PdfReadWarning: Superfluous whitespace found in object header b'1' b'0' [pdf.py:1666] #576

kalkovid19 · 2020-08-24T16:11:48Z

Hi all,
I am converting a PDF file to text for processing. The code was working fine, but recently it started producing warnings like the one below, and no text is extracted:
PdfReadWarning: Superfluous whitespace found in object header b'1' b'0' [pdf.py:1666]

MCVE

from PyPDF2 import PdfReader

reader = PdfReader("TN_24.08.2020.pdf")
text = reader.pages[0].extract_text()
assert "Directorate" in text, text

My PDF file and processing code are attached:
pdf2txt.py.txt
TN_24.08.2020.pdf

Thanks in advance

luke4u · 2020-08-24T17:12:12Z

I got the same issue. Can anyone please advise how to resolve it?

Grazx · 2020-11-25T17:31:49Z

Well, I had the same problem while trying to stamp a template PDF (made by me) onto an existing one.
The solution:
I used Foxit Phantom to convert my template file from PDF 1.4 to PDF 1.7, and the "PdfReadWarning:" message stopped showing.

Hope it helps.

EDIT:
I forgot to mention that I also used the "PDF Optimizer" option in Phantom to "flatten" text and objects (more on that at https://www.foxitsoftware.com/blog/pdf-toolkit-pdf-optimizer/)

lmw0320 · 2022-02-16T02:03:47Z

@Grazx Hi, I have a large number of PDF files and want to extract the content of all of them, so your solution cannot be used as a routine operation. Do you have a code-based solution instead of processing each file by hand? Thanks.

MartinThoma · 2022-04-07T16:35:54Z

Could somebody add a minimal Python script that shows the issue with the given files?

JGMSPY · 2022-04-15T17:24:58Z

Here is the code that gave me the problem described in this issue. I added a more or less randomly chosen PDF file with 4 pages. I also got the error when using a single-page PDF file, even though the resulting file was OK.

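from PyPDF2 import PdfFileReader, PdfFileWriter

# Legacy PyPDF2 1.x API. The input files must stay open until the output is written.
with open('PythonHelp.pdf', 'rb') as template_file, open('FactuurModelIkke.pdf', 'rb') as watermark_file:
    template = PdfFileReader(template_file)
    watermark = PdfFileReader(watermark_file)
    output = PdfFileWriter()

    # Stamp the first page of the watermark file onto every template page.
    for i in range(template.getNumPages()):
        page = template.getPage(i)
        page.mergePage(watermark.getPage(0))
        output.addPage(page)

    with open('waterMarked_PDF.pdf', 'wb') as out_file:
        output.write(out_file)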

FactuurModelIkke.pdf
PythonHelp.pdf

Hope you can solve this.

Rapid1898-code · 2022-04-30T19:55:52Z

Hello, I have the same problem.
If anybody finds a solution for this, that would be great.

MartinThoma · 2022-04-30T20:54:06Z

@Rapid1898-code "me too" comments don't provide any value. They distract and prevent devs from working on the issue.

If you want to help, please provide a full minimal example:

code
pdf
traceback
environment (python version, py-pdf version)
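The environment details requested in the list above can be gathered with a few lines of standard-library Python. This is only a sketch: `environment_report` is a hypothetical helper name introduced here, the distribution name `PyPDF2` is assumed from the imports used in this thread, and `importlib.metadata` requires Python 3.8 or newer.

```python
import platform
import sys
from importlib import metadata


def environment_report(package="PyPDF2"):
    """Collect Python version, platform and the installed version of `package`."""
    try:
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this interpreter's environment.
        pkg_version = "not installed"
    v = sys.version_info
    return {
        "python": f"{v.major}.{v.minor}.{v.micro}",
        "platform": platform.platform(),
        package: pkg_version,
    }


# Print one "key: value" line per entry, ready to paste into a bug report.
for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Pasting this output next to the code, the PDF and the traceback covers the last item of the checklist.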

prz38573485 · 2022-05-18T09:56:50Z

I ran into the same issue.
PdfReadWarning: Superfluous whitespace found in object header b'225' b'0' [_reader.py:891]

pubpub-zz · 2022-05-18T17:17:32Z

All,
What you are reporting are warnings; they do not stop the program. PdfReadWarnings are not reported if you set strict=False when calling the PdfFileReader constructor. In version 1.27 the default value is True; in the current 2.0.0-dev branch (for the next release) it will be changed to False, so by default the warnings will disappear without any change to your programs 😉
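The suggestion above could be sketched roughly as follows. This is a minimal illustration, not the maintainers' code: it uses the `PdfReader` class from the MCVE in the first post (in PyPDF2 1.27 the class is `PdfFileReader`, which takes the same `strict` keyword), and `extract_first_page_text` with its `reader_cls` parameter are hypothetical names introduced here so the helper can be exercised without a real PDF.

```python
def extract_first_page_text(path, reader_cls=None):
    """Return the text of the first page, opening the PDF in non-strict mode."""
    if reader_cls is None:
        # Imported lazily so a stand-in reader class can be injected for testing.
        from PyPDF2 import PdfReader as reader_cls
    # strict=False: per the comment above, PdfReadWarnings are then not reported.
    reader = reader_cls(path, strict=False)
    return reader.pages[0].extract_text()
```

Calling `extract_first_page_text("TN_24.08.2020.pdf")` would then read the file from the first post. Whether text extraction itself succeeds is a separate question from the warning.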

JGMSPY · 2022-05-18T19:20:37Z

All,
With great help from the samples, I managed to get the routine working without problems. This was my first Python experience and I learned a lot. Thank you.

MartinThoma · 2022-06-06T11:47:15Z

I just checked with the current main branch and the minimal example from the first post - the issue is still there.

MartinThoma · 2022-07-09T13:56:15Z

I've just executed the MCVE example from the first post with the latest version of PyPDF2. Seems to work 🎉

MartinThoma added the is-bug label (from a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF) on Apr 7, 2022

MartinThoma added the is-robustness-issue label (from a user's perspective, this is about robustness) and the Has MCVE label (a minimal, complete and verifiable example helps a lot to debug / understand feature requests) on Apr 16, 2022

VictorCarlquist mentioned this issue May 3, 2022

WIP: support font CMAP to translate chars with TJ operator #858

Closed

MartinThoma closed this as completed Jul 9, 2022
