Not able to deal with errors in the bookmark structure #2236

PAlvesLancs · 2023-10-03T08:52:37Z

I am using the code below (https://stackoverflow.com/questions/54303318/read-all-bookmarks-from-a-pdf-document-and-create-a-dictionary-with-pagenumber-a) as a starting point, and it crashes on several PDFs (see an example here: https://easyupload.io/7fsipz).

Apparently, the PDF itself has some structural errors, but pypdf is not able to ignore them.
The output:

ValueError: not enough values to unpack (expected 3, got 1)
C:\Users\XXXXX\PycharmProjects\pythonProject\venv\Scripts\python.exe "C:\Google Drive\python\projects\Get bookmarks.py"
Traceback (most recent call last):
  File "C:\Google Drive\python\projects\Get bookmarks.py", line 24, in <module>
    bms = bookmark_dict(reader.outline, use_labels=False)
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\_reader.py", line 844, in outline
    return self._get_outline()
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\_reader.py", line 880, in _get_outline
    outline_obj = self._build_outline_item(node)
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\_reader.py", line 1054, in _build_outline_item
    outline_item = self._build_destination(title, dest)
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\_reader.py", line 1018, in _build_destination
    return Destination(title, page, Fit(fit_type=typ, fit_args=array))  # type: ignore
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\generic\_data_structures.py", line 1495, in __init__
    (
ValueError: not enough values to unpack (expected 3, got 2)
Process finished with exit code 1

The code (taken directly from the thread mentioned above):

from typing import Dict, Union
from pypdf import PdfReader

def bookmark_dict(
        bookmark_list, use_labels: bool = False
) -> Dict[Union[str, int], str]:  # maps page index (or page label) -> outline title
    result = {}
    for item in bookmark_list:
        if isinstance(item, list):  # a nested list holds the children of the preceding item
            result.update(bookmark_dict(item, use_labels))  # pass use_labels down to children
        else:
            page_index = reader.get_destination_page_number(item)  # uses the module-level `reader`
            page_label = reader.page_labels[page_index]
            if use_labels:
                result[page_label] = item.title
            else:
                result[page_index] = item.title
    return result

if __name__ == "__main__":
    folder = "x:\\"
    file = "TestPDF.pdf"
    reader = PdfReader(folder + file)
    bms = bookmark_dict(reader.outline, use_labels=False)  # raises ValueError on the affected PDFs
    for page_nb, title in sorted(bms.items(), key=lambda n: f"{str(n[0]):>5}"):
        print(f"{page_nb:>3}: {title}")
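On a pypdf version that predates the robustness fix for this issue, a batch run over many PDFs can at least be kept alive by catching the error around `reader.outline`. The following is a minimal sketch, assuming the only failure of interest is the `ValueError` shown in the traceback above; `outline_or_none` is a hypothetical helper name, and it only needs an object with an `outline` attribute, so it does not import pypdf itself.

```python
from typing import Any, List, Optional


def outline_or_none(reader: Any) -> Optional[List[Any]]:
    """Return reader.outline, or None if building it raises ValueError.

    `reader` is expected to be a pypdf PdfReader, but any object with an
    `outline` attribute works (hypothetical helper, not part of pypdf).
    """
    try:
        # Per the traceback above, accessing `outline` builds the outline
        # via _get_outline(), which is where the malformed /XYZ entry fails.
        return reader.outline
    except ValueError as exc:
        # Only ValueError is caught here; other malformations may raise
        # different exception types and would still propagate.
        print(f"skipping outline: {exc}")
        return None
```

In the script above, `bms = bookmark_dict(reader.outline, use_labels=False)` could become `outline = outline_or_none(reader)`, followed by a call to `bookmark_dict(outline)` only when `outline` is not `None`.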

The PDF file that is giving me an error can be found here:

Thanks guys!


pubpub-zz · 2023-10-03T17:40:42Z

The PDF has outline items whose /XYZ destination has no top parameter. This does not conform to the PDF reference, but Acrobat Reader can still process such files. I extended the test to also remove the left parameter, and it still passes; I've added robustification for that case too.
Cleaned test file below:
tt1.pdf
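The failure mode described here can be reproduced in plain Python: an /XYZ destination is expected to carry three operands (left, top, zoom), and a fixed three-way unpack of a shorter array raises exactly the `ValueError` from the traceback. The sketch below contrasts that with a tolerant variant that pads missing operands with `None`, which the PDF specification reads as "keep the current value". It is an illustration only, not pypdf's actual patch; `strict_xyz` and `tolerant_xyz` are hypothetical names, and padding assumes the missing operands are the trailing ones (a two-operand array cannot tell a missing top apart from a missing zoom by position alone).

```python
from typing import Optional, Sequence, Tuple

Number = Optional[float]


def strict_xyz(args: Sequence[Number]) -> Tuple[Number, Number, Number]:
    # Mirrors the failing pattern: a fixed unpack into (left, top, zoom).
    left, top, zoom = args
    return left, top, zoom


def tolerant_xyz(args: Sequence[Number]) -> Tuple[Number, Number, Number]:
    # Hypothetical helper: keep at most three operands and pad any missing
    # trailing ones with None ("leave unchanged" in the PDF spec).
    padded = (list(args) + [None, None, None])[:3]
    left, top, zoom = padded
    return left, top, zoom


if __name__ == "__main__":
    try:
        strict_xyz([72.0, 720.0])
    except ValueError as exc:
        print(exc)  # not enough values to unpack (expected 3, got 2)
    print(tolerant_xyz([72.0, 720.0]))  # (72.0, 720.0, None)
```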


pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Oct 3, 2023

ROB: XYZ destination to cope with missing left and top param

73a3134

closes py-pdf#2236

pubpub-zz mentioned this issue Oct 3, 2023

ROB: XYZ destination to cope with missing left and top param #2237

Merged

MartinThoma closed this as completed in #2237 Oct 7, 2023

MartinThoma pushed a commit that referenced this issue Oct 7, 2023

ROB: XYZ destination to cope with missing left and top param (#2237)

4b090ba

Closes #2236
