
This issue was moved to a discussion.

get_toc(simple=False) returns a 'to' point whose coordinates are not based on the top-left origin #3412

Closed
charosen opened this issue Apr 25, 2024 · 6 comments
Labels
not a bug (not a bug / user error / unable to reproduce)

Comments

@charosen

charosen commented Apr 25, 2024

Description of the bug

I have a PDF with outlines (titles) and the following content:

1.1 Hello World

1.1.1. first step to hello world

content

I want to extract all the outline entries (titles) and their coordinates on the page.

When I use get_toc(simple=False), fitz returns a TOC list:

[[1,
  '1.1 Hello world',
  1,
  {'kind': 4,
   'xref': 41631,
   'page': 0,
   'to': Point(0.0, 761.8583),
   'zoom': 0.0,
   'nameddest': '_OPENTOPIC_TOC_PROCESSING_d13321e25969',
   'collapse': True,
   'color': (0.0, 0.0, 0.0)}],
 [2,
  '1.1.1 first step to hello world',
  1,
  {'kind': 4,
   'xref': 41632,
   'page': 0,
   'to': Point(0.0, 731.8583),
   'zoom': 0.0,
   'nameddest': '_OPENTOPIC_TOC_PROCESSING_d13321e25972',
   'collapse': True,
   'color': (0.0, 0.0, 0.0)}],
...
]

The returned 'to' points are not based on a top-left origin but on a bottom-left origin: '1.1 Hello world' is above '1.1.1 first step to hello world', yet the y value of Point(0.0, 761.8583) is greater than that of Point(0.0, 731.8583).

They seem to be PDF coordinates, not (Py)MuPDF coordinates.

How can I convert these TOC 'to' points to top-left-origin coordinates?
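A minimal sketch of the conversion, assuming the 'to' values really are in PDF user space (origin bottom-left, y growing upwards) and that the page is not rotated: subtract the left edge from x and measure y downwards from the top edge of the page's /MediaBox. The helper name `pdf_to_top_left` and the A4 MediaBox are hypothetical. PyMuPDF also documents `Page.transformation_matrix` for mapping PDF space to MuPDF space (for example `point * page.transformation_matrix`), which may be preferable in practice; the plain-Python version below only makes the arithmetic explicit.

```python
from typing import NamedTuple, Sequence, Tuple


class TopLeftPoint(NamedTuple):
    x: float
    y: float


def pdf_to_top_left(
    point: Sequence[float],
    mediabox: Tuple[float, float, float, float],
) -> TopLeftPoint:
    """Convert a bottom-left-origin PDF point to top-left-origin coordinates.

    Assumptions: the page is not rotated, and ``mediabox`` is the raw PDF
    /MediaBox array (x0, y0, x1, y1) with y growing upwards.
    """
    x0, _y0, _x1, y1 = mediabox
    px, py = point[0], point[1]
    # Shift x so the left edge is 0; measure y downwards from the top edge.
    return TopLeftPoint(px - x0, y1 - py)


# Hypothetical A4 MediaBox; replace it with the real page's values.
A4 = (0.0, 0.0, 595.276, 841.89)
heading = pdf_to_top_left((0.0, 761.8583), A4)     # "1.1 Hello world"
subheading = pdf_to_top_left((0.0, 731.8583), A4)  # "1.1.1 first step ..."
print(heading, subheading)
```

After the flip, the heading that is higher on the page gets the smaller y value, which is the ordering expected in a top-left system.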

How to reproduce the bug

import fitz

document = fitz.open('mypdf.pdf')

toc = document.get_toc(simple=False)

toc results:

[[1,
  '1.1 Hello world',
  1,
  {'kind': 4,
   'xref': 41631,
   'page': 0,
   **'to': Point(0.0, 761.8583),**
   'zoom': 0.0,
   'nameddest': '_OPENTOPIC_TOC_PROCESSING_d13321e25969',
   'collapse': True,
   'color': (0.0, 0.0, 0.0)}],
 [2,
  '1.1.1 first step to hello world',
  1,
  {'kind': 4,
   'xref': 41632,
   'page': 0,
   **'to': Point(0.0, 731.8583),**
   'zoom': 0.0,
   'nameddest': '_OPENTOPIC_TOC_PROCESSING_d13321e25972',
   'collapse': True,
   'color': (0.0, 0.0, 0.0)}],
...
]
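To get from this structure to the list of titles with coordinates described at the top, the detailed entries can be flattened. This is a minimal sketch that assumes each entry has the shape [level, title, page, link_dict] shown above; `toc_rows` is a hypothetical helper name, and plain tuples stand in for fitz.Point in the sample data. The coordinates are passed through unchanged, so they keep whatever origin get_toc used.

```python
from typing import Any, List, Optional, Tuple

# One row per outline entry: (level, title, page_number, x, y); x and y are
# None when the entry's link dictionary has no "to" point.
Row = Tuple[int, str, int, Optional[float], Optional[float]]


def toc_rows(toc: List[list]) -> List[Row]:
    """Flatten get_toc(simple=False)-style output into (level, title, page, x, y).

    The "to" value is read by index, so a fitz.Point or a plain (x, y) tuple
    both work. Coordinates are NOT converted to a top-left origin here.
    """
    rows: List[Row] = []
    for entry in toc:
        level, title, page = entry[0], entry[1], entry[2]
        link = entry[3] if len(entry) > 3 else {}
        to: Any = link.get("to")
        if to is None:
            rows.append((level, title, page, None, None))
        else:
            rows.append((level, title, page, float(to[0]), float(to[1])))
    return rows


# Sample data mirroring the output above (tuples instead of Point objects).
sample_toc = [
    [1, "1.1 Hello world", 1, {"kind": 4, "page": 0, "to": (0.0, 761.8583)}],
    [2, "1.1.1 first step to hello world", 1,
     {"kind": 4, "page": 0, "to": (0.0, 731.8583)}],
]
for row in toc_rows(sample_toc):
    print(row)
```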

PyMuPDF version

1.24.1

Operating system

Linux

Python version

3.9

@JorjMcKie
Collaborator

You did not provide the reproducing file.

@charosen
Author

Sorry, I cannot upload the mypdf.pdf file for some reason.

However, it is pretty clear that the 'to' point in the TOC is based on a bottom-left origin, not a top-left origin.

I simply want to convert the 'to' points to top-left coordinates.

@JorjMcKie
Collaborator

It is not at all clear:
What are we even looking at? Where do the "**" come from?
The TOC entries seem to point to named destinations - are there errors in the PDF? Or in our code?
Did the PDF creator want to point to the bottom left point 🤷‍♂️?
Have you tried to look at the PDF's names dictionary?

Again: without the file in question we are already wasting time.
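For context on the names-dictionary question: per the PDF specification, a named destination such as _OPENTOPIC_TOC_PROCESSING_d13321e25969 resolves (through the catalog's /Dests dictionary or the /Dests name tree under /Names) to an explicit destination, possibly wrapped in a dictionary with a /D entry. For /XYZ that is the array [page /XYZ left top zoom], where left and top are in default user space with a bottom-left origin and null means "leave unchanged". If the 'to' point were taken straight from such an array, the numbers in this issue would look exactly as reported; that is an assumption, not something confirmed in this thread. The sketch below parses such an array from PDF source text; how that text is obtained is left open, and `parse_xyz_dest` and the sample strings are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# A number or the PDF keyword null.
_NUM = r"([-+]?(?:\d+(?:\.\d*)?|\.\d+)|null)"
# Matches e.g. "[12 0 R /XYZ 0 761.8583 0]" anywhere in the text.
_XYZ = re.compile(
    r"\[\s*(\d+)\s+\d+\s+R\s*/XYZ\s+" + _NUM + r"\s+" + _NUM + r"\s+" + _NUM + r"\s*\]"
)


class XYZDest(NamedTuple):
    page_xref: int
    left: Optional[float]  # None means null: keep the current value
    top: Optional[float]   # PDF user space, origin at the bottom-left
    zoom: Optional[float]


def _num(token: str) -> Optional[float]:
    return None if token == "null" else float(token)


def parse_xyz_dest(text: str) -> Optional[XYZDest]:
    """Parse an explicit /XYZ destination array given as PDF source text.

    Returns None if no /XYZ destination array is found.
    """
    m = _XYZ.search(text)
    if m is None:
        return None
    return XYZDest(int(m.group(1)), _num(m.group(2)), _num(m.group(3)), _num(m.group(4)))


print(parse_xyz_dest("[12 0 R /XYZ 0 761.8583 0]"))
```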

@JorjMcKie
Collaborator

Maybe you simply had a question and just wanted to know how to do a coordinate transformation?
In that case you should have posted in Discussions rather than submitting a bug report.
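Since the question comes down to a coordinate transformation: PyMuPDF's Matrix uses the six-number convention (a, b, c, d, e, f) with x' = a*x + c*y + e and y' = b*x + d*y + f. The plain-Python sketch below (class and function names are hypothetical) builds the bottom-left-to-top-left flip for an unrotated page in that form and inverts it, so points can also be mapped back to PDF space.

```python
from typing import NamedTuple, Tuple


class Matrix6(NamedTuple):
    """Affine matrix (a, b, c, d, e, f) using the PDF convention
    x' = a*x + c*y + e,  y' = b*x + d*y + f."""
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    def apply(self, x: float, y: float) -> Tuple[float, float]:
        return (self.a * x + self.c * y + self.e, self.b * x + self.d * y + self.f)

    def inverted(self) -> "Matrix6":
        det = self.a * self.d - self.b * self.c
        if det == 0:
            raise ValueError("matrix is not invertible")
        # Inverse of the 2x2 linear part, then undo the translation.
        ia, ib = self.d / det, -self.b / det
        ic, id_ = -self.c / det, self.a / det
        return Matrix6(ia, ib, ic, id_,
                       -(self.e * ia + self.f * ic),
                       -(self.e * ib + self.f * id_))


def pdf_to_top_left_matrix(mediabox: Tuple[float, float, float, float]) -> Matrix6:
    """Flip for an unrotated page with raw PDF MediaBox (x0, y0, x1, y1):
    x' = x - x0, y' = y1 - y."""
    x0, _y0, _x1, y1 = mediabox
    return Matrix6(1.0, 0.0, 0.0, -1.0, -x0, y1)


# Hypothetical MediaBox with a non-zero x origin, to show the x shift too.
flip = pdf_to_top_left_matrix((10.0, 0.0, 605.276, 841.89))
top_left = flip.apply(10.0, 761.8583)          # a TOC "to" point
back = flip.inverted().apply(*top_left)        # top-left -> PDF space again
print(top_left, back)
```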

@charosen
Author

Sorry about the "**" signs; I only wanted bold text, and I have already deleted them.

My question is:

get_toc(simple=False) returns Point(0.0, 761.8583) for 1.1 Hello World and Point(0.0, 731.8583) for 1.1.1. first step to hello world.

1.1 Hello World is above 1.1.1. first step to hello world; however, the y value of Point(0.0, 761.8583) is greater than that of Point(0.0, 731.8583), which contradicts PyMuPDF's top-left coordinate system.

@JorjMcKie
Collaborator

OK, to make some progress, I am transferring this thread to Discussions, and we can continue there.

JorjMcKie added the not a bug label (not a bug / user error / unable to reproduce) and removed the example required label (Waiting for information) on Apr 25, 2024
pymupdf locked and limited conversation to collaborators on Apr 25, 2024
JorjMcKie converted this issue into discussion #3413 on Apr 25, 2024
