Segmentation fault when calling get_cdrawings(extended=True) #2556

Closed
pulsar314 opened this issue Jul 25, 2023 · 4 comments

@pulsar314

Description

If a document contains sequences like

q
100 100 m
W n

invocation of page.get_cdrawings(extended=True) results in a segfault.

In this case dev_pathdict is cleared because the commands list is empty, but jm_lineart_clip_path and jm_lineart_clip_stroke_path still try to read a value from the dict.
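
To make the failure mode concrete, here is a toy Python model of it. This is not PyMuPDF's actual C code: the class `LineartDevice`, its methods, and the rule that only `l`, `c` and `re` operators produce drawing items are assumptions made for illustration. It only shows why reading from a cleared path dict fails and how an existence check avoids that.

```python
class LineartDevice:
    """Toy stand-in for the drawing-extraction device (hypothetical, not PyMuPDF code)."""

    def __init__(self):
        self.pathdict = None  # plays the role of dev_pathdict
        self.out = []         # collected drawings

    def trace_path(self, operators):
        # Assumption: a bare "m" (moveto) yields no drawing items.
        items = [op for op in operators if op[0] in ("l", "c", "re")]
        if not items:
            self.pathdict = None  # cleared because the commands list is empty
            return
        self.pathdict = {"items": items}

    def clip_path_unguarded(self, even_odd):
        # Writes into the dict without checking it exists.
        # In Python this raises TypeError; in C it is a NULL dereference.
        self.pathdict["type"] = "clip"
        self.pathdict["even_odd"] = even_odd
        self.out.append(self.pathdict)
        self.pathdict = None

    def clip_path(self, even_odd):
        # Guarded variant: skip incompletely specified clip paths.
        if self.pathdict is None:
            return
        self.pathdict["type"] = "clip"
        self.pathdict["even_odd"] = even_odd
        self.out.append(self.pathdict)
        self.pathdict = None


dev = LineartDevice()
dev.trace_path([("m", 100, 100)])  # corresponds to "100 100 m"
dev.clip_path(even_odd=False)      # corresponds to "W n"; silently ignored
```

With the guarded method the moveto-only clip path simply produces no drawing, while `clip_path_unguarded` fails on the same input.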

Configuration

  • Linux MINT
  • Python 3.8
  • PyMuPDF 1.22.5, installed via pip

@JorjMcKie
Collaborator

I think it is a duplicate of #2462 / #2539.
To confirm this, do you have an example / reproducing file at hand please?

@pulsar314
Author

Unfortunately, I cannot provide you with the original PDF, but here is a reproduction of the failing sequence:
segfault.pdf
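
For readers without access to the attachment, a file with the same operator sequence can probably be built by hand. The sketch below is not the content of segfault.pdf; it is a minimal one-page PDF writer in plain Python (the helper names `build_pdf`, `CLIP_ONLY` and `count_drawings` are made up here) whose content stream is the sequence from the description. Whether it triggers the crash on a given PyMuPDF version is untested.

```python
# Content stream from the issue description: a moveto followed by a clip.
CLIP_ONLY = b"q\n100 100 m\nW n\n"


def build_pdf(content: bytes) -> bytes:
    """Return a minimal single-page PDF whose page uses `content` as its stream."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def count_drawings(pdf_bytes: bytes) -> int:
    """Open the PDF with PyMuPDF and count the extended drawings on page 0.

    Warning: on affected versions this call may crash the interpreter.
    """
    import fitz  # PyMuPDF; imported lazily so build_pdf works without it

    doc = fitz.open(stream=pdf_bytes, filetype="pdf")
    return len(doc[0].get_cdrawings(extended=True))
```

Calling `count_drawings(build_pdf(CLIP_ONLY))` in a throwaway process is the safest way to try this, since a segfault takes the whole interpreter down.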

@JorjMcKie
Collaborator

Thanks a lot! Indeed, this error has not been fixed yet.

@JorjMcKie JorjMcKie added the bug label Jul 28, 2023
JorjMcKie added a commit that referenced this issue Jul 28, 2023
Guard against incompletely specified clip paths by checking whether any drawing items have been generated.
JorjMcKie added a commit that referenced this issue Jul 28, 2023
Ensure #2556 is fixed properly.
@JorjMcKie JorjMcKie mentioned this issue Jul 28, 2023
JorjMcKie added a commit that referenced this issue Sep 11, 2023
For text extraction with `get_text("words")`, or extractWORDS, words are defined as strings that contain no white space.
This change allows up to 64 additional characters to function as delimiters as well.
This makes it possible, for instance, to separate words from punctuation or to decompose an e-mail address into its components.
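
The effect of extra delimiters can be illustrated in plain Python. This is a conceptual sketch only, not PyMuPDF's extraction code: the function `split_words` is hypothetical, and it assumes that delimiter characters are dropped from the output in the same way as white space.

```python
import string

MAX_DELIMITERS = 64  # limit stated in the commit message


def split_words(text: str, delimiters: str = "") -> list:
    """Split text at white space and at any of the extra delimiter characters."""
    if len(delimiters) > MAX_DELIMITERS:
        raise ValueError("at most 64 delimiter characters are supported")
    separators = set(string.whitespace) | set(delimiters)
    words, current = [], []
    for ch in text:
        if ch in separators:
            if current:  # close the word collected so far
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


line = "mail to john.doe@example.com, please"
plain = split_words(line)               # white space only
detailed = split_words(line, ".@,")     # also split at '.', '@' and ','
```

Here `plain` keeps `john.doe@example.com,` as one word, while `detailed` breaks the address into `john`, `doe`, `example` and `com` and drops the trailing comma.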

Other changes:

Fixing #2522: correcting the typo

Remove some unnecessary setting of flags when creating annotations.

Fixing #2553:
Adjust plain text extraction to use the same approach as other variants. This entails using Unicode escape strings on output instead of using the output of fz_chartorune.
Another consequence is that standard text output is directed to a fz_buffer instead of an fz_output.

Fixing #2556: Check that path dictionaries exist at every possible place.
Includes an additional test function.

Add functions JM_ignore_rect / JM_ignore_irect which return a bool. The functions return True if the rectangle should be ignored.
This is the case for infinite and empty rectangles, but also for any rectangle that has a common edge with the infinite rectangle.
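
The rule described above can be sketched in Python as follows. This is not the JM_ignore_rect source: the function `ignore_rect` is a hypothetical rewrite, the two bounds are assumed to match the values MuPDF uses for its infinite rectangle, and "empty" is assumed to mean x0 >= x1 or y0 >= y1.

```python
# Assumed bounds of the infinite rectangle (illustrative constants).
FZ_MIN_INF_RECT = -0x80000000
FZ_MAX_INF_RECT = 0x7FFFFF80


def ignore_rect(x0: float, y0: float, x1: float, y1: float) -> bool:
    """Return True if the rectangle should be ignored."""
    # Empty rectangle: no positive width or height (assumed definition).
    if x0 >= x1 or y0 >= y1:
        return True
    # Any edge shared with the infinite rectangle; this also covers
    # the infinite rectangle itself, which shares all four edges.
    if x0 == FZ_MIN_INF_RECT or y0 == FZ_MIN_INF_RECT:
        return True
    if x1 == FZ_MAX_INF_RECT or y1 == FZ_MAX_INF_RECT:
        return True
    return False
```

An ordinary rectangle such as (0, 0, 10, 10) is kept, while one whose left edge sits at `FZ_MIN_INF_RECT` is ignored even if its other edges are finite.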

Support variable character border widths for insert_text() / insert_textbox(). The value is a factor that is multiplied
by the font size. The default is 0.05 (read: 5% of the font size). This value is relevant for text rendering modes 1 and 2 only.
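
As a small worked example of the factor, assuming the resulting border line width is simply the font size times the factor (the helper `border_line_width` is made up for illustration and is not part of PyMuPDF):

```python
def border_line_width(fontsize: float, factor: float = 0.05) -> float:
    """Border line width as a fraction of the font size (default 5%)."""
    return fontsize * factor


default_width = border_line_width(12)       # 5% of a 12 pt font
thicker_width = border_line_width(12, 0.1)  # 10% of a 12 pt font
```

So for a 12 pt font the default corresponds to a border of roughly 0.6 pt, and doubling the factor doubles the border.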

Fixing #2637:
In Page.insert_textbox, when the last word of a line won't fit in the line buffer, we did not increase the line position. This is now handled correctly.
JorjMcKie added a commit that referenced this issue Sep 19, 2023
Immunize against wrong path specifications by checking whether the current path dictionary actually exists.
@julian-smith-artifex-com
Collaborator

Fixed in 1.23.4.
