get_drawings() does not see show_pdf_page's layers #2539
Comments
This is a duplicate. The fix for this bug will roll out with the next version. This example file
In [1]: import fitz
In [2]: doc = fitz.open("text-oc.pdf")
In [3]: page = doc[0]
In [4]: page.get_drawings()
Out[4]:
[{'items': [('re', Rect(45.0, 45.0, 405.0, 305.0), 1)],
'type': 'fs',
'even_odd': False,
'fill_opacity': 1.0,
'fill': (0.800000011920929, 0.800000011920929, 0.800000011920929),
'rect': Rect(45.0, 45.0, 405.0, 305.0),
'seqno': 0,
'layer': 'graphic',
'stroke_opacity': 1.0,
'color': (0.0, 0.0, 1.0),
'width': 1.0,
'lineCap': (0, 0, 0),
'lineJoin': 0.0,
'closePath': False,
'dashes': '[ 3 1 ] 0'}]
|
Great, thank you for the super quick answer :) Out of curiosity, can you share a link to the original bug/fix? |
This was the closed issue #2462. Why closed: |
Thanks again for the detail :) At the risk of sounding argumentative I'm not convinced it's the same bug...
For what it's worth, I still get randomly garbled strings (but no crashes) on
This seems to be fixable by copying the name string in
static void
jm_lineart_begin_layer(fz_context *ctx, fz_device *dev_, const char *name)
{
// layer_name = name;
layer_name = realloc(layer_name, strlen(name) + 1);
strcpy(layer_name, name);
}
|
Ah, of course you are right: this is your bug, not my bug 😂!
from pathlib import Path
import fitz
colors = [
    ("red", (1, 0, 0)),
    ("green", (0, 1, 0)),
    ("blue", (0, 0, 1)),
    ("grey", (0.5, 0.5, 0.5)),
]
doc = fitz.open()
bounds = fitz.Rect(0, 0, 205, 55)
page = doc.new_page()
page.set_mediabox(bounds)
for i, (name, color) in enumerate(colors):
    tmp_doc = fitz.open()
    oc = tmp_doc.add_ocg(name)
    tmp_page = tmp_doc.new_page()
    tmp_page.set_mediabox(bounds)
    shape = tmp_page.new_shape()
    x, y = 5 + i * 50, 5
    shape.draw_rect((x, y, x + 45, y + 45))
    shape.finish(fill=color, oc=oc)
    shape.commit()
    page.show_pdf_page(bounds, tmp_doc, 0)
doc.save(Path(__file__).with_suffix(".pdf"))
print(f'{set(ocg["name"] for ocg in doc.get_ocgs().values()) = }')
print(f'{set(d.get("layer") for d in doc[0].get_drawings()) = }')
|
You were assigning the OCG to the XObject representing the page insert by This is unfortunate but inevitable is like that. |
Ah, I didn't read the docs hard enough to find out about the So basically that makes your version an implementation of the workaround I was wondering about in my original post, as it wraps the shape in marked content tags:
But where does that leave us on |
As I wrote above: you and I are not guilty here. |
Method |
All this brings up the question about whether the layer name is of much value at all ... in |
I understand the situation from a PDF spec point of view, and I think the layer name does have value. However, the confusing part is that, as a user of the API and not an expert on the details of the PDF spec, I would expect OC to behave consistently, just like it does in viewers (that is, without having to find out about stream tags vs. XObject attributes). Again, I understand now that this is not a PyMuPDF issue, it would fix itself if MuPDF would call |
They do this begin-end business upon encountering Also notable:
|
Well I'm no PDF expert so I cannot tell who/what is right or wrong here, I'm not even sure what should be reasonably expected anymore... my original example, example1.pdf:
your modified example, example2.pdf:
|
@snoyer yes, I have produced the same results, and in fact my previous post was based on them. So then let me wrap up on where we are, what can be expected and what is impossible to achieve. If a source page is imported into a target page via
Behaviors explained under the two points above are inevitable and cannot be influenced, as a matter of principle. So the remaining part of your post is covered by the other issue referenced above. Obviously, all this should be documented to avoid any such confusion going forward. |
For an example file, see: pymupdf/PyMuPDF#2539 (comment)
Documentation now correctly comments on this situation. |
Describe the bug
PyMuPDF's get_drawings() does not grab the layer information from pages generated using show_pdf_page with an oc argument.
To Reproduce
doc.show_pdf_page(..., oc="blah")
doc.get_ocgs() and observe that PyMuPDF does find the layers in the document:
set(ocg["name"] for ocg in doc.get_ocgs().values()) = {'blue', 'green', 'grey', 'red'}
doc[0].get_drawings() and observe that PyMuPDF does find the drawings, but they all have 'layer': '':
set(d.get("layer") for d in doc[0].get_drawings()) = {''}
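The two observations above can be combined into a single check. The following is only a sketch: `missing_layers` is a hypothetical helper, not part of PyMuPDF, and the sample data is hand-written to mirror the shape of the quoted outputs (the xref keys 5 to 8 are made up), so it runs without a PDF.

```python
def missing_layers(ocgs, drawings):
    """Return the OCG names that no drawing reports as its 'layer'.

    ocgs:     dict shaped like the result of doc.get_ocgs()
    drawings: list shaped like the result of page.get_drawings()
    """
    ocg_names = {ocg["name"] for ocg in ocgs.values()}
    drawing_layers = {d.get("layer") or "" for d in drawings}
    return ocg_names - drawing_layers


# Hand-written stand-ins mirroring the outputs quoted above;
# the xref keys are assumptions for illustration only.
ocgs = {
    5: {"name": "red"},
    6: {"name": "green"},
    7: {"name": "blue"},
    8: {"name": "grey"},
}
drawings = [{"layer": ""} for _ in range(4)]

print(sorted(missing_layers(ocgs, drawings)))
```

With the reported behavior every OCG name comes back as missing; with the expected behavior the result would be an empty set.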
Expected behavior
The drawing items should have the appropriate OC name as their layer attribute.
Your configuration
Additional context
Does get_drawings infer layer only based on BDC/EMC tags in streams, while show_pdf_page marks the OC as an attribute on objects? In which case this may be a MuPDF bug rather than PyMuPDF? Would it be an acceptable workaround to have show_pdf_page add [superfluous] BDC/EMC commands to make MuPDF happy?
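To make the hypothesis in this question concrete, here is a toy model of a scanner that derives a layer name only from `/OC ... BDC` / `EMC` pairs in a content stream. It is a sketch under stated assumptions, not MuPDF's or PyMuPDF's actual code: `layers_from_stream`, the property name `MC0`, and the XObject name `fzFrm0` are invented for illustration, and the tokenizer only handles whitespace-separated tokens with name-form BDC properties.

```python
# Path-painting operators that would produce a drawing.
PAINT_OPS = {"f", "F", "f*", "S", "s", "B", "B*", "b", "b*"}


def layers_from_stream(stream, properties):
    """Toy scanner: for each paint or Do operator, report the innermost
    /OC marked-content layer that is open at that point ('' if none).

    properties maps property-list names (e.g. 'MC0') to OCG names.
    """
    stack = []      # one entry per open BMC/BDC; None for non-OC content
    operands = []   # operands collected since the last operator
    seen = []
    for tok in stream.split():
        if tok == "BDC":
            tag, prop = operands[-2:]
            name = properties.get(prop.lstrip("/")) if tag == "/OC" else None
            stack.append(name)
            operands.clear()
        elif tok == "BMC":
            stack.append(None)
            operands.clear()
        elif tok == "EMC":
            stack.pop()
            operands.clear()
        elif tok in PAINT_OPS or tok == "Do":
            layer = next((n for n in reversed(stack) if n), "")
            seen.append((tok, layer))
            operands.clear()
        elif tok[0].isalpha():
            operands.clear()  # any other operator: discard its operands
        else:
            operands.append(tok)
    return seen


# Shape drawn directly inside marked content: the layer is visible in the stream.
direct = "/OC /MC0 BDC 1 0 0 rg 5 5 45 45 re f EMC"
# Page inserted as a Form XObject: an /OC entry on the XObject dictionary
# never appears in the invoking stream, so this scanner sees no layer.
via_xobject = "q 1 0 0 1 0 0 cm /fzFrm0 Do Q"

print(layers_from_stream(direct, {"MC0": "red"}))
print(layers_from_stream(via_xobject, {"MC0": "red"}))
```

In this model the first stream yields a layer for the fill, while the second yields an empty layer for the `Do` invocation, which is the asymmetry the question describes. Whether the real interpreter behaves this way is exactly what the question asks.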