image.get_pos() returns wrong values for images nested in Form XObjects
#277
Comments
Unfortunately I can't really comment on the values returned by
I acknowledge DPI is a problem, but not really a bug, since pdfium calculates it from the pixel size relative to the occupied canvas area, so this is not actually the DPI metadata embedded in the image. The docs for
(Also note, I would have expected people to use one of the package-specific templates for an issue like this, just to have version info available and so on. I plan to clarify point 2 of the checklist as this seems to be unclear.)
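To make the DPI point concrete, here is a minimal sketch of how a resolution derived from placement differs from embedded metadata. It is not pdfium's actual code; the function name `effective_dpi` and the sample numbers are hypothetical, and it assumes the default PDF user unit of 1/72 inch.

```python
def effective_dpi(pixel_size: float, canvas_size_pt: float) -> float:
    """Resolution implied by drawing `pixel_size` pixels across `canvas_size_pt` points.

    Assumes the default PDF user unit (1 pt = 1/72 inch). Hypothetical helper,
    only meant to illustrate "pixel size relative to the occupied canvas area".
    """
    inches = canvas_size_pt / 72.0
    return pixel_size / inches


# An image 1500 px wide, drawn 720 pt (10 in) wide, comes out at 150 DPI.
native = effective_dpi(1500, 720)
# The same pixels stretched over 1125 pt (15.625 in) come out at 96 DPI.
stretched = effective_dpi(1500, 1125)
print(native, stretched)
```

Under this reading, the reported value depends on how large the image is drawn on the page, so it can legitimately differ from what a tool that reads the image's own metadata reports.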
I also checked with
(If you're confident that what pdfium returns is wrong, feel free to ask about this on pdfium's mailing list or file a pdfium bug report.)
Update: see finding below.
@PasaOpasen Ah, I figured something out. The image seems to be recursively nested in Form XObjects (twice, actually).
>>> import pypdfium2 as pdfium
>>> pdf = pdfium.PdfDocument("color_lines_bad.pdf")
>>> pdf
<PdfDocument uuid:9198c957 from '/home/me/Downloads/color_lines_bad.pdf'>
>>> page = pdf[0]
>>> list(page.get_objects(filter=[pdfium.raw.FPDF_PAGEOBJ_IMAGE], max_depth=1))
[]
>>> list(page.get_objects(filter=[pdfium.raw.FPDF_PAGEOBJ_IMAGE], max_depth=2))
[]
>>> list(page.get_objects(filter=[pdfium.raw.FPDF_PAGEOBJ_IMAGE], max_depth=3))
[<PdfImage uuid:0fc6e6e2>]
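The manual probing above can be wrapped in a small helper. This is only a sketch: `min_image_depth` and its `limit` parameter are hypothetical names, and it relies solely on the `page.get_objects(filter=..., max_depth=...)` call shown in the session.

```python
def min_image_depth(page, image_type, limit=8):
    """Find the smallest max_depth at which page.get_objects() yields images.

    `page` is anything exposing get_objects(filter=..., max_depth=...), as in the
    session above; `image_type` is the object type constant to filter for.
    Returns (depth, objects), or (None, []) if nothing is found up to `limit`.
    """
    for depth in range(1, limit + 1):
        found = list(page.get_objects(filter=[image_type], max_depth=depth))
        if found:
            return depth, found
    return None, []


# Usage with pypdfium2 (not executed here):
#   import pypdfium2 as pdfium
#   page = pdfium.PdfDocument("color_lines_bad.pdf")[0]
#   depth, images = min_image_depth(page, pdfium.raw.FPDF_PAGEOBJ_IMAGE)
```

Going by the session output, this would report a depth of 3 for the example document, i.e. the image sits inside two levels of Form XObjects.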
I just filed https://bugs.chromium.org/p/pdfium/issues/detail?id=2100 for this.
@mara004 thank u!
Would you mind closing this issue? I don't think we can do much else now except wait for pdfium. Or do you reckon we should prevent
Checklist
Reason for Generic issue (keyword/topic)
seems like a logic problem, not build
Description
I found that the .get_pos() method sometimes returns a wrong bbox for images.
Example document
To reproduce:
Also, I found that the image metadata reports a wrong dpi=96 instead of the real 150, which can be found using pdfimages (poppler):