feat: Extracting picture data for raster images found in PPTX #349

Merged
maxmnemonic merged 3 commits into main from dev/pptx_images
Nov 18, 2024

Conversation

maxmnemonic
Contributor

@maxmnemonic maxmnemonic commented Nov 15, 2024

This PR populates image data in docling documents produced by the PPTX backend and introduces basic PPTX tests.

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.


mergify bot commented Nov 15, 2024

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:
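
The title rule above can be tried locally with a few lines of Python. This is only a sketch: the function name `is_conventional` is hypothetical, and it assumes the bot applies the pattern as an ordinary regular-expression search against the PR title.

```python
import re

# Pattern copied from the merge protection rule quoted above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


# The title of this PR passes; a title without a type prefix does not.
print(is_conventional("feat: Extracting picture data for raster images found in PPTX"))
print(is_conventional("Extracting picture data for raster images found in PPTX"))
```

An optional scope in parentheses, such as `fix(pptx): ...`, is also accepted by the pattern.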

🟢 Require two reviewer for test updates

Wonderful, this rule succeeded.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

@maxmnemonic maxmnemonic marked this pull request as ready for review November 15, 2024 13:41
cau-git
cau-git previously approved these changes Nov 15, 2024
dolfim-ibm
dolfim-ibm previously approved these changes Nov 15, 2024
@PeterStaar-IBM PeterStaar-IBM marked this pull request as draft November 18, 2024 08:29
Signed-off-by: Maksym Lysak <[email protected]>
@maxmnemonic maxmnemonic dismissed stale reviews from dolfim-ibm and cau-git via 2240008 November 18, 2024 09:46
@maxmnemonic maxmnemonic marked this pull request as ready for review November 18, 2024 10:08
@maxmnemonic
Contributor Author

Added tests, ready for re-review

dolfim-ibm
dolfim-ibm previously approved these changes Nov 18, 2024
doc.add_picture(parent=parent_slide, caption=None, prov=prov)
doc.add_picture(
    parent=parent_slide,
    image=ImageRef.from_pil(image=pil_image, dpi=72),
Contributor

Do we want to hard-code 72 DPI here? I guess we have no better information about the actual DPI?

Contributor Author

Added another commit, now extracting image DPI from the input file
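
For readers following the thread, here is a minimal sketch of how a DPI value might be taken from image metadata with a fallback to 72. It is not the code from the follow-up commit. The helper `resolve_dpi` is hypothetical, and it assumes the metadata mapping looks like Pillow's `Image.info`, where `"dpi"`, when present, is usually an `(x, y)` pair.

```python
from typing import Any, Mapping

DEFAULT_DPI = 72  # the value that was hard-coded in the reviewed line


def resolve_dpi(info: Mapping[str, Any], default: int = DEFAULT_DPI) -> int:
    """Pick a DPI from an image metadata mapping, falling back to `default`.

    `info` is assumed to be shaped like Pillow's `Image.info`.
    """
    dpi = info.get("dpi")
    if isinstance(dpi, (tuple, list)) and dpi:
        dpi = dpi[0]  # use the horizontal resolution
    try:
        value = round(float(dpi))
    except (TypeError, ValueError, OverflowError):
        # Missing, non-numeric, NaN, or infinite values fall back to the default.
        return default
    return value if value > 0 else default
```

Under these assumptions, something like `ImageRef.from_pil(image=pil_image, dpi=resolve_dpi(pil_image.info))` would replace the literal `72` in the snippet under review.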

Contributor

@cau-git cau-git left a comment

LGTM!

@PeterStaar-IBM PeterStaar-IBM self-requested a review November 18, 2024 14:20
Contributor

@PeterStaar-IBM PeterStaar-IBM left a comment

LGTM!

@maxmnemonic maxmnemonic merged commit 7a97d71 into main Nov 18, 2024
8 checks passed
@maxmnemonic maxmnemonic deleted the dev/pptx_images branch November 18, 2024 14:22