fix: use correct image index in word backend (#442)
* fix image index in word backend

Signed-off-by: Manuel030 <[email protected]>

* fix: Fixes for wordx (#432)

* fixes for referencing drawing blip in wordx

Signed-off-by: Maksym Lysak <[email protected]>

* Added safety try-except when trying to load pillow image from a docx blob. Added explicit dependency on lxml.

Signed-off-by: Maksym Lysak <[email protected]>

* Added test for word file with embedded emf images, re-generated full tests for docx, eased up dependency on lxml

Signed-off-by: Maksym Lysak <[email protected]>

* Updated lxml dependency version

Signed-off-by: Maksym Lysak <[email protected]>

---------

Signed-off-by: Maksym Lysak <[email protected]>
Co-authored-by: Maksym Lysak <[email protected]>
Signed-off-by: Manuel030 <[email protected]>

* sign dco

Signed-off-by: Manuel030 <[email protected]>

* correct rebase error

Signed-off-by: Manuel030 <[email protected]>

---------

Signed-off-by: Manuel030 <[email protected]>
Signed-off-by: Maksym Lysak <[email protected]>
Co-authored-by: Maxim Lysak <[email protected]>
Co-authored-by: Maksym Lysak <[email protected]>
3 people authored Nov 27, 2024
1 parent 29807a2 commit 767563b
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions docling/backend/msword_backend.py
@@ -507,18 +507,19 @@ def get_docx_image(element, drawing_blip):

         image_data = get_docx_image(element, drawing_blip)
         image_bytes = BytesIO(image_data)
+        level = self.get_level()
         # Open the BytesIO object with PIL to create an Image
         try:
             pil_image = Image.open(image_bytes)
             doc.add_picture(
-                parent=self.parents[self.level],
+                parent=self.parents[level - 1],
                 image=ImageRef.from_pil(image=pil_image, dpi=72),
                 caption=None,
             )
         except (UnidentifiedImageError, OSError) as e:
             _log.warning("Warning: image cannot be loaded by Pillow")
             doc.add_picture(
-                parent=self.parents[self.level],
+                parent=self.parents[level - 1],
                 caption=None,
             )
         return
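
The commit message does not spell out why `self.parents[level - 1]` is the correct index. The sketch below illustrates one plausible reading. It assumes that `get_level()` returns the first level that has no parent assigned yet, so the innermost open parent sits one level below it. `LevelTracker`, its string-valued parents, and the stale `level` attribute are hypothetical stand-ins, not the backend's actual classes.

```python
from typing import Dict, Optional


class LevelTracker:
    """Hypothetical stand-in for the backend's parent bookkeeping."""

    def __init__(self, max_levels: int = 10) -> None:
        # Key -1 stands for the document root (None here); deeper levels start empty.
        self.parents: Dict[int, Optional[str]] = {
            i: None for i in range(-1, max_levels)
        }
        # Separately tracked attribute; assumed here to lag behind `parents`.
        self.level = 0

    def get_level(self) -> int:
        # Assumed behaviour: first non-negative level with no parent assigned.
        for k, v in self.parents.items():
            if k >= 0 and v is None:
                return k
        return 0


tracker = LevelTracker()
tracker.parents[0] = "section: Introduction"
tracker.parents[1] = "subsection: Figures"

level = tracker.get_level()
stale_parent = tracker.parents[tracker.level]  # index used before the fix
fixed_parent = tracker.parents[level - 1]      # index used after the fix

print(level, stale_parent, fixed_parent)
```

Under these assumptions the old index attaches the picture to the outer section, while the new index attaches it to the innermost open heading. With no headings at all, `level - 1` is `-1`, which falls back to the root entry.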
