fix: Fixes for wordx #432

Merged
maxmnemonic merged 4 commits into main from dev/word_fixes
Nov 26, 2024

@maxmnemonic (Contributor) commented Nov 25, 2024

  • Fixed how the drawing blip is referenced in wordx.
  • Added a safety try-except around loading a Pillow image from a docx blob (protection against unsupported image formats such as EMF and WMF).
  • Added an explicit dependency on lxml.
  • Added tests.
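
To illustrate the try-except guard described above, here is a minimal sketch, not the code from this pull request. The function name `load_image_or_none` is hypothetical, and the sketch assumes Pillow is installed. It forces a full decode with `load()` because some formats can be identified by Pillow but still fail when the pixel data is read.

```python
from io import BytesIO
from typing import Optional

from PIL import Image, UnidentifiedImageError


def load_image_or_none(blob: bytes) -> Optional[Image.Image]:
    """Return a decoded Pillow image, or None if the blob cannot be loaded.

    Hypothetical helper for illustration only.
    """
    try:
        image = Image.open(BytesIO(blob))
        # Force decoding now so that failures surface inside this try block.
        image.load()
        return image
    except (UnidentifiedImageError, OSError):
        # Unsupported or corrupt image data: let the caller skip this image.
        return None
```

A caller can then skip the picture when the result is `None` instead of letting the whole document conversion fail.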

Issue resolved by this Pull Request:
Resolves #417
Resolves #410

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

mergify bot commented Nov 25, 2024

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:
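
As a rough local check of the title rule above, the same pattern can be tried with Python's `re` module. This is a sketch under the assumption that `~=` behaves like a regular-expression search; the function name `is_conventional_title` is hypothetical.

```python
import re

# Pattern copied from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:"
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title starts with a conventional commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


print(is_conventional_title("fix: Fixes for wordx"))
print(is_conventional_title("Fixes for wordx"))
```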

🟢 Require two reviewers for test updates

Wonderful, this rule succeeded.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

…blob. Added explicit dependency on lxml.

Signed-off-by: Maksym Lysak <[email protected]>
…tests for docx, eased up dependency on lxml

Signed-off-by: Maksym Lysak <[email protected]>
@maxmnemonic maxmnemonic marked this pull request as ready for review November 26, 2024 09:25
Review comment on pyproject.toml (outdated, resolved)
Signed-off-by: Maksym Lysak <[email protected]>

@cau-git cau-git left a comment

LGTM


@dolfim-ibm dolfim-ibm left a comment

LGTM

@maxmnemonic maxmnemonic merged commit d0a1180 into main Nov 26, 2024
9 checks passed
@maxmnemonic maxmnemonic deleted the dev/word_fixes branch November 26, 2024 13:44
Manuel030 pushed a commit to Manuel030/docling that referenced this pull request Nov 27, 2024
* fixes for referencing drawing blip in wordx

Signed-off-by: Maksym Lysak <[email protected]>

* Added safety try-except when trying to load pillow image from a docx blob. Added explicit dependency on lxml.

Signed-off-by: Maksym Lysak <[email protected]>

* Added test for word file with embedded emf images, re-generated full tests for docx, eased up dependency on lxml

Signed-off-by: Maksym Lysak <[email protected]>

* Updated lxml dependency version

Signed-off-by: Maksym Lysak <[email protected]>

---------

Signed-off-by: Maksym Lysak <[email protected]>
Co-authored-by: Maksym Lysak <[email protected]>
Manuel030 pushed a commit to Manuel030/docling that referenced this pull request Nov 27, 2024
* fixes for referencing drawing blip in wordx

Signed-off-by: Maksym Lysak <[email protected]>

* Added safety try-except when trying to load pillow image from a docx blob. Added explicit dependency on lxml.

Signed-off-by: Maksym Lysak <[email protected]>

* Added test for word file with embedded emf images, re-generated full tests for docx, eased up dependency on lxml

Signed-off-by: Maksym Lysak <[email protected]>

* Updated lxml dependency version

Signed-off-by: Maksym Lysak <[email protected]>

---------

Signed-off-by: Maksym Lysak <[email protected]>
Co-authored-by: Maksym Lysak <[email protected]>
Signed-off-by: Manuel030 <[email protected]>
maxmnemonic added a commit that referenced this pull request Nov 27, 2024
* fix image index in word backend

Signed-off-by: Manuel030 <[email protected]>

* fix: Fixes for wordx (#432)

* fixes for referencing drawing blip in wordx

Signed-off-by: Maksym Lysak <[email protected]>

* Added safety try-except when trying to load pillow image from a docx blob. Added explicit dependency on lxml.

Signed-off-by: Maksym Lysak <[email protected]>

* Added test for word file with embedded emf images, re-generated full tests for docx, eased up dependency on lxml

Signed-off-by: Maksym Lysak <[email protected]>

* Updated lxml dependency version

Signed-off-by: Maksym Lysak <[email protected]>

---------

Signed-off-by: Maksym Lysak <[email protected]>
Co-authored-by: Maksym Lysak <[email protected]>
Signed-off-by: Manuel030 <[email protected]>

* sign dco

Signed-off-by: Manuel030 <[email protected]>

* correct rebase error

Signed-off-by: Manuel030 <[email protected]>

---------

Signed-off-by: Manuel030 <[email protected]>
Signed-off-by: Maksym Lysak <[email protected]>
Co-authored-by: Maxim Lysak <[email protected]>
Co-authored-by: Maksym Lysak <[email protected]>
Development

Successfully merging this pull request may close these issues.

parse docx file error : Using .DOCX format in cloud - suggestion on the below error?
3 participants