using docx format in cloud #72

Closed
acsankar opened this issue Nov 22, 2024 · 2 comments

@acsankar

I am trying to use this in a cloud environment to convert a DOCX file to Markdown without images. I assume the error below occurs when the document contains images. Any suggestions for fixing this?

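from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, WordFormatOption
from docling.pipeline.simple_pipeline import SimplePipeline

# Accept only DOCX input and route it through the simple (non-PDF) pipeline.
doc_converter = DocumentConverter(
    allowed_formats=[InputFormat.DOCX],
    format_options={
        InputFormat.DOCX: WordFormatOption(pipeline_cls=SimplePipeline),
    },
)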

I am getting the error below:
---> 30 result = doc_converter.convert(temp_file.name)

18 frames
/usr/local/lib/python3.10/dist-packages/PIL/ImageFile.py in load(self)
375 if loader is None:
376 msg = f"cannot find loader for this {self.format} file"
--> 377 raise OSError(msg)
378 image = loader.load(self)
379 assert image is not None

OSError: cannot find loader for this WMF file

@cau-git
Contributor

cau-git commented Nov 25, 2024

This seems to be a duplicate of DS4SD/docling#410, closing here.

@cau-git cau-git closed this as completed Nov 25, 2024
@maxmnemonic
Contributor

I think the most likely problem here is that the Word file includes an image whose blob can't be loaded by the PIL library.
The error should occur even if the file is processed fully locally, so it is probably not specific to the cloud.

@acsankar, any chance you could provide an example file that reproduces this error?
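
One rough way to check the hypothesis above before sharing a file: a DOCX is a ZIP archive, and embedded images normally sit under `word/media/`. The sketch below lists any WMF or EMF entries there, using only the standard library. The helper name `find_unsupported_images` and the extension list are assumptions made for illustration and are not part of docling or Pillow. As far as I know, Pillow can identify WMF/EMF files but can usually only render them on Windows, which would match the traceback.

```python
import io
import zipfile

# Assumption: these vector formats are the ones Pillow usually cannot
# render outside Windows. Extend the tuple if other formats turn up.
UNSUPPORTED_EXTENSIONS = (".wmf", ".emf")


def find_unsupported_images(docx_file):
    """Return archive paths of WMF/EMF images embedded in a DOCX file.

    docx_file may be a filesystem path or a binary file-like object.
    This is a hypothetical diagnostic helper, not a docling API.
    """
    with zipfile.ZipFile(docx_file) as archive:
        return [
            name
            for name in archive.namelist()
            if name.startswith("word/media/")
            and name.lower().endswith(UNSUPPORTED_EXTENSIONS)
        ]


# Demo with an in-memory archive standing in for a real document.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
    archive.writestr("word/media/image1.wmf", b"placeholder")
    archive.writestr("word/media/image2.png", b"placeholder")
buffer.seek(0)

print(find_unsupported_images(buffer))  # ['word/media/image1.wmf']
```

If this returns a non-empty list for the failing document, that would support the idea that an embedded WMF/EMF image triggers the `OSError`. It only inspects file names, so an image stored under a misleading extension would not be caught.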
