RuntimeError: Some images were not loaded. #354

Closed
udit-pandey-1 opened this issue May 24, 2024 · 2 comments

Comments

udit-pandey-1 commented May 24, 2024

OS: Ubuntu - 20.04.6 LTS
Python: 3.11
Requirements:
unstructured==0.14.2 unstructured-inference==0.7.33 pillow-heif==0.16.0

I am getting the following error when extracting text and images from pdf:
[image: screenshot of the error]

The way I am using unstructured is:
[image: screenshot of the code calling unstructured]

Is there any way we can fix this issue?

tbs17 commented May 24, 2024

Hi there, here are some troubleshooting steps to consider:

Check PDF File Accessibility:

Ensure the PDF file is accessible and not corrupted.
Try opening the file with a PDF reader to verify its integrity.

File Path and Permissions:

Verify that the file path is correct.
Ensure you have the necessary permissions to read the file.
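
The accessibility, path, and permission checks above can be scripted. This is only a minimal sketch using the Python standard library; `check_pdf` is a hypothetical helper name, and the header test merely looks for the `%PDF-` marker near the start of the file, so it will not catch every kind of corruption.

```python
import os
from pathlib import Path


def check_pdf(path):
    """Return a list of problems found with the file at `path` (empty if none)."""
    p = Path(path)
    if not p.is_file():
        return [f"{path}: does not exist or is not a regular file"]
    if not os.access(p, os.R_OK):
        return [f"{path}: not readable by the current user"]

    problems = []
    if p.stat().st_size == 0:
        problems.append(f"{path}: file is empty")

    # A well-formed PDF carries the "%PDF-" marker near the start of the file.
    with p.open("rb") as fh:
        head = fh.read(1024)
    if b"%PDF-" not in head:
        problems.append(f"{path}: no %PDF- header found, file may be corrupted")
    return problems
```

An empty list means the basic checks passed; opening the file in a PDF reader is still the more reliable integrity test.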

Temporary File Issues:

The error suggests a problem with accessing a temporary file. Ensure that the file is correctly created and accessible before processing it.
You might want to use a different temporary directory or explicitly manage the creation and deletion of temporary files.
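
One way to manage temporary files explicitly is to stage the PDF in a directory you control and clean it up yourself. The sketch below uses only the standard library; `staged_copy` and `tmp_root` are hypothetical names, and whether this helps depends on where the failing temporary file is actually created, which this thread does not show.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Optional


@contextmanager
def staged_copy(src, tmp_root: Optional[str] = None):
    """Copy `src` into a fresh temporary directory and yield the copy's path.

    `tmp_root` selects the parent directory (None means the system default).
    The directory and the copy are deleted when the `with` block exits.
    """
    with tempfile.TemporaryDirectory(dir=tmp_root) as tmp_dir:
        dst = Path(tmp_dir) / Path(src).name
        shutil.copy2(src, dst)
        yield dst
```

Inside `with staged_copy("input.pdf") as pdf_path:` the copy is guaranteed to exist, so any processing that needs the file should happen within that block.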

Handling Large Files:

If the PDF is very large, it might cause issues during processing. Try with a smaller PDF to see if the problem persists.

Library Versions and Dependencies:

Ensure that all the libraries (unstructured, PyMuPDF, etc.) are up-to-date. There might be bug fixes or improvements in newer versions.
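
To see which versions are actually installed in the active environment, something like the following can help. It is a small sketch built on `importlib.metadata` from the standard library; `installed_versions` is a hypothetical helper, and the package names are the ones listed in the original report.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    names = ["unstructured", "unstructured-inference", "pillow-heif"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```

Comparing this output against the pinned requirements shows quickly whether the environment matches what you think is installed.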

Debugging and Logging:

Add logging to your script to capture more details about where the error occurs.
Log the paths of the files being processed, and any other relevant information.
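
As a rough illustration of the logging suggestion, the wrapper below records each file path and the full traceback on failure. `process_with_logging` and the `extract` argument are hypothetical: `extract` stands for whatever function you already call on the PDF, since the original code is only visible as a screenshot.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("pdf_extract")


def process_with_logging(path, extract):
    """Call `extract(path)`, logging the path, the outcome, and any traceback."""
    log.info("processing %s", path)
    try:
        result = extract(path)
    except Exception:
        # log.exception records the message plus the full traceback.
        log.exception("extraction failed for %s", path)
        raise
    log.info("finished %s", path)
    return result
```

The exception is re-raised after logging, so the script's behaviour is unchanged apart from the extra log output.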

@udit-pandey-1
Author

Hi, thanks for the reply.
I got past the error by upgrading to unstructured-inference==0.7.33.
However, I am getting a different issue now and have opened an issue for it (Unstructured-IO/unstructured#3102).

I'll close this one for now anyway.
