Hi there, here are some troubleshooting steps to consider:
Check PDF File Accessibility:
Ensure the PDF file is accessible and not corrupted.
Try opening the file with a PDF reader to verify its integrity.
File Path and Permissions:
Verify that the file path is correct.
Ensure you have the necessary permissions to read the file.
Temporary File Issues:
The error suggests a problem with accessing a temporary file. Ensure that the file is correctly created and accessible before processing it.
You might want to use a different temporary directory or explicitly manage the creation and deletion of temporary files.
Handling Large Files:
If the PDF is very large, it might cause issues during processing. Try with a smaller PDF to see if the problem persists.
Library Versions and Dependencies:
Ensure that all the libraries (unstructured, PyMuPDF, etc.) are up-to-date. There might be bug fixes or improvements in newer versions.
Debugging and Logging:
Add logging to your script to capture more details about where the error occurs.
Log the paths of the files being processed, and any other relevant information.
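Several of the checks above can be scripted before the PDF is handed to the extraction call. The following is only a rough sketch using the Python standard library, not anything taken from unstructured itself: the helper names (`preflight_pdf`, `copy_to_work_dir`, `installed_versions`) and the logger name are made up for illustration, and the header check is a cheap sanity test rather than a full integrity check.

```python
import logging
import os
import shutil
import sys
import tempfile
from importlib import metadata
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pdf_preflight")  # hypothetical logger name


def preflight_pdf(path):
    """Return a list of problems found with the PDF at `path` (empty if none)."""
    pdf = Path(path)
    log.info("Checking %s", pdf.resolve())
    if not pdf.is_file():
        return [f"not a file: {pdf}"]
    if not os.access(pdf, os.R_OK):
        return ["file is not readable by the current user"]
    problems = []
    # Cheap sanity check only: a PDF normally carries "%PDF-" near the start.
    with pdf.open("rb") as fh:
        header = fh.read(1024)
    if b"%PDF-" not in header:
        problems.append("missing %PDF- header; file may be corrupted or not a PDF")
    log.info("Size: %d bytes", pdf.stat().st_size)
    return problems


def copy_to_work_dir(path, work_dir):
    """Copy the PDF into a directory you control and return the new path."""
    target = Path(work_dir) / Path(path).name
    shutil.copy2(path, target)
    return target


def installed_versions(packages=("unstructured", "unstructured-inference", "PyMuPDF")):
    """Map each package name to its installed version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    log.info("Installed versions: %s", installed_versions())
    for arg in sys.argv[1:]:
        problems = preflight_pdf(arg)
        for problem in problems:
            log.error("%s: %s", arg, problem)
        if problems:
            continue
        # Explicitly managed temporary directory, removed when the block exits.
        with tempfile.TemporaryDirectory() as work_dir:
            local = copy_to_work_dir(arg, work_dir)
            log.info("Working copy at %s; run your extraction on this path", local)
```

Running it with one or more PDF paths as arguments logs the installed versions, the resolved path, and the size of each file, which should help narrow down whether the failure comes from the file, the temporary directory, or the library versions.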
Hi, thanks for the reply.
I got past the error by upgrading to unstructured-inference==0.7.33.
However, I am getting a different issue now and have opened an issue for it (Unstructured-IO/unstructured#3102).
OS: Ubuntu - 20.04.6 LTS
Python: 3.11
Requirements:
unstructured==0.14.2
unstructured-inference==0.7.33
pillow-heif==0.16.0
I am getting the following error when extracting text and images from a PDF:
The way I am using unstructured is:
Is there any way we can fix this issue?