bug/Auto partition fails on text files which are empty or contain only whitespaces #3674

Closed
tc360950 opened this issue Sep 28, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@tc360950
Contributor

Describe the bug
File-type inference for a .txt file fails if the file is empty or contains only whitespace.

To Reproduce

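```
from tempfile import NamedTemporaryFile

from unstructured.partition.auto import partition

with NamedTemporaryFile(mode="w", suffix=".txt") as f:
    # Write whitespace only, so the file has no real text content.
    f.write("   \n")
    # seek() flushes the buffered write, so the content is on disk before partition() reads it.
    f.seek(0)
    elements = partition(filename=f.name)
```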

The call to partition() raises an IndexError.
Expected behavior
The file should be properly partitioned.
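
On a version affected by this bug, a caller-side guard is one possible way to avoid the IndexError. The sketch below is not the fix made in the library. `is_blank_text_file` and `partition_or_empty` are hypothetical helper names, and it assumes that an empty element list is an acceptable result for a blank file.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def is_blank_text_file(path, encoding="utf-8"):
    """Return True if the file is empty or contains only whitespace."""
    # errors="replace" keeps the check from failing on undecodable bytes.
    text = Path(path).read_text(encoding=encoding, errors="replace")
    return text.strip() == ""


def partition_or_empty(path, partition_fn):
    """Hypothetical wrapper: skip partitioning for blank files.

    partition_fn is any callable accepting a `filename` keyword,
    for example unstructured.partition.auto.partition.
    """
    if is_blank_text_file(path):
        # Assumption: a blank file should yield no elements.
        return []
    return partition_fn(filename=str(path))


if __name__ == "__main__":
    with TemporaryDirectory() as tmp:
        blank = Path(tmp) / "blank.txt"
        blank.write_text("   \n")
        print(is_blank_text_file(blank))
```

With unstructured installed, the wrapper could be called as `partition_or_empty(path, partition)`, so the library only sees files that contain some non-whitespace text.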

Environment Info
OS version: Linux-6.8.0-45-generic-x86_64-with-glibc2.35
Python version: 3.10.12
unstructured version: 0.15.13
unstructured-inference version: 0.7.36
pytesseract is not installed
Torch version: 2.4.1
Detectron2 is not installed
PaddleOCR is not installed
Libmagic version: file-5.41
magic file from /etc/magic:/usr/share/misc/magic
LibreOffice version: LibreOffice 7.3.7.2 30(Build:2)

@tc360950 tc360950 added the bug Something isn't working label Sep 28, 2024
cragwolfe pushed a commit that referenced this issue Sep 29, 2024
This is a fix for this [bug](#3674): auto partition fails on text files that are empty or contain only whitespace.

@scanny
Collaborator

scanny commented Dec 16, 2024

Fixed by #3675

@scanny scanny closed this as completed Dec 16, 2024