Problem with tempfiles when repeatedly saving embeddings to archive #665

Closed
LoePhi opened this issue Feb 13, 2024 · 3 comments

@LoePhi

LoePhi commented Feb 13, 2024

When I repeatedly save embeddings to an archive, I get a NotADirectoryError on every file except the first.
As far as I can tell everything still works, but the temp files are not removed.
It only happens when content storage is enabled ("content": True).
It does not happen when saving to a directory, unless embeddings were previously saved to an archive.

Python 3.9, txtai 6.3.0 on Windows 11

from txtai import Embeddings
from llm import get_openai_embeddings

data = [(1, "Apple"), (2, "pie"), (3, "wheat")]
for i in range(2):
    print(i)
    embeddings = Embeddings(
        {
            "transform": get_openai_embeddings,
            "content": True,
        }
    )
    embeddings.index(data)
    embeddings.save(f"mini{i}.zip")

0
1
Exception ignored in: <finalize object at 0x1cced87f4a0; dead>
Traceback (most recent call last):
  File "C:\Users\redacted\Anaconda3\envs\semfs\lib\weakref.py", line 591, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "C:\Users\redacted\Anaconda3\envs\semfs\lib\tempfile.py", line 820, in _cleanup
    cls._rmtree(name)
  File "C:\Users\redacted\Anaconda3\envs\semfs\lib\tempfile.py", line 816, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "C:\Users\redacted\Anaconda3\envs\semfs\lib\shutil.py", line 759, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Users\redacted\Anaconda3\envs\semfs\lib\shutil.py", line 629, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "C:\Users\redacted\Anaconda3\envs\semfs\lib\tempfile.py", line 808, in onerror
    cls._rmtree(path)
  File "C:\Users\redacted\Anaconda3\envs\semfs\lib\tempfile.py", line 816, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "C:\Users\redacted\Anaconda3\envs\semfs\lib\shutil.py", line 759, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Users\redacted\Anaconda3\envs\semfs\lib\shutil.py", line 610, in _rmtree_unsafe
    onerror(os.scandir, path, sys.exc_info())
  File "C:\Users\redacted\Anaconda3\envs\semfs\lib\shutil.py", line 607, in _rmtree_unsafe
    with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\Users\redacted\AppData\Local\Temp\tmp5e5o97q5\documents'
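
Since the failed cleanup leaves temp files behind, it can help to see what is still on disk. The following is a minimal sketch, not part of txtai: `find_leftover_tempdirs` is a hypothetical helper, and it assumes leftover directories look like the one in the traceback above, that is, a `tmp*` directory in the system temp folder that still contains a `documents` entry.

```python
import tempfile
from pathlib import Path


def find_leftover_tempdirs(root=None, marker="documents"):
    """Return tmp* directories under root that still contain `marker`.

    Hypothetical helper: the "tmp" prefix is the tempfile default and
    "documents" is the entry named in the traceback above.
    """
    root = Path(root) if root is not None else Path(tempfile.gettempdir())
    leftovers = []
    for path in root.glob("tmp*"):
        try:
            if path.is_dir() and (path / marker).exists():
                leftovers.append(path)
        except OSError:
            # Skip entries that cannot be inspected (for example, no permission)
            continue
    return sorted(leftovers)
```

Calling `find_leftover_tempdirs()` with no arguments scans the directory returned by `tempfile.gettempdir()`. Other programs also create `tmp*` directories, so review the results before deleting anything.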

@davidmezzetti
Member

Hello - thank you for writing this up. I've seen this in the GitHub Actions logs for Windows builds for a while. I believe it's a known CPython issue (python/cpython#107408).

It looks like it's fixed in Python 3.11+. Perhaps it's worth trying a different Python version?
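
For reference, the standard library's own handling of cleanup errors can be tried in isolation. This is a sketch under stated assumptions: `write_in_tempdir` is a hypothetical helper, it says nothing about how txtai creates its temp directories, and whether the flag avoids the error above is not confirmed in this thread. The `ignore_cleanup_errors` parameter of `tempfile.TemporaryDirectory` exists from Python 3.10 onward.

```python
import sys
import tempfile
from pathlib import Path


def write_in_tempdir(filename="documents", payload=b"example"):
    """Write a file into a temporary directory; return its path after cleanup."""
    kwargs = {}
    if sys.version_info >= (3, 10):
        # ignore_cleanup_errors was added to TemporaryDirectory in Python 3.10
        kwargs["ignore_cleanup_errors"] = True
    with tempfile.TemporaryDirectory(**kwargs) as tmp:
        target = Path(tmp) / filename
        target.write_bytes(payload)
    # The directory is removed when the with-block exits
    return target
```

Note that the flag only suppresses exceptions raised during cleanup. Files that could not be deleted may still remain on disk.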

@LoePhi
Author

LoePhi commented Feb 14, 2024

Thank you for the swift response. I just tested it with Python 3.11.7, but the problem is still there. When trying to build a 3.12 environment I ran into an error with faiss, so I was not able to test that.

@davidmezzetti
Member

It's hard to follow that CPython chain. It's possible the fix only landed in 3.12.

It looks like Faiss supports 3.12 but it hasn't been pushed to PyPI yet (kyamagu/faiss-wheels#88). You can try to install Faiss from GitHub (https://github.com/kyamagu/faiss-wheels).

Unfortunately, there isn't much I can do about either problem from txtai.

LoePhi closed this as completed Feb 26, 2024