-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix mkdir race condition in LooseObjectDB.store #91
Conversation
This replaces the conditional call to os.mkdir that raises an unintended FileExistsError if the directory is created between the check and the os.mkdir call, using a single os.makedirs call instead, with exist_ok=True. This way, we attempt creation in a way that produces no error if the directory is already present, while still raising FileExistsError if a non-directory filesystem entry (such as a regular file) is present where we want the directory to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot!
Please do that I would prefer if GitDB could be removed from GitPython entirely, which is even worth a major version change to me. I dare to say that not too many people use it, and if they do, they should rather use an implementation that is more trustworthy (like the native git
one). In hindsight, I do consider it a mistake to have tried to write an ODB in python.
With that said, it might be that parts of GitPython rely on GitDB, maybe for writing, and I'd hope these could be ported over to GitPython ease maintenance - I firmly believe that most of the code in GitDB isn't actually used by GitPython if the default git
ODB backend is used.
Thanks--if I contemplate significant changes to, or related to, gitdb, I'll being by looking into whether I can remove GitPython's dependence on it or vendor the specific parts that GitPython needs. In this case, I just noticed that I was aware of a way to fix the race condition that might be acceptable. so I figured I'd open a PR. Although there are some other changes I may want to propose to the CI workflow in this repository, I'm not sure I really know enough to fix the most important gitdb-related issues, in GitPython or in gitdb itself. As my familiarity with the GitPython codebase increases, that might change. |
I would absolutely love if I could direct your incredible skill and energy away from the GitDB CI and towards making it unnecessary entirely :). It's probably more of a refactoring task, albeit a complex one.
It's a great gift to have you contribute to this project, and I love the idea to have more of that. The project, to my mind, has significant issues with quality and some limitations stem from these. The greatest problems come from incorrect handling of encodings, both for paths and for data, along with many naive implementations of some |
Thanks! Based on this, the only further gitdb CI pull request I will open now is one to re-add Python 3.7 support, as you indicated should be done in a few of the comments in gitpython-developers/GitPython#1654. (That will include CI, but also changing the lower bound back to 3.7 in |
Fixes #85
This replaces the conditional call to
os.mkdir
that raises an unintendedFileExistsError
if the directory is created between the check and theos.mkdir
call, using a singleos.makedirs
call instead, withexist_ok=True
.This way, we attempt creation in a way that produces no error if the directory is already present, while still raising
FileExistsError
if a non-directory filesystem entry (such as a regular file) is present where we want the directory to be. This is the advantage of this approach over the approach of swallowingFileExistError
as suggested in #85.Note, however, that
os.makedirs
behaves likemkdir -p
: it attempts to create parent directories (and their parents, etc.) if they do not already exist. So it should only be used if that is acceptable in this case. I am not aware of a reason it wouldn't be, but I am not very familiar with gitdb.So that aspect of the situation deserves special consideration in reviewing this PR. I'd be pleased to change the approach if
os.makdirs
is judged not suitable here. I think the approach suggested in #85 is reasonable, and it can be made more robust by checking that the directory exists after the creation attempt (or in other ways).The code was under test: that line is exercised in
TestExamples.test_base
,TestGitDB.test_writing
,TestLooseDB.test_basics
, andTestObjDBPerformance.test_large_data_streaming
. However, no test catches the race condition this fixes, and I have not added one.Testing that the race condition does not occur in the specific way as before by accessing and calling the same functions as before in the same order would be easy, but it would be more of an illusion of a regression test than a useful test. Testing by trying to brute-force a race condition, without modifying the operation of the code for the test, would work but the tests would take a very long time to run. Testing it in a way that is fairly robust against new ways of reintroducing the race condition and that is not too slow should be possible, but I don't know of a good way to do it; everything I've thought of would be complicated, and possibly make running the test in a debugger like
pdb
infeasible. So I have not added a regression test for this bug. However, if it is considered important to have one, then I can consider the matter further.