Build failing with "text file busy" #1609
Comments
Hrm, this shouldn't be happening, as we make an effort to block builds if another task is running for the version. This might be a stale build on the builders, but is more likely a bug.
I think this is the same thing?
Looks like it to me.
I should clarify that I have tried wiping the virtualenvs for all my builds, which usually makes the first build after that work a little better, but then it goes back to failing pretty much every time.
@jhamrick forgot to look into this on the servers, sorry for the delay. I do see defunct processes left over from your builds. Here's the specific command that forked for python3 and never cleaned up:
`/home/docs/checkouts/readthedocs.org/user_builds/nbgrader/envs/master/bin/python /home/docs/checkouts/readthedocs.org/user_builds/nbgrader/envs/master/bin/jupyter-nbconvert --to rst --execute --FilesWriter.build_directory=user_guide user_guide/03_generating_assignments.ipynb`
I've killed the processes for now, but it sounds like it will come back. @colons I don't see the same defunct process behavior on your issue, unless it has already resolved.
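(Not from the thread, but for context: a minimal sketch, assuming the third-party `psutil` package, of how defunct (zombie) python processes like the one above could be spotted on a builder.)

```python
# Hedged sketch only -- not the command run on the Read the Docs servers.
# Lists leftover zombie ("defunct") python processes using psutil.
import psutil

for proc in psutil.process_iter(['pid', 'name', 'status', 'cmdline']):
    if proc.info['status'] == psutil.STATUS_ZOMBIE and 'python' in (proc.info['name'] or ''):
        print(proc.info['pid'], ' '.join(proc.info['cmdline'] or []))
```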
@agjohnson Thanks for looking into this! The defunct python processes are definitely odd. There are two other types of errors that I've been seeing, both of which are bizarre to me, but perhaps related?
I am going to be away for a couple of weeks, but will investigate the second type of failed build further when I get back. Do you have any insight as to what is causing the first type of build (the one that doesn't seem to have any error messages) to fail?
My guess is that the failures without any response are cases where the task stops reporting back once the python process goes defunct, with the subsequent errors about busy files being a symptom of the defunct process still hanging around.
I would love to see some progress on this, as my project (https://readthedocs.org/projects/dataanalysispython/builds/) has been failing to build for several days now. I tried wiping it to no avail.
@khrapovs your project is also causing defunct processes; I've had to clean out stale tasks from both projects periodically. Your build processes aren't going defunct, but they seem to be stuck in a loop, eating up a considerable amount of resources.
@khrapovs the fact that you are loading the doctest module and making heavy use of doctest syntax in your examples might be part of the problem -- see http://sphinx-doc.org/ext/doctest.html#confval-doctest_test_doctest_blocks
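(For reference, the option linked above lives in Sphinx's `conf.py`; a generic sketch, not the project's actual configuration, looks like this.)

```python
# conf.py -- minimal sketch, not taken from the project in question.
extensions = ['sphinx.ext.doctest']

# With sphinx.ext.doctest enabled, an empty string here means standard
# reST doctest blocks are not collected and executed as tests.
doctest_test_doctest_blocks = ''
```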
@agjohnson Ok, I have removed the doctest extension. I wiped the project. I deleted it completely a couple of times. It still does not build. Moreover, it "builds" for two hours and then fails, and the logs do not report any errors.
@khrapovs Your last build ran for 60m, and I see the same behavior on a build of your project. strace on the python process shows it's completely locked up -- perhaps in a loop?
@khrapovs your use of IPython to perform calculations is the problem. I can reproduce this locally: the sphinx build hung while looping over your calculation, and I had to halt it.
@agjohnson Yep, just found it and already pushed the fix.
Cool that we could help resolve the issue. Closing this as the builds are back to normal 🌞
When building a project, if the build takes more than `REPO_LOCK_SECONDS` and, while it is still running, another build is triggered for the same Version and the same builder picks up the task, the lock will be considered "old", removed, and taken over by the new build. This ends in a collision when accessing the files and can raise an exception like `IOError: [Errno 26] Text file busy`; it can also fail for other unexpected reasons.

This PR increases `max_lock_age` to the same value the project uses as its time limit to end the build, taken in order from:

* the custom container time limit, or
* `settings.DOCKER_LIMITS['time']`, or
* `settings.REPO_LOCK_SECONDS`, or
* 30 seconds

Related to #1609
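(A rough sketch of that fallback order; the attribute and setting names mirror the PR description above, and this is not the actual readthedocs.org implementation.)

```python
# Hypothetical sketch of the max_lock_age fallback described above; the
# names `container_time_limit`, `DOCKER_LIMITS`, and `REPO_LOCK_SECONDS`
# follow the PR text, not the real readthedocs.org code.

DEFAULT_LOCK_AGE = 30  # seconds, the final fallback


def max_lock_age(project, settings):
    """Return how long a version lock may live before it is treated as stale."""
    return (
        getattr(project, 'container_time_limit', None)
        or getattr(settings, 'DOCKER_LIMITS', {}).get('time')
        or getattr(settings, 'REPO_LOCK_SECONDS', None)
        or DEFAULT_LOCK_AGE
    )
```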
I've recently been having trouble with my builds failing, usually with the following error message (from https://readthedocs.org/projects/nbgrader/builds/3267954/):
I read somewhere else that this could be due to running multiple builds at the same time, but if that's the case, how can I prevent it from happening? It happens on pretty much every commit (e.g. it will pass on `master` but fail on `latest`).