Background job causes "Failed to acquire write lock for id: -333." #15445
Hi there @enkelmedia! Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better. We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.
We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions. Thanks, from your friendly Umbraco GitHub bot 🤖 🙂
The Examine indexes are built in the background just after startup on cold boots, and they do indeed read all content.
Hi! Thanks for your answer @bergmania! I was mistaken here; it was not related to Examine, but since the "symptoms" were similar it was my initial suspicion. I dug deeper, found out what was going on, and I think the root cause is probably not a bug but maybe something that could be addressed in a better way.

First I need to describe the setup we currently have for development, which is probably just because we "always had it like that" and it used to work well on older versions of Umbraco where the load balancing support wasn't great. We have a central database used for development and for CI/Test. Some days we might have two devs working against the db while the test server is also connected. We do have the

Anyway. Yesterday we ran a big batch job on the test server that imported content and performed save/publish. The symptoms that I mistook for Examine index population were actually my machine processing all the cache instructions created by the import. I'm guessing that there were a lot of instructions in the queue, and the central database is hosted with an external provider, which adds some network latency as well. So far things "make sense" given the setup and configuration we have. It might be a problem or issue that the whole process in

In our setup we do not store the NuCache files on disk (we have

I think that there might be a couple of things that could be considered here:
I'm not sure if the particular circumstances around our setup are the "real" problem here, or if these ideas might be helpful for others as well. Let me know what you think. Cheers!
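A quick way to gauge the size of that instruction backlog is to look at the umbracoCacheInstruction table directly. This is a rough diagnostic sketch, not something from the thread, and the column names (utcStamp, originated, instructionCount) are from memory, so verify them against your schema:

```sql
-- How many cache-instruction rows exist and how old they are
-- (column names are assumptions; check your umbracoCacheInstruction schema).
SELECT COUNT(*)      AS instructionRows,
       MIN(utcStamp) AS oldest,
       MAX(utcStamp) AS newest
FROM umbracoCacheInstruction;

-- Which server wrote the instructions, and roughly how many each.
SELECT originated, SUM(instructionCount) AS instructions
FROM umbracoCacheInstruction
GROUP BY originated
ORDER BY instructions DESC;
```

If I recall correctly, each instance tracks its own last-processed instruction id locally, so a large row count only shows how much was written, not how much a given machine still has left to replay.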
This is interesting because we are coming to the same conclusion. Another symptom (we believe) of #14195 is that the cache instructions are no longer being populated on our Subscriber instance. Shutting off the NuCache db makes the site unbearably slow, so that is not an option. The only way we have found to fix the Subscriber instance is to restart the webapp :(. Looking at the Load Balancing documentation, it mentions: The process is as follows:
I would love more insight into this process. It seems that the instructions are saved into the umbracoCacheInstruction table by the Scheduler, and then picked up by the CacheInstructionService on each running Subscriber. Another symptom we have seen is something like the following on the Subscriber:
The CacheInstructionService is running here, and Lucene is trying to access a file within the TEMP directory of the app.
Between this caching issue and consistent Failed to Acquire writelock -333 errors, our production instance is extremely unstable. Users of the backoffice are constantly unable to publish pages due to the "writelock", and when they are able to publish, they are not seeing their changes populated on the Subscriber. The only workaround we have found is to restart everything or KILL processes (like I mentioned in #14195). It appears to me that Load Balancing is not as fully fleshed out as the documentation says it is. Our only other option at this point is to go back to a single webapp for everything, and see if that helps our cause.
Hi @chriskarkowsky. TempFileSystemDirectoryFactory should only be used on multi-instance setups (multiple sites on the same filesystem). Otherwise I would also use SyncedTempFileSystemDirectoryFactory, as this should limit the required full index rebuilds.
Hi @bergmania, the documentation states: "The single instance Backoffice Administrative Web App should be set to use SyncedTempFileSystemDirectoryFactory." We have one single-instance app service running for the backoffice, and another app service for the frontend, which we initially had scaled out to 2 instances. The frontend has TempFileSystemDirectoryFactory, and the backoffice has SyncedTempFileSystemDirectoryFactory. Am I misunderstanding?
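For context, on Umbraco 9+ the directory factory is usually selected via the Examine settings in appsettings.json. The sketch below only illustrates the two values being discussed; the exact section path is my recollection of the documented setting, so double-check it against the docs for your version:

```json
{
  "Umbraco": {
    "CMS": {
      "Examine": {
        // Backoffice (single instance): "SyncedTempFileSystemDirectoryFactory"
        // Scaled-out frontend instances: "TempFileSystemDirectoryFactory"
        "LuceneDirectoryFactory": "SyncedTempFileSystemDirectoryFactory"
      }
    }
  }
}
```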
If by 2 instances you mean some Azure magic to scale out, then you are most likely correct. If it is two instances you control, and thereby different webapp folders, then it should work: SyncedTempFileSystemDirectoryFactory basically just falls back and copies the entire index from the webapp folder, if it does not exist in the temp folder, instead of rebuilding everything using the database. No matter what, when you see the -333 lock, two things are interesting.
Multiple readers are allowed, but write locks are exclusive and force both writes and reads to wait.
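To illustrate what that means at the database level (a simplified sketch, not Umbraco's exact implementation): the write lock is essentially an uncommitted UPDATE of the relevant umbracoLock row, held open for the duration of the operation, so any other session touching that row has to wait or time out.

```sql
-- Simplified illustration of an exclusive write lock on the content tree row.
-- The real queries and locking hints in Umbraco may differ; -333 corresponds
-- to the content tree lock id.
SET LOCK_TIMEOUT 5000;        -- fail instead of waiting forever

BEGIN TRANSACTION;

-- Updating the row takes an exclusive row lock that is held until COMMIT,
-- blocking other writers (and readers of this row) in the meantime.
UPDATE umbracoLock
SET value = CASE WHEN value = 1 THEN -1 ELSE 1 END
WHERE id = -333;

-- ... the protected work (saving/publishing content) happens here ...

COMMIT TRANSACTION;
```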
We are able to get the "writelock" error pretty consistently in our dev environment after deploying code and then attempting to save a page in the backoffice. I looked at the UmbracoTraceLog. Note: at no point earlier in the log do I see anything requesting a WriteLock before this record, so I am assuming the answer to #1 is "nothing".
.....
I am gathering that this isn't an issue with the umbracoLock table itself, but rather an issue with a lock on the umbracoLock table :). The SQL query to obtain the lock doesn't complete, causing the error. This is why the only workaround I have found when we get this is to find the blocking process and kill it. I mentioned in the other issue that if you try to run the ObtainWriteLockQuery manually, it will run longer than whatever value SET LOCK_TIMEOUT is set to within the method. If this ObtainWriteLock query constantly times out, users will never be able to Save/Publish. If this is indeed the case, then much like the OP, the Unable to Obtain Writelock issue is a symptom of a larger issue with db queries taking a long time; it presents itself when running background jobs we can't necessarily see within the logs, and long-running queries like the OP found may give some insight. I understand this issue is hard to debug, so if I can be of help testing anything, I will gladly participate, because this issue is wreaking havoc on our production environment.
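For reference, a generic way to find the blocking session with standard SQL Server DMVs looks roughly like this (nothing Umbraco-specific):

```sql
-- List requests that are currently blocked, who is blocking them,
-- and the SQL text of the blocked request.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time AS wait_time_ms,
       t.text      AS blocked_sql
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;

-- Once identified (and only if you are sure it is safe to do so),
-- the blocker can be terminated with: KILL <blocking_session_id>;
```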
Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)
12.3.5
Bug summary
We have been working on a migration project from v7 to v12 (around 20k content items and 25k media items) and were almost done when we started to see errors like this when saving content, document types, and other "stuff":
I first thought it was an issue with some kind of notification handler, so I disabled all of them. No luck. I've noticed that the error takes about 20-30 seconds to "appear".
While profiling the SQL Server, some time after the site has started, we see a bunch of these:
I've seen other similar issues like:
#14195
and
#13804
To me, it feels like some background job starts after X seconds and keeps holding that lock on -333, which means that the edit actions can't get the lock to save changes. This happens both when I try to save content and when I save document types.
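As a sanity check, the lock rows themselves can be inspected directly; -333 should show up as the content tree lock. A rough sketch, with column names from memory:

```sql
-- Inspect the distributed lock rows; -333 should be the content tree lock.
-- (Column names are assumptions; adjust if your schema differs.)
SELECT id, value, name
FROM umbracoLock
ORDER BY id DESC;
```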
I've tried to disable the "content version cleanup" using ContentVersionCleanupPolicy and maxed out DistributedLockingWriteLockDefaultTimeout, but nothing works. While this background job is processing the content we can't write to the database. I also tried the fix posted by @bergmania here: #13804 (comment) without success; I even disabled all the populators in the code sample, but the background job that is blocking is still running.
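For reference, the two settings mentioned above are typically adjusted in appsettings.json along these lines; the exact section paths are my best recollection, so verify them against the configuration docs for your Umbraco version:

```json
{
  "Umbraco": {
    "CMS": {
      "Content": {
        "ContentVersionCleanupPolicy": {
          // Turn off the scheduled content version cleanup job entirely.
          "EnableCleanup": false
        }
      },
      "Global": {
        // Raise the default timeout for acquiring distributed write locks.
        "DistributedLockingWriteLockDefaultTimeout": "00:01:00"
      }
    }
  }
}
```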
I'm not sure what to do next?
Specifics
No response
Steps to reproduce
Create a site with 25k content items and 25k media items, then use that database in a fresh folder of Umbraco.
Expected result / actual result
Saving should be possible even when background processing is running.