Adjustments for SqlMainDomLock and others to make azure operations more resilient #8398

Shazwazza · 2020-07-08T03:09:52Z

Details are here: #8215

This should also resolve #8392

Improvements to SqlMainDomLock
- Better management of db instances and transactions. Previously the code tried to manage a single database instance, which is not what should be done: db instances should exist for a short period of time and then be disposed. Transactions are now managed with more explicit code.
- Increase polling times. According to the logs this wasn't causing problems, but the polling was still too aggressive for what it's doing.
- Don't stop listening (shut down MainDom) if there are SQL exceptions, only if the appdomain has actually triggered a shutdown. We cannot stop listening just because there are errors, since that leads to premature MainDom shutdown when the app is not actually shutting down. Instead, the listener retries on the next poll.
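The "keep listening through errors" behaviour from the last bullet can be sketched in a language-agnostic way. This is a minimal Python illustration, not the Umbraco C# code: `listen`, `poll_once`, and the `shutdown` event are hypothetical names, and a generic exception stands in for SQL errors.

```python
import threading


def listen(poll_once, shutdown, poll_seconds=0.01, log=print):
    """Poll until poll_once() returns True or shutdown is signalled.

    Returns True if poll_once() reported a result, False if the
    shutdown event ended the loop. Errors never end the loop.
    """
    while not shutdown.is_set():
        try:
            if poll_once():
                return True
        except Exception as exc:  # hypothetical stand-in for SQL exceptions
            # Log and fall through: the next iteration is the retry.
            log(f"poll failed, retrying on next poll: {exc}")
        # Sleep for the poll interval, waking early if shutdown is set.
        shutdown.wait(poll_seconds)
    return False
```

The only exits are a successful poll and the shutdown signal; an exception during a poll is logged and the loop carries on.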
Investigation into scheduled publishing lock request timeouts (also reported in "Scheduled content leads to SQL error when rebuilding cache" #8392), which occur around the same time as the SqlMainDomLock failures
- PerformScheduledPublishInternal used yield returns with a wrapped Scope/IDisposable, which is hard to follow. One issue: when the Saving event was canceled for an individual item we yield returned but did not continue, so the logic proceeded for that item anyway even though the event was canceled. I also found similar issues where it would break out of the entire process instead of skipping to the next record.
- I believe one of the main causes of the lock request timeouts, like those in "Scheduled content leads to SQL error when rebuilding cache" #8392 and in the logs for "8.6.1 - Azure Web App - Indexes disappear completely" #8215, is that PerformScheduledPublishInternal takes a WriteLock at the very beginning of the method. This is the write lock that times out according to the stack traces. Scheduled publishing runs every minute, so we lock the whole content tree every minute even if we aren't writing anything. Because of that, I think this may also be one of the main suspects for why we saw the SqlMainDomLock error around the same time as the SQL lock timeout error for scheduled publishing. To fix this:
  - We need a new and much faster method, DocumentRepository.HasContentForRelease, instead of relying only on the potentially expensive DocumentRepository.GetContentForRelease. Similarly, GetContentForExpiration should get a HasContentForExpiration counterpart. We then split PerformScheduledPublishInternal into 2 parts: first, without any read or write locks, check whether either HasContentForRelease or HasContentForExpiration is true; only then take a write lock and proceed as we do today. This means we no longer attempt a write lock on all content every minute.
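The two-part approach described above (cheap check without a lock, then a write lock only when there is work) can be sketched as follows. This is a hedged Python illustration rather than the actual C# change: `FakeRepository`, `write_lock`, and `saving_is_canceled` are hypothetical stand-ins, and only the release path is shown (expiration would follow the same shape).

```python
from contextlib import contextmanager


class FakeRepository:
    """Hypothetical in-memory stand-in for the document repository."""

    def __init__(self, due_items):
        self.due_items = list(due_items)
        self.write_locks_taken = 0

    def has_content_for_release(self):
        # Cheap existence check; no lock required.
        return bool(self.due_items)

    def get_content_for_release(self):
        # Stands in for the potentially expensive query.
        return list(self.due_items)

    @contextmanager
    def write_lock(self):
        self.write_locks_taken += 1
        yield


def perform_scheduled_publish(repo, saving_is_canceled):
    """Return a list of (item, status) tuples for all due items."""
    # Part 1: without any lock, check whether there is anything to do.
    if not repo.has_content_for_release():
        return []

    results = []
    # Part 2: take the write lock only now that there is work.
    with repo.write_lock():
        for item in repo.get_content_for_release():
            if saving_is_canceled(item):
                results.append((item, "canceled"))
                continue  # skip this item only, keep processing the rest
            results.append((item, "published"))
    return results
```

The function builds a complete list instead of yielding from inside the lock scope, so the scope is always closed before the caller sees any result, and a canceled item is skipped with `continue` rather than ending the whole run.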
Don't short-circuit index rebuilding if one of the populators throws an exception; continue with the other populators
Brings our SQL transient fault handling in line with the latest specs from MS
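For the index rebuilding change, the idea of continuing past a failing populator can be sketched like this. It is a minimal Python illustration under assumed names (`rebuild_indexes`, and populators modelled as `(name, callable)` pairs), not the Examine or Umbraco API.

```python
def rebuild_indexes(populators, log=print):
    """Run every populator in order; return the names of those that failed."""
    failed = []
    for name, populate in populators:
        try:
            populate()
        except Exception as exc:
            # One failing populator must not prevent the others from running.
            log(f"populator {name} failed: {exc}")
            failed.append(name)
    return failed
```

Collecting the failures rather than re-raising lets the caller decide afterwards how to report them, while every remaining populator still gets its turn.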


bergmania

Changes look good and make sense given your description.

The only thing I don't understand, and don't think is intentional, is the deletion of the ssl-port

src/Umbraco.Web.UI/Umbraco.Web.UI.csproj

src/Umbraco.Core/Services/Implement/ContentService.cs

src/Umbraco.Web/Compose/DatabaseServerRegistrarAndMessengerComponent.cs

Shazwazza changed the base branch from v8/contrib to v8/dev July 8, 2020 03:10

Shazwazza added 6 commits July 8, 2020 13:43

fixes error logging

8814875

Ensure index rebuilding doesn't short circuit if one of the populators fails

a233264

Don't try to reuse db instances, this can result in potential zombie transactions

6aa4924

transactions for sqlmaindom

65101be

Fix for PerformScheduledPublishInternal, don't use yield returns within a using! this will not work and transactions/connections will be lost

53db2df

Ensure we don't shutdown MainDom if there is an error while listening, only shutdown if the appdomain is triggered to shutdown, else we'll keep listening/logging

651756d

Shazwazza force-pushed the v8/bugfix/sqlmaindom-updates branch from 56477d9 to 651756d Compare July 8, 2020 03:44

Shazwazza changed the base branch from v8/dev to v8/8.6 July 8, 2020 03:44

Shazwazza added 7 commits July 8, 2020 14:51

comments

a947fa3

comments

7819d1a

comments

7590161

readability

b80dc8f

Adds Load Test controller to test data project (instead of just being stored in my personal repo)

f0dea44

removes comments and no need for private method

384531e

Fixes our sql azure transient fault detection to be inline with current standards, adds a scope for the health check schedule tasks

e175717

This was linked to issues Jul 8, 2020

Scheduled content leads to SQL error when rebuilding cache #8392

Closed

8.6.1 - Azure Web App - Indexes disappear completely #8215

Closed

refactors scheduled publishing logic - splits into 2x scopes/2x trans, only take a write lock when necessary

df61f30

Shazwazza marked this pull request as ready for review July 8, 2020 08:17

bergmania reviewed Jul 8, 2020


src/Umbraco.Web.UI/Umbraco.Web.UI.csproj (outdated)

clausjensen reviewed Jul 8, 2020


src/Umbraco.Core/Services/Implement/ContentService.cs (outdated)

src/Umbraco.Core/Services/Implement/ContentService.cs (outdated)

src/Umbraco.Web/Compose/DatabaseServerRegistrarAndMessengerComponent.cs

clausjensen added the release/8.6.4 label Jul 8, 2020

Shazwazza added 2 commits July 9, 2020 13:40

revert ssl port

6dfb31f

comments

15b9031

nul800sebastiaan added the type/bug label Jul 9, 2020

nul800sebastiaan added this to the sprint140 milestone Jul 9, 2020

Merge branch 'v8/8.6' into v8/bugfix/sqlmaindom-updates

191060f

This was referenced Jul 9, 2020

Scheduled content leads to SQL error when rebuilding cache #8392

Closed

8.6.1 - Azure Web App - Indexes disappear completely #8215

Closed

Shazwazza merged commit 892e3a4 into v8/8.6 Jul 9, 2020

Shazwazza deleted the v8/bugfix/sqlmaindom-updates branch July 9, 2020 07:13

nul800sebastiaan mentioned this pull request Jul 9, 2020

Rebuilding indexes can fail with a NullReferenceException when trying to reindex forms data umbraco/Umbraco.Forms.Issues#373

Closed

TimZander mentioned this pull request Jul 14, 2020

umbracoLock timeout exceeded frequently for members #8433

Closed

joepvtl mentioned this pull request Jul 21, 2020

Sql Lock Timeout on save #8459

Closed

Shazwazza mentioned this pull request Aug 13, 2020

PostSave error on most actions (save, publish, move) #8625

Closed

Shazwazza mentioned this pull request Dec 14, 2020

SqlMainDomLock will stop listening if Sql Server connection terminates #9543

Merged
