Backport of client: allow incomplete allocrunners to be removed on restore into release/1.5.x #19360
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport
This PR is auto-generated from #16638 to be assessed for backporting due to the inclusion of the label backport/1.5.x.
🚨
The person who merged in the original PR is:
@tgross
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.
The below text is copied from the body of the original PR.
If an allocrunner is persisted to the client state but the client stops before task runner can start, we end up with an allocation in the database with allocrunner state but no taskrunner state. This ends up mimicking an old pre-0.9.5 state where this state was not recorded and that hits a backwards compatibility shim. This leaves allocations in the client state that can never be restored, but won't ever be removed either.
Update the backwards compatibility shim so that we fail the restore for the allocrunner and remove the allocation from the client state. Taskrunners persist state during graceful shutdown, so it shouldn't be possible to leak tasks that have actually started. This lets us "start over" with the allocation, if the server still wants to place it on the client.
This work came out of discussions in #16623 where old state was kicking around and making log noise that was not useful to the user and making it harder to debug the real problem.
Overview of commits