Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harden the reconciliation loop to work best with the APIserver #29

Merged
merged 6 commits into from
Jul 24, 2020

Conversation

metral
Copy link
Contributor

@metral metral commented Jul 23, 2020

In the build out of the integration testing suite (#28), we uncovered issues in how the finalizer and status updates operated, mainly related to loop failures and as a result, extra loops spawning and causing indeterministic outcomes.

This PR hardens the reconciliation loop based on the findings from running the tests, borrowing approaches from the Helm operator in the operator-sdk examples.


  • (stack-cntlr): do not err on CreateStack if stack already exists

  • fix(stack-cntlr): harden status and finalizers based on Helm operator
    The use of GET and UPDATE API calls to the apiserver operate on optimistic
    locking, which can lead to invalid operations if an object becomes stale. This a feature in k8s, not a bug.

    We harden status updates and finalizers using a best-effort approach of getting
    the latest resource first before attempting to update it.

    However, loops can still fail on outdated objects albeit less frequently, and practically
    all events on a Stack can invoke yet another reconciliation loop. We defensively avoid loops
    where possible, and lean on the resourceGeneration predicate to help lower eventing from invoking more loops than necessary.

    See for more details: https://git.io/JJlcx

  • fix(stack-cntlr): retry HTTP 409s optionally, and requeue HTTP 404s


@metral metral requested review from lukehoban and lblackstone July 23, 2020 00:06
@metral metral changed the title Harden the reconciliation loop to work best with the APIserver and stack updates Harden the reconciliation loop to work best with the APIserver Jul 23, 2020
Copy link
Member

@lblackstone lblackstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall

pkg/controller/stack/stack_controller.go Outdated Show resolved Hide resolved
pkg/controller/stack/stack_controller.go Outdated Show resolved Hide resolved
@metral metral requested a review from lblackstone July 24, 2020 04:19
metral added 3 commits July 24, 2020 04:20
The use of GET and UPDATE API calls to the apiserver operate on optimistic
locking, which can lead to invalid operations if an object becomes stale. This a feature in k8s, not a bug.

We harden status updates and finalizers using a best-effort approach of getting
the latest resource first before attempting to update it.

However, loops can still fail on outdated objects albeit less frequently, and practically
all events on a Stack can invoke yet another reconciliation loop. We defensively avoid loops
where possible, and lean on the resourceGeneration predicate to help lower eventing from invoking more loops than necessary.

See for more details: https://git.io/JJlcx
@metral metral force-pushed the metral/rework-reconcile-loop branch 2 times, most recently from e4afee8 to 271c098 Compare July 24, 2020 15:20
pkg/controller/stack/stack_controller.go Outdated Show resolved Hide resolved
@metral metral force-pushed the metral/rework-reconcile-loop branch from 271c098 to 532533f Compare July 24, 2020 15:32
@metral metral merged commit 3b48172 into master Jul 24, 2020
@pulumi-bot pulumi-bot deleted the metral/rework-reconcile-loop branch July 24, 2020 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cleanup temp workdir for stacks in the operator
2 participants