ReleaseController application picks the tip of the release chain as contender #166

osdrv · 2019-08-27T11:38:14Z

The current implementation uses a slightly relaxed algorithm for finding
the contender and the incumbent in the chain of releases: the latest
(sorted by generation) scheduled release is identified as the contender
and the latest complete one as the incumbent.

This commit changes the approach and makes the release controller use
the releaseutil methods for finding the incumbent and the contender.
Finding the incumbent is still very naive, as it relies on the release
status to figure out whether a release is complete. On the improvement
side, we no longer depend on the most dangerous condition in the release
object: the scheduled condition. From now on, we always pick the tip of
the release chain as the contender.

Signed-off-by: Oleg Sidorov [email protected]
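To make the selection rule above concrete, here is a minimal sketch. It is not the releaseutil code: the `Release` struct, its fields, and the helper names are hypothetical stand-ins. It assumes the contender is simply the release with the highest generation, and the incumbent is the newest release whose status marks it complete (which may be the contender itself once it finishes).

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Release is a hypothetical, trimmed-down stand-in for a release object.
type Release struct {
	Name       string
	Generation int
	Complete   bool // stand-in for "the release status says it is complete"
}

// sortByGeneration returns a copy of rels ordered from oldest to newest.
func sortByGeneration(rels []*Release) []*Release {
	sorted := append([]*Release(nil), rels...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Generation < sorted[j].Generation
	})
	return sorted
}

// contender picks the tip of the chain, ignoring any scheduled condition.
func contender(rels []*Release) (*Release, error) {
	if len(rels) == 0 {
		return nil, errors.New("no releases in the chain")
	}
	sorted := sortByGeneration(rels)
	return sorted[len(sorted)-1], nil
}

// incumbent walks the chain from newest to oldest and returns the first
// release that is marked complete.
func incumbent(rels []*Release) (*Release, error) {
	sorted := sortByGeneration(rels)
	for i := len(sorted) - 1; i >= 0; i-- {
		if sorted[i].Complete {
			return sorted[i], nil
		}
	}
	return nil, errors.New("no complete release in the chain")
}

func main() {
	rels := []*Release{
		{Name: "B", Generation: 1},
		{Name: "A", Generation: 0, Complete: true},
	}
	c, err := contender(rels)
	if err != nil {
		fmt.Println(err)
		return
	}
	inc, err := incumbent(rels)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("contender=%s incumbent=%s\n", c.Name, inc.Name)
}
```

Note that nothing in `contender` looks at whether the release has been scheduled; that is the point of the change.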

osdrv · 2019-08-29T12:43:13Z

@kanatohodets @juliogreff Hey folks! Mind dropping a line on this one as we all had concerns regarding this change, it would be great to keep it tracked. Thanks!

pkg/util/application/releases.go

pkg/controller/release/release_controller.go

juliogreff · 2019-09-02T13:35:45Z

My original worry about this change was actually unfounded (I took a code comment at its word, but that has since been addressed). I'm quite comfortable with the spirit of the change now that my misunderstanding has been cleared up.

There is a quite unlikely scenario that might be worth considering, though. Imagine that a healthy release A is being replaced by release B. If B has its target step advanced all the way to the last step in the strategy, but a new release C gets created before B has had time to complete (say, because of an emergency hotfix), the incumbent will remain A, even though A might have lost quite some capacity during the rollout. In this particular case B could have been a better candidate.

Honestly I can't think of a good way around this (maybe preventing new releases when there's an incomplete contender?), so this is just putting it out there. As I said earlier, the idea behind the current state of this PR is sound to me now.
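The A/B/C scenario can be played through in a small sketch. Everything here is hypothetical (the `Release` struct, the 3-step strategy, the helper names); it only assumes that the incumbent is the newest complete release in the chain. The `hasIncompleteContender` helper illustrates the "prevent new releases while the contender is incomplete" idea floated above, not anything that exists in the code.

```go
package main

import "fmt"

// Release is a hypothetical stand-in; the chain below is ordered oldest first.
type Release struct {
	Name       string
	TargetStep int  // step the operator asked for
	Complete   bool // true only once the last step has been achieved
}

// lastStep assumes a 3-step strategy (steps 0, 1, 2).
const lastStep = 2

// incumbent returns the newest complete release in the chain, if any.
func incumbent(chain []Release) (Release, bool) {
	for i := len(chain) - 1; i >= 0; i-- {
		if chain[i].Complete {
			return chain[i], true
		}
	}
	return Release{}, false
}

// hasIncompleteContender reports whether the tip of the chain is still
// rolling out. A guard could refuse new releases while this is true.
func hasIncompleteContender(chain []Release) bool {
	return len(chain) > 0 && !chain[len(chain)-1].Complete
}

func main() {
	chain := []Release{
		{Name: "A", TargetStep: lastStep, Complete: true},
		// B was told to go to the last step but has not finished yet.
		{Name: "B", TargetStep: lastStep, Complete: false},
	}
	fmt.Println("B still rolling out:", hasIncompleteContender(chain))

	// The emergency hotfix arrives before B completes.
	chain = append(chain, Release{Name: "C"})
	inc, _ := incumbent(chain)
	fmt.Println("incumbent after C is created:", inc.Name)
}
```

With this rule B is skipped because it never reached the complete state, so A stays the incumbent even if most of its capacity has already been handed over to B.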

This change addresses the concern raised in #166 (comment) and makes the release equality check name-based rather than pointer-based.

Signed-off-by: Oleg Sidorov <[email protected]>
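A short sketch of why the distinction matters. The `Release` type and `sameRelease` helper are hypothetical; the assumption is that the incumbent and the contender can be two separate copies of the same object (for example after a deep copy), in which case comparing pointers reports them as different releases.

```go
package main

import "fmt"

// Release is a hypothetical stand-in carrying only the identifying name.
type Release struct {
	Name string
}

// sameRelease compares releases by name instead of by pointer identity.
// It assumes names are unique within the set being compared.
func sameRelease(a, b *Release) bool {
	if a == nil || b == nil {
		return a == b
	}
	return a.Name == b.Name
}

func main() {
	original := &Release{Name: "my-app-0"}
	copied := *original // a distinct object describing the same release

	fmt.Println("pointer equality:", original == &copied)
	fmt.Println("name equality:   ", sameRelease(original, &copied))
}
```

In the single-release case, where the tip of the chain is also the latest complete release, a name-based check still recognises that incumbent and contender are the same release.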

pkg/controller/release/release_controller.go

pkg/chart/repo/repo.go

pkg/controller/release/release_controller.go

pkg/controller/release/release_controller_test.go

The current implementation uses a slightly relaxed algorithm for finding the contender and the incumbent in the chain of releases: the latest (sorted by generation) scheduled release is identified as the contender and the latest complete one as the incumbent.

This commit changes the approach and makes the release controller use the releaseutil methods for finding the incumbent and the contender. Finding the incumbent is still very naive, as it relies on the release status to figure out whether a release is complete. On the improvement side, we no longer depend on the most dangerous condition in the release object: the scheduled condition. From now on, we always pick the tip of the release chain as the contender.

Some extra fixes in release controller and chart repo:

* Chart repo returns ChartFetchFailureError if the fetch failed
* Release controller: moved away from 2-step cluster selection and scheduling: 1 pass now
* ChartRepoInternalError is a recognised error now
* Application utils: incumbent equality to contender is name-based

Signed-off-by: Oleg Sidorov <[email protected]>

Fixed

This may or may not make conceptual sense (it does to me, but I could not find out why choosing clusters and scheduling a release was two separate steps in the first place, although after #166[1] I'm fairly convinced this was just a technical artifact), but it sure is convenient: we move all the error handling during the scheduling step to a single chunk of code.

This fixes an issue where errors in ChooseClusters were not reflected in any condition, making the Release object not change during the sync, and therefore not triggering any events, being essentially invisible to users.

As a bonus, I restored the actual testing part of this in the unit tests. We were previously just checking that ChooseClusters didn't trigger any updates, without actually checking if it was doing the right thing (choosing clusters).

[1] https://github.com/bookingcom/shipper/pull/166/files#diff-caffe52421149f1f8d77a0e7c749867dR327-R341
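A rough sketch of the one-pass idea described above, under stated assumptions: the `Release` and `Condition` types, the `"Scheduled"` condition name, the `"SchedulingFailed"` reason, and the lowercase `chooseClusters` helper are all made up for illustration and are not the controller's real API. The point is only that cluster choosing and scheduling share one error-handling path, so a cluster-choosing failure always ends up in a condition on the object.

```go
package main

import "fmt"

// Condition and Release are hypothetical, simplified stand-ins.
type Condition struct {
	Type    string
	Status  bool
	Reason  string
	Message string
}

type Release struct {
	Name       string
	Clusters   []string
	Conditions []Condition
}

// chooseClusters is a toy selection: take the first `want` clusters.
func chooseClusters(rel *Release, available []string, want int) ([]string, error) {
	if len(available) < want {
		return nil, fmt.Errorf("release %q wants %d clusters, only %d available",
			rel.Name, want, len(available))
	}
	return append([]string(nil), available[:want]...), nil
}

// schedule chooses clusters and schedules in a single pass. Every failure
// goes through the same block, so it is always recorded as a condition.
func schedule(rel *Release, available []string, want int) error {
	clusters, err := chooseClusters(rel, available, want)
	if err == nil {
		rel.Clusters = clusters
		// Further scheduling work would go here and feed the same err.
	}

	cond := Condition{Type: "Scheduled", Status: err == nil}
	if err != nil {
		cond.Reason = "SchedulingFailed"
		cond.Message = err.Error()
	}
	rel.Conditions = append(rel.Conditions, cond)
	return err
}

func main() {
	rel := &Release{Name: "my-app-0"}
	if err := schedule(rel, []string{"cluster-a"}, 2); err != nil {
		fmt.Println("scheduling failed:", err)
	}
	fmt.Printf("conditions: %+v\n", rel.Conditions)
}
```

Because the object now changes on failure, a sync that hits a cluster-choosing error is visible to whoever inspects the release.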

osdrv self-assigned this Aug 27, 2019

osdrv added the bug Something isn't working label Aug 27, 2019

osdrv added this to the release-0.6 milestone Aug 27, 2019

ksurent previously requested changes Aug 30, 2019

juliogreff reviewed Sep 2, 2019

juliogreff reviewed Sep 3, 2019

isutton reviewed Sep 4, 2019

osdrv force-pushed the olegs/fix-get-release-pair branch from 416cb0e to 6f30009 Compare September 4, 2019 14:14

osdrv marked this pull request as ready for review September 4, 2019 14:17

osdrv force-pushed the olegs/fix-get-release-pair branch 3 times, most recently from e8bec08 to a580c1a Compare September 5, 2019 09:02

isutton reviewed Sep 5, 2019

osdrv force-pushed the olegs/fix-get-release-pair branch 2 times, most recently from 3897928 to 66b122d Compare September 5, 2019 14:19

juliogreff reviewed Sep 12, 2019

juliogreff reviewed Sep 12, 2019

juliogreff previously approved these changes Sep 12, 2019

osdrv dismissed juliogreff’s stale review via d1d5d79 September 12, 2019 12:49

osdrv force-pushed the olegs/fix-get-release-pair branch from 66b122d to d1d5d79 Compare September 12, 2019 12:49

osdrv force-pushed the olegs/fix-get-release-pair branch from d1d5d79 to 3ef70e6 Compare September 12, 2019 15:38

juliogreff approved these changes Sep 12, 2019

ghost approved these changes Sep 13, 2019

osdrv merged commit c2c2742 into master Sep 13, 2019

osdrv deleted the olegs/fix-get-release-pair branch September 13, 2019 12:18

juliogreff mentioned this pull request Oct 1, 2019

release controller: move cluster choosing to scheduling #211

Merged
