-
Notifications
You must be signed in to change notification settings - Fork 39
ReleaseController application picks the tip of the release chain as contender #166
Conversation
@kanatohodets @juliogreff Hey folks! Mind dropping a line on this one as we all had concerns regarding this change, it would be great to keep it tracked. Thanks! |
My original worry on this change was actually unfounded (I took a code comment to its word, but that has since been addressed). I'm actually quite comfortable with the spirit of the change now that my misunderstanding has been cleared up. There is a quite unlikely scenario that might be worth considering though. Imagine that a healthy release Honestly I can't think of a good way around this (maybe preventing new releases when there's an incomplete contender?), so this is just putting it out there. As I said earlier, the idea behind the current state of this PR is sound to me now. |
This change addresses the concern raised in #166 (comment) and enforces release equality check by their name, not pointer-based. Signed-off-by: Oleg Sidorov <[email protected]>
416cb0e
to
6f30009
Compare
e8bec08
to
a580c1a
Compare
3897928
to
66b122d
Compare
66b122d
to
d1d5d79
Compare
The current implementation has a slightly relaxed algorithm of finding contender and incumbent in the chain of releases: in particular, the latest (sorted by generation) scheduled release is identified as a contender and the latest complete one is our incumbent. This commit changes the approach and forces release controller to start using releaseutil methods for finding incumbent and contender. The incumbent finding is still very naive as it's relying on release status to figure out whether it's complete or not. On the improvement side, we got rid of using the most dangerous condition in release object: scheduled criteria. From now on, we always pick the tip of the release chain. Some extra fixes in release controller and chart repo: * Chart repo returns ChartFetchFailureError if fetch failed * Release controller: moved away from 2-step cluster selection and scheduling: 1 pass now * ChartRepoInternalError is a recognised error now * Application utils: incumbent equality to contender is name-based Signed-off-by: Oleg Sidorov <[email protected]>
d1d5d79
to
3ef70e6
Compare
This may or may not make conceptual sense (it does to me, but I could not find out why choosing clusters and scheduling a release was two separate steps in the first place, although after #166[1] I'm fairly convinced this was just a technical artifact), but it sure is convenient: we move all the error handling during the scheduling step to a single chunk of code. This fixes an issue where errors in ChooseClusters were not reflected in any condition, making the Release object not change during the sync, and therefore not triggering any events, being essentially invisible to users. As a bonus, I restored the actual testing part of this in the unit tests. We were previously just checking that ChooseClusters didn't trigger any updates, without actually checking if it was doing the right thing (choosing clusters). [1] https://github.com/bookingcom/shipper/pull/166/files#diff-caffe52421149f1f8d77a0e7c749867dR327-R341
This may or may not make conceptual sense (it does to me, but I could not find out why choosing clusters and scheduling a release was two separate steps in the first place, although after #166[1] I'm fairly convinced this was just a technical artifact), but it sure is convenient: we move all the error handling during the scheduling step to a single chunk of code. This fixes an issue where errors in ChooseClusters were not reflected in any condition, making the Release object not change during the sync, and therefore not triggering any events, being essentially invisible to users. As a bonus, I restored the actual testing part of this in the unit tests. We were previously just checking that ChooseClusters didn't trigger any updates, without actually checking if it was doing the right thing (choosing clusters). [1] https://github.com/bookingcom/shipper/pull/166/files#diff-caffe52421149f1f8d77a0e7c749867dR327-R341
This may or may not make conceptual sense (it does to me, but I could not find out why choosing clusters and scheduling a release was two separate steps in the first place, although after #166[1] I'm fairly convinced this was just a technical artifact), but it sure is convenient: we move all the error handling during the scheduling step to a single chunk of code. This fixes an issue where errors in ChooseClusters were not reflected in any condition, making the Release object not change during the sync, and therefore not triggering any events, being essentially invisible to users. As a bonus, I restored the actual testing part of this in the unit tests. We were previously just checking that ChooseClusters didn't trigger any updates, without actually checking if it was doing the right thing (choosing clusters). [1] https://github.com/bookingcom/shipper/pull/166/files#diff-caffe52421149f1f8d77a0e7c749867dR327-R341
The current implementation has a slightly relaxed algorithm of finding
contender and incumbent in the chain of releases: in particular, the
latest (sorted by generation) scheduled release is identified as a
contender and the latest complete one is our incumbent.
This commit changes the approach and forces release controller to start
using releaseutil methods for finding incumbent and contender. The
incumbent finding is still very naive as it's relying on release status
to figure out whether it's complete or not. On the improvement side, we
got rid of using the most dangerous condition in release object:
scheduled criteria. From now on, we always pick the tip of the release
chain.
Signed-off-by: Oleg Sidorov [email protected]