Canary Deployments Proposal #3837
Comments
Thank you for the useful proposal! I have two questions:
The idea for the Canary deployments is to allow users to create a new app instance with new code and target it to see if everything is good. Afterward, it will continue with a
My knowledge in CF is not too vast, but I am assuming that by introducing the |
I like the proposal 👍 Two questions came to my mind:
Does the canary instance participate in app routing? I guess so, similar to the standard rolling update. A nice (future) enhancement might be an option so that the canary instance does not participate in app routing and can only be reached via instance-specific routing (or a separate canary route) until it has been successfully evaluated (i.e. the deployment is continued).
How does the canary strategy behave when the canary instance, or later one of the other instances, fails while deploying? Will the deployment be canceled, i.e. switch back to the last droplet in a non-ZDM way, as when canceling a rolling deployment?
Yes, I agree on this.
@Gerg and I have been chatting a bit about this as well. This could be possible with a future enhancement to route destinations to also support an optional We were thinking mostly in terms of supporting a dedicated "canary route" that only routes to the canary and leaving the main route alone, but that original route would then direct traffic to all process instances still. To do what you're suggesting we'd need to support You may also be able to do something by flagging a process as a canary or something, but that solution feels a little overfit to this problem. |
Having canaries only reachable via dedicated validation routes makes sense as a feature. I agree with Tim that it will probably be relatively easy to make a dedicated validation route for canaries, but more difficult to exclude them from the process's normal route.
Using UpdateDesiredLRP we should theoretically be able to isolate a canary instance to a separate route, and update it once the deployment is promoted. I can see it being handy to have (for example) some easy way of defining a custom canary route in the Deployment create request:
Of course the problem with this is that it's unclear how it would mesh with CCNG internal routing modeling. Providing another field like In either case, I feel as though it would be a little confusing to have Route Destinations constantly being updated with new processes, instead of a single RouteDestination that doesn't disappear between deployments and users can clearly identify as what is being used for their canary routing. Maybe having Canary Deployments use a special Until we figure this out, canary instances will participate in app routing. It's not ideal, but hopefully we can drive out a solution soon. cc @stephanme |
CF Canary Deployments
Authors: @sethboyles @pivotalgeorge @joaopapereira
Reviewers: @Gerg @tcdowney @Samze
Draft Proposal
Authors: Seth Boyles, George Gelashvili, Joao De Almeida Pereira
Reviewers: Greg Cobb
Feature goals
Canary Deployments will allow App Developers to create a new Application Instance with a new version of their application. By monitoring the Canary Instance, App Developers will be able to ensure the reliability of this Canary Instance before promoting the new version to the rest of their application instances.
How to Use It
User Workflow
Basic Usage
Starting a Canary Deployment
App Developers will be able to push an application with Canary Deployments by using the --strategy flag on the CLI, similar to Rolling Deployments. (This flag will also be available for other actions like restart and restage.)
Once the Canary Deployment has brought up the canary instance, the CLI will exit.
Observing the Canary Instance
App Developers will be able to monitor the canary instance’s status by:
monitoring app logs by a tag identifying which logs originate from the Canary Instance
routing requests to the canary instance directly
Promoting the New Version
If the App Developer determines that the Canary Instance is reliable and wants to promote the new app version to all instances, they would execute the cf continue-deployment command.
Cloud Controller will then promote the rest of the Application Instances to the new app version.
Canceling the Canary Deployment
If the App Developer determines that the Canary Instance is NOT reliable, they would execute the cf cancel-deployment command.
Cloud Controller will then tear down the Canary Instance.
Technical Behavior Overview
Upon creation, a Canary Deployment brings up a single ‘Canary Instance’ of the new process.
The Deployment will pause, awaiting external evaluation.
An App Operator (Admin, Space Developer, or Space Supporter) will indicate that the Canary Instance has a) passed evaluation or b) failed evaluation
If the Canary Instance has failed evaluation, the App Operator will Cancel the Deployment to roll back to the previous revision
If the Canary Instance has passed evaluation, the App Operator will Continue the Deployment to promote the new revision
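The lifecycle listed above can be sketched as a small transition table. This is only an illustration, not Cloud Controller code: the event names (canary_up, continue, cancel, rollout_complete) and the terminal reasons DEPLOYED and CANCELED are assumptions made for the example, while PAUSED, DEPLOYING, ACTIVE and FINALIZED are the names used in this proposal.

```python
from enum import Enum


class Reason(Enum):
    DEPLOYING = "DEPLOYING"
    PAUSED = "PAUSED"
    DEPLOYED = "DEPLOYED"  # assumed terminal reason after promotion
    CANCELED = "CANCELED"  # assumed terminal reason after rollback


# Allowed transitions, keyed by (current reason, event).
# The event names are hypothetical and exist only for this sketch.
TRANSITIONS = {
    (Reason.DEPLOYING, "canary_up"): Reason.PAUSED,
    (Reason.PAUSED, "continue"): Reason.DEPLOYING,
    (Reason.PAUSED, "cancel"): Reason.CANCELED,
    (Reason.DEPLOYING, "cancel"): Reason.CANCELED,
    (Reason.DEPLOYING, "rollout_complete"): Reason.DEPLOYED,
}


def next_reason(current, event):
    """Return the reason reached from `current` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed while {current.name}")


def status_value(reason):
    """ACTIVE while the deployment is in progress (including PAUSED), else FINALIZED."""
    return "ACTIVE" if reason in (Reason.DEPLOYING, Reason.PAUSED) else "FINALIZED"
```

In this model a paused deployment still reports ACTIVE, which is what lets it be canceled or continued later.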
CF CLI
Initiating a Canary Deployment
Currently the CF CLI supports passing the --strategy=rolling option to the push, restage, and restart commands to use a Rolling Deployment.
Similarly, adding support for passing canary to the --strategy flag would allow App Operators to deploy with a Canary Deployment.
Like Rolling Deployments, the CF CLI will poll CAPI for updates on the deployment’s progress. Unlike with Rolling Deployments, where the CLI waits until the Deployment reaches the FINALIZED status before exiting, the CLI will instead wait until the Canary Deployment has reached the PAUSED status. Once the deployment is paused, the CLI will prompt the user to call the continue-deployment command and exit.
Continuing a Canary Deployment
Upon executing the cf continue-deployment command, the CLI will call the deployment’s continue action and the Deployment will proceed until completion like a Rolling Deployment.
Surfacing Canary Deployment Status
App Operators will be able to discover if an app is currently PAUSED during a Canary Deployment by calling the cf app command.
Additionally, this might be an opportune time to surface other Deployment information in cf app (such as Rolling Deployments, or Canary Deployments before or after the PAUSED step).
CAPI
Creating a Canary Deployment
Creating a Canary Deployment will use the strategy property of the Deployment resource. Instead of the value rolling (which currently is the only valid value), clients will set the value to canary to create a Canary Deployment.
Canary Deployment JSON example:
Upon creation, a Canary Deployment will immediately bring up a single Canary Instance of the app’s new revision or droplet.
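As a rough sketch of what a client might send, the snippet below builds a create-deployment body with the proposed strategy value. It assumes the request keeps the shape of the existing V3 Create Deployment body (a droplet plus an app relationship). The helper name and the GUID strings are placeholders invented for the example.

```python
import json


def canary_deployment_request(app_guid, droplet_guid):
    """Build a create-deployment body using the proposed "canary" strategy.

    Hypothetical helper: only the "strategy" value comes from this proposal;
    the rest mirrors the existing rolling deployment request.
    """
    return {
        "strategy": "canary",  # proposed value; "rolling" is the existing one
        "droplet": {"guid": droplet_guid},
        "relationships": {"app": {"data": {"guid": app_guid}}},
    }


# Placeholder GUIDs, not real resources.
body = canary_deployment_request("app-guid-placeholder", "droplet-guid-placeholder")
print(json.dumps(body, indent=2))
```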
Monitoring Progress of a Canary Deployment
A new status.reason on the Deployment object, PAUSED, will be introduced to track the Canary Deployment’s state as the Canary instance is evaluated. (See The Deployment Object in the V3 API docs for more information on the status field.)
Once in the PAUSED state, Canary Deployments will remain PAUSED until an App Operator has indicated external evaluation has passed and the deployment is ready to proceed.
Initially there will be no timeout: Canary Deployments can remain PAUSED indefinitely. See [Configurable Timeout] under [Possible Future Enhancements].
Promoting Canary Deployments
Once the App Operator has determined they would like to promote the Canary Deployment, they will call an action endpoint (see action to cancel a deployment for an existing action).
Once the Canary Deployment’s continue action has been called, the Deployment will transition from PAUSED to DEPLOYING.
The Canary Deployment will proceed similarly to a Rolling Deployment (that is, 1 new instance will be brought up and 1 old instance will be brought down in serial, repeated with no further pausing until the Deployment is complete).
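The serial rollout that follows the continue action can be simulated in a few lines. This sketch rests on two assumptions that the proposal does not spell out: that all old instances are still running alongside the single canary while the deployment is PAUSED, and that each step brings up one new instance (if any are still missing) before bringing down one old instance. The function name is hypothetical.

```python
def rollout_steps(total_instances, canary_instances=1):
    """Return the (new, old) instance counts after each serial step.

    The first entry is the assumed PAUSED state: the canary plus every
    old instance still running.
    """
    new, old = canary_instances, total_instances
    steps = [(new, old)]
    while old > 0:
        if new < total_instances:
            new += 1  # bring up one instance of the new version
        old -= 1      # bring down one instance of the old version
        steps.append((new, old))
    return steps


for new, old in rollout_steps(3):
    print(f"new={new} old={old}")
```

For an app with 3 instances this walks from 1 new and 3 old down to 3 new and 0 old, without pausing again.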
Canceling Canary Deployments
App Operators will be able to use the existing Cancel Deployment API action to roll back a Canary Deployment with current status ACTIVE and reason PAUSED.
Supersedence of Canary Deployments
Like Rolling Deployments, Canary Deployments can be superseded by a Deployment created before the Canary Deployment has finished.
If a Canary Deployment has a status.value of ACTIVE, then the Deployment can be superseded, even if the status.reason is PAUSED.
Possible Future Enhancements
While the above proposal is kept as feature-minimal as possible while meeting the needs of a basic Canary Deployment, App Operators may eventually expect more control over their Deployment strategies. The following are potential ways Canary Deployments and Deployments in general can be enhanced.
To support various configurable options specific to the deployment strategy, a new options property could be added to the Create Deployment request.
Configurable Number of Canary Steps
App Operators may wish to perform multiple evaluations of a Canary Deployment.
A Canary Deployment with a step value of 3 would transition to PAUSED 3 times throughout the entire rollout. The Canary Deployment would require the App Operator to call the continue action 3 times before fully promoting the canary.
A step value of NULL would require the App Operator to evaluate the entire rollout.
Configurable Number of Canary Instances per Step
An instances_per_step property would allow multiple Canary Instances to be brought up before the Deployment is PAUSED for evaluation.
Configurable Step Weights (alternative to Canary Steps/Canary Instances per Step)
A single configurable value, step_weights, could be an alternative to configuring instances_per_step and steps.
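To make the idea concrete, here is a small sketch of how cumulative percentages could map to instance counts. The weights 20, 40, 50 and 100 are the percentages discussed in this subsection. Rounding up to a whole instance is an assumption, since the proposal does not say how fractional instances would be handled, and the function name is hypothetical.

```python
def cumulative_targets(total_instances, step_weights):
    """Number of new-version instances expected to be up at each pause.

    Each weight is a cumulative percentage of the app's instances.
    Fractions are rounded up (an assumption for this sketch).
    """
    return [(total_instances * weight + 99) // 100 for weight in step_weights]


step_weights = [20, 40, 50, 100]
print(cumulative_targets(10, step_weights))
```

With few instances, neighbouring steps can resolve to the same count (for example 40% and 50% of 3 instances), so a real implementation would need to decide whether such a step still pauses.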
A Canary Deployment with step_weights of 20, 40, 50, and 100 would roll out 20% of instances, then 40% (total), 50%, and 100%, pausing at each step for evaluation.
Configurable Max-in-Flight
Note: break this out into new document
Orthogonal to Canary Deployments, max_in_flight is also applicable to Rolling Deployments. A Deployment with max_in_flight of 3 would bring up 3 new instances at once, and tear down one old instance as each new instance is brought up.
This, however, is complicated by the Canary Deployment’s PAUSED state: would the teardown of instances wait until after the Deployment’s continue action has been called?
NOTE: Distinction between ‘Instances per Step’ and ‘Max-in-Flight’
instances_per_step and max_in_flight differ in purpose/behavior:
max_in_flight: number of instances CC will request Diego to bring up/down at once (a value that could be applied to both Rolling Deployments and Canary Deployments)
instances_per_step: number of instances to roll out before pausing for evaluation (a Canary Deployment specific value)
A Canary Deployment with instances_per_step of 10, but max_in_flight of 1, would create a slow rollout that paused after 10 canary instances were brought up.
Configurable Timeout
An optional configurable timeout property named step_timeout could be added to the Deployment resource.
If the timeout is reached without the Canary Deployment having been progressed via the “continue” action endpoint, the deployment would automatically be canceled and rolled back to the previous revision (i.e. the single canary instance will be taken down).
The name step_timeout is chosen as opposed to timeout to clarify the timeout is not a generic timeout that could apply to the entire deployment lifecycle, or to other deployment types, like rolling.
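The timeout rule could be checked along these lines. This is a sketch under assumptions: step_timeout is treated as a number of seconds, the moment the deployment entered PAUSED is assumed to be recorded somewhere (called paused_at here), and the function name is hypothetical. None of these details are fixed by the proposal.

```python
from datetime import datetime, timedelta, timezone


def should_auto_cancel(reason, paused_at, step_timeout_seconds, now=None):
    """Return True if a paused canary step has outlived its step_timeout.

    A missing step_timeout (None) means the deployment may stay PAUSED
    indefinitely, matching the initial behavior described in the proposal.
    """
    if reason != "PAUSED" or step_timeout_seconds is None:
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now - paused_at >= timedelta(seconds=step_timeout_seconds)
```

Because the check only fires while the reason is PAUSED, it cannot cancel the rollout phase after continue or a plain rolling deployment, which is the distinction the name step_timeout is meant to convey.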
Support for automatic evaluation of SLOs
Automatic rollback of a Canary Deployment based on app metrics such as HTTP request success rate, response time, or other custom metrics will likely require large cross CF-component changes to support.
Deployment Specific Routing
Currently App Instance routing does not work with multiple processes. To allow for such features as keeping a subset of user sessions only on the old/new deployment instances, we would need to fix instance-based routing and expand it to support instances from different processes.
Mirroring of Idempotent Requests
Traffic mirroring (i.e. duplicating each incoming request, sending one copy to the new app version and one to the old, as a way of measuring the new version without impacting user experience) would require the ability to route to individual app instances and would also likely require large cross CF-component changes to support.