deployment improvements #4259

Merged
merged 52 commits into from
May 8, 2018
52 commits
15c3abf
Initial implementation
dadgar Mar 23, 2018
1627675
Fix tests
dadgar Apr 4, 2018
bcaaa10
Progress deadline in deployment state
dadgar Apr 4, 2018
2bb9ada
Deployment watcher based on deployment having progress deadline
dadgar Apr 4, 2018
54f9e1b
Handle progressed deployments and tests
dadgar Apr 6, 2018
e424a11
Pass through timestamp
dadgar Apr 6, 2018
0e2866d
add latest eval back
dadgar Apr 6, 2018
04a4b1f
Drop file
dadgar Apr 6, 2018
c3b9a9c
Small test fix
dadgar Apr 6, 2018
1050b89
small review feedback fixes
dadgar Apr 10, 2018
2d6264e
rework where time gets set
dadgar Apr 10, 2018
062f236
Use UpdateAllocDesiredTransistion instead of UpsertEval but no transi…
dadgar Apr 6, 2018
91402ed
Set Reschedule from deployment watcher
dadgar Apr 7, 2018
b1df461
Only reschedule allowed deployment allocs
dadgar Apr 8, 2018
be3e3ea
fix reconcile tests
dadgar Apr 8, 2018
011a084
Test fixes
dadgar Apr 9, 2018
b8aa63a
Add test where deployment is marked as complete when done even with f…
dadgar Apr 10, 2018
f952300
Fix typos
dadgar Apr 10, 2018
01fcba1
Fix not enqueuing eval
dadgar Apr 10, 2018
c240e02
change default to 10m and docs
dadgar Apr 10, 2018
eb6a99a
CLI
dadgar Apr 10, 2018
32557a1
Only use DesiredTransition.Reschedule in reconciler when its an activ…
Apr 17, 2018
aab6149
Fix deadlock in deployment watcher when deployment starts with no all…
Apr 19, 2018
334f5fb
better comments and remove commented code
Apr 19, 2018
8be599a
Mark canaries on creation, and unmark on promotion
dadgar Apr 19, 2018
20df5ae
Canary tags structs
dadgar Apr 19, 2018
5c8238c
Ensure canaries tags are interpolated
dadgar Apr 19, 2018
4bad815
remove unnessary merge of DeploymentStatus.Timestamp
dadgar Apr 19, 2018
4c45ca8
vendor testify
dadgar Apr 19, 2018
be30f02
Fix tests
dadgar Apr 19, 2018
ff7b1be
Allow canary count greater than desired
dadgar Apr 20, 2018
588bf68
Test for rescheduling when there are canaries
dadgar Apr 20, 2018
686cff2
canary reschedule test
dadgar Apr 20, 2018
1154ccc
typo: transistion -> transition
schmichael Apr 24, 2018
17c6eb8
consul: support canary tags for services
schmichael Apr 23, 2018
435a6bd
consul: remove services with/without canary tags
schmichael Apr 24, 2018
0e1fb91
Reschedule when we have canaries properly
dadgar Apr 23, 2018
42d3c05
Allow healthy canary deployment to skip progress deadline
dadgar Apr 25, 2018
ca588f9
clarify comment
dadgar Apr 25, 2018
0babfcc
Make sure that task group has a deployment state before using it
Apr 25, 2018
fc099e5
Add test
dadgar Apr 25, 2018
64240e4
consul: change hashed canary bytes
schmichael Apr 26, 2018
83ad99c
Some test fixes to e2e rescheduling tests
Apr 26, 2018
801d147
More e2e test fixes after changes to rescheduling during deployments
Apr 26, 2018
3701ee0
Fix the initial progress deadline calculation when the alloc is inpla…
dadgar Apr 27, 2018
ceafb2b
Update end to end tests to use shorter progress deadlines
Apr 30, 2018
6b3eb8e
Fix typo
May 1, 2018
809c2a9
Fix panic in deployment watcher when deployment is not in the state s…
May 1, 2018
2e393fc
Set modify time for allocs in unit test, and define current time in o…
May 2, 2018
c317c54
Fix test set up to set ModifyTime for alloc
May 3, 2018
083541e
Fix deadlock in deadline timer logic when progress deadline is passed…
May 3, 2018
4bc7db4
newlines in test
May 4, 2018
6 changes: 6 additions & 0 deletions api/allocations.go
@@ -143,6 +143,8 @@ type AllocationListStub struct {
 // healthy.
 type AllocDeploymentStatus struct {
 Healthy     *bool
+Timestamp   time.Time
+Canary      bool
 ModifyIndex uint64
 }

@@ -214,6 +216,10 @@ type DesiredTransition struct {
 // Migrate is used to indicate that this allocation should be stopped and
 // migrated to another node.
 Migrate *bool
+
+// Reschedule is used to indicate that this allocation is eligible to be
+// rescheduled.
+Reschedule *bool
 }

 // ShouldMigrate returns whether the transition object dictates a migration.
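The new `Reschedule` field follows the same nil-able `*bool` convention as `Migrate`: nil means "not set" and reads as false. The sketch below illustrates that convention with a trimmed-down local copy of `DesiredTransition`. `ShouldReschedule` is a hypothetical helper modeled on the existing `ShouldMigrate`; this excerpt does not show whether the API package gains such a method, and `boolPtr` stands in for `helper.BoolToPtr`.

```go
package main

import "fmt"

// DesiredTransition is a trimmed-down local copy of the API struct with only
// the two pointer fields this sketch needs.
type DesiredTransition struct {
	Migrate    *bool
	Reschedule *bool
}

// ShouldMigrate (local version) treats a nil receiver or a nil Migrate
// pointer as "not set", which reads as false.
func (d *DesiredTransition) ShouldMigrate() bool {
	return d != nil && d.Migrate != nil && *d.Migrate
}

// ShouldReschedule is a hypothetical helper applying the same rule to the
// new Reschedule field.
func (d *DesiredTransition) ShouldReschedule() bool {
	return d != nil && d.Reschedule != nil && *d.Reschedule
}

// boolPtr stands in for helper.BoolToPtr.
func boolPtr(b bool) *bool { return &b }

func main() {
	var unset *DesiredTransition
	set := &DesiredTransition{Reschedule: boolPtr(true)}
	// prints: false true false
	fmt.Println(unset.ShouldReschedule(), set.ShouldReschedule(), set.ShouldMigrate())
}
```

Using pointers lets a partial update distinguish "leave this flag alone" (nil) from an explicit `false`.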
19 changes: 11 additions & 8 deletions api/deployments.go
@@ -2,6 +2,7 @@ package api

 import (
 "sort"
+"time"
 )

 // Deployments is used to query the deployments endpoints.
@@ -139,14 +140,16 @@ type Deployment struct {

 // DeploymentState tracks the state of a deployment for a given task group.
 type DeploymentState struct {
-PlacedCanaries  []string
-AutoRevert      bool
-Promoted        bool
-DesiredCanaries int
-DesiredTotal    int
-PlacedAllocs    int
-HealthyAllocs   int
-UnhealthyAllocs int
+PlacedCanaries    []string
+AutoRevert        bool
+ProgressDeadline  time.Duration
+RequireProgressBy time.Time
+Promoted          bool
+DesiredCanaries   int
+DesiredTotal      int
+PlacedAllocs      int
+HealthyAllocs     int
+UnhealthyAllocs   int
 }

 // DeploymentIndexSort is a wrapper to sort deployments by CreateIndex. We
46 changes: 32 additions & 14 deletions api/jobs.go
@@ -343,26 +343,28 @@ type periodicForceResponse struct {

 // UpdateStrategy defines a task groups update strategy.
 type UpdateStrategy struct {
-Stagger         *time.Duration `mapstructure:"stagger"`
-MaxParallel     *int           `mapstructure:"max_parallel"`
-HealthCheck     *string        `mapstructure:"health_check"`
-MinHealthyTime  *time.Duration `mapstructure:"min_healthy_time"`
-HealthyDeadline *time.Duration `mapstructure:"healthy_deadline"`
-AutoRevert      *bool          `mapstructure:"auto_revert"`
-Canary          *int           `mapstructure:"canary"`
+Stagger          *time.Duration `mapstructure:"stagger"`
+MaxParallel      *int           `mapstructure:"max_parallel"`
+HealthCheck      *string        `mapstructure:"health_check"`
+MinHealthyTime   *time.Duration `mapstructure:"min_healthy_time"`
+HealthyDeadline  *time.Duration `mapstructure:"healthy_deadline"`
+ProgressDeadline *time.Duration `mapstructure:"progress_deadline"`
+AutoRevert       *bool          `mapstructure:"auto_revert"`
+Canary           *int           `mapstructure:"canary"`
 }

 // DefaultUpdateStrategy provides a baseline that can be used to upgrade
 // jobs with the old policy or for populating field defaults.
 func DefaultUpdateStrategy() *UpdateStrategy {
 return &UpdateStrategy{
-Stagger:         helper.TimeToPtr(30 * time.Second),
-MaxParallel:     helper.IntToPtr(1),
-HealthCheck:     helper.StringToPtr("checks"),
-MinHealthyTime:  helper.TimeToPtr(10 * time.Second),
-HealthyDeadline: helper.TimeToPtr(5 * time.Minute),
-AutoRevert:      helper.BoolToPtr(false),
-Canary:          helper.IntToPtr(0),
+Stagger:          helper.TimeToPtr(30 * time.Second),
+MaxParallel:      helper.IntToPtr(1),
+HealthCheck:      helper.StringToPtr("checks"),
+MinHealthyTime:   helper.TimeToPtr(10 * time.Second),
+HealthyDeadline:  helper.TimeToPtr(5 * time.Minute),
+ProgressDeadline: helper.TimeToPtr(10 * time.Minute),
+AutoRevert:       helper.BoolToPtr(false),
+Canary:           helper.IntToPtr(0),
 }
 }

@@ -393,6 +395,10 @@ func (u *UpdateStrategy) Copy() *UpdateStrategy {
 copy.HealthyDeadline = helper.TimeToPtr(*u.HealthyDeadline)
 }

+if u.ProgressDeadline != nil {
+copy.ProgressDeadline = helper.TimeToPtr(*u.ProgressDeadline)
+}
+
 if u.AutoRevert != nil {
 copy.AutoRevert = helper.BoolToPtr(*u.AutoRevert)
 }
@@ -429,6 +435,10 @@ func (u *UpdateStrategy) Merge(o *UpdateStrategy) {
 u.HealthyDeadline = helper.TimeToPtr(*o.HealthyDeadline)
 }

+if o.ProgressDeadline != nil {
+u.ProgressDeadline = helper.TimeToPtr(*o.ProgressDeadline)
+}
+
 if o.AutoRevert != nil {
 u.AutoRevert = helper.BoolToPtr(*o.AutoRevert)
 }
@@ -457,6 +467,10 @@ func (u *UpdateStrategy) Canonicalize() {
 u.HealthyDeadline = d.HealthyDeadline
 }

+if u.ProgressDeadline == nil {
+u.ProgressDeadline = d.ProgressDeadline
+}
+
 if u.MinHealthyTime == nil {
 u.MinHealthyTime = d.MinHealthyTime
 }
@@ -496,6 +510,10 @@ func (u *UpdateStrategy) Empty() bool {
 return false
 }

+if u.ProgressDeadline != nil && *u.ProgressDeadline != 0 {
+return false
+}
+
 if u.AutoRevert != nil && *u.AutoRevert {
 return false
 }
98 changes: 53 additions & 45 deletions api/jobs_test.go
@@ -304,9 +304,10 @@ func TestJobs_Canonicalize(t *testing.T) {
 },
 Services: []*Service{
 {
-Name:      "redis-cache",
-Tags:      []string{"global", "cache"},
-PortLabel: "db",
+Name:       "redis-cache",
+Tags:       []string{"global", "cache"},
+CanaryTags: []string{"canary", "global", "cache"},
+PortLabel:  "db",
 Checks: []ServiceCheck{
 {
 Name: "alive",
@@ -354,13 +355,14 @@ func TestJobs_Canonicalize(t *testing.T) {
 JobModifyIndex: helper.Uint64ToPtr(0),
 Datacenters: []string{"dc1"},
 Update: &UpdateStrategy{
-Stagger:         helper.TimeToPtr(30 * time.Second),
-MaxParallel:     helper.IntToPtr(1),
-HealthCheck:     helper.StringToPtr("checks"),
-MinHealthyTime:  helper.TimeToPtr(10 * time.Second),
-HealthyDeadline: helper.TimeToPtr(5 * time.Minute),
-AutoRevert:      helper.BoolToPtr(false),
-Canary:          helper.IntToPtr(0),
+Stagger:          helper.TimeToPtr(30 * time.Second),
+MaxParallel:      helper.IntToPtr(1),
+HealthCheck:      helper.StringToPtr("checks"),
+MinHealthyTime:   helper.TimeToPtr(10 * time.Second),
+HealthyDeadline:  helper.TimeToPtr(5 * time.Minute),
+ProgressDeadline: helper.TimeToPtr(10 * time.Minute),
+AutoRevert:       helper.BoolToPtr(false),
+Canary:           helper.IntToPtr(0),
 },
 TaskGroups: []*TaskGroup{
 {
@@ -387,13 +389,14 @@ func TestJobs_Canonicalize(t *testing.T) {
 },

 Update: &UpdateStrategy{
-Stagger:         helper.TimeToPtr(30 * time.Second),
-MaxParallel:     helper.IntToPtr(1),
-HealthCheck:     helper.StringToPtr("checks"),
-MinHealthyTime:  helper.TimeToPtr(10 * time.Second),
-HealthyDeadline: helper.TimeToPtr(5 * time.Minute),
-AutoRevert:      helper.BoolToPtr(false),
-Canary:          helper.IntToPtr(0),
+Stagger:          helper.TimeToPtr(30 * time.Second),
+MaxParallel:      helper.IntToPtr(1),
+HealthCheck:      helper.StringToPtr("checks"),
+MinHealthyTime:   helper.TimeToPtr(10 * time.Second),
+HealthyDeadline:  helper.TimeToPtr(5 * time.Minute),
+ProgressDeadline: helper.TimeToPtr(10 * time.Minute),
+AutoRevert:       helper.BoolToPtr(false),
+Canary:           helper.IntToPtr(0),
 },
 Migrate: DefaultMigrateStrategy(),
 Tasks: []*Task{
@@ -425,6 +428,7 @@ func TestJobs_Canonicalize(t *testing.T) {
 {
 Name:        "redis-cache",
 Tags:        []string{"global", "cache"},
+CanaryTags:  []string{"canary", "global", "cache"},
 PortLabel:   "db",
 AddressMode: "auto",
 Checks: []ServiceCheck{
@@ -515,13 +519,14 @@ func TestJobs_Canonicalize(t *testing.T) {
 ID: helper.StringToPtr("bar"),
 ParentID: helper.StringToPtr("lol"),
 Update: &UpdateStrategy{
-Stagger:         helper.TimeToPtr(1 * time.Second),
-MaxParallel:     helper.IntToPtr(1),
-HealthCheck:     helper.StringToPtr("checks"),
-MinHealthyTime:  helper.TimeToPtr(10 * time.Second),
-HealthyDeadline: helper.TimeToPtr(6 * time.Minute),
-AutoRevert:      helper.BoolToPtr(false),
-Canary:          helper.IntToPtr(0),
+Stagger:          helper.TimeToPtr(1 * time.Second),
+MaxParallel:      helper.IntToPtr(1),
+HealthCheck:      helper.StringToPtr("checks"),
+MinHealthyTime:   helper.TimeToPtr(10 * time.Second),
+HealthyDeadline:  helper.TimeToPtr(6 * time.Minute),
+ProgressDeadline: helper.TimeToPtr(7 * time.Minute),
+AutoRevert:       helper.BoolToPtr(false),
+Canary:           helper.IntToPtr(0),
 },
 TaskGroups: []*TaskGroup{
 {
@@ -569,13 +574,14 @@ func TestJobs_Canonicalize(t *testing.T) {
 ModifyIndex: helper.Uint64ToPtr(0),
 JobModifyIndex: helper.Uint64ToPtr(0),
 Update: &UpdateStrategy{
-Stagger:         helper.TimeToPtr(1 * time.Second),
-MaxParallel:     helper.IntToPtr(1),
-HealthCheck:     helper.StringToPtr("checks"),
-MinHealthyTime:  helper.TimeToPtr(10 * time.Second),
-HealthyDeadline: helper.TimeToPtr(6 * time.Minute),
-AutoRevert:      helper.BoolToPtr(false),
-Canary:          helper.IntToPtr(0),
+Stagger:          helper.TimeToPtr(1 * time.Second),
+MaxParallel:      helper.IntToPtr(1),
+HealthCheck:      helper.StringToPtr("checks"),
+MinHealthyTime:   helper.TimeToPtr(10 * time.Second),
+HealthyDeadline:  helper.TimeToPtr(6 * time.Minute),
+ProgressDeadline: helper.TimeToPtr(7 * time.Minute),
+AutoRevert:       helper.BoolToPtr(false),
+Canary:           helper.IntToPtr(0),
 },
 TaskGroups: []*TaskGroup{
 {
@@ -601,13 +607,14 @@ func TestJobs_Canonicalize(t *testing.T) {
 Unlimited: helper.BoolToPtr(true),
 },
 Update: &UpdateStrategy{
-Stagger:         helper.TimeToPtr(2 * time.Second),
-MaxParallel:     helper.IntToPtr(2),
-HealthCheck:     helper.StringToPtr("manual"),
-MinHealthyTime:  helper.TimeToPtr(1 * time.Second),
-HealthyDeadline: helper.TimeToPtr(6 * time.Minute),
-AutoRevert:      helper.BoolToPtr(true),
-Canary:          helper.IntToPtr(1),
+Stagger:          helper.TimeToPtr(2 * time.Second),
+MaxParallel:      helper.IntToPtr(2),
+HealthCheck:      helper.StringToPtr("manual"),
+MinHealthyTime:   helper.TimeToPtr(1 * time.Second),
+HealthyDeadline:  helper.TimeToPtr(6 * time.Minute),
+ProgressDeadline: helper.TimeToPtr(7 * time.Minute),
+AutoRevert:       helper.BoolToPtr(true),
+Canary:           helper.IntToPtr(1),
 },
 Migrate: DefaultMigrateStrategy(),
 Tasks: []*Task{
@@ -642,13 +649,14 @@ func TestJobs_Canonicalize(t *testing.T) {
 Unlimited: helper.BoolToPtr(true),
 },
 Update: &UpdateStrategy{
-Stagger:         helper.TimeToPtr(1 * time.Second),
-MaxParallel:     helper.IntToPtr(1),
-HealthCheck:     helper.StringToPtr("checks"),
-MinHealthyTime:  helper.TimeToPtr(10 * time.Second),
-HealthyDeadline: helper.TimeToPtr(6 * time.Minute),
-AutoRevert:      helper.BoolToPtr(false),
-Canary:          helper.IntToPtr(0),
+Stagger:          helper.TimeToPtr(1 * time.Second),
+MaxParallel:      helper.IntToPtr(1),
+HealthCheck:      helper.StringToPtr("checks"),
+MinHealthyTime:   helper.TimeToPtr(10 * time.Second),
+HealthyDeadline:  helper.TimeToPtr(6 * time.Minute),
+ProgressDeadline: helper.TimeToPtr(7 * time.Minute),
+AutoRevert:       helper.BoolToPtr(false),
+Canary:           helper.IntToPtr(0),
 },
 Migrate: DefaultMigrateStrategy(),
 Tasks: []*Task{
5 changes: 3 additions & 2 deletions api/tasks.go
@@ -295,8 +295,9 @@ type Service struct {
 Id           string
 Name         string
 Tags         []string
-PortLabel    string `mapstructure:"port"`
-AddressMode  string `mapstructure:"address_mode"`
+CanaryTags   []string `mapstructure:"canary_tags"`
+PortLabel    string   `mapstructure:"port"`
+AddressMode  string   `mapstructure:"address_mode"`
 Checks       []ServiceCheck
 CheckRestart *CheckRestart `mapstructure:"check_restart"`
 }
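A minimal sketch of how a client might pick the tags to register for a service, assuming `CanaryTags` replaces `Tags` while the allocation is a canary and that an empty `CanaryTags` falls back to `Tags`. The commit titles ("consul: support canary tags for services", "Mark canaries on creation, and unmark on promotion") suggest this behavior, but the actual selection code lives in `command/agent/consul` and is not shown here; `tagsFor` is a hypothetical name and `Service` is a trimmed-down local copy.

```go
package main

import "fmt"

// Service is a trimmed-down local copy of the API struct.
type Service struct {
	Name       string
	Tags       []string
	CanaryTags []string
}

// tagsFor (hypothetical) returns the tags to register for s. Canary
// allocations use CanaryTags when any are configured; everything else,
// including a promoted former canary, uses Tags.
func tagsFor(s *Service, canary bool) []string {
	if canary && len(s.CanaryTags) > 0 {
		return s.CanaryTags
	}
	return s.Tags
}

func main() {
	s := &Service{
		Name:       "redis-cache",
		Tags:       []string{"global", "cache"},
		CanaryTags: []string{"canary", "global", "cache"},
	}
	fmt.Println(tagsFor(s, false)) // [global cache]
	fmt.Println(tagsFor(s, true))  // [canary global cache]
}
```

With this scheme a Consul query filtered on the `canary` tag reaches only canary instances, while the regular tags keep pointing at the stable version until promotion.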
15 changes: 8 additions & 7 deletions api/tasks_test.go
@@ -252,13 +252,14 @@ func TestTaskGroup_Canonicalize_Update(t *testing.T) {
 job := &Job{
 ID: helper.StringToPtr("test"),
 Update: &UpdateStrategy{
-AutoRevert:      helper.BoolToPtr(false),
-Canary:          helper.IntToPtr(0),
-HealthCheck:     helper.StringToPtr(""),
-HealthyDeadline: helper.TimeToPtr(0),
-MaxParallel:     helper.IntToPtr(0),
-MinHealthyTime:  helper.TimeToPtr(0),
-Stagger:         helper.TimeToPtr(0),
+AutoRevert:       helper.BoolToPtr(false),
+Canary:           helper.IntToPtr(0),
+HealthCheck:      helper.StringToPtr(""),
+HealthyDeadline:  helper.TimeToPtr(0),
+ProgressDeadline: helper.TimeToPtr(0),
+MaxParallel:      helper.IntToPtr(0),
+MinHealthyTime:   helper.TimeToPtr(0),
+Stagger:          helper.TimeToPtr(0),
 },
 }
 job.Canonicalize()
5 changes: 4 additions & 1 deletion client/alloc_runner.go
@@ -50,7 +50,8 @@ type AllocRunner struct {
 alloc                  *structs.Allocation
 allocClientStatus      string // Explicit status of allocation. Set when there are failures
 allocClientDescription string
-allocHealth            *bool // Whether the allocation is healthy
+allocHealth            *bool     // Whether the allocation is healthy
+allocHealthTime        time.Time // Time at which allocation health has been set
 allocBroadcast         *cstructs.AllocBroadcaster
 allocLock              sync.Mutex

@@ -580,6 +581,7 @@ func (r *AllocRunner) Alloc() *structs.Allocation {
 alloc.DeploymentStatus = &structs.AllocDeploymentStatus{}
 }
 alloc.DeploymentStatus.Healthy = helper.BoolToPtr(*r.allocHealth)
+alloc.DeploymentStatus.Timestamp = r.allocHealthTime
 }
 r.allocLock.Unlock()

@@ -943,6 +945,7 @@ OUTER:
 // If the deployment ids have changed clear the health
 if r.alloc.DeploymentID != update.DeploymentID {
 r.allocHealth = nil
+r.allocHealthTime = time.Time{}
 }

 r.alloc = update
1 change: 1 addition & 0 deletions client/alloc_runner_health_watcher.go
@@ -112,6 +112,7 @@ func (r *AllocRunner) watchHealth(ctx context.Context) {

 r.allocLock.Lock()
 r.allocHealth = helper.BoolToPtr(allocHealthy)
+r.allocHealthTime = time.Now()
 r.allocLock.Unlock()

 // If deployment is unhealthy emit task events explaining why
9 changes: 3 additions & 6 deletions client/consul.go
@@ -1,17 +1,14 @@
 package client

 import (
-"github.com/hashicorp/nomad/client/driver"
-cstructs "github.com/hashicorp/nomad/client/structs"
 "github.com/hashicorp/nomad/command/agent/consul"
-"github.com/hashicorp/nomad/nomad/structs"
 )

 // ConsulServiceAPI is the interface the Nomad Client uses to register and
 // remove services and checks from Consul.
 type ConsulServiceAPI interface {
-RegisterTask(allocID string, task *structs.Task, restarter consul.TaskRestarter, exec driver.ScriptExecutor, net *cstructs.DriverNetwork) error
-RemoveTask(allocID string, task *structs.Task)
-UpdateTask(allocID string, existing, newTask *structs.Task, restart consul.TaskRestarter, exec driver.ScriptExecutor, net *cstructs.DriverNetwork) error
+RegisterTask(*consul.TaskServices) error
+RemoveTask(*consul.TaskServices)
+UpdateTask(old, newTask *consul.TaskServices) error
 AllocRegistrations(allocID string) (*consul.AllocRegistration, error)
 }
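The reworked `ConsulServiceAPI` takes a single `*consul.TaskServices` value instead of long positional parameter lists, which presumably lets new per-task inputs such as the canary flag be added without changing every method signature. The sketch below shows the pattern with a hypothetical, heavily trimmed `TaskServices` struct and an in-memory fake; the real struct is defined in `command/agent/consul` and its fields are not shown in this excerpt. The canary-to-promoted update mirrors the commit "Mark canaries on creation, and unmark on promotion".

```go
package main

import (
	"fmt"
	"sync"
)

// TaskServices is a hypothetical, trimmed-down stand-in for
// consul.TaskServices: everything needed to register one task's services.
type TaskServices struct {
	AllocID string
	Name    string // task name
	Canary  bool
	Tags    []string
}

// ServiceAPI mirrors the shape of the new ConsulServiceAPI methods.
type ServiceAPI interface {
	RegisterTask(*TaskServices) error
	RemoveTask(*TaskServices)
	UpdateTask(old, newTask *TaskServices) error
}

// fakeConsul records registrations in memory, keyed by alloc and task.
type fakeConsul struct {
	mu   sync.Mutex
	regs map[string]*TaskServices
}

func newFakeConsul() *fakeConsul { return &fakeConsul{regs: map[string]*TaskServices{}} }

func key(t *TaskServices) string { return t.AllocID + "/" + t.Name }

func (f *fakeConsul) RegisterTask(t *TaskServices) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.regs[key(t)] = t
	return nil
}

func (f *fakeConsul) RemoveTask(t *TaskServices) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.regs, key(t))
}

// UpdateTask swaps the old registration for the new one, e.g. when an
// allocation stops being a canary after promotion.
func (f *fakeConsul) UpdateTask(old, newTask *TaskServices) error {
	f.RemoveTask(old)
	return f.RegisterTask(newTask)
}

// Compile-time check that the fake satisfies the interface.
var _ ServiceAPI = (*fakeConsul)(nil)

func main() {
	c := newFakeConsul()
	canary := &TaskServices{AllocID: "a1", Name: "redis", Canary: true, Tags: []string{"canary"}}
	promoted := &TaskServices{AllocID: "a1", Name: "redis", Tags: []string{"global"}}
	_ = c.RegisterTask(canary)
	_ = c.UpdateTask(canary, promoted)
	fmt.Println(c.regs["a1/redis"].Canary, len(c.regs)) // false 1
}
```

A fake like this can also serve as a test double for client code that depends on the interface.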