*: implement configurable backup timeout #1908
 switch spec.StorageType {
 case api.BackupStorageTypeS3:
-	bs, err := handleS3(b.kubecli, spec.S3, spec.EtcdEndpoints, spec.ClientTLSSecret, b.namespace)
+	bs, err := handleS3(b.kubecli, spec.S3, backupTimeout, spec.EtcdEndpoints, spec.ClientTLSSecret, b.namespace)
I've rethought this.
Instead of defining the timeout as covering only "saving the backup", we should define it as covering the entire process. Each storage type could spend a different amount of time on work other than saving the backup. As a user, I might just set 60s and want the backup to finish within roughly that limit, without caring about fetching secrets, etc.
In code:
ctx, cancel := context.WithTimeout(context.Background(), backupTimeout)
handleS3(ctx, ...)
@hongchaodeng yeah, that was my original thinking. However, the issue with a timeout for the entire process is that the kube client doesn't take in a context, so I can't include kubecli activities such as retrieving the secret as part of the total timeout. Hence, the Timeout only applies to retrieving + saving the backup.
force-pushed from 39b90a6 to 767a19d
k8s client upstream issue ref: kubernetes/kubernetes#46503
It also looks like client-go provides context support at the REST API level: https://github.com/kubernetes/client-go/blob/4def1285ff0e4d1fee7cc9e2684ef9923dca591f/rest/request.go#L411
There is a default 30s Dialer.Timeout.
My current conclusion is that this is not limited to backup. It is a general issue that we need to work around or solve upstream. Otherwise, if a call keeps hanging, anything using client-go will have problems.
My current suggestion would be to leave a TODO noting the current limitation of the timeout.
force-pushed from 1a1d508 to 7ce0bf9
force-pushed from 7ce0bf9 to 89bb119
The BackupWriter.Write() would also need to take in a context.
For S3 upload, there is a method called UploadWithContext().
For ABS, there is a Timeout int but no context. Leave a TODO for now?
@rjtsdl Can you also provide some thoughts?
@@ -84,7 +84,7 @@ type BackupSource struct {

 // BackupPolicy defines backup policy.
 type BackupPolicy struct {
-	// timeout is the maximal time of retriving plus saving an etcd backup.
+	// timeout is the maximal allowed time in second of the entire backup process.
 	Timeout int64 `json:"timeout,omitempty"`
On second thought, maybe we should rename this to TimeoutInSecond or TimeoutSecond.
TimeoutInSecond seems good.
pkg/backup/backup_manager.go
Outdated
mapEps := make(map[string]*clientv3.Client)
var maxClient *clientv3.Client
maxRev := int64(0)
errors := make([]string, 0)
for _, endpoint := range endpoints {
	// TODO: update clientv3 to 3.2.x and thenuse ctx as in clientv3.Config.
thenuse -> then use
force-pushed from a0799fb to 74da62a
force-pushed from 74da62a to aabcd45
All fixed. PTAL. cc @hongchaodeng
After offline discussion, we are going to add
Let's move forward with this and change BackupWriter in a following PR.
The backup timeout was made configurable in coreos#1908 but is not documented. This adds a note about the default value and makes it clear this is an optional feature.
ref: #1906