Refactor node draining to avoid race condition #130
```go
@@ -18,6 +18,10 @@ import (
	"time"

	"github.com/golang/glog"
	mcfgv1 "github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1"
	daemonconsts "github.com/openshift/machine-config-operator/pkg/daemon/constants"
	mcclientset "github.com/openshift/machine-config-operator/pkg/generated/clientset/versioned"
	mcfginformers "github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions"
	"golang.org/x/time/rate"
	corev1 "k8s.io/api/core/v1"
	v1 "k8s.io/api/core/v1"
@@ -32,18 +36,15 @@ import (
	"k8s.io/client-go/kubernetes"
	listerv1 "k8s.io/client-go/listers/core/v1"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/client-go/util/workqueue"
	"k8s.io/kubectl/pkg/drain"

	// "k8s.io/client-go/kubernetes/scheme"
	sriovnetworkv1 "github.com/k8snetworkplumbingwg/sriov-network-operator/api/v1"
	snclientset "github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/client/clientset/versioned"
	sninformer "github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/client/informers/externalversions"
	"github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/utils"
	mcfgv1 "github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1"
	daemonconsts "github.com/openshift/machine-config-operator/pkg/daemon/constants"
	mcclientset "github.com/openshift/machine-config-operator/pkg/generated/clientset/versioned"
	mcfginformers "github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions"
)

const (
@@ -124,7 +125,7 @@ const (
var namespace = os.Getenv("NAMESPACE")
var pluginsPath = os.Getenv("PLUGINSPATH")

// writer implements io.Writer interface as a pass-through for klog.
// writer implements io.Writer interface as a pass-through for glog.
type writer struct {
	logFunc func(args ...interface{})
}
```
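The hunk above only changes the doc comment on `writer` from klog to glog; the `Write` method itself is outside this diff. For context, here is a minimal sketch of how such a pass-through writer is typically completed. This exact implementation is an assumption for illustration, not part of the PR:

```go
// Write satisfies io.Writer by forwarding the message to the configured glog
// function, so libraries that expect an io.Writer (for example the kubectl
// drain helper) can log through glog.
func (w writer) Write(p []byte) (n int, err error) {
	w.logFunc(string(p))
	return len(p), nil
}
```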
```go
@@ -791,27 +792,61 @@ func (dn *Daemon) getNodeMachinePool() error {
	return fmt.Errorf("getNodeMachinePool(): Failed to find the MCP of the node")
}

func (dn *Daemon) getDrainLock(ctx context.Context, done chan bool) {
	var err error

	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "config-daemon-draining-lock",
			Namespace: namespace,
		},
		Client: dn.kubeClient.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: dn.name,
		},
	}

	// start the leader election
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   5 * time.Second,
		RenewDeadline:   3 * time.Second,
		RetryPeriod:     1 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				glog.V(2).Info("getDrainLock(): started leading")
				for {
					time.Sleep(3 * time.Second)
					if dn.drainable {
```
> **Review comment:** Outside of any concern for this PR, because this pattern was here before this PR: have you folks ever seen nodes getting stuck on this condition? It could happen if a node reboots, doesn't start up, and the daemonset is unable to update its draining status.
>
> **Reply:** That is intentional. We don't want a configuration mistake to break more nodes. If users encounter such a problem, they'd better do some troubleshooting to find out why the node cannot come back.
```go
						glog.V(2).Info("getDrainLock(): no other node is draining")
						err = dn.annotateNode(dn.name, annoDraining)
```
> **Review comment:** Using this mechanism, do we still need to annotate the node with the draining annotation?
>
> **Reply:** That is the trick. The leader election mechanism requires the leader to keep updating the Lease object. But in our case, the node may reboot itself and then lose leadership. So I use a two-layer lock here. The node can only start draining under two conditions: 1) it becomes the leader, and 2) no other node is draining, as indicated by the annotation.
>
> **Review comment:** I see, thanks for clarifying.
>
> **Review comment:** @pliurh mind mentioning this in the commit message? So it's clear in the commit message how this mechanism is used to control node draining.
```go
						if err != nil {
							glog.Errorf("getDrainLock(): Failed to annotate node: %v", err)
							continue
						}
						done <- true
						return
					}
					glog.V(3).Info("getDrainLock(): other node is draining, wait...")
				}
			},
			OnStoppedLeading: func() {
				glog.V(2).Info("getDrainLock(): stopped leading")
			},
		},
	})
}

func (dn *Daemon) drainNode(name string) error {
	glog.Info("drainNode(): Update prepared")
	var err error

	ctx, cancel := context.WithCancel(context.TODO())
	defer cancel()
	// wait a random time to avoid all the nodes drain at the same time
	time.Sleep(wait.Jitter(3*time.Second, 3))
	wait.JitterUntil(func() {
		if !dn.drainable {
			glog.V(2).Info("drainNode(): other node is draining")
			return
		}
		glog.V(2).Info("drainNode(): no other node is draining")
		err = dn.annotateNode(dn.name, annoDraining)
		if err != nil {
			glog.Errorf("drainNode(): Failed to annotate node: %v", err)
			return
		}
		cancel()
	}, 3*time.Second, 3, true, ctx.Done())

	done := make(chan bool)
	go dn.getDrainLock(ctx, done)
	<-done

	if utils.ClusterType == utils.ClusterTypeOpenshift {
		mcpInformerFactory := mcfginformers.NewSharedInformerFactory(dn.mcClient,
```
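The review thread above describes the intent as a two-layer lock: leader election on a Lease object, plus the draining annotation, so that a node which reboots (and therefore loses the lease) still blocks other nodes until it finishes. Below is a rough sketch of the lifecycle this implies. The caller name `applyNodeConfig`, the `annoIdle` value, and the placement of the release step are assumptions for illustration; they are not part of this hunk:

```go
// Rough sketch of the drain lifecycle implied by the diff and the review
// discussion. applyNodeConfig and annoIdle are hypothetical names.
func (dn *Daemon) applyNodeConfig(ctx context.Context) error {
	// drainNode blocks until this daemon (1) wins the Lease-based leader
	// election and (2) observes that no other node is draining, then sets
	// the annoDraining annotation and drains the node.
	if err := dn.drainNode(dn.name); err != nil {
		return err
	}

	// ... apply the new SR-IOV configuration, possibly rebooting the node ...
	// A reboot drops leadership of the Lease, but the draining annotation
	// keeps the other nodes waiting in getDrainLock.

	// Assumed release step: clear the annotation so the next waiting node
	// can start draining.
	return dn.annotateNode(dn.name, annoIdle)
}
```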
> **Review comment:** Looking at the leaderelection docs, which warn that the implementation does not guarantee that only one client is acting as a leader (a.k.a. fencing): is it not an issue? I think what we are doing here is considered fencing. How will the system behave when there is only one endpoint trying to take the lead on the LeaseLock?
>
> **Reply:** Instead of using an endpoint, I use the Lease API for leader election here; I don't think that statement applies. All the clients race for the same Lease object, so I don't think there could be more than one acting as leader.
>
> **Review comment:** OK, thanks for explaining.
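For reference, here is a minimal, self-contained sketch of the Lease-based election discussed above, using the same client-go primitives as the PR. The lease name, namespace, and identity are placeholders, not the operator's values. Every candidate races for the same Lease object, so at most one `OnStartedLeading` callback runs at a time; a lone candidate simply acquires the lease and keeps renewing it:

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	// Build a client from KUBECONFIG (falls back to in-cluster config if empty).
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "example-draining-lock", // placeholder lease name
			Namespace: "default",               // placeholder namespace
		},
		Client: client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: os.Getenv("NODE_NAME"), // must be unique per candidate
		},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   5 * time.Second,
		RenewDeadline:   3 * time.Second,
		RetryPeriod:     1 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only the current lease holder runs this; serialized work goes here.
				log.Println("acquired the drain lock")
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				log.Println("lost the drain lock")
			},
		},
	})
}
```

If a previous holder disappears without releasing the lease, a new candidate has to wait up to roughly `LeaseDuration` before it can take over; with `ReleaseOnCancel` and a clean shutdown, the lease is freed immediately.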