KEP: efficient node heartbeats #2090
Conversation
keps/0008-node-heartbeat.md
Outdated
The underlying root cause is combination of:

- etcd keeping both current state and transaction log with copy-on-write
- node heartbeats being pontetially very large objects (even though
it seems like it might be worthwhile to investigate and correct this... this is likely the "images present on node" list in node status. would it make more sense to move that out of the object instead?
Images are only one potential problem (yes - we may consider moving them to a separate object, but that doesn't solve the generic problem of "heartbeats are expensive").
The second big problem is Volumes - we will soon support 100+ volumes per VM in GCE and that has exactly the same issue.
So we really should solve this problem in a more generic way.
I agree with Wojtek.
Also, the image info is unnecessarily large, regardless of where it is stored.
And I want to mitigate the proliferation of APIs.
ok
Incorporated into text
Basically the node object is unbounded today. Besides images and volumes, labels and annotations can also be added to that API object. And updating it periodically (now every 10s) makes it very expensive.
keps/0008-node-heartbeat.md
Outdated
We will use CRD (as opposed to built-in API), because:

- CRDs are `the` way to created new APIs
I don't think this is quite accurate. It doesn't seem proper for a core kubernetes component to depend on the existence of a "custom" resource, which would be required to exist in all clusters to be functional
I'm not the API expert. I've been discussing it with @bgrant0607 and he explicitly said that it should be done with a CRD.
Personally, I don't really have a preference here, but I trust that Brian knows what he is saying.
I'm not the API expert. I've been discussing it with @bgrant0607 and he explicitly said that it should be done with a CRD. Personally, I don't really have a preference here, but I trust that Brian knows what he is saying.
that doesn't seem proper... node heartbeating isn't an extension, add-on, or optional component... it's fundamental to a kubernetes cluster.
@liggitt - is your concern only about CRD vs built-in API, or more?
To cite Brian on CRDs:
1. CRD is a mechanism for building APIs
2. Most new APIs will at least start as CRDs
3. Most new APIs will stay CRDs forever
But I will let @bgrant0607 comment more on that.
is your concern only about CRD vs built-in API, or more?
That's my main concern. An API the kubelet depends on is an API the core repo has to define, document, build clients for, and support. Most of those are easier and far more reliable for the kube-apiserver to do as a compiled-in type, rather than a CRD. Given that the schema stability and iteration time is tied to kube releases, I don't see many benefits to building these as a CRD, and see plenty of downsides.
The existing leader election code is in client-go. It could go there, or elsewhere in k/k.
I initially wanted to have this somewhere in k/k. But it's not a strong preference - client-go also sounds fine to me.
Clarified where to put that code in the doc.
@wojtek-t Can you include your experimental benchmark on CRD vs. built-in in this KEP? I am fine with CRD; at least we can start with CRD first. If the performance concern remains, we can graduate it to built-in later.
Technically, so far I only did a json vs protobuf benchmark on this particular object with some ~real data. I can create a better benchmark when I'm back from leave in a couple of days.
@dchen1107 that said - how exactly do you want this to be incorporated into this KEP? Just the results? Or maybe a link to a PR to the main repo? Or something else?
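For illustration, a micro-benchmark of the kind being discussed could look roughly like the sketch below; the `fakeStatus` type and the image count are made-up stand-ins for real Node status data, not the benchmark actually used.

```go
// heartbeat_bench_test.go - a minimal, illustrative sketch only (run with `go test -bench=.`).
package heartbeat

import (
	"encoding/json"
	"testing"
)

// fakeStatus is a made-up stand-in for a heartbeat-sized object.
type fakeStatus struct {
	Images []string `json:"images"`
}

func BenchmarkHeartbeatJSON(b *testing.B) {
	s := fakeStatus{Images: make([]string, 100)}
	for i := range s.Images {
		s.Images[i] = "registry.example.com/some/image:tag"
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Measure only the serialization cost of the object.
		if _, err := json.Marshal(&s); err != nil {
			b.Fatal(err)
		}
	}
}
```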
keps/0008-node-heartbeat.md
Outdated
This proposal presents a proper solution for that problem.

Note that currently:
Note that currently by default?
Done
keps/0008-node-heartbeat.md
Outdated
Other notes:

1. Additional advantage of using LeaderElection for that purpose would be the ability
exclude?
Done
keps/0008-node-heartbeat.md
Outdated
- there was a meaningful change in it (initially we can probably assume that every
  change is meaningful, including e.g. images on the node)
- or it didn’t report it over last `node-status-update-period` seconds
1. Kubelet creates and periodically updates its own LeaderElection object and frequency
its own LeaderElection object
what name/pattern is reserved for kubelets to use? presumably we don't want to manually configure 5000 kubelets with distinct heartbeat object names
I agree that should be explicitly specified in the proposal. It should be the node name, or possibly trivially derived from the node name, but isolating the resources by namespace should be good enough to not require prefixing IMO.
Added to the doc above
keps/0008-node-heartbeat.md
Outdated
## Proposal

We propose introducing a new `LeaderElection` CRD with the following schema:
is this a namespaced or cluster-scoped object?
I think it should be namespaced, so that people would also be able to take advantage of it in their applications.
In that case, I would say that the object corresponding to a given Kubelet should be:
namespace: kube-system
name: kubelet-<node name>
Does that sound ok?
Yes, it needs to be namespaced.
I'd put them in a new namespace and skip the prefix.
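For illustration only, the resulting addressing scheme might look like the sketch below; the namespace name is a placeholder (the doc leaves it TBD) and the helper function is hypothetical.

```go
package main

import "fmt"

// nodeLeaseNamespace is a placeholder; the KEP leaves the actual namespace name TBD.
const nodeLeaseNamespace = "node-leases"

// leaseKeyForNode returns the namespace/name under which a given Kubelet's
// heartbeat object would live, assuming the "name == node name" scheme discussed above.
func leaseKeyForNode(nodeName string) (namespace, name string) {
	return nodeLeaseNamespace, nodeName
}

func main() {
	ns, name := leaseKeyForNode("node-1")
	fmt.Printf("%s/%s\n", ns, name) // node-leases/node-1
}
```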
Added to doc.
What would the new namespace be called?
I assume the entire namespace is just scoped to node leases?
I guess we should add it to the set of immortal namespaces.
What would the new namespace be called?
TBD
I assume the entire namespace is just scoped to node leases?
Yes.
I guess we should add it to the set of immortal namespaces.
Yes
keps/0008-node-heartbeat.md
Outdated
Cons:

- Introduces a new, narrow-purpose API. Leader election is already used by other
  components, implemented using annotations on Endpoints and ConfigMaps.
I don't see this as a con. Using a leader election API object for a single actor which doesn't make use of any of the "leader" aspects of it is pretty confusing.
The leader election API works well when you have functionally identical components sharing a single leader election lock object to determine who is the leader. It makes little sense when you have distinct components each given their own object.
I think only the name "leader election" is confusing. This is a Lease API, which is what I'd call it.
The other actor is the node controller, which takes action when the Kubelet doesn't renew its lease in time. It just doesn't acquire the lease.
Also note that my opinion on a "node heartbeat" API hasn't changed since the original proposal (kubernetes/kubernetes#14735) in 2015. In fact, with ~60 APIs now it's even more critical that we introduce new ones with great care.
keps/0008-node-heartbeat.md
Outdated
// Standard object's metadata.
// More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata
// +optional
ObjectMeta metav1.ObjectMeta `json:"metadata,omitempty"`
what indicates the object is connected to a particular node? the name? the spec.holderIdentity?
I think the name. As I mentioned above, the object corresponding to a given Kubelet should be:
namespace: kube-system
name: kubelet-<node name>
[holderIdentity will probably be the same as name though]
The name
Added that explanation below.
keps/0008-node-heartbeat.md
Outdated
In the meantime, we will change `NodeController` to treat both updates of NodeStatus
object as well as updates of the new `LeaderElection` object corresponding to a given
node as healthiness signal from a given Kubelet. This will make it work for both old
how will the node controller watch/filter to objects that are related to nodes?
I would say we should be watching kube-system namespace.
Or a dedicated namespace
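A minimal sketch of that combined health check (illustrative names, not the actual NodeController code): whichever of the two signals is newer counts as the node's last probe time.

```go
package main

import (
	"fmt"
	"time"
)

// nodeProbeTime treats both the last NodeStatus heartbeat and the last Lease/LeaderElection
// renewal as healthiness signals and returns the more recent of the two.
func nodeProbeTime(lastStatusHeartbeat, leaseRenewTime time.Time) time.Time {
	if leaseRenewTime.After(lastStatusHeartbeat) {
		return leaseRenewTime
	}
	return lastStatusHeartbeat
}

// nodeHealthy reports whether either signal was seen within the grace period.
func nodeHealthy(lastStatusHeartbeat, leaseRenewTime time.Time, gracePeriod time.Duration, now time.Time) bool {
	return now.Sub(nodeProbeTime(lastStatusHeartbeat, leaseRenewTime)) < gracePeriod
}

func main() {
	now := time.Now()
	// NodeStatus is stale, but the lease was renewed recently, so the node still counts as healthy.
	fmt.Println(nodeHealthy(now.Add(-50*time.Second), now.Add(-5*time.Second), 40*time.Second, now))
}
```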
keps/0008-node-heartbeat.md
Outdated
of those updates is independent from NodeStatus update frequency.

In the meantime, we will change `NodeController` to treat both updates of NodeStatus
object as well as updates of the new `LeaderElection` object corresponding to a given
This breaks backwards compatibility for any integration anyone has built by watching node objects for changes. That’s any custom scheduler, any alerting-based function, and probably a non-trivial number of custom node integrations built by users.
Treating both as a healthiness signal? Why? Can you clarify that?
We'll still write the status, just less frequently. It's effectively a configuration change.
Also, schedulers shouldn't be watching elapsed time since last heartbeat directly. The node controller encodes cordoning and eviction in taints and conditions.
@smarterclayton We thought about such compatibility issues and this is why we still update lastHeartbeatTime periodically, but with an increasing period.
keps/0008-node-heartbeat.md
Outdated
object (if they relied on frequent NodeStatus changes)
1. further reduce NodeStatus updates frequency to not less often than once per
   1 minute.
We can’t stop periodically updating NodeStatus as it would be API breaking change,
Just to confirm, today's NodeStatus update will be kept, right?
In that case, isn't it true that old consumers may get a stale Node status?
I don't really understand the question here.
We will be updating NodeStatus less frequently (say once every 1m instead of every 10s). So NodeStatus will potentially be 1 minute old instead of 10s old.
But if something changes in the status, we will update it (please read the proposal carefully).
So the status itself won't be more stale than it is now - it's only "lastHeartbeatTimestamp" that will potentially be older, but users shouldn't really rely on that...
If something material changes in the node status, it will be updated by kubelet, as it is now.
If the node is deemed MIA, the node controller will update the node status and taints, as it does now.
Thanks, I am aware of the status updating design in this doc, but then what would be the API breaking change if we stopped the periodic NodeStatus update? It seems harmless.
keps/0008-node-heartbeat.md
Outdated
Note that currently (by default):

- Lack of NodeStatus update for 40s results in NodeController marking node
Actually, if some other component changed the Ready condition to Unknown, would node controller then set the taint correctly?
If not, it might be useful to decouple that code in node controller and create a way to disable the node controller's status checking so that someone could implement their own probing system if they wanted to.
why 40s and 5min, and are they configurable? I can see use cases in IoT where devices may be disconnected for minutes at a time, but had pods scheduled to them that should remain running.
@PatrickLang it's configurable via the node-monitor-grace-period and pod-eviction-timeout arguments to the node lifecycle controller. It's probably best to refer to these timings generically and just reference their defaults.
Changed to mention the appropriate settings mentioned by Derek.
Answering Brian's comment - yes, we should probably decouple that logic. But that's independent from this proposal (we should probably do it even without this proposal), so I will not mention it here to avoid making it too wide/complex.
keps/0008-node-heartbeat.md
Outdated
We propose introducing a new `LeaderElection` CRD with the following schema:

type LeaderElection struct {
I would call this Lease.
It's fine that the leader-election library would use it, since that's what that library does
Switched to Lease - that sounds fine to me too.
owning-sig: sig-node
participating-sigs:
  - sig-scalability
  - sig-apimachinery
I would add sig-scheduling, due to the image issue
done
keps/0008-node-heartbeat.md
Outdated
### Risks and Mitigations

Increasing default frequency of of NodeStatus updates may potential break customers
s/customers/clients/g
done
Thanks for comments. PTAL
The proposal looks good to me, but I hope @derekwaynecarr can take a look before I apply /lgtm label. He is out for OpenShift Summit this week.
keps/0008-node-heartbeat.md
Outdated
- CRDs are `the` way to created new APIs
- even though CRDs aren't very efficient due to lack of protobuf support now:
  - performance should be acceptable (because Lease object will be small)
    <br> TODO(wojtek-t): Working on microbenchmark to prove with data.
Can we simulate the problem in kubemark?
We can simulate the solution (once implemented) with kubemark, but we are not really measuring things like serialization time there, which is quite crucial from this perspective.
But yes - from Kubemark we will know whether it regressed our SLOs or not.
keps/0008-node-heartbeat.md
Outdated
- even though CRDs aren't very efficient due to lack of protobuf support now:
  - performance should be acceptable (because Lease object will be small)
    <br> TODO(wojtek-t): Working on microbenchmark to prove with data.
  - protobuf support for CRDs is on the roadmap (though not near term)
Added.
keps/0008-node-heartbeat.md
Outdated
HolderIdentity string `json:"holderIdentity"`
LeaseDurationSeconds int32 `json:"leaseDurationSeconds"`
AcquireTime metav1.Time `json:"acquireTime"`
RenewTime metav1.Time `json:"renewTime"`
Probably should be microtime (higher precision) except we haven’t gotten that in.
Actually, we have that already in:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/micro_time.go#L33
And it's already used in new events and in audit logs:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/apis/audit/v1alpha1/types.go#L131
And I agree we should use it here (probably for both acquire and renew times). Changed.
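To make the change concrete, the spec fields quoted above would then look roughly like this; the surrounding `LeaseSpec` wrapper name is assumed here, not taken from the diff.

```go
package coordination

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// LeaseSpec mirrors the fields quoted above, with MicroTime used for the
// timestamps as agreed in this thread. Field set and wrapper name are illustrative.
type LeaseSpec struct {
	HolderIdentity       string           `json:"holderIdentity"`
	LeaseDurationSeconds int32            `json:"leaseDurationSeconds"`
	AcquireTime          metav1.MicroTime `json:"acquireTime"`
	RenewTime            metav1.MicroTime `json:"renewTime"`
}
```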
keps/0008-node-heartbeat.md
Outdated
## Proposal

We propose introducing a new `Lease` CRD. To make it easily
given where the discussion in #2090 (comment) ended up, I was surprised to see this still proposed as a CRD. Isn't it odd to build core function on top of extension APIs? The things CRDs let you do (easily disable, change schema, validation, etc) are all things I would not want any chance of happening to the resource my nodes used for heartbeating.
@bgrant0607 for more comments.
Personally my slight preference was also towards a built-in API, but Brian has a very strong opinion on using a CRD for it and ultimately he is the API reviewer.
That said, as long as this is not yet enabled by default, we will be free to change that. And purely from a code perspective, using a CRD vs a built-in API doesn't change much.
I agree with @liggitt that using a CRD for this does not seem right. If it's core to the function of Kubernetes, it should be core to the API.
I put this on the agenda for SIG Architecture for tomorrow (the 17th).
Sorry I missed the arch meeting, but 💯 to being in the built-in API
We have just discussed this during SIG Architecture.
Long story short, it seems that we want to pursue using CRDs even for important APIs, but given that CRDs are not yet feature-complete and given how core this object will be (necessary for Kubelets to work correctly), there seems to be a decision to use a built-in API.
The question I have now is: what API group should it belong to then? IIUC, we don't want to add anything new to the "core" api group. And I don't really see any other api group being a good match. So should we create a new one?
@bgrant0607 @smarterclayton @liggitt @lavalamp - I would like to hear your suggestions here.
The convincing argument was about risk.
We could (and should) benchmark CRDs, but since they don't have versioning support yet and since we don't know whether we could seamlessly transition storage from a CRD to a built-in API, if we ran into a performance or other problem, it may not be easy to resolve if we started as a CRD.
Yes, don't add any new APIs to the core API group. Besides, we don't want to add a beta-ish API to a v1 group.
We'd need to choose a group name even if it were a CRD.
Conceptually, I'd categorize it as "coordination".
And I believe we've agreed that new groups will be suffixed with .k8s.io.
So that would suggest coordination.k8s.io.
Thanks Brian!
I changed the doc to talk about the built-in API (and the coordination.k8s.io api group). And added some explanation for future reference that we also considered CRDs and why we decided on a built-in API.
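For reference, a sketch of what registering the new group could look like; this is illustrative only, and the `v1beta1` version is an assumption based on the plan to go straight to Beta.

```go
package coordination

import "k8s.io/apimachinery/pkg/runtime/schema"

// GroupName follows the agreement above that new groups are suffixed with .k8s.io.
const GroupName = "coordination.k8s.io"

// SchemeGroupVersion is illustrative; the actual initial version is up to the implementation.
var SchemeGroupVersion = schema.GroupVersion{Group: GroupName, Version: "v1beta1"}
```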
Thanks for sharing the detailed explanation on etcd limits. Just a note: delta compression in etcd would be interesting, but I'm not sure how it would be implemented when etcd also stores Protocol Buffer encoded values. Same for other compression algorithms. We once evaluated transport-layer compression etcd-io/etcd#5186, but without much benefit. Would love to experiment again! For the near future, I would look into periodic compaction on etcd storage and running defragmentation to reclaim the disk space. There are plans to make this better as well etcd-io/etcd#9222. /cc @jpbetz @joelegasse Update: didn't mean to trigger the request review bot :0
@gyuho: GitHub didn't allow me to request PR reviews from the following users: joelegasse. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.
I have a number of questions that would help to clarify.
It is not clear to me if the kubelet updates node conditions at a frequency less than the node status update frequency in the case of a meaningful change. If so, at what frequency?
It's not clear why this is not behind a feature gate of some kind as it's evaluated, rather than immediately moving to beta.
I also prefer a built-in API because I don't know if there is sufficient validation in place server side in the proposed solution. I don't always trust the kubelet.
EDITED - I prefer a built-in API over CRD
The Spec is effectively of already existing (and thus proved) [LeaderElectionRecord][].
The only difference is using `MicroTime` instead of `Time` for better precision.
That would hopefully allow us go get directly to Beta.
I think whether the API goes directly from inception to beta is orthogonal to the design, but seems to possibly influence the usage of CRDs versus built-in API.
I think there's a question of how we validate the new API other than in Kubemark and other synthetic tests if it starts as alpha and disabled by default.
I think whether the API goes directly from inception to beta is orthogonal to the design,
It's orthogonal to the design, it influences timeline when we can start using it.
but seems to possibly influence the usage of CRDs versus built-in API.
Why? I consider CRD vs built-in API a separate thing from versioning.
CRDs only currently support a single version per api group.
Got it.
That said, with a decision to proceed with built-in api, this part is no longer valid.
keps/0008-node-heartbeat.md
Outdated
We will use that object to represent node heartbeat - for each Node
there will be corresponding `Lease` object with Name equal
to Node name in a newly created dedicated namespace (we considered
using `kube-system` namespace but decided that it's already too overloaded).
I agree a net new namespace is preferred if the node name to lease name mapping is required.
Is the controller responsible for creating the namespace if needed so nodes do not require that permission?
I would prefer the namespace is immortal similar to default, kube-system, etc.
Yes - I completely agree - added to the doc.
1. Kubelet creates and periodically updates its own Lease object and frequency
   of those updates is independent from NodeStatus update frequency.

In the meantime, we will change `NodeController` to treat both updates of NodeStatus
to clarify, non-renewal of a lease corresponds to NodeReady=ConditionUnknown, but for tainting purposes we still require reading the conditions?
Not sure if I understand - can you please clarify?
i think the updated text lets me answer my own question. non-renewal of a lease will cause the node controller to mark the node ready condition as condition unknown. any other conditions sourced from the node will still be set when there is meaningful change, and the node controller will convert those conditions to taints as usual.
Yes - that is correct.
1. announce that we are going to reduce frequency of NodeStatus updates further
   and give people 1-2 releases to switch their code to use `Lease`
   object (if they relied on frequent NodeStatus changes)
1. further reduce NodeStatus updates frequency to not less often than once per
doesn't this impact how frequently the node can set conditions that inform scheduling?
if a node is not ready, has disk/memory pressure, etc. will a node report that condition at the normal 10s window, and only if there is no "meaningful" change it will report every 1m? i think 1m is far too long, as there are other services that would want to monitor node health status conditions separate from the lease.
That sentence from above is supposed to explain that:
Kubelet is reporting NodeStatus if:
- there was a meaningful change in it (initially we can probably assume that every
  change is meaningful, including e.g. images on the node)
- or it didn’t report it over last `node-status-update-period` seconds
So if any condition changed (no matter if the node is ready or not), Kubelet should report its status every 10s.
Only when all conditions (and other stuff) didn't change over the last 10s (or so), are we allowed to not report it for up to 1m (or whatever period we decide on).
@derekwaynecarr Is that clearer now? Do you have any suggestion to make that initial sentence I pasted above cleaner?
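For what it's worth, the rule described above can be sketched as follows; the helper is hypothetical, not actual Kubelet code.

```go
package main

import (
	"fmt"
	"time"
)

// shouldReportNodeStatus mirrors the rule above: report when something meaningful
// changed, or when the configured period has elapsed since the last report.
// The Lease object is renewed on its own, independent schedule.
func shouldReportNodeStatus(changedMeaningfully bool, lastReport time.Time, nodeStatusUpdatePeriod time.Duration, now time.Time) bool {
	return changedMeaningfully || now.Sub(lastReport) >= nodeStatusUpdatePeriod
}

func main() {
	now := time.Now()
	// Nothing changed and only 30s have elapsed with a 1m period: no report yet.
	fmt.Println(shouldReportNodeStatus(false, now.Add(-30*time.Second), time.Minute, now))
}
```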
further iterations of this proposal will need to debate what is meaningful or not, but yes, this is clear enough for now.
1. We may consider reducing frequency of NodeStatus updates to once every 5 minutes
   (instead of 1 minute). That would help with performance/scalability even more.
   Caveats:
   - NodeProblemDetector is currently updating (some) node conditions every 1 minute
NPD is one consumer, but the other node conditions (disk pressure, memory pressure) that relate to eviction pressure transition period setting are also important. it seems to me we need the ability to report those condition changes at a shorter interval than how you propose to scale node status update frequency. maybe we need a separate value for condition changes versus other changes.
keps/0008-node-heartbeat.md
Outdated
Note that if we keep adding node conditions owned by other components, the
number of writes of Node object will go up. But that issue is separate from
that proposal.
1. Increasing default frequency of NodeStatus updates may potentially break clients
this is repeated later in the risks section. would just remove it.
done
keps/0008-node-heartbeat.md
Outdated
## Graduation Criteria

This can be immediately promoted to Beta, as the API is effectively a copy of
This seems odd. Are there reasons a feature gate is not chosen as well?
It feels like there are two distinct topics:
- graduation criteria of a `Lease` API
- modification of the kubelet and related components
Why is this different than TaintNodesByCondition or TaintBasedEvictions?
Regarding the feature gate: I agree that the kubelet and node controller behavior should be configurable. That should be decoupled from the API.
The API itself is effectively just data. It should be largely equivalent to the long-standing annotation-based implementation on ConfigMap and other resources. There isn't a controller side to the API implementation.
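For comparison, the record that the existing leader-election library stores as an annotation on ConfigMap/Endpoints objects looks roughly like the sketch below; treat the exact field set as approximate rather than a verbatim copy of client-go.

```go
package resourcelock

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// LeaderElectionRecord is (approximately) the long-standing annotation-based record
// referenced above; the proposed Lease spec is essentially this data as a first-class object.
type LeaderElectionRecord struct {
	HolderIdentity       string      `json:"holderIdentity"`
	LeaseDurationSeconds int         `json:"leaseDurationSeconds"`
	AcquireTime          metav1.Time `json:"acquireTime"`
	RenewTime            metav1.Time `json:"renewTime"`
	LeaderTransitions    int         `json:"leaderTransitions"`
}
```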
Yeah - that's a good point.
I made it explicit that I was thinking about the API (to promote it directly to Beta).
I added a point to put the changes to kubelet/NodeController logic behind a feature gate. I think it should be enabled by default once done, but that's probably a separate discussion (though I put that into the doc).
@derekwaynecarr I clarified those in the comments. TL;DR:
Yes - if something changes it will be updated (with the same frequency as it is today - 10s).
Yes - it should be behind a feature gate - I added that to the doc. I think it should be enabled by default once implemented (I mean the feature gate for it), but that's a separate discussion.
PTAL
a few minor nits, but this is good for me.
/lgtm
/approve
As a result, we decided to proceed with built-in API.

We this new API in place, we will change Kubelet so that:
nit: With
Fixed in #2172
Once all the code changes are done, we will:

1. start updating `Lease` object every 10s by default, at the same time
nit: i know this is clarified earlier, but this is just changing the default frequency, but does not impact the frequency by which the node sends "meaningful" changes (i.e. changes in node condition) to the apiserver. it only impacts the frequency of non-meaningful changes being updated.
Done in follow up PR.
1. We may consider reducing frequency of NodeStatus updates to once every 5 minutes
   (instead of 1 minute). That would help with performance/scalability even more.
   Caveats:
   - NodeProblemDetector is currently updating (some) node conditions every 1 minute
sgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: derekwaynecarr
Follow up PR: #2175
KEP: efficient node heartbeats
/assign @dchen1107 @derekwaynecarr
/reviewers @lavalamp @deads2k
/cc @bgrant0607 @yujuhong @liggitt @smarterclayton
@kubernetes/sig-node-api-reviews @kubernetes/sig-api-machinery-api-reviews