KEP-1669: promote ProxyTerminatingEndpoints to beta in v1.23 #2952
Conversation
andrewsykim commented on Sep 7, 2021
- One-line PR description: Promote ProxyTerminatingEndpoints to beta in v1.23
- Issue link: Proxy Terminating Endpoints #1669
- Other comments: the initial PR was opened in [WIP] Promote KEP-1672 to GA #2938. Splitting each KEP into its own PR as requested by @wojtek-t.
Thanks!
/lgtm
/approve
Re-adding my questions from the original PR; neither of them was answered.
TBD for beta.
Roll out can fail if there are pods receiving traffic during termination but are unable to handle it.
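As a hedged illustration of what "able to handle traffic during termination" means in practice (this is not part of the KEP itself), here is a minimal Go HTTP server that keeps serving after SIGTERM and drains in-flight requests before exiting; pods that exit immediately on SIGTERM are the ones this rollout concern is about.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{
		Addr: ":8080",
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("ok"))
		}),
	}

	go func() {
		// ErrServerClosed is expected after Shutdown below.
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	// Keep serving after SIGTERM: requests routed to this terminating pod
	// (e.g. by a load balancer that has not yet removed the node) still succeed.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM)
	<-stop

	// Drain in-flight requests within the pod's termination grace period.
	ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("shutdown: %v", err)
	}
}
```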
L182 - have those been added? If so, can you please link them?
TBD for beta.
Application-level metrics should be used to determine if traffic received during termination is causing issues.
That's far from ideal, as often cluster admins may not understand application-level metrics.
I agree it might be hard to reflect the exact user-oriented behavior, but can we at least expose some kube-proxy-level metrics showing how many
(a) terminating & ready
(b) terminating & not-ready
endpoints can in theory be targeted?
[i.e. counters]
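A rough sketch of what such counters could look like, using the plain Prometheus Go client rather than kube-proxy's real metrics plumbing; the metric names and the RecordTerminatingEndpoints helper are hypothetical, not existing kube-proxy metrics.

```go
// Sketch only: not kube-proxy's actual metrics code. Metric names are made up.
package proxymetrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// Terminating endpoints that are still serving (ready): the ones
	// ProxyTerminatingEndpoints may fall back to.
	terminatingServingEndpoints = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "kubeproxy_terminating_serving_endpoints", // hypothetical
		Help: "Number of endpoints that are terminating and still serving.",
	})

	// Terminating endpoints that are no longer serving: traffic sent here
	// would likely be dropped.
	terminatingNotServingEndpoints = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "kubeproxy_terminating_not_serving_endpoints", // hypothetical
		Help: "Number of endpoints that are terminating and not serving.",
	})
)

func init() {
	prometheus.MustRegister(terminatingServingEndpoints, terminatingNotServingEndpoints)
}

// RecordTerminatingEndpoints would be called after each proxy sync, with counts
// derived from EndpointSlice conditions (ready/serving/terminating).
func RecordTerminatingEndpoints(serving, notServing int) {
	terminatingServingEndpoints.Set(float64(serving))
	terminatingNotServingEndpoints.Set(float64(notServing))
}
```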
TBD for beta.
No, but upgrade testing should be done prior to Beta.
Automated or just manual? [I guess manual - if so, let's make it explicit]
Manual tests are still to be run.
Also - please ensure that the results are added here to the KEP before the actual graduation happens in k/k code [something like: https://github.com//pull/2538/files ]
ack, will update. I think @aojea mentioned that OpenShift does have some automated testing around this, but I'm not sure we'll get an upstream signal for it.
@smarterclayton @danwinship I think we ended up switching to externalTrafficPolicy=Cluster for the PDB tests. Do you remember if there are others with externalTrafficPolicy=Local? If you don't remember off the top of your head, I'll dig into the current tests to find out.
- [X] Other (treat as last resort)
  - Details: SLIs are difficult to measure for this feature since the health of a service is dependent on the underlying process in the Pod as well as the load balancer implementation fronting the service.
I think that we're mixing two things here:
(a) what is the end-user experience [i.e. if user requests are being served correctly]
(b) if the feature itself works correctly at the k8s level [e.g. if there is a terminating endpoint, we send traffic to it instead of black-holing the traffic]
Your answer is generally about (a). But if answering (a) is hard, we should at least try to answer (b).
And answering (b) sounds possible to me.
As an example - this SLI (and corresponding SLO) should actually serve the purpose relatively well - and should require just adjusting some labels in the reported metrics....
TBD for beta.
* A Service Type=LoadBalancer sets externalTrafficPolicy=Local.
* The load balancer implementation uses `spec.healthCheckNodePort` for node health checking.
* A Pod can receive traffic during termination.
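For illustration only (names are hypothetical), a client-go sketch of a Service matching the conditions above; with Type=LoadBalancer and externalTrafficPolicy=Local, the API server allocates spec.healthCheckNodePort, which load balancer implementations probe to find nodes that have local, ready endpoints.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Hypothetical Service of the shape the KEP text describes.
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "example", Namespace: "default"},
		Spec: corev1.ServiceSpec{
			Type:                  corev1.ServiceTypeLoadBalancer,
			ExternalTrafficPolicy: corev1.ServiceExternalTrafficPolicyTypeLocal,
			Selector:              map[string]string{"app": "example"},
			Ports: []corev1.ServicePort{{
				Port:       80,
				TargetPort: intstr.FromInt(8080),
			}},
		},
	}
	// spec.healthCheckNodePort is left unset here; the API server assigns it
	// when externalTrafficPolicy is Local on a LoadBalancer Service.
	fmt.Println(svc.Name, svc.Spec.ExternalTrafficPolicy)
}
```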
Those conditions are non-trivial for cluster operators to determine.
How about having the metrics I suggested above? [This won't allow answering for a specific service, but at least for the aggregate of all services.]
TBD for beta.
We may consider adding metrics for total endpoints that are in the terminating state -- this will be evaluated based on the cardinality of such metrics.
I think I'm not fully following - can you clarify?
This is saying that we can add some level of metrics for terminating endpoints, but we need to be careful about the labels we apply to them. For example, if we included the endpoint as part of the metric labels, we would be exploding metrics cardinality because every endpoint is unique. So maybe per-endpoint metrics are not possible, but a total endpoint count is. But I'm on the fence about how useful a total count is if you can't map it back to which nodes and pods it applies to.
Regardless, I'm going to take a stab at adding metrics for this feature, but I think it wouldn't be that useful if the metric does not surface per-pod/per-endpoint details.
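To make the cardinality trade-off concrete, here is a sketch with the plain Prometheus client (metric names hypothetical, not real kube-proxy metrics): a gauge vector labeled by endpoint IP creates one time series per pod and grows without bound, while a gauge split only by serving state stays at two series but cannot be mapped back to specific nodes or pods.

```go
// Standalone sketch of the cardinality trade-off; metric names are made up.
package proxymetrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// High cardinality: one time series per terminating endpoint. Every pod IP
	// is unique, so this grows with churn and is generally a bad idea.
	terminatingEndpointInfo = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "kubeproxy_terminating_endpoint_info", // hypothetical
		Help: "1 for each endpoint currently in the terminating state.",
	}, []string{"namespace", "service", "endpoint_ip"})

	// Bounded cardinality: only two series (serving="true"/"false"), but the
	// operator cannot map the count back to specific nodes or pods.
	terminatingEndpointsTotal = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "kubeproxy_terminating_endpoints", // hypothetical
		Help: "Number of endpoints currently in the terminating state.",
	}, []string{"serving"})
)

func init() {
	prometheus.MustRegister(terminatingEndpointInfo, terminatingEndpointsTotal)
}
```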
Should we hold off on moving this forward until we sort out kubernetes/kubernetes#100313 and kubernetes/kubernetes#106030 (comment)?
Yeah, I think we should at least make sure we have a path forward that we're happy with, even if we haven't started to implement it yet.
Signed-off-by: Andrew Sy Kim <[email protected]>
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: andrewsykim, thockin. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@andrewsykim: The following test failed, say `/retest` to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/hold
/close
Closing this in favor of #3174
@andrewsykim: Closed this PR.