
sidecar: Add design callouts and alternatives discussed in the past 2 years #1913

Merged: 11 commits into kubernetes:master from rata/sidecar-kep, Sep 2, 2020

Conversation

@rata (Member) commented Jul 29, 2020

Hi!

This PR adds a "Proposal decisions to discuss" section covering some design issues/callouts from the KEP, as discussed in SIG Node meetings. It also adds an "Alternatives" section with the alternatives considered in the 2+ years this KEP has been open.

The motivation, goals, etc. sections are also improved to clarify the KEP's scope and motivation. As this PR is already quite long, however, I didn't add more sections that would make it even longer. Let me know if you think some specific section is worth adding (as part of this PR).

My idea for the "Proposal decisions to discuss" section was to use it only to discuss and agree on alternatives, and to remove it completely afterwards (updating the KEP accordingly). We can also capture what is relevant from the decision-making process in the alternatives section. This is partly why the section is long: I tried to raise every concern I could find, to make sure we are all on the same page.

The "Alternatives" section part discussing the other design alternatives was purely done by @Joseph-Irving and @mrbobbytables . The rest of the commits were kindly reviewed by @Joseph-Irving too :)

Please let me know if I can help in any way. Thanks for your time and help so far! :)

Tagging @derekwaynecarr as discussed at SIG Node

/cc @sjenning @SergeyKanzhelev

@k8s-ci-robot k8s-ci-robot added do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. labels Jul 29, 2020
@rata rata force-pushed the rata/sidecar-kep branch from 38d3a36 to a0127ad on July 29, 2020 20:04
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Jul 29, 2020
@rata (Member, Author) commented Jul 29, 2020

Oops, forgot to cc Derek via the bot too.

/cc @derekwaynecarr


An example in the open source world about this is the [Istio CNI
plugin][istio-cni-plugin]. This was created as an alternative for the
initContainer hack that the service mesh needs to do. The initContainer will


The init container is not just a hack to avoid the startup problem, but also a permissions issue. Even if we could guarantee the service mesh starts first, we would still retain the init container, to avoid running as root.

@rata (Member, Author) commented Jul 30, 2020

Oh, I see. The NET_ADMIN and NET_RAW capabilities are added to the initContainer only, so the sidecar proxy can run without them. Thanks for the clarification, I will update the text.

alternative requires that nodes have the CNI plugin installed, effectively
coupling the service mesh app with the infrastructure.

This KEP removes the need for a service mesh to use either an initContainer or a


This is not true, see above comment.

@rata (Member, Author) commented Jul 30, 2020

Thanks a lot for pointing it out.

I think it still makes sense to discuss the other callouts in the PR, so will update this part later to reflect what you pointed out. Thanks! :)

* Recommend users to delay starting their apps by using a script to wait for
the service mesh to be ready. The goal of a service mesh is to augment the app's
functionality without modifying it; that goal is lost in this case.
* To guarantee that traffic goes via the service mesh, an initContainer is


My view of the problem is close, but slightly different:

  • If we set up iptables first (before the pod starts, with istio-cni for example, or as the first init container), all init containers will fail, as traffic is blackholed since the sidecar is not running
  • If we set up iptables as the last init container, then the previous init containers will not have their traffic blackholed, but they will also not use the mesh. In many clusters this means they cannot connect to any services that require mTLS, since that will no longer happen

Neither of these is addressed by the KEP. A third option is potentially running the iptables in the actual sidecar, but this is not feasible due to permissions.

Note - I would be very (very!) happy with the changes in the KEP. But I don't want to give the impression that this solves all service mesh startup ordering issues.

@rata (Member, Author)

Good point, yes, none of these are addressed (sorry, I wasn't aware that was an important problem). It is currently a non-goal to run initContainers alongside sidecar containers.

The third option you mention, besides the capabilities needed for the redirect, will not run concurrently with initContainers, so traffic is either blackholed or not, but initContainers' traffic does not go through the Istio sidecar either.

The question is, then: do we want initContainers together with sidecar containers in scope or not? What do you think, @howardjohn?

The trade-off, from my POV, is:

  • Having it in scope will solve/improve that problem (but make this KEP much more complicated to move forward)
    vs
  • Having it out of scope will not solve that problem, but will let us make progress with the rest

IMHO, it makes sense to move forward with this proposal (i.e. not solve sidecars running concurrently with initContainers), but it worries me a little that, if we later want to solve the initContainers-with-sidecars problem, the API might end up messy.

If we implement this KEP and in the future want to extend it in some way to address that, we can technically do it using other container types. For example, type: daemon (probably a better name :-D) could be used to mean that (those containers start before initContainers too). Or type: OrderedPhase with an Order: <int>, where, let's say, Order: -1 means "before initContainers". However, it starts to get messy if you have containers in the Containers array that start before initContainers.

So, that seems like a messy thing to me. If we want to go the initContainers+sidecars route, we might want to have a new sidecarContainers array whose containers start before initContainers too. That was discussed in the past (only the array, I think, not starting before initContainers, IIRC); it is mentioned in the alternatives section along with why it was discarded, though.
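To make the options concrete, here is a rough Go sketch of the two shapes mentioned above. It is purely illustrative: the names (Type, Order, SidecarContainers) come from this comment, not from any agreed API.

```go
package sketch

// Purely illustrative: two hypothetical API shapes for "this container is a
// sidecar (and possibly starts before initContainers)". Neither is the KEP's
// decided design; the names mirror the options floated in this comment.

// Option A: mark the container itself, inside the existing containers array.
type ContainerType string

const (
	TypeStandard ContainerType = "Standard"
	TypeSidecar  ContainerType = "Sidecar"
	TypeDaemon   ContainerType = "Daemon" // hypothetical: would start before initContainers
)

type Container struct {
	Name  string
	Image string
	Type  ContainerType
	Order int // alternative idea: e.g. Order = -1 could mean "before initContainers"
}

// Option B: a dedicated array, keeping the ordering visible in the pod spec shape.
type PodSpec struct {
	InitContainers    []Container
	SidecarContainers []Container // the separate array discussed (and previously discarded)
	Containers        []Container
}
```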

At the same time, I'm not sure there are many sidecar types that might want to run concurrently with initContainers. Are there any other use cases? Maybe if you want to collect metrics/logs from the initContainers using a sidecar? Not sure there are many users for this, though. Although maybe it doesn't matter/hurt most sidecars. What do others think?

Taking a step back, a question from someone outside the service mesh world: does the service mesh really want to be part of the pod, if the life cycle it needs is so different? I guess probably yes, for latency and to transparently redirect traffic in ways that might not be possible otherwise. But I'm just making sure there are no other solutions and that we really want to tackle this problem with the service mesh living in the pod :)

Trying to be pragmatic, we can move to the alpha stage without "initContainers + sidecars" (i.e. as it is proposed now) and gather feedback from users, to see if it is worth doing a sidecarContainers array (or whichever other alternative best addresses the feedback).

What do others think?


If we implemented the KEP as is, I would be very happy; it would improve a lot of issues for a lot of people. I would be even happier if we also fixed the init container issue, but not at the expense of not fixing startup/shutdown. No need to let perfect be the enemy of good.

It would probably make sense to future proof this so it can be done, but not worry about it right now. Having a new type seems feasible to me.

As far as other use cases: all service meshes require this, as far as I know, and logging seems like something you would want for init containers as well. Honestly, most sidecars are feasible to run with the init containers, for all the same reasons you want them to run with the main container. For example, some people have SQL proxies; the same is needed if you use SQL in an init container.

> Taking a step back, a question from someone outside the service mesh world: does the service mesh really want to be part of the pod, if the life cycle it needs is so different?

I don't think we need it to run in the pod, but there are no other existing alternatives that would work.


Commenting here from the other thread:

> It was nice to have, but not many real use cases nor a big deal.

This claim is basically saying that init containers that do networking are not a real use case, which isn't true in my opinion. There are very clear use cases for doing so, otherwise init containers would likely not exist 🙂. I think it's more accurate to say that the people you talked to happen to not have users complaining about them much, but I can't imagine that generalizes well to the broader Kubernetes community.

@rata (Member, Author) commented Jul 30, 2020

Thanks! I agree that perfect is the enemy of good. I'm open to opinions on whether sidecars make more sense before initContainers or after.

But yeah, IMHO, the best solution might be not so "theoretical": implement either one of them and gather user feedback :-)

@derekwaynecarr What do you think?

@rata (Member, Author) commented Aug 3, 2020

@SergeyKanzhelev the biggest change is semantics and how we want to reflect that. But the whole KEP needs to be changed, IMHO, if we want that instead.

Furthermore, those are not sidecars as we know them today, so we need to define a bit how these "new" kinds of sidecars behave. For example, some things we may need to decide, off the top of my head:

  1. Will we use the current containers array for that? In that case, the initContainers array would only start before the containers that are not of type sidecar (they are meant to start before any other container in the pod, so the name initContainers might become confusing; we might be able to clarify that with docs). That might be unexpected, so we might want to consider using a sidecarContainers array to avoid confusion. Not really sure, something to sleep on.
  2. Today sidecar containers (as a concept; the type is not present yet) start after initContainers have run. If someone is using initContainers to initialize something for the sidecar containers (totally legit), another way to provide that functionality might be needed. Another initSidecarContainers thingy or something else? This can become weird.
  3. What do we do if a sidecar crashes while some initContainers are running? It just restarts, I guess? Then initContainers don't have the guarantee of a sidecar running either (as it can crash).

Thinking about this, I have mixed feelings, but I think I lean towards improving sidecars as we know them: sidecars that start after initContainers, using the alpha phase to gather feedback and revisit this. If we see that, for example, both things make sense, we might want to add a different type that starts before initContainers.
What do you think?

But I'm very open to thoughts :)

Member

Sorry for the delayed response. Somehow all notifications broke for me. It's great to have a ping at a meeting.

I thought about these concerns. I understand the main concern is that sidecar containers that have a dependency on init containers would not work as expected when one simply adds the new type to the container? The assumption that everything will just work once a container is marked as sidecar may not be true. What if an existing sidecar container has some synchronization logic that checks for the payload to start and keeps it in a "not started" state? This will break the entire pod initialization.

At the same time, both important scenarios - logs and mesh - would need init container support. If we conclude that the way to support init containers is to introduce a similar type field on a container, why wouldn't we do it now? Sidecars which need to work before init containers will just work. Sidecars which need to work after init containers may wait for the first readiness probe and treat it as an indication that initialization is complete.

Same as @howardjohn, I believe this KEP is great at addressing many issues. I just want to make sure there is no quick win by simply changing the startup order for sidecars to run before init, and figuring out details like what to do when a sidecar crashes during initialization.

Perhaps there may be a section in the doc saying why we do not want to run sidecars before init.

@rata (Member, Author) commented Aug 12, 2020

> Sorry for the delayed response. Somehow all notifications broke for me. It's great to have a ping at a meeting.

Oops, thanks! :)

> I thought about these concerns. I understand the main concern is that sidecar containers that have a dependency on init containers would not work as expected when one simply adds the new type to the container? The assumption that

Yes, exactly, that is one concern I raised.

> everything will just work once a container is marked as sidecar may not be true. What if an existing sidecar container has some synchronization logic that checks for the payload to start and keeps it in a "not started" state? This will break the entire pod initialization.

Not sure if we are saying the same thing here. That is exactly my point: if the sidecar has a dependency on an initContainer doing something before it starts, it will be completely broken.

Or am I misunderstanding you?

> At the same time, both important scenarios - logs and mesh - would need init container support. If we conclude that the way to support init containers is to introduce a similar type field on a container, why wouldn't we do it now?

I think both options can work: we can do it now or we can have an alpha without initContainers, see if what we did effectively solves the problems that we think it solves or needs something else. After that, try to expand to initContainers and just focus on the problems of doing that, as we know what we have mostly works and the problems of extending it to that are quite different.

I lean towards keeping the current scope (more on why later in this comment; this really doesn't seem like a quick win, as you say). We can try in the next few days to figure out a way of extending it so we have sidecars + initContainers running concurrently, but if we can't find something obvious that will work, I'd fall back to the current scope.

What do you think?

> Sidecars which need to work before init containers will just work.

No, they won't in all cases. This is why it is not trivial, and I'd rather focus on one problem at a time and gather feedback.

Istio is one example: they mention in this thread that the service mesh sidecar should be running before initContainers, but the sidecar itself needs some initialization first. They said:

> A third option is potentially running the iptables in the actual sidecar, but this is not feasible due to permissions.

(To add some more context: the problem they refer to is the permissions needed in the container to run the iptables rules. They don't want to have those permissions on the sidecar container, and instead have them in the initContainer or the CNI plugin.)

They either need the CNI plugin to work in this case (for the iptables thing), or an initContainer that runs before sidecars to create the iptables rules, then start the sidecars, then run the regular initContainers (as they want the service mesh sidecar to be running for the other initContainers), and then continue with the initialization.

But yeah, Istio has a way out with the CNI plugin, as they only want some iptables rules. That is, of course, not the case for all sidecars AFAIK (like those populating volumes, etc.). Any sidecar relying on an initContainer won't work if started before initContainers, if it also wants to run during the initContainer phase (as a service mesh does).

Some hacks can be done, like auto-injecting the initContainer so it runs first. But that also has several problems, because then several applications want to be "the first" and there is a race to inject there. The analogous thing is already happening with injecting the last initContainer. Here is an example of such a bug report (it is also mentioned in part of the text this PR adds :)): linkerd/linkerd2#4758 (comment).

So, those workarounds don't seem reliable and will create issues.

Maybe it can be a limitation: if your sidecar needs to run during the initContainers phase, then your pod can't have initContainers. 🤔

> Sidecars which need to work after init containers may wait for the first readiness probe and treat it as an indication that initialization is complete.

Not sure what exactly you mean by this, can you please elaborate? Wait for which readiness probe? How would that be explicit in the pod spec?

> I just want to make sure there is no quick win by simply changing the startup order for sidecars to run before init

IMHO, it doesn't seem like a quick win, but a path to explore that will take time. I think people will be opinionated, with very valid reasons, about having containers in the containers array starting before initContainers vs creating a new array (sidecarContainers), etc.

The more we talk about it, the more I think it will just be easier if we keep the current scope for the alpha phase (sidecars don't run during initContainers), see if the mechanisms we add are effective for those use cases, and then focus on extending this to initContainers (we can make it a show-stopper for beta graduation if that makes anyone more comfortable).

If extending this to sidecars + initContainers is difficult, it might be easier once we gather feedback, and it benefits users that don't need that in the short term. After all, all sidecar containers today just run after initContainers (it is not possible otherwise), so they will benefit. And we can focus on the problem of extending it afterwards, and how it is best solved, after having solved the "easier" problem (which is not so easy either, IMHO) and having better understanding and feedback.

If extending it is easy, then either we figure it out in the next few days (and add it to the scope) or we add it later; it is easy :)

What do you think?


> Istio is one example: they mention in this thread that the service mesh sidecar should be running before initContainers, but the sidecar itself needs some initialization first.

It would actually work fine, I think. The sidecar does not need iptables set up. Our only requirement is that the iptables rules must be set up before the app container. So:

  • init, then app/sidecar in parallel is ok (but not ideal - hence the kep)
  • sidecar, then init, then app is ok
  • sidecar, then app, then init is not ok

I can see hypothetical scenarios that do require init containers to run first though, like if we use init containers to provision certificates. It could be worked around, though: for example, have the sidecar become ready without the certs and have the init container poll so it doesn't exit until the sidecar has picked up the certs. Note this is still a much better scenario, because it requires the sidecar/init container to make a change (2 images, likely owned by Istio/Linkerd) rather than every single application container.

Member

@rata I think we are on the same page in understanding the pros and cons of init and sidecars running in parallel. I think we prioritize different things while weighing options. So my questions are mostly to understand your priorities and to understand the complexity.

The way I see it, sidecar containers are (as you mentioned earlier) almost an "outside the pod" feature that represents an ambient state of a pod: something that always exists and does not block termination. So the developer should not have to think about synchronizing with it, and the sidecar owner has guarantees that all app logic will be executed while the sidecar is active. And I see init and regular containers as the developer's own app logic.

If a developer wants to create their own sidecar that is app logic (like queue management or config synchronization), and it depends on some init containers, this will be a little more involved, as it will require a bit of synchronization with the init containers. I don't see the need for this synchronization implementation as a big deal. Today, sidecars like this are probably implementing even more logic.

As for semantics: we can call these "daemon" or "ambient" containers instead of "sidecar" in this KEP if that makes the semantics clearer.

Finally, on implementation complexity: are you talking about kubelet logic complexity or complexity for the end user?

I also pinged you on Slack in case you want to discuss it in-person before writing more text =).

This proposal aims to:
* Allow Kubernetes Jobs to have sidecar containers that run continuously
without any coupling between containers in the pod.
* Allow pods to start a subset of containers first and, only when those are


I assume this is starting all of the containers at once, right? It's not ordered like init containers?

@rata (Member, Author)

Yes, exactly

@rata (Member, Author)

Let me know if you think I should clarify that :)

the `containers` array.

For most users we talked about (some big companies and linkerd, will try to
contact istio soon) this doesn't seem like a problem. But wanted to make this


See my comment above on how this impacts Istio (and I suspect linkerd - I am surprised by this sentence).

@rata (Member, Author)

Yes, I had a call with some Linkerd devs and my understanding was that it wasn't very important to them. It was nice to have, but not many real use cases nor a big deal. The other companies I talked to that use a Kubernetes fork with sidecars just don't need the sidecars to run with initContainers either; that functionality is not present in their fork.

But let's continue the initContainers + sidecars conversation there :-)

> ...
> The Istio CNI plugin performs the Istio mesh pod traffic redirection in the Kubernetes pod lifecycle’s network setup phase, thereby removing the requirement for the NET_ADMIN and NET_RAW capabilities for users deploying pods into the Istio mesh. The Istio CNI plugin replaces the functionality provided by the istio-init container.

In other words, when using the CNI plugin it seems that InitContainer don't use


It's worse: the init container cannot do any networking at all.

@rata (Member, Author)

Oh, I didn't know about that drawback of the CNI plugin. Thanks!

1. Run preStop hooks for non-sidecars
1. Send SIGTERM for non-sidecars
1. Run preStop hooks for sidecars
1. Send SIGTERM for sidecars
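A minimal Go sketch of the shutdown ordering quoted above (illustrative only: this is not kubelet code, the helper names are made up, and the real kubelet also waits for containers to exit and honors grace periods):

```go
package main

import "fmt"

// container is a stand-in for a pod's container, with a flag marking sidecars.
type container struct {
	name    string
	sidecar bool
}

// shutdownOrder prints the four steps quoted above, non-sidecars first.
func shutdownOrder(containers []container) {
	step := func(action string, sidecars bool) {
		for _, c := range containers {
			if c.sidecar == sidecars {
				fmt.Printf("%s: %s\n", action, c.name)
			}
		}
	}
	step("run preStop hooks", false) // non-sidecars first
	step("send SIGTERM", false)
	step("run preStop hooks", true) // then sidecars
	step("send SIGTERM", true)
}

func main() {
	shutdownOrder([]container{
		{name: "app", sidecar: false},
		{name: "mesh-proxy", sidecar: true},
	})
}
```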
@rata (Member, Author)

@howardjohn Another question, if it is not too much trouble: will this alternative work for you guys?

To make it short: the current KEP proposes to run the preStop hooks for sidecars as the first step in the shutdown sequence, to let the service mesh drain traffic. This PR proposes (the reasons are explained above) to modify that behavior and have a TerminationHook that is run first. It would be the same behavior, just under a new field called TerminationHook.

IIUC, this is needed for some service meshes (I'm working with a client that has an in-house service mesh that needs this, for example) and I think Istio needs it too, but I wanted to confirm.

Does Istio need this, and would it work for you guys?

PS: Linkerd, for example, doesn't need this, as they use Kubernetes Services, and pods in the terminating state are removed from the Endpoints objects (except for services with externalTrafficPolicy: Local, but that is addressed in another KEP), so that alone drains the service mesh connections. Not 100% sure which case Istio is currently in.
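For readers joining the thread here, a rough sketch of the field shape being discussed; the name TerminationHook comes from this PR's text, everything else is illustrative:

```go
package sketch

// Illustrative only: instead of invoking a sidecar's preStop hook twice, the
// idea floated here is a dedicated hook that runs first in the pod shutdown
// sequence (e.g. so a mesh proxy can start draining traffic).
type ExecAction struct {
	Command []string
}

type Handler struct {
	Exec *ExecAction
}

type Lifecycle struct {
	PostStart       *Handler
	PreStop         *Handler
	TerminationHook *Handler // hypothetical new field discussed in this thread
}
```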


I don't think we have a need to run any preStop hooks. I am not sure why we would start draining traffic. Isn't that what the KEP is fixing? The application shuts down, at which point all connections are closed. Then we can just terminate the sidecar immediately.

We certainly need this draining today but I don't think we do if this is implemented.

@rata (Member, Author) commented Jul 30, 2020

Don't you need to de-register the pod from service discovery or something when the pod is about to terminate, so the app can just finish its in-flight requests and stop, without receiving new incoming requests via the service mesh?

On shutdown, just having:

  1. SIGTERM non-sidecar containers (using preStop hooks and all, as expected)
  2. SIGTERM sidecar containers (using preStop hooks and all, as expected)

will work for Istio?

I mean, the current KEP says:

> PreStop hooks will be sent to sidecars before containers are terminated. This will be useful in scenarios such as when your sidecar is a proxy, so that it knows to no longer accept inbound requests but can continue to allow outbound ones until the primary containers have shut down.

Istio doesn't need that at all? (Whether it's called preStop hooks or TerminationHook.)


Without Istio when pod termination is triggered:

  • Pod starts to terminate
  • Pod is removed from Endpoint
  • Endpoint update is sent to kube-proxy, stops sending traffic to this pod
  • Pod fully shuts down

Hopefully in that order, but not guaranteed!! There is no guarantee of zero downtime here. See https://blog.sebastian-daschner.com/entries/zero-downtime-updates-kubernetes for a more detailed explanation.

With Istio when pod termination is triggered:

  • Pod starts to terminate
  • Pod is removed from Endpoint
  • Endpoint update is sent to istio-proxy, stops sending traffic to this pod
  • Pod fully shuts down

You'll notice this is really the same.

A valid solution to the race conditions mentioned in the blog is adding a preStop hook; it doesn't have to be the sidecar's preStop hook, though. Any of:

  • Telling users to add a preStop hook if they want this behavior (this doesn't violate the transparency goal of a service mesh, as the same issue is present without a service mesh)
  • Add the preStop hook to our sidecar
  • Add the preStop hook to the user's container (we control injection, we can modify whatever we want).

@rata (Member, Author) commented Jul 30, 2020

Thanks for the detailed answer!

@Joseph-Irving do you remember why that part (running preStop hook twice) was added to the KEP?

I mean, I know use cases where that is needed for the very same reason the KEP gives, and I thought Istio/Linkerd would need it too, but that doesn't seem to be the case. I'm okay keeping it, though; I know use cases for this.


@rata to be clear, I'm not saying that TerminationHook wouldn't be useful, I suspect it would be. It just isn't clear that it would be absolutely necessary for Istio, assuming the sidecar is guaranteed to be up and ready for the duration of the application container.

Effectively, I'm echoing @howardjohn's "valid solution", which just uses preStop hooks. Specifically, the option where we inject a preStop hook into the application container, which effectively puts the sidecar in lameduck mode. This would basically just be a curl command.

If TerminationHook were available, however, it would be cleaner WRT injection since we could handle lameducking in the sidecar's TerminationHook rather than mucking about with the application container.
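A rough sketch of that injection idea (assuming a recent k8s.io/api where the hook type is LifecycleHandler; the drain URL is a made-up placeholder, not a real Istio endpoint):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// addDrainPreStop injects a preStop hook into the application container that
// asks the sidecar to enter lameduck mode before the app receives SIGTERM.
// Illustrative only: it overwrites any existing lifecycle for brevity, and the
// drain endpoint is a placeholder.
func addDrainPreStop(app corev1.Container) corev1.Container {
	app.Lifecycle = &corev1.Lifecycle{
		PreStop: &corev1.LifecycleHandler{
			Exec: &corev1.ExecAction{
				Command: []string{"sh", "-c", "curl -sf http://127.0.0.1:15000/drain || true"},
			},
		},
	}
	return app
}

func main() {
	app := addDrainPreStop(corev1.Container{Name: "app", Image: "example.com/app:1.0"})
	fmt.Println(app.Lifecycle.PreStop.Exec.Command)
}
```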

@rata (Member, Author) commented Aug 4, 2020

@nmittler Cool, thanks a lot! A curl command seems like a trade-off worth considering. Thanks again for the input :-)

For other reviewers, here goes a tl;dr: the TerminationHook field is a proposal to avoid calling the preStop hooks twice, due to the problems that doing so creates (see the very section this comment is on for additional context). Given that, implementation-wise, it is quite similar to what we already have in the code, I think keeping it can be simple too.

Just let us know if you think it will be simpler to remove it for now (either from an implementation POV or from a KEP design POV) so we settle on one or the other.

The main reason to add this 2 years ago was Istio, but today there are some other in-house service mesh apps I'm working with that must have this to work properly. Although Istio is now fine without it, having it still seems better.

Member

The concept of a termination hook may be broadly useful, so perhaps it makes sense to remove it from this KEP and develop it independently? It has no dependency on this proposal. It would also simplify this KEP if we removed the preStop behaviour and just focused on startup/shutdown ordering.


+1 ... that's really the thrust of much of my discussion above. I think a separate KEP makes a lot of sense here.

@rata (Member, Author)

I'm fine with either choice (keeping it in this KEP or dropping it) as long as we move forward with this KEP :-D. I'd like to know what @derekwaynecarr or others from SIG Node think.

Removing this from the KEP effectively removes a step from the shutdown sequence, and that changes how we might want to split the time in the shutdown sequence. For example, this affects other callouts (like this one). So, if this needs to be added as a different KEP, it will change the timing of the shutdown sequence for sidecar containers as described in this KEP. IMHO, we can remove it for now, but if we want to add it later, we may want to add it to this KEP.


Rodrigo will reach out to Istio devs to see if the situation changed since 2018.

[istio-bug-report]: https://github.com/kubernetes/kubernetes/issues/65502
@rata (Member, Author) commented Jul 30, 2020

@howardjohn one last question to address this ^ section, if it is not too much to ask.

Do you still need to use, as the bug says, some scripts for other applications in the pod to wait for the service mesh to be ready? Or has the situation changed in these 2 years?

If it has changed a lot and you can share your current pain points and which ones this KEP effectively addresses, that would be great.

@rata (Member, Author)

cc @nmittler, who opened the bug, maybe? :)

Comment on lines 1026 to 1041
##### Suggestion

Confirm with users that it is okay to not have sidecars during the initContainers
phase and that they don't foresee any reason to add them in the near future.

It seems likely that this is not a problem and is a win for them, as they can
remove most of the hacks. However, it seems worth investigating whether the CNI
plugin is a viable alternative for most service meshes and, in that case, how much
they will benefit from this sidecar KEP.

It seems likely that they will benefit for two reasons: (a) this might
be simpler or might better model what service meshes need during startup; (b) they still
need to solve the problem on shutdown, where the service mesh needs to drain
connections first and keep running until the others terminate. These are needed for
graceful shutdown and for allowing other containers to use the network on shutdown,
respectively.

I got a little confused by this section. The CNI plugin is only used for setting up the iptables forwarding. The main goal of this KEP (IMO), i.e. setting the proper startup and shutdown ordering between sidecars and regular containers, remains a critical need regardless of whether the forwarding rules are set through CNI or through an initContainer.

@rata (Member, Author)

Yes, sorry, it was a misunderstanding on my side. As explained in this comment, I will update the text regarding this :)

Comment on lines 912 to 916
##### Alternative 1: Add a per container fatalToPod field

One option is to add a `fatalToPod` bool field _per container_. This will mean
that if the given container crashes, that is fatal to the pod so the pod is
killed.
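A rough sketch of what Alternative 1 above describes (illustrative only; the field name comes from the alternative's text, not from an implemented API):

```go
package sketch

// Illustrative only: the per-container flag described in Alternative 1.
// If a container marked FatalToPod crashes, the whole pod is killed rather
// than being left running (or stalled) without it.
type Container struct {
	Name       string
	Image      string
	FatalToPod bool
}
```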

That sounds like a useful feature, not just for sidecars, that can be tackled separately 👍

@rata (Member, Author)

Yes, but if we don't add it now, you can create Jobs that will be stalled there forever, and Kubernetes won't have any built-in mechanism to collect those stalled Jobs.

I mentioned that alternative too.

Any reasons to prefer to have "stalled pods"?


I don't prefer stalled pods; I'm just thinking of keeping the KEP as focused as possible. I wouldn't mind if stalled pods were a known inconvenient edge case, say during alpha. And if the fatalToPod field has broader applicability than just sidecars, IMHO it makes more sense to implement it separately.


Pods with sidecar containers only change the behaviour of the startup and
shutdown sequence of a pod: sidecar containers are started before non-sidecars
and stopped after non-sidecars.
Member

I'd emphasize here that they are forcefully stopped when all other containers have finished. E.g.:

Suggested change
and stopped after non-sidecars.
and stopped after non-sidecars. Sidecar containers are also terminated when all non-sidecar containers have finished.

@rata (Member, Author)

Good point, thanks! I will amend the current patches, if that is okay with you :)

* Reimplement init-system-like semantics for pod container
startup/shutdown
* Allow sidecar containers to run concurrently with initContainers

Allowing multiple containers to run at once during the init phase - this could be solved using the same principle but can be implemented separately. //TODO write up how we could solve the init problem with this proposal
Member

Is this a duplication of the item above, "Allow sidecar containers to run concurrently with initContainers"?

@rata (Member, Author) commented Aug 3, 2020

Yes, I just left it in because it had a "TODO" note. I can remove it as part of this PR if you think that is better. Let me know :)

Aim to use alternative 1, while in parallel we collect more feedback from the
community.

It will be nice to do nothing (alternative 2), but probably the first step
Member

> but probably the first step

It can always be added after the feedback. So we can go with do nothing and add exactly what is required later.

@rata (Member, Author)

Sure. If all agree on that, LGTM :)

@rata (Member, Author)

This was added at the request of Istio devs, though. Let's see what they say here: #1913 (comment)

And then decide :)

@derekwaynecarr (Member)

Can we move this under a sig-node directory? The implementation of this feels largely oriented to the SIG Node domain.

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Aug 11, 2020
@rata (Member, Author) commented Aug 11, 2020

@derekwaynecarr done! :)

@SergeyKanzhelev (Member)

nit: @rata a good way to rename files is to use the git mv command. This will preserve the history of the original file even after the move.

@rata (Member, Author) commented Aug 12, 2020

> nit: @rata a good way to rename files is to use the git mv command. This will preserve the history of the original file even after the move.

I did that: git mv sig-apps/0753-sidecarcontainers.md sig-node/ (from my history). Is there something not working as expected?

rata and others added 3 commits September 2, 2020 19:11
Reviewed-by: Joseph-Irving <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
As suggested by derekwaynecarr here:
	kubernetes#1874 (comment)

Reviewed-by: Joseph-Irving <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
This was amended by Rodrigo Campos to use 80 columns and address Joseph
suggestions, while also adding some minor tweaks.
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 2, 2020
@rata (Member, Author) commented Sep 2, 2020

@SergeyKanzhelev @derekwaynecarr can you please lgtm and approve again? Merge was not possible due to a silly conflict (PR #1939 was merged and somehow caused a conflict), so I rebased without changes.

Thanks again for the reviews! :)

@SergeyKanzhelev (Member)

/lgtm

thank you @rata!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 2, 2020
@k8s-ci-robot k8s-ci-robot merged commit 6928d9f into kubernetes:master Sep 2, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Sep 2, 2020
@rata rata deleted the rata/sidecar-kep branch September 2, 2020 17:27
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 9, 2020
This was requested by Sergey here:
	kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <[email protected]>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 9, 2020
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 9, 2020
One decision to revisit was to see if really wanted to change the pod
phase. It was clearly agreed here that we won't:
	kubernetes#1913 (comment)

This commit just removes the section to discuss that (it is already
agreed) and updates the KEP to reflect that.

Signed-off-by: Rodrigo Campos <[email protected]>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 9, 2020
This was added due to a comment from Istio long ago[1], but they don't
need this anymore[2]. Furthermore, our use cases at Kinvolk also work
just fine without this.

[1]: kubernetes/community#2148 (comment)
[2]: kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <[email protected]>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 10, 2020
This was requested by Sergey here:
	kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <[email protected]>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 10, 2020
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 10, 2020
One decision to revisit was to see if really wanted to change the pod
phase. It was clearly agreed here that we won't:
	kubernetes#1913 (comment)

This commit just removes the section to discuss that (it is already
agreed) and updates the KEP to reflect that.

Signed-off-by: Rodrigo Campos <[email protected]>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 10, 2020
This was added due to a comment from Istio long ago[1], but they don't
need this anymore[2]. Furthermore, our use cases at Kinvolk also work
just fine without this.

[1]: kubernetes/community#2148 (comment)
[2]: kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <[email protected]>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 10, 2020
This was suggested by Sergey here:
	kubernetes#1913 (comment)

Sadly, that branch is merged and can't click on commit suggestion now.
Credits goes to Sergey anyways :-)

Signed-off-by: Rodrigo Campos <[email protected]>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 10, 2020
In the previous PR, istio devs commented that some things were not
accurate. This commit just updates the text to (hopefully) correctly
reflect it now.

Removed the paragraph about this removing the need for an initContainer
due to comment here:
	kubernetes#1913 (comment)

I thought it was an okay to insert the iptables rules within the sidecar
proxy container, but it is not okay as that requires more permissions
(capabilities) on the sidecar proxy container which is not considered
accetable by Istio devs.

Signed-off-by: Rodrigo Campos <[email protected]>
SergeyKanzhelev pushed a commit to SergeyKanzhelev/enhancements that referenced this pull request Jan 8, 2021
This was requested by Sergey here:
	kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <[email protected]>
SergeyKanzhelev pushed a commit to SergeyKanzhelev/enhancements that referenced this pull request Jan 8, 2021
SergeyKanzhelev pushed a commit to SergeyKanzhelev/enhancements that referenced this pull request Jan 8, 2021
One decision to revisit was to see if really wanted to change the pod
phase. It was clearly agreed here that we won't:
	kubernetes#1913 (comment)

This commit just removes the section to discuss that (it is already
agreed) and updates the KEP to reflect that.

Signed-off-by: Rodrigo Campos <[email protected]>
SergeyKanzhelev pushed a commit to SergeyKanzhelev/enhancements that referenced this pull request Jan 8, 2021
This was added due to a comment from Istio long ago[1], but they don't
need this anymore[2]. Furthermore, our use cases at Kinvolk also work
just fine without this.

[1]: kubernetes/community#2148 (comment)
[2]: kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <[email protected]>
SergeyKanzhelev pushed a commit to SergeyKanzhelev/enhancements that referenced this pull request Jan 8, 2021
This was suggested by Sergey here:
	kubernetes#1913 (comment)

Sadly, that branch is merged and can't click on commit suggestion now.
Credits goes to Sergey anyways :-)

Signed-off-by: Rodrigo Campos <[email protected]>
SergeyKanzhelev pushed a commit to SergeyKanzhelev/enhancements that referenced this pull request Jan 8, 2021
In the previous PR, istio devs commented that some things were not
accurate. This commit just updates the text to (hopefully) correctly
reflect it now.

Removed the paragraph about this removing the need for an initContainer
due to comment here:
	kubernetes#1913 (comment)

I thought it was an okay to insert the iptables rules within the sidecar
proxy container, but it is not okay as that requires more permissions
(capabilities) on the sidecar proxy container which is not considered
accetable by Istio devs.

Signed-off-by: Rodrigo Campos <[email protected]>
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
9 participants