
[EKS] [CRI]: Support for Containerd CRI #313

Closed
paavan98pm opened this issue Jun 5, 2019 · 42 comments
Labels
EKS Amazon Elastic Kubernetes Service

Comments

@paavan98pm

Tell us about your request
What do you want us to build?
Support for Containerd CRI

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Currently, EKS nodes run dockerd. containerd is a popular container runtime (with CRI support) that is more efficient.

Are you currently working around this issue?
How are you currently solving this problem?
AL2 nodes fail when containerd is installed.

Additional context
Anything else we should know?
This will enable customers to customise and configure kubelet parameters to select their preferred container runtime.

Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

@paavan98pm paavan98pm added the Proposed Community submitted issue label Jun 5, 2019
@tabern tabern added the EKS Amazon Elastic Kubernetes Service label Jul 2, 2019
@rverma-nikiai

No updates in the past 2 months, not even an acknowledgement. This looks bad.

@lgg42

lgg42 commented Aug 9, 2019

Yep, time to think about it, right?

@jtoberon

We're excited to support containerd, too. We are making sure that we have the right test, security, and release tools in place before we officially recommend it to our customers.

It would be useful to know if folks have specific thoughts about how the runtime should be configured:

  • Should we install both docker-ce and containerd in the AMI? Should both start automatically upon instance launch?
  • Do customers want to use a runtimeClassName config to pick a runtime dynamically?
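For concreteness, the second option might look something like the sketch below (names are illustrative only; the handler must match whatever runtime handler the node's CRI is configured with):

cat <<'EOF' | kubectl apply -f -
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: containerd-runc
handler: runc            # must match a runtime handler configured in the node's CRI
---
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  runtimeClassName: containerd-runc
  containers:
    - name: demo
      image: nginx
EOF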

@whereisaaron

Great @jtoberon. If there is just one AMI, then both runtimes with runtimeClassName sounds like a good transition plan. But for us the end goal is for the cluster/AMIs to be docker-free; the past grief caused by the unstable development practices, the kitchen-sinking of swarm, the ‘moby’ mess and other junk changes make us wary of that upstream project, we are ready to leave it behind.

@owenthereal

👍 to being able to run both runtimes with runtimeClassName. Is there an ETA for the support?

@edify42

edify42 commented Sep 13, 2019

Is it as simple as making changes to the worker node AMI https://github.com/awslabs/amazon-eks-ami to run containerd? Or do the masters/control-plane also need changes?

@jtoberon

jtoberon commented Sep 16, 2019

Yes, we intend to change that AMI build to install the containerd software.

To support that change, we need to do a bunch of other things. Here are a few examples:

  • Build the containerd binary into the Amazon Linux yum repositories.
  • Establish a process for learning about and responding to embargoed CVEs in any new software that we build.
  • Do performance testing to make sure we understand the performance implications of this runtime change for our customers.
  • Set up automated tests to ensure that all of the software that we package for our customers continues to work together over time.

@whereisaaron

Thanks @jtoberon, yes I imagine it is not trivial. And I expect it will need a healthy period of 'developer preview' too.

One other possible chore for your list is ensuring that container logging and log rotation work well, and that you have a solid fluentd/cloudwatch configuration, since logging works quite differently for containerd compared to dockerd.
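To illustrate the difference (paths and samples below are a sketch from memory, not EKS-specific):

# dockerd with the json-file log driver (/var/lib/docker/containers/<id>/<id>-json.log):
{"log":"hello\n","stream":"stdout","time":"2019-09-24T10:00:00.000000000Z"}

# containerd via the CRI (files under /var/log/pods): timestamp, stream, partial/full tag, message
2019-09-24T10:00:00.000000000Z stdout F hello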

@kvidhya

kvidhya commented Sep 24, 2019

Hello, I am trying to get runsc running on EKS workers. Is there a way to do it today? I'd appreciate any pointers.
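For context, my rough plan (untested) was along these lines, assuming a node already running containerd >= 1.3 with gVisor's runsc and containerd-shim-runsc-v1 binaries installed:

# Register runsc as an additional containerd runtime handler.
cat <<'EOF' >> /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
EOF
systemctl restart containerd
# Pods could then opt in via a RuntimeClass whose handler is "runsc".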

@kvidhya

kvidhya commented Sep 24, 2019

Also, when will the containerd support be released?

@Thutm

Thutm commented Nov 13, 2019

Now that Docker Enterprise has been acquired, could this also be scoped out for the ECS platform? I'm not really sure what goes into that, other than maybe updating the ECS optimized AMI and the ecs-agent itself. What are everyone's thoughts on that?

@kr3cj

kr3cj commented Jan 9, 2020

We would be interested in switching out docker for containerd in our EKS nodes as well, in hopes that it might help with things like awslabs/amazon-eks-ami#195

@inductor

inductor commented Feb 4, 2020

By design, Fargate nodes seem to use containerd as their container runtime.

@mvisonneau

With a custom AMI setup, at first sight it seems to work correctly 👍

eks.1 - 1.16

# Install containerd
~# wget -P /tmp https://storage.googleapis.com/cri-containerd-release/cri-containerd-1.3.4.linux-amd64.tar.gz
~# tar --no-overwrite-dir -C / -xzf /tmp/cri-containerd-1.3.4.linux-amd64.tar.gz
~# mkdir -p /etc/containerd
~# containerd config default > /etc/containerd/config.toml
~# systemctl start containerd

# Added a couple of necessary flags on the kubelet
--container-runtime-endpoint=unix:///run/containerd/containerd.sock
--container-runtime=remote
# Node info
System Info:
  Machine ID:                 ec24d43f57c1054dbf44887269f36c5a
  System UUID:                ec24d43f-57c1-054d-bf44-887269f36c5a
  Boot ID:                    976f4d4e-07df-4d7a-94a4-9ff7a661ed70
  Kernel Version:             5.4.0-1009-aws
  OS Image:                   Ubuntu 20.04 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.3.4
  Kubelet Version:            v1.16.8
  Kube-Proxy Version:         v1.16.8

# Ready event
  Normal   NodeReady                9m1s                   kubelet, ip-10-0-0-1.eu-west-1.compute.internal  Node ip-10-0-0-1.eu-west-1.compute.internal status is now: NodeReady

@kferrone

EKS with some node groups on Docker and some on containerd? Is this possible with K8s? Would be nice . . .
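As far as I can tell the runtime is a per-node kubelet setting, so mixed node groups should be possible; each node advertises what it runs:

kubectl get nodes -o custom-columns=NAME:.metadata.name,RUNTIME:.status.nodeInfo.containerRuntimeVersion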

@hendrikhalkow

Hi EKS team, it's 2020 and containerd is a CNCF graduated project. It shouldn't just be installed on the EKS AMIs; it should be the default runtime.

When will we see containerd support on EKS?

@midN

midN commented May 28, 2020

Any updates on that?

@mikestef9
Contributor

EKS/Fargate uses the containerd runtime, so that is a production ready option today.

Our plan for containerd on worker nodes is to add official EKS support for Bottlerocket, once the project graduates from a public preview.

Bottlerocket is a Linux based operating system purpose-built to run containers. We encourage you to try out the public preview and leave any feedback on the GitHub project. You can learn more about Bottlerocket here.

@mikestef9 mikestef9 removed the Proposed Community submitted issue label Jul 15, 2020
@ecrousseau

Hi @mikestef9 and/or EKS team - we are trying to work out which 'sandboxing' features (e.g. gVisor, Firecracker) are available in EKS - as far as I can tell, none are currently supported (but please correct me if that is wrong).

Bottlerocket 1.0.0 was released ~15 hours ago - what does the timeline look like for it being available in EKS? Will it be supported when using managed node groups? And - the big question for me - given that it uses containerd, does that mean we'll be able to plug in Firecracker/similar?

@mreferre

mreferre commented Sep 1, 2020

@ecrousseau can you clarify what you mean by "sandboxing"? I'd argue EKS/Fargate does provide "sandboxing", in that each Kubernetes pod runs in its own dedicated OS/kernel (or VM/instance if you will). When the user deploys a pod, we source a dedicated VM/instance on the fly from the Fargate pool and use that dedicated VM/instance to run that specific pod. Rinse and repeat. This VM/instance could be an EC2 instance that is part of the Fargate pool, or it could be a micro-VM running on Firecracker. This is an implementation detail, not something a user should need to be aware of. Both implement the same deployment pattern (1 pod per instance/VM).

Is this the "sandboxing" you are alluding to?

@ecrousseau

Thanks @mreferre - yes, that kind of separation is what I was talking about. I will have a look at Fargate.

@matthewhembree

EKS/Fargate isn't a perfect solution for sandboxing in my environment. We use the Datadog agent running as a daemonset to collect logs and metrics (and maybe soon APM) for our workloads.

I haven't tested this yet: To accomplish those collections with EKS/Fargate, we would need to run the Datadog agent as a sidecar container in every EKS/Fargate pod. This is based on the assumption that the Datadog agent could still do the collections as a non-privileged container (e.g. access /var/logs/pods possibly as a readOnly mount).

Assuming that we could access the logs and metrics, we'd still have to increase our resource utilization (i.e. costs) substantially to run the sidecars versus the daemonset.

@mreferre

mreferre commented Nov 5, 2020

@matthewhembree Datadog describes how this is implemented in detail in this documentation. You are correct that, with the current model, you'd need an agent sidecar in each pod. Unless you are consuming nearly all the resources you size your pods for, I would speculate that resource consumption isn't as relevant as having to change the operational model to inject these sidecars to make logging work on EKS/Fargate. We want to solve this by way of this feature that we are working on. The idea would be to have a router embedded into the Fargate service that you transparently use to ship logs (and more) to an external endpoint with a centralized configuration. Not only would you not have to inject a sidecar into every pod, but you would not have to deal with DaemonSets either (given that with Fargate there are no nodes, by definition). If that would be of interest to you, please subscribe to that issue to get updates as we progress.

@phenri00

phenri00 commented Dec 2, 2020

Docker is now deprecated (v1.20).

https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#changes-by-kind-1

@Saziba

Saziba commented Dec 2, 2020

Calm down... it's not like Docker will be broken:

kubernetes/kubernetes#94624 (comment)

@TBBle

TBBle commented Dec 3, 2020

As I understand it, both Fargate and Bottlerocket are generally available and use the containerd CRI, and are supported by tools like eksctl; cf. Bottlerocket and Fargate.
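For example, a self-managed Bottlerocket node group via eksctl might look something like this sketch (names, region, and sizes are placeholders):

cat <<'EOF' > cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo
  region: eu-west-1
nodeGroups:
  - name: bottlerocket-ng
    instanceType: m5.large
    desiredCapacity: 2
    amiFamily: Bottlerocket
EOF
eksctl create cluster -f cluster.yaml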

Managed Nodegroups don't specifically support Bottlerocket yet. That's a different issue, #950. There is support for custom AMIs through a Launch Template, which could certainly be a Bottlerocket AMI, if you want this combination working now.

Is there anything left that is going to happen in the future for this ticket? Some things trawled from the comments on this ticket:

If those aren't part of the current plan, then the plan at #313 (comment) has been achieved, so maybe resolve this ticket and track separate requests like the above two separately.

@phenri00

phenri00 commented Dec 3, 2020

Calm down... it's not like Docker will be broken:

kubernetes/kubernetes#94624 (comment)

What if you are running dind in your CI? This will not work anymore, right?

(yes, I know you should use something like kaniko for building images)

@TBBle

TBBle commented Dec 3, 2020

What if you are running dind in your CI? This will not work anymore, right?

You can still run Docker on the node for dind/DooD use; it's just that you'll also need containerd (or cri-o, or another CRI implementation) on the node for k8s to use. It's like a GPU then: it's up to the cluster administrator to make Docker available as a system resource if necessary.

For a dind use-case, you should be able to run dockerd as a privileged non-sandboxed daemonset image, so you don't have to install it manually on your nodes, like we do with kube-proxy and many CSI/CNI plugins for example. (It's possible that even a fully-privileged pod doesn't have the access it needs for this, I acknowledge).

It's also possible that by the time this happens, Docker's bundled containerd (on Linux) will have CRI available, so you could still install Docker on the node, and then point k8s at the Docker-bundled containerd's CRI. I'm not certain that will work, but if you have no choice but dind DooD, then it'll be worth exploring sooner rather than later.

Of course, if your workflow was assuming that the node would have access to dind DooD-built images without pushing them to a registry, that will break irretrievably. It's already a pretty-risky approach now though, so hopefully no one's still relying on that a year from now.

Edit: Actually, I think we're talking about DooD, "Docker outside of Docker", where a container has access to the host's /var/run/docker.sock. That's the flow that breaks when users are relying on the k8s install automatically including a Docker daemon. That flow also breaks now if you use Bottlerocket today, which doesn't use Docker, or Fargate, where you can't run privileged containers at all (and is also not using Docker). So this was already a bad idea on k8s clusters.

Docker-in-Docker is where you have a Docker daemon running in a privileged container with one of the docker-dind container images. I believe that will still work, as it just needs to have access to the host, and doesn't rely on the external container runtime also being Docker, so I expect that will work on Bottlerocket today.
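A minimal sketch of that pattern (image tag and settings are illustrative, not EKS-specific):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dind
spec:
  containers:
    - name: dind
      image: docker:19.03-dind
      securityContext:
        privileged: true      # the nested dockerd needs this
      env:
        - name: DOCKER_TLS_CERTDIR
          value: ""           # disable TLS for brevity; configure it properly in real use
EOF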

So in short: Move to Bottlerocket and test your stuff. If anything breaks around Docker, move back and you have 3 or so k8s releases to fix that before the whole ecosystem becomes like that. And you'll have a less-brittle pipeline as a bonus.

@Saziba

Saziba commented Dec 3, 2020

Calm down... it's not like Docker will be broken:
kubernetes/kubernetes#94624 (comment)

What if you are running dind in your CI? This will not work anymore, right?

(yes, I know you should use something like kaniko for building images)

https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/

@phenri00

phenri00 commented Dec 3, 2020

Calm down... it's not like Docker will be broken:
kubernetes/kubernetes#94624 (comment)

What if you are running dind in your CI? This will not work anymore, right?
(yes, I know you should use something like kaniko for building images)

https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/

Thanks. Great stuff. So it seems like dind will be broken then (if you are mounting the socket):

One thing to note: If you are relying on the underlying docker socket (/var/run/docker.sock) as part of a workflow within your cluster today, moving to a different runtime will break your ability to use it. This pattern is often called Docker in Docker. There are lots of options out there for this specific use case including things like kaniko, img, and buildah

Kaniko looks great, so we will probably move to that.

@MarcusNoble

I think you can still make use of the Docker socket by having the docker runtime on your host OS but using a different runtime with Kubernetes. I'd only go for that if updating your applications to something like Kaniko isn't possible right now though.

@mikestef9
Contributor

mikestef9 commented May 4, 2021

Hey all, some updates here.

We will be adding containerd as a container runtime to the EKS optimized Amazon Linux 2 AMI.

The current rollout plan is as follows:

  1. Add a bootstrap flag to the EKS AMI allowing users to toggle between containerd and Docker as the container runtime. You can see a draft PR here. Note that Docker will remain the default runtime.
  2. In a future EKS AMI Kubernetes minor version (currently targeting v1.22), change the default to containerd. You will no longer be able to use the Docker runtime.

In the next month or two, we will publish a blog with more info and recommendations on how to prepare for the removal of Docker as a supported container runtime.

@mo-saeed

Hi @mikestef9, what about the Amazon EKS optimized Ubuntu Linux AMIs?

Thanks

@TBBle

TBBle commented May 10, 2021

@mo-saeed: According to that link, that's a question for Canonical, as they provide those images. I don't see anything about a switch to containerd in their changelog, so you might want to open a feature request at https://bugs.launchpad.net/cloud-images for this.

@sean-keane25

@mikestef9 when and where will this be officially announced/confirmed:

"In a future EKS AMI Kubernetes minor version (currently targeting v1.21), change the default to containerd. You can still manually switch back to Docker with the bootstrap flag."

@robert-heinzmann-logmein

It seems that containerd is now mentioned in the changelog, as of AMI release v20210519:

https://github.com/awslabs/amazon-eks-ami/blob/master/CHANGELOG.md

Does this mean containerd is now supported?

@stevehipwell

@robert-heinzmann-logmein it's been mentioned for longer than that, but that's because it's a dependency of Docker.

@ulm0

ulm0 commented Jul 6, 2021

Hi, is there any time estimate for this to go live?

@dany74q

dany74q commented Jul 19, 2021

I'd just add that it would be great if changing the CRI were possible from the API for managed node groups, alongside the availability in eksctl, which might change it in the ASG launch template 🙏

@mikestef9
Contributor

mikestef9 commented Jul 20, 2021

The EKS optimized AMI now contains a bootstrap flag that lets you optionally enable containerd as the runtime. See the v1.21 release blog for details, as well as the EKS documentation.

Note that starting with EKS support for Kubernetes v1.22, containerd will become the default, and only available, container runtime in the EKS optimized Amazon Linux 2 AMI.

Also, keep in mind that Bottlerocket is a production ready AMI optimized for EKS that already ships with containerd by default (and will soon be added as a native option to EKS managed node groups, #950)
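For self-managed nodes using the EKS optimized Amazon Linux 2 AMI, opting in looks like this in the instance user data (the cluster name is a placeholder; see the EKS documentation for full details):

#!/bin/bash
# Opt this node into containerd instead of Docker.
/etc/eks/bootstrap.sh my-cluster --container-runtime containerd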

@dany74q

dany74q commented Jul 20, 2021

That's terrific! @mikestef9 - quick q - I think a common use case would be provisioning node groups via Terraform/Pulumi or the like, where one can't override the launch template / bootstrap args directly (only by supplying a name + version).

Would it be possible to either have a dedicated launch template version for the containerd runtime, or somehow integrate this into the creation API?

@artificial-aidan

Will containerd be updated to 1.5.0? This is a crucial feature for a lot of workloads, and the last AMI I pulled was on 1.4.6.
