
k3s --disable-agent flag never starts kube-scheduler in newer k3s versions #5118

Closed
FabianKramm opened this issue Feb 12, 2022 · 11 comments

@FabianKramm

FabianKramm commented Feb 12, 2022

Environmental Info:
K3s Version:

k3s version v1.22.6+k3s1 (https://github.com/k3s-io/k3s/commit/3228d9cb9a4727d48f60de4f1ab472f7c50df904)
go version go1.16.10

Node(s) CPU architecture, OS, and Version:

Linux test-0 5.10.76-linuxkit #1 SMP Mon Nov 8 10:21:19 UTC 2021 x86_64 GNU/Linux

Cluster Configuration:

container k3s, single server, no agents

Describe the bug:
Hello! Thanks again for the great project! This is a problem related to using --disable-agent with k3s.

PR #4345 changed the embedded executor so that kube-scheduler is only started once the executor's nodeConfig is set. With --disable-agent the agent is never started, so nodeConfig is never set, and as a result kube-scheduler never starts.

You can see the problematic code here:

// wait for Bootstrap to set nodeConfig
for e.nodeConfig == nil {
	runtime.Gosched()
}
// If we're running the embedded cloud controller, wait for it to untaint at least one
// node (usually, the local node) before starting the scheduler to ensure that it
// finds a node that is ready to run pods during its initial scheduling loop.
if !e.nodeConfig.AgentConfig.DisableCCM {
	if err := waitForUntaintedNode(ctx, e.nodeConfig.AgentConfig.KubeConfigKubelet); err != nil {
		logrus.Fatalf("failed to wait for untained node: %v", err)
	}
}

And here is where the agent bootstrap is skipped:

if cfg.DisableAgent {
	close(agentReady)
	<-ctx.Done()
	return nil
}

I know --disable-agent is an unsupported flag, but since vcluster relies on it to function correctly, I hope you could consider fixing this: it worked before, and in my opinion the fix would be a minimal, non-invasive change to k3s. If you decide to go forward with it, I'm willing to submit a PR as well.

Steps To Reproduce:
Start k3s with the --disable-agent flag; kube-scheduler never starts.

Expected behavior:
kube-scheduler starts when --disable-agent is set.

Actual behavior:
kube-scheduler never starts.

@brandond
Member

Hmm, in your vcluster use case you're using the default scheduler, but do not ever have any nodes? How does that work exactly - wouldn't that leave the scheduler without any nodes to schedule to?

@FabianKramm
Author

FabianKramm commented Feb 13, 2022

@brandond thanks for the reply! Currently we use the scheduler of the underlying host cluster to decide which node a pod should be scheduled on, and then sync that node back into the virtual k3s cluster.

To allow users to taint and label nodes within the virtual cluster, and to bring vcluster's scheduling behaviour closer to that of a real Kubernetes cluster, we now want to enable the scheduler inside the virtual k3s cluster: let it decide which node a pod should be scheduled on, and then create the pod in the underlying host cluster already bound to that node.

This works because we sync the nodes from the host cluster into the virtual one by creating the node objects there, without actually installing a separate kubelet or kube-proxy on them, which is why we don't need the k3s agent at all. We only need the control plane part (kube-apiserver, storage, controller manager and scheduler), which is fully virtualized in vcluster and works like a normal Kubernetes cluster, while the workloads run on the host cluster nodes: for each pod in the virtual cluster, we create a corresponding pod in the host cluster.
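To make the flow concrete, here is a hypothetical sketch of the host-side translation step. The `pod` type, `toHostPod` function and naming scheme are illustrative assumptions, not vcluster's code. It relies on standard Kubernetes behaviour: a pod created with `spec.nodeName` already set is not picked up by the scheduler, and the kubelet on that node runs it directly.

```go
package main

import "fmt"

// pod is a simplified stand-in for a Pod object; only the fields this
// sketch needs are included.
type pod struct {
	Namespace, Name string
	NodeName        string // spec.nodeName; empty until a scheduler binds the pod
}

// toHostPod is a hypothetical translation step: once the scheduler inside the
// virtual cluster has bound the pod, the host-side copy is created with
// spec.nodeName already set, so the host cluster's scheduler skips it.
func toHostPod(virtual pod, hostNamespace string) (pod, error) {
	if virtual.NodeName == "" {
		return pod{}, fmt.Errorf("pod %s/%s is not scheduled yet", virtual.Namespace, virtual.Name)
	}
	return pod{
		Namespace: hostNamespace,
		// Hypothetical naming scheme that keeps host pod names unique per virtual namespace.
		Name:     fmt.Sprintf("%s-x-%s", virtual.Name, virtual.Namespace),
		NodeName: virtual.NodeName,
	}, nil
}

func main() {
	v := pod{Namespace: "default", Name: "web", NodeName: "host-node-1"}
	h, err := toHostPod(v, "vcluster-1")
	fmt.Println(h.Name, h.NodeName, err) // web-x-default host-node-1 <nil>
}
```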

@brandond
Member

This works because we sync the nodes from the host cluster into the virtual one by creating the node objects in there without actually installing a separate kubelet or kube proxy on them

In that case, I'm not sure we need to change anything - kube-scheduler (in its current state) should start up as soon as an untainted node is sync'd into the virtual K3s cluster.
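For reference, a simplified sketch of the "untainted node" condition, assuming it means a node without the `node.cloudprovider.kubernetes.io/uninitialized` taint (the taint placed on nodes registered with an external cloud provider until a cloud controller initializes them). The `node` and `taint` types are stripped-down stand-ins for the `corev1` types, and `hasUntaintedNode` is hypothetical, not the k3s function.

```go
package main

import "fmt"

// uninitializedTaintKey is the cloud-provider "not yet initialized" taint.
const uninitializedTaintKey = "node.cloudprovider.kubernetes.io/uninitialized"

// taint and node are simplified stand-ins for the corev1 types.
type taint struct {
	Key, Effect string
}

type node struct {
	Name   string
	Taints []taint
}

// hasUntaintedNode reports whether at least one node lacks the
// uninitialized taint. Other taints are ignored in this sketch.
func hasUntaintedNode(nodes []node) bool {
	for _, n := range nodes {
		initialized := true
		for _, t := range n.Taints {
			if t.Key == uninitializedTaintKey {
				initialized = false
				break
			}
		}
		if initialized {
			return true
		}
	}
	return false
}

func main() {
	none := []node{}
	pending := []node{{Name: "a", Taints: []taint{{Key: uninitializedTaintKey, Effect: "NoSchedule"}}}}
	synced := append(pending, node{Name: "host-node-1"}) // e.g. a node synced in by vcluster
	fmt.Println(hasUntaintedNode(none), hasUntaintedNode(pending), hasUntaintedNode(synced)) // false false true
}
```

Note that a check like this can only run once the scheduler goroutine gets past the `nodeConfig` wait, which is the point raised in the next comment.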

@FabianKramm
Copy link
Author

FabianKramm commented Feb 14, 2022

@brandond but this loop never exits if you use --disable-agent, because the agent is never started and therefore nodeConfig is never set to a non-nil value:

// wait for Bootstrap to set nodeConfig
for e.nodeConfig == nil {
	runtime.Gosched()
}

@brandond
Member

Ahh, I see. Sorry, I'd missed that part; I thought it was just waiting on a node to show up.

@brandond
Member

See if that PR fixes it for you?

@FabianKramm
Author

FabianKramm commented Feb 15, 2022

@brandond thanks a lot for the quick PR! It works for me when k3s is not running in an unprivileged Docker container, but if I run k3s inside such a container I get the following error:

INFO[0091] Waiting to retrieve agent configuration; server is not ready: "overlayfs" snapshotter cannot be enabled for "/data/agent/containerd", try using "fuse-overlayfs" or "native": failed to mount overlay: operation not permitted

The problem seems to be this part, which is now executed to retrieve the agent node config:

if !nodeConfig.Docker && nodeConfig.ContainerRuntimeEndpoint == "" {
	switch nodeConfig.AgentConfig.Snapshotter {
	case "overlayfs":
		if err := containerd.OverlaySupported(nodeConfig.Containerd.Root); err != nil {
			return nil, errors.Wrapf(err, "\"overlayfs\" snapshotter cannot be enabled for %q, try using \"fuse-overlayfs\" or \"native\"",
				nodeConfig.Containerd.Root)
		}
	case "fuse-overlayfs":
		if err := containerd.FuseoverlayfsSupported(nodeConfig.Containerd.Root); err != nil {
			return nil, errors.Wrapf(err, "\"fuse-overlayfs\" snapshotter cannot be enabled for %q, try using \"native\"",
				nodeConfig.Containerd.Root)
		}
	case "stargz":
		if err := containerd.StargzSupported(nodeConfig.Containerd.Root); err != nil {
			return nil, errors.Wrapf(err, "\"stargz\" snapshotter cannot be enabled for %q, try using \"overlayfs\" or \"native\"",
				nodeConfig.Containerd.Root)
		}
		nodeConfig.AgentConfig.ImageServiceSocket = "/run/containerd-stargz-grpc/containerd-stargz-grpc.sock"
	}
}

I'm no expert here, but wouldn't it be much easier to use the kube-scheduler kubeconfig here instead of initializing the whole agent config? Or is the node kubeconfig required here?

I thought about something like this:

func (e *Embedded) Scheduler(ctx context.Context, disableCCM bool, apiReady <-chan struct{}, args []string) error {
	command := sapp.NewSchedulerCommand()
	command.SetArgs(args)

	go func() {
		// Registered first so that panics during the node wait are logged too.
		defer func() {
			if err := recover(); err != nil {
				logrus.Fatalf("scheduler panic: %v", err)
			}
		}()
		<-apiReady
		// If we're running the embedded cloud controller, wait for it to untaint at least one
		// node (usually, the local node) before starting the scheduler to ensure that it
		// finds a node that is ready to run pods during its initial scheduling loop.
		if !disableCCM {
			// Reuse the scheduler's own kubeconfig instead of the kubelet's. Assumes
			// args use the "--flag=value" form, so the full "--kubeconfig=" prefix
			// is stripped to leave only the path.
			kubeconfig := ""
			for _, arg := range args {
				if strings.HasPrefix(arg, "--kubeconfig=") {
					kubeconfig = strings.TrimPrefix(arg, "--kubeconfig=")
					break
				}
			}
			if kubeconfig != "" {
				if err := waitForUntaintedNode(ctx, kubeconfig); err != nil {
					logrus.Fatalf("failed to wait for untainted node: %v", err)
				}
			}
		}
		logrus.Fatalf("scheduler exited: %v", command.ExecuteContext(ctx))
	}()

	return nil
}
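A standalone, testable version of the kubeconfig lookup in the proposal above could look like this. `kubeconfigFromArgs` is a hypothetical helper: it accepts both `--kubeconfig=PATH` and the two-token `--kubeconfig PATH` form, and returns an empty string when the flag is absent so the caller can skip the wait.

```go
package main

import (
	"fmt"
	"strings"
)

// kubeconfigFromArgs returns the value of --kubeconfig from a
// kube-scheduler style argument list, or "" if the flag is absent.
func kubeconfigFromArgs(args []string) string {
	const flag = "--kubeconfig"
	for i, arg := range args {
		// "--kubeconfig=PATH": match the "=" too, so that a flag such as
		// "--kubeconfigx=..." is not mistaken for --kubeconfig.
		if strings.HasPrefix(arg, flag+"=") {
			return strings.TrimPrefix(arg, flag+"=")
		}
		// "--kubeconfig PATH": the value is the next token, if there is one.
		if arg == flag && i+1 < len(args) {
			return args[i+1]
		}
	}
	return ""
}

func main() {
	fmt.Println(kubeconfigFromArgs([]string{"--kubeconfig=/tmp/scheduler.kubeconfig"}))          // /tmp/scheduler.kubeconfig
	fmt.Println(kubeconfigFromArgs([]string{"--leader-elect=false", "--kubeconfig", "/tmp/s"})) // /tmp/s
	fmt.Println(kubeconfigFromArgs([]string{"--kubeconfigx=/nope"}) == "")                      // true
}
```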

@brandond
Member

I was hoping to avoid having to pass that in explicitly, since the nodeConfig already has all the information we need filled in properly, as long as we bootstrap the executor before using it.

@brandond
Member

brandond commented Feb 15, 2022

Should be sorted now; I am able to run the server in an unprivileged container. Even rootless should work, if you give it a writable path for $HOME:

docker run --rm -it --user 1000:1000 -e HOME=/tmp/k3s rancher/k3s server --disable-agent --token=token --rootless

@FabianKramm
Author

@brandond just verified it and it works perfectly now, thanks so much for the quick fix!

@ShylajaDevadiga
Contributor

Validated on v1.23.5-rc1+k3s1.
Installed node1 with --disable-agent:

$ kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS    RESTARTS   AGE
kube-system   helm-install-traefik-crd-xjkjh            0/1     Pending   0          7m17s
kube-system   helm-install-traefik-4nrvz                0/1     Pending   0          7m17s
kube-system   local-path-provisioner-6c79684f77-c97zn   0/1     Pending   0          7m17s
kube-system   metrics-server-7cd5fcb6b7-gcjdr           0/1     Pending   0          7m17s
kube-system   coredns-d76bd69b-p86fc                    0/1     Pending   0          7m17s

Joined an agent node

$ kubectl get nodes
NAME               STATUS   ROLES    AGE     VERSION
ip-172-31-15-177   Ready    <none>   6m44s   v1.23.5-rc1+k3s1
$ kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   coredns-d76bd69b-p86fc                    1/1     Running     0          12m
kube-system   local-path-provisioner-6c79684f77-c97zn   1/1     Running     0          12m
kube-system   helm-install-traefik-crd-xjkjh            0/1     Completed   0          12m
kube-system   helm-install-traefik-4nrvz                0/1     Completed   1          12m
kube-system   svclb-traefik-fxbth                       2/2     Running     0          4m13s
kube-system   metrics-server-7cd5fcb6b7-gcjdr           1/1     Running     0          12m
kube-system   traefik-58b759688b-zjx4j                  1/1     Running     0          4m13s

metrics-server fails to fetch metrics in the above setup, as reported in issue #5330.
