Pull request jobs are flaking with k3s #322

Closed
jsturtevant opened this issue Sep 18, 2023 · 17 comments

@jsturtevant
Contributor

Wasmer has failed in the e2e tests in several PRs recently:

#320
#319
#318

The logs don't give a ton of info; it looks like the pods are stuck in the Pending state.

fyi @0xE282B0 @dierbei
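
When this flakes again, a bit more output from the job would help pin down why the scheduler leaves the pods Pending. A minimal sketch of extra commands, assuming the same sudo bin/k3s kubectl invocation the e2e job already uses:

# list everything, including kube-system, to see whether the control plane itself is healthy
sudo bin/k3s kubectl get pods --all-namespaces -o wide

# the Events section of describe usually states why a pod is stuck Pending
# (unschedulable node, missing RuntimeClass, image pull problems, ...)
sudo bin/k3s kubectl describe pods

# node conditions and recent cluster events
sudo bin/k3s kubectl describe nodes
sudo bin/k3s kubectl get events --sort-by=.metadata.creationTimestamp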

@jprendes
Collaborator

jprendes commented Sep 18, 2023

I'm not sure it's related to Wasmer; I think it's rather flakiness from k3s.

@jprendes
Collaborator

jprendes commented Sep 18, 2023

I'm hoping #323 can give us further insight into the issue.

  • It has already failed once with kind + wasmer, but passed with k3s + wasmer + ubuntu-22.04 in #1195
  • It has failed once with k3s + wasmer + (ubuntu-20.04 + ubuntu-22.04) in #1197

@jsturtevant jsturtevant changed the title Wasmer is flaking in Pull request jobs Pull request jobs are flaking Sep 18, 2023
@jsturtevant jsturtevant changed the title Pull request jobs are flaking Pull request jobs are flaking with k3s Sep 18, 2023
@jsturtevant
Contributor Author

@dierbei
Contributor

dierbei commented Sep 18, 2023

@jsturtevant @jprendes I'll take a look at that.

Right now my plan is:

  1. Look at the containerd logs while the pods are in the Pending state (see the sketch below).
  2. If all of them are Pending, check whether there are any conflicts (e.g. as in the unit tests).

I'm going to build a k3s environment and test it.
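
A rough sketch of where those logs live on a k3s node, assuming a default k3s layout (the containerd log path is k3s's default, not something confirmed from this CI setup):

# k3s bundles its own containerd; by default its log sits under the agent directory
sudo tail -n 200 /var/lib/rancher/k3s/agent/containerd/containerd.log

# when k3s runs as a systemd service, k3s/kubelet messages go to the journal
sudo journalctl -u k3s --since "10 minutes ago" --no-pager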

@Mossaka
Member

Mossaka commented Sep 19, 2023

Anyone tried reproducing it locally?

@dierbei
Contributor

dierbei commented Sep 19, 2023

Anyone tried reproducing it locally?

I've been a little busy the last couple of days; I'll give it a try as soon as I can.

@dierbei
Contributor

dierbei commented Sep 20, 2023

Anyone tried reproducing it locally?

Unfortunately, I ran make test-wasmer 13 times without any problems.

My OS is Ubuntu 22.04.

I'm continuing to try.

@0xE282B0
Contributor

Hi, I noticed that the first shim with a sidecar that comes up after installation is stuck. When I delete it, it starts without problems.
Often it is Wasmer because it is the first one that starts, but I've observed it with Lunatic and WasmEdge as well.
Like in this test report: KWasm/kwasm-node-installer#43 (comment)
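
As a stop-gap, deleting the stuck pod lets the Deployment's ReplicaSet schedule a fresh one, which then comes up fine per the observation above; a sketch (the pod name is a placeholder):

# delete the stuck pod; the ReplicaSet immediately creates a replacement
sudo bin/k3s kubectl delete pod <stuck-pod-name>

# watch the replacement come up
sudo bin/k3s kubectl get pods -w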

@dierbei
Contributor

dierbei commented Sep 20, 2023

@jprendes @jsturtevant @Mossaka @0xE282B0 I'm experiencing a Pending status, but right now I'm not quite sure what the problem is.

https://github.com/dierbei/runwasi/actions/runs/6248281600/job/16963026527#step:7:28


@0xE282B0
Contributor

Not sure if it is the same problem, but in my case I get this error message on the Linux container with kubectl describe pod ...:

Error: failed to create containerd task: 
  failed to start shim: start failed: io.containerd.wasmtime.v1: 
    Other("failed to setup namespaces: 
      Other: could not open network namespace /proc/0/ns/net: No such file or directory (os error 2)")
: exit status 1: unknown

Then the sidecar container is in a restart loop.
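
For context on that path: PID 0 is never a real userspace process, so /proc/0 cannot exist; the shim has apparently been handed an unset sandbox PID when building the network-namespace path. Purely illustrative:

# /proc/<pid>/ns/net only exists for a running process, so PID 0 can never resolve
ls -l /proc/0/ns/net
# ls: cannot access '/proc/0/ns/net': No such file or directory

# compare with a real process, e.g. the current shell
ls -l /proc/$$/ns/net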

@dierbei
Contributor

dierbei commented Sep 21, 2023

I took a closer look at the logs and realized that it seems to be because kubectl apply -f deploy.yaml runs too early.

Notice that the --all-namespaces query does not return any pods in the kube-system namespace.

What seems to be happening is that the kube-system pods have not started yet when the workload is applied.

sudo bin/k3s kubectl get pods --all-namespaces
NAMESPACE   NAME                        READY   STATUS    RESTARTS   AGE
default     wasi-demo-79d9475fd-p8r7k   0/2     Pending   0          107s
default     wasi-demo-79d9475fd-p7sqz   0/2     Pending   0          107s
default     wasi-demo-79d9475fd-c848k   0/2     Pending   0          107s
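
If that race is the cause, one possible mitigation (a sketch only, not necessarily what #323 does) is to block until the node and the kube-system add-ons are ready before applying deploy.yaml:

# wait for the node to register and report Ready
sudo bin/k3s kubectl wait --for=condition=Ready node --all --timeout=120s

# wait for the kube-system deployments (coredns, metrics-server, ...) to become Available;
# wrapped in a retry because "kubectl wait" errors out while the deployments do not exist yet
timeout 180 bash -c \
  'until sudo bin/k3s kubectl wait --for=condition=Available deployment --all -n kube-system --timeout=10s; do sleep 5; done'

# only then deploy the test workload
sudo bin/k3s kubectl apply -f deploy.yaml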

@jprendes
Collaborator

I took a closer look at the logs and realized that it seems to be because kubectl apply -f deploy.yaml runs too early.

I think you are right.
I've added some mitigation here, which runs the k3s test for all runtimes:

That is compared to before, when I didn't manage to get a single clean run.

Now, if we check the failed run, it failed because the kube-system pods never came up within 1 minute, even before the shims were involved, so it's not a problem with the shims.

@jsturtevant
Contributor Author

jsturtevant commented Oct 3, 2023

Some additional logs and investigation here: #346 (comment)

@jsturtevant
Contributor Author

I've opened containerd/rust-extensions#210 since our logs don't have timing info in them.

@jsturtevant
Contributor Author

jsturtevant commented Oct 6, 2023

Besides #347, I'm seeing the following:

cp: cannot stat '/var/lib/rancher/k3s/agent/etc/containerd/config.toml': No such file or directory

@jprendes
Collaborator

jprendes commented Oct 6, 2023

I think the best thing to do in that case, as well as when k3s's kube-system pods fail to start, is to uninstall k3s and install it again.

That looks like a corrupted k3s installation.
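
For reference, when k3s is installed via the upstream install script it also drops an uninstall script, so a reset could look roughly like this (a sketch assuming the script-based install, not necessarily how the CI provisions k3s):

# remove the existing (possibly corrupted) installation, including /var/lib/rancher/k3s
sudo /usr/local/bin/k3s-uninstall.sh

# reinstall from the upstream script and check that the service came back
curl -sfL https://get.k3s.io | sh -
sudo systemctl is-active k3s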

@jprendes
Collaborator

This has been fixed by #353
