Error status for first pod on k3s 1.0.1 #2064

Closed · bertbesser opened this issue Jan 25, 2020 · 7 comments

Checklist:

  • I've included the version.
  • I've included reproduction steps.
  • I've included the workflow YAML.
  • I've included the logs.

What happened:
A fresh install of Argo (as given in the quick start) and a submit of the hello-world workflow produce an Error status with the message shown below.

This is the workflow YAML (copied from the git repo):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]

What you expected to happen:
I expected a success message to appear.

How to reproduce it (as minimally and precisely as possible):

  • Installed Argo 2.4.3 as given in the quick start, into a k3s cluster version 1.0.1.
  • Installed the 2.4.3 argo command-line client.
  • Submitted the hello-world workflow.
  • The error message appears, e.g. when using --watch on the submit or via a subsequent argo get (see the command sketch below).
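
For reference, a minimal sketch of the reproduction commands (the manifest and example URLs are assumptions based on the v2.4.3 quick start, not taken from this issue; adjust to your setup):

# install Argo 2.4.3 into the cluster
kubectl create namespace argo
kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo/v2.4.3/manifests/install.yaml
# submit the hello-world example and watch it
argo submit --watch https://raw.githubusercontent.com/argoproj/argo/v2.4.3/examples/hello-world.yaml
argo get <workflowname>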

Anything else we need to know?:
kubectl logs <failedpodname> -c init fails; see below for the error message.

The container ran successfully:
kubectl logs <failedpodname> -c main


 _____________
< hello world >
 -------------
    \
     \
      \
                    ##        .
              ## ## ##       ==
           ## ## ## ##      ===
       /""""""""""""""""___/ ===
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
       \______ o          __/
        \    \        __/
          \____\______/

Other example workflows, e.g. steps or dag-diamond, fail after the first step.

Environment:

  • Argo version:
$ argo version

argo: v2.4.3
BuildDate: 2019-12-06T03:36:01Z
GitCommit: cfe5f37
GitTreeState: clean
GitTag: v2.4.3
GoVersion: go1.11.5
Compiler: gc
Platform: linux/amd64

  • Kubernetes version:
$ kubectl version -o yaml

clientVersion:
  buildDate: "2019-11-18T18:31:23Z"
  compiler: gc
  gitCommit: e7e6a3c4e9a7d80b87793612730d10a863a25980
  gitTreeState: clean
  gitVersion: v1.16.3-k3s.2
  goVersion: go1.13.4
  major: "1"
  minor: "16"
  platform: linux/amd64
serverVersion:
  buildDate: "2019-11-18T18:31:23Z"
  compiler: gc
  gitCommit: e7e6a3c4e9a7d80b87793612730d10a863a25980
  gitTreeState: clean
  gitVersion: v1.16.3-k3s.2
  goVersion: go1.13.4
  major: "1"
  minor: "16"
  platform: linux/amd64

Other debugging information (if applicable):

  • workflow result:
argo get <workflowname>

Name:                hello-world-cfzz7
Namespace:           default
ServiceAccount:      default
Status:              Error
Message:             failed to save outputs: Error response from daemon: No such container: 42df0c4f40893d10a04c7d92afa56bf8ac930ceed7a12227444ac932666805ac
Created:             Sat Jan 25 17:21:56 +0100 (3 minutes ago)
Started:             Sat Jan 25 17:21:56 +0100 (3 minutes ago)
Finished:            Sat Jan 25 17:22:03 +0100 (3 minutes ago)
Duration:            7 seconds

STEP                             PODNAME            DURATION  MESSAGE
 ⚠ hello-world-cfzz7 (whalesay)  hello-world-cfzz7  7s        failed to save outputs: Error response from daemon: No such container: 42df0c4f40893d10a04c7d92afa56bf8ac930ceed7a12227444ac932666805ac

  • executor logs:
kubectl logs <failedpodname> -c init
kubectl logs <failedpodname> -c wait

init:

Error from server (BadRequest): container init is not valid for pod hello-world-cfzz7

wait:

time="2020-01-25T16:21:59Z" level=info msg="Creating a docker executor"
time="2020-01-25T16:21:59Z" level=info msg="Executor (version: v2.4.3, build_date: 2019-12-06T03:35:39Z) initialized (pod: default/hello-world-cfzz7) with template:\n{"name":"whalesay","arguments":{},"inputs":{},"outputs":{},"metadata":{},"container":{"name":"","image":"docker/whalesay:latest","command":["cowsay"],"args":["hello world"],"resources":{}}}"
time="2020-01-25T16:21:59Z" level=info msg="Waiting on main container"
time="2020-01-25T16:22:02Z" level=info msg="main container started with container ID: 42df0c4f40893d10a04c7d92afa56bf8ac930ceed7a12227444ac932666805ac"
time="2020-01-25T16:22:02Z" level=info msg="Starting annotations monitor"
time="2020-01-25T16:22:02Z" level=info msg="Starting deadline monitor"
time="2020-01-25T16:22:02Z" level=info msg="docker wait 42df0c4f40893d10a04c7d92afa56bf8ac930ceed7a12227444ac932666805ac"
time="2020-01-25T16:22:03Z" level=error msg="docker wait 42df0c4f40893d10a04c7d92afa56bf8ac930ceed7a12227444ac932666805ac failed: Error response from daemon: No such container: 42df0c4f40893d10a04c7d92afa56bf8ac930ceed7a12227444ac932666805ac\n"
time="2020-01-25T16:22:03Z" level=warning msg="Failed to wait for container id '42df0c4f40893d10a04c7d92afa56bf8ac930ceed7a12227444ac932666805ac': Error response from daemon: No such container: 42df0c4f40893d10a04c7d92afa56bf8ac930ceed7a12227444ac932666805ac"
time="2020-01-25T16:22:03Z" level=error msg="executor error: Error response from daemon: No such container: 42df0c4f40893d10a04c7d92afa56bf8ac930ceed7a12227444ac932666805ac\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.InternalError\n\t/go/src/github.com/argoproj/argo/errors/errors.go:60\ngithub.com/argoproj/argo/workflow/common.RunCommand\n\t/go/src/github.com/argoproj/argo/workflow/common/util.go:406\ngithub.com/argoproj/argo/workflow/executor/docker.(*DockerExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/docker/docker.go:95\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait.func1\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:892\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\t/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:265\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:891\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:40\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2020-01-25T16:22:03Z" level=info msg="No output parameters"
time="2020-01-25T16:22:03Z" level=info msg="No output artifacts"
time="2020-01-25T16:22:03Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2020-01-25T16:22:03Z" level=info msg="Killing sidecars"
time="2020-01-25T16:22:03Z" level=info msg="Annotations monitor stopped"
time="2020-01-25T16:22:03Z" level=info msg="Alloc=4852 TotalAlloc=11343 Sys=70078 NumGC=4 Goroutines=12"
time="2020-01-25T16:22:03Z" level=fatal msg="Error response from daemon: No such container: 42df0c4f40893d10a04c7d92afa56bf8ac930ceed7a12227444ac932666805ac\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.InternalError\n\t/go/src/github.com/argoproj/argo/errors/errors.go:60\ngithub.com/argoproj/argo/workflow/common.RunCommand\n\t/go/src/github.com/argoproj/argo/workflow/common/util.go:406\ngithub.com/argoproj/argo/workflow/executor/docker.(*DockerExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/docker/docker.go:95\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait.func1\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:892\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\t/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:265\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:891\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:40\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"

  • workflow-controller logs:
kubectl logs -n argo $(kubectl get pods -l app=workflow-controller -n argo -o name)

time="2020-01-25T16:18:25Z" level=warning msg="ConfigMap 'workflow-controller-configmap' does not have key 'config'"
time="2020-01-25T16:18:25Z" level=info msg="Starting workflow TTL controller (resync 20m0s)"
time="2020-01-25T16:18:25Z" level=info msg="Workflow Controller (version: v2.4.3) starting"
time="2020-01-25T16:18:25Z" level=info msg="Workers: workflow: 8, pod: 8"
time="2020-01-25T16:18:25Z" level=info msg="Watch Workflow controller config map updates"
time="2020-01-25T16:18:25Z" level=info msg="Detected ConfigMap update. Updating the controller config."
time="2020-01-25T16:18:25Z" level=warning msg="ConfigMap 'workflow-controller-configmap' does not have key 'config'"
time="2020-01-25T16:18:25Z" level=info msg="Started workflow TTL worker"
time="2020-01-25T16:21:56Z" level=info msg="Processing workflow" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:21:56Z" level=info msg="Updated phase -> Running" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:21:56Z" level=info msg="Pod node hello-world-cfzz7 (hello-world-cfzz7) initialized Pending" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:21:56Z" level=info msg="Created pod: hello-world-cfzz7 (hello-world-cfzz7)" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:21:56Z" level=info msg="Workflow update successful" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:21:57Z" level=info msg="Processing workflow" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:21:57Z" level=info msg="Updating node hello-world-cfzz7 (hello-world-cfzz7) message: ContainerCreating"
time="2020-01-25T16:21:57Z" level=info msg="Skipped pod hello-world-cfzz7 (hello-world-cfzz7) creation: already exists" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:21:57Z" level=info msg="Workflow update successful" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:21:58Z" level=info msg="Processing workflow" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:21:58Z" level=info msg="Skipped pod hello-world-cfzz7 (hello-world-cfzz7) creation: already exists" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:22:02Z" level=info msg="Processing workflow" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:22:02Z" level=info msg="Updating node hello-world-cfzz7 (hello-world-cfzz7) status Pending -> Running"
time="2020-01-25T16:22:02Z" level=info msg="Skipped pod hello-world-cfzz7 (hello-world-cfzz7) creation: already exists" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:22:02Z" level=info msg="Workflow update successful" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:22:03Z" level=info msg="Processing workflow" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:22:03Z" level=info msg="Updating node hello-world-cfzz7 (hello-world-cfzz7) status Running -> Error"
time="2020-01-25T16:22:03Z" level=info msg="Updating node hello-world-cfzz7 (hello-world-cfzz7) message: failed to save outputs: Error response from daemon: No such container: 42df0c4f40893d10a04c7d92afa56bf8ac930ceed7a12227444ac932666805ac"
time="2020-01-25T16:22:03Z" level=info msg="Updated phase Running -> Error" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:22:03Z" level=info msg="Updated message -> failed to save outputs: Error response from daemon: No such container: 42df0c4f40893d10a04c7d92afa56bf8ac930ceed7a12227444ac932666805ac" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:22:03Z" level=info msg="Marking workflow completed" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:22:03Z" level=info msg="Checking daemoned children of " namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:22:03Z" level=info msg="Workflow update successful" namespace=default workflow=hello-world-cfzz7
time="2020-01-25T16:22:04Z" level=info msg="Labeled pod default/hello-world-cfzz7 completed"
time="2020-01-25T16:23:25Z" level=info msg="Alloc=6519 TotalAlloc=17690 Sys=70078 NumGC=7 Goroutines=80"
time="2020-01-25T16:28:25Z" level=info msg="Alloc=4398 TotalAlloc=17779 Sys=70078 NumGC=10 Goroutines=80"
time="2020-01-25T16:33:25Z" level=info msg="Alloc=4398 TotalAlloc=17817 Sys=70078 NumGC=12 Goroutines=80"


Message from the maintainers:

If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

alexec (Contributor) commented Jan 25, 2020

Can you try the “pns” executor with k3s?

bertbesser (Author) commented Jan 26, 2020

Worked. Thank you, sir!

Since I'm new to argo, could you give a brief motivation for why there's a need for different executors? (Imo, this info would improve the documentation in https://github.com/argoproj/argo/blob/master/docs/workflow-executors.md.)

Also, since pns is declared immature and I do not have tight performance requirements, should I use k8sapi instead?

Thanks again.

PS: For anybody in my situation:

KUBE_EDITOR=nano kubectl edit cm workflow-controller-configmap -n argo
# append the following yaml code (at the top level of the yaml tree)
# data:
#   config: |
#     containerRuntimeExecutor: pns
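
Equivalently, a declarative sketch of the same ConfigMap change (the name and namespace are taken from the default install and the controller logs above; the file name is hypothetical):

# workflow-controller-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  config: |
    containerRuntimeExecutor: pns
# apply with: kubectl apply -n argo -f workflow-controller-configmap.yaml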


alexec (Contributor) commented Jan 26, 2020

Each executor involves a trade-off, with Docker being the most widely supported. I'd note you can run K3S on Docker.
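
For anyone who wants to go that route, a sketch of installing k3s against the host's Docker daemon instead of its bundled containerd (the --docker flag is an assumption based on the k3s docs of that era; verify against your k3s version):

# install k3s and tell it to use Docker as the container runtime
curl -sfL https://get.k3s.io | sh -s - --docker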

bertbesser (Author) commented Jan 31, 2020

Hi,

after upgrading to argo 2.5rc6 via

k -n argo apply -f https://raw.githubusercontent.com/argoproj/argo/release-2.5/manifests/install.yaml

I get the following error

argo submit https://raw.githubusercontent.com/argoproj/argo/master/examples/steps.yaml
argo list
...
argo watch steps-jxzn6
Name:                steps-jxzn6
Namespace:           default
ServiceAccount:      default
Status:              Failed
Message:             child 'steps-jxzn6-1801548666' failed
Created:             Fri Jan 31 15:52:17 +0100 (2 minutes ago)
Started:             Fri Jan 31 15:52:17 +0100 (2 minutes ago)
Finished:            Fri Jan 31 15:52:22 +0100 (2 minutes ago)
Duration:            5 seconds

STEP                                PODNAME                 DURATION  MESSAGE
 ✖ steps-jxzn6 (hello-hello-hello)                                    child 'steps-jxzn6-1801548666' failed
 └---⚠ hello1 (whalesay)            steps-jxzn6-1801548666  5s        failed to save outputs: Failed to establish pod watch: unknown (get pods)

I.e., the workflow fails after the first of several steps, even though the logs of the main container show the expected whale.

The error message failed to save outputs: Failed to establish pod watch: unknown (get pods) occurs for all executors (docker, kubelet, k8sapi, and pns).

Regards.

alexec (Contributor) commented Jan 31, 2020

Thank you for a good detailed comment.

Failing to establish watches may be caused by the controller not having permission; can you check the getting started guide for this?

https://github.com/argoproj/argo/blob/master/docs/getting-started.md
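
A quick way to check that permission from the command line (a sketch; it assumes the workflow pods run as the default ServiceAccount in the default namespace, as shown in the output above):

kubectl auth can-i get pods -n default --as=system:serviceaccount:default:default
kubectl auth can-i watch pods -n default --as=system:serviceaccount:default:default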

You might want to try out the quick-start-*.yaml manifests if you're experimenting. These include all the roles and permissions and are what we use for our testing:

https://github.com/argoproj/argo/tree/release-2.5/manifests
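
For example (the exact file name is an assumption following the quick-start-*.yaml pattern; check the linked directory for the variants available):

kubectl create namespace argo
kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo/release-2.5/manifests/quick-start-postgres.yaml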

bertbesser (Author) commented Feb 1, 2020

Thanks for the snappy response!

As you indicated, the default:default SA did not have sufficient permissions. I ran

k apply -f https://raw.githubusercontent.com/argoproj/argo/release-2.5/manifests/quick-start/base/workflow-role.yaml
k apply -f https://raw.githubusercontent.com/argoproj/argo/release-2.5/manifests/quick-start/base/workflow-default-rolebinding.yaml

and (multi-step) workflows succeed :-)

Thank you!
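
For reference, those two manifests set up RBAC along these lines (a rough sketch of the kind of rules involved, not the exact upstream files):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: workflow-role
  namespace: default
rules:
# the executor (init/wait) containers read, watch, and patch their own pod to save outputs
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "patch"]
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: workflow-default-binding
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: workflow-role
subjects:
- kind: ServiceAccount
  name: default
  namespace: default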

ChaosInTheCRD (Contributor) commented

I just experienced the same issue on GKE v1.19.7-gke.1500. Not sure why it fixes it, but setting the executor to pns did the trick. I thought Docker was the executor by default; should that not have worked fine? Maybe someone has some views/advice on this.
