PNS Executor input/output artifacts "Failed to determine pid for containerID" #4230

Closed
iamcnx opened this issue Oct 7, 2020 · 10 comments

Comments

@iamcnx

iamcnx commented Oct 7, 2020

Summary

I'm using the PNS executor to run a workflow that only copies a file from an input bucket to an output bucket in MinIO.
The workflow copies the file into the output bucket correctly, but the wait container fails with the following error:

Failed to determine pid for containerID xxxxxxxxxxx: container may have exited too quickly

Even the workaround with a sleep and an emptyDir volume doesn't seem to work in this case.
I added a 60-second sleep to the container and mounted workdir as an emptyDir.

Environment

Argo: v2.11.1
Kubernetes: v1.11.0+d4cacc0
Openshift: v3.11.0+1c3e643-87

Workflow

apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  finalizers:
  - sensor-controller
  generation: 1
  name: minio
spec:
  dependencies:
  - eventName: input
    eventSourceName: minio
    name: dep-minio
  template:
    metadata: {}
    serviceAccountName: argo-events-sa
  triggers:
  - template:
      name: minio-workflow-trigger
      k8s:
        version: v1alpha1
        group: argoproj.io
        operation: create
        parameters:
        - dest: spec.templates.1.inputs.artifacts.0.s3.key
          src:
            dataKey: notification.0.s3.object.key
            dependencyName: dep-minio
        - dest: spec.templates.1.outputs.artifacts.0.s3.key
          src:
            dataKey: notification.0.s3.object.key
            dependencyName: dep-minio
        resource: workflows
        source:
          resource:
            apiVersion: argoproj.io/v1alpha1
            kind: Workflow
            metadata:
              generateName: artifact-workflow-
              namespace: argo
            spec:
              entrypoint: file-move
              serviceAccountName: argo-workflow
              shareProcessNamespace: true
              volumes:
              - emptyDir: {}
                name: workdir
              templates:
              - name: file-move
                steps:
                - - name: input-output
                    template: input-output
              - container:
                  args:
                  - sleep 60; cowsay Moved file to output
                  command:
                  - sh
                  - -c
                  image: docker/whalesay:latest
                  name: whalesay
                  volumeMounts:
                  - mountPath: /files
                    name: workdir
                inputs:
                  artifacts:
                  - name: get-artifact
                    path: /files/artifact.file
                    s3:
                      accessKeySecret:
                        key: accesskey
                        name: minio
                      bucket: input
                      endpoint: minio:9000
                      insecure: true
                      key: THIS_WILL_BE_REPLACED
                      secretKeySecret:
                        key: secretkey
                        name: minio
                name: input-output
                outputs:
                  artifacts:
                  - name: save-artifact
                    path: /files/artifact.file
                    s3:
                      accessKeySecret:
                        key: accesskey
                        name: minio
                      bucket: output
                      endpoint: minio:9000
                      insecure: true
                      key: THIS_WILL_BE_REPLACED
                      secretKeySecret:
                        key: secretkey
                        name: minio

Logs

Pod wait container

oc logs artifact-workflow-wp2fd-788754573 -c wait
time="2020-10-07T12:17:45.341Z" level=info msg="Starting Workflow Executor" version=v2.11.1
time="2020-10-07T12:17:45.346Z" level=info msg="Creating PNS executor (namespace: argo, pod: artifact-workflow-wp2fd-788754573, pid: 29, hasOutputs: true)"
time="2020-10-07T12:17:45.346Z" level=info msg="Executor (version: v2.11.1, build_date: 2020-09-29T17:32:49Z) initialized (pod: argo/artifact-workflow-wp2fd-788754573) with template:\n{\"name\":\"input-output\",\"arguments\":{},\"inputs\":{\"artifacts\":[{\"name\":\"get-artifact\",\"path\":\"/files/artifact.file\",\"s3\":{\"endpoint\":\"minio:9000\",\"bucket\":\"input\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"minio\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"minio\",\"key\":\"secretkey\"},\"key\":\"testfile\"}}]},\"outputs\":{\"artifacts\":[{\"name\":\"save-artifact\",\"path\":\"/files/artifact.file\",\"s3\":{\"endpoint\":\"minio:9000\",\"bucket\":\"output\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"minio\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"minio\",\"key\":\"secretkey\"},\"key\":\"testfile\"}}]},\"metadata\":{},\"container\":{\"name\":\"whalesay\",\"image\":\"docker/whalesay:latest\",\"command\":[\"sh\",\"-c\"],\"args\":[\"sleep 60; cowsay Moved file to output\"],\"resources\":{},\"volumeMounts\":[{\"name\":\"workdir\",\"mountPath\":\"/files\"}]}}"
time="2020-10-07T12:17:45.346Z" level=info msg="Waiting on main container"
time="2020-10-07T12:17:45.347Z" level=warning msg="Polling root processes (1m0s)"
time="2020-10-07T12:17:47.842Z" level=info msg="pid 44: &{root 224 2147484013 {87002644 63683956716 0x308cd00} {64768 64 17 16749 0 0 0 0 224 4096 0 {1601987853 692010637} {1548359916 87002644} {1548359916 87002644} [0 0 0]}}"
time="2020-10-07T12:17:47.842Z" level=info msg="Secured filehandle on /proc/44/root"
time="2020-10-07T12:17:47.842Z" level=info msg="containerID docker-ca132c75cb328d11851cb4d817c2534c2a34aa4ac742bb3ffed56fd5d1bb6704 mapped to pid 44"
time="2020-10-07T12:17:47.893Z" level=info msg="pid 44: &{root 30 2147484141 {827879797 63737669867 0x308cd00} {1048687 302223069 1 16877 0 0 0 0 30 4096 0 {1602073067 473870237} {1602073067 827879797} {1602073067 829879851} [0 0 0]}}"
time="2020-10-07T12:17:47.893Z" level=info msg="Secured filehandle on /proc/44/root"
time="2020-10-07T12:17:47.944Z" level=info msg="pid 44: &{root 30 2147484141 {827879797 63737669867 0x308cd00} {1048687 302223069 1 16877 0 0 0 0 30 4096 0 {1602073067 941882875} {1602073067 827879797} {1602073067 829879851} [0 0 0]}}"
time="2020-10-07T12:17:47.995Z" level=info msg="pid 44: &{root 30 2147484141 {827879797 63737669867 0x308cd00} {1048687 302223069 1 16877 0 0 0 0 30 4096 0 {1602073067 941882875} {1602073067 827879797} {1602073067 829879851} [0 0 0]}}"
time="2020-10-07T12:17:48.046Z" level=info msg="pid 44: &{root 30 2147484141 {827879797 63737669867 0x308cd00} {1048687 302223069 1 16877 0 0 0 0 30 4096 0 {1602073067 941882875} {1602073067 827879797} {1602073067 829879851} [0 0 0]}}"
time="2020-10-07T12:17:48.097Z" level=info msg="pid 44: &{root 30 2147484141 {827879797 63737669867 0x308cd00} {1048687 302223069 1 16877 0 0 0 0 30 4096 0 {1602073067 941882875} {1602073067 827879797} {1602073067 829879851} [0 0 0]}}"
time="2020-10-07T12:17:48.111Z" level=info msg="main container started with container ID: ca132c75cb328d11851cb4d817c2534c2a34aa4ac742bb3ffed56fd5d1bb6704"
time="2020-10-07T12:17:48.111Z" level=info msg="Starting annotations monitor"
time="2020-10-07T12:17:48.148Z" level=info msg="pid 44: &{root 30 2147484141 {827879797 63737669867 0x308cd00} {1048687 302223069 1 16877 0 0 0 0 30 4096 0 {1602073067 941882875} {1602073067 827879797} {1602073067 829879851} [0 0 0]}}"
time="2020-10-07T12:17:48.156Z" level=info msg="Starting deadline monitor"
time="2020-10-07T12:17:48.156Z" level=info msg="pid 44: &{root 30 2147484141 {827879797 63737669867 0x308cd00} {1048687 302223069 1 16877 0 0 0 0 30 4096 0 {1602073067 941882875} {1602073067 827879797} {1602073067 829879851} [0 0 0]}}"
time="2020-10-07T12:17:48.156Z" level=warning msg="Failed to wait for container id 'ca132c75cb328d11851cb4d817c2534c2a34aa4ac742bb3ffed56fd5d1bb6704': Failed to determine pid for containerID ca132c75cb328d11851cb4d817c2534c2a34aa4ac742bb3ffed56fd5d1bb6704: container may have exited too quickly"
time="2020-10-07T12:17:48.156Z" level=error msg="executor error: Failed to determine pid for containerID ca132c75cb328d11851cb4d817c2534c2a34aa4ac742bb3ffed56fd5d1bb6704: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).getContainerPID\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:302\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:164\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait.func1\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:922\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:292\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:921\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:40\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:846\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:887\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"
time="2020-10-07T12:17:48.157Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2020-10-07T12:17:48.157Z" level=info msg="Capturing script exit code"
time="2020-10-07T12:17:48.157Z" level=info msg="Getting exit code of ca132c75cb328d11851cb4d817c2534c2a34aa4ac742bb3ffed56fd5d1bb6704"
time="2020-10-07T12:17:48.156Z" level=info msg="Annotations monitor stopped"
time="2020-10-07T12:17:48.160Z" level=info msg="No output parameters"
time="2020-10-07T12:17:48.160Z" level=info msg="Saving output artifacts"
time="2020-10-07T12:17:48.160Z" level=info msg="Staging artifact: save-artifact"
time="2020-10-07T12:17:48.160Z" level=info msg="Staging /files/artifact.file from mirrored volume mount /mainctrfs/files/artifact.file"
time="2020-10-07T12:17:48.160Z" level=info msg="Taring /mainctrfs/files/artifact.file"
time="2020-10-07T12:17:48.161Z" level=info msg="Successfully staged /files/artifact.file from mirrored volume mount /mainctrfs/files/artifact.file"
time="2020-10-07T12:17:48.161Z" level=info msg="S3 Save path: /tmp/argo/outputs/artifacts/save-artifact.tgz, key: testfile"
time="2020-10-07T12:17:48.161Z" level=info msg="Creating minio client minio:9000 using static credentials"
time="2020-10-07T12:17:48.161Z" level=info msg="Saving from /tmp/argo/outputs/artifacts/save-artifact.tgz to s3 (endpoint: minio:9000, bucket: output, key: testfile)"
time="2020-10-07T12:17:48.198Z" level=info msg="Successfully saved file: /tmp/argo/outputs/artifacts/save-artifact.tgz"
time="2020-10-07T12:17:48.198Z" level=info msg="Annotating pod with output"
time="2020-10-07T12:17:48.199Z" level=info msg="pid 44: &{root 30 2147484141 {827879797 63737669867 0x308cd00} {1048687 302223069 1 16877 0 0 0 0 30 4096 0 {1602073067 941882875} {1602073067 827879797} {1602073067 829879851} [0 0 0]}}"
time="2020-10-07T12:17:48.214Z" level=info msg="Killing sidecars"
time="2020-10-07T12:17:48.233Z" level=info msg="Alloc=7224 TotalAlloc=18328 Sys=70848 NumGC=5 Goroutines=12"
time="2020-10-07T12:17:48.249Z" level=info msg="pid 44: &{root 30 2147484141 {827879797 63737669867 0x308cd00} {1048687 302223069 1 16877 0 0 0 0 30 4096 0 {1602073067 941882875} {1602073067 827879797} {1602073067 829879851} [0 0 0]}}"
time="2020-10-07T12:17:48.250Z" level=fatal msg="Failed to determine pid for containerID ca132c75cb328d11851cb4d817c2534c2a34aa4ac742bb3ffed56fd5d1bb6704: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).getContainerPID\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:302\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:164\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait.func1\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:922\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:292\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:921\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:40\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:846\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:887\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"

Describe Pod

oc describe pod artifact-workflow-wp2fd-788754573
Name:               artifact-workflow-wp2fd-788754573
Namespace:          argo
Priority:           0
PriorityClassName:  <none>
Start Time:         Wed, 07 Oct 2020 14:17:40 +0200
Labels:             workflows.argoproj.io/completed=true
                    workflows.argoproj.io/workflow=artifact-workflow-wp2fd
Annotations:        openshift.io/scc=privileged
                    workflows.argoproj.io/node-message=Failed to determine pid for containerID ca132c75cb328d11851cb4d817c2534c2a34aa4ac742bb3ffed56fd5d1bb6704: container may have exited too quickly
                    workflows.argoproj.io/node-name=artifact-workflow-wp2fd[0].input-output
                    workflows.argoproj.io/outputs={"artifacts":[{"name":"save-artifact","path":"/files/artifact.file","s3":{"endpoint":"minio:9000","bucket":"output","insecure":true,"accessKeySecret":{"name":"minio","key...
                    workflows.argoproj.io/template={"name":"input-output","arguments":{},"inputs":{"artifacts":[{"name":"get-artifact","path":"/files/artifact.file","s3":{"endpoint":"minio:9000","bucket":"input","insecur...
Status:             Failed
IP:                 10.141.1.25
Controlled By:      Workflow/artifact-workflow-wp2fd
Init Containers:
  init:
    Container ID:  docker://b7242f8915d6e4085c2a2ab41362b76a03968052f7a7a07d0c8f0a37565f9d20
    Image:         argoproj/argoexec:v2.11.1
    Image ID:      docker-pullable://docker.io/argoproj/argoexec@sha256:574f8eb926820149bc98c4fc6b3c3b48ecdf6046f4f325643024e5684a0653a0
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      init
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 07 Oct 2020 14:17:43 +0200
      Finished:     Wed, 07 Oct 2020 14:17:43 +0200
    Ready:          True
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:                    artifact-workflow-wp2fd-788754573 (v1:metadata.name)
      ARGO_CONTAINER_RUNTIME_EXECUTOR:  pns
    Mounts:
      /argo/inputs/artifacts from input-artifacts (rw)
      /argo/podmetadata from podmetadata (rw)
      /argo/secret/minio from minio (ro)
      /mainctrfs/files from workdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from argo-workflow-token-wrlgv (ro)
Containers:
  wait:
    Container ID:  docker://576c5ff1bad9bd303783033d3de0a166002083a07c04d80e2bbb67d75330f10c
    Image:         argoproj/argoexec:v2.11.1
    Image ID:      docker-pullable://docker.io/argoproj/argoexec@sha256:574f8eb926820149bc98c4fc6b3c3b48ecdf6046f4f325643024e5684a0653a0
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      wait
    State:          Terminated
      Reason:       Error
      Message:      Failed to determine pid for containerID ca132c75cb328d11851cb4d817c2534c2a34aa4ac742bb3ffed56fd5d1bb6704: container may have exited too quickly
      Exit Code:    1
      Started:      Wed, 07 Oct 2020 14:17:45 +0200
      Finished:     Wed, 07 Oct 2020 14:17:48 +0200
    Ready:          False
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:                    artifact-workflow-wp2fd-788754573 (v1:metadata.name)
      ARGO_CONTAINER_RUNTIME_EXECUTOR:  pns
    Mounts:
      /argo/podmetadata from podmetadata (rw)
      /argo/secret/minio from minio (ro)
      /mainctrfs/files from workdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from argo-workflow-token-wrlgv (ro)
  main:
    Container ID:  docker://ca132c75cb328d11851cb4d817c2534c2a34aa4ac742bb3ffed56fd5d1bb6704
    Image:         docker/whalesay:latest
    Image ID:      docker-pullable://docker.io/docker/whalesay@sha256:178598e51a26abbc958b8a2e48825c90bc22e641de3d31e18aaf55f3258ba93b
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
    Args:
      sleep 60; cowsay Moved file to output
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 07 Oct 2020 14:17:47 +0200
      Finished:     Wed, 07 Oct 2020 14:18:47 +0200
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /files from workdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from argo-workflow-token-wrlgv (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  podmetadata:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  workdir:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  minio:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  minio
    Optional:    false
  input-artifacts:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  argo-workflow-token-wrlgv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  argo-workflow-token-wrlgv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>

Any ideas why the workaround is not working or how to fix the problem?
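For reference, the timestamps in the `oc describe pod` output above suggest that the "exited too quickly" wording is misleading in this case. The following is a minimal sketch that only does the arithmetic on those timestamps; the variable names are illustrative and nothing here is part of Argo.

```python
from datetime import datetime

# Format used by `oc describe pod`, e.g. "Wed, 07 Oct 2020 14:17:47 +0200".
FMT = "%a, %d %b %Y %H:%M:%S %z"

# Timestamps copied from the pod description above.
main_started = datetime.strptime("Wed, 07 Oct 2020 14:17:47 +0200", FMT)
main_finished = datetime.strptime("Wed, 07 Oct 2020 14:18:47 +0200", FMT)
wait_finished = datetime.strptime("Wed, 07 Oct 2020 14:17:48 +0200", FMT)

# How long the main container actually ran.
main_runtime = (main_finished - main_started).total_seconds()
# How long after main started the wait container terminated with the error.
wait_gave_up_after = (wait_finished - main_started).total_seconds()
# How long main kept running after the wait container had already failed.
main_still_running_for = (main_finished - wait_finished).total_seconds()

print(main_runtime, wait_gave_up_after, main_still_running_for)
```

So the main container ran for the full sleep, while the wait container reported the failure about one second after main started. That points at the PID lookup itself rather than at a short-lived container.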

@alexec
Contributor

alexec commented Oct 7, 2020

Duplicates #1256

@alexec
Contributor

alexec commented Oct 7, 2020

I'm confused. Does the node fail?

@iamcnx
Author

iamcnx commented Oct 8, 2020

Yes, it fails, and as a consequence the whole workflow is considered failed. But as I said, the file was copied, and even the whalesay command after the sleep executed correctly.

@alexec
Contributor

alexec commented Oct 8, 2020

I'm confused by your config:

shareProcessNamespace: true

Can you share your workflow-controller-configmap?

@iamcnx
Author

iamcnx commented Oct 8, 2020

This is an orphaned parameter from my first trial-and-error attempts with PNS, before I recognized that it is never translated to the Pod and is not documented in the Argo API. I just forgot to remove it. It doesn't affect the problem.

oc describe configmap argo-workflow-controller-configmap
Name:         argo-workflow-controller-configmap
Namespace:    argo
Labels:       app.kubernetes.io/managed-by=Helm
              chart=argo-0.12.1
              heritage=Helm
              release=argo
Annotations:  meta.helm.sh/release-name=argo
              meta.helm.sh/release-namespace=argo

Data
====
config:
----
containerRuntimeExecutor: pns

Events:  <none>

@iamcnx
Author

iamcnx commented Oct 8, 2020

pod.yaml of workflow pod

apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: privileged
    workflows.argoproj.io/node-message: 'Failed to determine pid for containerID 29b5c3161bfeaa0680ae119da42275486b61e275b33bb02dbfa48b263cce5058:
      container may have exited too quickly'
    workflows.argoproj.io/node-name: artifact-workflow-pzfsz[0].input-output
    workflows.argoproj.io/outputs: '{"artifacts":[{"name":"save-artifact","path":"/files/artifact.file","s3":{"endpoint":"minio:9000","bucket":"output","insecure":true,"accessKeySecret":{"name":"minio","key":"accesskey"},"secretKeySecret":{"name":"minio","key":"secretkey"},"key":"testfile"}}]}'
    workflows.argoproj.io/template: '{"name":"input-output","arguments":{},"inputs":{"artifacts":[{"name":"get-artifact","path":"/files/artifact.file","s3":{"endpoint":"minio:9000","bucket":"input","insecure":true,"accessKeySecret":{"name":"minio","key":"accesskey"},"secretKeySecret":{"name":"minio","key":"secretkey"},"key":"testfile"}}]},"outputs":{"artifacts":[{"name":"save-artifact","path":"/files/artifact.file","s3":{"endpoint":"minio:9000","bucket":"output","insecure":true,"accessKeySecret":{"name":"minio","key":"accesskey"},"secretKeySecret":{"name":"minio","key":"secretkey"},"key":"testfile"}}]},"metadata":{},"container":{"name":"whalesay","image":"docker/whalesay:latest","command":["sh","-c"],"args":["sleep
      60; cowsay Moved file to output"],"resources":{},"volumeMounts":[{"name":"workdir","mountPath":"/files"}]}}'
  creationTimestamp: 2020-10-08T16:24:14Z
  labels:
    workflows.argoproj.io/completed: "true"
    workflows.argoproj.io/workflow: artifact-workflow-pzfsz
  name: artifact-workflow-pzfsz-1999039533
  namespace: argo
  ownerReferences:
  - apiVersion: argoproj.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Workflow
    name: artifact-workflow-pzfsz
    uid: b447e17a-0982-11eb-bc9a-000c293cee32
  resourceVersion: "110495748"
  selfLink: /api/v1/namespaces/argo/pods/artifact-workflow-pzfsz-1999039533
  uid: b44d8fbf-0982-11eb-bc9a-000c293cee32
spec:
  containers:
  - command:
    - argoexec
    - wait
    env:
    - name: ARGO_POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: ARGO_CONTAINER_RUNTIME_EXECUTOR
      value: pns
    image: argoproj/argoexec:v2.11.1
    imagePullPolicy: IfNotPresent
    name: wait
    resources: {}
    securityContext:
      capabilities:
        add:
        - SYS_PTRACE
        - SYS_CHROOT
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /argo/podmetadata
      name: podmetadata
    - mountPath: /argo/secret/minio
      name: minio
      readOnly: true
    - mountPath: /mainctrfs/files
      name: workdir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: argo-workflow-token-wrlgv
      readOnly: true
  - args:
    - sleep 60; cowsay Moved file to output
    command:
    - sh
    - -c
    image: docker/whalesay:latest
    imagePullPolicy: Always
    name: main
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /files
      name: workdir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: argo-workflow-token-wrlgv
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: argo-workflow-dockercfg-q9grf
  initContainers:
  - command:
    - argoexec
    - init
    env:
    - name: ARGO_POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: ARGO_CONTAINER_RUNTIME_EXECUTOR
      value: pns
    image: argoproj/argoexec:v2.11.1
    imagePullPolicy: IfNotPresent
    name: init
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /argo/podmetadata
      name: podmetadata
    - mountPath: /argo/secret/minio
      name: minio
      readOnly: true
    - mountPath: /argo/inputs/artifacts
      name: input-artifacts
    - mountPath: /mainctrfs/files
      name: workdir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: argo-workflow-token-wrlgv
      readOnly: true
  nodeSelector:
    node-role.kubernetes.io/compute: "true"
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: argo-workflow
  serviceAccountName: argo-workflow
  shareProcessNamespace: true
  terminationGracePeriodSeconds: 30
  volumes:
  - downwardAPI:
      defaultMode: 420
      items:
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.annotations
        path: annotations
    name: podmetadata
  - emptyDir: {}
    name: workdir
  - name: minio
    secret:
      defaultMode: 420
      items:
      - key: accesskey
        path: accesskey
      - key: secretkey
        path: secretkey
      secretName: minio
  - emptyDir: {}
    name: input-artifacts
  - name: argo-workflow-token-wrlgv
    secret:
      defaultMode: 420
      secretName: argo-workflow-token-wrlgv
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2020-10-08T16:24:17Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2020-10-08T16:24:22Z
    message: 'containers with unready status: [wait main]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'containers with unready status: [wait main]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: 2020-10-08T16:24:14Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://29b5c3161bfeaa0680ae119da42275486b61e275b33bb02dbfa48b263cce5058
    image: docker.io/docker/whalesay:latest
    imageID: docker-pullable://docker.io/docker/whalesay@sha256:178598e51a26abbc958b8a2e48825c90bc22e641de3d31e18aaf55f3258ba93b
    lastState: {}
    name: main
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: docker://29b5c3161bfeaa0680ae119da42275486b61e275b33bb02dbfa48b263cce5058
        exitCode: 0
        finishedAt: 2020-10-08T16:25:20Z
        reason: Completed
        startedAt: 2020-10-08T16:24:20Z
  - containerID: docker://dd8cba39cf76562619d757e2b238186028917be902c8b453d41db453bd73d340
    image: docker.io/argoproj/argoexec:v2.11.1
    imageID: docker-pullable://docker.io/argoproj/argoexec@sha256:574f8eb926820149bc98c4fc6b3c3b48ecdf6046f4f325643024e5684a0653a0
    lastState: {}
    name: wait
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: docker://dd8cba39cf76562619d757e2b238186028917be902c8b453d41db453bd73d340
        exitCode: 1
        finishedAt: 2020-10-08T16:24:21Z
        message: 'Failed to determine pid for containerID 29b5c3161bfeaa0680ae119da42275486b61e275b33bb02dbfa48b263cce5058:
          container may have exited too quickly'
        reason: Error
        startedAt: 2020-10-08T16:24:18Z
  initContainerStatuses:
  - containerID: docker://f1170413aa906154d1cd4f98b0999d7aef4664e0de35f57b1889a0e15ce93640
    image: docker.io/argoproj/argoexec:v2.11.1
    imageID: docker-pullable://docker.io/argoproj/argoexec@sha256:574f8eb926820149bc98c4fc6b3c3b48ecdf6046f4f325643024e5684a0653a0
    lastState: {}
    name: init
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: docker://f1170413aa906154d1cd4f98b0999d7aef4664e0de35f57b1889a0e15ce93640
        exitCode: 0
        finishedAt: 2020-10-08T16:24:17Z
        reason: Completed
        startedAt: 2020-10-08T16:24:17Z
  phase: Failed
  podIP: 10.141.1.28
  qosClass: BestEffort
  startTime: 2020-10-08T16:24:14Z

@cy-zheng
Contributor

cy-zheng commented Oct 26, 2020

There is a strange docker- prefix before the container ID:

time="2020-10-07T12:17:47.842Z" level=info msg="containerID docker-ca132c75cb328d11851cb4d817c2534c2a34aa4ac742bb3ffed56fd5d1bb6704 mapped to pid 44"

time="2020-10-07T12:17:48.156Z" level=warning msg="Failed to wait for container id 'ca132c75cb328d11851cb4d817c2534c2a34aa4ac742bb3ffed56fd5d1bb6704': Failed to determine pid for containerID ca132c75cb328d11851cb4d817c2534c2a34aa4ac742bb3ffed56fd5d1bb6704: container may have exited too quickly"

@iamcnx can you show me the content of the file /proc/{container-main-process-pid}/cgroup inside the container? It seems the issue is related to #4302
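To make the suspected mismatch concrete: the first log line records the PID under `docker-<id>`, while the second looks it up by the bare `<id>`. Below is a minimal sketch of that failure mode and one possible normalization. It is not Argo's actual code; the dict `pid_by_container_id` and the helper `normalize_container_id` are hypothetical names used only for illustration.

```python
CONTAINER_ID = "ca132c75cb328d11851cb4d817c2534c2a34aa4ac742bb3ffed56fd5d1bb6704"


def normalize_container_id(raw: str) -> str:
    """Strip a systemd-style 'docker-' prefix and '.scope' suffix, if present."""
    if raw.endswith(".scope"):
        raw = raw[: -len(".scope")]
    if raw.startswith("docker-"):
        raw = raw[len("docker-"):]
    return raw


# Assumption: the PID map is keyed by whatever string was read from the cgroup
# path, which per the log line above includes the 'docker-' prefix.
pid_by_container_id = {"docker-" + CONTAINER_ID: 44}

# A lookup with the bare ID (as reported by Kubernetes) misses the entry.
lookup_without_fix = pid_by_container_id.get(CONTAINER_ID)

# Normalizing the keys first makes the same lookup succeed.
normalized = {normalize_container_id(k): v for k, v in pid_by_container_id.items()}
lookup_with_fix = normalized.get(CONTAINER_ID)

print(lookup_without_fix, lookup_with_fix)
```

If this is what happens, the executor would never find a PID for the main container, no matter how long it sleeps.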

@iamcnx
Author

iamcnx commented Oct 26, 2020

@cy-zheng It seems that you are right.

oc exec artifact-workflow-8hghb-3357359043 -c main -- ps -ef
UID         PID   PPID  C STIME TTY          TIME CMD
1001          1      0  0 15:12 ?        00:00:00 /usr/bin/pod
root         46      0  0 15:12 ?        00:00:00 sh -c sleep 120; cowsay Moved file to output
root         52     46  0 15:12 ?        00:00:00 sleep 120
root         54      0  0 15:13 ?        00:00:00 ps -ef
oc exec artifact-workflow-8hghb-3357359043 -c main -- cat /proc/46/cgroup
11:devices:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb1fc495b_179d_11eb_bc9a_000c293cee32.slice/docker-b60ef56f3a848675e6e3e8e6a3a0540bb47ca860b46ec27ad3b5c4fcadd19516.scope
10:pids:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb1fc495b_179d_11eb_bc9a_000c293cee32.slice/docker-b60ef56f3a848675e6e3e8e6a3a0540bb47ca860b46ec27ad3b5c4fcadd19516.scope
9:blkio:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb1fc495b_179d_11eb_bc9a_000c293cee32.slice/docker-b60ef56f3a848675e6e3e8e6a3a0540bb47ca860b46ec27ad3b5c4fcadd19516.scope
8:memory:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb1fc495b_179d_11eb_bc9a_000c293cee32.slice/docker-b60ef56f3a848675e6e3e8e6a3a0540bb47ca860b46ec27ad3b5c4fcadd19516.scope
7:cpuset:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb1fc495b_179d_11eb_bc9a_000c293cee32.slice/docker-b60ef56f3a848675e6e3e8e6a3a0540bb47ca860b46ec27ad3b5c4fcadd19516.scope
6:freezer:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb1fc495b_179d_11eb_bc9a_000c293cee32.slice/docker-b60ef56f3a848675e6e3e8e6a3a0540bb47ca860b46ec27ad3b5c4fcadd19516.scope
5:perf_event:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb1fc495b_179d_11eb_bc9a_000c293cee32.slice/docker-b60ef56f3a848675e6e3e8e6a3a0540bb47ca860b46ec27ad3b5c4fcadd19516.scope
4:net_prio,net_cls:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb1fc495b_179d_11eb_bc9a_000c293cee32.slice/docker-b60ef56f3a848675e6e3e8e6a3a0540bb47ca860b46ec27ad3b5c4fcadd19516.scope
3:hugetlb:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb1fc495b_179d_11eb_bc9a_000c293cee32.slice/docker-b60ef56f3a848675e6e3e8e6a3a0540bb47ca860b46ec27ad3b5c4fcadd19516.scope
2:cpuacct,cpu:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb1fc495b_179d_11eb_bc9a_000c293cee32.slice/docker-b60ef56f3a848675e6e3e8e6a3a0540bb47ca860b46ec27ad3b5c4fcadd19516.scope
1:name=systemd:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb1fc495b_179d_11eb_bc9a_000c293cee32.slice/docker-b60ef56f3a848675e6e3e8e6a3a0540bb47ca860b46ec27ad3b5c4fcadd19516.scope
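
For anyone reading along: every cgroup line above ends in `docker-<id>.scope`, so the container ID of PID 46 can be read from `/proc/46/cgroup`. Below is a minimal sketch of extracting it, assuming the systemd-style cgroup paths shown in this output. The function name and regex are illustrative only; this is not the PNS executor's actual implementation, and other container runtimes or cgroup drivers lay out these paths differently.

```python
import re

# Assumption: the ID is a 64-character hex string embedded as
# "docker-<id>.scope", as in the cgroup output pasted above.
CONTAINER_ID_RE = re.compile(r"docker-([0-9a-f]{64})\.scope")


def container_id_from_cgroup(cgroup_text):
    """Return the first container ID found in /proc/<pid>/cgroup text, or None."""
    for line in cgroup_text.splitlines():
        # Each line has the form "<hierarchy-id>:<controllers>:<path>".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = CONTAINER_ID_RE.search(parts[2])
        if match:
            return match.group(1)
    return None


# One line copied from the output above.
sample = (
    "1:name=systemd:/kubepods.slice/kubepods-besteffort.slice/"
    "kubepods-besteffort-podb1fc495b_179d_11eb_bc9a_000c293cee32.slice/"
    "docker-b60ef56f3a848675e6e3e8e6a3a0540bb47ca860b46ec27ad3b5c4fcadd19516.scope"
)
print(container_id_from_cgroup(sample))
```

If the ID printed here matches the `Container ID` reported for `main` by `oc describe pod`, the process-to-container mapping is at least visible from inside the pod.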

@alexec
Contributor

alexec commented Oct 26, 2020

I think this could be fixed coincidentally by #4253. Could you please try argoproj/argoexec:latest?

@iamcnx
Author

iamcnx commented Oct 26, 2020

Nice! It works. Thank you very much :)

Name:               artifact-workflow-tx8x8-3539101276
Namespace:          argo
Priority:           0
PriorityClassName:  <none>
Start Time:         Mon, 26 Oct 2020 18:11:34 +0100
Labels:             workflows.argoproj.io/completed=true
                    workflows.argoproj.io/workflow=artifact-workflow-tx8x8
Annotations:        openshift.io/scc=privileged
                    workflows.argoproj.io/node-name=artifact-workflow-tx8x8[0].input-output
                    workflows.argoproj.io/outputs={"artifacts":[{"name":"save-artifact","path":"/files/artifact.file","s3":{"endpoint":"minio:9000","bucket":"output","insecure":true,"accessKeySecret":{"name":"minio","key...
                    workflows.argoproj.io/template={"name":"input-output","arguments":{},"inputs":{"artifacts":[{"name":"get-artifact","path":"/files/artifact.file","s3":{"endpoint":"minio:9000","bucket":"input","insecur...
Status:             Succeeded
IP:                 10.141.1.104
Controlled By:      Workflow/artifact-workflow-tx8x8
Init Containers:
  init:
    Container ID:  docker://7966f26e6b1d50238b17c57875ce3a06a89f1137a01d343c202a65f0c6ca5066
    Image:         argoproj/argoexec:latest
    Image ID:      docker-pullable://docker.io/argoproj/argoexec@sha256:319cdb0b4c09b948cfe6ef081baedd351f86f170758474f2089c94ec2b1a0a25
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      init
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 26 Oct 2020 18:11:48 +0100
      Finished:     Mon, 26 Oct 2020 18:11:49 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:                    artifact-workflow-tx8x8-3539101276 (v1:metadata.name)
      ARGO_CONTAINER_RUNTIME_EXECUTOR:  pns
    Mounts:
      /argo/inputs/artifacts from input-artifacts (rw)
      /argo/podmetadata from podmetadata (rw)
      /argo/secret/minio from minio (ro)
      /mainctrfs/files from workdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from argo-workflow-sa-token-nttrj (ro)
Containers:
  wait:
    Container ID:  docker://2529a1c72b4b3802a3b8de5983961024e158f003d8070646054cc4bf06a3c40f
    Image:         argoproj/argoexec:latest
    Image ID:      docker-pullable://docker.io/argoproj/argoexec@sha256:319cdb0b4c09b948cfe6ef081baedd351f86f170758474f2089c94ec2b1a0a25
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      wait
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 26 Oct 2020 18:11:52 +0100
      Finished:     Mon, 26 Oct 2020 18:13:54 +0100
    Ready:          False
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:                    artifact-workflow-tx8x8-3539101276 (v1:metadata.name)
      ARGO_CONTAINER_RUNTIME_EXECUTOR:  pns
    Mounts:
      /argo/podmetadata from podmetadata (rw)
      /argo/secret/minio from minio (ro)
      /mainctrfs/files from workdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from argo-workflow-sa-token-nttrj (ro)
  main:
    Container ID:  docker://611c45c91105837aa758af83079d831c8c67be5965ab4a1581262d15cf944543
    Image:         docker/whalesay:latest
    Image ID:      docker-pullable://docker.io/docker/whalesay@sha256:178598e51a26abbc958b8a2e48825c90bc22e641de3d31e18aaf55f3258ba93b
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
    Args:
      sleep 120; cowsay Moved file to output
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 26 Oct 2020 18:11:53 +0100
      Finished:     Mon, 26 Oct 2020 18:13:53 +0100
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /files from workdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from argo-workflow-sa-token-nttrj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  podmetadata:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  workdir:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  minio:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  minio
    Optional:    false
  input-artifacts:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  argo-workflow-sa-token-nttrj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  argo-workflow-sa-token-nttrj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
