Added quotes to the Agent command to prevent defaulting to the sleep #768

Open
wants to merge 6 commits into
base: main

papi83dm
Contributor

@papi83dm papi83dm commented Dec 8, 2022

What this PR does / why we need it

This PR removes the sleep command from the agent PodTemplate when the command value is null.
Quoting the value in the template prevents the sleep command from being added, so the command defaults to an empty string instead.
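
A minimal sketch of the values this change is meant to allow, assuming the chart exposes the agent container command under `agent.command` and `agent.args` (these key names are an assumption here; they are not quoted in this PR description):

```yaml
agent:
  # Assumed key: an explicitly empty string, intended to be rendered as
  # command: "" in the pod template rather than left null (which ends up
  # as "sleep" in the pod template, per the description above).
  command: ""
  # Assumed key: left empty as well, as in the container templates
  # shown later in this thread.
  args: ""
```

With no command set, Kubernetes should fall back to the image's own ENTRYPOINT, so this only helps when the image starts the agent by itself.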

Which issue this PR fixes

(optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged)

  • fixes #

Special notes for your reviewer

Checklist

  • DCO signed
  • Chart Version bumped
  • CHANGELOG.md was updated

…command when the command variable was null.

Signed-off-by: papi83dm <[email protected]>
Signed-off-by: papi83dm <[email protected]>
Conflicts:
	charts/jenkins/CHANGELOG.md

Signed-off-by: papi83dm <[email protected]>
Conflicts:
	charts/jenkins/CHANGELOG.md
	charts/jenkins/Chart.yaml

Signed-off-by: papi83dm <[email protected]>
@papi83dm papi83dm requested a review from a team as a code owner December 8, 2022 16:15
Signed-off-by: papi83dm <[email protected]>
@papi83dm
Contributor Author

papi83dm commented Dec 8, 2022

@timja @torstenwalter How do I submit this PR for review? Does it get submitted automatically?

@torstenwalter
Member

What's the benefit of this? If you are executing nothing the agent will stop before it can execute any CI job which you want to run on it.

@papi83dm
Contributor Author

papi83dm commented Dec 8, 2022

What's the benefit of this? If you are executing nothing the agent will stop before it can execute any CI job which you want to run on it.

For some reason the agent doesn't run for me: it dies, and Jenkins goes into a loop of creating another agent that dies as well. I have to manually go into the pod template in the Jenkins GUI and remove the sleep command.

On the job it shows this

14:34:01  Still waiting to schedule task
14:34:01  Waiting for next available executor

Jenkins Logs

2022-12-08 19:34:16.049+0000 [id=2864]  INFO    o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent jenkins-agent-sxvl1
2022-12-08 19:34:16.058+0000 [id=77]    INFO    o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent <mynamespace>/jenkins-agent-sxvl1
2022-12-08 19:34:16.059+0000 [id=77]    INFO    o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer jenkins-agent-sxvl1
2022-12-08 19:34:16.061+0000 [id=2864]  SEVERE  o.c.j.p.k.KubernetesSlave#_terminate: Computer for agent is null: jenkins-agent-sxvl1
2022-12-08 19:34:34.316+0000 [id=34]    INFO    hudson.slaves.NodeProvisioner#update: jenkins-agent-w7rkg provisioning successfully completed. We have now 2 computer(s)
2022-12-08 19:34:34.342+0000 [id=2864]  INFO    o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes <mynamespace>/jenkins-agent-w7rkg
2022-12-08 19:34:35.054+0000 [id=77]    INFO    o.c.j.p.k.p.r.Reaper$TerminateAgentOnContainerTerminated#lambda$onEvent$1: <mynamespace>/jenkins-agent-w7rkg Container jnlp was just terminated, so removing the corresponding Jenkins agent
2022-12-08 19:34:35.062+0000 [id=77]    INFO    o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent jenkins-agent-w7rkg
2022-12-08 19:34:35.078+0000 [id=77]    INFO    o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent <mynamespace>/jenkins-agent-w7rkg
2022-12-08 19:34:35.079+0000 [id=77]    INFO    o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer jenkins-agent-w7rkg
2022-12-08 19:34:54.317+0000 [id=42]    INFO    hudson.slaves.NodeProvisioner#update: jenkins-agent-832xw provisioning successfully completed. We have now 2 computer(s)
2022-12-08 19:34:54.334+0000 [id=2947]  INFO    o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes <mynamespace>/jenkins-agent-832xw
2022-12-08 19:34:55.111+0000 [id=77]    INFO    o.c.j.p.k.p.r.Reaper$TerminateAgentOnContainerTerminated#lambda$onEvent$1: <mynamespace>/jenkins-agent-832xw Container jnlp was just terminated, so removing the corresponding Jenkins agent
2022-12-08 19:34:55.121+0000 [id=77]    INFO    o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent jenkins-agent-832xw
2022-12-08 19:34:55.137+0000 [id=77]    INFO    o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent <mynamespace>/jenkins-agent-832xw
2022-12-08 19:34:55.138+0000 [id=77]    INFO    o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer jenkins-agent-832xw
2022-12-08 19:35:14.318+0000 [id=38]    INFO    hudson.slaves.NodeProvisioner#update: jenkins-agent-jdn17 provisioning successfully completed. We have now 2 computer(s)
2022-12-08 19:35:14.343+0000 [id=2978]  INFO    o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes <mynamespace>/jenkins-agent-jdn17
2022-12-08 19:35:15.152+0000 [id=77]    INFO    o.c.j.p.k.p.r.Reaper$TerminateAgentOnContainerTerminated#lambda$onEvent$1: <mynamespace>/jenkins-agent-jdn17 Container jnlp was just terminated, so removing the corresponding Jenkins agent
2022-12-08 19:35:15.162+0000 [id=77]    INFO    o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent jenkins-agent-jdn17
2022-12-08 19:35:15.178+0000 [id=77]    INFO    o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent <mynamespace>/jenkins-agent-jdn17
2022-12-08 19:35:15.179+0000 [id=77]    INFO    o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer jenkins-agent-jdn17

k8s event logs

lled                   pod/jenkins-agent-72wnv                               Container image "<myecrrepo>/docker-jenkins-agent:1.0.0-0" already present on machine
99s         Normal   Scheduled                pod/jenkins-agent-72wnv                               Successfully assigned <mynamespace>/jenkins-agent-72wnv to ip-10-99-65-79.ec2.internal
99s         Normal   Created                  pod/jenkins-agent-72wnv                               Created container jnlp
79s         Normal   Started                  pod/jenkins-agent-rfscj                               Started container jnlp
79s         Normal   Pulled                   pod/jenkins-agent-rfscj                               Container image "<myecrrepo>/docker-jenkins-agent:1.0.0-0" already present on machine
79s         Normal   Scheduled                pod/jenkins-agent-rfscj                               Successfully assigned <mynamespace>/jenkins-agent-rfscj to ip-10-99-65-79.ec2.internal
79s         Normal   Created                  pod/jenkins-agent-rfscj                               Created container jnlp
59s         Normal   Scheduled                pod/jenkins-agent-pkmmf                               Successfully assigned <mynamespace>/jenkins-agent-pkmmf to ip-10-99-65-79.ec2.internal
59s         Normal   Started                  pod/jenkins-agent-pkmmf                               Started container jnlp
59s         Normal   Pulled                   pod/jenkins-agent-pkmmf                               Container image "<myecrrepo>/docker-jenkins-agent:1.0.0-0" already present on machine
59s         Normal   Created                  pod/jenkins-agent-pkmmf                               Created container jnlp
39s         Normal   Scheduled                pod/jenkins-agent-5fj67                               Successfully assigned <mynamespace>/jenkins-agent-5fj67 to ip-10-99-65-79.ec2.internal
39s         Normal   Pulled                   pod/jenkins-agent-5fj67                               Container image "<myecrrepo>/docker-jenkins-agent:1.0.0-0" already present on machine
39s         Normal   Created                  pod/jenkins-agent-5fj67                               Created container jnlp
39s         Normal   Started                  pod/jenkins-agent-5fj67                               Started container jnlp
19s         Normal   Started                  pod/jenkins-agent-697js                               Started container jnlp
19s         Normal   Scheduled                pod/jenkins-agent-697js                               Successfully assigned <mynamespace>/jenkins-agent-697js to ip-10-99-65-79.ec2.internal
19s         Normal   Pulled                   pod/jenkins-agent-697js                               Container image "<myecrrepo>/docker-jenkins-agent:1.0.0-0" already present on machine
19s         Normal   Created                  pod/jenkins-agent-697js                               Created container jnlp

@torstenwalter
Member

Could it be that the image you are using does not have a sleep command? I am not sure what is used as default if this is empty.
You could try to figure that out by removing the command in the UI and starting an agent. You can then check its pod in Kubernetes.

Just guessing but it could be a cat or something similar which runs forever?

@papi83dm
Contributor Author

papi83dm commented Dec 8, 2022

Could it be that the image you are using does not have a sleep command? I am not sure what is used as default if this is empty. You could try to figure that out by removing the command in the UI and starting an agent. You can then check its pod in Kubernetes.

Just guessing but it could be a cat or something similar which runs forever?

I'm not sure why the k8s plugin has to default it to sleep when it is null.

@torstenwalter
Member

I am not sure if there was a change in the plugin, but I remember the situation where pods were being killed if a command was specified that exited right away. It's a bit hard to speculate here without knowing which image you are using and which CMD / ENTRYPOINT it specifies.

@papi83dm
Contributor Author

papi83dm commented Dec 8, 2022

Currently the Helm chart doesn't let you remove the default command when you configure an agent, and this PR does that.

Another issue I found: we limit the number of concurrent agents that can run, and there isn't an option to set the instanceCap for the agents.

@papi83dm
Contributor Author

papi83dm commented Dec 8, 2022

@torstenwalter this is the image I'm using: jenkins/inbound-agent:4.11-1-alpine-jdk11. None of my pipelines run until I remove the sleep command from the podTemplate configuration.

https://i.imgur.com/R0xWgaj.png

@torstenwalter
Member

@timja are you aware of any change which defaults to sleep?

@papi83dm
Contributor Author

papi83dm commented Dec 8, 2022

This default seems to exist for quite a while:

https://github.com/jenkinsci/kubernetes-plugin/blame/be6dd58c2a6d53411d2e6388f4483672c989fd11/src/main/resources/org/csanchez/jenkins/plugins/kubernetes/ContainerTemplate/config.jelly#L26

Correct, but shouldn't a user have the option to override it from the Helm chart?

@torstenwalter
Member

That's fine with me. I just want to avoid any negative impact where previously the default value was used and it breaks for users if we set it to empty.

@papi83dm
Contributor Author

papi83dm commented Dec 9, 2022

That's fine with me. I just want to avoid any negative impact where previously the default value was used and it breaks for users if we set it to empty.

I completely understand. I wonder whether this is related to just my EKS setup or whether others are experiencing the same.

I tried it with these 4 images and I still have the same behavior.

jenkins/inbound-agent:4.11-1-alpine-jdk11
jenkins/inbound-agent:4.13.3-1-alpine-jdk11
jenkins/inbound-agent:4.13.3-1-jdk11
jenkins/inbound-agent:4.11.2-4

EKS Version: 1.24
kubelet version: v1.24.6-eks-4360b32

@timja
Member

timja commented Dec 9, 2022

What version of Jenkins are you running? Those agent images are quite old.

@papi83dm
Contributor Author

papi83dm commented Dec 9, 2022

I'm using the latest LTS jenkins/jenkins:2.375.1-lts-alpine

@timja
Member

timja commented Dec 9, 2022

@papi83dm
Contributor Author

papi83dm commented Dec 9, 2022

Same error

2022-12-09 14:42:58.181+0000 [id=39]    INFO    hudson.slaves.NodeProvisioner#update: jenkins-agent-f103c provisioning successfully completed. We have now 2 computer(s)
2022-12-09 14:42:58.258+0000 [id=2216]  INFO    o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes <mynamespace>/jenkins-agent-f103c
2022-12-09 14:43:03.703+0000 [id=2216]  INFO    o.c.j.p.k.KubernetesLauncher#launch: Pod is running: kubernetes <mynamespace>/jenkins-agent-f103c
2022-12-09 14:43:05.716+0000 [id=112]   INFO    o.c.j.p.k.p.r.Reaper$TerminateAgentOnContainerTerminated#lambda$onEvent$1: <mynamespace>/jenkins-agent-f103c Container jnlp was just terminated, so removing the corresponding Jenkins agent
2022-12-09 14:43:05.720+0000 [id=2216]  INFO    o.c.j.p.k.KubernetesLauncher#launch: Container is terminated jenkins-agent-f103c [jnlp]: ContainerStateTerminated(containerID=containerd://7a7bbb3cbbac7b589f0030f84e5514198cb99f9c418bf56c527cc8c1aeb76c9e, exitCode=1, finishedAt=2022-12-09T14:43:03Z, message=null, reason=Error, signal=null, startedAt=2022-12-09T14:43:03Z, additionalProperties={})
2022-12-09 14:43:05.730+0000 [id=112]   INFO    o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent jenkins-agent-f103c
2022-12-09 14:43:05.735+0000 [id=2216]  SEVERE  o.c.j.p.k.KubernetesLauncher#logLastLines: Error in provisioning; agent=KubernetesSlave name: jenkins-agent-f103c, template=PodTemplate{id='cc80514b1e750f46b34910c3c5607eecede41157c020a135916f5edba7d9b750', name='jenkins-agent', namespace='<mynamespace>', slaveConnectTimeout=100, idleMinutes=1, label='helmchart-jenkins-agent jenkins-agent', serviceAccount='default', nodeSelector='namespace=<mynamespace>,node_type=controller', nodeUsageMode=NORMAL, podRetention='Never', containers=[ContainerTemplate{name='jnlp', image='jenkins/inbound-agent:3077.vd69cf116da_6f-3-jdk11', workingDir='/home/jenkins/agent', command='sleep', args='', resourceRequestCpu='100m', resourceRequestMemory='256Mi', resourceRequestEphemeralStorage='', resourceLimitCpu='200m', resourceLimitMemory='512Mi', resourceLimitEphemeralStorage='', envVars=[KeyValueEnvVar [getValue()=http://helmchart.<mynamespace>.svc.cluster.local:8080/, getKey()=JENKINS_URL]], livenessProbe=ContainerLivenessProbe{execArgs='', timeoutSeconds=0, initialDelaySeconds=0, failureThreshold=0, periodSeconds=0, successThreshold=0}}]}. 
Container jnlp exited with error 1. Logs: sleep: missing operand
Try 'sleep --help' for more information.
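
The provisioning error above reports `command='sleep'` and `args=''` for the jnlp container, so the container runs `sleep` with no duration and exits with code 1. Written out in the shape of a pod template container entry, that effective configuration looks roughly like this (a sketch reconstructed from the values in the log line, not the chart's actual rendered output):

```yaml
containers:
  - name: jnlp
    image: jenkins/inbound-agent:3077.vd69cf116da_6f-3-jdk11
    workingDir: /home/jenkins/agent
    # Values as reported in the log: "sleep" with empty args means
    # the container runs `sleep` without an operand and exits at once.
    command: "sleep"
    args: ""
    resourceRequestCpu: "100m"
    resourceRequestMemory: "256Mi"
```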

@papi83dm
Contributor Author

papi83dm commented Dec 9, 2022

As a workaround I ended up configuring my agents under the agent.podTemplates section, since that part of the configuration gives me full control over the pod template.

agent:
    enabled: true
    podName: "donotuse-agent-cannot-remove-sleep-command"
    image: "jenkins/inbound-agent"
    tag: "4.11.2-4"
    showRawYaml: false
    nodeUsageMode: "EXCLUSIVE"
    customJenkinsLabels:
      - donotuse-agent-cannot-remove-sleep-command

    podTemplates:
      customagent: |
        - name: customagent
          label: customagent
          showRawYaml: false
          serviceAccount: customagent-sa
          namespace: ${NAMESPACE}
          idleMinutes: 30
          instanceCap: 10
          nodeSelector: "node_type=agents,namespace=${NAMESPACE}"
          containers:
            - name: jnlp
              image: mycustomimage
              command: ""
              args: "" 
              resourceRequestCpu: "100m"
              resourceRequestMemory: "256Mi" 
        - name: otheragent
          label: otheragent
          showRawYaml: false
          serviceAccount: otheragent-sa
          namespace: ${NAMESPACE}
          idleMinutes: 1
          instanceCap: 5
          nodeSelector: "node_type=agents,namespace=${NAMESPACE}"
          containers:
            - name: jnlp
              image: customimageother
              command: ""
              args: "" 
              resourceRequestCpu: "100m"
              resourceRequestMemory: "256Mi"       
