
Jenkins trying to spin up multiple slaves at the same time #2384

Closed
jfchevrette opened this issue Feb 27, 2018 · 10 comments

@jfchevrette
Contributor

Just observed a namespace that had 6 OpenShift builds in the 'Running' state, and Jenkins was trying to start multiple slave pods at once. Because slaves are started with a PV mount, only one can run at a time.

Once I cancelled all the builds and made sure Jenkins had stopped trying to start slaves, I triggered one build and it completed just fine.

Is there something in Jenkins that ensures builds are queued, especially builds coming from multiple different BuildConfigs/apps?

NAME                            READY     STATUS              RESTARTS   AGE
po/content-repository-1-prf7p   1/1       Running             0          2h
po/jenkins-1-fbkfc              1/1       Running             0          2h
po/jenkins-slave-8nsk8-xg3gx    0/2       DeadlineExceeded    0          1h
po/jenkins-slave-gwf08-k1rj4    2/2       Running             0          42m
po/jenkins-slave-v5jtx-x899b    0/2       ContainerCreating   0          1m
NAME                       TYPE              FROM         STATUS      STARTED             DURATION
builds/testquickstart1-1   JenkinsPipeline   Git          Running   4 hours ago         
builds/testquickstart2-1   JenkinsPipeline   Git          Running   4 hours ago         
builds/testquickstart3-1   JenkinsPipeline   Git          Running   4 hours ago         
builds/testquickstart4-1   JenkinsPipeline   Git          Running   4 hours ago         
builds/testquickstart5-1   JenkinsPipeline   Git          Running   3 hours ago         
builds/testquickstart6-1   JenkinsPipeline   Git          Running   3 hours ago         

NAME                 TYPE              FROM         LATEST
bc/testquickstart1   JenkinsPipeline   Git@master   1
bc/testquickstart2   JenkinsPipeline   Git@master   1
bc/testquickstart3   JenkinsPipeline   Git@master   1
bc/testquickstart4   JenkinsPipeline   Git@master   1
bc/testquickstart5   JenkinsPipeline   Git@master   1
bc/testquickstart6   JenkinsPipeline   Git@master   1
@jaseemabid
Contributor

@jfchevrette I'm starting to look into this and I could use some help understanding it.

Because slaves are started with a PV mount, only one can run at a time.

  1. Where is this configured (in config or code)? Where can I read a bit more about this?
  2. Does this imply this is happening on the new 2a cluster with Gluster-backed storage?

Is there something in Jenkins that ensures builds are queued, especially builds coming from multiple different BuildConfigs/apps?

Nothing that I know of, but I'll take a look and comment here if I find anything.
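
For reference while digging into question 1, a minimal sketch of the kind of definition I'm looking for. I assume it lives in the Jenkins DeploymentConfig (e.g. via oc get dc jenkins -o yaml); the fragment and names below are illustrative, not the actual tenant template:

    # Illustrative DeploymentConfig fragment: jenkins-home is a PVC mounted
    # read-write at /var/lib/jenkins by the Jenkins master container.
    spec:
      template:
        spec:
          containers:
            - name: jenkins
              volumeMounts:
                - mountPath: /var/lib/jenkins
                  name: jenkins-home
          volumes:
            - name: jenkins-home
              persistentVolumeClaim:
                claimName: jenkins-home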

@jaseemabid
Contributor

jaseemabid commented Mar 15, 2018

These are the volumes I see when I run 2 concurrent builds. Which of them will cause issues when shared?

[screenshot "vols": volumes seen while running 2 concurrent builds]

@jaseemabid
Contributor

Among the few volumes I found on the Jenkins DC, this is the only volume that's mounted in read-write mode (I'm assuming the default is read-only). Is this the one that cannot be shared?

        - mountPath: /var/lib/jenkins
          name: jenkins-home
          readOnly: false
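
For context on why a shared read-write mount is a problem: I assume the claim backing jenkins-home is ReadWriteOnce, so only one pod can have it mounted read-write at a time. A minimal sketch of such a claim (name and size are illustrative):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: jenkins-home
    spec:
      accessModes:
        - ReadWriteOnce   # the volume can be mounted read-write by a single node
      resources:
        requests:
          storage: 1Gi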

@jaseemabid
Contributor

@pradeepto

@jfchevrette @pbergene Jaseem needs your input on this issue. Thanks.

@jfchevrette
Contributor Author

@jaseemabid That is correct: the PV mounted at /var/lib/jenkins cannot be shared between multiple pods. This is a limitation of our current architecture and storage backend.

The slave pods are launched by Jenkins; that's all I know. However, as you pointed out, unlike the Jenkins master, the slave pods don't mount a PV.
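
A minimal sketch of what the slave pods get instead, assuming the stock Kubernetes plugin behaviour of an ephemeral workspace volume (paths and names below are illustrative):

    # Fragment of a slave pod spec: the workspace is an emptyDir, created
    # with the pod and discarded when the pod is cleaned up, so no PV is
    # shared between concurrent slaves.
    spec:
      containers:
        - name: jnlp
          volumeMounts:
            - mountPath: /home/jenkins
              name: workspace-volume
      volumes:
        - name: workspace-volume
          emptyDir: {}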

@rupalibehera
Collaborator

rupalibehera commented Apr 9, 2018

Below are a few observations on this issue after running multiple builds simultaneously:

  1. The Jenkins slave pod does not mount the /var/lib/jenkins PV; the slave mounts some temporary volumes which are removed when the pod is cleaned up. The slave pod is cleaned up after the job is completed.
  2. If you check the Jenkins logs or UI, you can see that the jobs are queued; the jobs are basically waiting for slaves to be provisioned. If there are enough resources, the slaves come up; otherwise the jobs wait in the queue.
  3. When the namespace goes out of quota, it stops creating any other container required by a pod. In the current situation it tries to create a container for the S2I build, but that is cancelled due to insufficient quota because two builds are running simultaneously.
  4. One solution I can think of is reducing the container cap, as we do on GKE, where we have also faced resource issues, e.g. <instanceCap>1</instanceCap> (https://github.com/fabric8-services/fabric8-tenant-jenkins/blob/master/apps/jenkins/src/main/fabric8/openshift-cm.yml#L322). With this, only one job is triggered at a time and all other jobs are queued, which avoids build failures due to timeouts. But in that case we need to notify the user, otherwise the user waits a good amount of time. (See the sketch after this list.)
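
A rough sketch of where such a cap would live, assuming it goes into the Kubernetes plugin cloud definition embedded in the tenant Jenkins config map (the ConfigMap name and XML structure here are illustrative, not copied verbatim from openshift-cm.yml):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: jenkins            # illustrative name
    data:
      config.xml: |
        <clouds>
          <org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud>
            <name>openshift</name>
            <!-- total number of slave pods this cloud may run at once -->
            <containerCap>1</containerCap>
            <templates>
              <org.csanchez.jenkins.plugins.kubernetes.PodTemplate>
                <name>maven</name>  <!-- illustrative template name -->
                <!-- concurrent pods allowed from this template -->
                <instanceCap>1</instanceCap>
              </org.csanchez.jenkins.plugins.kubernetes.PodTemplate>
            </templates>
          </org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud>
        </clouds>

With a cap of 1, extra jobs simply sit in the Jenkins queue instead of racing each other for pods, which is why notifying the user about the wait matters.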

@krishnapaparaju
Collaborator

As discussed with @rupalibehera, I guess this issue will be taken care of if @hrishin has a fix for #2729.

@rohanKanojia
Collaborator

linked to #2729

@maxandersen
Collaborator

Closing this as it's a duplicate of / directly related to #2729.
