Common build errors
Why are there no build executors? Why are build executors offline/suspended and/or builds never start?
On our cluster-based infrastructure all build executors/agents/pods (except dedicated agents) are dynamically spun up. This usually takes a little while. Therefore, for a few seconds before the build starts, the build executor status panel might show something like
- "(pending—Waiting for next available executor)" or
- "my-agent-abcd123 is offline or suspended".
We've opened a ticket with the Jenkins project suggesting to speed up the provisioning process. Feel free to vote for this issue if it is important to you.
If such a message is shown for more than ~5 minutes, you can safely assume that something is wrong with the pod/container configuration. For example, a Docker image can't be pulled because there is a typo in its name or tag.
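For illustration, a pod template like the following sketch (with a deliberately misspelled image tag) would keep the agent in that pending/offline state indefinitely, because the image can never be pulled:
pipeline {
  agent {
    kubernetes {
      label 'my-agent-pod'
      yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: maven
    # "alpne" is a deliberate typo: this tag does not exist, so the pod never starts
    image: maven:alpne
    command:
    - cat
    tty: true
"""
    }
  }
  stages {
    stage('Build') {
      steps {
        container('maven') {
          sh 'mvn -version'
        }
      }
    }
  }
}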
A good indicator that your build was aborted due to an OutOfMemory error is the word "Killed" appearing in the console log.
First, please familiarize yourself with how Kubernetes assigns memory resources to containers and pods.
Then, you should know that, as soon as you run your build on a custom Kubernetes agent, Jenkins adds a container named "jnlp" that handles the connection between the agent pod and the master. The resources assigned to this "jnlp" container come from default values we set for you. Because we know the "jnlp" container does not need much CPU and memory, we set these defaults for all containers to low values (about 512MiB and 0.25 vCPU). This way, we make sure the "jnlp" container does not needlessly consume resources allocated to your project. The downside is that, if you don't explicitly specify resource requests and limits in your pod template, your custom containers will inherit those same defaults (which are probably too low for you). To overcome the issue, you need to specify those values in your pod template like:
pipeline {
  agent {
    kubernetes {
      label 'my-agent-pod'
      yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: maven
    image: maven:alpine
    command:
    - cat
    tty: true
    resources:
      limits:
        memory: "2Gi"
        cpu: "1"
      requests:
        memory: "2Gi"
        cpu: "1"
"""
    }
  }
  stages {
    stage('Run maven') {
      steps {
        container('maven') {
          sh 'mvn -version'
        }
      }
    }
  }
}
Note that if you run multiple containers, you need to specify the requests and limits for each of them, as in the sketch below.
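For example, a pod template with two containers (the container names and images below are only placeholders for your own) would set requests and limits on each container separately:
pipeline {
  agent {
    kubernetes {
      label 'my-agent-pod'
      yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: maven
    image: maven:alpine
    command:
    - cat
    tty: true
    resources:
      limits:
        memory: "2Gi"
        cpu: "1"
      requests:
        memory: "2Gi"
        cpu: "1"
  - name: node
    image: node:alpine
    command:
    - cat
    tty: true
    resources:
      limits:
        memory: "1Gi"
        cpu: "500m"
      requests:
        memory: "1Gi"
        cpu: "500m"
"""
    }
  }
  stages {
    stage('Build') {
      steps {
        container('maven') {
          sh 'mvn -version'
        }
        container('node') {
          sh 'node --version'
        }
      }
    }
  }
}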
We plan to develop tooling that automatically injects sensible default values for your custom containers, depending on the resource quotas and the concurrency level (i.e. how many agents can run at once) assigned to your project; see GitHub Issue #20.
You are most likely running UI tests that require a desktop environment and a VNC server. The default pod template (basic) does not provide such an environment, so you will need to use a different pod template or a custom Docker image.
The Ubuntu pod templates (labels: "ubuntu-2204", "ubuntu-2404", "ubuntu-latest") can be used for UI tests.
See https://wiki.eclipse.org/Jenkins#How_do_I_run_UI-tests_on_the_new_infra.3F on how the pod template can be used with freestyle or pipeline jobs.
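For a pipeline job, a minimal sketch that simply selects one of the Ubuntu labels listed above could look like the following (the build command is a placeholder; how the desktop/VNC environment is actually used is described on the wiki page):
pipeline {
  agent {
    label 'ubuntu-latest'
  }
  stages {
    stage('UI tests') {
      steps {
        // Placeholder command; run your UI test suite here
        sh 'mvn -B verify'
      }
    }
  }
}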
Your build is still running with a JDK < 8 (e.g. JDK 6). Since these are really old and unsupported JDKs, we urge you to switch to a more recent JDK, at least JDK 8. If that's not immediately possible, for whatever reason, you will need to use the following workaround:
Unset the environment variables "JAVA_TOOL_OPTIONS" and "_JAVA_OPTIONS" by creating two string build parameters with those names (for pipeline jobs, see the sketch after the list):
1. In the Jenkins job configuration select "This project is parameterized"
2. Add parameter -> String parameter
3. Set name "JAVA_TOOL_OPTIONS"
4. Leave the default value empty
5. Repeat 2.-4. for "_JAVA_OPTIONS"
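For pipeline jobs, a rough equivalent of this workaround (a sketch, not an officially supported recipe) is to override the two variables with empty values in an environment block:
pipeline {
  agent any
  environment {
    // Empty values override the globally injected options that break old JDKs
    JAVA_TOOL_OPTIONS = ''
    _JAVA_OPTIONS = ''
  }
  stages {
    stage('Build') {
      steps {
        sh 'java -version'
      }
    }
  }
}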
First, you need to know that we run containers using an arbitrarily assigned user ID (1000100000) in our OpenShift cluster. This is for security reasons.
Unfortunately, most of the images you can find on DockerHub (including official images) do not support running as an arbitrary user. In fact, most of them expect to run as root, which is definitely a bad practice. See also the question below: "How can I run my build in a container with root privileges?".
Moreover, some programs like ssh search for a mapping between the user ID (1000100000) and a user name on the system (here, the container). It is very rare for a container image to anticipate this need and actually create a user with ID 1000100000. To avoid this error, you need to customize the image. OpenShift publishes guidelines with best practices for creating Docker images. More specifically, see the section about supporting an arbitrary user ID.
In order to make your image call the uid_entrypoint as listed in the link above, you will need to add it to the command directive in the pod template, e.g.:
pipeline {
  agent {
    kubernetes {
      label 'my-pod'
      yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: custom-container
    image: 'custom/image'
    command: [ "/usr/local/bin/uid_entrypoint" ]
    args: [ "cat" ]
    tty: true
"""
    }
  }
  stages {
    stage('Stage 1') {
      steps {
        container('custom-container') {
          sh 'whoami'
        }
      }
    }
  }
}
If you want to see this in practice, have a look at some of the images we've defined to run in the cluster in this GitHub repository.
As long as you stay in the default jnlp Docker image (i.e. you use a Freestyle job or a Pipeline job without a custom pod template), you benefit from our existing configuration, which mounts a known_hosts file in the ~/.ssh folder of all containers.
If you define a custom pod template, you need to add some configuration to mount this config map in your containers. The only things you have to know are the name of the config map (known-hosts) and the proper mount location (/home/jenkins/.ssh):
pipeline {
  agent {
    kubernetes {
      label 'my-agent-pod'
      yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: maven
    image: maven:alpine
    command:
    - cat
    tty: true
    volumeMounts:
    - name: volume-known-hosts
      mountPath: /home/jenkins/.ssh
  volumes:
  - name: volume-known-hosts
    configMap:
      name: known-hosts
"""
    }
  }
  stages {
    ...
  }
}
Currently, the known_hosts file we provide has the host keys for the following sites:
- git.eclipse.org:22
- git.eclipse.org:29418
- build.eclipse.org
- github.com
If you need any other site to be added, feel free to open a HelpDesk issue.
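If you want to double-check which host keys are available inside your build container, a simple (illustrative) pipeline step is enough:
pipeline {
  agent any
  stages {
    stage('Inspect known_hosts') {
      steps {
        // Prints the host keys mounted from the known-hosts config map
        sh 'cat /home/jenkins/.ssh/known_hosts'
      }
    }
  }
}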
On our build cluster all builds are executed on dynamic or static build agents. The master instance has no executors and therefore cannot run any builds itself.
In the case of pipeline jobs, it might appear as if the master is building, but it only handles the workflow of the tasks defined in the pipeline and the post-build tasks, never the actual build steps.
The first thing to do when you think a build is slow is to compare it with several other runs over the course of a couple of days and put it into perspective. We consider a single build slow not only by comparing it to the average build time, but also by taking the standard deviation of the build time history into account.
You should keep in mind that when running on a clustered agent, CPU is always requested as an absolute quantity, never as a relative one: 0.1 is the same amount of CPU on a single-core, dual-core, or 48-core machine. Also, the CPU resources assigned to containers are absolute and not relative to the clock speed of the CPUs. The cluster currently has some machines with 3.70GHz CPU cores and others with 2.00GHz CPU cores, so depending on which node your jobs are scheduled to run on, performance may vary. We will eventually streamline the machines in the cluster and reserve higher-performance nodes for build jobs while keeping lower-performance ones for other tasks. But during the migration, and until the JIPP machines from the old infra are moved to the cluster, we need those less performant machines to keep up with the requested compute resources.
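As a reminder of the notation (the values are purely illustrative), CPU in a pod template is always written as an absolute quantity:
resources:
  requests:
    cpu: "500m"   # half a core; the same amount of CPU on any node
    memory: "1Gi"
  limits:
    cpu: "1"      # one full core
    memory: "1Gi"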
This could be related to JENKINS-47759 (See Bug 565044)
Instead of a pod label like my-agent-pod, consider using a label like my-agent-pod + '-' + env.BUILD_NUMBER.
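A minimal sketch of this in a declarative pipeline (pod and container names are placeholders):
pipeline {
  agent {
    kubernetes {
      // Appending the build number makes every run use a fresh pod definition
      label 'my-agent-pod' + '-' + env.BUILD_NUMBER
      yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: maven
    image: maven:alpine
    command:
    - cat
    tty: true
"""
    }
  }
  stages {
    stage('Build') {
      steps {
        container('maven') {
          sh 'mvn -version'
        }
      }
    }
  }
}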
On our new cluster-based infrastructure, all builds run in dynamically created docker containers. After the build, the containers are destroyed and the workspace vanishes.
Build artifacts can be archived (as before) by using the "archive artifacts" post-build action. If you need to access log files, etc. (for debugging purposes), you can archive them as well.
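In a pipeline job, the equivalent is the archiveArtifacts step, for example in a post block (the file pattern below is only an example):
pipeline {
  agent any
  stages {
    stage('Build') {
      steps {
        sh 'mvn -B verify'
      }
    }
  }
  post {
    always {
      // Keep logs so they survive the destruction of the build container
      archiveArtifacts artifacts: 'target/**/*.log', allowEmptyArchive: true
    }
  }
}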
We understand that this can be inconvenient, especially when debugging a failing build. One possible workaround is to use "sleep" in a shell build step (e.g. for 10min) at the end, to be able to access the workspace after the build.
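A sketch of that workaround in a pipeline (the duration is arbitrary):
pipeline {
  agent any
  stages {
    stage('Build') {
      steps {
        sh 'mvn -B verify'
        // Keep the agent pod alive for 10 minutes so the workspace can still be inspected
        sh 'sleep 10m'
      }
    }
  }
}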