This repository has been archived by the owner on Jul 23, 2020. It is now read-only.

If a user has multiple builds running at the same time, one or both will fail ("Unable to build the image using the OpenShift build service") #2729

Closed
ldimaggi opened this issue Mar 22, 2018 · 43 comments · Fixed by fabric8io/openshift-jenkins-s2i-config#172 or fabric8-services/fabric8-tenant-jenkins#98 · May be fixed by fabric8io/kubernetes-plugin#6

Comments

@ldimaggi
Collaborator

ldimaggi commented Mar 22, 2018

Steps to recreate:

  • Login, create a new space
  • Create a new quickstart, logout
  • Login, create a second space
  • Create a new quickstart

One or both of the pipelines will fail with this error:

EXITCODE   0[ERROR] F8: Failed to execute the build [Unable to build the image using the OpenShift build service]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:38 min
[INFO] Finished at: 2018-03-22T15:17:00+00:00
[INFO] Final Memory: 36M/53M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal io.fabric8:fabric8-maven-plugin:3.5.38:build (fmp) on project testmar221521731018217: Failed to execute the build: Unable to build the image using the OpenShift build service: An error has occurred. timeout: Socket closed -> [Help 1]
@ldimaggi
Collaborator Author


org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal io.fabric8:fabric8-maven-plugin:3.5.38:build (fmp) on project testmar221521731018217: Failed to execute the build
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.MojoExecutionException: Failed to execute the build
	at io.fabric8.maven.plugin.mojo.build.BuildMojo.buildAndTag(BuildMojo.java:270)
	at io.fabric8.maven.docker.BuildMojo.executeInternal(BuildMojo.java:44)
	at io.fabric8.maven.plugin.mojo.build.BuildMojo.executeInternal(BuildMojo.java:228)
	at io.fabric8.maven.docker.AbstractDockerMojo.execute(AbstractDockerMojo.java:223)
	at io.fabric8.maven.plugin.mojo.build.BuildMojo.execute(BuildMojo.java:199)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
	... 20 more
Caused by: io.fabric8.maven.core.service.Fabric8ServiceException: Unable to build the image using the OpenShift build service
	at io.fabric8.maven.core.service.openshift.OpenshiftBuildService.build(OpenshiftBuildService.java:121)
	at io.fabric8.maven.plugin.mojo.build.BuildMojo.buildAndTag(BuildMojo.java:267)
	... 26 more
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:53)
	at io.fabric8.openshift.client.dsl.internal.BuildConfigOperationsImpl.fromInputStream(BuildConfigOperationsImpl.java:276)
	at io.fabric8.openshift.client.dsl.internal.BuildConfigOperationsImpl.fromFile(BuildConfigOperationsImpl.java:231)
	at io.fabric8.openshift.client.dsl.internal.BuildConfigOperationsImpl.fromFile(BuildConfigOperationsImpl.java:68)
	at io.fabric8.maven.core.service.openshift.OpenshiftBuildService.startBuild(OpenshiftBuildService.java:361)
	at io.fabric8.maven.core.service.openshift.OpenshiftBuildService.build(OpenshiftBuildService.java:111)
	... 27 more
Caused by: java.net.SocketTimeoutException: timeout
	at okio.Okio$4.newTimeoutException(Okio.java:230)
	at okio.AsyncTimeout.exit(AsyncTimeout.java:285)
	at okio.AsyncTimeout$2.read(AsyncTimeout.java:241)
	at okio.RealBufferedSource.indexOf(RealBufferedSource.java:345)
	at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:217)
	at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:211)
	at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:187)
	at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:61)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:125)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.openshift.client.internal.OpenShiftOAuthInterceptor.intercept(OpenShiftOAuthInterceptor.java:66)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:200)
	at okhttp3.RealCall.execute(RealCall.java:77)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:377)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:359)
	at io.fabric8.openshift.client.dsl.internal.BuildConfigOperationsImpl.fromInputStream(BuildConfigOperationsImpl.java:274)
	... 31 more
Caused by: java.net.SocketException: Socket closed
	at java.net.SocketInputStream.read(SocketInputStream.java:204)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
	at sun.security.ssl.InputRecord.read(InputRecord.java:503)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
	at okio.Okio$2.read(Okio.java:139)
	at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
	... 56 more

@ldimaggi ldimaggi changed the title Intermittent build error on cluster starter-us-east-2 ("Unable to build the image using the OpenShift build service") If a user has multuiple build running at the same time, one or both can fail ("Unable to build the image using the OpenShift build service") Mar 22, 2018
@ldimaggi ldimaggi changed the title If a user has multuiple build running at the same time, one or both can fail ("Unable to build the image using the OpenShift build service") If a user has multiple builds running at the same time, one or both can fail ("Unable to build the image using the OpenShift build service") Mar 22, 2018
@ldimaggi ldimaggi changed the title If a user has multiple builds running at the same time, one or both can fail ("Unable to build the image using the OpenShift build service") If a user has multiple builds running at the same time, one or both will fail ("Unable to build the image using the OpenShift build service") Mar 22, 2018
@krishnapaparaju
Collaborator

@chmouel Does OSIO currently support parallel builds for a tenant?
@kishansagathiya Is there a way to figure out, for a given tenant, whether an OSIO build is in progress?

@chmouel

chmouel commented Mar 23, 2018

@krishnapaparaju AFAIK no; @kbsingh pointed this out to me some time ago.

@ldimaggi
Collaborator Author

What makes this problem difficult for users is that the competing builds can be in different spaces, so that it is easy to overlook one or more of them.

@ppitonak
Collaborator

I don't think this issue occurs only when multiple builds are running at the same time. I reset my account, created a new project, and the build failed.

@ldimaggi
Collaborator Author

ldimaggi commented Mar 27, 2018 via email

@hrishin

hrishin commented Apr 3, 2018

What makes this problem difficult for users is that the competing builds can be in different spaces, so that it is easy to overlook one or more of them.

Even if the user runs the two builds in the same space, they will hit the same issue.

Issue

The primary issue is the resource quota limitation on OSO when two or more builds start running. Together the builds try to spin up at least 4 pods, i.e. 2 Jenkins slave pods and 2 build pods.

Eventually, the build throws the following status event on OSO:

:58:33 PM	Warning	Failed Create 	Error creating: pods "multi1build-s2i-1-build" is forbidden: exceeded quota: compute-resources-timebound, requested: limits.cpu=1,limits.memory=512Mi, used: limits.cpu=3500m,limits.memory=1792Mi, limited: limits.cpu=4,limits.memory=2Gi

After some time the build status becomes the one shown in the attached screenshot, and it results in:

Caused by: java.net.SocketTimeoutException: timeout

Edit:
Is it worth switching from Deployment to DeploymentConfig (fabric8io/fabric8-maven-plugin#1042) to reduce pod consumption in the build process?
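
For reference, the quota that the event above complains about would look roughly like the sketch below. This is a minimal reconstruction of an OpenShift ResourceQuota from the numbers in the failure message (limits.cpu=4, limits.memory=2Gi); the scope and any other fields of the real object on OSO are assumptions.

# Sketch of the "compute-resources-timebound" quota implied by the event above;
# the hard limits come from the failure message, the scope is an assumption.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources-timebound
spec:
  hard:
    limits.cpu: "4"       # 3500m already in use, so one more 1-CPU pod is rejected
    limits.memory: 2Gi    # 1792Mi already in use, so one more 512Mi pod is rejected
  scopes:
    - Terminating         # assumption: "timebound" suggests pods with an active deadline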

@maxandersen
Collaborator

Let's make sure we are fixing this at the right level. Parallel builds should be supported on OSIO, no question about that, but users might not have enough resources to run them before upgrading.

In build.next we can hopefully manage this for the user, but until then, would this be a matter of enabling something like https://wiki.jenkins.io/display/JENKINS/Throttle+Concurrent+Builds+Plugin (or similar) so that by default the limit of concurrent builds is 1 and everything else gets queued?

@hrishin

hrishin commented Apr 4, 2018

In build.next we can hopefully manage this for the user; but until then would this be a matter of enabling something like https://wiki.jenkins.io/display/JENKINS/Throttle+Concurrent+Builds+Plugin or similar to by

Queuing and enforcing build execution policies could be useful for build.next as well. Thanks for pointing out this plugin; we will evaluate it for our use case.

@hrishin

hrishin commented Apr 9, 2018

Update

We've evaluated the Jenkins throttling plugin. It's possible to throttle the parallel builds, but we need to consider the following facts:

  • The throttle category needs to be enabled in the global configuration
  • To enable throttling, we need to insert the following throttle construct either in the Jenkinsfile or through the pipeline library:
// Throttle of a single operation
throttle(['category']) {
    node {
         ....
    }
}
  • The user will not get any notification; the plugin will silently queue the build request

In the end, if two concurrent builds start running, this is how the pending job looks (see the attached screenshot).

@maxandersen
Collaborator

It is unfortunate if the throttling has to be defined by the user in their jobs; that kind of defeats the purpose.

@hrishin

hrishin commented Apr 13, 2018

Update:

We have evaluated multiple approaches to limit concurrent builds for the time being:

  1. Jenkins throttling plugin
  2. Changing the Jenkins configuration to limit concurrent build execution
  3. Using the Jenkins proxy

Approach 2 is the more feasible option, wherein we set <instanceCap>1</instanceCap> (https://github.com/fabric8-services/fabric8-tenant-jenkins/blob/master/apps/jenkins/src/main/fabric8/openshift-cm.yml#L322). With this, only one job will be triggered at a time and all other jobs will be queued, which prevents build failures due to timeouts.

@hrishin

hrishin commented Apr 16, 2018

Update:

Setting the <containerCap> parameter to 1 limits Jenkins to one slave pod at a time, while <instanceCap> limits the number of slaves that can be spun up. Setting <containerCap> to 1 will fix #2384 as well for the time being.
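
For context, the relevant fragment of the kubernetes-plugin cloud configuration (embedded as Jenkins config.xml inside openshift-cm.yml) looks roughly like the sketch below. The element names follow the plugin's serialized form; the cloud and template names, and everything omitted, are placeholders rather than the exact tenant configuration.

<!-- Trimmed sketch of the kubernetes-plugin cloud configuration; only the two
     caps discussed above are shown, everything else is omitted. -->
<org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud>
  <name>openshift</name>
  <!-- containerCap: how many slave pods the cloud may run at once.
       Setting it to 1 queues further builds instead of spawning more pods. -->
  <containerCap>1</containerCap>
  <templates>
    <org.csanchez.jenkins.plugins.kubernetes.PodTemplate>
      <name>maven</name>
      <!-- instanceCap: how many slaves may be spun up from this template -->
      <instanceCap>1</instanceCap>
    </org.csanchez.jenkins.plugins.kubernetes.PodTemplate>
  </templates>
</org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud>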

Will submit the PR to fix this issue.

@hrishin

hrishin commented May 29, 2018

FYI @sthaha @jaseemabid @krishnapaparaju: this is the entry point for the build blocker plugin to dispatch or block a job for execution.

@lordofthejars could you please help us understand the plugin and Jenkins behaviour here?

@krishnapaparaju
Collaborator

krishnapaparaju commented May 29, 2018

Try this regular expression; it works:

' .*.* '

@lordofthejars
Collaborator

@hrishin I have no idea what else to add. What I see is that if a build is occurring and a new agent is needed, that agent is started, but it takes some time to come up; if another build enters the system while the agent is not yet running, a second agent gets started. As you said, it looks like a race condition. I was never involved in any master/agent communication work, so the only thing that comes to mind is to modify the plugin (probably the k8s plugin, if we have the knowledge) to do the following:

When a request to create an agent comes in, take a lock, create a marker file, and release the lock. Any other request first checks for this file; if it is present, it waits until the file disappears and only then creates a new agent. The problem is that if, for any reason, the file is not deleted after the first build finishes (for example because of a failure), you end up in a deadlock. So I don't like this solution much because of the deadlock risk, but nothing else comes to mind right now.

Another option might be to lock the provisioning process until the agent is up and running, so that any other request is blocked until then. It is a different way of doing it, but again we really need to take care about deadlocks.
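
A minimal, hypothetical sketch in plain Java of the lock-file idea described above, just to make the flow and the deadlock risk concrete; the lock path and the provisionAgent callback are placeholders, not kubernetes-plugin APIs.

import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration only: serialize agent provisioning through a marker file.
public class AgentProvisionLock {
    private static final Path LOCK = Paths.get("/tmp/agent-provisioning.lock");

    public static void provisionWithLock(Runnable provisionAgent) throws InterruptedException {
        // Keep retrying until this request manages to create the marker file itself.
        while (true) {
            try {
                Files.createFile(LOCK);   // atomic check-and-create
                break;
            } catch (FileAlreadyExistsException e) {
                Thread.sleep(1000);       // another agent is being provisioned; wait
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
        try {
            provisionAgent.run();         // start the agent pod and wait until it is up
        } finally {
            try {
                // If this delete is ever skipped (process dies, exception swallowed),
                // every later request waits forever: the deadlock risk mentioned above.
                Files.delete(LOCK);
            } catch (IOException ignored) {
            }
        }
    }
}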

@jaseemabid
Contributor

@krishnapaparaju Could you explain why .* won't work but .*.* does? We are shipping this to prod, and we should know why things work or don't. Could you send a PR to build-blocker-plugin if it's really an issue with the plugin?

@hrishin

hrishin commented Jun 5, 2018

Update:

Using the build blocker plugin, Jenkins is able to throttle builds (see the attached screenshot).

Final PRs for this issue:

  1. Changed build blocker plugin code for OSIO: fixes concurrent build issue fabric8-jenkins/build-blocker-plugin#1
  2. Fabric8 Jenkins Image

@hrishin

hrishin commented Jun 5, 2018

accidentally closed by PR merge

@rupalibehera
Collaborator

rupalibehera commented Jun 8, 2018

Raised PR openshiftio/saas-openshiftio#904 to get this into production, and raised issue #3752.

@hrishin

hrishin commented Jun 9, 2018

Auto closed by PR merge

@hrishin hrishin reopened this Jun 9, 2018
@rupalibehera
Collaborator

Closing this; it should be in prod (#3752).

@hrishin

hrishin commented Jun 12, 2018

Since this fix is in production, only one pipeline can run at a time. This makes sure we don't hit the random socket timeout exception.

@krishnapaparaju @pradeepto

@krishnapaparaju
Collaborator

great. thanks @hrishin
