Travis builds are hanging mid-print #4126
Comments
I put the diagnostics into PR #4127 and it reproduces. The hang most likely occurs at the fork call.
Well, I added some prints before and around the fork and around the child exec, and I put alarms in the parent while it tees the child stdout. With all of that in place I can't reproduce the hang. Strange. Going to chalk it up to some Perl bug and move on.
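For reference, here is a minimal sketch of the diagnostic described above: a parent that tees the child's output and prints a heartbeat whenever the child has been silent for a fixed interval. The real script is Perl and uses alarms; this Python version uses a `select` timeout instead, and the function name and parameters are hypothetical. It assumes a POSIX system, since `select` on pipes does not work on Windows.

```python
import os
import select
import subprocess
import sys


def tee_with_heartbeat(cmd, interval=30.0, out=sys.stdout):
    """Run cmd, echo its output as it arrives, and print a heartbeat
    line each time `interval` seconds pass with no child output.
    Hypothetical helper, not the project's actual wrapper."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    fd = proc.stdout.fileno()
    while True:
        # Wait for child output, but give up after `interval` seconds.
        ready, _, _ = select.select([fd], [], [], interval)
        if not ready:
            out.write(f"{interval:g}s elapsed\n")
            out.flush()
            continue
        chunk = os.read(fd, 4096)
        if not chunk:  # EOF: the child closed its end of the pipe.
            break
        out.write(chunk.decode(errors="replace"))
        out.flush()
    return proc.wait()
```

If the heartbeats themselves stop appearing in the log, the parent is stuck (or its output is no longer being collected), which is the distinction the prints were meant to expose.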
It looks like the problem is still there. I left in the -VV and 30s alarms, and we can see the build hang mid-print! https://travis-ci.com/DynamoRIO/dynamorio/jobs/295543355
It seems like some Travis issue, doesn't it? Our build is not going to do that on its own. It is strange that it has only been in the cross-compile job.
Actually, what I just pasted was from the x86 non-cross-compile package build, so it is not limited to cross-compiling.
Travis says "Ran for 21 min 25 sec". So it built for 10 mins and then hung mid-print for 10 mins. ?!?
Happened again, breaking the cronbuild -- grr, raising priority. https://travis-ci.com/github/DynamoRIO/dynamorio/jobs/298058837
Could it possibly be the Perl signal handler interrupting a print? Perl 5.8+ has "safe signals": it waits until it is back in the regular interpreter loop to deliver a signal. See the Perl versions Travis provides at https://docs.travis-ci.com/user/reference/xenial/#perl-support. So it has the safe signals.
I did a re-build and it did the exact same thing. ?!#? Searching doesn't show much: the reports seem to involve only builds that actually produce no output. There is "travis_wait", which doesn't seem to target the problem we have here, but we could try it. It shows no output until the end (kind of ironic...) but maybe we could live with that.
It happens consistently at about this point: "Ran for 21 min 37 sec". And it consistently has 20 instances of "30s elapsed". That does not sound like some Perl alarm signal issue. It sounds like a Travis issue with processes that are 10 minutes old. Yet the other builds include jobs that take >10 mins. For example, one took 15 mins and has 29 "30s elapsed" messages. So what's the difference between those >10min succeeding jobs and the ones that hang: the cross-compile that used to hang plus this cronbuild?
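As a quick check of the arithmetic above, the heartbeat counts convert to elapsed time as follows (the helper name is hypothetical; the 30-second interval is the alarm period mentioned in the thread):

```python
def heartbeat_minutes(count, interval_s=30):
    """Minutes of wall time covered by `count` heartbeats."""
    return count * interval_s / 60


print(heartbeat_minutes(20))  # 10.0 (the hanging job)
print(heartbeat_minutes(29))  # 14.5 (the job that took about 15 mins and succeeded)
```

So the hanging job stops printing right around the 10-minute mark, while the succeeding job keeps printing well past it.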
Cronbuild failed again. This is really blocking us. I'm going to try to repro in a cronbuild where we can actually experiment. Travis refuses to deploy there, but hopefully the same build will repro.
I got it to repro in a custom PR job. I then removed the Perl signal: same behavior, so that is not the culprit.
I tried prefixing the runsuite_wrapper command line with "travis_wait 45" and that succeeded: https://travis-ci.com/github/DynamoRIO/dynamorio/builds/154398973
The downside is that it prints nothing incrementally. We only need it for the package build, but it would be simplest to put it in for all builds, so I will probably just do that and we'll live without incremental output (again, pretty ironic).
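To illustrate the trade-off, here is a rough Python sketch of the behavior described above: run the command under a time limit, print only keepalive lines while it runs, and dump the buffered output at the end. This is an assumption-laden illustration of that behavior, not the actual `travis_wait` implementation; the function name, messages, and parameters are all hypothetical.

```python
import subprocess
import sys
import tempfile
import time


def run_buffered(cmd, limit_minutes=45, keepalive=60.0, out=sys.stdout):
    """Run cmd with its output captured to a temp file, printing a
    keepalive line every `keepalive` seconds, then dump the captured
    output once the command exits or the time limit is reached."""
    with tempfile.TemporaryFile() as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        deadline = time.monotonic() + limit_minutes * 60
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                proc.kill()  # Time limit reached: stop the command.
                status = proc.wait()
                out.write("time limit exceeded\n")
                break
            try:
                status = proc.wait(timeout=min(keepalive, remaining))
                break
            except subprocess.TimeoutExpired:
                out.write("still running\n")
                out.flush()
        # Nothing from the command is shown until this point.
        log.seek(0)
        out.write(log.read().decode(errors="replace"))
        out.flush()
    return status
```

The keepalive lines keep the log from going silent, at the cost of seeing the command's own output only once it is finished.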
Filing this since it keeps happening. PR #4119 was my attempt to diagnose it.
The cross-compile job (the 3rd job) has been repeatedly failing ever since PR #4118 except in the diagnostic attempt. It looks like this:
Very strange.