Frequent ARM failures #1738

Closed
BridgeAR opened this issue Mar 22, 2019 · 10 comments

Comments

@BridgeAR
Member

Recently there have been a lot of ARM failures such as: https://ci.nodejs.org/job/node-test-commit-arm/23076/nodes=ubuntu1604-arm64/console

@refack
Contributor

refack commented Mar 22, 2019

It's related to manual testing being done on the machine. We're trying to set up better procedures for such usage that should minimize interference.

@richardlau
Member

The failure comes from the step that cleans up leftover processes, shown below. That step is not supposed to error; it does here possibly because the processes belong to a different user than the one running the Jenkins job.

01:05:17 Clean up any leftover processes but don't error if found.
01:05:17 ps awwx | grep Release/node | grep -v grep | cat
01:05:18 41622 ?        Rl     0:10 out/Release/node --expose-internals /var/tmp/shigeki/node_v8_71/test/parallel/test-heapdump-inspector.js
01:05:18 41643 ?        Rl     0:10 out/Release/node --expose-internals /var/tmp/shigeki/node_v8_71/test/parallel/test-heapdump-zlib.js
01:05:18 41863 ?        Sl     0:09 out/Release/node --expose-internals /var/tmp/shigeki/node_v8_71/test/parallel/test-heapdump-dns.js
01:05:18 41880 ?        Sl     0:06 out/Release/node --expose-internals /var/tmp/shigeki/node_v8_71/test/parallel/test-heapdump-http2.js
01:05:18 41881 ?        Rl     0:08 out/Release/node --expose-internals /var/tmp/shigeki/node_v8_71/test/parallel/test-heapdump-fs-promise.js
01:05:18 41909 ?        Rl     0:07 out/Release/node --expose-internals /var/tmp/shigeki/node_v8_71/test/parallel/test-heapdump-tls.js
01:05:18 41920 ?        Sl     0:08 out/Release/node --expose-internals --experimental-worker /var/tmp/shigeki/node_v8_71/test/parallel/test-heapdump-worker.js
01:05:18 42827 ?        Sl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-client-timeout-option-with-agent.js
01:05:18 43261 ?        Rl     0:01 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-pipeline-flood.js
01:05:18 43289 ?        Rl     0:02 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-pipeline-requests-connection-leak.js
01:05:18 43524 ?        Sl     0:00 /var/tmp/shigeki/node_v8_71/out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-pipeline-flood.js child 44623
01:05:18 43647 ?        Sl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-server-keep-alive-timeout.js
01:05:18 43819 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-response-splitting.js
01:05:18 43830 ?        Sl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-set-timeout-server.js
01:05:18 43840 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-should-keep-alive.js
01:05:18 43851 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-response-statuscode.js
01:05:18 43855 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-status-code.js
01:05:18 43874 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-timeout-overflow.js
01:05:18 43889 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-unix-socket-keep-alive.js
01:05:18 43904 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-outgoing-settimeout.js
01:05:18 43920 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-outgoing-proto.js
01:05:18 43925 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-server-multiheaders2.js
01:05:18 43931 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-upgrade-binary.js
01:05:18 43943 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-upgrade-client.js
01:05:18 43963 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-server-options-server-response.js
01:05:18 43976 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-upgrade-server.js
01:05:18 43995 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-url.parse-auth.js
01:05:18 44005 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-server-reject-cr-no-lf.js
01:05:18 44008 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-url.parse-basic.js
01:05:18 44022 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-server-unconsume-consume.js
01:05:18 44024 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-url.parse-path.js
01:05:18 44029 ?        Rl     0:00 out/Release/node /var/tmp/shigeki/node_v8_71/test/parallel/test-http-server-write-after-end.js
01:05:18 Makefile:448: recipe for target 'clear-stalled' failed
01:05:18 make[1]: *** [clear-stalled] Error 123
01:05:18 Makefile:532: recipe for target 'run-ci' failed

cc @nodejs/build
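
One way to check the ownership hypothesis above is to list leftover `Release/node` processes together with their owner and separate those the CI user could actually signal from those it could not. The following is a minimal sketch, not the cleanup code from the Makefile: the function name `split_by_owner`, the `jenkins` user name, and the sample rows with PIDs 50001 and 50002 are hypothetical, and the input is assumed to look like the output of `ps -eo user,pid,args`. As a side note, exit status 123 is what GNU `xargs` reports when the command it invoked failed, which would fit a `kill` that was not permitted, but the log itself does not show that step.

```python
def split_by_owner(ps_lines, current_user):
    """Split `ps -eo user,pid,args`-style lines into (own, foreign).

    Each returned entry is a (user, pid, args) tuple. Only processes whose
    command line contains "Release/node" are considered, mirroring the
    grep in the log above.
    """
    own, foreign = [], []
    for line in ps_lines:
        parts = line.split(None, 2)
        # Skip the header row and anything that is not "user pid args".
        if len(parts) < 3 or not parts[1].isdigit():
            continue
        user, pid, args = parts[0], int(parts[1]), parts[2]
        if "Release/node" not in args:
            continue
        target = own if user == current_user else foreign
        target.append((user, pid, args))
    return own, foreign


# Sample input. The root-owned rows reuse PIDs and paths from the log;
# the "jenkins" rows are made up for illustration.
SAMPLE_PS_LINES = [
    "USER         PID COMMAND",
    "root       41622 out/Release/node --expose-internals "
    "/var/tmp/shigeki/node_v8_71/test/parallel/test-heapdump-inspector.js",
    "root       43524 /var/tmp/shigeki/node_v8_71/out/Release/node "
    "/var/tmp/shigeki/node_v8_71/test/parallel/test-http-pipeline-flood.js child 44623",
    "jenkins    50001 out/Release/node test/parallel/test-example.js",
    "jenkins    50002 /usr/bin/python tools/test.py",
]

own, foreign = split_by_owner(SAMPLE_PS_LINES, "jenkins")
for user, pid, args in foreign:
    print(f"not ours, leave alone: pid={pid} user={user}")
for user, pid, args in own:
    print(f"safe to clean up: pid={pid}")
```

A cleanup step built this way would only attempt to kill the entries in `own`, and could report the entries in `foreign` as a sign that someone else is using the machine.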

@refack
Contributor

refack commented Mar 22, 2019

> possibly because the processes belong to a different user

Yes, those tests are being run right now (under the root user). I guess it's better for the cleanup to fail than to succeed, invalidate the manual testing, and probably generate other unexplained flakes...

@Trott
Member

Trott commented Mar 22, 2019

I marked the node offline. When someone is doing manual testing like that, that's probably the thing to do?

@refack
Contributor

refack commented Mar 22, 2019

> I marked the node offline. When someone is doing manual testing like that, that's probably the thing to do?

For those who are given access to machines but don't have "Jenkins/disable" privileges, the thing to do is kill the Jenkins daemon.

@Trott
Member

Trott commented Mar 22, 2019

Might want to contact everyone on our list of folks who have shell access and ask that they hop into #node-build on Freenode IRC and get the machine taken offline before doing a bunch of stuff?

@refack
Contributor

refack commented Mar 22, 2019

We ask... people forget (me included).

BTW: list of access requests is radiated by https://github.com/nodejs/build/projects/2

@Trott
Member

Trott commented Mar 22, 2019

> list of access requests is radiated

Nice!

Since ARM failures seem to have stopped, should we close this?

There's 228 issues open in this repo. 😱 Might go through and close a few old and inactive ones now....

@refack
Contributor

refack commented Mar 22, 2019

I say keep this open till the end of the day, just for visibility, in case someone sees this failure and wants to see if it was already reported...

> There's 228 issues open in this repo.

You can definitely see my influence
image
(hey, I have a lot of things I want to look at $LATER)

@refack
Contributor

refack commented Mar 23, 2019

Ahhh
image

@refack refack closed this as completed Mar 23, 2019