Access request for addaleax on ubuntu1804_sharedlibs_zlib_x64 #2063

Closed
addaleax opened this issue Nov 26, 2019 · 17 comments

Comments

@addaleax
Member

Basically, I’d like to have access to whichever this machine is

https://ci.nodejs.org/job/node-test-commit-linux-containered/nodes=ubuntu1804_sharedlibs_zlib_x64/16212/consoleFull

in order to look into nodejs/node#30648

@Trott
Member

Trott commented Nov 26, 2019

+1

The crash that will be investigated with this access is happening an awful lot.

@Trott
Member

Trott commented Nov 26, 2019

The machine in question would be (I think) test-digitalocean-ubuntu1804_sharedlibs_container-x64-7. Access to the containers is not something I ever educated myself about so I'm not sure what's involved.

@gireeshpunathil
Member

+1

@Trott
Member

Trott commented Nov 26, 2019

@nodejs/build

@sam-github
Contributor

+1 @addaleax why not join the build-wg, so you have the keys to all the test machines, and this manual process can be avoided?

Rod posted a container debugging howto in an issue recently, I've never done any container debugging myself.

@addaleax
Member Author

@sam-github As long as everyone’s cool with me trying to mostly focus on flaky tests and not the rest, I’m happy to do so. That being said, this process here is usually very smooth anyway :)

@sam-github
Contributor

@addaleax There are no expectations on what volunteers must do :-). You don't even have to show for meetings, just don't break things (which I know you won't).

As for this request, I can add you, I need your ssh public key and the name of the machine to add it to.

@addaleax
Member Author

As for this request, I can add you, I need your ssh public key and the name of the machine to add it to.

@sam-github SSH keys for Github users can be found at e.g. https://github.com/addaleax.keys :) The first one should be enough. And for the machine name, @Trott said above:

The machine in question would be (I think) test-digitalocean-ubuntu1804_sharedlibs_container-x64-7.
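
A minimal sketch of picking up the first key from the `.keys` URL mentioned above. The function names `first_key` and `fetch_first_key` are made up for illustration, and appending to `~/.ssh/authorized_keys` is an assumption about how the key would be installed on the host; the thread does not say how it was actually added.

```shell
#!/bin/sh
# Print the first non-empty line of a key list read from stdin.
first_key() {
  awk 'NF { print; exit }'
}

# Fetch a GitHub user's public keys and print only the first one.
# Needs curl and network access; $1 is the GitHub user name.
fetch_first_key() {
  curl -fsSL "https://github.com/$1.keys" | first_key
}

# Example, run on the target host as the account being opened up
# (assumed workflow, not confirmed by the thread):
#   fetch_first_key addaleax >> ~/.ssh/authorized_keys
```

Taking only the first line sidesteps the ambiguity that arises when a user has several keys published.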

@sam-github
Contributor

When users have multiple keys, it's not clear which one to use.

I added the first key to root@test-digitalocean-ubuntu1804-docker-x64-1, because test-digitalocean-ubuntu1804_sharedlibs_container-x64-7 isn't a machine; possibly it's a virtual name for a container that can run on the host I added you to.

Sorry, I've never touched the container build, I've no idea how it works. I think @rvagg posted a comment in the last few weeks describing how to get into it and troubleshoot in quite some detail, but I couldn't find it just now.

@richardlau
Member

Sorry, I've never touched the container build, I've no idea how it works. I think @rvagg posted a comment in the last few weeks describing how to get into it and troubleshoot in quite some detail, but I couldn't find it just now.

nodejs/node#29977 (comment)?

@addaleax
Member Author

I’m not sure how this works exactly, but could the node (only that container) be taken offline for a bit? My docker fu isn’t great, so I don’t quite know what would be involved in trying to replicate this elsewhere, but this is reproducible enough that it might be reasonably debuggable?

@sam-github
Contributor

Only the nodes on https://ci.nodejs.org/computer/ can be set offline, to my knowledge, and that doesn't have any containers.

You might be able to do docker run -it IMAGE bash to run a new container from the same base image, that should be isolated.
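
As a rough sketch, the throwaway-container idea could look like the following. `scratch_shell` is a made-up helper name, the `DOCKER` variable is a hypothetical override added here so the command can be dry-run with `DOCKER=echo`, and the image name is left as a placeholder because the thread does not give one for this purpose.

```shell
#!/bin/sh
# DOCKER defaults to the real CLI; setting DOCKER=echo prints the command
# instead of running it (a dry-run hook invented for this sketch).
DOCKER=${DOCKER:-docker}

# Start an interactive shell in a new container created from image $1.
# --rm deletes the container on exit so nothing is left behind on the host.
scratch_shell() {
  $DOCKER run --rm -it "$1" bash
}

# Example: scratch_shell IMAGE
```

A container started this way does not share state with the long-running CI container, so it may not reproduce a problem that depends on that container's workspace.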

Otherwise, I can take the whole host https://ci.nodejs.org/computer/test-digitalocean-ubuntu1604-x86-1/ offline for you. I suspect that only build wg members can see the offline button.

@richardlau
Member

I’m not sure how this works exactly, but could the node (only that container) be taken offline for a bit? My docker fu isn’t great, so I don’t quite know what would be involved in trying to replicate this elsewhere, but this is reproducible enough that it might be reasonably debuggable?

I've marked https://ci.nodejs.org/computer/test-digitalocean-ubuntu1804_sharedlibs_container-x64-7/ offline with a comment linking back to this issue.

@rvagg
Member

rvagg commented Nov 26, 2019

[email protected]

on these machines, the images are running full-time with the jenkins client inside of them, so you'll want to jump into the container in question rather than starting a new one.

docker ps can show you what's running; you'll see one with the image name node-ci:test-digitalocean-ubuntu1804_sharedlibs_container-x64-7: 61ea0f8fb833 (it's also labelled node-ci-test-digitalocean-ubuntu1804_sharedlibs_container-x64-7, which you can use too, but it's less confusing to just find the container ID).

docker exec -ti 61ea0f8fb833 bash will jump you into the container. You'll be user iojs and need to navigate to /home/iojs/build/workspace/... like normal and you should be able to use it as a standard test server. If you need to be root to install or adjust anything, use -u root in your docker exec and you can then do what you like in it. If you do this, we'll just need to restart the container after you're done to reset it but it's not a big deal if you need to modify it.
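
The steps above can be sketched as a few small helpers. The function names and the `DOCKER` override are assumptions made for illustration (`DOCKER=echo` gives a dry run); the `docker ps` and `docker exec` invocations themselves follow the description above, and the ID will differ whenever the container has been restarted.

```shell
#!/bin/sh
# DOCKER defaults to the real CLI; DOCKER=echo prints commands instead
# (a dry-run hook invented for this sketch).
DOCKER=${DOCKER:-docker}

# List running containers as "ID  IMAGE  NAME" to find the right one.
list_containers() {
  $DOCKER ps --format '{{.ID}}  {{.Image}}  {{.Names}}'
}

# Open a shell in running container $1 as its default user.
enter_container() {
  $DOCKER exec -ti "$1" bash
}

# Same, but as root, for installing or adjusting things.
enter_container_as_root() {
  $DOCKER exec -u root -ti "$1" bash
}

# Example:
#   list_containers
#   enter_container 61ea0f8fb833
```

Either the short ID or the container name printed in the last column can be passed to the two `enter_*` helpers.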

@addaleax
Member Author

@rvagg Thanks for the help! I figured most of that out by now, but it’s good to know what the situation is for installing packages :)

I did install basic debugging tools (gdb, valgrind, strace), feel free to restart the machine.
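
For reference, a sketch of how such tools might be installed without an interactive session. It assumes the container is Ubuntu-based with `apt-get` available (the container name suggests Ubuntu 18.04); `install_debug_tools` and the `DOCKER` override are made-up names for illustration, not something used in this thread.

```shell
#!/bin/sh
# DOCKER defaults to the real CLI; DOCKER=echo prints the command instead
# (a dry-run hook invented for this sketch).
DOCKER=${DOCKER:-docker}

# Install gdb, valgrind and strace as root inside running container $1.
# Assumes an apt-based image; the container should be restarted afterwards
# to reset it, as noted earlier in the thread.
install_debug_tools() {
  $DOCKER exec -u root "$1" sh -c 'apt-get update && apt-get install -y gdb valgrind strace'
}

# Example: install_debug_tools 61ea0f8fb833
```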

I think I can reproduce the issue locally now, even though not as frequently as on the container, so it should be safe to take the node online again.

@Trott
Member

Trott commented Nov 27, 2019

I think I can reproduce the issue locally now, even though not as frequently as on the container, so it should be safe to take the node online again.

I've brought the node back online. I guess if we remove @addaleax's SSH key, we can close this issue.

@addaleax
Member Author

I think I can reproduce the issue locally now, even though not as frequently as on the container, so it should be safe to take the node online again.

I've brought the node back online. I guess if we remove @addaleax's SSH key, we can close this issue.

Removed my key 👍
