ci: capture ceph and salt logs for failed tests #691

Closed
tserong opened this issue Apr 18, 2023 · 6 comments
@tserong
Member

tserong commented Apr 18, 2023

We've recently had intermittent failures during deployment when running sesdev-integration in jenkins. The relevant part of the sesdev output is:

    master: [2023-04-14 04:55:32.031051] [master.mini.test] [STAGE] [BEGIN] Prepare to bootstrap the Ceph cluster
    master: [2023-04-14 04:55:32.079906] [master.mini.test] [STEP ] [BEGIN] Download ceph container image
    master: 
    master: Finished execution of ceph-salt formula
    master: 
    master: Summary: Total=1 Succeeded=0 Warnings=0 Failed=1
    master: "ceph-salt apply" exit code: 0
    [...]
    master: +++ ssh master.mini.test cephadm ls
    master: MONs in cluster (actual/expected): 1/1 (3080 seconds to timeout)
    master: MGRs in cluster (actual/expected): 1/1 (3080 seconds to timeout)
    master: +++ set +x
    master: ++ ceph status
    master: 2023-03-24T10:19:36.718+0000 7f3ef1ca5700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
    master: 2023-03-24T10:19:36.718+0000 7f3ef1ca5700 -1 AuthRegistry(0x7f3eec05ed08) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
    master: 2023-03-24T10:19:36.818+0000 7f3ef1ca5700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
    master: 2023-03-24T10:19:36.818+0000 7f3ef1ca5700 -1 AuthRegistry(0x7f3ef1ca3f50) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
    master: 2023-03-24T10:19:36.858+0000 7f3eeb7fe700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
    master: 2023-03-24T10:19:36.862+0000 7f3ef1ca5700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
    master: [errno 13] RADOS permission denied (error connecting to the cluster)

As best I can tell, it does at least part of the bootstrap, but then something fails and it doesn't get any further. I find it extremely weird that we get Summary: Total=1 Succeeded=0 Warnings=0 Failed=1 with no other error output saying what actually failed. I'm also surprised that ceph-salt apply has an exit code of zero in this case.

I tried adding some extra logging to sesdev itself for the case where deployment fails at this point (see #689), but I haven't been able to reproduce the problem since my latest addition of ceph-salt.log, so I still don't know what the problem is (or was). In any case, it really shouldn't be up to sesdev to capture log files.

Can we make jenkins capture /var/log/ceph/*, /var/log/salt/* and /var/log/ceph-salt.log when tests fail? That way next time we hit this error I'll hopefully be able to figure out what's going wrong.

@kshtsk
Contributor

kshtsk commented Apr 19, 2023

@tserong does sesdev already copy those logs somewhere, or does it have an interface for grabbing them?

@kshtsk kshtsk assigned tserong and unassigned kshtsk Apr 19, 2023
@tserong
Member Author

tserong commented Apr 20, 2023

@tserong does sesdev already copy those logs somewhere, or does it have an interface for grabbing them?

No, and it shouldn't need to. When we run sesdev manually, if it fails, we can log in to the nodes it deploys and inspect the logs if we need to. When run inside jenkins, this is not possible (jenkins destroys all the instances, right?) so AIUI jenkins needs to capture the logs for later analysis if something fails.

Edit to clarify: you can use sesdev scp to copy files off the nodes it deploys, so maybe that could be used to capture logs inside jenkins? The storage-ci-devel-s7p (and other jobs) grab these log files somehow. I don't know how that's done, but whatever the trick is, can it be applied to the sesdev CI infra as well?

@tserong tserong assigned kshtsk and unassigned tserong Apr 20, 2023
@tserong
Member Author

tserong commented Apr 21, 2023

Just for further clarity, when I said "No, and it shouldn't need to", I meant sesdev doesn't have a specific function for copying those log files. There is sesdev supportconfig, but I don't think we want to use that, because it only works on SLE deployments (not openSUSE), and it's rather heavier than what we generally want to capture to diagnose sesdev failures.

I just did a bit of experimentation with sesdev scp, and I think we'll be good if we can add the following three commands when sesdev fails inside jenkins:

    sesdev scp -r mini master:/var/log/ceph /wherever/we/want/to/save/the/logs/
    sesdev scp -r mini master:/var/log/salt /wherever/we/want/to/save/the/logs/
    sesdev scp mini master:/var/log/ceph-salt.log /wherever/we/want/to/save/the/logs/

That should work because the sesdev deployment inside jenkins is just a single node, and the deployment name is "mini".
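For illustration, here's a rough sketch of how those three commands could be wrapped in a single failure handler. The function name, the destination directory and the || true guards are my own assumptions, not anything sesdev or jenkins provides:

    # Hypothetical helper: LOGDIR is wherever the CI job keeps its artifacts,
    # and "mini"/"master" match the single-node deployment described above.
    collect_sesdev_logs() {
        LOGDIR="$1"
        mkdir -p "$LOGDIR"
        # The directories need -r; the single log file does not.
        sesdev scp -r mini master:/var/log/ceph "$LOGDIR/" || true
        sesdev scp -r mini master:/var/log/salt "$LOGDIR/" || true
        sesdev scp mini master:/var/log/ceph-salt.log "$LOGDIR/" || true
    }

    # Only call it when the deployment/test step has failed, e.g.
    # collect_sesdev_logs "$WORKSPACE/sesdev-logs"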

@kshtsk
Contributor

kshtsk commented May 3, 2023

The storage-ci-devel-s7p (and other jobs) grab these log files somehow. I don't know how that's done, but whatever the trick is, can it be applied to the sesdev CI infra as well?

The sesdev jobs use pipelines, whereas the ses devel jobs are plain freestyle jobs, where the archive artifacts plugin can be used. I don't see a way of doing that in pipelines right now; maybe some investigation is required.

@kshtsk
Contributor

kshtsk commented May 3, 2023

Or, rather than archiving them, should we just dump the logs to the output?
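If dumping to the console is enough, something along these lines could probably work (just a sketch; the temp dir and the tail truncation are assumptions):

    # Sketch: copy the logs off the node, then print them so they land in
    # the jenkins console output for the build.
    tmpdir=$(mktemp -d)
    sesdev scp mini master:/var/log/ceph-salt.log "$tmpdir/" || true
    sesdev scp -r mini master:/var/log/ceph "$tmpdir/ceph" || true
    sesdev scp -r mini master:/var/log/salt "$tmpdir/salt" || true
    find "$tmpdir" -type f | while read -r f; do
        echo "===== $f ====="
        tail -n 200 "$f"    # truncate so the console log stays manageable
    done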

@kshtsk
Contributor

kshtsk commented May 3, 2023

Trying out something in #695
