github: Use Canonical runners for scheduled system tests #469
@masnax I did some tests regarding this error. Unfortunately the timeout set in the MicroCeph I have bootstrapped a single-node MicroCloud and fired requests to
Well, this current failure is happening long before MicroCloud is bootstrapped, as it happens right after system discovery and before asking any setup questions. In the bootstrap case, the only delay would be related to refreshing the truststore and waiting for the lock, but even that wouldn't happen on a single-node request, as it all goes through the Unix socket, which skips truststore verification. When bootstrapping, the listeners also restart, so that could be the delay you're seeing locally. But again, that wouldn't affect the test failure since it's not during bootstrap.
This is the whole local proxy block in MicroCloud, so it's definitely not waiting for anything here. Since it's a network request, there is the additional overhead of authHandlerMTLS pulling the truststore.
Mh, it looks like we can fix it by waiting for
Is this something that can be checked over the API? Perhaps
Issue logged in canonical/microceph#473.
@MggMuggins have you ever seen this one https://github.com/canonical/microcloud/actions/runs/12031420307/job/33588930317?pr=469#step:17:1490? It might be caused by the runners being slower. We have already seen various other scenarios caused by this "slowness". Maybe you have an idea.
IIRC the trust store is held in memory in MicroCluster and synchronized (fanotify?); I think the recovery process doesn't use the in-memory synchronization because it's expected that it won't be accessing the trust store at the same time as any other thread. I wonder if this was an incorrect assumption. I'll try to take a look this afternoon.
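To make the hypothesis above concrete, here is a minimal Python sketch of an in-memory store whose readers and writers share one lock. It is only an illustration of the idea of lock-guarded in-memory state, not MicroCluster's actual trust store; the class name `InMemoryTrustStore` and its methods are hypothetical. A code path that touched the underlying data without taking the same lock would lose the guarantee shown here, which is the kind of assumption being questioned.

```python
import threading
from typing import Dict


class InMemoryTrustStore:
    """Hypothetical in-memory certificate map guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._certs: Dict[str, str] = {}

    def replace(self, certs: Dict[str, str]) -> None:
        # Swap in a complete new set so readers never see a partial update.
        with self._lock:
            self._certs = dict(certs)

    def snapshot(self) -> Dict[str, str]:
        # Hand out a copy so callers cannot mutate the shared state.
        with self._lock:
            return dict(self._certs)


store = InMemoryTrustStore()
store.replace({"member-1": "cert-a"})

view = store.snapshot()
view["member-2"] = "cert-b"  # Changes the copy only, not the store.

# Several writers replacing the whole set concurrently.
candidates = [{f"member-{i}": f"cert-{i}"} for i in range(4)]
writers = [threading.Thread(target=store.replace, args=(c,)) for c in candidates]
for writer in writers:
    writer.start()
for writer in writers:
    writer.join()

final_view = store.snapshot()
```

Whichever writer runs last wins, but every snapshot is one complete set rather than a mix of two.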
I have downloaded the logs for the failed PR run and plan to look into this next pulse.
Needs a rebase, please.
Ensure MicroCeph is fully started after bootstrapping to prevent running into timeouts if the test suite is too fast. Signed-off-by: Julian Pelizäus <[email protected]>
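The idea of waiting for a service to be fully started before continuing can be sketched as a bounded polling loop. This is an illustrative Python sketch, not the code from the commit (the test suite's own implementation is not shown here); `wait_until_ready` and `fake_service_is_up` are hypothetical names, and the readiness check is a stand-in for whatever signal the service actually exposes.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll is_ready() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in readiness check that succeeds on the third poll.
attempts = {"count": 0}


def fake_service_is_up() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


ready = wait_until_ready(fake_service_is_up, timeout=5.0, interval=0.01)
```

Bounding the wait with a deadline keeps a service that never comes up from hanging the whole test run; the caller decides what to do when `False` is returned.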
When trying to install the LXD snap while it is already installed, the exit code is 0, so the fallback refresh never happens. Signed-off-by: Julian Pelizäus <[email protected]>
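The pitfall described above is that an "install, and refresh if that fails" pattern never reaches the refresh when the install of an existing snap still exits with 0. One way around it, sketched here in Python under the assumption that `snap list <name>` exits non-zero for a snap that is not installed, is to check for presence first and then pick the command. This is not the fix from the commit; `install_or_refresh` and the fake runners are hypothetical, and the channel value is only an example.

```python
import subprocess
from typing import Callable, List, Sequence


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def install_or_refresh(
    snap: str,
    channel: str,
    run: Callable[[Sequence[str]], int] = run_command,
) -> List[str]:
    """Install the snap if it is missing, otherwise refresh it.

    Returns the argv of the command that was chosen.
    """
    # Assumption: `snap list <name>` exits non-zero when the snap is absent.
    installed = run(["snap", "list", snap]) == 0
    action = "refresh" if installed else "install"
    argv = ["snap", action, snap, f"--channel={channel}"]
    code = run(argv)
    if code != 0:
        raise RuntimeError(f"{' '.join(argv)} failed with exit code {code}")
    return argv


# Fake runners so the decision logic can be exercised without snapd.
def fake_run_snap_present(argv: Sequence[str]) -> int:
    return 0


def fake_run_snap_absent(argv: Sequence[str]) -> int:
    return 1 if list(argv[:2]) == ["snap", "list"] else 0


chosen_when_present = install_or_refresh("lxd", "latest/stable", run=fake_run_snap_present)
chosen_when_absent = install_or_refresh("lxd", "latest/stable", run=fake_run_snap_absent)
```

Injecting the runner keeps the branch selection testable on machines where the `snap` command is not available.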
Signed-off-by: Julian Pelizäus <[email protected]>
This allows reducing the time between sending the password and starting to listen on join intents. On slow test runners we saw errors because the initiator hadn't yet started to listen on join intents but potential joiners were already dialing in with the passphrase. Signed-off-by: Julian Pelizäus <[email protected]>
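The race described in this commit message can be shown with a toy model. The Python sketch below is an illustration only, with hypothetical names (`Initiator`, `run_session`) and a made-up passphrase; it does not reflect MicroCloud's actual join protocol. It models the worst case in which a joiner dials back the instant it receives the passphrase. The commit itself only shortens the gap between the two steps; listening before sending, as in the first branch here, is one way to close the window entirely and is not necessarily what the commit does.

```python
from typing import Callable, List


class Initiator:
    """Toy initiator that accepts join intents only while listening."""

    def __init__(self, passphrase: str) -> None:
        self.passphrase = passphrase
        self.listening = False
        self.accepted: List[str] = []

    def start_listening(self) -> None:
        self.listening = True

    def handle_join_intent(self, joiner: str, passphrase: str) -> bool:
        # A dial that arrives before the listener is up is rejected.
        if not self.listening or passphrase != self.passphrase:
            return False
        self.accepted.append(joiner)
        return True

    def send_passphrase(self, deliver: Callable[[str], None]) -> None:
        deliver(self.passphrase)


def run_session(listen_first: bool) -> List[str]:
    initiator = Initiator("example-passphrase")

    def eager_joiner(passphrase: str) -> None:
        # Worst case: the joiner dials the moment it gets the passphrase.
        initiator.handle_join_intent("joiner-1", passphrase)

    if listen_first:
        initiator.start_listening()
        initiator.send_passphrase(eager_joiner)
    else:
        initiator.send_passphrase(eager_joiner)
        initiator.start_listening()
    return initiator.accepted


accepted_when_listening_first = run_session(listen_first=True)
accepted_when_sending_first = run_session(listen_first=False)
```

On a fast machine the second ordering usually works because the gap is tiny; on a slow runner the joiner can win the race, which matches the failures described above.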
Signed-off-by: Julian Pelizäus <[email protected]>
Signed-off-by: Julian Pelizäus <[email protected]>
This allows having a much cleaner matrix definition grouped by core, upgrade and Canonical-specific system tests. Signed-off-by: Julian Pelizäus <[email protected]>
This allows reuse of the system test steps for all groups: core, upgrade and Canonical-specific tests. Signed-off-by: Julian Pelizäus <[email protected]>
@tomponline @masnax rebased.
This PR splits the rather complex matrix system test into three groups.
The system test code is moved into a repo-local action which is used by each of the groups.
See an example of the restructured tests workflow here.
In addition, the "instances" suite is now also executed on the Canonical runners when the workflow is triggered by a schedule.