Request timeout during cluster bootstrap #473

Open
roosterfish opened this issue Nov 19, 2024 · 1 comment
roosterfish commented Nov 19, 2024

As part of the MicroCloud test suite, we deploy single-node MicroCeph clusters in many scenarios to simulate pre-existing MicroCeph clusters on different nodes.
We do this by running microceph cluster bootstrap.

On slower systems, we recently saw that requests to the /1.0/configs API sent right after microceph cluster bootstrap returns do not always succeed: they can time out because the context deadline of the MicroCeph GetConfig client function expires.
MicroCloud uses this endpoint to retrieve the MicroCeph information it needs for bootstrapping.

We found that blocking in the pipeline until microceph status reports Services: mds, mgr, mon fixes it, but it looks like microceph cluster bootstrap could wait a bit longer and return only once the API can respond.
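
For reference, a minimal sketch of that pipeline workaround. It assumes `microceph status` prints a line containing `Services:` followed by the service names; the helper names, the `STATUS_CMD` and `WAIT_INTERVAL` variables, and the retry count are hypothetical and not part of MicroCeph or MicroCloud.

```shell
#!/bin/sh
# Sketch only: poll `microceph status` until mds, mgr and mon are listed.
# STATUS_CMD and WAIT_INTERVAL are illustrative knobs, not MicroCeph settings.

# Reads status text on stdin; succeeds only if a "Services:" line
# names all of mds, mgr and mon.
services_ready() {
    line=$(grep 'Services:') || return 1
    for svc in mds mgr mon; do
        printf '%s\n' "$line" | grep -qw "$svc" || return 1
    done
    return 0
}

# wait_for_services [ATTEMPTS]
# Retries up to ATTEMPTS times (default 60), sleeping between attempts.
wait_for_services() {
    attempts=${1:-60}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if ${STATUS_CMD:-microceph status} 2>/dev/null | services_ready; then
            return 0
        fi
        i=$((i + 1))
        sleep "${WAIT_INTERVAL:-1}"
    done
    return 1
}
```

In a test pipeline this could be called as `microceph cluster bootstrap && wait_for_services 60` before the first request to /1.0/configs. It only works around the problem on the caller's side; it does not change when the bootstrap command itself returns.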

I ran the following test, hitting the /1.0/configs endpoint with requests while running microceph cluster bootstrap in another terminal:

```
# Start this before running the bootstrap
while true; do curl --unix-socket /var/snap/microceph/common/state/control.socket http:/ceph/1.0/configs -X GET -d '{}' -s | awk '{print strftime("%r") " " $1}' | tee -a log; sleep .1; done

# In another terminal
microceph cluster bootstrap
```

See the log here: microceph_bootstrap_monitor.txt
Something interesting (maybe the actual issue) happens around 03:08:26 PM: there is a window of about 2 seconds in which the request didn't receive a response.
The microceph cluster bootstrap command was started around this time.
On slow systems this window could exceed the 5s timeout set in the MicroCeph GetConfig client function.
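
A small sketch for spotting such windows in the log file written by the loop above. It assumes each line starts with a `hh:mm:ss AM|PM` timestamp (what `strftime("%r")` prints in the C locale) and that the log does not cross midnight; `find_gaps` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: list gaps between consecutive responses in the monitor log.
# Assumes lines start with "hh:mm:ss AM|PM" and the log stays within one day.

# find_gaps LOGFILE [MIN_SECONDS]
# Prints one line per gap of at least MIN_SECONDS (default 2).
find_gaps() {
    awk -v min="${2:-2}" '
    {
        # Convert the 12-hour timestamp to seconds since midnight.
        split($1, t, ":")
        hour = t[1] % 12
        if ($2 == "PM") hour += 12
        now = hour * 3600 + t[2] * 60 + t[3]
        if (NR > 1 && now - prev >= min)
            printf "%s -> %s %s (%ds)\n", prev_stamp, $1, $2, now - prev
        prev = now
        prev_stamp = $1 " " $2
    }' "$1"
}
```

Running `find_gaps log 2` prints one line per interval of two seconds or more without a logged response, which makes it easy to compare the stall against the 5s client timeout on different machines.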

arpadmuller commented Nov 19, 2024

I have the same issue.

Hardware: Intel CPU (3 GHz), 8 GB RAM
OS: Ubuntu Server 24.04 (fresh install)

Commands:

```
$ sudo snap install microceph
microceph (squid/stable) 19.2.0+snap9aeaeb2970 from Canonical✓ installed

$ sudo snap refresh --hold microceph
General refreshes of "microceph" held indefinitely

$ sudo microceph cluster bootstrap
Error: Post "http://control.socket/core/control": context deadline exceeded
```

Log:

```
2024-11-19T19:47:21Z microceph.daemon[23231]: time="2024-11-19T19:47:21Z" level=debug msg="start: database not ready, waiting..."
2024-11-19T19:47:31Z microceph.daemon[23231]: time="2024-11-19T19:47:31Z" level=debug msg="start: database not ready, waiting..."
2024-11-19T19:47:41Z microceph.daemon[23231]: time="2024-11-19T19:47:41Z" level=debug msg="start: database not ready, waiting..."
2024-11-19T19:47:51Z microceph.daemon[23231]: time="2024-11-19T19:47:51Z" level=debug msg="start: database not ready, waiting..."
2024-11-19T19:48:01Z microceph.daemon[23231]: time="2024-11-19T19:48:01Z" level=debug msg="start: database not ready, waiting..."
2024-11-19T19:48:11Z microceph.daemon[23231]: time="2024-11-19T19:48:11Z" level=debug msg="start: database not ready, waiting..."
2024-11-19T19:48:21Z microceph.daemon[23231]: time="2024-11-19T19:48:21Z" level=debug msg="start: database not ready, waiting..."
2024-11-19T19:48:31Z microceph.daemon[23231]: time="2024-11-19T19:48:31Z" level=debug msg="start: database not ready, waiting..."
2024-11-19T19:48:41Z microceph.daemon[23231]: time="2024-11-19T19:48:41Z" level=debug msg="start: database not ready, waiting..."
2024-11-19T19:48:51Z microceph.daemon[23231]: time="2024-11-19T19:48:51Z" level=debug msg="start: database not ready, waiting..."
```
