fdbmonitor sometimes fails to restart a process when the configuration changes (version 7.3.43) #11764

Closed
mpatou-openai opened this issue Nov 8, 2024 · 7 comments

@mpatou-openai
Contributor

I'm running FDB in a Kubernetes cluster. I went onto a pod, changed one parameter in that pod's configuration, and noticed that fdbmonitor picked up my change:

Time="1731041239.406780" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Loading configuration /var/dynamic-conf/fdbmonitor.conf
Time="1731041239.407008" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Updated configuration for fdbserver.1
Time="1731041239.407187" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Found new configuration for fdbserver.3
Time="1731041239.407378" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Updated configuration for fdbserver.2

But it didn't restart the fdbserver.3 process with the new settings. I tried copying the file back and forth; the file change was noticed each time, but the process still wasn't restarted:

Time="1731041240.468774" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Watching conf file /var/dynamic-conf/fdbmonitor.conf
Time="1731041240.468820" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Watching conf dir /var/dynamic-conf/ (36)
Time="1731041240.468829" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Loading configuration /var/dynamic-conf/fdbmonitor.conf
Time="1731041240.469060" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Updated configuration for fdbserver.1
Time="1731041240.469137" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Updated configuration for fdbserver.3
Time="1731041240.469190" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Updated configuration for fdbserver.2
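
For anyone comparing reload events across pods, here is a minimal Python sketch that pulls the per-section configuration messages out of log lines shaped like the ones quoted above. The regular expressions are inferred only from these excerpts, so other fdbmonitor messages or formats may not match, and the helper name `config_events` is hypothetical.

```python
import re

# Line layout inferred from the excerpts in this issue (an assumption):
# Time="..." Severity="..." LogGroup="..." Process="...": message
LINE_RE = re.compile(
    r'Time="(?P<time>[^"]+)" Severity="(?P<severity>[^"]+)" '
    r'LogGroup="(?P<group>[^"]+)" Process="(?P<process>[^"]+)": (?P<message>.*)'
)
# The two per-section messages that appear in the excerpts.
CONFIG_RE = re.compile(
    r"(?P<event>Found new|Updated) configuration for (?P<section>\S+)"
)


def config_events(log_text):
    """Return (time, event, section) tuples for configuration reload messages."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not in the expected layout
        cfg = CONFIG_RE.match(match["message"])
        if cfg:
            events.append((float(match["time"]), cfg["event"], cfg["section"]))
    return events


# Lines copied from the first excerpt in this issue.
SAMPLE = '''\
Time="1731041239.406780" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Loading configuration /var/dynamic-conf/fdbmonitor.conf
Time="1731041239.407008" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Updated configuration for fdbserver.1
Time="1731041239.407187" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Found new configuration for fdbserver.3
Time="1731041239.407378" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Updated configuration for fdbserver.2
'''

if __name__ == "__main__":
    for time, event, section in config_events(SAMPLE):
        print(f"{time:.6f} {section}: {event}")
```

This only reports what fdbmonitor logged for each section; it says nothing about whether a process was actually restarted.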
@spraza
Collaborator

spraza commented Nov 8, 2024

Not sure if this is a Kubernetes-specific issue. @johscheuer, have you seen anything similar?

@mpatou-openai
Contributor Author

I guess the way to double-check that would be to run a cluster with, say, Docker Compose and see whether it manifests there?

@spraza
Collaborator

spraza commented Nov 8, 2024

I guess the way to double-check that would be to run a cluster with, say, Docker Compose and see whether it manifests there?

Right, that's one approach. Although, looking at this again, I have another question (below).

Time="1731041239.406780" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Loading configuration /var/dynamic-conf/fdbmonitor.conf
Time="1731041239.407008" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Updated configuration for fdbserver.1
Time="1731041239.407187" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Found new configuration for fdbserver.3
Time="1731041239.407378" Severity="10" LogGroup="fdb-cluster" Process="fdbmonitor": Updated configuration for fdbserver.2

I am assuming all fdbserver processes are running on the same pod, whose logs are above. If that's correct, can you describe what the configuration change was? It looks like fdbserver.1 and fdbserver.2 were already in the fdbmonitor config, and you added a new fdbserver.3 section.

@mpatou-openai
Contributor Author

In this particular case I went onto the pod and changed one knob for one process. I wanted to see whether the process would be restarted once I changed the setting, and it seems it was not. fdbmonitor appears to have picked up the change, as the log indicates, but for whatever reason it did not restart the process.
The knob I changed is knob_max_storage_server_watch_bytes = 1100000000 (it was 1GB before).

@mpatou-openai
Contributor Author

Yeah, I did more investigation, at least in Kubernetes: fdbmonitor doesn't restart the processes. It does notice the change but somehow decides not to restart them.

@mpatou-openai
Contributor Author

Actually, I just realized that the operator configures fdbmonitor with kill_on_configuration_change = false, so this is a false positive.

@johscheuer
Contributor

Correct, the operator configures fdbmonitor to not restart fdbserver processes when the configuration changes: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/internal/monitor_conf.go#L95. The reason for this is to allow the operator to orchestrate the restart of the processes. I added some more details in my comment here: FoundationDB/fdb-kubernetes-operator#2161 (comment)
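
To check quickly whether a given pod's fdbmonitor.conf carries this setting, a minimal Python sketch along the following lines could be used. It rests on several assumptions: that the file parses as plain INI with `configparser`, that the option sits in a `[general]` section, and that it uses the exact spelling quoted in this thread. The helper name `kill_on_change_enabled` and the sample fragment are hypothetical, not taken from a real cluster.

```python
import configparser

# Option spelling as quoted in this thread.
OPTION = "kill_on_configuration_change"


def kill_on_change_enabled(conf_text, section="general"):
    """Return True/False if the option is set in `section`, else None.

    None means "not set here"; this sketch does not guess the default.
    """
    # interpolation=None keeps values literal; strict=False tolerates
    # duplicate sections or keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if parser.has_option(section, OPTION):
        return parser.getboolean(section, OPTION)
    return None


# Hypothetical fragment for illustration only. The [general] placement is an
# assumption; the knob line reuses the value mentioned earlier in the thread.
SAMPLE_CONF = """\
[general]
kill_on_configuration_change = false

[fdbserver.3]
knob_max_storage_server_watch_bytes = 1100000000
"""

if __name__ == "__main__":
    print(kill_on_change_enabled(SAMPLE_CONF))
```

On a pod, the same function could be fed the contents of /var/dynamic-conf/fdbmonitor.conf, the path shown in the logs above. A result of `False` would be consistent with fdbmonitor detecting a change without restarting the process.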

I'm closing this issue, as fdbmonitor is doing what it should be doing (detecting the change but not restarting the processes).

If you think there is more to discuss on the fdbmonitor side, feel free to reopen the issue.
