Salt master not deployed correctly because another salt master is already running #2840
Labels: kind:bug (Something isn't working), topic:flakiness (Some tests are flaky and cause transient CI failures)
Comments
MonPote added the kind:bug and topic:flakiness labels on Oct 9, 2020
Got a similar error, but this time it's the […] and indeed: […]

Happened when I undeployed an old solution and deployed a new one.

Got this when importing a new MetalK8s ISO.
gdemonet added a commit that referenced this issue on Dec 28, 2020:

When using `metalk8s.static_pod_managed`, we call `file.managed` behind the scenes. This state does a lot of magic, including creating a temporary file with the new contents before replacing the old file. This temp file gets created **in the same directory** as the managed file by default, so it gets picked up by `kubelet` as if it were another static Pod to manage.

If the replacement occurs too late, `kubelet` may have already created another Pod for the temp file, and may not be able to "remember" the old Pod, hence not cleaning it up. This results in "rogue containers", which can create issues (e.g. preventing new containers from binding some ports on the host).

This commit ensures we create the temp files in `/tmp` (unless specified otherwise), which should prevent the aforementioned situation from happening.

Fixes: #2840
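For illustration only, here is a minimal Python sketch of the mechanism described in that commit message; it is not the actual Salt code, and the function names and the `contents` parameter are made up for the example.

```python
import os
import shutil
import tempfile


def replace_in_place(target: str, contents: str) -> None:
    """Roughly the file.managed behaviour described above: stage the new
    contents in a temp file created in the *same directory* as the target,
    then rename it over the target.  When the target lives in kubelet's
    static Pod manifest directory, the temp file is briefly visible there
    and kubelet may start a Pod for it."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(target))
    with os.fdopen(fd, "w") as tmp:
        tmp.write(contents)
    os.rename(tmp_path, target)  # atomic, but the temp file was already visible


def replace_via_tmp(target: str, contents: str) -> None:
    """The direction taken by this commit: stage the temp file under /tmp so
    it never appears in the manifest directory.  /tmp may be on a different
    filesystem, in which case os.rename() would fail with EXDEV, so
    shutil.move() is used here; it falls back to a copy in that case, at the
    cost of an atomic replacement of the target."""
    fd, tmp_path = tempfile.mkstemp(dir="/tmp")
    with os.fdopen(fd, "w") as tmp:
        tmp.write(contents)
    shutil.move(tmp_path, target)
```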
gdemonet added a commit that referenced this issue on Jan 7, 2021:

When using `metalk8s.static_pod_managed`, we call `file.managed` behind the scenes. This state does a lot of magic, including creating a temporary file with the new contents before replacing the old file. This temp file gets created **in the same directory** as the managed file by default, so it gets picked up by `kubelet` as if it were another static Pod to manage.

If the replacement occurs too late, `kubelet` may have already created another Pod for the temp file, and may not be able to "remember" the old Pod, hence not cleaning it up. This results in "rogue containers", which can create issues (e.g. preventing new containers from binding some ports on the host).

This commit reimplements the 'file.managed' state in a minimal fashion, to ensure the temporary file used for making an "atomic replace" is ignored by kubelet. Note that it requires us to also reimplement the 'file.manage_file' execution function, since it always relies on the existing "atomic copy" operation from `salt.utils.files.copyfile`.

Fixes: #2840
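Again for illustration only: one way to get an atomic replace whose intermediate file kubelet does not treat as a Pod manifest is to keep the temp file in the same directory but give it a dot-prefixed name, since kubelet skips hidden files in its static Pod path. Whether the reimplemented state uses exactly this naming is an assumption; the helper below is only a sketch.

```python
import os
import tempfile


def atomic_replace_ignored_by_kubelet(target: str, contents: str) -> None:
    """Write `contents` to a dot-prefixed temp file next to `target`, then
    rename it over `target`.  The rename stays on a single filesystem (so it
    is atomic), and kubelet ignores dot-prefixed files in its static Pod
    manifest directory, so no extra Pod gets created for the temp file.
    (Sketch only; the actual MetalK8s implementation may differ.)"""
    directory, name = os.path.split(target)
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="." + name + ".")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(contents)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.rename(tmp_path, target)
    except BaseException:
        # Clean up the staged file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```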
Component: salt
What happened:

On a fresh install, `salt-master` is sometimes not deployed correctly because another `salt-master` is already running and using the port. After some checking, we figured out that this comes from a rogue `salt-master` that escaped MetalK8s tracking (the container is still running, but not shown by kubectl). Stopping and removing both `salt-master` containers solves the issue.

What was expected:

When you deploy a fresh bootstrap, `salt-master` should be deployed correctly.

Steps to reproduce:

After some discussion with @slaperche-scality, this flakiness can also happen when you deploy/undeploy a solution. So the best way to reproduce it is to deploy and undeploy a (complex?) solution. It may be a bug when Salt is restarting.
Resolution proposal (optional):