[BUG] After Upgrading, Windows Minion RAM Usage 7GB+ #62706
Comments
A little more context before I start restarting minions:
(Same server as above.)
Did you upgrade through each major version, or jump straight to 3005? If you install …
This was a clean uninstall/install to move to the new directory structure (so a complete removal of the old directories). The salt-minion installer broke when attempting to upgrade (it spun for days), so manual removal and reinstall (we also moved from the EXE to the MSI; the latter is much easier to debug). How would one use …
As the quote says, probably by using Process Explorer instead of the Windows Process Manager.
Ah, found it, it creates a handle via a system mutex.
Looks like the processes used for scheduled runs keep running after completion. I don't see these JIDs being reported back to the salt-master.
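For reference, a minimal sketch (not from the thread, and assuming the third-party psutil package is installed on the Windows host) of how one might list the salt-minion processes with their memory and age, to spot scheduled-run children that outlive their jobs:

```python
# Minimal sketch: list salt-minion processes with resident memory and age.
# Assumes the third-party `psutil` package is installed (pip install psutil).
import datetime

import psutil

total_rss = 0
for proc in psutil.process_iter(["pid", "name", "memory_info", "create_time"]):
    name = (proc.info["name"] or "").lower()
    if not name.startswith("salt-minion"):
        continue
    rss = proc.info["memory_info"].rss
    total_rss += rss
    started = datetime.datetime.fromtimestamp(proc.info["create_time"])
    print(f"pid={proc.info['pid']:>6}  rss={rss / 2**20:6.0f} MiB  started={started:%Y-%m-%d %H:%M}")

print(f"total resident memory: {total_rss / 2**30:.2f} GiB")
```

Processes much older than the schedule interval would line up with the "keep running after completion" observation above.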
There was a change in schedule config at some point, though I don't remember specifically when. Are you setting the schedule via a state or pillar?
We are setting the schedule via pillar:
(You can tell how old this is by the usage of ….) I want to say this looks valid per https://docs.saltproject.io/salt/user-guide/en/latest/topics/scheduler.html. Highstate jobs are still executing at the expected …
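For reference, a minimal sketch of checking which schedule the minion actually loaded, using Salt's local Python client (the programmatic equivalent of `salt-call schedule.list`); it assumes it runs on the minion host itself, with the salt package importable and the minion configuration at its default path:

```python
# Minimal sketch: print the schedule the local minion reports.
# Assumes this runs on the minion host itself, with the `salt` Python
# package importable and the minion configuration at its default path.
import salt.client

caller = salt.client.Caller()          # local equivalent of running salt-call
print(caller.cmd("schedule.list"))     # same output as `salt-call schedule.list`
```

If the pillar schedule is being merged as expected, the highstate job should appear in this output.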
Hi @Silvenga, we are trying to reproduce your process leak and are having no luck. Can you help us by telling us what states and modules you're using? Do you know what you're doing when the Salt minion spawns the process that is hanging? For example, are you running some kind of state file around the same time the hanging process spawns?
Does this help? Highstate output for a Windows node (Hyper-V host)
I don't think any processes are hanging, at least there's no impact to any operations. The process details suggest it's a scheduled highstate.
It appears this occurs just with highstate; no other executions are normally occurring on these servers. As an aside, I am hoping these servers will be upgraded to Windows Server 2022 soonish. There's already a migration underway to Ubuntu 22.04, while refactoring the existing Salt states (migrating to onedir and removing some legacy workarounds for random Salt problems over the many years).
@Silvenga thanks for the ^ info. Can I also see your minion config? Are you sure you did not make any changes to your states? Salt is eating up so much RAM because it has a lot of subprocesses for some reason. My guess is your cmd states are not exiting right.
The combined minion configuration for a Hyper-V Windows node. Nothing super exotic; this is mostly the standard configuration deployed to both Linux and Windows nodes.
master: <blah>
id: <blah>
schedule:
__mine_interval: {enabled: true, function: mine.update, jid_include: true, maxrunning: 2,
minutes: 60, return_job: false, run_on_start: true}
restart-minion:
args: [salt-minion]
cron: 28 6 * * *
enabled: true
function: service.restart
jid_include: true
maxrunning: 1
name: restart-minion
grains_cache: true
# Not documented.
# https://github.com/saltstack/salt/issues/48773
disable_grains:
- esxi
disable_modules:
- vsphere
# Just annoying:
- boto3_elasticsearch
- victorops
- ifttt
- pushbullet
- zfs
Oh, that's interesting. I'll see if I can get a process tree before the minion is restarted. I should note that there are no processes hanging around after the service restart (e.g. the process count is mostly constant on these nodes, as measured by a Telegraf agent). Icinga also monitors the number of processes and hasn't detected anything unconstrained. I'm sorry to say that some of these Hyper-V nodes aren't exactly rebooted on a monthly cadence like they should be, mostly because we don't have N+1 resources in some of the Hyper-V clusters to facilitate live migrations during rolling reboots. So, what I mean to say is, some of these nodes have quite an uptime, suggesting the process count is constrained. EDIT: 66 days of uptime is the current record across the clusters.
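A minimal sketch (again assuming the third-party psutil package, which is not part of the thread) for capturing that process tree before the scheduled service.restart, printing each top-level salt-minion process followed by its children and their resident memory:

```python
# Minimal sketch: dump the salt-minion process tree with per-process memory,
# so it can be captured before the nightly restart wipes the evidence.
# Assumes the third-party `psutil` package is installed.
import psutil


def dump_tree(proc, depth=0):
    rss_mib = proc.memory_info().rss / 2**20
    print(f"{'  ' * depth}{proc.pid}  {proc.name()}  {rss_mib:.0f} MiB")
    for child in proc.children():
        dump_tree(child, depth + 1)


for proc in psutil.process_iter(["name"]):
    name = (proc.info["name"] or "").lower()
    if not name.startswith("salt-minion"):
        continue
    try:
        parent = proc.parent()
        parent_name = (parent.name() if parent else "").lower()
    except psutil.Error:
        parent_name = ""
    if not parent_name.startswith("salt-minion"):
        try:
            dump_tree(proc)   # only start from top-level salt-minion processes
        except psutil.Error:
            pass              # process exited while we were walking the tree
```

Comparing two snapshots taken a few hours apart should show whether children from earlier highstate runs are still present.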
Thanks for the fast response. I'll keep looking to see if I can reproduce.
TBH, if you can't repro, I think the onus should be on the reporter (me); we should upgrade to …
@Silvenga it would be great if you could find the state causing this problem. I would appreciate the help.
Closing issue due to inactivity. Feel free to re-open if deemed necessary.
Description
After upgrading to 3005 from 3002.9 we are seeing heavy Salt minion memory usage across all our Windows servers. On this server, which has plenty of memory (96GB), total RAM usage of the Salt minion is around 7GB. On more memory-constrained Windows instances, the Salt minion eats nearly all the available memory and starts paging. For example, on a 4GB server, Salt RAM usage is around 3GB.

On this server there are 55 salt-minion processes, each using about 130MB of RAM (55 × ~130MB ≈ 7GB, which accounts for the total):
Setup
Windows 2016 instances (both bare metal and virtualized via Hyper-V). Version 3005, via the MSI installer.
Steps to Reproduce the behavior
Nothing interesting in the logs.
Minion configuration is close to defaults:
Expected behavior
RAM usage should be constrained.
Screenshots
These servers are Core instances, so not much of a UI. This is a screenshot from a hypervisor (96GB of RAM, 7GB used by Salt).
Versions Report
salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Additional context
All Windows instances appear to be impacted.