Nomad executor processes consume lot of CPU on windows Server #10960

Open
srihari-vistrian opened this issue Jul 28, 2021 · 2 comments

@srihari-vistrian

Nomad version

Nomad v1.1.2 (60638a0)

Operating system and Environment details

OS Name: Microsoft Windows Server 2016 Standard
OS Version: 10.0.14393 N/A Build 14393
OS Manufacturer: Microsoft Corporation
OS Configuration: Member Server
OS Build Type: Multiprocessor Free
Registered Owner: Windows User
Registered Organization:
Product ID: 00377-60000-00000-AA934
Original Install Date: 03/09/2019, 10:04:45 AM
System Boot Time: 29/06/2021, 10:22:26 AM
System Manufacturer: VMware, Inc.
System Model: VMware Virtual Platform
System Type: x64-based PC
Processor(s): 6 Processor(s) Installed.
    [01]: Intel64 Family 6 Model 63 Stepping 0 GenuineIntel ~2694 Mhz
    [02]: Intel64 Family 6 Model 63 Stepping 0 GenuineIntel ~2694 Mhz
    [03]: Intel64 Family 6 Model 63 Stepping 0 GenuineIntel ~2694 Mhz
    [04]: Intel64 Family 6 Model 63 Stepping 0 GenuineIntel ~2694 Mhz
    [05]: Intel64 Family 6 Model 63 Stepping 0 GenuineIntel ~2694 Mhz
    [06]: Intel64 Family 6 Model 63 Stepping 0 GenuineIntel ~2694 Mhz
BIOS Version: Phoenix Technologies LTD 6.00, 12/12/2018
Windows Directory: C:\Windows
System Directory: C:\Windows\system32
Boot Device: \Device\HarddiskVolume1
Total Physical Memory: 65,535 MB
Available Physical Memory: 57,022 MB
Virtual Memory: Max Size: 87,551 MB
Virtual Memory: Available: 76,334 MB
Virtual Memory: In Use: 11,217 MB
Page File Location(s): C:\pagefile.sys
                       P:\pagefile.sys
Hotfix(s): 13 Hotfix(s) Installed.
    [01]: KB3199986
    [02]: KB4346087
    [03]: KB4485447
    [04]: KB4498947
    [05]: KB4503537
    [06]: KB4509091
    [07]: KB4535680
    [08]: KB4550994
    [09]: KB4562561
    [10]: KB4576750
    [11]: KB5001078
    [12]: KB5001402
    [13]: KB5003638
Network Card(s): 1 NIC(s) Installed.
    [01]: vmxnet3 Ethernet Adapter
        Connection Name: Ethernet0
        DHCP Enabled: No
Hyper-V Requirements: A hypervisor has been detected. Features required for Hyper-V will not be displayed.

Issue

We are having trouble running a large number of jobs (more than 60): the Nomad executor processes start consuming CPU and effectively stall the system, and eventually jobs are killed and rescheduled, or in some cases fail outright. With around 60 jobs, the executor CPU overhead appears every few minutes, hits 100% for a couple of minutes (sometimes more than 10 minutes), and then falls back and stabilises.

This appears to be related to issue #5832, where a similar problem occurred on Linux; however, I believe the fix for that issue does not apply to Windows.

This is becoming a major bottleneck for us: we use the HashiCorp stack, especially Nomad and Consul, in production and are considering moving to the Enterprise edition, but this issue has been a significant blocker.

Reproduction steps

Run any job of type service with a large count (more than 100).

I had also posted details about this in the forum but never received any input:

https://discuss.hashicorp.com/t/nomad-v1-0-1-executor-raw-exec-processes-appear-to-consume-significant-cpu-on-windows/21771

@srihari-vistrian
Author

Increasing the pidScanInterval as suggested in #5832 certainly makes the system more stable, although CPU usage still intermittently spikes to 100% for anywhere from a few seconds to almost a minute.

The average CPU consumption is around 10-15% when the nomad executor processes aren't hogging the CPU.

@schmichael
Member

Thanks @srihari-vistrian. Sadly I don't think we can get a CPU profile from the executor process like we can from the Nomad agent process via the agent profile endpoint: https://www.nomadproject.io/api-docs/agent#agent-runtime-profiles

That being said, if it really is as straightforward to reproduce as running >60 jobs, we should be able to track it down.
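For reference, the agent profile endpoint mentioned above can be called as in the following Go sketch. It profiles the Nomad agent only, not the executor processes, so it is useful mainly for ruling the agent out. The agent address `http://127.0.0.1:4646`, the output file name, and the helper names are assumptions; depending on configuration, the agent may also require `enable_debug` or an ACL token, which this sketch does not handle.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strconv"
)

// agentProfileURL builds the URL of the agent CPU profile endpoint for a
// profile lasting the given number of seconds.
func agentProfileURL(addr string, seconds int) (string, error) {
	u, err := url.Parse(addr)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("address %q needs a scheme and host", addr)
	}
	if seconds < 1 {
		return "", fmt.Errorf("seconds must be at least 1, got %d", seconds)
	}
	u.Path = "/v1/agent/pprof/profile"
	q := url.Values{}
	q.Set("seconds", strconv.Itoa(seconds))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// saveProfile downloads the profile and writes it to path, for later
// inspection with `go tool pprof`. It needs a reachable agent.
func saveProfile(profileURL, path string) error {
	resp, err := http.Get(profileURL)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	out, err := os.Create(path)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, resp.Body)
	return err
}

func main() {
	// ASSUMPTION: a local agent on the default HTTP port.
	u, err := agentProfileURL("http://127.0.0.1:4646", 30)
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
	// To capture a profile against a running agent:
	//   err = saveProfile(u, "agent-cpu.prof")
}
```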
