slow client GC prevents new allocs from running (periodic dispatch) #19917
Hi @aneustroev 👋 I'm not sure I fully understood the problem. Is the Nomad GC preventing new allocations from being placed until it completes? Would you be able to share the output of the status for the periodic job and the allocations when they get into this state? Thank you!
Sometimes tasks hang in the queue for hours.
Does the task have any events? And have you noticed whether this happens on a particular client, or are all allocs pending regardless of where they are scheduled? Would you be able to share logs from the Nomad client when the problem happens? You can email them to [email protected] with the issue number in the subject.
It doesn't depend on the client. When it happens I see the following log messages.
No other WARN or ERROR messages.
This lasts until all allocs fall behind the server GC settings and client GC settings.
I believe Nomad's client GC is non-blocking, so it shouldn't impact the new allocations. Would you be able to drop the client log level to
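For reference, the client-side GC behavior being discussed is controlled by the `client` stanza of the agent configuration. A minimal sketch of the relevant settings (the values here are illustrative, not recommendations):

```hcl
client {
  enabled = true

  # How often the client runs its allocation garbage collector.
  gc_interval = "1m"

  # Maximum number of terminal allocations a client retains before
  # forcing GC. Raising this delays cleanup; as described in this
  # issue, very large values can exhaust OS mounts.
  gc_max_allocs = 50

  # How many allocation directories GC may destroy in parallel.
  gc_parallel_destroys = 2
}
```

Server-side job GC is tuned separately (e.g. `job_gc_interval` and `job_gc_threshold` in the `server` stanza).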
Why does this happen?
Some details here: #7787 (comment)
Lower
Sorry about the delays in returning to this (and the seemingly related #7787). I'm going to mark this as accepted and for roadmapping.
Internal ref: https://hashicorp.atlassian.net/browse/NET-10187 |
Nomad version
Nomad v1.7.4
Operating system and Environment details
Ubuntu
Issue
If I run periodic jobs, the children of the periodic job are not cleaned up by GC until the alloc count reaches `gc_max_allocs`, and then all jobs get stuck in the pending state, waiting for the children to be cleaned up. Also, I can't set `gc_max_allocs` to a very big value, because each alloc creates two mounts, and with more than 30k mounts the OS becomes unstable.
Reproduction steps
1. Create many high-frequency periodic jobs (>100)
2. Wait for the number of allocs to exceed `gc_max_allocs`
3. See that all new allocs are stuck in the pending state
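Step 1 above could be reproduced with a periodic batch job along these lines (the job name, cron schedule, and task are illustrative, not taken from the reporter's setup):

```hcl
job "periodic-example" {
  type = "batch"

  periodic {
    # Launch a child job every minute to generate allocs quickly.
    cron             = "* * * * *"
    prohibit_overlap = false
  }

  group "work" {
    task "noop" {
      driver = "raw_exec"
      config {
        command = "/bin/sleep"
        args    = ["1"]
      }
    }
  }
}
```

Running ~100 such jobs produces terminal child allocations far faster than the client GC removes them, so the client's retained-alloc count climbs toward `gc_max_allocs`.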
Expected Result
1. GC removes old allocs by TTL
or
2. GC unmounts secrets and private mounts by TTL
Actual Result
1. When `gc_max_allocs` is small, new allocs are created with a delay or never created.
2. When `gc_max_allocs` is big, the OS becomes unstable over time.
Job file (if appropriate)
Nomad Server logs (if appropriate)
All is good.
Nomad Client logs (if appropriate)
Also, I saw strange logs for dead allocs.
Why is it being killed if it's already dead?