Nomad doesn't GC manually stopped batch allocations #4532
Comments
@preetapan Is this expected behavior? |
I'm seeing this problem. Although the job is no longer registered, I can still see its allocations via the /v1/allocations endpoint. I've run a manual GC, and even a normal GC should have caught them by now. Normally I'd be fine just ignoring them, but there are a lot of them (the evaluation endpoint gives me back ~500 MiB+ of JSON), left over from a bug I hit over the weekend that triggered a ton of evaluations of a periodic job, to the point where some of my Nomad instances are using significant amounts of memory:
Where the float number there is the % of memory on the VM (out of 8 or 16 GiB). This is using vanilla 0.8.7. The allocations look like this: [{"AnnotatePlan":false,"BlockedEval":"","ClassEligibility":null,"CreateIndex":2900362,"DeploymentID":"","EscapedComputedClass":false,"FailedTGAllocs":null,"ID":"000205df-5f0b-c0f6-f76a-9712aabadb06","JobID":"stage-a-restart-services/periodic-1552197600","JobModifyIndex":2900361,"LeaderACL":"","ModifyIndex":2900362,"Namespace":"default","NextEval":"","NodeID":"","NodeModifyIndex":0,"PreviousEval":"","Priority":50,"QueuedAllocations":null,"QuotaLimitReached":"","SnapshotIndex":0,"Status":"pending","StatusDescription":"","TriggeredBy":"periodic-job","Type":"batch","Wait":0,"WaitUntil":"0001-01-01T00:00:00Z"},
{"AnnotatePlan":false,"BlockedEval":"","ClassEligibility":null,"CreateIndex":1846653,"DeploymentID":"","EscapedComputedClass":false,"FailedTGAllocs":null,"ID":"00022322-9920-7679-4c7c-b475bcd92eb9","JobID":"stage-a-restart-services/periodic-1552197600","JobModifyIndex":1846652,"LeaderACL":"","ModifyIndex":1846653,"Namespace":"default","NextEval":"","NodeID":"","NodeModifyIndex":0,"PreviousEval":"","Priority":50,"QueuedAllocations":null,"QuotaLimitReached":"","SnapshotIndex":0,"Status":"pending","StatusDescription":"","TriggeredBy":"periodic-job","Type":"batch","Wait":0,"WaitUntil":"0001-01-01T00:00:00Z"},
{"AnnotatePlan":false,"BlockedEval":"","ClassEligibility":null,"CreateIndex":2387713,"DeploymentID":"","EscapedComputedClass":false,"FailedTGAllocs":null,"ID":"00027399-797d-7551-3bea-3c4d2ed6e851","JobID":"stage-a-restart-services/periodic-1552197600","JobModifyIndex":2387712,"LeaderACL":"","ModifyIndex":2387713,"Namespace":"default","NextEval":"","NodeID":"","NodeModifyIndex":0,"PreviousEval":"","Priority":50,"QueuedAllocations":null,"QuotaLimitReached":"","SnapshotIndex":0,"Status":"pending","StatusDescription":"","TriggeredBy":"periodic-job","Type":"batch","Wait":0,"WaitUntil":"0001-01-01T00:00:00Z"},
{"AnnotatePlan":false,"BlockedEval":"","ClassEligibility":null,"CreateIndex":1562539,"DeploymentID":"","EscapedComputedClass":false,"FailedTGAllocs":null,"ID":"00028a09-2d6a-cb2b-b7b3-ade83240fcf3","JobID":"stage-a-restart-services/periodic-1552197600","JobModifyIndex":1562538,"LeaderACL":"","ModifyIndex":1562539,"Namespace":"default","NextEval":"","NodeID":"","NodeModifyIndex":0,"PreviousEval":"","Priority":50,"QueuedAllocations":null,"QuotaLimitReached":"","SnapshotIndex":0,"Status":"pending","StatusDescription":"","TriggeredBy":"periodic-job","Type":"batch","Wait":0,"WaitUntil":"0001-01-01T00:00:00Z"},
{"AnnotatePlan":false,"BlockedEval":"","ClassEligibility":null,"CreateIndex":1685747,"DeploymentID":"","EscapedComputedClass":false,"FailedTGAllocs":null,"ID":"0002906c-5f97-0dfc-f862-7f9785b08810","JobID":"stage-a-restart-services/periodic-1552197600","JobModifyIndex":1685746,"LeaderACL":"","ModifyIndex":1685747,"Namespace":"default","NextEval":"","NodeID":"","NodeModifyIndex":0,"PreviousEval":"","Priority":50,"QueuedAllocations":null,"QuotaLimitReached":"","SnapshotIndex":0,"Status":"pending","StatusDescription":"","TriggeredBy":"periodic-job","Type":"batch","Wait":0,"WaitUntil":"0001-01-01T00:00:00Z"},
{"AnnotatePlan":false,"BlockedEval":"","ClassEligibility":null,"CreateIndex":1369536,"DeploymentID":"","EscapedComputedClass":false,"FailedTGAllocs":null,"ID":"0002aa6c-92ed-ab22-adc9-a5f3f9f2913b","JobID":"stage-a-restart-services/periodic-1552197600","JobModifyIndex":1369534,"LeaderACL":"","ModifyIndex":1369536,"Namespace":"default","NextEval":"","NodeID":"","NodeModifyIndex":0,"PreviousEval":"","Priority":50,"QueuedAllocations":null,"QuotaLimitReached":"","SnapshotIndex":0,"Status":"pending","StatusDescription":"","TriggeredBy":"periodic-job","Type":"batch","Wait":0,"WaitUntil":"0001-01-01T00:00:00Z"},
{"AnnotatePlan":false,"BlockedEval":"","ClassEligibility":null,"CreateIndex":2887201,"DeploymentID":"","EscapedComputedClass":false,"FailedTGAllocs":null,"ID":"0002bac9-1f4a-0230-739f-9a7c6670b2d4","JobID":"stage-a-restart-services/periodic-1552197600","JobModifyIndex":2887200,"LeaderACL":"","ModifyIndex":2887201,"Namespace":"default","NextEval":"","NodeID":"","NodeModifyIndex":0,"PreviousEval":"","Priority":50,"QueuedAllocations":null,"QuotaLimitReached":"","SnapshotIndex":0,"Status":"pending","StatusDescription":"","TriggeredBy":"periodic-job","Type":"batch","Wait":0,"WaitUntil":"0001-01-01T00:00:00Z"},
{"AnnotatePlan":false,"BlockedEval":"","ClassEligibility":null,"CreateIndex":2310569,"DeploymentID":"","EscapedComputedClass":false,"FailedTGAllocs":null,"ID":"0002f2e9-4a12-c6bc-6a9f-a4981e4e91e1","JobID":"stage-a-restart-services/periodic-1552197600","JobModifyIndex":2310568,"LeaderACL":"","ModifyIndex":2310569,"Namespace":"default","NextEval":"","NodeID":"","NodeModifyIndex":0,"PreviousEval":"","Priority":50,"QueuedAllocations":null,"QuotaLimitReached":"","SnapshotIndex":0,"Status":"pending","StatusDescription":"","TriggeredBy":"periodic-job","Type":"batch","Wait":0,"WaitUntil":"0001-01-01T00:00:00Z"},
... I tried removing |
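To gauge how many leftover records each job is holding on to, a response like the one above can be tallied per job and status. This is only a minimal sketch: it assumes the payload is a JSON array of objects carrying `JobID` and `Status` fields, as in the sample, and the function name `tally_by_job_and_status` is hypothetical. The two sample records are trimmed copies of the first two entries quoted above.

```python
import json
from collections import Counter


def tally_by_job_and_status(payload: str) -> Counter:
    """Count records per (JobID, Status) in a JSON array of Nomad objects."""
    records = json.loads(payload)
    return Counter((record["JobID"], record["Status"]) for record in records)


# Trimmed copies of the first two records from the output quoted above.
sample = """[
  {"ID": "000205df-5f0b-c0f6-f76a-9712aabadb06",
   "JobID": "stage-a-restart-services/periodic-1552197600",
   "Status": "pending"},
  {"ID": "00022322-9920-7679-4c7c-b475bcd92eb9",
   "JobID": "stage-a-restart-services/periodic-1552197600",
   "Status": "pending"}
]"""

# Print the largest groups first.
for (job_id, status), count in tally_by_job_and_status(sample).most_common():
    print(f"{count:6d}  {status:10s}  {job_id}")
```

Running the same tally before and after a GC would show whether any group actually shrinks.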
Hey there Since this issue hasn't had any activity in a while - we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this. Thanks! |
This issue will be auto-closed because there hasn't been any activity for a few months. Feel free to open a new one if you still experience this problem 👍 |
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Nomad version
Nomad v0.8.4 (dbee1d7)
Issue
If we stop a batch job and then launch it again through `nomad run`, the old allocations are not garbage collected; a manual GC doesn't help either.
Reproduction steps
For example, we have a test job:
We launch it with `nomad run ./test.nomad`, stop it (`nomad stop test`), run `nomad run ./test.nomad` again, run `nomad stop test` again, and finally run `nomad run ./test.nomad` once more, so the job ends up in the following state:
After these steps we trigger a manual GC (we did this 2-3 times):
And nothing happens: the old allocations are not cleaned up. Only when we fully stop the batch job does GC clean up all of its allocations. We did all of this on a test stand, but in our real environment we have the following situation:
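The run/stop sequence from the reproduction steps, followed by a manual GC, could be scripted roughly as follows. This is a sketch under assumptions, not the reporter's actual script: the report does not show which command was used for the manual GC, so `nomad system gc` is assumed here, and the helper `run_steps` is hypothetical. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Run/stop sequence as described in the report. The final step is an
# assumption: the report does not show how the manual GC was triggered.
REPRO_STEPS = [
    ["nomad", "run", "./test.nomad"],
    ["nomad", "stop", "test"],
    ["nomad", "run", "./test.nomad"],
    ["nomad", "stop", "test"],
    ["nomad", "run", "./test.nomad"],
    ["nomad", "system", "gc"],
]


def run_steps(steps, dry_run=True):
    """Print each command and execute it only when dry_run is False."""
    handled = []
    for cmd in steps:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True aborts the sequence on the first failing command.
            subprocess.run(cmd, check=True)
        handled.append(cmd)
    return handled


if __name__ == "__main__":
    run_steps(REPRO_STEPS)  # dry run: prints the commands only
```

Calling `run_steps(REPRO_STEPS, dry_run=False)` against a test cluster would execute the sequence; the job's allocations can then be compared before and after the GC step.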