Client Side Allocation Directory GC #1418

Closed
camerondavison opened this issue Jul 13, 2016 · 16 comments
@camerondavison
Contributor

nomad 0.4.0

Since the disk fingerprint checks the disk for "free" space, and disk space is also allocated for logs, the logs on disk can end up being counted twice.

If, for instance, I have 450 MB of disk space and 2 tasks that are each set to have 10 rolling log files of 10 MB, and I set both of those tasks to require 100 MB of space, then after my 2 tasks run for a long time and fill up all of their logs I can end up with a resource allocation of 200/250, since the logs physically take up 200 MB on disk while I am also allocating 200 MB for them.

This would mean that I could not put another task with a 100 MB disk allocation onto this node, because according to the resource check there is only 50 MB available. But since we know that 200 MB of the 250 have already been accounted for, we should be able to provision the task.
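As a rough illustration of the double counting described above (the numbers mirror the example; this is only a sketch of the arithmetic, not Nomad's actual fingerprinting code):

```go
package main

import "fmt"

func main() {
	// Numbers from the example above (all values in MB).
	totalDisk := 450       // physical disk size
	reservedPerTask := 100 // disk resource each task asks for
	logsPerTask := 100     // 10 rolling files x 10 MB each, once the logs are full

	// The fingerprint reports whatever the OS says is free, which already
	// excludes the log files the running allocations have written.
	fingerprintedFree := totalDisk - 2*logsPerTask // 250

	// The scheduler then subtracts the reserved disk for the two tasks
	// from that fingerprinted number, counting the same 200 MB again.
	allocated := 2 * reservedPerTask             // 200
	schedulable := fingerprintedFree - allocated // 50

	fmt.Printf("fingerprinted free: %d MB, allocated: %d MB, schedulable: %d MB\n",
		fingerprintedFree, allocated, schedulable)
	// A new task asking for 100 MB is rejected even though the 200 MB of logs
	// is the same data the existing 200 MB allocation already accounts for.
}
```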

@camerondavison
Contributor Author

Looking through the code more, it looks like the storage fingerprint is not periodic. I think this is especially hurting me while trying to upgrade from 0.3.2 to 0.4.0. I ran node-drain, which removed all of the tasks from the node (but probably did not GC the allocated log dirs), then stopped Nomad, restarted the machine, and started Nomad again. This time, when Nomad came back up, it fingerprinted the disk as having much less free space, because all the old logs had not been GC'd yet.

@dadgar changed the title from "disk resource fingerprint with full logs, allocation can count twice" to "Client Side Allocation Directory GC" on Jul 13, 2016
@dadgar
Contributor

dadgar commented Jul 13, 2016

Hey,

This is something we are aware of and will be fixing. It is really due to the client not garbage collecting the allocation directories it manages. The client currently waits for a signal from the server, which occurs on an interval, and that is incorrect.

@camerondavison
Contributor Author

Are you saying that you want to gc the allocation directories before startup, and before the fingerprint runs?

If you want to try to re-attach to any executors that are still running after startup (or run this check periodically), then you will encounter the problem of counting logs twice.

@diptanu
Contributor

diptanu commented Jul 13, 2016

@a86c6f7964 We will GC allocations which are dead when new tasks are trying to get disk space.

@stephenlb

+1 yo

@camerondavison
Contributor Author

I can wait to see what happens, but I feel like I am a little lost.

Current state of the world

  • disk allocation is counted out of the checked free disk space
  • a single disk check for free space at startup
  • free space as calculated by the OS (total - OS usage - any alloc logs on disk (both running and dead))

State that I think would be good

  • disk allocation would still be counted out of free space
  • disk check runs more often, maybe every 10 minutes
  • free space calculated as OS free space plus all running alloc logs (we need to add these back to the free space if we are in fact also going to use the disk resource allocation checks); see the sketch below. This would mean that the non-running allocations would eat into the free space (but could be GC'd when space is needed, as you stated above)
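A minimal sketch of that proposed calculation, under the assumption above (the types, field names, and function here are hypothetical, not Nomad's API):

```go
package main

import "fmt"

// Alloc is a hypothetical stand-in for a client-side allocation record.
type Alloc struct {
	Running  bool
	LogBytes int64 // bytes its log files currently occupy on disk
}

// effectiveFree adds the log usage of running allocations back onto the
// OS-reported free space, so that space is only counted once (via the disk
// resource allocation) instead of twice. Dead allocations are left out, so
// their logs eat into free space until they are GC'd.
func effectiveFree(osFreeBytes int64, allocs []Alloc) int64 {
	free := osFreeBytes
	for _, a := range allocs {
		if a.Running {
			free += a.LogBytes
		}
	}
	return free
}

func main() {
	allocs := []Alloc{
		{Running: true, LogBytes: 100 << 20},  // running task with full logs
		{Running: false, LogBytes: 100 << 20}, // dead alloc, not yet GC'd
	}
	osFree := int64(250) << 20 // what the OS reports as free
	fmt.Printf("effective free: %d MB\n", effectiveFree(osFree, allocs)>>20)
}
```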

@diptanu
Contributor

diptanu commented Jul 15, 2016

There is also a PR which is going to land soon that will kill tasks when they exceed their allocated quota.
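A rough sketch of that kind of enforcement (purely illustrative of the idea; this is not the code from the PR, and the names here are hypothetical):

```go
package quota

import "fmt"

// checkDiskQuota is a hypothetical periodic check: if the bytes a task has
// written into its allocation directory exceed the disk it asked for, the
// task is killed via the provided callback.
func checkDiskQuota(usedBytes, allocatedBytes int64, kill func(reason string)) {
	if usedBytes > allocatedBytes {
		kill(fmt.Sprintf("disk quota exceeded: used %d of %d allocated bytes",
			usedBytes, allocatedBytes))
	}
}
```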

@dadgar
Contributor

dadgar commented Jul 15, 2016

@a86c6f7964 What you described is the goal.

@jshaw86

jshaw86 commented Jul 18, 2016

@dadgar So we are seeing two issues:

  1. the disk space is being reported as incorrect
  2. the disk space never gets cleaned up

From the above conversation it seems that my first point is being addressed. I'm unclear, though, whether the disk will actually be cleaned up automatically for stopped or failed allocations, or if we will need to run a GC task to clean up the file system manually.

@diptanu
Contributor

diptanu commented Jul 19, 2016

@jshaw86 You won't have to run a GC task to clean up the dead allocations. Nomad will clean them up automatically once we have implemented the client GC feature.

@camerondavison
Contributor Author

Also, they are currently cleaned up automatically when the master server periodically issues a GC.


@jshaw86

jshaw86 commented Jul 20, 2016

@a86c6f7964 are you currently seeing this automatic cleanup behavior from the master server GC? We are not seeing any disk cleanup under 0.4.0, even after 24 hours.

@camerondavison
Contributor Author

I saw them go away because, in order to accomplish an upgrade of Nomad, I did the following:

nomad node-drain -self -enable
<wait for drain>
curl $NOMAD_SERVER_CLUSTER_ADDR/v1/system/gc
<restart server to upgrade os and nomad, wait for new nomad version to be up in the cluster>
nomad node-drain -self -disable

So maybe it only does it if you run the system gc?

@camerondavison
Contributor Author

Does anyone know if #2081 helps this issue out at all?

@diptanu
Contributor

diptanu commented Jan 3, 2017

Fixed via #2081

@diptanu closed this as completed on Jan 3, 2017
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

The github-actions bot locked this issue as resolved and limited conversation to collaborators on Dec 17, 2022