-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allocwatcher: don't destroy local allocdir after migration #18108
Conversation
After confirming the behavior in #18034, I tested the patch in this PR with the following job specification. jobspecjob "example" {
group "web" {
network {
mode = "bridge"
port "www" {
to = 8001
}
}
ephemeral_disk {
migrate = true
sticky = true
}
task "httpd" {
driver = "docker"
config {
image = "busybox:1"
command = "httpd"
args = ["-vv", "-f", "-p", "8001", "-h", "/local"]
ports = ["www"]
}
resources {
cpu = 128
memory = 100
}
}
}
} Run the job:
Write to its logs and to its ephemeral data and verify those writes have been made:
Update the job and run again:
Verify that the logs for the previous allocation still exist, that the ephemeral disk has been migrated, but that the old allocation's ephemeral disk has been moved as expected.
|
When ephemeral disks are migrated from an allocation on the same node, allocation logs for the previous allocation are lost. There are two workflows for the best-effort attempt to migrate the allocation data between the old and new allocations. For previous allocations on other clients (the "remote" workflow), we create a local allocdir and download the data from the previous client into it. That data is then moved into the new allocdir and we delete the allocdir of the previous alloc. For "local" previous allocations we don't need to create an extra directory for the previous allocation and instead move the files directly from one to the other. But we still delete the old allocdir _entirely_, which includes all the logs! There doesn't seem to be any reason to destroy the local previous allocdir, as the usual client garbage collection should destroy it later on when needed. By not deleting it, the previous allocation's logs are still available for the user to read. Fixes: #18034
c03d830
to
a3a637e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
When ephemeral disks are migrated from an allocation on the same node, allocation logs for the previous allocation are lost. There are two workflows for the best-effort attempt to migrate the allocation data between the old and new allocations. For previous allocations on other clients (the "remote" workflow), we create a local allocdir and download the data from the previous client into it. That data is then moved into the new allocdir and we delete the allocdir of the previous alloc. For "local" previous allocations we don't need to create an extra directory for the previous allocation and instead move the files directly from one to the other. But we still delete the old allocdir _entirely_, which includes all the logs! There doesn't seem to be any reason to destroy the local previous allocdir, as the usual client garbage collection should destroy it later on when needed. By not deleting it, the previous allocation's logs are still available for the user to read. Fixes: #18034
BPA failed at random for the 1.4.x release branch, I've backported that by hand. |
When ephemeral disks are migrated from an allocation on the same node, allocation logs for the previous allocation are lost.
There are two workflows for the best-effort attempt to migrate the allocation data between the old and new allocations. For previous allocations on other clients (the "remote" workflow), we create a local allocdir and download the data from the previous client into it. That data is then moved into the new allocdir and we delete the allocdir of the previous alloc.
For "local" previous allocations we don't need to create an extra directory for the previous allocation and instead move the files directly from one to the other. But we still delete the old allocdir entirely, which includes all the logs!
There doesn't seem to be any reason to destroy the local previous allocdir, as the usual client garbage collection should destroy it later on when needed. By not deleting it, the previous allocation's logs are still available for the user to read.
Fixes: #18034