[question] ephemeral_disk sticky changed? #4420

Closed · ygersie opened this issue Jun 15, 2018 · 6 comments

@ygersie (Contributor) commented Jun 15, 2018

Question

I'm not sure in which release this changed, as I migrated all the way from 0.5.4 to 0.8.4, but we also see this on another cluster running version 0.7.1.
With version 0.5.4 of Nomad you could stop a job with an ephemeral disk configured, and upon re-start the scheduler placed the new allocation on the same node, reattaching the data directories. Now this only seems to work when you drain a node or bump the job version by updating it. Is this as designed?

Job file (if appropriate)

job "foo" {
    datacenters = ["dc1"]

    group "foo" {
        count = 1

        ephemeral_disk {
            size = 1000
            sticky = true
            migrate = true
        }

        task "foo" {
            driver = "docker"
            config {
                image = "alpine"
                args = ["/bin/sh", "-c", "sleep 10000"]
                network_mode = "host"
            }

            resources {
                cpu = 100
                memory = 64
                network {
                    mbits = 1
                }
            }
        }
    }
}
@ygersie (Contributor, Author) commented Jun 15, 2018

One use case where this is really necessary is when a scheduled maintenance drops all workers from the cluster due to missed heartbeats. If that happens, you lose all data in the current situation.

In the old situation we had the option to shut everything down before maintenance and start back up afterwards, retaining the data.

@schmichael (Member) commented:

> Not sure in which release this changed as I migrated all the way from 0.5.4 to 0.8.4

Just FYI: generally we recommend skipping at most a single point release at a time (e.g. 0.5.4 => 0.7.1).

> With version 0.5.4 of Nomad you could stop a job with an ephemeral disk configured, and upon re-start the scheduler placed the new allocation on the same node, reattaching the data directories.

The behavior has changed, and I'm sorry it wasn't made clear! Once a job is stopped, we do not consider newly run jobs with the same name as being updates to the old job. This could lead to scenarios where a "db" job is stopped, and a new unrelated "db" is started and given all of the stopped job's data!

Once a job is explicitly stopped its data should be considered unavailable as it may be GC'd at any time.

I know for your use case the 0.5.4 behavior was ideal, but since migrating ephemeral disks has always been considered a "best effort" instead of a guarantee, we have decided the new behavior is safer.

In the future we'll be adding volume management (#150), which will have much better guarantees around your data being migrated between nodes. Until then, only expect ephemeral disks to be migrated when jobs are updated or rescheduled (a new 0.8 feature).
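
For reference, rescheduling in 0.8 is configured with a group-level reschedule stanza; combined with a sticky/migrate ephemeral disk, the data should follow the allocation when it is rescheduled (still best effort, as noted above). A minimal sketch with purely illustrative values, not settings from this issue:

group "foo" {
    ephemeral_disk {
        size    = 1000
        sticky  = true
        migrate = true
    }

    # Illustrative reschedule policy (0.8+); tune to taste.
    reschedule {
        delay          = "30s"
        delay_function = "exponential"
        max_delay      = "1h"
        unlimited      = true
    }
}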

> One use case where this is really necessary is when a scheduled maintenance occurs dropping all workers from the cluster due to missed heartbeats.

I'm curious why this happens. If you're shutting down all Nomad servers, clients should reconnect when they restart and nothing should get marked as lost. If you're doing a rolling restart of Nomad servers (the recommended approach), clients should be able to heartbeat throughout the maintenance window.

The only way I can think of that this behavior would happen is if you partition the servers from the clients without shutting them down. If this is necessary, I would suggest the workaround below to avoid nodes becoming lost.

Workaround: Increase heartbeat intervals during upgrades

Bumping the heartbeat_grace during maintenance windows is often a good idea to avoid lost nodes and needless rescheduling.
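
A minimal sketch of what that could look like in the server configuration; the value here is only illustrative (heartbeat_grace normally defaults to 10s):

server {
    enabled = true

    # Illustrative: raise the grace period for the maintenance window so
    # clients that miss heartbeats are not marked lost and rescheduled.
    heartbeat_grace = "1h"
}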

This is useful enough that we're hoping to add a way to toggle a maintenance mode that raises the grace period without requiring restarts.

@ygersie (Contributor, Author) commented Jun 15, 2018

Hey Michael, thanks so much for your detailed answer! I'll have a look next week to see if I can prevent nodes from getting into the lost state during shutdown, and I might bump the heartbeat_grace quite a bit. I'd rather have a delay in node failure detection than unnecessary data shifts.

I’ll post my results here.

@burdandrei (Contributor) commented:

Adding my 2 cents: we're running 0.8.4 with ACLs enabled.
Here is the anonymous policy:

$ nomad acl policy info anonymous

Name        = anonymous
Description = Allow read-only access for anonymous requests
Rules       = namespace "default" {
  policy = "read"
}
agent {
    policy = "read"
}
node {
    policy = "write"
}
CreateIndex = 6841952
ModifyIndex = 6842341

We tried to use:

ephemeral_disk {
    migrate = true
    size    = "500"
    sticky  = true
}

and this is what I can see in the client debug logs:

Jul 22 14:30:17 ip-10-aaa-bbb-ccc nomad[2545]:     2018/07/22 14:30:17.560315 [WARN] client: alloc "b8b35f2e-9e29-1866-cf87-22858f81f0f4" error while migrating data from previous alloc: error getting snapshot from previous alloc "441b2bd9-0f31-9e95-88f2-5868211d51b0": Unexpected response code: 403 (Permission denied)

It looks like the Nomad client doesn't know to run with an agent token the way Consul does, and since the anonymous token can't read the allocation's filesystem, the call that tries to migrate the data receives a 403.
Any black magic we can do with this @schmichael?
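
One hedged workaround sketch, assuming the 403 really is the anonymous policy lacking filesystem access: grant the anonymous namespace policy the read-fs capability so the previous allocation's filesystem is readable without a token (a guess at a mitigation, not the resolution tracked in the follow-up issue below):

namespace "default" {
    policy = "read"

    # Assumption: read-fs (and read-logs) let the migrating client read the
    # previous allocation's filesystem instead of receiving a 403.
    capabilities = ["read-job", "read-logs", "read-fs"]
}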

@burdandrei (Contributor) commented:

Realized that this one is closed, so I opened #4525.

@github-actions (bot) commented:

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked this issue as resolved and limited conversation to collaborators on Nov 10, 2022.