Allow per_alloc to be used with host volumes #15780

ygersie · 2023-01-13T16:16:33Z

Disallowing per_alloc for host volumes makes life much harder for Nomad users in some cases. When we rely on NOMAD_ALLOC_INDEX for any configuration that must be reused across restarts, we need allocation placement to be consistent. With CSI volumes we can use the per_alloc feature, but for some reason it is explicitly disabled for host volumes.

I couldn't find a reason why this was done, since as far as I can see per_alloc is only used by the scheduler to determine placement.

ygersie · 2023-01-13T16:18:23Z

@tgross hope you don't mind me pinging you here, but this seems to be written by you. Do you happen to remember if there was an explicit reason to disable per_alloc for host volumes?

tgross · 2023-01-13T20:59:10Z

> Do you happen to remember if there was an explicit reason to disable per_alloc for host volumes?

Mostly because we didn't think anyone would want it because it creates tight coupling between cluster administrator config and job operator config. You'd end up with a client configuration like the following:

client {
  host_volume "host_data[0]" {
    path = "/srv"
  }
}

And then a jobspec with the following:

job "docs" {
  group "example" {
    volume "certs" {
      type      = "host"
      source    = "host_data"
      per_alloc = true
    }
  }
}

Which I guess can work but doesn't that forever pin the workload to a specific client (or at least the subset of clients with that same index on the host volume config)?
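To make the coupling concrete, here is a sketch of what two clients might carry, assuming the index suffix follows the `host_data[0]` naming shown above. The comments labelling the nodes and the second path are illustrative assumptions, not taken from this PR.

```hcl
# Client A (hypothetical node): can only host allocation index 0.
client {
  host_volume "host_data[0]" {
    path = "/srv"
  }
}

# Client B (hypothetical node, separate agent config file):
# can only host allocation index 1.
client {
  host_volume "host_data[1]" {
    path = "/srv"
  }
}
```

With a layout like this, each allocation index is feasible only on clients that declare the matching suffix, which is the pinning effect described above.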

ygersie · 2023-01-13T21:16:08Z

> Which I guess can work but doesn't that forever pin the workload to a specific client (or at least the subset of clients with that same index on the host volume config)?

In many cases you wouldn't need to pin an allocation to a specific node, but in some you do. Kafka, for example, could benefit from this. When you deploy a job config with a broker.id rendered from NOMAD_ALLOC_INDEX, you need to ensure the data on disk matches that specific allocation so the deployed config stays consistent.
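A rough jobspec sketch of the Kafka case, assuming per_alloc is accepted for host volumes as this PR proposes. The job name, volume name `kafka_data`, image, and paths are hypothetical placeholders; only `per_alloc` and `NOMAD_ALLOC_INDEX` come from the discussion.

```hcl
# Sketch only: names, image, and paths are placeholders.
job "kafka" {
  group "broker" {
    count = 3

    # With per_alloc = true, each allocation is expected to be matched to a
    # host volume named after the source plus its index, e.g. "kafka_data[0]".
    volume "data" {
      type      = "host"
      source    = "kafka_data"
      per_alloc = true
    }

    task "broker" {
      driver = "docker"

      config {
        image = "example/kafka:latest" # placeholder image
      }

      volume_mount {
        volume      = "data"
        destination = "/var/lib/kafka"
      }

      # broker.id is derived from the allocation index, so it must always
      # land on the data directory written by the same index.
      template {
        destination = "local/broker.properties"
        data        = <<EOT
broker.id={{ env "NOMAD_ALLOC_INDEX" }}
log.dirs=/var/lib/kafka
EOT
      }
    }
  }
}
```

Each client that should run a broker would then need a `host_volume` block whose name carries the index it serves.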

tgross

Hi @ygersie! Ok, I think I understand the use case and that sounds reasonable. Hopefully #15489 will fill some of these gaps but that's a ways out yet.

This change isn't enough to expose per-alloc semantics for host volumes though:

- We'll need to reproduce for host volumes the per-alloc logic the scheduler currently has for CSI volumes. Look in scheduler/feasible.go for that.
- We'll need to reproduce the client-side logic we currently have in csi_hook.go for creating the per-alloc name in the task runner's volume_hook.go.

Right now the validation tests don't pass either. We should probably update the docs as well.

ygersie · 2023-01-24T09:57:32Z

@tgross I took a stab at it, please roast me where appropriate ;)

tgross

Hi @ygersie! The code looks good, and I pulled it down locally to try it out and everything seems to work as expected.

Two additional bits:

- Documentation updates: we'll need to move the per_alloc field in the docs into the section above, with the rest of the fields that work for both CSI and host volumes. You can find that in volume.mdx. We'll also need to update the update.canary description.
- Can you run make cl to add a changelog entry?

Once that's done we can get this merged and it'll ship in the upcoming Nomad 1.5.0.

ygersie · 2023-01-26T09:17:12Z

@tgross thanks for all the pointers, much appreciated! Should be good now.

tgross

LGTM! Thanks @ygersie!

vercel bot deployed to Preview – nomad-storybook-and-ui January 13, 2023 16:20

tgross self-assigned this Jan 20, 2023

tgross added theme/storage type/enhancement labels Jan 20, 2023

tgross requested changes Jan 20, 2023

tgross added the stage/waiting-reply label Jan 20, 2023

Ensure host volumes understand the concept of per_alloc

d94caf3

fix volume test

10ec5df

tgross removed the stage/waiting-reply label Jan 24, 2023

tgross requested changes Jan 25, 2023

tgross added this to the 1.5.0 milestone Jan 25, 2023

tgross added the stage/waiting-reply label Jan 25, 2023

Update docs and add changelog entry

5cbf5ab

tgross approved these changes Jan 26, 2023

tgross merged commit 24a575a into hashicorp:main Jan 26, 2023

tgross removed the stage/waiting-reply label Jan 26, 2023

vercel bot had a problem deploying to Preview – nomad January 26, 2023 14:19 Failure
