Add extra Docker labels (job name, task and task group name) #9885

Merged
merged 14 commits into hashicorp:master on Mar 8, 2021

Conversation

sofixa
Contributor

@sofixa sofixa commented Jan 25, 2021

As discussed in #4781, those extra labels are needed for log shipping, cAdvisor, etc.

@henrikjohansen

@sofixa Exposing the namespace as a label would also be super useful, especially now that namespaces are available in Nomad OSS 👍

@sofixa
Contributor Author

sofixa commented Jan 29, 2021

@henrikjohansen I thought about adding namespace and node name, but they don't seem to be available at that level (TaskConfig), and I wanted to keep the PR simple, hoping it'd be merged faster that way :D

If anyone has an idea about how to get namespace and node information at task creation, I'm all ears!

@apollo13
Contributor

apollo13 commented Feb 6, 2021

It would be relatively easy to add the namespace/nodename here:

// buildTaskConfig builds a drivers.TaskConfig with an unique ID for the task.
// The ID is unique for every invocation, it is built from the alloc ID, task
// name and 8 random characters.
func (tr *TaskRunner) buildTaskConfig() *drivers.TaskConfig {
	task := tr.Task()
	alloc := tr.Alloc()
	invocationid := uuid.Generate()[:8]
	taskResources := tr.taskResources
	ports := tr.Alloc().AllocatedResources.Shared.Ports
	env := tr.envBuilder.Build()
	tr.networkIsolationLock.Lock()
	defer tr.networkIsolationLock.Unlock()

	var dns *drivers.DNSConfig
	if alloc.AllocatedResources != nil && len(alloc.AllocatedResources.Shared.Networks) > 0 {
		allocDNS := alloc.AllocatedResources.Shared.Networks[0].DNS
		if allocDNS != nil {
			dns = &drivers.DNSConfig{
				Servers:  allocDNS.Servers,
				Searches: allocDNS.Searches,
				Options:  allocDNS.Options,
			}
		}
	}

	return &drivers.TaskConfig{
		ID:            fmt.Sprintf("%s/%s/%s", alloc.ID, task.Name, invocationid),
		Name:          task.Name,
		JobName:       alloc.Job.Name,
		TaskGroupName: alloc.TaskGroup,
		Resources: &drivers.Resources{
			NomadResources: taskResources,
			LinuxResources: &drivers.LinuxResources{
				MemoryLimitBytes: taskResources.Memory.MemoryMB * 1024 * 1024,
				CPUShares:        taskResources.Cpu.CpuShares,
				PercentTicks:     float64(taskResources.Cpu.CpuShares) / float64(tr.clientConfig.Node.NodeResources.Cpu.CpuShares),
			},
			Ports: &ports,
		},
		Devices:          tr.hookResources.getDevices(),
		Mounts:           tr.hookResources.getMounts(),
		Env:              env.Map(),
		DeviceEnv:        env.DeviceEnv(),
		User:             task.User,
		AllocDir:         tr.taskDir.AllocDir,
		StdoutPath:       tr.logmonHookConfig.stdoutFifo,
		StderrPath:       tr.logmonHookConfig.stderrFifo,
		AllocID:          tr.allocID,
		NetworkIsolation: tr.networkIsolationSpec,
		DNS:              dns,
	}
}

alloc.Namespace and alloc.NodeName are all that would need to be passed down into the TaskConfig.

That said, maybe @tgross could comment on whether that would be okay for the Nomad team or whether another approach should be taken.

@apollo13
Contributor

apollo13 commented Feb 6, 2021

Attached you will find a diff that also adds namespace & node_name:

diff --git a/client/allocrunner/taskrunner/task_runner.go b/client/allocrunner/taskrunner/task_runner.go
index 9b94e1c75..e6604688f 100644
--- a/client/allocrunner/taskrunner/task_runner.go
+++ b/client/allocrunner/taskrunner/task_runner.go
@@ -981,6 +981,8 @@ func (tr *TaskRunner) buildTaskConfig() *drivers.TaskConfig {
 		Name:          task.Name,
 		JobName:       alloc.Job.Name,
 		TaskGroupName: alloc.TaskGroup,
+		Namespace:     alloc.Namespace,
+		NodeName:      alloc.NodeName,
 		Resources: &drivers.Resources{
 			NomadResources: taskResources,
 			LinuxResources: &drivers.LinuxResources{
diff --git a/drivers/docker/driver.go b/drivers/docker/driver.go
index 9a382d71d..a81697688 100644
--- a/drivers/docker/driver.go
+++ b/drivers/docker/driver.go
@@ -74,6 +74,8 @@ const (
 	dockerLabelJobName       = "com.hashicorp.nomad.job_name"
 	dockerLabelTaskGroupName = "com.hashicorp.nomad.task_group_name"
 	dockerLabelTaskName      = "com.hashicorp.nomad.task_name"
+	dockerLabelNamespace     = "com.hashicorp.nomad.namespace"
+	dockerLabelNodeName      = "com.hashicorp.nomad.node_name"
 )
 
 type Driver struct {
@@ -1121,6 +1123,8 @@ func (d *Driver) createContainerConfig(task *drivers.TaskConfig, driverConfig *T
 	labels[dockerLabelJobName] = task.JobName
 	labels[dockerLabelTaskGroupName] = task.TaskGroupName
 	labels[dockerLabelTaskName] = task.Name
+	labels[dockerLabelNamespace] = task.Namespace
+	labels[dockerLabelNodeName] = task.NodeName
 
 	config.Labels = labels
 	logger.Debug("applied labels on the container", "labels", config.Labels)
diff --git a/drivers/docker/driver_test.go b/drivers/docker/driver_test.go
index 05ce029d9..fb5bc239c 100644
--- a/drivers/docker/driver_test.go
+++ b/drivers/docker/driver_test.go
@@ -799,7 +799,7 @@ func TestDockerDriver_Labels(t *testing.T) {
 		t.Fatalf("err: %v", err)
 	}
 
-	// expect to see 4 additional standard labels (allocID, jobName, TaskGroupName and TaskName)
-	require.Equal(t, len(cfg.Labels)+4, len(container.Config.Labels))
+	// expect to see 6 additional standard labels (allocID, jobName, TaskGroupName, TaskName, Namespace and NodeName)
+	require.Equal(t, len(cfg.Labels)+6, len(container.Config.Labels))
 	for k, v := range cfg.Labels {
 		require.Equal(t, v, container.Config.Labels[k])
@@ -1070,6 +1070,8 @@ func TestDockerDriver_CreateContainerConfig_Labels(t *testing.T) {
 		"com.hashicorp.nomad.job_name":        task.JobName,
 		"com.hashicorp.nomad.task_group_name": task.TaskGroupName,
 		"com.hashicorp.nomad.task_name":       task.Name,
+		"com.hashicorp.nomad.namespace":       task.Namespace,
+		"com.hashicorp.nomad.node_name":       task.NodeName,
 	}
 
 	require.Equal(t, expectedLabels, c.Config.Labels)
diff --git a/plugins/drivers/driver.go b/plugins/drivers/driver.go
index c2d040eed..9154d6557 100644
--- a/plugins/drivers/driver.go
+++ b/plugins/drivers/driver.go
@@ -239,6 +239,8 @@ type TaskConfig struct {
 	JobName          string
 	TaskGroupName    string
 	Name             string
+	Namespace        string
+	NodeName         string
 	Env              map[string]string
 	DeviceEnv        map[string]string
 	Resources        *Resources

Tested locally and checked the created container:

            "Labels": {
                "com.hashicorp.nomad.alloc_id": "30889d45-303e-81dd-e0b4-29d6ed60bbaf",
                "com.hashicorp.nomad.job_name": "example",
                "com.hashicorp.nomad.namespace": "default",
                "com.hashicorp.nomad.node_name": "apollo13",
                "com.hashicorp.nomad.task_group_name": "cache",
                "com.hashicorp.nomad.task_name": "redis"
            }
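For example (not part of this change), a log shipper or monitoring agent could select containers for a given job by filtering on these labels. A minimal sketch using the Docker Go SDK (github.com/docker/docker/client, API as of this PR's era); the filter value and printed fields are illustrative:

package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/filters"
	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}

	// Only list containers that belong to the Nomad job "example".
	f := filters.NewArgs()
	f.Add("label", "com.hashicorp.nomad.job_name=example")

	containers, err := cli.ContainerList(context.Background(), types.ContainerListOptions{Filters: f})
	if err != nil {
		panic(err)
	}
	for _, c := range containers {
		// Print the short container ID and the task it runs.
		fmt.Println(c.ID[:12], c.Labels["com.hashicorp.nomad.task_name"])
	}
}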

@kpweiler

kpweiler commented Feb 11, 2021

Is there anything preventing this from being merged? This would be super helpful. Would be happy to help if I can.

Member

@tgross tgross left a comment

Hi folks! Apologies for letting this PR linger. As it turns out, we wrote and closed a similar PR #5153 a little over a year ago. Our concern there was that adding labels will increase the volume of high-cardinality metadata, and that'll make it more expensive for folks using third-party monitoring systems to use Nomad.

I brought this up again for discussion internally and we came to a consensus that if the PR included an option to tune the labels being emitted in the client's plugin configuration, we'd be more comfortable merging this. Think you'd be willing to take a shot at that?

@sofixa
Contributor Author

sofixa commented Feb 20, 2021

@tgross yep, that seems logical. How do you propose the configuration should look? A per-label flag or a global one, e.g.:

extra_flags = true

or

extra_flags {
  task_name = true
  job_name  = true
}

Adrian

@apollo13
Contributor

apollo13 commented Feb 20, 2021 via email

@sofixa
Contributor Author

sofixa commented Feb 20, 2021

Apf, I meant labels, not flags - that's what happens when I write pseudocode before fully waking up :D

And indeed, I think a list with an "all" keyword or a * wildcard is better than a map.

@tgross
Member

tgross commented Feb 22, 2021

I think extra_labels seems like the right move here, along with a wildcard *:

plugin "docker" {
  extra_labels = ["node_name", "job_name", ...]
}

One minor detail for the labels is that I'd recommend we go with job_id over job_name and node_id over node_name: those *_name fields aren't guaranteed to be unique whereas the *_id fields are (we'd probably love to deprecate job_name but it's a backwards compatibility pain to do so).

@sofixa
Contributor Author

sofixa commented Feb 22, 2021

@tgross we can add both, but names would generally be more useful than IDs for metrics and logs - in most cases you want to see the logs for job or task name X, not for a particular instance of it. Furthermore, indexing is recommended only on non-random fields in tools like Loki or Prometheus.

So, we're looking at:

All labels:

  • alloc_id
  • job_name
  • job_id
  • namespace
  • node_name
  • node_id
  • task_group_name
  • task_name

With a configuration like:

plugin "docker" {
  extra_labels = ["node_name", "job_name", "task*"]
}
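To make the glob semantics concrete, here is a rough sketch of how the driver could match the configured patterns against the available label names (the filterLabels helper and the sample label set are hypothetical, not the implementation in this PR):

package main

import (
	"fmt"
	"path/filepath"
)

// filterLabels keeps only the labels whose short name matches one of the
// configured extra_labels glob patterns (e.g. "task*", "job_name") and
// applies the com.hashicorp.nomad. prefix used by the driver.
func filterLabels(available map[string]string, patterns []string) map[string]string {
	out := map[string]string{}
	for name, value := range available {
		for _, pattern := range patterns {
			if ok, _ := filepath.Match(pattern, name); ok {
				out["com.hashicorp.nomad."+name] = value
				break
			}
		}
	}
	return out
}

func main() {
	available := map[string]string{
		"job_name":        "example",
		"namespace":       "default",
		"node_name":       "client-1",
		"task_group_name": "cache",
		"task_name":       "redis",
	}
	// With extra_labels = ["node_name", "job_name", "task*"] this keeps
	// node_name, job_name, task_group_name and task_name.
	fmt.Println(filterLabels(available, []string{"node_name", "job_name", "task*"}))
}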

@tgross @apollo13 what do you think? I can start working on it ~tomorrow.

Adrian

@apollo13
Contributor

+1 on having the actual job names and not ids in there.

@tgross
Member

tgross commented Feb 22, 2021

@tgross we can add both, but names would generally be more useful than IDs for metrics and logs - in most cases you want to see the logs for job or task name X, not for a particular instance of it. Furthermore, indexing is recommended only on non-random fields in tools like Loki or Prometheus.

Ok, I'm sold on that argument. Go for it!

@sofixa sofixa requested a review from tgross March 2, 2021 18:12
Member

@tgross tgross left a comment

This is looking pretty good! I did a quickie test with some of the globs and I see the expected labels just fine:

$ docker inspect 0a5 | jq '.[0].Config.Labels'
{
  "com.hashicorp.nomad.alloc_id": "bacfab64-5e45-5b8e-79df-28d758f8a055",
  "com.hashicorp.nomad.job_id": "example",
  "com.hashicorp.nomad.job_name": "example",
  "com.hashicorp.nomad.task_group_name": "cache",
  "com.hashicorp.nomad.task_name": "redis"
}

@tgross @apollo13 any ideas how I can simulate the driver configuration in tests? That doesn't seem to be done - I can't find any instances in drivers/docker/driver_test.go, and there are only configuration parsing tests in config_test.go.

For doing more than configuration parsing (which I agree would be nice), there's a function dockerSetup that takes the config you want to pass to the driver.

@apollo13
Contributor

apollo13 commented Mar 5, 2021

  "com.hashicorp.nomad.job_id": "example",
  "com.hashicorp.nomad.job_name": "example",

What is the difference between job_id & job_name? I would have expected the ID to be a UUID, but they are both "example"? Can the two be different?

@tgross
Member

tgross commented Mar 5, 2021

What is the difference between job_id & job_name? I would have expected the ID to be a UUID, but they are both "example"? Can the two be different?

They are normally identical because folks don't set the optional job name field. It's not really all that useful to do so and it's one of those things where if we could remove it without breaking backwards compat we probably would (there's this ancient issue #2248 which suggests otherwise, but I suspect we won't go that route). The job ID + namespace has to be unique across the whole cluster, whereas there's no such constraint on job name.

@sofixa
Contributor Author

sofixa commented Mar 5, 2021

Test is done!

@tgross
Member

tgross commented Mar 8, 2021

This looks great @sofixa! After taking a second look at it, we can avoid a bunch of allocations and generally make the code a bit easier to read with the change I just pushed in 29d3460 (I'd have made that as a comment but it's trivial and I want to get this merged in for you). Once CI is green from that commit, I'll squash and merge this.

@tgross tgross merged commit 2748d2a into hashicorp:master Mar 8, 2021
@tgross tgross added this to the 1.1.0 milestone Mar 8, 2021
@tgross
Member

tgross commented Mar 8, 2021

Ok, that's been merged! This PR will ship in the next release of Nomad, which is 1.1.0.

@sofixa
Contributor Author

sofixa commented Mar 8, 2021

That's great, thanks @tgross! Any idea when 1.1.0 will come out? I'd like to have this ASAP, but I'm a bit hesitant about running the version from the master branch :D

@tgross
Member

tgross commented Mar 8, 2021

Sorry, I can't give you a timeline on that except that it's definitely not in the next couple weeks. But your patch should apply cleanly on top of v1.0.4 as far as I know.

@github-actions

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 28, 2022