Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Service name duplicate check does not expand placeholders #6637

Closed
nvx opened this issue Nov 7, 2019 · 10 comments · Fixed by #10868
Closed

Service name duplicate check does not expand placeholders #6637

nvx opened this issue Nov 7, 2019 · 10 comments · Fixed by #10868
Assignees
Milestone

Comments

@nvx
Copy link
Contributor

nvx commented Nov 7, 2019

Nomad version

Nomad v0.10.1 (829f9af35c77d564b3dab74454eeba9bf25e2df8)

Operating system and Environment details

Linux

Issue

When running nomad job run or nomad job plan, there appears to be a validation step that checks for duplicate Consul service names. The problem is this check doesn't appear to be expanding placeholders such as NOMAD_TASK_NAME resulting in false positives about duplicates.

This is a regression from previous versions, although I'm not 100% sure which version the regression appeared in.

Reproduction steps

Run run a job plan on the provided job file

Job file (if appropriate)

job "test" {
  datacenters = ["test"]

  type = "service"

  group "group" {
    task "task1" {
      driver = "docker"

      config {
        image = "docker.elastic.co/elasticsearch/elasticsearch:7.4.2"
        port_map {
          http = 9200
        }
      }

      env {
        discovery.type = "single-node"
        TAKE_FILE_OWNERSHIP = "yes"
      }

      resources {
        cpu = 500
        memory = 2048
        network {
          port "http" {}
        }
      }

      service {
        name = "${NOMAD_JOB_NAME}-${NOMAD_TASK_NAME}"
        port = "http"
        address_mode = "host"
        check {
          name = "health"
          type = "http"
          path = "/"
          interval = "5s"
          timeout = "2s"
        }
      }
    }

    task "task2" {
      driver = "docker"

      config {
        image = "docker.elastic.co/elasticsearch/elasticsearch:7.4.2"
        port_map {
          http = 9200
        }
      }

      env {
        discovery.type = "single-node"
        TAKE_FILE_OWNERSHIP = "yes"
      }

      resources {
        cpu = 500
        memory = 2048
        network {
          port "http" {}
        }
      }

      service {
        name = "${NOMAD_JOB_NAME}-${NOMAD_TASK_NAME}"
        port = "http"
        address_mode = "host"
        check {
          name = "health"
          type = "http"
          path = "/"
          interval = "5s"
          timeout = "2s"
        }
      }
    }
  }
}

Nomad Client logs (if appropriate)

$ nomad job plan test.nomad
Error during plan: Unexpected response code: 500 (1 error(s) occurred:

* Task group group validation failed: 1 error(s) occurred:

* Task group service validation failed: 1 error(s) occurred:

* Service ${NOMAD_JOB_NAME}-${NOMAD_TASK_NAME} is duplicate)
$
@tgross
Copy link
Member

tgross commented Nov 8, 2019

Hi @nvx and thanks for this report! It looks like I may have introduced this regression in c4a45a6 when we added validation for Task Group services. We're checking for collisions between the Task and it's Task Group, but none of the tasks have had their environment interpolated yet. This is exposing an existing bug where we were checking for collisions within a task but not between different tasks in a job.

I don't think we have the task environment at this stage of validation to do the interpolation, so the solution may need to be that we allow for possible collisions across tasks at the validation stage and let it bubble up as errors on service registration. I'll look into that.

@tgross tgross self-assigned this Nov 8, 2019
@tgross tgross added this to the near-term milestone Nov 8, 2019
@nvx
Copy link
Contributor Author

nvx commented Nov 11, 2019

Makes sense.

Is it actually invalid to have multiple tasks registering the same service name in Consul though? Logically it might be a bit strange (I'd expect it more across different task groups, or when the group has a count > 1) but I don't think Consul would have an issue with it?

@tgross
Copy link
Member

tgross commented Nov 11, 2019

Strictly speaking, no it's not invalid to have multiple tasks registering the same service name in Consul. But it'd be a very unusual case where someone would intentionally name multiple services the same name within a task group (and it's a common configuration error), so we validate for it.

@tgross
Copy link
Member

tgross commented Jan 8, 2020

Now that I've had a chance to work through #6836, which turns out not to be related, I wanted to circle back to the UX of this. The task environment doesn't exist at the time we do the validation. In c4a45a6 we describe the validation as:

// validateServices runs Service.Validate() on group-level services,
// checks that group services do not conflict with task services and that
// group service checks that refer to tasks only refer to tasks that exist.

I'm wondering if it makes any sense for us to check for name conflicts; while it's a weird case, it's not strictly an error and it's something Consul can manage just fine with so long as you've got the right checks. Maybe we should drop that part of the validation and only check that the group service checks refer to tasks that exist. @schmichael any thoughts here?

@tgross tgross modified the milestones: near-term, unscheduled Jan 9, 2020
@tommyalatalo
Copy link

I would like to chime in on this and say that the service name collision check between tasks in the same group should be removed. It's a hindrance and I've just come upon a use case where I would want multiple tasks in the same group registering separate instances of something as the same service name.

@tgross tgross removed their assignment Mar 5, 2020
@davidatkinsondoyle
Copy link

I also have a use case where registering a variant of a service in tasks within the group is required.

In my case running a Redis Cluster with a master and slave on the same node. Both having different ports, different tags and different host_volumes.

Only when I ran the final version of the job file i found it blocked by the validation. As @tgross indicates, if this isn't an error could the validation be removed?

@frederikbosch
Copy link

Running into a similar case as @davidatkinsondoyle with related to same service name, different tags. I also think the validation should be removed.

@frederikbosch
Copy link

If you move them out of the group, there is no validation.

@tgross
Copy link
Member

tgross commented Apr 30, 2020

I should have followed up here. We are going to remove the duplicate check, but I haven't had a chance to make the change yet.

@tgross tgross removed this from the unscheduled milestone Feb 12, 2021
@tgross tgross added this to the 1.1.3 milestone Jul 8, 2021
@tgross tgross self-assigned this Jul 8, 2021
@github-actions
Copy link

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 17, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants