Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate and document 0.12 mbits/network deprecations #8743

Merged
merged 13 commits into from
Nov 23, 2020

Conversation

nickethier
Copy link
Member

@nickethier nickethier commented Aug 25, 2020

This PR adds two warnings to job validation including a check for task network resources and mbits in the group network.

In addition a notice is added to the upgrade docs about the deprecation.

fixes #8497 #8681 #8746 #8780

@nickethier nickethier requested a review from notnoop August 25, 2020 19:58
@nickethier nickethier self-assigned this Aug 25, 2020
@nickethier nickethier marked this pull request as draft August 25, 2020 19:59
@arianvp
Copy link

arianvp commented Aug 26, 2020

Note that the homepage still uses resources.network as an example: https://www.nomadproject.io/docs/job-specification I'm not sure if this is already fixed in master; but I found it odd as it uses a feature that isn't documented on the rest of the website anymore

@nickethier
Copy link
Member Author

I changed the mock Job to reflect the task network changes and ended up playing quite a bit of whack-a-test as a result. I also updated the init command to render jobs with network and service definitions in the task group with links back to docs site.

@nickethier nickethier force-pushed the f-task_network_warning branch from 8922a38 to b0acc33 Compare September 12, 2020 03:04
@notnoop notnoop added this to the 0.12.5 milestone Sep 14, 2020
Copy link
Contributor

@notnoop notnoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good - a couple of questions nitpicking about versions.

api/resources.go Outdated Show resolved Hide resolved
if len(tgNetworks) > 0 {
ports := tgNetworks[0].PortLabels()
for portLabel := range ports {
portLabels[portLabel] = struct{}{}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a backward compatibility concern here? If an old job that specifies ports and labels in task resources, will they continue to be validated/processed the same way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking here again - if the answer is no, let's merge then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@notnoop I've re-added the removed block, appending task level port labels to the set of port labels to validate against.

drivers/docker/driver.go Show resolved Hide resolved
@shoenig shoenig self-requested a review September 16, 2020 16:26
@shoenig
Copy link
Member

shoenig commented Sep 16, 2020

I'm pretty sure this test failure is relevant, I'll dig into it.

    --- FAIL: TestHTTP_JobPlanRegion/api_region_takes_precedence (0.14s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2672958]

goroutine 39139 [running]:
testing.tRunner.func1.1(0x2c52e20, 0x4dcf3f0)
	/usr/local/go/src/testing/testing.go:1057 +0x30d
testing.tRunner.func1(0xc008172600)
	/usr/local/go/src/testing/testing.go:1060 +0x41a
panic(0x2c52e20, 0x4dcf3f0)
	/usr/local/go/src/runtime/panic.go:969 +0x175
github.com/hashicorp/nomad/command/agent.ApiNetworkResourceToStructs(0xc00361c1a0, 0x1, 0x4, 0x0, 0x0, 0x0)
	/home/circleci/go/src/github.com/hashicorp/nomad/command/agent/job_endpoint.go:1196 +0x138

@shoenig
Copy link
Member

shoenig commented Sep 16, 2020

The updated documented examples seem to work as expected:

[nzxt foo] $ nomad job init
Example job file written to example.nomad
[nzxt foo] $ nomad job run example.nomad
==> Monitoring evaluation "80e90cfc"
    Evaluation triggered by job "example"
    Allocation "2f608bc0" created: node "d079b198", group "cache"
    Evaluation within deployment: "1182e4ad"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "80e90cfc" finished with status "complete"
[nzxt foo] $ nomad job init -short
Example job file written to example.nomad
[nzxt foo] $ nomad job run example.nomad
==> Monitoring evaluation "fe683a47"
    Evaluation triggered by job "example"
    Allocation "eae73160" created: node "d079b198", group "cache"
    Evaluation within deployment: "5d30c927"
    Allocation "eae73160" status changed: "pending" -> "running" (Tasks are running)
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "fe683a47" finished with status "complete"

A warning when setting task.resources.networks.mbits:

task.resources.networks.mbits set
$ nomad job run example.nomad
Job Warnings:
1 warning(s):

* Group "cache" has warnings: 1 error occurred:
	* 1 error occurred:
	* Task "redis": task network resources have been deprecated as of Nomad 0.12.0. Please configure networking via group network block.

A warning when setting group.networks.mbits:

$ nomad job run example.nomad
Job Warnings:
1 warning(s):

* Group "cache" has warnings: 1 error occurred:
	* mbits has been deprecated as of Nomad 0.12.0. Please remove mbits from the network block

@shoenig
Copy link
Member

shoenig commented Sep 16, 2020

The port validation in validateServices that @mahmood pointed out might still be a problem.

There's this difference in behavior between master and this PR using this job file:

job "example" {
  datacenters = ["dc1"]

  group "cache" {
    task "redis" {

      service {
	name = "rservice"
	port = "db"
      }
      
      driver = "docker"

      config {
        image = "redis:3.2"

        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500
        memory = 256

        network {
          mbits = 10
          port "db" {}
        }
      }
    }
  }
}

on master)

$ nomad job run example.nomad
==> Monitoring evaluation "a9dd7ade"
    Evaluation triggered by job "example"
    Allocation "695c295f" created: node "d123aad0", group "cache"
    Evaluation within deployment: "857f2f8f"
    Allocation "695c295f" status changed: "pending" -> "running" (Tasks are running)
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "a9dd7ade" finished with status "complete"

on PR)

$ nomad job run example.nomad
Error submitting job: Unexpected response code: 500 (1 error occurred:
	* Task group cache validation failed: 1 error occurred:
	* Task redis validation failed: 1 error occurred:
	* 1 error occurred:
	* port label "db" referenced by services rservice does not exist

@tgross tgross modified the milestones: 0.12.5, 0.13 Sep 18, 2020
Enable users to submit jobs that still make use of the deprecated
task.resources.network stanza. Such jobs can be submitted, but
will emit a warning.
@shoenig
Copy link
Member

shoenig commented Nov 23, 2020

Added a fix for the above backwards incompatibility, now the job is submissible with warnings as expected

$ nomad job run example.nomad 
Job Warnings:
1 warning(s):

* Group "cache" has warnings: 1 error occurred:
	* 1 error occurred:
	* Task "redis": task network resources have been deprecated as of Nomad 0.12.0. Please configure networking via group network block.





==> Monitoring evaluation "6e8cc203"
    Evaluation triggered by job "example"
    Allocation "b3a2ccbe" created: node "1b39f488", group "cache"
==> Monitoring evaluation "6e8cc203"
    Evaluation within deployment: "c8339b4c"
    Allocation "b3a2ccbe" status changed: "pending" -> "running" (Tasks are running)
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "6e8cc203" finished with status "complete"

@shoenig
Copy link
Member

shoenig commented Nov 23, 2020

I'll add a few e2e tests around this, but in a separate PR.

@shoenig shoenig merged commit 289d91d into master Nov 23, 2020
@shoenig shoenig deleted the f-task_network_warning branch November 23, 2020 21:36
@tgross tgross added the hcc/cst Admin - internal label Jan 26, 2021
@kpweiler
Copy link

kpweiler commented Feb 3, 2021

@shoenig - I think the above workaround might be misunderstanding the problem. The problem is that services can be defined at the group OR the task level, but network resources defined at the task level are deprecated. If a port is defined at the group level, it can't seem to be used at the task level - hence:

port label "db" referenced by services rservice does not exist

For some context - I want to define my service at the task level in a "count > 1" situation so that I can apply a consul service tag to differentiate the different instances (using NOMAD_ALLOC_INDEX) - this yields DNS entries like instance-0.myservice.service.consul from instance-${NOMAD_ALLOC_INDEX}.myservice.service.consul. Maybe there's a better way to do this?

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
hcc/cst Admin - internal
Projects
None yet
7 participants