
confusing defaults when enabling scaling stanza #8355

Closed
shantanugadgil opened this issue Jul 3, 2020 · 5 comments · Fixed by #8360
Labels: theme/autoscaling (Issues related to supporting autoscaling), theme/jobspec, type/bug
shantanugadgil (Contributor) commented Jul 3, 2020

Nomad version

Output from nomad version
Nomad v0.12.0-beta2 (5b80d4e)

Operating system and Environment details

CentOS 7/8, Ubuntu 16.04/18.04

Issue

For the scaling stanza, min should pick up the default group count of 1.
Instead, the min value in the scaling stanza overrides the default group count of 1.

Reproduction steps

Starting with the default job file, perform the steps below:

Job file (if appropriate)

  1. Start with the default "short" example.nomad:
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"

        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500
        memory = 256

        network {
          mbits = 10
          port "db" {}
        }
      }
    }
  }
}
  2. I want to enable scaling, so I proceed to "just" add the scaling stanza (I would assume this should work):
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    scaling {
      enabled = true
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"

        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500
        memory = 256

        network {
          mbits = 10
          port "db" {}
        }
      }
    }
  }
}

Only this much doesn't work ... it complains about min not being set, with some spurious whitespace and a stray bracket in the error output ...

$ nomad plan example.nomad
Error during plan: Unexpected response code: 500 (1 error occurred:
        * Task group cache validation failed: 1 error occurred:
        * Task group scaling policy validation failed: 2 errors occurred:
        * Scaling policy invalid: maximum count must not be less than minimum count
        * Scaling policy invalid: task group count must not be greater than maximum count in scaling policy





)
  3. So now I set the min value to 0 ...
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    scaling {
      enabled = true
      min = 0
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"

        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500
        memory = 256

        network {
          mbits = 10
          port "db" {}
        }
      }
    }
  }
}
$ nomad plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (1 ignore)
  +/- Count: "1" => "0" (forces destroy)
      Task: "redis"

Scheduler dry-run:
- All tasks successfully allocated.

Job Warnings:
1 warning(s):

* Group "cache" has warnings: 1 error occurred:
        * Update max parallel count is greater than task group count (1 > 0). A destructive change would result in the simultaneous replacement of all allocations.

BUG! (I think): the unspecified default group count of 1 was overridden by the min value.
Also, the error about max has gone away, and the plan kills off the running instance (!!)
Even if I add min = 0 and max = 10, the result is the same.

Expected behavior

Possible solutions could be:

  1. min should use the current default count of 1.
  2. max should default to the job count, which is 1.
  3. Generate the job file with the explicit default count of 1?
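As a sketch of what option 3 might look like (hypothetical: this is not what the generator emits today, the point is only that the default would become visible):

```hcl
group "cache" {
  # Hypothetical: the generated job file could state the default explicitly,
  # so a later scaling stanza has a visible baseline count to interact with.
  count = 1

  # ... task "redis" as before ...
}
```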

Nomad Client logs (if appropriate)

na

Nomad Server logs (if appropriate)

na


cgbaker commented Jul 4, 2020

Okay, this is confusing. I'll update the processing to make this work better. But here's the gist... Scaling -> Max is required. Unfortunately, the way I wrote this, the JSON parser is filling out the field as a 0, which is not at all helpful. This needs to be explicitly enforced; I'll fix this right away. The reason is that we didn't want anyone using the autoscaler without a maximum value.

The rest is mostly as you recommended: in your example, Min and Count are getting the default of 1. But because Max is zero, this causes two violations: max < min and max < count.

There is one additional bit: when a scaling stanza is present with a min, the default task group count is Scaling -> Min. That way, an autoscaled group can be specified without a count:

  group "cache" {
     scaling {
       min = 5
       max = 20
     }
  }

I'll update the docs so that the new rules for Group -> Count are stated, and I'll fix parsing/validation so that neglected Max is more explicit.
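Putting those rules together, the original example would presumably pass validation once an explicit max is supplied (the values below are illustrative, not recommendations):

```hcl
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    # count is omitted: under the rules above it defaults to scaling.min
    scaling {
      enabled = true
      min     = 1   # defaults to 1 anyway; shown for clarity
      max     = 10  # required; illustrative value
    }

    task "redis" {
      driver = "docker"
      # ... config/resources as in the original example ...
    }
  }
}
```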

@cgbaker cgbaker added this to the 0.12.0 milestone Jul 4, 2020
@cgbaker cgbaker self-assigned this Jul 4, 2020
@cgbaker cgbaker added theme/autoscaling Issues related to supporting autoscaling type/bug theme/jobspec labels Jul 4, 2020

shantanugadgil commented Jul 4, 2020

This might sound nitpicky, but there are subtle nuances that can lead to confusion (not harmful, just confusing):

  • What will the value be if I had explicitly set some group count > 1 and then commented it out? Will it default to 1 or to min?

  • I'm not sure how mandatory the values should be; in my opinion, just enabling scaling should be allowed, with effective values of 1,1,1. The (- +) controls will show up in the GUI, but I won't be able to do anything (the GUI probably needs a tooltip on mouseover, but that's a different issue).

👍👍👍


cgbaker commented Jul 4, 2020

To the first nit: on any job submission (create or update), if count is not specified but min is, count will default to min. If you would like to preserve the existing count, there is an option for that. Some of this will probably be streamlined once we get a sense of usage patterns; in the short term, however, we're erring on the side of backwards compatibility.
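Assuming the option mentioned here is the -preserve-counts flag that comes up later in this thread, a resubmission that keeps the running counts might look like (requires a running Nomad agent; not runnable standalone):

```shell
# Resubmit the job without resetting group counts to scaling.min
# (assumes the example.nomad from above is the currently running job).
nomad job run -preserve-counts example.nomad
```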

To the second nit: there was a lot of debate during the RFC as to whether max should be mandatory. The consensus was that a) if the user can't determine a reasonable max, they probably shouldn't be using autoscaling; and b) we can't determine a reasonable value for them. Especially with horizontal cluster autoscaling coming online, where an improper scaling policy can result in 💸💸💸.

cgbaker added a commit that referenced this issue Jul 4, 2020
…messages

* made api.Scaling.Max a pointer, so we can detect (and complain) when it is neglected
* added checks to HCL parsing that it is present
* when Scaling.Max is absent/invalid, don't return extraneous error messages during validation
* tweak to multiregion handling to ensure that the count is valid on the interpolated regional jobs

resolves #8355
@shantanugadgil
Contributor Author

  1. I am sure this has been mulled over by folks a lot, but shouldn't "preserving count" be the default when count > min, and count only be set to min when min > count?
    (And shouldn't "reset count to min" (or "update counts") be the additional parameter, rather than "-preserve-counts"?)

*** I am not too hung up on this for now at all.

Note: my thoughts may be driven by having used AWS ASGs (min, max, desired), so that's there!

  2. I agree: if one doesn't need scaling, why enable the stanza to begin with! 😄 But then, if I were to set only min and not max, couldn't max just default to min?

Ohhh ... the various possible combinations are making my head hurt, and a little voice in my head is saying, "how about just making everything mandatory 😈"

Anyway, anything is fine by me, just as long as nomad plan tells me exactly what is going to happen 👍


github-actions bot commented Nov 5, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 5, 2022