
feat: Change EKS default version to 1.17 #949

Closed · wants to merge 2 commits into from

Conversation

sc250024
Contributor

@sc250024 sc250024 commented Jul 12, 2020

PR o'clock

Description

Upgrading the module to support EKS 1.17. Resolves issue #947.

Checklist

@barryib barryib added this to the v13.0.0 milestone Jul 13, 2020
@filipedeo filipedeo left a comment

LGTM

@dpiddockcmp
Contributor

Pushing out a major version every time AWS releases a new version seems a little silly to me. Can't we just make cluster_version a required parameter, no default? One last major version bump for this particular issue. Then we'd only need to update the examples which can be a minor version bump.

@barryib @max-rocket-internet what do you think?
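For reference, a minimal sketch of what dropping the default might look like in the module's variables.tf (description text assumed, not taken from the actual module):

variable "cluster_version" {
  description = "Kubernetes version to use for the EKS cluster"
  type        = string
  # No default: callers must set the version explicitly, so new EKS
  # releases no longer require a major version bump of the module.
}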

@sc250024
Contributor Author

Just rebased against the latest master branch. Re-review needed.

@gdurandvadas

One note on this: when using node_groups, updating the EKS version to 1.17 doesn't update the node groups' AMI version.
Moreover, the AMI version required to work with EKS 1.17 (1.17.7-20200709) cannot be used:

Error: error updating EKS Node Group (main-development:main-development-system-1-keen-glider) version: InvalidParameterException: Requested Nodegroup release version 1.17.7-20200709 is invalid. Allowed release version is 1.16.12-20200710
{
  RespMetadata: {
    StatusCode: 400,
    RequestID: "897202ad-0a43-40d0-a5b8-a2cb72211ee3"
  },
  ClusterName: "main-development",
  Message_: "Requested Nodegroup release version 1.17.7-20200709 is invalid. Allowed release version is 1.16.12-20200710",
  NodegroupName: "main-development-system-1-keen-glider"
}

@sc250024
Contributor Author

sc250024 commented Jul 17, 2020

One note on this: when using node_groups, updating the EKS version to 1.17 doesn't update the node groups' AMI version.
Moreover, the AMI version required to work with EKS 1.17 (1.17.7-20200709) cannot be used:

Error: error updating EKS Node Group (main-development:main-development-system-1-keen-glider) version: InvalidParameterException: Requested Nodegroup release version 1.17.7-20200709 is invalid. Allowed release version is 1.16.12-20200710
{
  RespMetadata: {
    StatusCode: 400,
    RequestID: "897202ad-0a43-40d0-a5b8-a2cb72211ee3"
  },
  ClusterName: "main-development",
  Message_: "Requested Nodegroup release version 1.17.7-20200709 is invalid. Allowed release version is 1.16.12-20200710",
  NodegroupName: "main-development-system-1-keen-glider"
}

The AMI version is retrieved automatically from local.worker_ami_name_filter, so it must be something else.
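For context, a rough approximation of how that lookup works (not the module's exact code): the filter is derived from the cluster version and fed into an aws_ami data source.

locals {
  # Falls back to an Amazon EKS worker AMI name pattern for the cluster version
  worker_ami_name_filter = var.worker_ami_name_filter != "" ? var.worker_ami_name_filter : "amazon-eks-node-${var.cluster_version}-v*"
}

data "aws_ami" "eks_worker" {
  most_recent = true
  owners      = ["602401143452"] # Amazon EKS AMI account ID (assumption)

  filter {
    name   = "name"
    values = [local.worker_ami_name_filter]
  }
}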

@gdurandvadas What AWS Terraform provider version are you using?

@sc250024
Contributor Author

sc250024 commented Jul 17, 2020

@gdurandvadas I think it's related to an upstream AWS provider issue (hashicorp/terraform-provider-aws#12675), in which case I'd need to bump the provider version.

I just pushed another commit, can you try again please?

@sc250024
Contributor Author

Pushing out a major version every time AWS releases a new version seems a little silly to me. Can't we just make cluster_version a required parameter, no default? One last major version bump for this particular issue. Then we'd only need to update the examples which can be a minor version bump.

@barryib @max-rocket-internet what do you think?

A counterargument on this: look at the comment from @gdurandvadas above. He discovered that when upgrading his cluster to 1.17, he ran into a problem.

Now, the problem wasn't related to this module; instead, it was a problem with the version of the AWS provider he was using, and it was fixed in provider version v2.63.0. Having regular major version releases lets us address problems like this; i.e. bumping the major version of this module gives a nice opportunity to make other breaking changes, if needed.

@gdurandvadas

@sc250024 thanks for the update; agreed that this is not an issue with the module itself.
My versions.tf defines aws = ">= 2.52.0", but if I check the .terraform folder, I see the downloaded plugin is terraform-provider-aws_v2.66.0_x4.
Let me try with the latest AWS provider version (v2.70.0); I'll get back to you.

@gdurandvadas

gdurandvadas commented Jul 17, 2020

The AMI version is retrieved automatically from local.worker_ami_name_filter, so it must be something else.

@gdurandvadas What AWS Terraform provider version are you using?

Bear in mind that I'm using node_groups, not workers, and I set ami_release_version = "1.17.7-20200709" on the node definition.
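For context, a minimal sketch of that node definition inside the module's node_groups map (group name and other attributes hypothetical; ami_release_version and its value are taken from the comment above):

node_groups = {
  system = {
    desired_capacity    = 2
    max_capacity        = 3
    min_capacity        = 1
    instance_type       = "m5.large"
    ami_release_version = "1.17.7-20200709" # pin the node group AMI release for EKS 1.17
  }
}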

@dpiddockcmp
Contributor

We can, and do, put out major versions when other breaking changes are made. I don't think that AWS releasing a new supported version of kubernetes in EKS should trigger the module to release a major version. This is unnecessary work for users.

Looking through the changelog, I think the following releases were major version bumps just because of the kubernetes version:

OK, that's less than I would have thought. My memory must be biased due to the previous two kubernetes releases. 1.11, 1.13 and 1.14 were mixed in with other big changes. But having a major release pending does seem to delay the potential for a release. Mainly because we only run a single branch.

I still think having the version set as a default is a little silly 😉

@sc250024
Contributor Author

sc250024 commented Jul 20, 2020

We can, and do, put out major versions when other breaking changes are made. I don't think that AWS releasing a new supported version of kubernetes in EKS should trigger the module to release a major version. This is unnecessary work for users.

Looking through the changelog, I think the following releases were major version bumps just because of the kubernetes version:

OK, that's less than I would have thought. My memory must be biased due to the previous two kubernetes releases. 1.11, 1.13 and 1.14 were mixed in with other big changes. But having a major release pending does seem to delay the potential for a release. Mainly because we only run a single branch.

I still think having the version set as a default is a little silly 😉

@dpiddockcmp In the end, it's your guys' module. I don't mind the version bump personally, but if it's decided to get rid of it, then 🤷‍♂️

@filipedeo @barryib Is there anything else needed for this to get merged?

README.md (review comment resolved)
@barryib
Member

barryib commented Jul 20, 2020

Pushing out a major version every time AWS releases a new version seems a little silly to me. Can't we just make cluster_version a required parameter, no default? One last major version bump for this particular issue. Then we'd only need to update the examples which can be a minor version bump.

@barryib @max-rocket-internet what do you think?

@dpiddockcmp I don't have a strong opinion on that. Maybe @max-rocket-internet or @antonbabenko have one? But I think that we shouldn't be afraid of bumping a major release as long as our documentation and module are consistent for a particular EKS version.

@dpiddockcmp In the end, it's your guys' module. I don't mind the version bump personally, but if it's decided to get rid of it, then 🤷‍♂️

@sc250024 To me it's a community module. It belongs to the community; we are only its maintainers. That's why @dpiddockcmp is starting a discussion, otherwise he would just drop the default value without asking.

Again thank you for your contributions and more broadly to the whole community ❤️❤️❤️

@sc250024 sc250024 requested a review from barryib July 21, 2020 08:59
sc250024 added 2 commits July 21, 2020 11:03
This commit bumps the version of the AWS Terraform provider to solve the
problem originally reported here:
hashicorp/terraform-provider-aws#12675.
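A hedged sketch of what such a bump could look like in the module's versions.tf (exact constraint assumed, not taken from the commit; v2.63.0 is the provider release cited earlier in this thread as containing the fix):

terraform {
  required_version = ">= 0.12.9"

  required_providers {
    # Require at least the AWS provider release that fixed managed node
    # group release-version handling (assumed to be 2.63.0)
    aws = ">= 2.63.0"
  }
}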
@JesterOrNot JesterOrNot mentioned this pull request Jul 21, 2020
@sc250024
Contributor Author

@barryib I made the doc changes you requested, and pushed. Would you mind re-reviewing please?

@barryib
Member

barryib commented Jul 23, 2020

@barryib I made the doc changes you requested, and pushed. Would you mind re-reviewing please?

Thanks @sc250024 for your contribution. LGTM.

Let's wait a bit for @max-rocket-internet's or @antonbabenko's thoughts about removing the cluster_version default value.

@sc250024
Contributor Author

@barryib Alright.

@antonbabenko
Member

LGTM, too. Thanks, @sc250024

@abdennour

LGTM, too. What is blocking the merge here?

@abdennour

In the meantime, I upgraded the node group with eksctl:

kubectl scale deployments/cluster-autoscaler --replicas=0 -n kube-system

eksctl upgrade nodegroup --name=node-group-name --cluster=cluster-name --kubernetes-version=1.17

Good luck!
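One follow-up step is implied but not shown here: scaling the cluster-autoscaler back up once the node group is on 1.17, e.g. kubectl scale deployments/cluster-autoscaler --replicas=1 -n kube-system (replica count assumed).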

@sc250024 sc250024 closed this Jul 28, 2020
@sc250024 sc250024 reopened this Jul 28, 2020
@sc250024
Contributor Author

@max-rocket-internet Anything to add?

@ldemailly

gentle-ping :)

@max-rocket-internet
Contributor

Let's wait a bit for @max-rocket-internet's or @antonbabenko's thoughts about removing the cluster_version default value.

Oooh, I could go either way on this one; I don't think it's a big deal to remove the default. @dpiddockcmp makes good points as always. It sounds like a good idea: it would reduce the number of releases and could also perhaps stop a few issues from people asking for the module to "support" the new version (#775, #856).

👍

@ldemailly

ldemailly commented Aug 3, 2020

@barryib could you please lift your request for changes so we can get this in? (Or is a change needed?)

@sc250024
Contributor Author

sc250024 commented Aug 7, 2020

So this PR has been open for almost a month. I'm fine with making changes, but someone has to pull the trigger. What's the verdict?

@sc250024 sc250024 closed this Aug 11, 2020
@ldemailly

:(

@arashkaffamanesh

@sc250024 does it mean 1.17 is supported with the latest release here?
https://github.com/terraform-aws-modules/terraform-aws-eks/releases/tag/v12.2.0

Or is eksctl still needed as mentioned by @abdennour :

eksctl upgrade nodegroup --name=node-group-name --cluster=cluster-name --kubernetes-version=1.17

@filipedeo

You'd still need eksctl, @arashkaffamanesh; the changes were not merged in.

@abdennour

@arashkaffamanesh see @filipedeo's answer above.

@cippaciong

My 2 cents: I was able to upgrade different EKS clusters to version 1.17 using the latest release of this module (12.2). All of them were configured to use managed node groups, but I wasn't affected by the issue described by @gdurandvadas, even though in some cases I used AWS provider versions older than 2.63.0. The clusters were in different projects and used different configurations; in particular, I used these provider versions:

  • 2.70.0
  • 2.61.0
  • 2.56.0

One thing I should mention, though, is that when I upgrade my cluster using Terraform, only the control plane is upgraded and I need to upgrade the AMIs of the worker nodes manually, using the AWS console. I've always assumed this was a limitation of managed node groups, but looking at @gdurandvadas's comments I'm wondering if I'm doing something wrong in my configuration and whether I could manage the worker node AMIs using Terraform.

Just as a reference, here is an example of the configuration I usually use in my EKS clusters.

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 12.2"

  cluster_version = "1.17"
  cluster_name    = var.eks_cluster_name
  subnets         = var.private_subnets

  enable_irsa = true

  tags = {
    Company     = var.company
    Project     = var.project
    Environment = var.environment
    Terraform   = true
  }

  vpc_id = var.vpc_id

  node_groups_defaults = {
    ami_type  = "AL2_x86_64"
    disk_size = 50
  }

  node_groups = {
    base-group = {
      desired_capacity = 4
      max_capacity     = 5
      min_capacity     = 3

      autoscaling_enabled   = true
      protect_from_scale_in = true

      instance_type = "t3a.medium"

      k8s_labels = {
        company     = var.company
        project     = var.project
        environment = var.environment
        terraform   = true
      }
      additional_tags = {
        Company     = var.company
        Project     = var.project
        Environment = var.environment
        Terraform   = true
        # Required by IRSA autoscaler
        "k8s.io/cluster-autoscaler/enabled"                 = true
        "k8s.io/cluster-autoscaler/${var.eks_cluster_name}" = true
      }
    }
  }

  map_roles = var.map_roles
}
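Based on @gdurandvadas's comment above, it may be possible to let Terraform roll the node AMIs as well by pinning ami_release_version per node group (attribute name taken from that comment; the value shown is the 1.17 release discussed earlier):

node_groups = {
  base-group = {
    # ...existing settings...
    ami_release_version = "1.17.7-20200709"
  }
}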

@sc250024 sc250024 deleted the feat-Eks117 branch August 12, 2020 11:26
@max-rocket-internet
Contributor

Not sure why the PR was closed and branch deleted? But AFAIK, there's nothing stopping anyone using this module to create/upgrade to 1.17.

The managed node group (MNG) issue mentioned here is not related to this module.

Or am I misunderstanding?

@arashkaffamanesh

I can confirm 1.17.9 is working fine with AWS provider 2.70.0, with the cluster-autoscaler and spot instances. The only issue I had was getting the autoscaler deployed with Helm to work; somehow it didn't, and I had to use an older implementation from 1.15 with it, which worked out-of-the-box.

@cippaciong thanks so much for the hint about the provider aws = "2.70.0"!

➜  k get nodes                                                                                                                                                  
NAME                                          STATUS   ROLES    AGE    VERSION
ip-10-2-90-58.eu-central-1.compute.internal   Ready    <none>   172m   v1.17.9-eks-4c6976

terraform {
  required_version = ">= 0.12.9"

  required_providers {
    # aws        = ">= 2.55.0"
    aws        = "2.70.0"
    local      = ">= 1.4"
    null       = ">= 2.1"
    template   = ">= 2.1"
    random     = ">= 2.1"
    kubernetes = ">= 1.11.1"
  }
}

@ldemailly

ldemailly commented Aug 12, 2020

Not sure why the PR was closed and branch deleted?

I am guessing because of the non-responsiveness of reviewers?

@sc250024
Contributor Author

sc250024 commented Aug 12, 2020

Not sure why the PR was closed and branch deleted?

I am guessing because of the non-responsiveness of reviewers?

Pretty much. And it appears #972 is the preferred solution anyway.

@barryib
Member

barryib commented Aug 17, 2020

Not sure why the PR was closed and branch deleted?

I am guessing because of the non-responsiveness of reviewers?

Pretty much. And it appears #972 is the preferred solution anyway.

@sc250024 I was on vacation for almost a month. Sorry for the delay.

For the record, we don't need this PR to upgrade to EKS 1.17. Just use the latest version of the module and set your cluster_version to 1.17. In the next version of this module, the cluster version will be required. See #972.
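A minimal sketch of that (names and values here are hypothetical; see the fuller example from @cippaciong earlier in the thread):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 12.2" # latest release at the time of this thread

  cluster_name    = "my-cluster"
  cluster_version = "1.17"
  subnets         = var.private_subnets
  vpc_id          = var.vpc_id
}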

@github-actions

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 17, 2022