eks_node_group unable to update version of workers #12675

Closed
llamahunter opened this issue Apr 4, 2020 · 6 comments · Fixed by #13407
Labels
bug Addresses a defect in current functionality. service/eks Issues and PRs that pertain to the eks service.
@llamahunter

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

0.11.14

Affected Resource(s)

  • aws_eks_node_group

Terraform Configuration Files

resource "aws_eks_node_group" "worker" {
  cluster_name    = "${var.datacenter}"
  node_group_name = "${var.worker_group_name}"
  node_role_arn   = "${data.aws_iam_role.worker.arn}"
  subnet_ids      = [ "${var.subnet_ids}" ]

  scaling_config {
    desired_size = "${var.worker_desired_count}"
    max_size     = "${var.worker_max_count}"
    min_size     = "${var.worker_min_count}"
  }

  instance_types = [ "${var.worker_instance_type}" ]
  disk_size      = "${var.worker_volume_size}"
  version        = "${var.eks_version}"

  remote_access {
    ec2_ssh_key = "${var.ssh_key_name}"
  }
}

Debug Output

Error: Error applying plan:

1 error occurred:
	* module.eks-worker.aws_eks_node_group.worker: 1 error occurred:
2020-04-03T23:59:02.142-0700 [DEBUG] plugin.terraform-provider-kubernetes_v1.11.1_x4: 2020/04/03 23:59:02 [ERR] plugin: plugin server: accept unix /var/folders/cc/7_jb5tld1kvd5jmv8r0dxp0c0000gn/T/plugin583841734: use of closed network connection
2020-04-03T23:59:02.142-0700 [DEBUG] plugin.terraform-provider-aws_v2.56.0_x4: 2020/04/03 23:59:02 [ERR] plugin: plugin server: accept unix /var/folders/cc/7_jb5tld1kvd5jmv8r0dxp0c0000gn/T/plugin491393577: use of closed network connection
	* aws_eks_node_group.worker: error updating EKS Node Group (tek2:general) version: InvalidParameterException: Requested Nodegroup release version 1.14.7-20190927 is invalid. Allowed release version is 1.15.10-20200228
{
  ClusterName: "tek2",
  Message_: "Requested Nodegroup release version 1.14.7-20190927 is invalid. Allowed release version is 1.15.10-20200228",
  NodegroupName: "general"
}

Expected Behavior

Terraform should have performed a rolling update of the worker nodes to the new matching AMI for 1.15, respecting pod disruption budgets.

Actual Behavior

Terraform failed to update the nodes because the provider automatically added the old 1.14.7-20190927 AMI release_version attribute to the Terraform state when the cluster was deployed with Kubernetes 1.14, and that stale value was then submitted with the update to 1.15.
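
To make the mismatch concrete, here is a minimal Python sketch (not provider code) that compares the Kubernetes minor version embedded in a release version string with the requested `version`. It assumes release versions have the shape `<kubernetes version>-<date>`, as in the error message above. The helper name `release_matches_version` is hypothetical, and this is not a description of how EKS itself validates requests.

```python
def release_matches_version(release_version: str, version: str) -> bool:
    """Return True if the release version's Kubernetes minor equals `version`.

    Assumes the "<k8s version>-<date>" shape seen in the error message,
    e.g. "1.14.7-20190927". Hypothetical helper, for illustration only.
    """
    k8s_part = release_version.split("-", 1)[0]   # "1.14.7"
    minor = ".".join(k8s_part.split(".")[:2])     # "1.14"
    return minor == version


# The stale value kept in state, checked against the requested version.
print(release_matches_version("1.14.7-20190927", "1.15"))   # False
# The release version the API reported as allowed.
print(release_matches_version("1.15.10-20200228", "1.15"))  # True
```

The first call corresponds to the failing request in the debug output: the state still carried a 1.14 release version while the configuration asked for 1.15.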

Steps to Reproduce

  1. Create an eks cluster with 1.14
  2. Create matching 1.14 managed workers
  3. Update control plane version to 1.15
  4. Update worker version to 1.15

Important Factoids

The cluster was previously deployed using managed workers at 1.14 without setting a 'version' or 'release_version' attribute.

References

@ghost ghost added the service/eks Issues and PRs that pertain to the eks service. label Apr 4, 2020
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Apr 4, 2020
@bflad bflad added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels May 18, 2020
@bflad bflad self-assigned this May 19, 2020
@bflad
Contributor

bflad commented May 19, 2020

Hi folks 👋 Is this issue still reproducible? I just tried replicating it today between 1.15 and 1.16 where Terraform submitted the following request, which oddly enough upgraded just fine:

2020/05/19 15:40:44 [DEBUG] [aws-sdk-go] DEBUG: Request eks/UpdateNodegroupVersion Details:
---[ REQUEST POST-SIGN ]-----------------------------
POST /clusters/tf-acc-test-3848944511257360101/node-groups/tf-acc-test-3848944511257360101/update-version HTTP/1.1
Host: eks.us-west-2.amazonaws.com
User-Agent: aws-sdk-go/1.31.0 (go1.14.2; darwin; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.12.7-sdk (+https://www.terraform.io)
Content-Length: 128
Authorization: AWS4-HMAC-SHA256 Credential=--OMITTED--/20200519/us-west-2/eks/aws4_request, SignedHeaders=content-length;content-type;host;x-amz-date, Signature=3200ded1f843ffd69c4977c8a4e2687d805e3e6d2eba0896fa6688bf137fe838
Content-Type: application/json
X-Amz-Date: 20200519T194044Z
Accept-Encoding: gzip

{"clientRequestToken":"terraform-20200519194044260400000007","force":false,"releaseVersion":"1.15.11-20200507","version":"1.16"}
-----------------------------------------------------
2020/05/19 15:40:44 [DEBUG] [aws-sdk-go] DEBUG: Response eks/UpdateNodegroupVersion Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Connection: close
Content-Length: 237
Content-Type: application/json
Date: Tue, 19 May 2020 19:40:44 GMT
X-Amz-Apigw-Id: My1peF08PHcFTpw=
X-Amzn-Requestid: e9d91eea-3b1a-41b5-9717-8293c63ad790
X-Amzn-Trace-Id: Root=1-5ec4363c-3ea74b1a22329f8ee8800522


-----------------------------------------------------
2020/05/19 15:40:44 [DEBUG] [aws-sdk-go] {"update":{"id":"79f19070-4c92-3f02-a838-692d739dbc28","status":"InProgress","type":"VersionUpdate","params":[{"type":"Version","value":"1.16"},{"type":"ReleaseVersion","value":"1.16.8-20200507"}],"createdAt":1589917244.665,"errors":[]}}

Note that the EKS API automatically fixed the ReleaseVersion to be compatible in its response and subsequent updating of the EKS Node Group. I'm still going to submit the change to only include ReleaseVersion if it has a configuration change just to prevent any odd behaviors in the future, but it may potentially be working already.
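
The change described above, sending the release version only when it has changed, can be sketched roughly as follows. This is an illustrative Python sketch, not the provider's Go implementation: the function name `build_update_request` and its parameters are hypothetical, and only the JSON keys (`version`, `releaseVersion`, `force`) are taken from the request body shown in the log.

```python
from typing import Optional


def build_update_request(
    version: str,
    old_release_version: Optional[str],
    new_release_version: Optional[str],
) -> dict:
    """Build an update-version request body (hypothetical sketch).

    `old_release_version` stands for the value stored in state and
    `new_release_version` for the planned value. When the attribute is
    not set in configuration, the two are assumed to be equal.
    """
    request = {"version": version, "force": False}
    # Only send releaseVersion when it actually changed; otherwise let
    # the API choose a release compatible with the requested version.
    if new_release_version and new_release_version != old_release_version:
        request["releaseVersion"] = new_release_version
    return request


# Unchanged release version: the stale state value is left out.
print(build_update_request("1.16", "1.15.11-20200507", "1.15.11-20200507"))
# Explicitly changed release version: it is included.
print(build_update_request("1.16", "1.15.11-20200507", "1.16.8-20200507"))
```

With this shape, a stale value carried over from state never reaches the API, so the outcome no longer depends on whether the API corrects or rejects it.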

bflad added a commit that referenced this issue May 19, 2020
…degroupVersion if changed

Reference: #12675

An error similar to the one in the issue report was expected, because Terraform was previously submitting the state value for `release_version`. However, as of today the EKS API automatically corrects the incompatible value instead of returning an error. The resource change ensures that only correct API parameters are submitted, in case the API returns errors for this situation again in the future.

Output from acceptance testing:

```
--- PASS: TestAccAWSEksNodeGroup_disappears (1356.93s)
--- PASS: TestAccAWSEksNodeGroup_InstanceTypes (1519.46s)
--- PASS: TestAccAWSEksNodeGroup_ScalingConfig_MaxSize (1540.56s)
--- PASS: TestAccAWSEksNodeGroup_DiskSize (1550.40s)
--- PASS: TestAccAWSEksNodeGroup_basic (1581.83s)
--- PASS: TestAccAWSEksNodeGroup_AmiType (1591.47s)
--- PASS: TestAccAWSEksNodeGroup_RemoteAccess_SourceSecurityGroupIds (1602.60s)
--- PASS: TestAccAWSEksNodeGroup_ScalingConfig_DesiredSize (1632.41s)
--- PASS: TestAccAWSEksNodeGroup_ScalingConfig_MinSize (1683.52s)
--- PASS: TestAccAWSEksNodeGroup_Tags (1712.30s)
--- PASS: TestAccAWSEksNodeGroup_Labels (1765.19s)
--- PASS: TestAccAWSEksNodeGroup_RemoteAccess_Ec2SshKey (1767.20s)
--- PASS: TestAccAWSEksNodeGroup_ReleaseVersion (2853.92s)
--- PASS: TestAccAWSEksNodeGroup_Version (3045.62s)
```
@bflad bflad added this to the v2.63.0 milestone May 20, 2020
bflad added a commit that referenced this issue May 20, 2020
…degroupVersion if changed (#13407)

@bflad
Contributor

bflad commented May 20, 2020

As mentioned above, the EKS API may be allowing the previously incorrect behavior of the resource, but we have now also merged the fix to only submit the ReleaseVersion during the UpdateNodegroupVersion API call when its value has changed. This fix will release in version 2.63.0 of the Terraform AWS Provider, likely tomorrow. 👍

@ghost

ghost commented May 22, 2020

This has been released in version 2.63.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

adamdecaf pushed a commit to adamdecaf/terraform-provider-aws that referenced this issue May 28, 2020
…degroupVersion if changed (hashicorp#13407)

@Nuru

Nuru commented Jun 4, 2020

Correction: I originally reported still having this problem in AWS provider 2.64.0, but it turned out I had 2.60.0 cached, and that was where I experienced the issue. I was able to upgrade fine with 2.64.0.
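
A cached plugin like this can be spotted from the plugin binary name, which carries the provider version (for example `terraform-provider-aws_v2.56.0_x4` in the debug output above). The following Python sketch parses that name and compares it with 2.63.0, the release that contains the fix. The helper names are hypothetical, and the `_v<major>.<minor>.<patch>` naming is assumed from the one example in this thread.

```python
import re
from typing import Optional, Tuple

# Provider release that contains the fix, per the maintainer's comment.
FIXED_IN = (2, 63, 0)


def plugin_version(filename: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from an AWS provider plugin name."""
    match = re.search(r"terraform-provider-aws_v(\d+)\.(\d+)\.(\d+)", filename)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def has_fix(filename: str) -> bool:
    """Return True if the plugin binary is at or above the fixed release."""
    version = plugin_version(filename)
    return version is not None and version >= FIXED_IN


# Name taken from the debug output in the original report.
print(has_fix("terraform-provider-aws_v2.56.0_x4"))  # False
# Assumed name for a 2.64.0 binary, following the same pattern.
print(has_fix("terraform-provider-aws_v2.64.0_x4"))  # True
```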

@rohitgabriel

AWS provider 2.64.0 also has the same issue.

@ghost

ghost commented Jun 20, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Jun 20, 2020