kops failed to change master's instance type #5771

Closed
koooge opened this issue Sep 12, 2018 · 4 comments

koooge (Contributor) commented Sep 12, 2018

1. What kops version are you running? The command kops version will display
this information.

$ kops version
Version 1.10.0

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Right now I can't connect to the kube-apiserver 😩

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-08T16:31:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Unable to connect to the server: EOF

For reference, this was the server version before the update:

Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?
aws

4. What commands did you run? What is the simplest way to reproduce this issue?

$ kops edit ig --name <cluster name> master-ap-northeast-1a --state s3://<state bucket>

I changed the master's instance type from t2.medium to m5.large.
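
The edit touches a single field in the instance group spec (a minimal sketch; the full manifest is in section 7):

spec:
  machineType: m5.large    # was: t2.medium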

$ kops update cluster <cluster name> --state s3://<state bucket> --yes

*********************************************************************************

A new kubernetes version is available: 1.10.5
Upgrading is recommended (try kops upgrade cluster)

More information: https://github.com/kubernetes/kops/blob/master/permalinks/upgrade_k8s.md#1.10.5

*********************************************************************************

I0912 10:51:03.952328    7698 apply_cluster.go:505] Gossip DNS: skipping DNS validation
I0912 10:51:03.973493    7698 executor.go:103] Tasks: 0 done / 78 total; 30 can run
I0912 10:51:05.071553    7698 executor.go:103] Tasks: 30 done / 78 total; 24 can run
I0912 10:51:05.788448    7698 executor.go:103] Tasks: 54 done / 78 total; 20 can run
I0912 10:51:06.633464    7698 executor.go:103] Tasks: 74 done / 78 total; 3 can run
I0912 10:51:06.880619    7698 executor.go:103] Tasks: 77 done / 78 total; 1 can run
I0912 10:51:06.961652    7698 executor.go:103] Tasks: 78 done / 78 total; 0 can run
Will modify resources:
  LaunchConfiguration/master-ap-northeast-1a.masters.mspf-staging.k8s.local
  	InstanceType        	 t2.medium -> m5.large

Must specify --yes to apply changes
$ kops rolling-update cluster <cluster name> --state <state> --yes
NAME			STATUS		NEEDUPDATE	READY	MIN	MAX	NODES
master-ap-northeast-1a	NeedsUpdate	1		0	1	1	1
nodes			NeedsUpdate	3		0	3	20	3
I0912 10:51:33.134878    7763 instancegroups.go:157] Draining the node: "<ip>".
node "<name>" cordoned
node "<name>" cordoned
...
pod "dns-controller-5f7b89b574-5v7x9" evicted
pod "cluster-autoscaler-54dd5bff86-r2x8b" evicted
...
node "ip" drained
I0912 10:51:38.469973    7763 instancegroups.go:338] Waiting for 1m30s for pods to stabilize after draining.
I0912 10:53:08.475802    7763 instancegroups.go:278] Stopping instance "<instance id>", node "<name>", in group "<name>" (this may take a while).

I0912 10:58:08.979214    7763 instancegroups.go:188] Validating the cluster.
I0912 10:58:09.172443    7763 instancegroups.go:248] Cluster did not validate, will try again in "30s" until duration "5m0s" expires: error listing nodes: Get https://<path to>.ap-northeast-1.elb.amazonaws.com/api/v1/nodes: EOF.

(snip)

E0912 11:03:09.169467    7763 instancegroups.go:193] Cluster did not validate within 5m0s

master not healthy after update, stopping rolling-update: "error validating cluster after removing a node: cluster did not validate within a duation of \"5m0s\""

5. What happened after the commands executed?
I can't connect to the Kubernetes cluster.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-08T16:31:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Unable to connect to the server: EOF
$ kops validate cluster <name> --state s3://<bucket>
Validating cluster <name>


unexpected error during validation: error listing nodes: Get https://<path to>.ap-northeast-1.elb.amazonaws.com/api/v1/nodes: EOF

6. What did you expect to happen?
The update to complete successfully.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-05-28T01:27:39Z
  name: <name>
spec:
  additionalPolicies:
    master: |
      [
        {
          "Effect": "Allow",
          "Action": [
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:SetDesiredCapacity",
            "autoscaling:TerminateInstanceInAutoScalingGroup"
          ],
          "Resource": "*"
        }
      ]
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://<bucket>
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-ap-northeast-1a
      name: a
    name: main
  - etcdMembers:
    - instanceGroup: master-ap-northeast-1a
      name: a
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.10.3
  masterInternalName: <name>
  masterPublicName: <name>
  networkCIDR: <cidr>
  networkID: <id>
  networking:
    kubenet: {}
  nonMasqueradeCIDR: <cidr>
  sshAccess:
  - <cidr>
  subnets:
  - cidr: <cidr>
    name: ap-northeast-1a
    type: Public
    zone: ap-northeast-1a
  - cidr: <cidr>
    name: ap-northeast-1c
    type: Public
    zone: ap-northeast-1c
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-05-28T01:27:39Z
  labels:
    kops.k8s.io/cluster: <name>
  name: master-ap-northeast-1a
spec:
  additionalSecurityGroups:
  - <sg>
  image: kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: m5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-ap-northeast-1a
  role: Master
  rootVolumeSize: 30
  subnets:
  - ap-northeast-1a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-05-28T01:27:39Z
  labels:
    kops.k8s.io/cluster: <name>
  name: nodes
spec:
  additionalSecurityGroups:
  - <sg>
  image: kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: c5.large
  maxSize: 20
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  rootVolumeSize: 30
  subnets:
  - ap-northeast-1a
  - ap-northeast-1c

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?
Possibly related: #5755

But I hit another issue with t2.medium -> t2.large as well (see the comments below).

koooge (Contributor, Author) commented Sep 12, 2018

Although the master instance is alive, the CLB shows its status as OutOfService (443).
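
One way to confirm this from the CLI (a sketch, assuming a Classic Load Balancer; names are placeholders):

$ aws elb describe-instance-health --load-balancer-name <elb name> --region ap-northeast-1
{
    "InstanceStates": [
        {
            "InstanceId": "<instance id>",
            "State": "OutOfService",
            ...
        }
    ]
}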

koooge changed the title from "kops failed to change instance type" to "kops failed to change master's instance type" on Sep 12, 2018
koooge (Contributor, Author) commented Sep 12, 2018

I noticed something strange when trying the change as t2.medium => t2.large instead: even though kops had already deleted the target master's EC2 instance, Kubernetes still showed the previous master (c) as Ready,SchedulingDisabled:

NAME                                                STATUS                     ROLES     AGE       VERSION
<a>    Ready                      node      20d       v1.10.3
<b>    NotReady                   master    31m       v1.10.3
<c>  Ready,SchedulingDisabled   master    20d       v1.10.3
<d>  Ready                      node      20d       v1.10.3
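
If the old master (c) were merely cordoned, scheduling could be re-enabled on it manually (a sketch; <c> stands for the redacted node name):

$ kubectl uncordon <c>
node "<c>" uncordoned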

jmthvt (Contributor) commented Nov 2, 2018

The current kops image doesn't work with 5th gen EC2 instances.

New AWS instance types: P3, C5, M5, H1. Please note that NVME volumes are not supported on the default jessie image, so masters will not boot on M5 and C5 instance types unless a stretch image is chosen (change jessie to stretch in the image name). Also note that kubernetes will not support mounting persistent volumes on NVME instances until Kubernetes v1.9.

https://github.com/kubernetes/kops/blob/master/docs/releases/1.8-NOTES.md#significant-changes

Unfortunately the default image in kops 1.10 is still on Jessie.
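
So the practical fix is to point the master instance group at a Stretch image before moving to an m5/c5 type. A sketch of the edit (the image name is a placeholder here; pick an actual Stretch build from the kope.io image listings):

$ kops edit ig --name <cluster name> master-ap-northeast-1a --state s3://<state bucket>

spec:
  image: kope.io/k8s-1.10-debian-stretch-amd64-hvm-ebs-<date>
  machineType: m5.large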

koooge (Contributor, Author) commented Nov 21, 2018

I see, so this is expected behavior with kops' default image. 😢
I'll close this issue. Thank you!

koooge closed this as completed on Nov 21, 2018