
Behaviour upgrading cluster using --kubernetes-version with version that cannot be found #5809

Closed
nanikjava opened this issue Nov 1, 2019 · 15 comments
Labels: kind/bug, lifecycle/rotten, priority/important-longterm

@nanikjava
Contributor

nanikjava commented Nov 1, 2019

While working on fixing #2570 I noticed a very strange behaviour. Here is how to reproduce it (this is using the master branch; no changes for #2570 have been applied):

  1. Start with a fresh VM and a Kubernetes version that is not valid:

go run ./main.go start --v=7 --vm-driver=virtualbox --kubernetes-version=v1.15.20

It throws an error, which is the right behaviour:

W1101 17:20:06.990868   10321 kubeadm.go:670] unable to stop kubelet: Process exited with status 1 command: "/bin/bash -c \"pgrep kubelet && sudo systemctl stop kubelet\"" output: ""
* Downloading kubelet v1.15.20
* Downloading kubeadm v1.15.20
W1101 17:20:07.836890   10321 exit.go:101] Failed to update cluster: downloading binaries: downloading kubeadm: Error downloading kubeadm v1.15.20: failed to download: failed to download to temp file: download failed: 1 error(s) occurred:

* received invalid status code: 404 (expected 200)
* 
X Failed to update cluster: downloading binaries: downloading kubeadm: Error downloading kubeadm v1.15.20: failed to download: failed to download to temp file: download failed: 1 error(s) occurred:

* received invalid status code: 404 (expected 200)
* 
* Sorry that minikube crashed. If this was unexpected, we would love to hear from you:
  - https://github.com/kubernetes/minikube/issues/new/choose
exit status 70

Run the command again with the correct version:

go run ./main.go start --v=7 --vm-driver=virtualbox --kubernetes-version=v1.15.2

It throws an error:

I1101 17:15:01.208745    7836 translate.go:92] Setting Language to en-AU ...
I1101 17:15:01.208927    7836 translate.go:79] Failed to load translation file for en-AU: Asset translations/en-AU.json not found
I1101 17:15:01.209068    7836 out.go:131] Setting OutFile to fd 1 ...
I1101 17:15:01.209079    7836 out.go:172] isatty.IsTerminal(1) = false
I1101 17:15:01.209083    7836 out.go:138] Setting ErrFile to fd 2...
I1101 17:15:01.209086    7836 out.go:172] isatty.IsTerminal(2) = false
I1101 17:15:01.209140    7836 root.go:284] Updating PATH: /home/nanik/.minikube/bin
I1101 17:15:01.228034    7836 start.go:251] hostinfo: {"hostname":"pop-os","uptime":521525,"bootTime":1572067376,"procs":541,"os":"linux","platform":"ubuntu","platformFamily":"debian","platformVersion":"19.04","kernelVersion":"5.0.0-32-generic","virtualizationSystem":"kvm","virtualizationRole":"host","hostid":"92ee54d9-a9b6-5383-1e03-0bdb5d3eccaa"}
I1101 17:15:01.228503    7836 start.go:261] virtualization: kvm host
* minikube v0.0.0-unset on Ubuntu 19.04
I1101 17:15:01.228684    7836 start.go:547] selectDriver: flag="virtualbox", old=&{{false false https://storage.googleapis.com/minikube/iso/minikube-v1.5.0.iso 2000 2 20000 virtualbox docker  [] [] [] [] 192.168.99.1/24  default qemu:///system false false <nil> [] false [] /nfsshares  false false true} {v1.15.20 192.168.99.247 8443 minikube minikubeCA [] [] cluster.local docker    10.96.0.0/12  [] true false}}
I1101 17:15:01.272561    7836 start.go:293] selected: virtualbox
X Error: You have selected Kubernetes v1.15.2, but the existing cluster for your profile is running Kubernetes v1.15.20. Non-destructive downgrades are not supported, but you can proceed by performing one of the following options:

* Recreate the cluster using Kubernetes v1.15.2: Run "minikube delete ", then "minikube start  --kubernetes-version=1.15.2"
* Create a second cluster with Kubernetes v1.15.2: Run "minikube start -p <new name> --kubernetes-version=1.15.2"
* Reuse the existing cluster with Kubernetes v1.15.20 or newer: Run "minikube start  --kubernetes-version=1.15.20"

This means the user will not be able to use the current profile VM unless it is deleted. Is this the correct behaviour?

@nanikjava
Contributor Author

/assign @nanikjava

@nanikjava
Contributor Author

cc: @tstromberg

@medyagh
Member

medyagh commented Nov 4, 2019

@nanikjava I noticed your two commands have one difference:
one of them is v1.15.20 and the other one is v1.15.2.

I wonder if that could be the source of the problem, since v1.15.20 doesn't exist and v1.15.2 exists?

@medyagh medyagh added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Nov 4, 2019
@nanikjava
Contributor Author

@nanikjava I noticed your two commands have one difference:
one of them is v1.15.20 and the other one is v1.15.2.

I wonder if that could be the source of the problem, since v1.15.20 doesn't exist and v1.15.2 exists?

Yes, it is done intentionally to test the behaviour. What I thought should happen is that, since v1.15.20 is not available and cannot be installed, it should not stop the user from later using the correct version, in this case v1.15.2.

What is happening is that minikube detects that the previously requested version was v1.15.20, but it does not check whether that installation was actually successful. I think minikube should be able to detect this and allow the user to install the correct version.

In my opinion this will need to be fixed, but I want to understand first whether the current behaviour is intended.
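
A rough sketch of the behaviour I have in mind (Config, downloadBinaries and saveProfile below are hypothetical stand-ins, not the actual minikube code): only record the requested Kubernetes version in the profile config after its binaries have downloaded successfully, so a typo such as v1.15.20 never becomes the "existing cluster" version that later blocks the correct one.

package main

import (
	"errors"
	"fmt"
)

// Config stands in for the profile config; only the field we care about here.
type Config struct {
	KubernetesVersion string
}

// downloadBinaries simulates the kubeadm/kubelet download; a non-existent
// release fails in the same way the 404 does in the log above.
func downloadBinaries(version string) error {
	if version == "v1.15.20" {
		return errors.New("received invalid status code: 404 (expected 200)")
	}
	return nil
}

// saveProfile simulates persisting the profile config to disk.
func saveProfile(cfg *Config) error { return nil }

// startWithVersion records the requested version only after a successful
// download, so a failed start never changes the stored cluster version.
func startWithVersion(cfg *Config, requested string) error {
	if err := downloadBinaries(requested); err != nil {
		return fmt.Errorf("kubernetes %s is not available: %v", requested, err)
	}
	cfg.KubernetesVersion = requested
	return saveProfile(cfg)
}

func main() {
	cfg := &Config{}
	fmt.Println(startWithVersion(cfg, "v1.15.20")) // fails; stored version stays empty
	err := startWithVersion(cfg, "v1.15.2")
	fmt.Println(err, cfg.KubernetesVersion) // <nil> v1.15.2
}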

@medyagh
Member

medyagh commented Nov 4, 2019

Ah sorry, I missed that part you wrote, @nanikjava.

Because of some issues we have had, we cannot downgrade a cluster to a lower Kubernetes version, but in this case the higher version was not a real version, it was a typo!

And you are right: if the Kubernetes version the user entered is invalid, we should not store it as the version that is on the VM!

It is worth noting that we still do not plan on supporting downgrades.

Thank you, you found a bug! It would be wonderful to fix it!
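
For example (a sketch only, not the actual minikube code; the URL below follows the public kubernetes-release bucket layout and may differ from the URL minikube builds internally), the requested version could be verified with an HTTP HEAD request before it is ever written into the profile:

package main

import (
	"fmt"
	"net/http"
)

// checkVersionExists is a hypothetical pre-flight check: before storing the
// requested Kubernetes version in the profile, make sure the release can
// actually be downloaded.
func checkVersionExists(version string) error {
	url := fmt.Sprintf(
		"https://storage.googleapis.com/kubernetes-release/release/%s/bin/linux/amd64/kubeadm",
		version)
	resp, err := http.Head(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("kubernetes %s not found: received status %d (expected 200)",
			version, resp.StatusCode)
	}
	return nil
}

func main() {
	for _, v := range []string{"v1.15.20", "v1.15.2"} {
		fmt.Println(v, "->", checkVersionExists(v))
	}
}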

@medyagh medyagh added kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Nov 4, 2019
@nanikjava
Contributor Author

Thank you, you found a bug! It would be wonderful to fix it!

Yes... I found a bug 👍

I will assign this to myself.

@nanikjava
Contributor Author

/assign @nanikjava

@nanikjava
Contributor Author

On further testing, the issue can be resolved by stopping minikube from proceeding when there is an error in the download process:

I1113 07:40:08.305579    3062 main.go:110] libmachine: (minikube) KVM machine creation complete!
I1113 07:40:08.477322    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/gcr.io/k8s-minikube/storage-provisioner_v1.8.1
I1113 07:40:08.478385    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/kubernetes-dashboard-amd64_v1.10.1
I1113 07:40:08.479838    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/etcd_3.3.10
I1113 07:40:08.481192    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-addon-manager_v9.0
I1113 07:40:08.482818    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/k8s-dns-sidecar-amd64_1.14.13
I1113 07:40:08.483291    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64_1.14.13
I1113 07:40:08.483394    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/coredns_1.3.1
I1113 07:40:08.483424    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/pause_3.1
I1113 07:40:08.486010    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-scheduler_v1.15.20
I1113 07:40:08.488643    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/k8s-dns-kube-dns-amd64_1.14.13
I1113 07:40:08.488996    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-controller-manager_v1.15.20
I1113 07:40:08.489129    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.15.20
I1113 07:40:08.489302    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-proxy_v1.15.20
I1113 07:40:08.993029    3062 cache_images.go:302] CacheImage: k8s.gcr.io/kube-proxy:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-proxy_v1.15.20 completed in 1.070584092s
E1113 07:40:08.993097    3062 cache_images.go:80] CacheImage k8s.gcr.io/kube-proxy:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-proxy_v1.15.20 failed: MANIFEST_UNKNOWN: "Failed to fetch \"v1.15.20\" from request \"/v2/kube-proxy/manifests/v1.15.20\"."
I1113 07:40:08.995040    3062 cache_images.go:302] CacheImage: k8s.gcr.io/kube-controller-manager:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-controller-manager_v1.15.20 completed in 1.072603012s
E1113 07:40:08.995121    3062 cache_images.go:80] CacheImage k8s.gcr.io/kube-controller-manager:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-controller-manager_v1.15.20 failed: MANIFEST_UNKNOWN: "Failed to fetch \"v1.15.20\" from request \"/v2/kube-controller-manager/manifests/v1.15.20\"."
I1113 07:40:08.995544    3062 cache_images.go:302] CacheImage: k8s.gcr.io/kube-scheduler:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-scheduler_v1.15.20 completed in 1.07308876s
E1113 07:40:08.995622    3062 cache_images.go:80] CacheImage k8s.gcr.io/kube-scheduler:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-scheduler_v1.15.20 failed: MANIFEST_UNKNOWN: "Failed to fetch \"v1.15.20\" from request \"/v2/kube-scheduler/manifests/v1.15.20\"."
I1113 07:40:08.996216    3062 cache_images.go:302] CacheImage: k8s.gcr.io/kube-apiserver:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.15.20 completed in 1.073756535s
E1113 07:40:08.996272    3062 cache_images.go:80] CacheImage k8s.gcr.io/kube-apiserver:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.15.20 failed: MANIFEST_UNKNOWN: "Failed to fetch \"v1.15.20\" from request \"/v2/kube-apiserver/manifests/v1.15.20\"."
I1113 07:40:09.744798    3062 cache_images.go:350] /home/nanik/.minikube/cache/images/k8s.gcr.io/k8s-dns-sidecar-amd64_1.14.13 exists
I1113 07:40:09.745174    3062 cache_images.go:302] CacheImage: k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.13 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/k8s-dns-sidecar-amd64_1.14.13 completed in 1.82243236s
I1113 07:40:09.745227    3062 cache_images.go:83] CacheImage k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.13 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/k8s-dns-sidecar-amd64_1.14.13 succeeded
I1113 07:40:09.763462    3062 cache_images.go:350] /home/nanik/.minikube/cache/images/k8s.gcr.io/pause_3.1 exists
I1113 07:40:09.763494    3062 cache_images.go:302] CacheImage: k8s.gcr.io/pause:3.1 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/pause_3.1 completed in 1.841023643s
I1113 07:40:09.763524    3062 cache_images.go:83] CacheImage k8s.gcr.io/pause:3.1 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/pause_3.1 succeeded

@nanikjava
Contributor Author

nanikjava commented Nov 12, 2019

To make things faster, minikube runs the image download process in a separate goroutine, which allows the VM initialization process to continue in parallel.

There is a checkpoint that checks the success/failure state of the image downloads inside waitCacheImages(..); at the moment, failures are only reported as a debug log. This checkpoint could potentially be used to stop minikube and inform the user that downloading the images has failed.

@nanikjava
Contributor Author

nanikjava commented Nov 12, 2019

func CacheImages(images []string, cacheDir string) error {
		....
			if err := CacheImage(image, dst); err != nil {
				//glog.Errorf("CacheImage %s -> %s failed: %v", image, dst, err)
				exit.WithError("CacheImage " + image + " -> " + dst, err)
				//return errors.Wrapf(err, "caching image %s", dst)
			}
			glog.Infof("CacheImage %s -> %s succeeded", image, dst)
			return nil
		})
		...
}

Doing the above creates a problem: it terminates the VM abruptly, because it runs on the main thread.


func waitCacheImages(g *errgroup.Group) {
	if !viper.GetBool(cacheImages) {
		return
	}
	if err := g.Wait(); err != nil {
		exit.WithError("Error caching images: ", err)
	}
}

Exiting the app during waitCacheImages(..) will only report the last error recorded inside the CacheImages(..) goroutine.

To make this useful to the user, the error should print out all of the cached images that failed to download.

Something like this:

The following files could not be downloaded:

kube-apiserver-15.xx
kube-proxy-15.xx

@nanikjava nanikjava changed the title Behaviour upgrading cluster using wrong kubernetes-version Behaviour upgrading cluster using --kubernetes-version with version that cannot be found Nov 13, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 11, 2020
@medyagh
Member

medyagh commented Mar 4, 2020

@nanikjava are you still interested in this issue?

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 3, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
