
Behaviour upgrading cluster using --kubernetes-version with version that cannot be found #5809

Closed
nanikjava opened this issue Nov 1, 2019 · 15 comments
Labels: kind/bug, lifecycle/rotten, priority/important-longterm

@nanikjava
Contributor

nanikjava commented Nov 1, 2019

While working on fixing #2570 I noticed a very strange behaviour. Here is how to reproduce it (this is using the master branch; no changes for #2570 have been applied):

  1. Start with a fresh VM and a Kubernetes version that is not valid:

go run ./main.go start --v=7 --vm-driver=virtualbox --kubernetes-version=v1.15.20

It throws an error, which is the right behaviour:

W1101 17:20:06.990868   10321 kubeadm.go:670] unable to stop kubelet: Process exited with status 1 command: "/bin/bash -c \"pgrep kubelet && sudo systemctl stop kubelet\"" output: ""
* Downloading kubelet v1.15.20
* Downloading kubeadm v1.15.20
W1101 17:20:07.836890   10321 exit.go:101] Failed to update cluster: downloading binaries: downloading kubeadm: Error downloading kubeadm v1.15.20: failed to download: failed to download to temp file: download failed: 1 error(s) occurred:

* received invalid status code: 404 (expected 200)
* 
X Failed to update cluster: downloading binaries: downloading kubeadm: Error downloading kubeadm v1.15.20: failed to download: failed to download to temp file: download failed: 1 error(s) occurred:

* received invalid status code: 404 (expected 200)
* 
* Sorry that minikube crashed. If this was unexpected, we would love to hear from you:
  - https://github.com/kubernetes/minikube/issues/new/choose
exit status 70

Run the command again with the correct version:

go run ./main.go start --v=7 --vm-driver=virtualbox --kubernetes-version=v1.15.2

It throws an error:

I1101 17:15:01.208745    7836 translate.go:92] Setting Language to en-AU ...
I1101 17:15:01.208927    7836 translate.go:79] Failed to load translation file for en-AU: Asset translations/en-AU.json not found
I1101 17:15:01.209068    7836 out.go:131] Setting OutFile to fd 1 ...
I1101 17:15:01.209079    7836 out.go:172] isatty.IsTerminal(1) = false
I1101 17:15:01.209083    7836 out.go:138] Setting ErrFile to fd 2...
I1101 17:15:01.209086    7836 out.go:172] isatty.IsTerminal(2) = false
I1101 17:15:01.209140    7836 root.go:284] Updating PATH: /home/nanik/.minikube/bin
I1101 17:15:01.228034    7836 start.go:251] hostinfo: {"hostname":"pop-os","uptime":521525,"bootTime":1572067376,"procs":541,"os":"linux","platform":"ubuntu","platformFamily":"debian","platformVersion":"19.04","kernelVersion":"5.0.0-32-generic","virtualizationSystem":"kvm","virtualizationRole":"host","hostid":"92ee54d9-a9b6-5383-1e03-0bdb5d3eccaa"}
I1101 17:15:01.228503    7836 start.go:261] virtualization: kvm host
* minikube v0.0.0-unset on Ubuntu 19.04
I1101 17:15:01.228684    7836 start.go:547] selectDriver: flag="virtualbox", old=&{{false false https://storage.googleapis.com/minikube/iso/minikube-v1.5.0.iso 2000 2 20000 virtualbox docker  [] [] [] [] 192.168.99.1/24  default qemu:///system false false <nil> [] false [] /nfsshares  false false true} {v1.15.20 192.168.99.247 8443 minikube minikubeCA [] [] cluster.local docker    10.96.0.0/12  [] true false}}
I1101 17:15:01.272561    7836 start.go:293] selected: virtualbox
X Error: You have selected Kubernetes v1.15.2, but the existing cluster for your profile is running Kubernetes v1.15.20. Non-destructive downgrades are not supported, but you can proceed by performing one of the following options:

* Recreate the cluster using Kubernetes v1.15.2: Run "minikube delete ", then "minikube start  --kubernetes-version=1.15.2"
* Create a second cluster with Kubernetes v1.15.2: Run "minikube start -p <new name> --kubernetes-version=1.15.2"
* Reuse the existing cluster with Kubernetes v1.15.20 or newer: Run "minikube start  --kubernetes-version=1.15.20"

This means the user will not be able to use the current profile VM unless it is deleted. Is this the correct behaviour?

@nanikjava
Contributor Author

/assign @nanikjava

@nanikjava
Contributor Author

cc: @tstromberg

@medyagh
Member

medyagh commented Nov 4, 2019

@nanikjava I noticed your two commands have one difference:
one of them is v1.15.20 and the other one is v1.15.2.

I wonder if that could be the source of the problem, since v1.15.20 doesn't exist and v1.15.2 exists?

@medyagh medyagh added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Nov 4, 2019
@nanikjava
Contributor Author

@nanikjava I noticed your two commands have one difference:
one of them is v1.15.20 and the other one is v1.15.2.

I wonder if that could be the source of the problem, since v1.15.20 doesn't exist and v1.15.2 exists?

Yes, it is done intentionally to test the behaviour. What I thought should happen is that, since v1.15.20 is not available and cannot be installed, it should not stop the user from later using the correct version, in this case v1.15.2.

What is happening is that minikube detects that the previously requested version was v1.15.20, but it does not check whether that installation was actually successful. I think minikube should be able to detect this and allow the user to install the correct version.

In my opinion this will need to be fixed, but I want to understand first whether the current behaviour is intended.
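
A rough sketch of the behaviour I have in mind (Config, downloadBinaries and saveProfile below are hypothetical stand-ins, not the actual minikube code): only record the requested Kubernetes version in the profile config after its binaries have downloaded successfully, so a typo such as v1.15.20 never becomes the "existing cluster" version that later blocks the correct one.

package main

import (
	"errors"
	"fmt"
)

// Config stands in for the profile config; only the field we care about here.
type Config struct {
	KubernetesVersion string
}

// downloadBinaries simulates the kubeadm/kubelet download; a non-existent
// release fails in the same way the 404 does in the log above.
func downloadBinaries(version string) error {
	if version == "v1.15.20" {
		return errors.New("received invalid status code: 404 (expected 200)")
	}
	return nil
}

// saveProfile simulates persisting the profile config to disk.
func saveProfile(cfg *Config) error { return nil }

// startWithVersion records the requested version only after a successful
// download, so a failed start never changes the stored cluster version.
func startWithVersion(cfg *Config, requested string) error {
	if err := downloadBinaries(requested); err != nil {
		return fmt.Errorf("kubernetes %s is not available: %v", requested, err)
	}
	cfg.KubernetesVersion = requested
	return saveProfile(cfg)
}

func main() {
	cfg := &Config{}
	fmt.Println(startWithVersion(cfg, "v1.15.20")) // fails; stored version stays empty
	err := startWithVersion(cfg, "v1.15.2")
	fmt.Println(err, cfg.KubernetesVersion) // <nil> v1.15.2
}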

@medyagh
Member

medyagh commented Nov 4, 2019

Ah sorry, I missed that part you wrote, @nanikjava.

Because of some issues we have had, we cannot downgrade a cluster to a lower Kubernetes version, but in this case the higher version was not a real version, it was a typo!

And you are right: if the Kubernetes version the user entered is invalid, we should not store it as the version that is on the VM!

It is worth noting that we still do not plan on supporting downgrades.

Thank you, you found a bug! It would be wonderful to fix it!
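
For example (a sketch only, not the actual minikube code; the URL below follows the public kubernetes-release bucket layout and may differ from the URL minikube builds internally), the requested version could be verified with an HTTP HEAD request before it is ever written into the profile:

package main

import (
	"fmt"
	"net/http"
)

// checkVersionExists is a hypothetical pre-flight check: before storing the
// requested Kubernetes version in the profile, make sure the release can
// actually be downloaded.
func checkVersionExists(version string) error {
	url := fmt.Sprintf(
		"https://storage.googleapis.com/kubernetes-release/release/%s/bin/linux/amd64/kubeadm",
		version)
	resp, err := http.Head(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("kubernetes %s not found: received status %d (expected 200)",
			version, resp.StatusCode)
	}
	return nil
}

func main() {
	for _, v := range []string{"v1.15.20", "v1.15.2"} {
		fmt.Println(v, "->", checkVersionExists(v))
	}
}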

@medyagh medyagh added kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Nov 4, 2019
@nanikjava
Contributor Author

Thank you, you found a bug! It would be wonderful to fix it!

Yes... I found a bug 👍

I will assign this to myself.

@nanikjava
Contributor Author

/assign @nanikjava

@nanikjava
Contributor Author

On further testing, the issue can be resolved by stopping minikube from proceeding when there is an error in the download process:

I1113 07:40:08.305579    3062 main.go:110] libmachine: (minikube) KVM machine creation complete!
I1113 07:40:08.477322    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/gcr.io/k8s-minikube/storage-provisioner_v1.8.1
I1113 07:40:08.478385    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/kubernetes-dashboard-amd64_v1.10.1
I1113 07:40:08.479838    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/etcd_3.3.10
I1113 07:40:08.481192    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-addon-manager_v9.0
I1113 07:40:08.482818    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/k8s-dns-sidecar-amd64_1.14.13
I1113 07:40:08.483291    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64_1.14.13
I1113 07:40:08.483394    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/coredns_1.3.1
I1113 07:40:08.483424    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/pause_3.1
I1113 07:40:08.486010    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-scheduler_v1.15.20
I1113 07:40:08.488643    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/k8s-dns-kube-dns-amd64_1.14.13
I1113 07:40:08.488996    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-controller-manager_v1.15.20
I1113 07:40:08.489129    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.15.20
I1113 07:40:08.489302    3062 cache_images.go:329] OPENING:  /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-proxy_v1.15.20
I1113 07:40:08.993029    3062 cache_images.go:302] CacheImage: k8s.gcr.io/kube-proxy:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-proxy_v1.15.20 completed in 1.070584092s
E1113 07:40:08.993097    3062 cache_images.go:80] CacheImage k8s.gcr.io/kube-proxy:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-proxy_v1.15.20 failed: MANIFEST_UNKNOWN: "Failed to fetch \"v1.15.20\" from request \"/v2/kube-proxy/manifests/v1.15.20\"."
I1113 07:40:08.995040    3062 cache_images.go:302] CacheImage: k8s.gcr.io/kube-controller-manager:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-controller-manager_v1.15.20 completed in 1.072603012s
E1113 07:40:08.995121    3062 cache_images.go:80] CacheImage k8s.gcr.io/kube-controller-manager:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-controller-manager_v1.15.20 failed: MANIFEST_UNKNOWN: "Failed to fetch \"v1.15.20\" from request \"/v2/kube-controller-manager/manifests/v1.15.20\"."
I1113 07:40:08.995544    3062 cache_images.go:302] CacheImage: k8s.gcr.io/kube-scheduler:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-scheduler_v1.15.20 completed in 1.07308876s
E1113 07:40:08.995622    3062 cache_images.go:80] CacheImage k8s.gcr.io/kube-scheduler:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-scheduler_v1.15.20 failed: MANIFEST_UNKNOWN: "Failed to fetch \"v1.15.20\" from request \"/v2/kube-scheduler/manifests/v1.15.20\"."
I1113 07:40:08.996216    3062 cache_images.go:302] CacheImage: k8s.gcr.io/kube-apiserver:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.15.20 completed in 1.073756535s
E1113 07:40:08.996272    3062 cache_images.go:80] CacheImage k8s.gcr.io/kube-apiserver:v1.15.20 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.15.20 failed: MANIFEST_UNKNOWN: "Failed to fetch \"v1.15.20\" from request \"/v2/kube-apiserver/manifests/v1.15.20\"."
I1113 07:40:09.744798    3062 cache_images.go:350] /home/nanik/.minikube/cache/images/k8s.gcr.io/k8s-dns-sidecar-amd64_1.14.13 exists
I1113 07:40:09.745174    3062 cache_images.go:302] CacheImage: k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.13 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/k8s-dns-sidecar-amd64_1.14.13 completed in 1.82243236s
I1113 07:40:09.745227    3062 cache_images.go:83] CacheImage k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.13 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/k8s-dns-sidecar-amd64_1.14.13 succeeded
I1113 07:40:09.763462    3062 cache_images.go:350] /home/nanik/.minikube/cache/images/k8s.gcr.io/pause_3.1 exists
I1113 07:40:09.763494    3062 cache_images.go:302] CacheImage: k8s.gcr.io/pause:3.1 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/pause_3.1 completed in 1.841023643s
I1113 07:40:09.763524    3062 cache_images.go:83] CacheImage k8s.gcr.io/pause:3.1 -> /home/nanik/.minikube/cache/images/k8s.gcr.io/pause_3.1 succeeded

@nanikjava
Contributor Author

nanikjava commented Nov 12, 2019

To make things faster, minikube runs the image download process in a separate goroutine, which allows the VM initialization process to continue in parallel.

There is a checkpoint that checks the success/failure state of the image downloads inside waitCacheImages(..); at the moment, failures are only reported as a debug log. This checkpoint could potentially be used to stop minikube and inform the user that downloading the images has failed.

@nanikjava
Contributor Author

nanikjava commented Nov 12, 2019

func CacheImages(images []string, cacheDir string) error {
		....
			if err := CacheImage(image, dst); err != nil {
				//glog.Errorf("CacheImage %s -> %s failed: %v", image, dst, err)
				exit.WithError("CacheImage " + image + " -> " + dst, err)
				//return errors.Wrapf(err, "caching image %s", dst)
			}
			glog.Infof("CacheImage %s -> %s succeeded", image, dst)
			return nil
		})
		...
}

Doing the above creates a problem: it terminates the VM abruptly, because it runs on the main thread.


func waitCacheImages(g *errgroup.Group) {
	if !viper.GetBool(cacheImages) {
		return
	}
	if err := g.Wait(); err != nil {
		exit.WithError("Error caching images: ", err)
	}
}

Exiting the app during waitCacheImages(..) will only report the last error recorded inside the CacheImages(..) goroutine.

To make this useful to the user, the error should print out all of the cached images that failed to download.

Something like this:

The following files could not be downloaded:

kube-apiserver-15.xx
kube-proxy-15.xx

@nanikjava nanikjava changed the title Behaviour upgrading cluster using wrong kubernetes-version Behaviour upgrading cluster using --kubernetes-version with version that cannot be found Nov 13, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 11, 2020
@medyagh
Member

medyagh commented Mar 4, 2020

@nanikjava are you still interested in this issue?

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 3, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
