[BUG] cannot access Kubernetes dashboard after upgrading to 1.17.7 #1394

Closed
przemyslavic opened this issue Jun 29, 2020 · 6 comments · Fixed by #1519
Comments

@przemyslavic
Collaborator

Describe the bug
Intermittently, on some clusters, the dashboard cannot be reached through kubectl proxy.
The issue was first noticed after upgrading Kubernetes to v1.17.7 in an AWS RHEL environment.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy AWS RHEL cluster
  2. Run kubectl proxy on master node
  3. Try to run curl -I http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/

Expected behavior
The dashboard UI is available.
HTTP status code is 200

HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: no-cache, private
Cache-Control: no-store
Content-Type: text/html; charset=utf-8
Date: Mon, 29 Jun 2020 13:18:29 GMT
Last-Modified: Fri, 06 Dec 2019 15:14:02 GMT
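
The reproduction steps and the expected status can be turned into a small check. The following is a minimal sketch, assuming `kubectl proxy` is already listening on localhost:8001. The function names `http_status_from_headers` and `check_dashboard` are hypothetical helpers, not part of epicli, and the 10-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the dashboard through kubectl proxy.

DASHBOARD_URL="http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/"

# Read HTTP response headers on stdin and print the status code of the
# first status line (e.g. "HTTP/1.1 503 Service Unavailable" -> 503).
http_status_from_headers() {
  awk '/^HTTP\// { sub(/\r$/, ""); print $2; exit }'
}

# Send a HEAD request (the same as `curl -I`) and succeed only on HTTP 200.
# --max-time 10 is an assumed timeout for illustration.
check_dashboard() {
  local code
  code="$(curl -sS -I --max-time 10 "$DASHBOARD_URL" | http_status_from_headers)"
  echo "dashboard returned HTTP ${code:-<no response>}"
  [ "$code" = "200" ]
}
```

With the proxy running, `check_dashboard && echo OK` mirrors step 3. A 503 response makes it return non-zero.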

OS (please complete the following information):

  • OS: [RHEL 7.8]

Cloud Environment (please complete the following information):

  • Cloud Provider [AWS]

Additional context
Curl command output:

HTTP/1.1 503 Service Unavailable
Cache-Control: no-cache, private
Content-Length: 71
Content-Type: text/plain; charset=utf-8
Date: Mon, 29 Jun 2020 13:47:29 GMT
X-Content-Type-Options: nosniff

curl "http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/"
Error trying to reach service: 'dial tcp 10.244.3.15:8443: i/o timeout'

[ec2-user@ec2 ~]$ kubectl logs -n=kubernetes-dashboard kubernetes-dashboard-5d996f7d46-6tthp
2020/06/29 10:00:20 Starting overwatch
2020/06/29 10:00:20 Using namespace: kubernetes-dashboard
2020/06/29 10:00:20 Using in-cluster config to connect to apiserver
2020/06/29 10:00:20 Using secret token for csrf signing
2020/06/29 10:00:20 Initializing csrf token from kubernetes-dashboard-csrf secret
2020/06/29 10:00:20 Successful initial request to the apiserver, version: v1.17.7
2020/06/29 10:00:20 Generating JWE encryption key
2020/06/29 10:00:20 New synchronizer has been registered: kubernetes-dashboard-key-holder-kubernetes-dashboard. Starting
2020/06/29 10:00:20 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kubernetes-dashboard
2020/06/29 10:00:20 Initializing JWE encryption key from synchronized object
2020/06/29 10:00:20 Creating in-cluster Sidecar client
2020/06/29 10:00:20 Auto-generating certificates
2020/06/29 10:00:20 Successfully created certificates
2020/06/29 10:00:20 Serving securely on HTTPS port: 8443
2020/06/29 10:00:50 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
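
The pod log shows the dashboard serving on port 8443, while the proxied request fails with a dial timeout. That suggests the problem may lie on the network path from the node to the pod rather than in the dashboard process itself. As a hedged sketch, the target address can be extracted from the error text and probed directly from the node. The helper names are hypothetical, and `probe_tcp` assumes bash's `/dev/tcp` redirection and the coreutils `timeout` command are available.

```shell
#!/usr/bin/env bash
# Print the IP:port from an apiserver proxy error such as
#   Error trying to reach service: 'dial tcp 10.244.3.15:8443: i/o timeout'
extract_dial_target() {
  sed -n 's/.*dial tcp \([0-9.]*:[0-9]*\).*/\1/p'
}

# Try to open a TCP connection to host:port within 5 seconds.
# Uses bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
probe_tcp() {
  local host="${1%:*}" port="${1##*:}"
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable: $1"
  else
    echo "unreachable: $1"
    return 1
  fi
}
```

For example, piping the curl error output through `extract_dial_target` and passing the result to `probe_tcp` on each node shows which nodes cannot reach the pod IP.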
@rafzei
Contributor

rafzei commented Jul 2, 2020

It looks like this is related to epicli versions earlier than v0.6.0. I cannot reproduce the issue when upgrading from v0.6.0 to v0.7.0.
ENV:

  • RHEL 7.8
  • AWS

@toszo
Contributor

toszo commented Jul 3, 2020

This looks like a non-deterministic error. It usually appears on clusters built by the pipeline.

@rafzei rafzei self-assigned this Jul 3, 2020
@rafzei
Contributor

rafzei commented Jul 9, 2020

Additional log from kubernetes-metrics-scraper:

I0707 14:37:30.561304    4967 round_trippers.go:420] GET https://10.1.2.197:6443/api/v1/namespaces/kubernetes-dashboard/pods/kubernetes-metrics-scraper-756dd959c8-xz2vt/log
I0707 14:37:30.561316    4967 round_trippers.go:427] Request Headers:
I0707 14:37:30.561321    4967 round_trippers.go:431]     User-Agent: kubectl/v1.17.7 (linux/amd64) kubernetes/b445510
I0707 14:37:30.561325    4967 round_trippers.go:431]     Accept: application/json, */*
I0707 14:37:30.564051    4967 round_trippers.go:446] Response Status: 200 OK in 2 milliseconds
{"level":"info","msg":"Kubernetes host: https://10.96.0.1:443","time":"2020-07-07T14:35:56Z"}
10.244.1.1 - - [07/Jul/2020:14:36:28 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.17"
10.244.1.1 - - [07/Jul/2020:14:36:38 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.17"
10.244.1.1 - - [07/Jul/2020:14:36:48 +0000] "GET / HTTP/1.1" 200 6 "" "kube-probe/1.17"
{"level":"error","msg":"Error scraping node metrics: the server could not find the requested resource (get nodes.metrics.k8s.io)","time":"2020-07-07T14:36:56Z"}

I observed that the issue went away after a VM restart.

@mkyc mkyc modified the milestones: 0.7.1, S20200729 Jul 17, 2020
@rafzei
Contributor

rafzei commented Jul 29, 2020

After upgrading K8s to 1.18.6, the issue is still present with Flannel and Canal.

@rafzei
Contributor

rafzei commented Jul 31, 2020

I've removed the dependency because of new findings on this topic. The Dashboard works well from within the Pod. The problem appears to be related to NetworkManager managing the flannel.1 interface. The issue now seems to be reproducible. I'm going to test a fix for that.
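
One way to check whether NetworkManager is managing the flannel interface is to look at the device states reported by `nmcli`. The sketch below is an assumption about how such a check could look, not the fix that was merged for this issue. `managed_cni_devices` is a hypothetical helper, and the `flannel|cni|cali` name pattern is only illustrative.

```shell
#!/usr/bin/env bash
# Read terse `nmcli -t -f DEVICE,STATE device` output on stdin
# (lines like "flannel.1:unmanaged") and print CNI-looking devices
# whose state is anything other than "unmanaged".
managed_cni_devices() {
  awk -F: '$1 ~ /^(flannel|cni|cali)/ && $2 != "unmanaged" { print $1 }'
}

# Example (run on a cluster node):
#   nmcli -t -f DEVICE,STATE device | managed_cni_devices
```

If a device is listed, a commonly used approach is to mark it as unmanaged in a NetworkManager configuration file, for example a `[keyfile]` section containing `unmanaged-devices=interface-name:flannel*`. This thread does not state what the merged fix actually changes.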

@przemyslavic
Collaborator Author

The fix seems to have resolved the issue as I am no longer able to reproduce it.

@mkyc mkyc closed this as completed Aug 6, 2020
rafzei added a commit that referenced this issue Aug 13, 2020
* Initialized test status table

* Added next sections of test status

Refactored status table a bit, added next lines, added next section with descriptions.

* Upgrade cluster section filled

* All sections filled

* Add missing tests

* Move CNS proposition design doc to GH.

* fixed formatting

* Etcd encryption feature refactor for deployment and upgrades (#1427)

* kubernetes_master: etcd encryption simplification and refactor

* upgrade: refactor of upgrade-kubeadm-config.yml (proper yaml parsing)

* upgrade: adding etcd encryption patching procedure

* upgrade-master.yml: small coding style improvement (highlight fix)

* upgrade: enabling patching of the kubeadm config

* fact naming improvements

Co-authored-by: to-bar <[email protected]>

* patch-kubeadm-config.yml: skipping unnecessary kubectl apply

Co-authored-by: to-bar <[email protected]>

* Bumping AzureCLI to fix SP secrets with special characters.

* Added Changelog entry.

* Change move to copy build dir during an upgrade (#1429)

* Change move to copy build dir during an upgrade
* Got rid of unused backup_temp_dir

* Update to logging

- log piping for stderr.
- custom colors for different log levels
- mapping some cases of log warnings and errors from Terraform and Ansible

* helm documentation #896

* Progress:

- simplified piping

* Fix K8s upgrade: 'kubeadm upgrade apply' hangs (#1431)

* Clean up and optimize K8s upgrades

* Patch only kubeadm-config ConfigMap

* Downgrade CoreDNS to K8s built-in version before 'kubeadm upgrade apply'

* Deploy customized CoreDNS after K8s is upgraded to the latest version

* Update changelog

* Wait for API resources to propagate

* Rename vendor in VSCode recommendations (#1438)

Vendor moved owner of mauve.terraform repository to HashiCorp (https://marketplace.visualstudio.com/items?itemName=HashiCorp.terraform)

* Fix issue with Vault and Kubernetes Calico/Canal communication (#1434)

* Add vault namespace and fixes related to connection issue

* Add default policy for default namespace

* Remove service endpoint, execute certificate part if enabled, setting protocol correctly in Vault Helm chart

* Add possibility to configure manually Vault endpoint

* Added changelog.

* add howto links for helm doc

* Update Changelog for #1438 (#1460)

* Update Changelog

* Update Changelog - add PR number

* bump rabbitmq version from 3.7.10 to 3.8.3 #1395

* Changes in documentation after creating fix for calico and canal (#1459)

* Changes after creating fix for calico and canal

* Update changelog

* Got rid of pipe and grep (#1472)

* Assert that current version is upgradeable #1474 (#1476)

* Assert that upgrade from current version is supported #1474

* Update core/src/epicli/data/common/ansible/playbooks/roles/upgrade/tasks/kubernetes.yml

Co-authored-by: to-bar <[email protected]>

* Add docker_version variable support (#1477)

* add docker_version variable support
* Docker installation - 2 tasks merged into 1 to speed up the deployment
* Remove two useless packages from docker installation

Co-authored-by: Grzegorz Dajuk <[email protected]>

* Kubernetes HA upgrades (#1456)

* epicli/upgrade: reusing existing shared-config + cleanups

* upgrade: k8s HA upgrades minimal implementation

* upgrade: kubernetes cleanup and refactor

* Apply suggestions from code review

Co-authored-by: to-bar <[email protected]>

* upgrade: removing unneeded kubeconfig from k8s nodes (security fix)

* upgrade: statefulset patching refactor

* upgrade: cleanups and refactor for logs

* Make deployment manifest tasks more generic

* Improve detecting CNI plugin

* AnsibleVarsGenerator.py: fixing regression issue introducted during upgrade refactor

* Apply suggestions from code review

Co-authored-by: to-bar <[email protected]>

* upgrade: statefulset patching refactor

- patching all containers (fix)
- patching init containers also (fix)
- removing include_tasks statements (speedup)

* Ensure settings for backward compatibility

* Revert "Ensure settings for backward compatibility"

This reverts commit 5c9cdb6.

* AnsibleInventoryUpgrade.py: merging shared-config with defaults

* Adding changelog entry

* Revert "AnsibleVarsGenerator.py: fixing regression issue introducted during upgrade refactor"

This reverts commit c38eb9d.

* Revert "epicli/upgrade: reusing existing shared-config + cleanups"

This reverts commit e5957c5.

* AnsibleVarsGenerator.py: adding nicer way to handle shared config

Co-authored-by: to-bar <[email protected]>

* Fix upgrade of flannel to v0.12.0 (#1484)

* Readme and changelog update (#1493)

Readme and changelog update

* Fixing broken offline CentOS 7.8 installation (#1498)

* repository: adding the missing centos-logos package

* updating 0.7.1 changelog

* repository/centos-7: restoring alphabetical order

* Add modularization-approaches.md design document

* Kibana config always points its elasticsearch.hosts to a "logging" VM (#1347) (#1483)

* Bump elliptic from 6.5.0 to 6.5.3 in /examples/keycloak/implicit/react

Bumps [elliptic](https://github.com/indutny/elliptic) from 6.5.0 to 6.5.3.
- [Release notes](https://github.com/indutny/elliptic/releases)
- [Commits](indutny/elliptic@v6.5.0...v6.5.3)

Signed-off-by: dependabot[bot] <[email protected]>

* Bump elliptic in /examples/keycloak/authorization/react

Bumps [elliptic](https://github.com/indutny/elliptic) from 6.5.0 to 6.5.3.
- [Release notes](https://github.com/indutny/elliptic/releases)
- [Commits](indutny/elliptic@v6.5.0...v6.5.3)

Signed-off-by: dependabot[bot] <[email protected]>

* Always setting hostname on all nodes of the cluster (on-prem fix) (#1509)

* common: always setting hostname on all nodes of the cluster (on-prem fix)

* updating 0.7.1 changelog

* Workarund restart rabbitmq pods during patching #1395

* add missing changelog entry

* Upgrade Kubernetes to v1.18.6 (#1501)

* Upgrade k8s-dashboard to v2.0.3 (#1516)

* fix due to review

* Dashboard unavailability, network fix for Flannel and Canal #1394 (#1519)

* additional defaults for kafka config

* fixes after review, remove redundant code

* Named demo configuration the same as generated one

* Added deletion step description

* Added a note related to versions for upgrades

* Fixed syntax errors

* Added prerequisites section in upgrade doc

* Added key encoding troubleshooting info

* Test fixes for RabbitMQ 3.8.3 (#1533)

* fix missing variable image rabbitmq

* Add Kubernetes Dashboard to COMPONENTS.md (#1546)

* Update CHANGELOG-0.7.md

Minor changes to changelog before release.

* CHANGELOG-0.7.md update v0.7.1 release date (#1552)

* Increment version string to 0.7.1 (#1554)

Co-authored-by: Mateusz Kyc <[email protected]>
Co-authored-by: Mateusz Kyc <[email protected]>
Co-authored-by: Michał Opala <[email protected]>
Co-authored-by: to-bar <[email protected]>
Co-authored-by: Luuk van Venrooij <[email protected]>
Co-authored-by: Tomasz Arendt <[email protected]>
Co-authored-by: Marcin Pyrka <[email protected]>
Co-authored-by: erzetpe <[email protected]>
Co-authored-by: Luuk van Venrooij <[email protected]>
Co-authored-by: ar3ndt <[email protected]>
Co-authored-by: Grzegorz Dajuk <[email protected]>
Co-authored-by: Grzegorz Dajuk <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: TolikT <[email protected]>
Co-authored-by: przemyslavic <[email protected]>