
Unable to access openshift master via URL #16946

Closed
Carol007robot opened this issue Oct 19, 2017 · 9 comments
Assignees
Labels
kind/question lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/P2 sig/master

Comments

@Carol007robot

Unable to access the OpenShift master after rebooting the cluster. Over the course of one day I restarted one machine at a time, covering all the masters and nodes.

Version

oc v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ooc-lb.com.cn:8443
kubernetes v1.6.1+5115d708d7

#### My environment
3 masters
1 load balancer
3 nodes

### In the kube-service-catalog project only one pod is running; a controller-manager pod should also be running in this project:

# oc get pods
NAME              READY     STATUS    RESTARTS   AGE
apiserver-6qbdp   1/1       Running   20         37d
[root@ooc-master01 ansible]# oc project 
Steps To Reproduce
  1. shutdown system
  2. bring back system
Current Result

When I access my master via its URL I get the error below. I'll provide the log in the attached file.
master.log

Error

Unable to load details about the server. If the problem continues, please contact your system administrator.
Check Server Connection

Return to the console.

Expected Result

Any suggestion on how to fix this would be appreciated.

Additional Information

$ oc adm diagnostics

ERROR: [DNet3001 from diagnostic NetworkCheck@openshift/origin/pkg/diagnostics/network/setup.go:64]
       Failed to create network diags test pod and service: [Creating network diagnostic test pod 'network-diag-ns-4nz96/network-diag-test-pod-nrtx6' on node "ooc-master01.com.cn" failed: pods "network-diag-test-pod-nrtx6" is forbidden: service account network-diag-ns-4nz96/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-4nz96/network-diag-test-pod-mtvd3' on node "ooc-master01.com.cn" failed: pods "network-diag-test-pod-mtvd3" is forbidden: service account network-diag-ns-4nz96/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-61j80/network-diag-test-pod-dww53' on node "ooc-master01.com.cn" failed: pods "network-diag-test-pod-dww53" is forbidden: service account network-diag-ns-61j80/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-61j80/network-diag-test-pod-xfnvz' on node "ooc-master01.com.cn" failed: pods "network-diag-test-pod-xfnvz" is forbidden: service account network-diag-ns-61j80/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-4nz96/network-diag-test-pod-kgfrd' on node "ooc-master03.com.cn" failed: pods "network-diag-test-pod-kgfrd" is forbidden: service account network-diag-ns-4nz96/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-4nz96/network-diag-test-pod-p3509' on node "ooc-master03.com.cn" failed: pods "network-diag-test-pod-p3509" is forbidden: service account network-diag-ns-4nz96/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-61j80/network-diag-test-pod-wv46g' on node "ooc-master03.com.cn" failed: pods "network-diag-test-pod-wv46g" is forbidden: service account network-diag-ns-61j80/default was not found, retry after the 
service account is created, Creating network diagnostic test pod 'network-diag-ns-61j80/network-diag-test-pod-n8htz' on node "ooc-master03.com.cn" failed: pods "network-diag-test-pod-n8htz" is forbidden: service account network-diag-ns-61j80/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-4nz96/network-diag-test-pod-30mwd' on node "ooc-node01.com.cn" failed: pods "network-diag-test-pod-30mwd" is forbidden: service account network-diag-ns-4nz96/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-4nz96/network-diag-test-pod-g7p4q' on node "ooc-node01.com.cn" failed: pods "network-diag-test-pod-g7p4q" is forbidden: service account network-diag-ns-4nz96/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-61j80/network-diag-test-pod-fksbd' on node "ooc-node01.com.cn" failed: pods "network-diag-test-pod-fksbd" is forbidden: service account network-diag-ns-61j80/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-61j80/network-diag-test-pod-xvghg' on node "ooc-node01.com.cn" failed: pods "network-diag-test-pod-xvghg" is forbidden: service account network-diag-ns-61j80/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-4nz96/network-diag-test-pod-h7d28' on node "ooc-node02.com.cn" failed: pods "network-diag-test-pod-h7d28" is forbidden: service account network-diag-ns-4nz96/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-4nz96/network-diag-test-pod-g0pc4' on node "ooc-node02.com.cn" failed: pods "network-diag-test-pod-g0pc4" is forbidden: service account network-diag-ns-4nz96/default was not found, retry after the service account is created, Creating network 
diagnostic test pod 'network-diag-ns-61j80/network-diag-test-pod-v0c59' on node "ooc-node02.com.cn" failed: pods "network-diag-test-pod-v0c59" is forbidden: service account network-diag-ns-61j80/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-61j80/network-diag-test-pod-92jvb' on node "ooc-node02.com.cn" failed: pods "network-diag-test-pod-92jvb" is forbidden: service account network-diag-ns-61j80/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-4nz96/network-diag-test-pod-w9tgc' on node "ooc-node03.com.cn" failed: pods "network-diag-test-pod-w9tgc" is forbidden: service account network-diag-ns-4nz96/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-4nz96/network-diag-test-pod-t6snx' on node "ooc-node03.com.cn" failed: pods "network-diag-test-pod-t6snx" is forbidden: service account network-diag-ns-4nz96/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-61j80/network-diag-test-pod-ccpwv' on node "ooc-node03.com.cn" failed: pods "network-diag-test-pod-ccpwv" is forbidden: service account network-diag-ns-61j80/default was not found, retry after the service account is created, Creating network diagnostic test pod 'network-diag-ns-61j80/network-diag-test-pod-dz37v' on node "ooc-node03.com.cn" failed: pods "network-diag-test-pod-dz37v" is forbidden: service account network-diag-ns-61j80/default was not found, retry after the service account is created]
              
              ERROR: [DNet2006 from diagnostic NetworkCheck@openshift/origin/pkg/diagnostics/network/run_pod.go:136]
                     Creating network diagnostic pod "network-diag-pod-d62h5" on node "ooc-master01.com.cn" with command "openshift infra network-diagnostic-pod -l 1" failed: pods "network-diag-pod-d62h5" is forbidden: service account network-diag-ns-4nz96/default was not found, retry after the service account is created


I've been debugging for a couple of days, but I really don't know how to fix it, and I can't believe this broke just because I rebooted the cluster...

@Carol007robot
Author

I noticed the controller-manager pod keeps restarting. Here is the log of the controller-manager pod:
I1023 09:10:42.757858 1 flags.go:52] FLAG: --address="0.0.0.0"
I1023 09:10:42.758310 1 flags.go:52] FLAG: --alsologtostderr="false"
I1023 09:10:42.758316 1 flags.go:52] FLAG: --api-content-type="application/json"
I1023 09:10:42.758321 1 flags.go:52] FLAG: --broker-relist-interval="24h0m0s"
I1023 09:10:42.758325 1 flags.go:52] FLAG: --contention-profiling="false"
I1023 09:10:42.758329 1 flags.go:52] FLAG: --enable-osb-api-context-profile="true"
I1023 09:10:42.758332 1 flags.go:52] FLAG: --feature-gates=""
I1023 09:10:42.758337 1 flags.go:52] FLAG: --k8s-api-server-url=""
I1023 09:10:42.758339 1 flags.go:52] FLAG: --k8s-kubeconfig=""
I1023 09:10:42.758345 1 flags.go:52] FLAG: --leader-elect="true"
I1023 09:10:42.758348 1 flags.go:52] FLAG: --leader-elect-lease-duration="15s"
I1023 09:10:42.758351 1 flags.go:52] FLAG: --leader-elect-renew-deadline="10s"
I1023 09:10:42.758353 1 flags.go:52] FLAG: --leader-elect-resource-lock="endpoints"
I1023 09:10:42.758356 1 flags.go:52] FLAG: --leader-elect-retry-period="2s"
I1023 09:10:42.758359 1 flags.go:52] FLAG: --leader-election-namespace="kube-service-catalog"
I1023 09:10:42.758361 1 flags.go:52] FLAG: --log-backtrace-at=":0"
I1023 09:10:42.758367 1 flags.go:52] FLAG: --log-dir=""
I1023 09:10:42.758370 1 flags.go:52] FLAG: --log-flush-frequency="5s"
I1023 09:10:42.758373 1 flags.go:52] FLAG: --logtostderr="true"
I1023 09:10:42.758375 1 flags.go:52] FLAG: --osb-api-preferred-version="2.13"
I1023 09:10:42.758378 1 flags.go:52] FLAG: --port="10000"
I1023 09:10:42.758383 1 flags.go:52] FLAG: --profiling="true"
I1023 09:10:42.758386 1 flags.go:52] FLAG: --reconciliation-retry-duration="168h0m0s"
I1023 09:10:42.758388 1 flags.go:52] FLAG: --resync-interval="5m0s"
I1023 09:10:42.758391 1 flags.go:52] FLAG: --service-catalog-api-server-url=""
I1023 09:10:42.758393 1 flags.go:52] FLAG: --service-catalog-insecure-skip-verify="false"
I1023 09:10:42.758397 1 flags.go:52] FLAG: --service-catalog-kubeconfig=""
I1023 09:10:42.758399 1 flags.go:52] FLAG: --stderrthreshold="2"
I1023 09:10:42.758405 1 flags.go:52] FLAG: --v="5"
I1023 09:10:42.758408 1 flags.go:52] FLAG: --version="false"
I1023 09:10:42.758410 1 flags.go:52] FLAG: --vmodule=""
I1023 09:10:42.758429 1 controller_manager.go:96] Building k8s kubeconfig
I1023 09:10:42.759530 1 controller_manager.go:124] Building service-catalog kubeconfig for url:
I1023 09:10:42.759540 1 controller_manager.go:131] Using inClusterConfig to talk to service catalog API server -- make sure your API server is registered with the aggregator
I1023 09:10:42.759621 1 controller_manager.go:144] Starting http server and mux
I1023 09:10:42.759633 1 controller_manager.go:173] Creating event broadcaster
I1023 09:10:42.759708 1 controller_manager.go:213] Using namespace kube-service-catalog for leader election lock
I1023 09:10:42.759721 1 leaderelection.go:174] attempting to acquire leader lease...
I1023 09:10:42.759801 1 healthz.go:74] Installing healthz checkers:"ping", "checkAPIAvailableResources"
E1023 09:10:42.812037 1 event.go:260] Could not construct reference to: '&v1.Endpoints{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"service-catalog-controller-manager", GenerateName:"", Namespace:"kube-service-catalog", SelfLink:"/api/v1/namespaces/kube-service-catalog/endpoints/service-catalog-controller-manager", UID:"0e3ccb61-b5e2-11e7-8d47-001a4a16019e", ResourceVersion:"4691457", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{sec:63644133616, nsec:0, loc:(*time.Location)(0x1a265c0)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"control-plane.alpha.kubernetes.io/leader":"{\"holderIdentity\":\"controller-manager-gs044-external-service-catalog-controller\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2017-10-20T22:00:19Z\",\"renewTime\":\"2017-10-23T09:10:42Z\",\"leaderTransitions\":0}"}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Subsets:[]v1.EndpointSubset(nil)}' due to: 'no kind is registered for the type v1.Endpoints'. Will not report event: 'Normal' 'LeaderElection' 'controller-manager-gs044-external-service-catalog-controller became leader'
I1023 09:10:42.812154 1 leaderelection.go:184] successfully acquired lease kube-service-catalog/service-catalog-controller-manager
I1023 09:10:42.812206 1 controller_manager.go:297] Getting available resources
I1023 09:10:42.812830 1 controller_manager.go:259] Created client for API discovery
F1023 09:10:43.140136 1 controller_manager.go:198] error running controllers: failed to get supported resources from server: unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1alpha1: the server could not find the requested resource
May I get some help soon? I've been stuck here for almost two weeks and still can't find a solution, and I'm unable to use the cluster until this is fixed.

Thank you!

@seans3

seans3 commented Oct 24, 2017

I ran into this same problem today. I believe this is caused by service catalog (servicecatalog.k8s.io) API recently changing from "v1alpha1" to "v1beta1".

@Carol007robot
Author

Carol007robot commented Oct 25, 2017 via email

@lalib

lalib commented Oct 26, 2017

Hey, I am having the same problem. Any updates?

@kibbles-n-bytes

@Carolhug @lalib You should search your Service Catalog installation YAML files for servicecatalog.k8s.io/v1alpha1. If that is present, your YAMLs are stale compared to your images and you should update to the latest versions in the Service Catalog repo.
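The search-and-update step suggested above can be sketched with `grep` and `sed`. This is a minimal illustration only: the directory, file name, and `ServiceInstance` kind below are hypothetical, and the `sed` rewrite assumes the only change needed is the API group version string (the upstream manifests may have other schema changes, so compare against the Service Catalog repo rather than blindly rewriting).

```shell
# Illustrative only: create a sample manifest containing the stale API version.
# In practice, point the grep/sed at your real Service Catalog install YAMLs.
mkdir -p /tmp/catalog-manifests
cat > /tmp/catalog-manifests/instance.yaml <<'EOF'
apiVersion: servicecatalog.k8s.io/v1alpha1
kind: ServiceInstance
metadata:
  name: example-instance
EOF

# List files still referencing the removed v1alpha1 group/version...
grep -rl 'servicecatalog.k8s.io/v1alpha1' /tmp/catalog-manifests

# ...and bump them to v1beta1 to match the newer images.
sed -i 's|servicecatalog.k8s.io/v1alpha1|servicecatalog.k8s.io/v1beta1|g' \
  /tmp/catalog-manifests/instance.yaml

grep 'apiVersion' /tmp/catalog-manifests/instance.yaml
```

Note that GNU `sed -i` edits in place; on BSD/macOS `sed` the flag requires a backup suffix argument (`sed -i ''`).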

@Carol007robot
Author

Carol007robot commented Oct 27, 2017 via email

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 24, 2018
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 26, 2018
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close
