
oc adm diagnostics: unable to parse requirement #18127

Closed · micah opened this issue Jan 16, 2018 · 15 comments

Labels: component/diagnostics, kind/bug, lifecycle/rotten, priority/P2

micah commented Jan 16, 2018

Running oc adm diagnostics produces an error about the diagnostic pod.

Version

openshift v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8

Steps To Reproduce
  1. run oc adm diagnostics
Current Result
ERROR: [DCli2001 from diagnostic DiagnosticPod@openshift/origin/pkg/diagnostics/client/run_diagnostics_pod.go:81]
              Creating diagnostic pod with image openshift/origin-deployer:v3.7.0 failed. Error: (*errors.StatusError) unable to parse requirement: found '', expected: '='
Expected Result

No error.

Additional Information

The router deployment config also seems to exhibit this error:

[root@atomicmaster ~]# oc describe dc/router
Name:		router
Namespace:	default
Created:	5 weeks ago
Labels:		router=router
Annotations:	<none>
Latest Version:	9
Selector:	router=router
Replicas:	1
Triggers:	Config
Strategy:	Rolling
Template:
Pod Template:
  Labels:		router=router
  Service Account:	router
  Containers:
   router:
    Image:	openshift/origin-haproxy-router:v3.7.0
    Ports:	80/TCP, 443/TCP, 1936/TCP
    Requests:
      cpu:	100m
      memory:	256Mi
    Liveness:	http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:	http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      DEFAULT_CERTIFICATE_DIR:			/etc/pki/tls/private
      DEFAULT_CERTIFICATE_PATH:			/etc/pki/tls/private/tls.crt
      ROUTER_CIPHERS:				
      ROUTER_EXTERNAL_HOST_HOSTNAME:		
      ROUTER_EXTERNAL_HOST_HTTPS_VSERVER:	
      ROUTER_EXTERNAL_HOST_HTTP_VSERVER:	
      ROUTER_EXTERNAL_HOST_INSECURE:		false
      ROUTER_EXTERNAL_HOST_INTERNAL_ADDRESS:	
      ROUTER_EXTERNAL_HOST_PARTITION_PATH:	
      ROUTER_EXTERNAL_HOST_PASSWORD:		
      ROUTER_EXTERNAL_HOST_PRIVKEY:		/etc/secret-volume/router.pem
      ROUTER_EXTERNAL_HOST_USERNAME:		
      ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR:	
      ROUTER_LISTEN_ADDR:			0.0.0.0:1936
      ROUTER_METRICS_TYPE:			haproxy
      ROUTER_SERVICE_HTTPS_PORT:		443
      ROUTER_SERVICE_HTTP_PORT:			80
      ROUTER_SERVICE_NAME:			router
      ROUTER_SERVICE_NAMESPACE:			default
      ROUTER_SUBDOMAIN:				
      STATS_PASSWORD:				xxx
      STATS_PORT:				1936
      STATS_USERNAME:				admin
    Mounts:
      /etc/pki/tls/private from server-certificate (ro)
  Volumes:
   server-certificate:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	router-certs
    Optional:	false

Deployment #9 (latest):
	Created:	27 minutes ago
	Status:		New
	Replicas:	0 current / 0 desired
Deployment #8:
	Created:	43 minutes ago
	Status:		Failed
	Replicas:	0 current / 0 desired
Deployment #7:
	Created:	2 weeks ago
	Status:		Failed
	Replicas:	0 current / 0 desired

Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason				Message
  ---------	--------	-----	----				-------------	--------	------				-------
  43m		43m		1	deploymentconfig-controller			Normal		DeploymentCreated		Created new replication controller "router-8" for version 8
  41m		41m		1	deployer-controller				Warning		FailedRetry			Stop retrying: couldn't create deployer pod for "default/router-8": unable to parse requirement: found '', expected: '='
  27m		27m		1	deployer-controller				Normal		RolloutCancelled		Rollout for "default/router-8" cancelled
  27m		27m		5	deploymentconfig-controller			Normal		DeploymentAwaitingCancellation	Deployment of version 9 awaiting cancellation of older running deployments
  27m		27m		1	deploymentconfig-controller			Normal		DeploymentCancelled		Cancelled deployment "router-8" superceded by version 9
  27m		27m		1	deploymentconfig-controller			Normal		DeploymentCreated		Created new replication controller "router-9" for version 9
  43m		27m		26	deployer-controller				Warning		FailedCreate			Error creating deployer pod: unable to parse requirement: found '', expected: '='
php-coder (Contributor) commented:

Looks like it was broken by the following commit: 357071f
CC @sosiouxme

sosiouxme (Member) commented Jan 17, 2018

@php-coder that commit was against master (3.9), so I don't think it's relevant to 3.7. Also since the router deployment is seeing the same message...

My guess would be admission config, default nodeSelector, or something like that is breaking things. @micah can you post the master-config.yaml? Or at least give some information about the server setup.

micah (Author) commented Jan 17, 2018

@sosiouxme - sure, I'm happy to provide any details, or do any debugging that can help!

The server is an Atomic CentOS system at version 7.1712. I deployed it with the openshift-ansible advanced install to get things going. I tried to disable the stats port for the router according to the docs, but it did not work, and that may be the cause of this issue. Because the documentation did not work, I filed an issue about it: openshift/openshift-docs#6969

Here is my master-config.yaml as requested (with IPs, domains, and secrets obfuscated):

admissionConfig:
  pluginConfig:
    BuildDefaults:
      configuration:
        apiVersion: v1
        env: []
        kind: BuildDefaultsConfig
        resources:
          limits: {}
          requests: {}
    BuildOverrides:
      configuration:
        apiVersion: v1
        kind: BuildOverridesConfig
    PodPreset:
      configuration:
        apiVersion: v1
        disable: false
        kind: DefaultAdmissionConfig
    openshift.io/ImagePolicy:
      configuration:
        apiVersion: v1
        executionRules:
        - matchImageAnnotations:
          - key: images.openshift.io/deny-execution
            value: 'true'
          name: execution-denied
          onResources:
          - resource: pods
          - resource: builds
          reject: true
          skipOnResolutionFailure: true
        kind: ImagePolicyConfig
aggregatorConfig:
  proxyClientInfo:
    certFile: aggregator-front-proxy.crt
    keyFile: aggregator-front-proxy.key
apiLevels:
- v1
apiVersion: v1
assetConfig:
  extensionScripts:
  - /etc/origin/master/openshift-ansible-catalog-console.js
  logoutURL: ''
  masterPublicURL: https://example.org:8443
  publicURL: https://example.org:8443/console/
  servingInfo:
    bindAddress: 0.0.0.0:8443
    bindNetwork: tcp4
    certFile: master.server.crt
    clientCA: ''
    keyFile: master.server.key
    maxRequestsInFlight: 0
    requestTimeoutSeconds: 0
authConfig:
  requestHeader:
    clientCA: front-proxy-ca.crt
    clientCommonNames:
    - aggregator-front-proxy
    extraHeaderPrefixes:
    - X-Remote-Extra-
    groupHeaders:
    - X-Remote-Group
    usernameHeaders:
    - X-Remote-User
controllerConfig:
  election:
    lockName: openshift-master-controllers
  serviceServingCert:
    signer:
      certFile: service-signer.crt
      keyFile: service-signer.key
controllers: '*'
corsAllowedOrigins:
- (?i)//127\.0\.0\.1(:|\z)
- (?i)//localhost(:|\z)
- (?i)//198\.252\.153\.254(:|\z)
- (?i)//atomicmaster\.example\.net(:|\z)
- (?i)//kubernetes\.default(:|\z)
- (?i)//kubernetes\.default\.svc\.cluster\.local(:|\z)
- (?i)//kubernetes(:|\z)
- (?i)//example\.org(:|\z)
- (?i)//openshift\.default\.svc(:|\z)
- (?i)//openshift\.default(:|\z)
- (?i)//172\.30\.0\.1(:|\z)
- (?i)//openshift\.default\.svc\.cluster\.local(:|\z)
- (?i)//kubernetes\.default\.svc(:|\z)
- (?i)//openshift(:|\z)
dnsConfig:
  bindAddress: 0.0.0.0:8053
  bindNetwork: tcp4
etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
  - https://atomicmaster.example.net:2379
etcdStorageConfig:
  kubernetesStoragePrefix: kubernetes.io
  kubernetesStorageVersion: v1
  openShiftStoragePrefix: openshift.io
  openShiftStorageVersion: v1
imageConfig:
  format: openshift/origin-${component}:${version}
  latest: false
kind: MasterConfig
kubeletClientInfo:
  ca: ca-bundle.crt
  certFile: master.kubelet-client.crt
  keyFile: master.kubelet-client.key
  port: 10250
kubernetesMasterConfig:
  apiServerArguments:
    runtime-config:
    - apis/settings.k8s.io/v1alpha1=true
    storage-backend:
    - etcd3
    storage-media-type:
    - application/vnd.kubernetes.protobuf
  controllerArguments: null
  masterCount: 1
  masterIP: x.x.x.x
  podEvictionTimeout: null
  proxyClientInfo:
    certFile: master.proxy-client.crt
    keyFile: master.proxy-client.key
  schedulerArguments: null
  schedulerConfigFile: /etc/origin/master/scheduler.json
  servicesNodePortRange: ''
  servicesSubnet: 172.30.0.0/16
  staticNodeNames: []
masterClients:
  externalKubernetesClientConnectionOverrides:
    acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
    burst: 400
    contentType: application/vnd.kubernetes.protobuf
    qps: 200
  externalKubernetesKubeConfig: ''
  openshiftLoopbackClientConnectionOverrides:
    acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
    burst: 600
    contentType: application/vnd.kubernetes.protobuf
    qps: 300
  openshiftLoopbackKubeConfig: openshift-master.kubeconfig
masterPublicURL: https://example.org:8443
networkConfig:
  clusterNetworkCIDR: 10.128.0.0/14
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostSubnetLength: 9
  externalIPNetworkCIDRs:
  - 0.0.0.0/0
  hostSubnetLength: 9
  networkPluginName: redhat/openshift-ovs-subnet
  serviceNetworkCIDR: 172.30.0.0/16
oauthConfig:
  assetPublicURL: https://example.org:8443/console/
  grantConfig:
    method: auto
  identityProviders:
  - challenge: true
    login: true
    mappingMethod: claim
    name: gitlab
    provider:
      apiVersion: v1
      clientID: xxx
      clientSecret: xxx
      kind: GitLabIdentityProvider
      url: https://otherexample.org
  masterCA: ca-bundle.crt
  masterPublicURL: https://example.org:8443
  masterURL: https://atomicmaster.example.net:8443
  sessionConfig:
    sessionMaxAgeSeconds: 3600
    sessionName: ssn
    sessionSecretsFile: /etc/origin/master/session-secrets.yaml
  tokenConfig:
    accessTokenMaxAgeSeconds: 86400
    authorizeTokenMaxAgeSeconds: 500
pauseControllers: false
policyConfig:
  bootstrapPolicyFile: /etc/origin/master/policy.json
  openshiftInfrastructureNamespace: openshift-infra
  openshiftSharedResourcesNamespace: openshift
projectConfig:
  defaultNodeSelector: 'region=primary'
  projectRequestMessage: ''
  projectRequestTemplate: ''
  securityAllocator:
    mcsAllocatorRange: s0:/2
    mcsLabelsPerProject: 5
    uidAllocatorRange: 1000000000-1999999999/10000
routingConfig:
  subdomain: apps.example.org
serviceAccountConfig:
  limitSecretReferences: false
  managedNames:
  - default
  - builder
  - deployer
  masterCA: ca-bundle.crt
  privateKeyFile: serviceaccounts.private.key
  publicKeyFiles:
  - serviceaccounts.public.key
servingInfo:
  bindAddress: 0.0.0.0:8443
  bindNetwork: tcp4
  certFile: master.server.crt
  clientCA: ca.crt
  keyFile: master.server.key
  maxRequestsInFlight: 500
  namedCertificates:
  - certFile: /etc/origin/master/named_certificates/fullchain.pem
    keyFile: /etc/origin/master/named_certificates/privkey.pem
    names:
    - example.org
  requestTimeoutSeconds: 3600
volumeConfig:
  dynamicProvisioningEnabled: true

pweil- added the kind/bug, priority/P2, and component/diagnostics labels on Jan 18, 2018
sosiouxme (Member) commented Jan 18, 2018

It looks like this error message comes from parsing a label or label selector, the combination of this and this. It looks like something was empty or simply lacked the = it was expecting, but I can't see what that would be.

@pweil- both the router pod and the diagnostic pod seem to exhibit this problem for the user, and I don't see it myself (naturally). Any pointers on how something might get here in a broken state? The specific code hasn't been touched in years (it was @pravisankar's, FWIW), so I doubt it's that; more likely something is getting mangled between the definition given above and this parsing code. Who would know?
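
To make the failure mode concrete, here is a minimal sketch in Go. It is an illustration only, not the actual origin parsing code: a strict parser that accepts nothing but comma-separated key=value pairs rejects a bare key such as "region", because after consuming the key it expects an '=' and instead reaches the end of the string, which matches the shape of the reported error.

package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseNodeSelector accepts only comma-separated key=value pairs,
// the restricted form a node selector is expected to take.
func parseNodeSelector(selector string) (map[string]string, error) {
	result := map[string]string{}
	if strings.TrimSpace(selector) == "" {
		return result, nil
	}
	for _, term := range strings.Split(selector, ",") {
		key, value, found := strings.Cut(term, "=")
		if !found {
			// The parser consumed the key and then found nothing ('')
			// where the '=' belongs: the shape of the reported error.
			return nil, errors.New("unable to parse requirement: found '', expected: '='")
		}
		result[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return result, nil
}

func main() {
	fmt.Println(parseNodeSelector("region=primary")) // map[region:primary] <nil>
	fmt.Println(parseNodeSelector("region"))         // map[] unable to parse requirement: found '', expected: '='
}

If a default nodeSelector or a per-namespace node selector ended up as a bare key (or otherwise lost its =), every pod creation flowing through such a parsing path would fail with this message, which would be consistent with it hitting both the router and the diagnostic pod.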

pweil- commented Jan 18, 2018

@mfojtik do you have any thoughts here? I have also never seen this when launching a router pod.

sosiouxme (Member) commented:

@pweil- @mfojtik any idea where this issue should be directed?

pweil- commented Feb 14, 2018

@sosiouxme since we have no diagnosis other than it possibly being admission, and we cannot reproduce this, I don't have a good suggestion. The fact that the error message wants an = and didn't find one makes me lean towards configuration; however, the diagnostic pod (run_diagnostics_pod) is created through code, which is concerning. I do think this bug would be showing up all over the place if it were an issue with the parsing code (vs. the definition).

sosiouxme (Member) commented:

@pweil- since it also showed up in the router pod, it seems likely to be a wider issue. But obviously we're not getting other reports of this, so something about this reported environment must be triggering it.

@micah by any chance do you have a reproducing environment we can poke around in to see if we can track it down? Otherwise, I think we'll need to close this as no one has any further ideas.

tnozicka (Contributor) commented May 9, 2018

@micah could you also dump the namespace (oc get namespace <name> -o yaml)? There is an additional annotation configuring the project default node selector which might possibly cause this.

marthenlt commented:

Yup, it might be worth checking the openshift.io/node-selector annotation on the namespace.
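
As a quick sanity check on the values gathered so far, a small hypothetical helper (illustrative only, not an oc feature) could flag any selector string, e.g. projectConfig.defaultNodeSelector from master-config.yaml or the namespace's openshift.io/node-selector annotation value, that is not a comma-separated list of key=value pairs:

package main

import (
	"fmt"
	"os"
	"strings"
)

// check reports whether a selector looks like a comma-separated list of
// key=value pairs, the only form a strict node-selector parser accepts.
func check(selector string) string {
	if strings.TrimSpace(selector) == "" {
		return "ok (empty selector)"
	}
	for _, term := range strings.Split(selector, ",") {
		if !strings.Contains(term, "=") {
			return fmt.Sprintf("suspect: term %q has no '='", term)
		}
	}
	return "ok"
}

func main() {
	// Pass candidate selector strings as arguments, for example:
	//   go run check_node_selector.go "region=primary" "region"
	for _, selector := range os.Args[1:] {
		fmt.Printf("%-25q %s\n", selector, check(selector))
	}
}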

micah (Author) commented May 12, 2018 via email

openshift-bot (Contributor) commented:

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label on Aug 10, 2018
openshift-bot (Contributor) commented:

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Sep 9, 2018
openshift-bot (Contributor) commented:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented:

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
