
oc adm diagnostics: unable to parse requirement #18127

Closed · micah opened this issue Jan 16, 2018 · 15 comments

Labels: component/diagnostics, kind/bug, lifecycle/rotten, priority/P2

micah commented Jan 16, 2018

Running oc adm diagnostics produces an error about the diagnostic pod.

Version

openshift v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8

Steps To Reproduce
  1. run oc adm diagnostics
Current Result
ERROR: [DCli2001 from diagnostic DiagnosticPod@openshift/origin/pkg/diagnostics/client/run_diagnostics_pod.go:81]
              Creating diagnostic pod with image openshift/origin-deployer:v3.7.0 failed. Error: (*errors.StatusError) unable to parse requirement: found '', expected: '='
Expected Result

No error.

Additional Information

The router deployment config also seems to exhibit this error:

[root@atomicmaster ~]# oc describe dc/router
Name:		router
Namespace:	default
Created:	5 weeks ago
Labels:		router=router
Annotations:	<none>
Latest Version:	9
Selector:	router=router
Replicas:	1
Triggers:	Config
Strategy:	Rolling
Template:
Pod Template:
  Labels:		router=router
  Service Account:	router
  Containers:
   router:
    Image:	openshift/origin-haproxy-router:v3.7.0
    Ports:	80/TCP, 443/TCP, 1936/TCP
    Requests:
      cpu:	100m
      memory:	256Mi
    Liveness:	http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:	http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      DEFAULT_CERTIFICATE_DIR:			/etc/pki/tls/private
      DEFAULT_CERTIFICATE_PATH:			/etc/pki/tls/private/tls.crt
      ROUTER_CIPHERS:				
      ROUTER_EXTERNAL_HOST_HOSTNAME:		
      ROUTER_EXTERNAL_HOST_HTTPS_VSERVER:	
      ROUTER_EXTERNAL_HOST_HTTP_VSERVER:	
      ROUTER_EXTERNAL_HOST_INSECURE:		false
      ROUTER_EXTERNAL_HOST_INTERNAL_ADDRESS:	
      ROUTER_EXTERNAL_HOST_PARTITION_PATH:	
      ROUTER_EXTERNAL_HOST_PASSWORD:		
      ROUTER_EXTERNAL_HOST_PRIVKEY:		/etc/secret-volume/router.pem
      ROUTER_EXTERNAL_HOST_USERNAME:		
      ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR:	
      ROUTER_LISTEN_ADDR:			0.0.0.0:1936
      ROUTER_METRICS_TYPE:			haproxy
      ROUTER_SERVICE_HTTPS_PORT:		443
      ROUTER_SERVICE_HTTP_PORT:			80
      ROUTER_SERVICE_NAME:			router
      ROUTER_SERVICE_NAMESPACE:			default
      ROUTER_SUBDOMAIN:				
      STATS_PASSWORD:				xxx
      STATS_PORT:				1936
      STATS_USERNAME:				admin
    Mounts:
      /etc/pki/tls/private from server-certificate (ro)
  Volumes:
   server-certificate:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	router-certs
    Optional:	false

Deployment #9 (latest):
	Created:	27 minutes ago
	Status:		New
	Replicas:	0 current / 0 desired
Deployment #8:
	Created:	43 minutes ago
	Status:		Failed
	Replicas:	0 current / 0 desired
Deployment #7:
	Created:	2 weeks ago
	Status:		Failed
	Replicas:	0 current / 0 desired

Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason				Message
  ---------	--------	-----	----				-------------	--------	------				-------
  43m		43m		1	deploymentconfig-controller			Normal		DeploymentCreated		Created new replication controller "router-8" for version 8
  41m		41m		1	deployer-controller				Warning		FailedRetry			Stop retrying: couldn't create deployer pod for "default/router-8": unable to parse requirement: found '', expected: '='
  27m		27m		1	deployer-controller				Normal		RolloutCancelled		Rollout for "default/router-8" cancelled
  27m		27m		5	deploymentconfig-controller			Normal		DeploymentAwaitingCancellation	Deployment of version 9 awaiting cancellation of older running deployments
  27m		27m		1	deploymentconfig-controller			Normal		DeploymentCancelled		Cancelled deployment "router-8" superceded by version 9
  27m		27m		1	deploymentconfig-controller			Normal		DeploymentCreated		Created new replication controller "router-9" for version 9
  43m		27m		26	deployer-controller				Warning		FailedCreate			Error creating deployer pod: unable to parse requirement: found '', expected: '='
php-coder (Contributor) commented:

Looks like it was broken by the following commit: 357071f
CC @sosiouxme

sosiouxme (Member) commented Jan 17, 2018

@php-coder that commit was against master (3.9), so I don't think it's relevant to 3.7. Also since the router deployment is seeing the same message...

My guess would be admission config, default nodeSelector, or something like that is breaking things. @micah can you post the master-config.yaml? Or at least give some information about the server setup.

micah (Author) commented Jan 17, 2018

@sosiouxme - sure, I'm happy to provide any details, or do any debugging that can help!

The server is an Atomic CentOS system at version 7.1712. I deployed it with the openshift-ansible advanced install to get things going. I tried to disable the stats port for the router according to the docs, but it did not work, and that may be the cause of this issue. Because the documentation did not work, I filed an issue about it: openshift/openshift-docs#6969

Here is my master-config.yaml as requested (with IPs, domains, and secrets obfuscated):

admissionConfig:
  pluginConfig:
    BuildDefaults:
      configuration:
        apiVersion: v1
        env: []
        kind: BuildDefaultsConfig
        resources:
          limits: {}
          requests: {}
    BuildOverrides:
      configuration:
        apiVersion: v1
        kind: BuildOverridesConfig
    PodPreset:
      configuration:
        apiVersion: v1
        disable: false
        kind: DefaultAdmissionConfig
    openshift.io/ImagePolicy:
      configuration:
        apiVersion: v1
        executionRules:
        - matchImageAnnotations:
          - key: images.openshift.io/deny-execution
            value: 'true'
          name: execution-denied
          onResources:
          - resource: pods
          - resource: builds
          reject: true
          skipOnResolutionFailure: true
        kind: ImagePolicyConfig
aggregatorConfig:
  proxyClientInfo:
    certFile: aggregator-front-proxy.crt
    keyFile: aggregator-front-proxy.key
apiLevels:
- v1
apiVersion: v1
assetConfig:
  extensionScripts:
  - /etc/origin/master/openshift-ansible-catalog-console.js
  logoutURL: ''
  masterPublicURL: https://example.org:8443
  publicURL: https://example.org:8443/console/
  servingInfo:
    bindAddress: 0.0.0.0:8443
    bindNetwork: tcp4
    certFile: master.server.crt
    clientCA: ''
    keyFile: master.server.key
    maxRequestsInFlight: 0
    requestTimeoutSeconds: 0
authConfig:
  requestHeader:
    clientCA: front-proxy-ca.crt
    clientCommonNames:
    - aggregator-front-proxy
    extraHeaderPrefixes:
    - X-Remote-Extra-
    groupHeaders:
    - X-Remote-Group
    usernameHeaders:
    - X-Remote-User
controllerConfig:
  election:
    lockName: openshift-master-controllers
  serviceServingCert:
    signer:
      certFile: service-signer.crt
      keyFile: service-signer.key
controllers: '*'
corsAllowedOrigins:
- (?i)//127\.0\.0\.1(:|\z)
- (?i)//localhost(:|\z)
- (?i)//198\.252\.153\.254(:|\z)
- (?i)//atomicmaster\.example\.net(:|\z)
- (?i)//kubernetes\.default(:|\z)
- (?i)//kubernetes\.default\.svc\.cluster\.local(:|\z)
- (?i)//kubernetes(:|\z)
- (?i)//example\.org(:|\z)
- (?i)//openshift\.default\.svc(:|\z)
- (?i)//openshift\.default(:|\z)
- (?i)//172\.30\.0\.1(:|\z)
- (?i)//openshift\.default\.svc\.cluster\.local(:|\z)
- (?i)//kubernetes\.default\.svc(:|\z)
- (?i)//openshift(:|\z)
dnsConfig:
  bindAddress: 0.0.0.0:8053
  bindNetwork: tcp4
etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
  - https://atomicmaster.example.net:2379
etcdStorageConfig:
  kubernetesStoragePrefix: kubernetes.io
  kubernetesStorageVersion: v1
  openShiftStoragePrefix: openshift.io
  openShiftStorageVersion: v1
imageConfig:
  format: openshift/origin-${component}:${version}
  latest: false
kind: MasterConfig
kubeletClientInfo:
  ca: ca-bundle.crt
  certFile: master.kubelet-client.crt
  keyFile: master.kubelet-client.key
  port: 10250
kubernetesMasterConfig:
  apiServerArguments:
    runtime-config:
    - apis/settings.k8s.io/v1alpha1=true
    storage-backend:
    - etcd3
    storage-media-type:
    - application/vnd.kubernetes.protobuf
  controllerArguments: null
  masterCount: 1
  masterIP: x.x.x.x
  podEvictionTimeout: null
  proxyClientInfo:
    certFile: master.proxy-client.crt
    keyFile: master.proxy-client.key
  schedulerArguments: null
  schedulerConfigFile: /etc/origin/master/scheduler.json
  servicesNodePortRange: ''
  servicesSubnet: 172.30.0.0/16
  staticNodeNames: []
masterClients:
  externalKubernetesClientConnectionOverrides:
    acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
    burst: 400
    contentType: application/vnd.kubernetes.protobuf
    qps: 200
  externalKubernetesKubeConfig: ''
  openshiftLoopbackClientConnectionOverrides:
    acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
    burst: 600
    contentType: application/vnd.kubernetes.protobuf
    qps: 300
  openshiftLoopbackKubeConfig: openshift-master.kubeconfig
masterPublicURL: https://example.org:8443
networkConfig:
  clusterNetworkCIDR: 10.128.0.0/14
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostSubnetLength: 9
  externalIPNetworkCIDRs:
  - 0.0.0.0/0
  hostSubnetLength: 9
  networkPluginName: redhat/openshift-ovs-subnet
  serviceNetworkCIDR: 172.30.0.0/16
oauthConfig:
  assetPublicURL: https://example.org:8443/console/
  grantConfig:
    method: auto
  identityProviders:
  - challenge: true
    login: true
    mappingMethod: claim
    name: gitlab
    provider:
      apiVersion: v1
      clientID: xxx
      clientSecret: xxx
      kind: GitLabIdentityProvider
      url: https://otherexample.org
  masterCA: ca-bundle.crt
  masterPublicURL: https://example.org:8443
  masterURL: https://atomicmaster.example.net:8443
  sessionConfig:
    sessionMaxAgeSeconds: 3600
    sessionName: ssn
    sessionSecretsFile: /etc/origin/master/session-secrets.yaml
  tokenConfig:
    accessTokenMaxAgeSeconds: 86400
    authorizeTokenMaxAgeSeconds: 500
pauseControllers: false
policyConfig:
  bootstrapPolicyFile: /etc/origin/master/policy.json
  openshiftInfrastructureNamespace: openshift-infra
  openshiftSharedResourcesNamespace: openshift
projectConfig:
  defaultNodeSelector: 'region=primary'
  projectRequestMessage: ''
  projectRequestTemplate: ''
  securityAllocator:
    mcsAllocatorRange: s0:/2
    mcsLabelsPerProject: 5
    uidAllocatorRange: 1000000000-1999999999/10000
routingConfig:
  subdomain: apps.example.org
serviceAccountConfig:
  limitSecretReferences: false
  managedNames:
  - default
  - builder
  - deployer
  masterCA: ca-bundle.crt
  privateKeyFile: serviceaccounts.private.key
  publicKeyFiles:
  - serviceaccounts.public.key
servingInfo:
  bindAddress: 0.0.0.0:8443
  bindNetwork: tcp4
  certFile: master.server.crt
  clientCA: ca.crt
  keyFile: master.server.key
  maxRequestsInFlight: 500
  namedCertificates:
  - certFile: /etc/origin/master/named_certificates/fullchain.pem
    keyFile: /etc/origin/master/named_certificates/privkey.pem
    names:
    - example.org
  requestTimeoutSeconds: 3600
volumeConfig:
  dynamicProvisioningEnabled: true

pweil- added the kind/bug, priority/P2, and component/diagnostics labels on Jan 18, 2018
sosiouxme (Member) commented Jan 18, 2018

It looks like this error message comes from parsing a label or label selector, the combination of this and this. It looks like something was empty or simply lacked the = it was expecting, but I can't see what that would be.

@pweil- both the router pod and the diagnostic pod seem to exhibit this problem for the user, and I don't see it myself (naturally). Any pointers on how something might get here in a broken state? The specific code hasn't been touched in years (it was @pravisankar's, FWIW), so I doubt it's that; more likely something is getting mangled between the definition given above and this parsing code. Who would know?
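
To make the failure mode concrete, here is a minimal sketch in Go. It is an illustration only, not the actual origin parsing code: a strict parser that accepts nothing but comma-separated key=value pairs rejects a bare key such as "region", because after consuming the key it expects an '=' and instead reaches the end of the string, which matches the shape of the reported error.

package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseNodeSelector accepts only comma-separated key=value pairs,
// the restricted form a node selector is expected to take.
func parseNodeSelector(selector string) (map[string]string, error) {
	result := map[string]string{}
	if strings.TrimSpace(selector) == "" {
		return result, nil
	}
	for _, term := range strings.Split(selector, ",") {
		key, value, found := strings.Cut(term, "=")
		if !found {
			// The parser consumed the key and then found nothing ('')
			// where the '=' belongs: the shape of the reported error.
			return nil, errors.New("unable to parse requirement: found '', expected: '='")
		}
		result[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return result, nil
}

func main() {
	fmt.Println(parseNodeSelector("region=primary")) // map[region:primary] <nil>
	fmt.Println(parseNodeSelector("region"))         // map[] unable to parse requirement: found '', expected: '='
}

If a default nodeSelector or a per-namespace node selector ended up as a bare key (or otherwise lost its =), every pod creation flowing through such a parsing path would fail with this message, which would be consistent with it hitting both the router and the diagnostic pod.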

pweil- commented Jan 18, 2018

@mfojtik do you have any thoughts here? I have also never seen this when launching a router pod.

sosiouxme (Member) commented:

@pweil- @mfojtik any idea where this issue should be directed?

pweil- commented Feb 14, 2018

@sosiouxme since we have no diagnosis other than it possibly being admission, and we cannot reproduce this, I don't have a good suggestion. The fact that the error message wants an = and didn't find one makes me lean towards configuration; however, the diagnostic pod (run_diagnostics_pod) is created through code, which is concerning. I do think this bug would be showing up all over the place if it were an issue with the parsing code (vs. the definition).

sosiouxme (Member) commented:

@pweil- since it also showed up in the router pod, it seems likely to be a wider issue. But obviously we're not getting other reports of this, so something about this reported environment must be triggering it.

@micah by any chance do you have a reproducing environment we can poke around in to see if we can track it down? Otherwise, I think we'll need to close this as no one has any further ideas.

tnozicka (Contributor) commented May 9, 2018

@micah could you also dump the namespace (oc get namespace <name> -o yaml)? There is an additional annotation configuring the project default node selector which might possibly cause this.

marthenlt commented:

Yup, it might be worth checking the openshift.io/node-selector annotation on the namespace.
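
As a quick sanity check on the values gathered so far, a small hypothetical helper (illustrative only, not an oc feature) could flag any selector string, e.g. projectConfig.defaultNodeSelector from master-config.yaml or the namespace's openshift.io/node-selector annotation value, that is not a comma-separated list of key=value pairs:

package main

import (
	"fmt"
	"os"
	"strings"
)

// check reports whether a selector looks like a comma-separated list of
// key=value pairs, the only form a strict node-selector parser accepts.
func check(selector string) string {
	if strings.TrimSpace(selector) == "" {
		return "ok (empty selector)"
	}
	for _, term := range strings.Split(selector, ",") {
		if !strings.Contains(term, "=") {
			return fmt.Sprintf("suspect: term %q has no '='", term)
		}
	}
	return "ok"
}

func main() {
	// Pass candidate selector strings as arguments, for example:
	//   go run check_node_selector.go "region=primary" "region"
	for _, selector := range os.Args[1:] {
		fmt.Printf("%-25q %s\n", selector, check(selector))
	}
}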

micah (Author) commented May 12, 2018 via email

openshift-bot (Contributor) commented:

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label on Aug 10, 2018
openshift-bot (Contributor) commented:

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Sep 9, 2018
openshift-bot (Contributor) commented:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented:

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
