[🐛 Bug]: Nodes Disconnecting from Hub after AKS Deployment with Helm Chart #2065

Closed
michaelmowry opened this issue Dec 14, 2023 · 27 comments · Fixed by #2429

Comments

@michaelmowry

What happened?

Our team has deployed Selenium Grid to AKS using the Helm templates in the repository. Our problem is that the nodes connect to the hub only briefly: they are visible in the UI, then disappear and do not show up again. In the logs below we can see that the registration event between the node and hub is not successful. We are attempting to use a basic hub/node architecture with isolateComponents=false. We have disabled ingress and basic auth and are using Istio. We are able to access the Selenium Grid UI on the hub and we are able to queue tests, but they time out as no nodes are available for processing. Thanks in advance for any help resolving this.

Command used to start Selenium Grid with Docker (or Kubernetes)

global:
  seleniumGrid:
    # Image tag for all selenium components
    imageTag: 4.14.1-20231025
    #imageTag: latest
    # Image tag for browser's nodes
    nodesImageTag: 4.14.1-20231025
    #nodesImageTag: latest
    # Pull secret for all components, can be overridden individually
    imagePullSecret: secret
 
# Basic auth settings for Selenium Grid
basicAuth:
      # Enable or disable basic auth
      enabled: false
      # Username for basic auth
      username: admin
      # Password for basic auth
      password: admin    
 
# Deploy Router, Distributor, EventBus, SessionMap and Nodes separately
isolateComponents: false
 
# Service Account for all components
serviceAccount:
  create: true
  name: ""
  annotations: {}
  #  eks.amazonaws.com/role-arn: "arn:aws:iam::12345678:role/video-bucket-permissions"
istio:
  # enable flags can be used to turn on or off specific istio features
  flags:
    virtualServiceEnabled: true
  # istioVirtualService
  virtualService:
    namespace: seleniumgridpoc
    gateways:
      - seleniumgridpoc-ig
    match:
      - appEndpoints:
          - /
        destinations:
          - portNumber: 4444
            host: selenium-hub
    appTopLevelDomains:
      - seleniumgrid-sbx.company.com
 
# Configure the ingress resource to access the Grid installation.
ingress:
  # Enable or disable ingress resource
  enabled: false
  # Name of ingress class to select which controller will implement ingress resource
  className: ""
  # Custom annotations for ingress resource
  annotations: {}
  # Default host for the ingress resource
  #hostname: selenium-grid.local
  #hostname: seleniumgrid-sbx.company.com
  hostname: seleniumgrid-sbx.company.com
  # Default host path for the ingress resource
  path: /
  # TLS backend configuration for ingress resource
  tls: []
 
# ConfigMap that contains SE_EVENT_BUS_HOST, SE_EVENT_BUS_PUBLISH_PORT and SE_EVENT_BUS_SUBSCRIBE_PORT variables
busConfigMap:
  # Name of the configmap
  name: selenium-event-bus-config
  # Custom annotations for configmap
  annotations: {}
 
# ConfigMap that contains common environment variables for browser nodes
nodeConfigMap:
  name: selenium-node-config
  # Custom annotations for configmap
  annotations: {}
 
# Configuration for isolated components (applied only if `isolateComponents: true`)
components:
 
  # Configuration for router component
  router:
    # Router image name
    imageName: company-seleniumgrid-docker-virtual.jfrog.io/selenium/router
    # Router image tag (this overwrites global.seleniumGrid.imageTag parameter)
    # imageTag: 4.14.1-20231025
 
    # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
    imagePullPolicy: IfNotPresent
    # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
    imagePullSecret: ""
 
    # Custom annotations for router pods
    annotations: {}
    # Router port
    port: 4444
    # Liveness probe settings
    livenessProbe:
      enabled: true
      path: /readyz
      initialDelaySeconds: 10
      failureThreshold: 10
      timeoutSeconds: 10
      periodSeconds: 10
      successThreshold: 1
    # Readiness probe settings
    readinessProbe:
      enabled: true
      path: /readyz
      initialDelaySeconds: 12
      failureThreshold: 10
      timeoutSeconds: 10
      periodSeconds: 10
      successThreshold: 1
    # Resources for router container
    resources: {}
    # SecurityContext for router container
    securityContext: {}
    # Kubernetes service type (see https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types)
    serviceType: ClusterIP
    # Set specific loadBalancerIP when serviceType is LoadBalancer (see https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer)
    loadBalancerIP: ""
    # Custom annotations for router service
    serviceAnnotations: {}
    # Tolerations for router pods
    tolerations: []
    # Node selector for router pods
    nodeSelector: {}
    # Priority class name for router pods
    priorityClassName: ""
 
  # Configuration for distributor component
  distributor:
    # Distributor image name
    imageName: company-seleniumgrid-docker-virtual.jfrog.io/selenium/distributor
    # Distributor image tag (this overwrites global.seleniumGrid.imageTag parameter)
    # imageTag: 4.14.1-20231025
 
    # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
    imagePullPolicy: IfNotPresent
    # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
    imagePullSecret: ""
 
    # Custom annotations for Distributor pods
    annotations: {}
    # Distributor port
    port: 5553
    # Resources for Distributor container
    resources: {}
    # SecurityContext for Distributor container
    securityContext: {}
    # Kubernetes service type (see https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types)
    serviceType: ClusterIP
    # Custom annotations for Distributor service
    serviceAnnotations: {}
    # Tolerations for Distributor pods
    tolerations: []
    # Node selector for Distributor pods
    nodeSelector: {}
    # Priority class name for Distributor pods
    priorityClassName: ""
 
  # Configuration for Event Bus component
  eventBus:
    # Event Bus image name
    imageName: company-seleniumgrid-docker-virtual.jfrog.io/selenium/event-bus
    # Event Bus image tag (this overwrites global.seleniumGrid.imageTag parameter)
    # imageTag: 4.14.1-20231025
 
    # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
    imagePullPolicy: IfNotPresent
    # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
    imagePullSecret: ""
 
    # Custom annotations for Event Bus pods
    annotations: {}
    # Event Bus port
    port: 5557
    # Port where events are published
    publishPort: 4442
    # Port where to subscribe for events
    subscribePort: 4443
    # Resources for event-bus container
    resources: {}
    # SecurityContext for event-bus container
    securityContext: {}
    # Kubernetes service type (see https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types)
    serviceType: ClusterIP
    # Custom annotations for Event Bus service
    serviceAnnotations: {}
    # Tolerations for Event Bus pods
    tolerations: []
    # Node selector for Event Bus pods
    nodeSelector: {}
    # Priority class name for Event Bus pods
    priorityClassName: ""
 
  # Configuration for Session Map component
  sessionMap:
    # Session Map image name
    imageName: company-seleniumgrid-docker-virtual.jfrog.io/selenium/sessions
    # Session Map image tag (this overwrites global.seleniumGrid.imageTag parameter)
    # imageTag: 4.14.1-20231025
 
    # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
    imagePullPolicy: IfNotPresent
    # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
    imagePullSecret: ""
 
    # Custom annotations for Session Map pods
    annotations: {}
    port: 5556
    # Resources for Session Map container
    resources: {}
    # SecurityContext for Session Map container
    securityContext: {}
    # Kubernetes service type (see https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types)
    serviceType: ClusterIP
    # Custom annotations for Session Map service
    serviceAnnotations: {}
    # Tolerations for Session Map pods
    tolerations: []
    # Node selector for Session Map pods
    nodeSelector: {}
    # Priority class name for Session Map pods
    priorityClassName: ""
 
  # Configuration for Session Queue component
  sessionQueue:
    # Session Queue image name
    imageName: company-seleniumgrid-docker-virtual.jfrog.io/selenium/session-queue
    # Session Queue image tag (this overwrites global.seleniumGrid.imageTag parameter)
    # imageTag: 4.14.1-20231025
 
    # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
    imagePullPolicy: IfNotPresent
    # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
    imagePullSecret: ""
 
    # Custom annotations for Session Queue pods
    annotations: {}
    port: 5559
    # Resources for Session Queue container
    resources: {}
    # SecurityContext for Session Queue container
    securityContext: {}
    # Kubernetes service type (see https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types)
    serviceType: ClusterIP
    # Custom annotations for Session Queue service
    serviceAnnotations: {}
    # Tolerations for Session Queue pods
    tolerations: []
    # Node selector for Session Queue pods
    nodeSelector: {}
    # Priority class name for Session Queue pods
    priorityClassName: ""
 
  # Custom sub path for all components
  subPath: /
 
  # Custom environment variables for all components
  extraEnvironmentVariables:
    # - name: SE_JAVA_OPTS
    #   value: "-Xmx512m"
    # - name:
    #   valueFrom:
    #     secretKeyRef:
    #       name: secret-name
    #       key: secret-key
 
  # Custom environment variables by sourcing entire configMap, Secret, etc. for all components
  extraEnvFrom:
    # - configMapRef:
    #   name: proxy-settings
    # - secretRef:
    #   name: mysecret
 
# Configuration for selenium hub deployment (applied only if `isolateComponents: false`)
hub:
  # Selenium Hub image name
  imageName: company-seleniumgrid-docker-virtual.jfrog.io/selenium/hub
  # Selenium Hub image tag (this overwrites global.seleniumGrid.imageTag parameter)
  # imageTag: 4.14.1-20231025
  # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
  imagePullPolicy: IfNotPresent
  # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
  imagePullSecret: ""
 
  # Custom annotations for Selenium Hub pods
  annotations: {}
  # Custom labels for Selenium Hub pods
  labels: {}
  # Port where events are published
  publishPort: 4442
  # Port where to subscribe for events
  subscribePort: 4443
  # Selenium Hub port
  port: 4444
  # Liveness probe settings
  livenessProbe:
    enabled: true
    path: /readyz
    initialDelaySeconds: 10
    failureThreshold: 10
    timeoutSeconds: 10
    periodSeconds: 10
    successThreshold: 1
  # Readiness probe settings
  readinessProbe:
    enabled: true
    path: /readyz
    initialDelaySeconds: 12
    failureThreshold: 10
    timeoutSeconds: 10
    periodSeconds: 10
    successThreshold: 1
  # Custom sub path for the hub deployment
  subPath: /
  # Custom environment variables for selenium-hub
  extraEnvironmentVariables:
    # - name: SE_JAVA_OPTS
    #   value: "-Xmx512m"
    # - name: SECRET_VARIABLE
    #   valueFrom:
    #     secretKeyRef:
    #       name: secret-name
    #       key: secret-key
  # Custom environment variables by sourcing entire configMap, Secret, etc. for selenium-hub
  extraEnvFrom:
    # - configMapRef:
    #   name: proxy-settings
    # - secretRef:
    #   name: mysecret
  extraVolumeMounts: []
  # - name: my-extra-volume
  #   mountPath: /home/seluser/Downloads
 
  extraVolumes: []
  # - name: my-extra-volume
  #   emptyDir: {}
  # - name: my-extra-volume-from-pvc
  #   persistentVolumeClaim:
  #     claimName: my-pv-claim
  # Resources for selenium-hub container
  resources: {}
  # SecurityContext for selenium-hub container
  securityContext: {}
  # Kubernetes service type (see https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types)
  serviceType: ClusterIP
  # Set specific loadBalancerIP when serviceType is LoadBalancer (see https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer)
  loadBalancerIP: ""
  # Custom annotations for Selenium Hub service
  serviceAnnotations: {}
  # Tolerations for selenium-hub pods
  tolerations: []
  # Node selector for selenium-hub pods
  nodeSelector: {}
  # Priority class name for selenium-hub pods
  priorityClassName: ""
 
# Keda scaled object configuration
autoscaling:
  # Enable autoscaling. Implies installing KEDA
  enabled: false
  # Enable autoscaling without automatically installing KEDA
  enableWithExistingKEDA: false
  # Which type of KEDA scaling to use: job or deployment
  scalingType: job
  # Annotations for KEDA resources: ScaledObject and ScaledJob
  annotations:
    helm.sh/hook: post-install,post-upgrade
  # Options for KEDA ScaledJobs
  scaledJobOptions:
    pollingInterval: 10
    scalingStrategy:
      strategy: accurate
  deregisterLifecycle:
    preStop:
      exec:
        command:
          - bash
          - -c
          - |
            curl -X POST 127.0.0.1:5555/se/grid/node/drain --header 'X-REGISTRATION-SECRET;' && \
            while curl 127.0.0.1:5555/status; do sleep 1; done;
 
# Configuration for chrome nodes
chromeNode:
  # Enable chrome nodes
  enabled: true
 
  # NOTE: Only used when autoscaling.enabled is false
  # Enable creation of Deployment
  # true (default) - if you want long living pods
  # false - for provisioning your own custom type such as Jobs
  deploymentEnabled: true
 
  # Number of chrome nodes
  replicas: 1
  # Image of chrome nodes
  imageName: company-seleniumgrid-docker-virtual.jfrog.io/selenium/node-chrome
  # Image of chrome nodes (this overwrites global.seleniumGrid.nodesImageTag)
  # imageTag: 4.14.1-20231025
  # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
  imagePullPolicy: IfNotPresent
  # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
  imagePullSecret: ""
 
  # Port list to enable on container
  ports:
    - 4442
    - 4443
  # Selenium port (spec.ports[0].targetPort in kubernetes service)
  seleniumPort: 5900
  # Selenium port exposed in service (spec.ports[0].port in kubernetes service)
  seleniumServicePort: 6900
  # Annotations for chrome-node pods
  annotations: {}
  # Labels for chrome-node pods
  labels: {}
  # Resources for chrome-node container
  resources:
    requests:
      memory: "1Gi"
      cpu: "1"
    limits:
      memory: "1Gi"
      cpu: "1"
  # SecurityContext for chrome-node container
  securityContext: {}
  # Tolerations for chrome-node pods
  tolerations: []
  # Node selector for chrome-node pods
  nodeSelector: {}
  # Custom host aliases for chrome nodes
  hostAliases:
    # - ip: "198.51.100.0"
    #   hostnames:
    #     - "example.com"
    #     - "example.net"
    # - ip: "203.0.113.0"
    #   hostnames:
    #     - "example.org"
  # Custom environment variables for chrome nodes
  extraEnvironmentVariables:
    # - name: SE_JAVA_OPTS
    #   value: "-Xmx512m"
    # - name:
    #   valueFrom:
    #     secretKeyRef:
    #       name: secret-name
    #       key: secret-key
  # Custom environment variables by sourcing entire configMap, Secret, etc. for chrome nodes
  extraEnvFrom:
    # - configMapRef:
    #   name: proxy-settings
    # - secretRef:
    #   name: mysecret
  # Service configuration
  service:
    # Create a service for node
    enabled: true
    # Service type
    type: ClusterIP
    # Custom annotations for service
    annotations: {}
  # Size limit for the dshm (shared memory) volume mounted in the container (if not set, default is "1Gi")
  dshmVolumeSizeLimit: 1Gi
  # Priority class name for chrome-node pods
  priorityClassName: ""
 
  # Wait for pod startup
  startupProbe: {}
    # httpGet:
    #   path: /status
    #   port: 5555
    # failureThreshold: 120
    # periodSeconds: 5
 
  # Liveness probe settings
  livenessProbe: {}
 
  # Time to wait for pod termination
  terminationGracePeriodSeconds: 30
  lifecycle: {}
  extraVolumeMounts: []
  # - name: my-extra-volume
  #   mountPath: /home/seluser/Downloads
 
  extraVolumes: []
  # - name: my-extra-volume
  #   emptyDir: {}
  # - name: my-extra-volume-from-pvc
  #   persistentVolumeClaim:
  #     claimName: my-pv-claim
 
  maxReplicaCount: 8
  minReplicaCount: 1
  hpa:
    url: '{{ include "seleniumGrid.graphqlURL" . }}'
    browserName: chrome
    # browserVersion: '91.0' # Optional. Only required when supporting multiple versions of browser in your Selenium Grid.
    unsafeSsl : 'true' # Optional
 
  # It is used to add a sidecars proxy in the same pod of the browser node.
  # It means it will add a new container to the deployment itself.
  # It should be set using the --set-json option
  sidecars: []
 
# Configuration for firefox nodes
firefoxNode:
  # Enable firefox nodes
  enabled: true
 
  # NOTE: Only used when autoscaling.enabled is false
  # Enable creation of Deployment
  # true (default) - if you want long living pods
  # false - for provisioning your own custom type such as Jobs
  deploymentEnabled: true
 
  # Number of firefox nodes
  replicas: 1
  # Image of firefox nodes
  imageName: company-seleniumgrid-docker-virtual.jfrog.io/selenium/node-firefox
  # Image of firefox nodes (this overwrites global.seleniumGrid.nodesImageTag)
  # imageTag: 4.14.1-20231025
  # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
  imagePullPolicy: IfNotPresent
  # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
  imagePullSecret: ""
 
  # Port list to enable on container
  ports:
    - 5555
  # Selenium port (spec.ports[0].targetPort in kubernetes service)
  seleniumPort: 5900
  # Selenium port exposed in service (spec.ports[0].port in kubernetes service)
  seleniumServicePort: 6900
  # Annotations for firefox-node pods
  annotations: {}
  # Labels for firefox-node pods
  labels: {}
  # Tolerations for firefox-node pods
  tolerations: []
  # Node selector for firefox-node pods
  nodeSelector: {}
  # Resources for firefox-node container
  resources:
    requests:
      memory: "1Gi"
      cpu: "1"
    limits:
      memory: "1Gi"
      cpu: "1"
  # SecurityContext for firefox-node container
  securityContext: {}
  # Custom host aliases for firefox nodes
  hostAliases:
    # - ip: "198.51.100.0"
    #   hostnames:
    #     - "example.com"
    #     - "example.net"
    # - ip: "203.0.113.0"
    #   hostnames:
    #     - "example.org"
  # Custom environment variables for firefox nodes
  extraEnvironmentVariables:
    # - name: SE_JAVA_OPTS
    #   value: "-Xmx512m"
    # - name:
    #   valueFrom:
    #     secretKeyRef:
    #       name: secret-name
    #       key: secret-key
  # Custom environment variables by sourcing entire configMap, Secret, etc. for firefox nodes
  extraEnvFrom:
    # - configMapRef:
    #   name: proxy-settings
    # - secretRef:
    #   name: mysecret
  # Service configuration
  service:
    # Create a service for node
    enabled: true
    # Service type
    type: ClusterIP
    # Custom annotations for service
    annotations: {}
  # Size limit for the dshm (shared memory) volume mounted in the container (if not set, default is "1Gi")
  dshmVolumeSizeLimit: 1Gi
  # Priority class name for firefox-node pods
  priorityClassName: ""
 
  # Wait for pod startup
  startupProbe: {}
    # httpGet:
    #   path: /status
    #   port: 5555
    # failureThreshold: 120
    # periodSeconds: 5
 
  # Liveness probe settings
  livenessProbe: {}
 
  # Time to wait for pod termination
  terminationGracePeriodSeconds: 30
  lifecycle: {}
  extraVolumeMounts: []
  # - name: my-extra-volume
  #   mountPath: /home/seluser/Downloads
 
  extraVolumes: []
  # - name: my-extra-volume
  #   emptyDir: {}
  # - name: my-extra-volume-from-pvc
  #   persistentVolumeClaim:
  #     claimName: my-pv-claim
  maxReplicaCount: 8
  minReplicaCount: 1
  hpa:
    url: '{{ include "seleniumGrid.graphqlURL" . }}'
    browserName: firefox
 
  # It is used to add a sidecars proxy in the same pod of the browser node.
  # It means it will add a new container to the deployment itself.
  # It should be set using the --set-json option
  sidecars: []
 
# Configuration for edge nodes
edgeNode:
  # Enable edge nodes
  enabled: true
 
  # NOTE: Only used when autoscaling.enabled is false
  # Enable creation of Deployment
  # true (default) - if you want long living pods
  # false - for provisioning your own custom type such as Jobs
  deploymentEnabled: true
 
  # Number of edge nodes
  replicas: 1
  # Image of edge nodes
  imageName: company-seleniumgrid-docker-virtual.jfrog.io/selenium/node-edge
  # Image of edge nodes (this overwrites global.seleniumGrid.nodesImageTag)
  # imageTag: 4.14.1-20231025
  # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
  imagePullPolicy: IfNotPresent
  # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
  imagePullSecret: ""
 
  ports:
    - 5555
  # Selenium port (spec.ports[0].targetPort in kubernetes service)
  seleniumPort: 5900
  # Selenium port exposed in service (spec.ports[0].port in kubernetes service)
  seleniumServicePort: 6900
  # Annotations for edge-node pods
  annotations: {}
  # Labels for edge-node pods
  labels: {}
  # Tolerations for edge-node pods
  tolerations: []
  # Node selector for edge-node pods
  nodeSelector: {}
  # Resources for edge-node container
  resources:
    requests:
      memory: "1Gi"
      cpu: "1"
    limits:
      memory: "1Gi"
      cpu: "1"
  # SecurityContext for edge-node container
  securityContext: {}
  # Custom host aliases for edge nodes
  hostAliases:
    # - ip: "198.51.100.0"
    #   hostnames:
    #     - "example.com"
    #     - "example.net"
    # - ip: "203.0.113.0"
    #   hostnames:
    #     - "example.org"
  # Custom environment variables for edge nodes
  extraEnvironmentVariables:
    # - name: SE_JAVA_OPTS
    #   value: "-Xmx512m"
    # - name:
    #   valueFrom:
    #     secretKeyRef:
    #       name: secret-name
    #       key: secret-key
  # Custom environment variables by sourcing entire configMap, Secret, etc. for edge nodes
  extraEnvFrom:
    # - configMapRef:
    #   name: proxy-settings
    # - secretRef:
    #   name: mysecret
  # Service configuration
  service:
    # Create a service for node
    enabled: true
    # Service type
    type: ClusterIP
    # Custom annotations for service
    annotations:
      hello: world
  # Size limit for the dshm (shared memory) volume mounted in the container (if not set, default is "1Gi")
  dshmVolumeSizeLimit: 1Gi
  # Priority class name for edge-node pods
  priorityClassName: ""
 
  # Wait for pod startup
  startupProbe: {}
    # httpGet:
    #   path: /status
    #   port: 5555
    # failureThreshold: 120
    # periodSeconds: 5
 
  # Liveness probe settings
  livenessProbe: {}
 
  # Time to wait for pod termination
  terminationGracePeriodSeconds: 30
  lifecycle: {}
  extraVolumeMounts: []
  # - name: my-extra-volume
  #   mountPath: /home/seluser/Downloads
 
  extraVolumes: []
  # - name: my-extra-volume
  #   emptyDir: {}
  # - name: my-extra-volume-from-pvc
  #   persistentVolumeClaim:
  #     claimName: my-pv-claim
  maxReplicaCount: 8
  minReplicaCount: 1
  hpa:
    url: '{{ include "seleniumGrid.graphqlURL" . }}'
    browserName: MicrosoftEdge
    sessionBrowserName: 'msedge'
 
  # It is used to add a sidecars proxy in the same pod of the browser node.
  # It means it will add a new container to the deployment itself.
  # It should be set using the --set-json option
  sidecars: []
 
videoRecorder:
  enabled: false
  # Image of video recorder
  imageName: company-seleniumgrid-docker-virtual.jfrog.io/selenium/video
  # Image of video recorder
  imageTag: ffmpeg-6.0-20231025
  # Image pull policy (see https://kubernetes.io/docs/concepts/containers/images/#updating-images)
  imagePullPolicy: IfNotPresent
  # Image pull secret (see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
 
  # What uploader to use. See .videoRecorder.s3 for how to create a new one.
  # uploader: s3
  uploader: false
  # Where to upload the video file. Should be set to something like 's3://myvideobucket/'
  uploadDestinationPrefix: false
 
  ports:
  - 9000
  resources:
    requests:
      memory: "1Gi"
      cpu: "1"
    limits:
      memory: "1Gi"
      cpu: "1"
  extraEnvironmentVariables:
  # - name: SE_VIDEO_FOLDER
  #   value: /videos
  # Custom environment variables by sourcing entire configMap, Secret, etc. for video recorder.
  extraEnvFrom:
  # - configMapRef:
  #   name: proxy-settings
  # - secretRef:
  #   name: mysecret
  # Wait for pod startup
  terminationGracePeriodSeconds: 30
 
  # Wait for pod startup
  startupProbe: {}
  #   httpGet:
  #     path: /
  #     port: 9000
  #   failureThreshold: 120
  # periodSeconds: 5
 
  # Liveness probe settings
  livenessProbe: {}
 
  volume:
  # name:
  #   folder: video
  #   scripts: video-scripts
  # Custom video recorder back-end scripts (video.sh, video_ready.py, etc.) can be overridden further via ConfigMap.
  # NOTE: For a mount point named "video" or "video-scripts", it will override the default. Other names are appended.
  extraVolumeMounts: []
  # - name: video-scripts
  #   mountPath: /opt/bin/video.sh
  #   subPath: custom_video.sh
  # - name: video-scripts
  #   mountPath: /opt/bin/video_ready.py
  #   subPath: video_ready.py
 
  extraVolumes: []
  # - name: video-scripts
  #   configMap:
  #     name: my-video-scripts-cm
  #     defaultMode: 0500
  # - name: video
  #   persistentVolumeClaim:
  #     claimName: video-pv-claim
 
  # Container spec for the uploader if "uploader: s3" is defined above
  s3:
    imageName: public.ecr.aws/bitnami/aws-cli
    imageTag: "2"
    imagePullPolicy: IfNotPresent
    securityContext:
      runAsUser: 0
    command:
    - /bin/sh
    args:
    - -c
    - |
      while ! [ -p /videos/uploadpipe ]
      do
          echo Waiting for /videos/uploadpipe to be created
          sleep 1
      done
      echo Waiting for files to upload
      while read FILE DESTINATION < /videos/uploadpipe
      do
          if [ "$FILE" = "exit" ]
          then
              break
          else
              aws s3 cp --no-progress $FILE $DESTINATION
          fi
      done
    extraEnvironmentVariables:
    # - name: AWS_ACCESS_KEY_ID
    #   value: aws_access_key_id
    # - name: AWS_SECRET_ACCESS_KEY
    #   value: aws_secret_access_key
    # - name:
    #   valueFrom:
    #     secretKeyRef:
    #       name: secret-name
    #       key: secret-key
 
# Custom labels for k8s resources
customLabels: {}

Relevant log output

Logs from the chrome node that is not able to register with the hub:

2023-12-14 10:25:15,457 INFO Included extra file "/etc/supervisor/conf.d/selenium.conf" during parsing
2023-12-14 10:25:15,460 INFO RPC interface 'supervisor' initialized
2023-12-14 10:25:15,460 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2023-12-14 10:25:15,460 INFO supervisord started with pid 8
2023-12-14 10:25:16,462 INFO spawned: 'xvfb' with pid 10
2023-12-14 10:25:16,464 INFO spawned: 'vnc' with pid 11
2023-12-14 10:25:16,465 INFO spawned: 'novnc' with pid 12
2023-12-14 10:25:16,467 INFO spawned: 'selenium-node' with pid 13
2023-12-14 10:25:16,484 INFO success: selenium-node entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Generating Selenium Config
Configuring server...
Setting up SE_NODE_HOST...
Setting up SE_NODE_PORT...
2023-12-14 10:25:17,538 INFO success: xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-12-14 10:25:17,538 INFO success: vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-12-14 10:25:17,538 INFO success: novnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Tracing is disabled
Selenium Grid Node configuration:
[events]
publish = "tcp://selenium-hub:4442"
subscribe = "tcp://selenium-hub:4443"
 
[node]
grid-url = http://selenium-hub.seleniumgridpoc:4444
session-timeout = "300"
override-max-sessions = false
detect-drivers = false
drain-after-session-count = 0
max-sessions = 1
 
[[node.driver-configuration]]
display-name = "chrome"
stereotype = '{"browserName": "chrome", "browserVersion": "118.0", "platformName": "Linux"}'
max-sessions = 1
 
Starting Selenium Grid Node...
Dec 14, 2023 10:25:17 AM org.openqa.selenium.grid.Bootstrap createExtendedClassLoader
WARNING: Extension file or directory does not exist: /opt/selenium/selenium-http-jdk-client.jar
10:25:18.527 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
10:25:18.538 INFO [OpenTelemetryTracer.createTracer] - Using OpenTelemetry for tracing
10:25:18.935 INFO [UnboundZmqEventBus.<init>] - Connecting to tcp://selenium-hub:4442 and tcp://selenium-hub:4443
10:25:19.133 INFO [UnboundZmqEventBus.<init>] - Sockets created
10:25:20.136 INFO [UnboundZmqEventBus.<init>] - Event bus ready
10:25:20.320 INFO [NodeServer.createHandlers] - Reporting self as: http://10.244.4.8:5555
10:25:20.339 INFO [NodeOptions.getSessionFactories] - Detected 1 available processors
10:25:20.437 INFO [NodeOptions.report] - Adding chrome for {"browserName": "chrome","browserVersion": "118.0","platformName": "linux","se:noVncPort": 7900,"se:vncEnabled": true} 1 times
10:25:20.452 INFO [Node.<init>] - Binding additional locator mechanisms: relative
10:25:20.756 INFO [NodeServer$1.start] - Starting registration process for Node http://10.244.4.8:5555
10:25:20.758 INFO [NodeServer.execute] - Started Selenium node 4.14.1 (revision 03f8ede370): http://10.244.4.8:5555
10:25:20.777 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:25:30.781 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:25:40.782 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:25:50.785 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:26:00.787 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:26:10.789 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:26:20.794 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:26:30.796 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:26:40.798 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:26:50.800 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:27:00.802 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:27:10.804 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:27:20.761 INFO [NodeServer$1.lambda$start$1] - Sending registration event...

Operating System

AKS

Docker Selenium version (image tag)

4.14.1-20231025

Selenium Grid chart version (chart version)

0.23


@michaelmowry, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

@VietND96
Member

Hi @michaelmowry, can you run kubectl describe on the selenium-node-config ConfigMap to see what SE_NODE_GRID_URL is set to there?
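
For example, a minimal sketch (assuming the release is in the seleniumgridpoc namespace and the ConfigMap keeps its default name from values.yaml):

# describe the node ConfigMap
kubectl describe configmap selenium-node-config -n seleniumgridpoc
# or pull the value directly
kubectl get configmap selenium-node-config -n seleniumgridpoc \
  -o jsonpath='{.data.SE_NODE_GRID_URL}'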

@michaelmowry
Author

Thanks for the reply. SE_NODE_GRID_URL = http://selenium-hub.seleniumgridpoc:4444

@VietND96
Member

@michaelmowry, can you try enabling FINE logs in the Node to see what is going wrong behind "Sending registration event..."?

chromeNode:
  extraEnvironmentVariables:
    - name: SE_OPTS
      value: "--log-level FINE"

If there is no dependency holding you back, can you try the latest chart 0.26.3 and pass this config when installing the chart:
--set global.seleniumGrid.logLevel=FINE. It simply enables FINE logs for all components.
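
A sketch of what that upgrade could look like (assuming the chart repo is registered locally as docker-selenium and the release is named selenium-grid in the seleniumgridpoc namespace):

helm upgrade --install selenium-grid docker-selenium/selenium-grid \
  --version 0.26.3 \
  --namespace seleniumgridpoc \
  --set global.seleniumGrid.logLevel=FINE \
  -f values.yaml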

@michaelmowry
Author

@VietND96,

We upgraded to 0.26.3 and still get the same issue with the chrome node not connecting. The only items to note are:

  1. We disable basic auth.
  2. We use Istio for traffic control (see line 39 in values.yaml).
  3. isolateComponents = false, and we disable Edge, Firefox, video, and scaling just to work on connectivity with chrome nodes.
  4. We set a hostname on line 75 but disable ingress because we are using Istio. I don't think this is an issue, because we are able to access the Selenium Grid web console at http://seleniumgrid-sbx.company.com.
  5. We updated logging to FINEST.

The updated values files and logs are attached. We also validated connectivity between the chrome node and the hub via curl, and have attached the logs with the failed registration for the chrome node. We still get a timeout on "Sending registration event...". We can queue tests for execution, but they also time out due to no available chrome nodes. We have tried quite a few things but haven't been able to solve this; we would appreciate any ideas.

values.yaml.txt
values-istio.yaml.txt
chrome-node-argocd-logs.txt
chrome-node-curl-logs

@VietND96
Member

Honestly, I don't have much experience with Istio. Let me look around for any clues.
How about other kinds of service deployment: without Istio, with NodePort, or with Ingress?

@VietND96
Member

@michaelmowry, there is another ticket that mentioned the same problem during Node registration: #1645 (comment). A comment there said it can be resolved by disabling the Java OpenTelemetry feature on the Selenium process.
Can you try adding the configs below under chromeNode?

chromeNode:
  extraEnvironmentVariables:
    - name: SE_JAVA_OPTS
      value: "-Dotel.javaagent.enabled=false -Dotel.metrics.exporter=none -Dotel.sdk.disabled=true"

@michaelmowry
Author

@VietND96, thank you for your continued support. I tried adding the SE_JAVA_OPTS above and there is still no change in the connectivity issue. I will also look for a response from @eowoyn in the comment linked above.

@amardeep2006
Contributor

amardeep2006 commented Dec 22, 2023

@michaelmowry What role does Istio play in your Kubernetes cluster? Can it block traffic among pods within a Kubernetes namespace?
I faced a different issue of a similar nature due to a Calico network policy; Calico is zero-trust by default in my setup.
I had to apply the appropriate network policy so that the Node and Hub could talk to each other.

@michaelmowry
Author

Istio is a traffic manager within our cluster. It can block traffic within the namespace; however, we have it configured to allow all traffic within the namespace.

Calico is disabled in our namespace.

The chrome node and the hub run in separate pods and have different IPs. From the chrome node log snippet below, it appears that selenium-hub is reachable on 4442 and 4443, as the sockets are created. Can anyone tell us more about how the registration event works? What port does it occur on, and what endpoint does it use to register with the hub? It is strange that the 4442/4443 connection works but the registration does not, right?

10:15:27.108 INFO [UnboundZmqEventBus.<init>] - Connecting to tcp://selenium-hub:4442 and tcp://selenium-hub:4443
10:15:27.293 INFO [UnboundZmqEventBus.<init>] - Sockets created
10:15:28.303 INFO [UnboundZmqEventBus.<init>] - Event bus ready
10:15:28.514 INFO [NodeServer.createHandlers] - Reporting self as: http://10.244.3.8:5555/
10:15:28.585 INFO [NodeOptions.getSessionFactories] - Detected 1 available processors
10:15:28.710 INFO [NodeOptions.report] - Adding chrome for {"browserName": "chrome","browserVersion": "120.0","goog:chromeOptions": {"binary": "\u002fusr\u002fbin\u002fgoogle-chrome"},"platformName": "linux","se:noVncPort": 7900,"se:vncEnabled": true} 1 times
10:15:28.796 INFO [Node.<init>] - Binding additional locator mechanisms: relative
10:15:29.214 INFO [NodeServer$1.start] - Starting registration process for Node http://10.244.3.8:5555/
10:15:29.216 INFO [NodeServer.execute] - Started Selenium node 4.16.1 (revision 9b4c83354e): http://10.244.3.8:5555/
10:15:29.280 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:15:39.283 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
10:15:49.289 INFO [NodeServer$1.lambda$start$1] - Sending registration event...

@amardeep2006
Contributor

amardeep2006 commented Jan 2, 2024

Not specific to Kubernetes, but this link may be helpful on the ports used during registration: https://www.selenium.dev/documentation/grid/getting_started/#node-and-hub-on-different-machines
A few things I would try in your situation, assuming you are using hub mode (see the sketch after this list):

  1. Try enabling DEBUG mode via the Helm chart and see if it prints more details around registration.
  2. exec into the hub/node containers and check whether the pods can connect on the desired ports via the Kubernetes services.
  3. Check the Istio logs/UI. Does Istio offer some interactive UI where you can see the traffic?
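
A minimal sketch for item 2 (the pod name is hypothetical; service names match the values pasted above):

# open a shell inside the chrome node pod
kubectl exec -n seleniumgridpoc -it <chrome-node-pod> -- bash

# HTTP to the hub API
curl -sf http://selenium-hub:4444/status

# raw TCP to the event bus ports (bash built-in, since nc may not be installed)
(exec 3<>/dev/tcp/selenium-hub/4442) && echo "4442 reachable"
(exec 3<>/dev/tcp/selenium-hub/4443) && echo "4443 reachable"

Worth noting: the registration event travels over the event bus (4442/4443), and the distributor then verifies the node by calling back to the node's advertised HTTP address (port 5555 here), so the hub-to-node path deserves the same probing.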

@Thomas-Personal

Thomas-Personal commented Feb 7, 2024

Hi, I am continuing Michael's effort from our team. The issue is still not resolved. I tried disabling the OpenTelemetry feature as mentioned in the comment https://github.com//issues/1645#issuecomment-1851895016, but it didn't work out.

I am also attaching the responses from the hub and nodes when curling one from the other. Please let me know if it rings a bell about a possible cause.

Hub to Node: (screenshot: hub to node)

Node to Hub: (screenshot: node to hub)

@VietND96
Member

VietND96 commented Feb 8, 2024

The Node also needs to reach the EventBus (ports 4442 and 4443) inside the Hub; that communication is done via TCP. Can you check whether that is enabled?

@Thomas-Personal

Hi everyone, I am able to register the nodes by passing environment variables with the Pod names.

I have another question, on HTTPS calls inside the nodes.

When I trigger a test using my Selenium Grid on AKS, by default the web pages under test are routed to http:// instead of https://.

Can you please help me understand the root cause of this issue?

@VietND96
Member

Hi @Thomas-Personal, may I know the details of passing the environment variables with Pod names? Which env vars, and which component do they belong to? With Istio (service mesh), does it not work when Service names are used?

@VietND96
Member

I just tried to understand Istio and service meshes; it looks like there is one proxy sidecar per pod, so I guess that is the reason Pod names are needed for communication between components.
Currently, by default, the chart uses Service names only. So I am thinking about how to extend the support, so that we can simplify this kind of deployment.

@Thomas-Personal

Hi @VietND96, we have updated the service names in the node env. By default, it was using the Pod IP to register the nodes; when we passed the service names, the nodes got registered.
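
In values.yaml terms, a minimal sketch of that override (the service name is hypothetical; use the one the chart actually created):

chromeNode:
  extraEnvironmentVariables:
    - name: SE_NODE_HOST
      value: "selenium-chrome-node"  # hypothetical node Service name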

@Thomas-Personal

@VietND96, can you please let me know the release from which the service names are used by default? Passing the service names in the extra env variables causes some issues during autoscaled jobs. I am using 0.26.3, but it seems to have taken the Pod IP for registration.

@VietND96
Member

VietND96 commented Apr 3, 2024

Hi @Thomas-Personal, you can check chart version 0.28.0 onwards.

@Thomas-Personal

Thank you @VietND96. I have issues with autoscaling. When the queue size is 2, two scaled jobs are triggered for the chrome node, but only one node was successful: one test case was picked up and run, and the other test case failed. I could see only one node in the UI; the other node also says its registration was successful.

But I am not sure what the error was. Is it because both scaled jobs use the same port? Do we need to change any configuration so that both queued test cases are picked up successfully?

@Thomas-Personal

Hi @VietND96, in the Istio mesh, the Pod-IP-based node registration seems to be causing the problem. So I added the below in helpers.tpl:

- name: SE_NODE_HOST
  value: {{ .name | quote }}

Node registration is successful after including this part, but I couldn't get more than one node registered. Could you please help me with this issue?
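
An alternative sketch that avoids editing helpers.tpl would be to inject the pod name via the Kubernetes downward API (an assumption on my side, not the chart's built-in behavior). Note that a bare pod name only resolves over DNS when the pods are backed by a headless Service, which lines up with the clusterIP: None finding later in this thread:

chromeNode:
  extraEnvironmentVariables:
    - name: SE_NODE_HOST
      valueFrom:
        fieldRef:
          fieldPath: metadata.name  # the pod's own name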

@VietND96
Member

VietND96 commented Apr 8, 2024

@Thomas-Personal, I have not tried this approach yet; let me look for any clue and get back to you.

@Thomas-Personal

Thank you so much. Please let me know the results once you have tried it. I am trying to implement it with an Istio mesh for the organization that I work for.

@Thomas-Personal

Hi @VietND96, I have set clusterIP: None in the node service, which makes the service headless (no cluster IP), and the nodes started registering without issues.
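
For reference, a headless node Service would look roughly like this (name, selector, and labels are hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: selenium-node-chrome
spec:
  clusterIP: None          # headless: DNS resolves directly to the pod IPs
  selector:
    app: selenium-node-chrome
  ports:
    - name: node-port
      port: 5555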

I have tried with the KEDA autoscaler. I am facing two issues:

  1. After completion, the sidecar proxy (istio-proxy) is not terminated, because of which the pod continues to exist.
  2. If test cases time out before the pod spins up, the jobs do not terminate the container.

Please help me with the above two issues (a possible workaround for the first is sketched below).
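
For issue 1, a common workaround for Jobs running in an Istio mesh (not specific to this chart) is to have the main container ask the sidecar to exit once the workload is done, via the pilot-agent endpoint, assuming Istio's default status port 15020:

# last lines of the node container's command in the Job spec
/opt/bin/entry_point.sh                                # run the node until it drains and exits
curl -sf -X POST http://127.0.0.1:15020/quitquitquit   # then ask istio-proxy to shut down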

@kakliniew

kakliniew commented Aug 7, 2024

@VietND96 any updates on this?

@VietND96
Member

Service resource creation is disabled by default for Nodes.
In the Node, SE_NODE_HOST refers to status.podIP.
For redundant scaled-up pods that keep running, you can refer to KEDA with some fixes in the scaler logic: https://github.com/SeleniumHQ/docker-selenium/tree/trunk/.keda
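
For completeness, the trigger side of a KEDA ScaledJob for this grid would look roughly like the following (resource names and the pod template are hypothetical; the selenium-grid trigger type is a standard KEDA scaler):

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: selenium-chrome-scaledjob
spec:
  pollingInterval: 10
  scalingStrategy:
    strategy: accurate
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: selenium-node-chrome
            image: selenium/node-chrome:4.14.1-20231025
  triggers:
    - type: selenium-grid
      metadata:
        url: 'http://selenium-hub.seleniumgridpoc:4444/graphql'
        browserName: 'chrome'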


This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 14, 2024