[🐛 Bug]: Nodes Disconnecting from Hub after AKS Deployment with Helm Chart #2065
Comments
@michaelmowry, thank you for creating this issue. We will troubleshoot it as soon as we can. Info for maintainers: Triage this issue by using labels.
If information is missing, add a helpful comment and then
If the issue is a question, add the
If the issue is valid but there is no time to troubleshoot it, consider adding the
If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C),
add the applicable
After troubleshooting the issue, please add the Thank you! |
Hi @michaelmowry, can you try to |
Thanks for the reply. SE_NODE_GRID_URL = http://selenium-hub.seleniumgridpoc:4444 |
@michaelmowry, can you try to enable
chromeNode:
  extraEnvironmentVariables:
    - name: SE_OPTS
      value: "--log-level FINE"
If there is no dependency, can you try the latest chart |
We upgraded to 0.26.3 and still get the same issue with the chrome node not connecting. The only items to note are:
The updated values files and logs are attached. We also validated connectivity between the chrome node and the hub via curl and have attached the logs with the failed registration for the chrome node. We still get a timeout on "Sending registration event...". We can queue tests for execution, but they also time out because no chrome nodes are available. We have tried quite a few things but haven't been able to solve this and would appreciate any ideas. values.yaml.txt |
Honestly, I don't have much experience with Istio. Let me look around to see any clue. |
@michaelmowry, there is another ticket that also mentions the same problem when the Node registers - #1645 (comment). A comment there mentioned that it can be resolved by disabling the Java OpenTelemetry feature on the Selenium process.
chromeNode:
  extraEnvironmentVariables:
    - name: SE_JAVA_OPTS
      value: "-Dotel.javaagent.enabled=false -Dotel.metrics.exporter=none -Dotel.sdk.disabled=true" |
@michaelmowry What role does Istio play in your Kubernetes cluster? Can it block traffic among pods within a Kubernetes namespace? |
Istio is a traffic manager within our cluster. It can block traffic within the namespace; however, we have it configured to allow all traffic within the namespace. Calico is disabled in our namespace. The chrome node and hub run on separate pods and have different IPs. From the chrome node log snippet below, it appears that selenium-hub is accessible on 4442 and 4443, as the sockets are created. Can anyone tell us more about how the registration event works? What port does it occur on and what endpoint does it use to register with the hub? It is strange that the 4442/4443 connection works but the registration does not, right?
|
Not specific to kubernetes but this link may be helpful on ports used in registration https://www.selenium.dev/documentation/grid/getting_started/#node-and-hub-on-different-machines
|
Hi, I am continuing Michael's effort from our team. The issue is still not resolved. I tried disabling the OpenTelemetry feature as mentioned in the comment (https://github.com//issues/1645#issuecomment-1851895016), but it didn't work. I am also attaching the responses from the hub and nodes when running curl from one to the other. Please let me know if this rings a bell as to a possible cause. Hub to Node: Node to Hub: |
Node also needs to reach EventBus (port 4442, 4443) inside the Hub, that communication is done via TCP. Can you check if that is enabled? |
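The TCP check suggested above could be sketched roughly as follows, run from inside the node pod. This is a minimal sketch, not something posted in the thread: the function names `can_connect` and `check_hub_ports` are hypothetical, and the hostname in the usage comment is only assumed from the SE_NODE_GRID_URL quoted earlier. Ports 4442 and 4443 are the EventBus ports named in the comment, and 4444 is the hub port from that URL.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_hub_ports(host, ports=(4442, 4443, 4444), timeout=3.0):
    """Map each port to whether a TCP connection to it succeeded."""
    return {port: can_connect(host, port, timeout) for port in ports}


# Example call (hostname assumed from the SE_NODE_GRID_URL quoted in this thread):
#   check_hub_ports("selenium-hub.seleniumgridpoc")
```

A successful connect only shows that something accepted the TCP handshake on that port. It does not prove that EventBus messages actually reach the hub, which would be consistent with the earlier observation that the 4442/4443 sockets are created while registration still times out.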
Hi everyone, I am able to register the nodes by passing the Pod names as environment variables. I have another question about https:// calls inside nodes. When I trigger a test using my Selenium Grid on AKS, the web pages under test are by default routed to http:// instead of https://. Can you please help me understand the root cause of this issue? |
Hi @Thomas-Personal, may I know the details on |
I just tried to understand Istio and service mesh. It looks like there is one proxy sidecar per pod, so I guess that's the reason Pod names are needed for communication between components. |
Hi @VietND96, we have updated the service names in the node env. By default, it was using the Pod IP to register the nodes. When we passed the service names, the node got registered. |
@VietND96, can you please let me know the release from which the service names are used by default? Passing the service names in the extra env variables is causing some issues during autoscaled jobs. I am using 0.26.3, but it seems to have taken the Pod IP for registration. |
Hi @Thomas-Personal, you can check the chart version |
Thank you @VietND96. I have issues with autoscaling. When the queue size is 2, two scaled jobs are triggered for the chrome node. But only one node was successful: one test case was picked up and run, and the other test case failed. I could see only one node in the UI. The other node also reports that node registration was successful, but I am not sure what the error was. Is it because both scaled jobs are using the same port? Do we need to change any configuration so that both queued test cases are picked up successfully? |
Hi @VietND96, in the Istio mesh, the Pod IP based node registration seems to be causing the problem. So I added the below in the helpers.tpl. Node registration is successful after including this part, but I couldn't get more than one node registered. Could you please help me with this issue? |
@Thomas-Personal, I have not tried this way yet, let me try to see any clue and get back to you. |
Thank you so much. Please let me know the results once you have tried it. I am trying to implement it with the Istio mesh for the organization that I work for. |
Hi @VietND96, I have set clusterIP: None in the node service, which made the service headless (without a cluster IP), and the node started registering without issues. I have tried this with the KEDA autoscaler. I am facing two issues:
Please help me with the above two issues |
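One way to see the effect of the headless service described above is to look at what the service name resolves to from inside a pod. In Kubernetes, a regular Service name resolves to a single cluster IP, while a headless Service (clusterIP: None) resolves to the IPs of the pods behind it. The sketch below is only an illustration under that assumption: the helper name `resolve_ips` is hypothetical, and the service hostname you pass in depends on your own release and namespace.

```python
import socket
from typing import List


def resolve_ips(hostname: str) -> List[str]:
    """Return the sorted, de-duplicated IP addresses that hostname resolves to."""
    # proto=IPPROTO_TCP keeps getaddrinfo from returning one entry per socket type.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_ips` with the node service name before and after the change should show whether the name now maps to individual pod IPs. With autoscaled jobs, more than one address would be expected once several node pods are running.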
@VietND96 any updates on this? |
Service resource creation is disabled by default for Nodes |
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
What happened?
Our team has deployed Selenium Grid to AKS using the Helm templates in the repository. Our problem is that the nodes connect to the hub very briefly and are visible in the UI, then disappear and do not show up again. In the logs below we can see that the registration event between the node and hub is not successful. We are attempting to use a basic hub/node architecture with isolateComponents=false. We have disabled ingress and basic auth and are using Istio. We are able to access the Selenium Grid UI on the Hub and we are able to queue tests, but they time out as no nodes are available for processing. Thanks in advance for any help on resolving this.
Command used to start Selenium Grid with Docker (or Kubernetes)
Relevant log output
Operating System
AKS
Docker Selenium version (image tag)
4.14.1-20231025
Selenium Grid chart version (chart version)
0.23