[🐛 Bug]: Tests gets pointed to Terminating PODs in K8 #2155
Comments
@kherath17, thank you for creating this issue. We will troubleshoot it as soon as we can.

Info for maintainers: triage this issue by using labels. If information is missing, add a helpful comment; if the issue is a question, is valid but cannot be troubleshot right now, or requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable label. Thank you!
In this case, how do you identify which situation applies: a new session being assigned to a Terminating pod, a session that is in progress and being waited on for completion, or another possible case where a new session and a pod scale-down happen in the same second?
@VietND96 that's the problem: I have not set any explicit logic for that. I've let Selenium Grid handle how it assigns sessions to available free nodes, but it routes the new session to the terminating pod, which deletes the node halfway through the test.
I think it would be different. Following https://www.selenium.dev/documentation/grid/advanced_features/endpoints/#drain-node, a draining node will not be assigned any new session. I think it is tested enough via unit tests in the upstream project.

Your situation here could be: in your test a session was created and used across tests (or it served a long-running execution). At some point, when the queue = 0 (all requests were served), the Scaler changed the number of replicas of the Node deployment, and Node pods were randomly selected to terminate. A pod gets stuck at Terminating because of the preStop script, which waits so that any session in progress can finish gracefully. How long a pod stays in Terminating status depends on how long that preStop hook takes (bounded by the pod's terminationGracePeriodSeconds).
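For reference, a minimal sketch (in the same Python-dict style as the pod file in this issue) of the termination settings involved. The terminationGracePeriodSeconds value and the pod_spec_fragment name are illustrative assumptions, not values taken from this setup.

# Illustrative fragment only: terminationGracePeriodSeconds caps how long the pod
# may remain in Terminating while the preStop drain script waits for an
# in-progress session to finish. 3600 is an example value, not a recommendation.
pod_spec_fragment = {
    "spec": {
        "terminationGracePeriodSeconds": 3600,  # upper bound for preStop + shutdown
        "containers": [
            {
                "name": "selenium-node-edge",
                "lifecycle": {
                    "preStop": {
                        "exec": {
                            # same drain-and-wait script as in the pod file below
                            "command": ["/bin/bash", "-c", "..."]
                        }
                    }
                }
            }
        ]
    }
}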
@VietND96 thanks for the insights, but to provide more info: our scaling down gets triggered when a user calls the
FYI: This issue does not come into play if I add a thread sleep of around 2000 ms within the
Note -
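A minimal sketch of that workaround, assuming the delay sits between quitting the session and triggering the custom scaler's scale-down; the function names, Grid URL parameter, and browser options here are hypothetical placeholders, since the exact call site is not shown above.

import time

from selenium import webdriver

def trigger_scale_down() -> None:
    # Placeholder for the custom scaler's scale-down call used in this setup.
    print("scale-down requested")

def run_test_and_release(grid_url: str) -> None:
    options = webdriver.EdgeOptions()
    driver = webdriver.Remote(command_executor=grid_url, options=options)
    try:
        driver.get("https://example.com")
        # ... test steps ...
    finally:
        driver.quit()
    time.sleep(2)  # ~2000 ms pause so the Grid settles before pods are removed
    trigger_scale_down()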
So in your deployment there could be a case where closing a session, creating a new session, and the Scaler scaling down Node pods all happen in the same second; it seems unpredictable.
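One way to narrow that race, sketched below under the assumption that the custom scaler can query the Grid before removing Node pods: check the hub's /status endpoint for active sessions and skip the scale-down while any slot is occupied. The hub URL and the scale_down_node_pods name are assumptions, not part of the setup described in this issue.

import requests

GRID_STATUS_URL = "http://selenium-hub:4444/status"  # assumed in-cluster hub address

def grid_is_idle() -> bool:
    # The Grid /status payload lists every node and its slots; a slot with a
    # non-null "session" is currently serving a test.
    status = requests.get(GRID_STATUS_URL, timeout=5).json()["value"]
    for node in status.get("nodes", []):
        for slot in node.get("slots", []):
            if slot.get("session"):
                return False
    return True

def scale_down_node_pods() -> None:
    # Placeholder for the custom scaler's actual scale-down logic.
    print("scaling down idle node pods")

if grid_is_idle():
    scale_down_node_pods()
else:
    print("sessions still in progress; skipping scale-down")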
Closing this to investigate a solution more aligned with the above comment. Thanks @VietND96
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
What happened?
Background:
I have configured a Selenium Grid setup in Kubernetes with all relevant services. Through a custom scaler I built, it understands incoming requests, creates browser pods on demand, and brings the browsers down after test execution. This is further enhanced with a preStop step in the pod lifecycle for node draining.
Issue:
When a pod deletion is triggered after test execution completes, it runs the node drain command defined in the pod lifecycle, which keeps the pod in 'Terminating' status for a few seconds. Because of this, the next consecutive test gets pointed to that terminating pod and starts its execution there; the pod then gets deleted halfway through the test, resulting in a session id unknown exception.
Sample POD file:
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"name": pod_name,
"labels": {
"name": pod_name,
"app": "selenium-node-edge"
}
},
"spec": {
"volumes": [
{
"name": "dshm",
"emptyDir": {
"medium": "Memory"
}
}
],
"containers": [
{
"name": "selenium-node-edge",
"image": "selenium/node-edge:latest",
"imagePullPolicy": "IfNotPresent", #Newly Added
"ports": [
{
"containerPort": 5555
}
],
"volumeMounts": [
{
"mountPath": "/dev/shm",
"name": "dshm"
}
],
"env": [
{
"name": "SE_EVENT_BUS_HOST",
"value": "selenium-hub"
},
{
"name": "SE_EVENT_BUS_SUBSCRIBE_PORT",
"value": "4443"
},
{
"name": "SE_EVENT_BUS_PUBLISH_PORT",
"value": "4442"
},
{
"name": "SE_NODE_MAX_SESSIONS",
"value": "1"
},
{
"name": "SE_NODE_GRID_URL",
"value": "https://test.cloud.test.net/qlabv2"
}
],
"resources": {
"requests": { #Newly Added
"memory": "1000Mi",
"cpu": ".1"
},
"limits": {
"memory": "1000Mi",
"cpu": ".2" #0.5
}
},
"lifecycle": {
"preStop": {
"exec": {
"command": [
"/bin/bash",
"-c",
'if [ ! -z "${SE_REGISTRATION_SECRET}" ]; then HEADERS="X-REGISTRATION-SECRET: ${SE_REGISTRATION_SECRET}"; else HEADERS="X-REGISTRATION-SECRET;"; fi; curl -k -X POST http://127.0.0.1:5555/se/grid/node/drain --header "${HEADERS}"; while curl -sfk http://127.0.0.1:5555/status; do sleep 1; done;'
]
}
}
}
}
]
}
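For completeness, a hypothetical sketch of how a pod dict like the one above might be submitted, assuming the custom scaler uses the official kubernetes Python client; the function name and namespace are illustrative assumptions, not details from this setup.

from kubernetes import client, config

def create_node_pod(pod_manifest: dict, namespace: str = "default") -> None:
    # Load in-cluster credentials (use config.load_kube_config() when running
    # outside the cluster) and create the browser node pod on demand.
    config.load_incluster_config()
    api = client.CoreV1Api()
    api.create_namespaced_pod(namespace=namespace, body=pod_manifest)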
Command used to start Selenium Grid with Docker (or Kubernetes)
Relevant log output
Operating System
Kubernetes - EKS
Docker Selenium version (image tag)
4.17.0
Selenium Grid chart version (chart version)
No response