[🐛 Bug]: Selenium Grid, request stuck in retry loop #10930
Comments
@eete22, thank you for creating this issue. We will troubleshoot it as soon as we can.
This looks similar to #9528, which we are still investigating.
Can you please try the new Java 11 HTTP client and let us know? https://www.selenium.dev/blog/2022/using-java11-httpclient/
We updated to version 4.5.2 of the docker image which looks like it is using the Java 11 HTTP client by default.
What happened?
Everything was working fine with Selenium Grid 3, but when we switched over to Selenium Grid 4 we started experiencing steps in our test cases sometimes taking ~18 minutes longer to complete. Looking at the logs, it seems that a request gets stuck in a retry loop. We have observed it for several different types of requests (e.g. /execute/sync, /frame/, /se/log).
We are running Selenium Grid with Docker/Podman using the following compose file:
I'll try to explain the scenario we have observed in the logs, starting from the request before the one that gets stuck.
1. Hub: Sends previous request.
2. Node: Previous request gets a channel.
3. Node: Previous request is sent and receives response.
4. Node: Channel for previous request is offered back to pool.
5. Hub: Sends new request.
6. Node: New request is processed, but it freezes after the "injecting" log (where it would normally say "Using __ channel").
7. Node: About a minute later, the idleChannelDetector finds that the channel used by the previous request is idle and closes it.
8. Hub: About two minutes after that, the hub gets a server error and retries (180 s default timeout).
9. Node: the frozen request starts running and gets a channel, then gets stuck again.
10. Node: The new retry request gets frozen after the "injecting" log.
11. Node: Five minutes after the call in step (9), the node gets a timeout exception for the first frozen request.
And it continues like this: the hub retries every 3 minutes, and each request gets stuck in the node after the "injecting" log and is released when a new retry request is made. The requests that are started in the node get timeout exceptions after 5 minutes. Meanwhile, other requests for other sessions are running in parallel.
After the hub has retried 5 times, the client starts making other calls to the hub with the same session. After that everything seems to work again.
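As a rough sanity check on the ~18 minute delay, the timings above can be added up. This is only a sketch: it assumes that the initial attempt and each of the 5 retries block for the full 180 s hub timeout, which the logs suggest but do not prove. The function names are made up for illustration and are not part of Selenium.

```python
# Timings taken from the description above; how they combine is an assumption.
HUB_TIMEOUT_S = 180  # hub gives up on the node request and retries (step 8)
HUB_RETRIES = 5      # retries observed before the session started working again


def stalled_seconds(timeout_s, retries):
    """Total time blocked if the first attempt and every retry time out."""
    attempts = 1 + retries
    return attempts * timeout_s


def attempt_offsets(timeout_s, retries):
    """Seconds after the first attempt at which the hub sends each attempt."""
    return [i * timeout_s for i in range(1 + retries)]


total = stalled_seconds(HUB_TIMEOUT_S, HUB_RETRIES)
print(f"stalled for about {total / 60:.0f} minutes")
for offset in attempt_offsets(HUB_TIMEOUT_S, HUB_RETRIES):
    print(f"hub sends an attempt at +{offset} s")
```

Under that assumption, six attempts of 180 s each add up to 1080 s, which is in line with the roughly 18 minutes of extra time we see per affected step.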
How can we reproduce the issue?
Relevant log output
Logs in description above.
Operating System
Docker
Selenium version
C# 4.2.0
What are the browser(s) and version(s) where you see this issue?
selenium/node-chrome:4.2.0-20220527 but also earlier versions of Selenium Grid 4 image
What are the browser driver(s) and version(s) where you see this issue?
selenium/node-chrome:4.2.0-20220527 but also earlier versions of Selenium Grid 4 image
Are you using Selenium Grid?
4.2.0 (but issue also occurred with earlier versions of Selenium Grid 4)