java.util.concurrent.TimeoutException thrown at random netty read timeouts with RemoteWebDriver #9528
It is very likely that old Selenium versions had a (much) longer timeout. You can configure the timeout in the client. It can also be something related to the browser... Were you using the same browser version and browser driver version with the old Selenium versions? All in all, we need help to reproduce this... You can have a look at the Node logs and enable more verbose logging in GeckoDriver.
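This snippet is not part of the original thread; it is a minimal sketch of how the client read timeout can be raised in recent Selenium 4 Java releases, assuming ClientConfig and the RemoteWebDriver builder are available (the exact API has shifted across 4.x versions, and the grid address and timeout value below are placeholders):

```java
import java.net.URL;
import java.time.Duration;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxOptions;
import org.openqa.selenium.remote.RemoteWebDriver;
import org.openqa.selenium.remote.http.ClientConfig;

public class LongReadTimeoutExample {
    public static void main(String[] args) throws Exception {
        // Raise the read timeout well above the 3-minute default discussed in this issue.
        ClientConfig config = ClientConfig.defaultConfig()
                .readTimeout(Duration.ofMinutes(10));

        WebDriver driver = RemoteWebDriver.builder()
                .oneOf(new FirefoxOptions())
                .address(new URL("http://localhost:4444"))  // placeholder grid address
                .config(config)
                .build();
        try {
            driver.get("https://example.com");
        } finally {
            driver.quit();
        }
    }
}
```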
I don't think it worked before just because the old Selenium had a longer timeout, because the commands that time out normally execute in a fraction of a second when the timeout does not happen. And it happens for commands that have no justification to take long; they should just succeed or return an error immediately, for example Alert.accept() or WebDriver.switchTo().defaultContent(). But if the old Selenium had some type of auto-retry on timeout, then it could be, because as detailed in the description, when I retried a timed-out command it executed fine. If I don't make any progress I will run a test with an increased timeout just to be sure. In the old Selenium the browser came from the docker image. I will try to find any relevant log in the node logs and return with it if I find something.
I found the following log in the browser container log that seems related to one of the java.util.concurrent.TimeoutExceptions I get at the Selenium RemoteWebDriver:
There is also a message in the log.
I tried executing with trace logging enabled, but there were no extra entries for the timed-out request, only for the requests executed successfully before and after it.
I understand what you mean; it is not clear why the timeout happens, but what the Grid is doing is simply relaying the command to GeckoDriver. I would try to run the same tests with Chrome or Edge and see what happens, to understand if the issue is GeckoDriver, something with the tests, or something with the Grid. Also, what type of load does the machine have when this happens? How many tests are executed in parallel? Does this stop happening when you run them sequentially? It is quite likely that new versions of Firefox need more resources, plus there is a new process in the middle (GeckoDriver).
The tests that use Selenium run sequentially, but there were other tests in other containers running simultaneously. So I tried to run the Selenium tests alone to make sure the others weren't affecting the performance, but the result was the same.
Ok, I understand. At this point I am out of ideas for things to suggest. I think we would need some sort of way to reproduce the issue, otherwise this will only turn into a conversation, which we can have in the Slack channel.
I have some information that might be useful: when I use the older 3.141.59 docker image, the timeout does not happen.
I see, we would appreciate a test that can be used to reproduce the issue, even if that means we need to run the test 100 times until we are able to reproduce it.
I was able to reproduce with the following code:
The test.html code can be found here: https://jsfiddle.net/vmtpf35o/ This causes the TimeoutException copied in the attached file stack1.txt. If the 2 commented lines are uncommented, sometimes it fails with the TimeoutException of stack1.txt, and sometimes it fails with a different error (the one in stack2.txt). In my test suite I found these random stack2-like errors, and they also stopped happening when I switched to the 3.141.59-20210422 container. I don't know if it is somehow related to the TimeoutException.
Hi @diemol, the get_window_handles problem we have been discussing actually has the same stack trace as this issue: it's the same TimeoutException. The client calls NettyMessages.toNettyRequest, and in that method we see special treatment for POSTs, which usually work for me, but not for GETs, which fail. GETs not having the proper info passed could be the source of the "random" failures in @rcesarlumis's issue, and maybe why all these GET-related requests are reported to hang: teodesian/Selenium-Remote-Driver#452 (comment) The git blame there also mentions that it's a draft, so maybe GET handling was supposed to be added, but forgotten?
@rcesarlumis, is there something special I need to set up to reproduce the issue? I started a docker container, standalone, like this:
and then I used your code to create this maven project, which I have executed 5 times without bumping into the issue. The only difference is the Grid version. EDIT: I have been trying with Chrome; I will try with Firefox and report back (but I believe the browser made no difference).
I tried running your project now, using the same docker run command you stated in your comment, and it reproduced. I ran it three times; it reproduced when
What is your host OS?
I completely missed that the options above were Docker options, trying again.
My local OS, where I ran your project, is Windows 10 running Docker Toolbox (https://docs.docker.com/toolbox/), which runs docker inside a VirtualBox machine with its default boot2docker Linux. That VirtualBox machine is configured with 4GB RAM. It reproduced both with and without these parameters. My full CI test suite, which was also getting the timeout, runs on CentOS 7.8.
Thank you for sharing the details and providing very quick feedback on this issue.
I set up the Grid in standalone mode using the Beta-4 jar. I was able to see the timeout.
I was able to reproduce it a few times with the code shared by @pujagani above. However, I noticed that I was able to reproduce it while my laptop was running low on resources (Xcode was being upgraded), so the resource constraint might be a reason for this to happen. I will generate a jar in a few moments and share a URL here so you can all grab it and give us your feedback.
This helps #9528 because Netty tends to time out when a GET has a `content-length` header (which is seen as a bad practice); however, the server should not time out.
@rcesarlumis could you please try with the most recent pre-release?
@diemol Hi, this is the error on the selenium client - client.log. And this is the selenium server docker log with trace enabled - docker.log. The trace shows the call to delete cookies, but does not show the call that timed out. I used the docker image
@HemanthRajaSudhakar I would need a test to reproduce the issue. It seems the Java 11 HTTP client is working for most users, so we need a way to reproduce the problem with the new client.
@HemanthRajaSudhakar
Hi. In my case it looks like it helped. I had to put the entry System.setProperty("webdriver.http.factory", "jdk-http-client"); directly in the test method (where I have the @Test annotation). Thank you very much.
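This was not part of the comment above; it is a minimal JUnit-style sketch of placing the property inside the test method as described, with the class and test names invented for illustration (the selenium-http-jdk-client artifact mentioned later in this thread has to be on the classpath for the property to take effect):

```java
import java.net.URL;

import org.junit.jupiter.api.Test;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxOptions;
import org.openqa.selenium.remote.RemoteWebDriver;

class JdkHttpClientFactoryTest {  // hypothetical test class

    @Test
    void opensPageThroughTheGrid() throws Exception {
        // Select the JDK HTTP client before the RemoteWebDriver is created.
        System.setProperty("webdriver.http.factory", "jdk-http-client");

        WebDriver driver = new RemoteWebDriver(
                new URL("http://localhost:4444"), new FirefoxOptions());  // placeholder grid URL
        try {
            driver.get("https://example.com");
        } finally {
            driver.quit();
        }
    }
}
```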
I'd appreciate it if someone could provide some clarification on the documentation provided at https://www.selenium.dev/blog/2022/using-java11-httpclient/ The documentation states that if you're using Selenium Grid, you should download the client and set it with the --ext flag, but it only covers the standalone grid mode. If running in fully distributed mode (router, distributor, queue, map as separate components), which components need to be set? If running the new client on the Grid, does the system property still need to be set and the pom for the tests updated? Is there any definitive way to check whether tests running on the grid are using the Java 11+ HTTP client as opposed to the asynchttpclient?
@diemol do we need to add something to grid documentation for this?
I added a note to the blog post. In short, if you want to use the new HTTP client, it needs to be updated everywhere (tests on the client side, and each Grid component).
Do we have anything in our docs about it, though? The blog post is good for explaining the *why*, but shouldn't there be a straightforward *what* in the docs?
We faced a similar issue on Selenium 3.x. PS: Upgrading Selenium is not an option for us (and probably for others), as the latest stable Spring Boot release still manages that older Selenium version.
I am a noob when it comes to selenium, so I figured I should start there. I am using selenium grid, and the documentation says: "If you are using the Hub/Node(s) mode or the Distributed mode, setting the -Dwebdriver.http.factory=jdk-http-client and --ext flags needs to be done for all components." But I am not sure from the documentation whether it is possible to set these options in the available python setup. Any pointers on how to check which HTTP client tests running on the grid are using (seconding @efranken's question) and how to switch to the Java 11 HTTP client would be greatly appreciated!
@taylorpaul,
@tfactor2 yes, 3.x has a completely different HTTP client. We only support the latest version of Selenium, so we can't do anything about old versions.
Is someone still having this issue after moving to the new HTTP client?
Yes, I tried out the newest grid version using docker. It's still pretty unstable: after a certain time, the Chrome nodes stop reacting, and that then appears to take down the whole grid, with timeouts and weird error messages. Unfortunately, I haven't been able to get a minimal version to properly recreate this. I have found that restarting Chrome nodes by hand when they get a timeout keeps the error at bay, and I might try automating that as a workaround. I'm still in the analyzing stage and will post if I find a solution or minimal logfiles.
@b-rogowski that could be anything and not exactly this issue. Looking forward to hearing about your analysis.
@diemol I did not encounter the problems (java.util.concurrent.TimeoutException and java.net.http.HttpTimeoutException) during my last tests. I am waiting for 4.6.0 to confirm that it's ok, because currently my tests are interfered with by "Unable to parse" or "Failed to decode request as JSON" errors.
It seems that starting all the nodes in headless mode and then specifying that as an option in the test browser as well fixes our problem. Maybe anyone else who gets weird Chrome crashes could try that as a fix.
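Not from the original comment; a minimal sketch of what passing the same headless option on the test side could look like, assuming Chrome nodes and the standard ChromeOptions API (the grid URL is a placeholder):

```java
import java.net.URL;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.RemoteWebDriver;

public class HeadlessRemoteChrome {
    public static void main(String[] args) throws Exception {
        ChromeOptions options = new ChromeOptions();
        // Match the headless mode the nodes run with ("--headless" on older Chrome versions).
        options.addArguments("--headless=new");

        WebDriver driver = new RemoteWebDriver(new URL("http://localhost:4444"), options);
        try {
            driver.get("https://example.com");
        } finally {
            driver.quit();
        }
    }
}
```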
@diemol: I can't see the pre-release you mentioned. Thank you.
The issue definitely still exists with selenium v4.5.0 (jdk http client) and the docker selenium 4.5.0 hub. The exception now shows up as a java.net.http.HttpTimeoutException. Also, it is probably a good idea not to discuss other issues on this thread.
@ApexK the docker selenium 4.5.0 hub is not using the new Java 11 HTTP client; that is only available in 4.5.3, which is why you might still be bumping into the issue. Can you use 4.5.3 and let us know?
Thank you for the feedback. I will close this issue; if something related comes up, please open a new one so we can address each variant individually.
Hi @therealdjryan / all, I am getting java.lang.RuntimeException: org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Possible causes are invalid address of the remote server or browser start-up failure. I tried the below after installing the latest distributed grid version: include selenium-http-jdk-client; with maven you would do this:
I don't know if your configuration is similar to mine in any way, but regardless, this might give you an idea. I had reassigned System.out and was capturing it, looking for information in the console output, but I forgot to restore the system output channel after finding what I wanted. I had also turned on debug/--verbose, so I was getting a lot of output from everything, especially netty, which of course is used by Selenium. So when I went to create a new browser or do a get of a URL, the process would deadlock when the System.out buffer filled up. Once I added a line to restore the console output channel after finding what I was looking for, the deadlock was resolved.
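Not part of the original comment; a minimal sketch of the capture-and-restore pattern described above, with all names invented for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class CaptureConsoleOutputExample {
    public static void main(String[] args) {
        PrintStream originalOut = System.out;  // keep a reference to the real console stream
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        System.setOut(new PrintStream(captured, true));
        try {
            // ... run the code whose console output should be inspected ...
        } finally {
            // Always restore the original stream; forgetting this step is what caused the deadlock described above.
            System.setOut(originalOut);
        }
        originalOut.println("captured " + captured.size() + " bytes of console output");
    }
}
```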
After updating to 4.6.0 the issue is resolved!
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
🐛 Bug Report
Netty at random times gets a read timeout. This happens on different selenium commands (for example: WebDriver.switchTo().defaultContent, WebElement.click, WebDriver.switchTo().window, WebElement.sendKeys, WebDriver.get, Alert.accept), at random, in a quite small percentage of cases (<1% of test cases).
To Reproduce
I don't have specific steps to reproduce. When our CI runs our test suite of thousands of tests, about 10 fail at random due to this timeout. I could not reproduce it by doing a simple long loop with a few commands on my development workstation.
Timeout details
This timeout always occurs at:
I could confirm that it took 3 minutes there, which matches the default 3-minute read timeout that selenium configures netty with. But the commands that are timing out would normally run very fast, in much less than one second.
Trying the code below in a method called probably thousands of times by my test suite, it failed and entered the catch block. But when it then called driver.switchTo().defaultContent() again at the end of the code below, it worked. So it seems that although the read timeout happens in netty, everything still works normally afterwards.
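The snippet referenced here did not survive in this copy of the issue. The following is a hypothetical reconstruction of the try/catch-and-retry pattern the report describes, not the reporter's exact code; the wrapper class, method name, and caught exception type are assumptions:

```java
import org.openqa.selenium.WebDriver;

class DefaultContentRetryExample {  // hypothetical wrapper, not the reporter's code
    static void switchToDefaultContentWithRetry(WebDriver driver) {
        try {
            driver.switchTo().defaultContent();
        } catch (Exception e) {
            // Prints the stack trace shown below, which wraps java.util.concurrent.TimeoutException.
            e.printStackTrace();
            // Calling the same command again right afterwards succeeded in the reporter's runs.
            driver.switchTo().defaultContent();
        }
    }
}
```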
In this case, the stack trace obtained by the e.printStackTrace() above was:

Environment
OS: Docker containers inside a CentOS host
Browser: RemoteWebDriver using Firefox in selenium/standalone-firefox:4.0.0-beta-3-20210426 docker image. Also tried the selenium/standalone-firefox:4.0.0-beta-4-prerelease-20210527 docker image, but the same thing happened.
Browser Driver version: RemoteWebDriver from selenium-java 4.0.0-beta-3
Language Bindings version: Java 4.0.0-beta-3
The RemoteWebDriver runs in a container on the same docker host as the browser container, so all network traffic between them stays on the same machine. Previously we were using Selenium 2.52 on the same docker host, and nothing similar to such a timeout ever happened.
Do you have any tips about what I can try in order to fix this, or how to investigate it further?