[🐛 Bug]: Selenium Grid, request stuck in retry loop #10930

eete22 · 2022-08-09T12:41:10Z

What happened?

Everything was working fine with Selenium Grid 3, but when we switched over to Selenium Grid 4 we started experiencing steps in our test cases sometimes taking ~18 minutes longer to complete. Looking at the logs, it seems that a request gets stuck in a retry loop. We have observed it for several different types of requests (e.g. /execute/sync, /frame/, /se/log).

We are running Selenium Grid with Docker/Podman using the following compose file:

hub:
    image: selenium/hub:4.2.0-20220527
    container_name: hub
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4444:4444"
    restart: always
    environment:
      - SE_SESSION_REQUEST_TIMEOUT=2400
      
  node:
    image: selenium/node-chrome:4.2.0-20220527
    shm_size: 2gb
    container_name: node
    depends_on:
      - hub
    restart: always
    environment:
      - SE_EVENT_BUS_HOST=hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_SESSION_TIMEOUT=300
      - SE_NODE_OVERRIDE_MAX_SESSIONS=true
      - SE_NODE_MAX_SESSIONS=4
      - SCREEN_WIDTH=1920
      - SCREEN_HEIGHT=1080
      - SCREEN_DEPTH=24
    volumes:
      - /dev/shm:/dev/shm
    ports:
      - "9002:5900"
    links:
      - hub

I'll try to explain the scenario we have observed in the logs, starting from the request before the one that gets stuck.

1. Hub: Sends previous request.

13:36:23.034 DEBUG [NettyRequestSender.sendRequestWithOpenChannel] - Using open Channel [id: 0x7d412036, L:/10.88.0.31:51890 - R:/10.88.0.31:5555] for GET '/session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled'

2. Node: Previous request gets a channel.

13:36:23.036 DEBUG [NettyRequestSender.pollPooledChannel] - Using pooled Channel '[id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380]' for 'GET' to 'http://localhost:60380/session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled'
13:36:23.036 DEBUG [NettyRequestSender.sendRequestWithOpenChannel] - Using open Channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380] for GET '/session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled'

3. Node: Previous request is sent and receives response.

13:36:23.043 DEBUG [HttpHandler.handleHttpResponse] -
Request DefaultFullHttpRequest(decodeResult: success, version: HTTP/1.1, content: EmptyByteBufBE)
GET /session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled HTTP/1.1
Accept: application/json, image/png
User-Agent: selenium/4.0.0 (.net linux)
Connection: keep-alive
Cache-Control: no-cache
traceparent: 00-f1d0e00c8e477eda9572995e0b7da776-629612b9dedff6a5-01
host: localhost:60380
Content-Type: application/json; charset=utf-8
  Response DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
  HTTP/1.1 200 OK
  Content-Type: application/json; charset=utf-8
  cache-control: no-cache
  content-length: 14

4. Node: Channel for previous request is offered back to pool.

13:36:23.044 DEBUG [ChannelManager.tryToOfferChannelToPool] - Adding key: http://localhost:60380 for channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380]

5. Hub: Sends new request.

13:36:23.055 DEBUG [NettyRequestSender.sendRequestWithOpenChannel] - Using open Channel [id: 0x7d412036, L:/10.88.0.31:51890 - R:/10.88.0.31:5555] for POST '/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync'

6. Node: New request is processed, but it freezes after "injecting" log (where is normally would say "Using __ channel").

13:36:23.059 DEBUG [HttpTracing.inject] - Injecting (POST) /session/20135a43e0062e63b125ccdf0a75ab23/execute/sync into OpenTelemetrySpan{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da,spanId=31fc6d6d7e2fbf62} at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler:84

7. Node: About a minuter later the idleChannelDetector finds that the channel used by the previous request is idle, and closes it.

13:37:23.557 DEBUG [DefaultChannelPool$IdleChannelDetector.expiredChannels] - Adding Candidate expired Channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380] isIdleTimeoutExpired=true isRemotelyClosed=false isTtlExpired=false
13:37:23.557 DEBUG [DefaultChannelPool$IdleChannelDetector.closeChannels] - Closing Idle Channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380]

8. Hub: About two minuter after that, the hub gets server error and retries (180 s default timeout).

13:39:23.076 DEBUG [RetryRequest.lambda$static$7] - Failure due to server error #1. Retrying.
13:39:23.082 DEBUG [NettyConnectListener.writeRequest] - Using new Channel '[id: 0xc39d1690, L:/10.88.0.31:52094 - R:/10.88.0.31:5555]' for 'POST' to '/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync'

9. Node: the frozen request starts running and gets a channel, then gets stuck again.

13:39:23.087 DEBUG [NettyConnectListener.writeRequest] - Using new Channel '[id: 0x3d99f31f, L:/127.0.0.1:39510 - R:localhost/127.0.0.1:60380]' for 'POST' to '/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync'

10. Node: The new retry request gets frozen after the "injecting" log.

13:39:23.089 DEBUG [HttpTracing.inject] - Injecting (POST) /session/20135a43e0062e63b125ccdf0a75ab23/execute/sync into OpenTelemetrySpan{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da,spanId=656ae3b1f9818e50} at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler:84

11. Five minutes after the call in step (9) the node gets a timeout exception for the first frozen request.

Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)
	at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
	at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
	at org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)
	at org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)
	at org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)
	at org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)
	at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
	at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)
	... 32 more
13:44:23.082 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da, spanId=31fc6d6d7e2fbf62, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da, spanId=6ee9490e9202f7ef, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, resource=Resource{schemaUrl=https://opentelemetry.io/schemas/1.9.0, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.14.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null}, name=node.forward_command, kind=INTERNAL, startEpochNanos=1655904983059769000, endEpochNanos=1655905463082764947, attributes=AttributesMap{data={http.method=POST, random.key=e65233dd-8b7d-4030-89f9-c26c7e17877e, span.kind=server, http.target=/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync, error=true}, capacity=128, totalAddedValues=5}, totalAttributeCount=5, events=[ImmutableEventData{name=exception, attributes={exception.message="Unable to execute request: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown", exception.stacktrace="org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)
	at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
	at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
	at org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)
	at org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)
	at org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)
	at org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)
	at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
	at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)
	... 32 more
", exception.type="org.openqa.selenium.TimeoutException", http.flavor=1, http.handler_class="org.openqa.selenium.remote.http.Route$PredicatedRoute", http.host="hub:4444", http.method="POST", http.request_content_length="99", http.scheme="HTTP", http.target="/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync", http.user_agent="selenium/4.0.0 (.net linux)"}, epochNanos=1655905463082460855, totalAttributeCount=11}], totalRecordedEvents=1, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=ERROR, description=Kind: UNKNOWN Description:}, hasEnded=true}
13:44:23.082 WARN [SeleniumSpanExporter$1.lambda$export$1] - Unable to execute request: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
13:44:23.082 WARN [SeleniumSpanExporter$1.lambda$export$1] - org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)
	at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
	at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
	at org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)
	at org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)
	at org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)
	at org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)
	at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
	at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)
	... 32 more

13:44:23.083 WARN [SeleniumSpanExporter$1.lambda$export$3] - {"traceId": "a780c6e2ef7b2b647a0c64e1a3a0a9da","eventTime": 1655905463082460855,"eventName": "exception","attributes": {"exception.message": "Unable to execute request: java.util.concurrent.TimeoutException\nBuild info: version: '4.2.0', revision: '86eb611648'\nSystem info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'\nDriver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException\nBuild info: version: '4.2.0', revision: '86eb611648'\nSystem info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'\nDriver info: driver.version: unknown\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)\n\tat org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)\n\tat org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)\n\tat org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)\n\tat org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.node.Node.execute(Node.java:240)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.util.concurrent.TimeoutException\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)\n\tat org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)\n\t... 32 more\n","exception.type": "org.openqa.selenium.TimeoutException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.remote.http.Route$PredicatedRoute","http.host": "hub:4444","http.method": "POST","http.request_content_length": "99","http.scheme": "HTTP","http.target": "\u002fsession\u002f20135a43e0062e63b125ccdf0a75ab23\u002fexecute\u002fsync","http.user_agent": "selenium\u002f4.0.0 (.net linux)"}}

13:44:23.157 DEBUG [TimeoutTimerTask.expire] - Request timeout to localhost/127.0.0.1:60380 after 300000 ms for NettyResponseFuture{currentRetry=0,
	isDone=0,
	isCancelled=0,
	asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@144965c7,
	nettyRequest=org.asynchttpclient.netty.request.NettyRequest@4595be47,
	future=java.util.concurrent.CompletableFuture@43b022a9[Not completed],
	uri=http://localhost:60380/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync,
	keepAlive=true,
	redirectCount=0,
	timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@4f3f3e4a,
	inAuth=0,
	touch=1655905163087} after 300079 ms
13:44:23.157 DEBUG [ChannelManager.closeChannel] - Closing Channel [id: 0x3d99f31f, L:/127.0.0.1:39510 - R:localhost/127.0.0.1:60380] 
13:44:23.158 DEBUG [NettyRequestSender.abort] - Aborting Future NettyResponseFuture{currentRetry=0,
	isDone=0,
	isCancelled=0,
	asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@144965c7,
	nettyRequest=org.asynchttpclient.netty.request.NettyRequest@4595be47,
	future=java.util.concurrent.CompletableFuture@43b022a9[Not completed],
	uri=http://localhost:60380/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync,
	keepAlive=true,
	redirectCount=0,
	timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@4f3f3e4a,
	inAuth=0,
	touch=1655905163087}

13:44:23.158 DEBUG [NettyRequestSender.abort] - Request timeout to localhost/127.0.0.1:60380 after 300000 ms
java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:60380 after 300000 ms
	at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
	at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
	at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
13:44:23.158 DEBUG [AsyncCompletionHandler.onThrowable] - Request timeout to localhost/127.0.0.1:60380 after 300000 ms
java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:60380 after 300000 ms
	at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
	at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
	at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)

And it continues like this. The hub retrying every 3 minutes, the request getting stuck in the node after "injecting" log and being released when a new retry request is made. The requests that are started in the node get a timeout exceptions after 5 minutes. Meanwhile other requests for other sessions are running in parallel.
After the hub has retried 5 times, the client starts making other calls to the hub with the same session. After that everything seems to work again.

How can we reproduce the issue?

Relevant log output

Logs in description above.

Operating System

Docker

Selenium version

C# 4.2.0

What are the browser(s) and version(s) where you see this issue?

selenium/node-chrome:4.2.0-20220527 but also earlier versions of Selenium Grid 4 image

What are the browser driver(s) and version(s) where you see this issue?

selenium/node-chrome:4.2.0-20220527 but also earlier versions of Selenium Grid 4 image

Are you using Selenium Grid?

4.2.0 (but issue also occurred with earlier versions of Selenium Grid 4)

The text was updated successfully, but these errors were encountered:

github-actions · 2022-08-09T12:41:33Z

@eete22, thank you for creating this issue. We will troubleshoot it as soon as we can.

Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

diemol · 2022-08-16T09:14:40Z

This looks similar to #9528, which we are still investigating.

diemol · 2022-10-17T16:06:58Z

Can you please try the new Java 11 HTTP client and let us know? https://www.selenium.dev/blog/2022/using-java11-httpclient/

eete22 · 2022-10-27T12:36:47Z

We updated to version 4.5.2 of the docker image which looks like it is using the Java 11 HTTP client by default.
From what we can see when running our tests with this image, the issue is resolved.
Great work!

github-actions · 2022-11-27T00:13:46Z

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

eete22 added I-defect needs-triaging labels Aug 9, 2022

titusfortner added the C-grid label Aug 9, 2022

diemol removed the needs-triaging label Aug 10, 2022

diemol added the R-awaiting answer label Oct 17, 2022

eete22 closed this as completed Oct 27, 2022

github-actions bot locked and limited conversation to collaborators Nov 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[🐛 Bug]: Selenium Grid, request stuck in retry loop #10930

[🐛 Bug]: Selenium Grid, request stuck in retry loop #10930

eete22 commented Aug 9, 2022

github-actions bot commented Aug 9, 2022

diemol commented Aug 16, 2022

diemol commented Oct 17, 2022

eete22 commented Oct 27, 2022

github-actions bot commented Nov 27, 2022

[🐛 Bug]: Selenium Grid, request stuck in retry loop #10930

[🐛 Bug]: Selenium Grid, request stuck in retry loop #10930

Comments

eete22 commented Aug 9, 2022

What happened?

How can we reproduce the issue?

Relevant log output

Operating System

Selenium version

What are the browser(s) and version(s) where you see this issue?

What are the browser driver(s) and version(s) where you see this issue?

Are you using Selenium Grid?

github-actions bot commented Aug 9, 2022

diemol commented Aug 16, 2022

diemol commented Oct 17, 2022

eete22 commented Oct 27, 2022

github-actions bot commented Nov 27, 2022