Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[🐛 Bug]: Selenium Grid, request stuck in retry loop #10930

Closed
eete22 opened this issue Aug 9, 2022 · 5 comments
Closed

[🐛 Bug]: Selenium Grid, request stuck in retry loop #10930

eete22 opened this issue Aug 9, 2022 · 5 comments

Comments

@eete22
Copy link

eete22 commented Aug 9, 2022

What happened?

Everything was working fine with Selenium Grid 3, but when we switched over to Selenium Grid 4 we started experiencing steps in our test cases sometimes taking ~18 minutes longer to complete. Looking at the logs, it seems that a request gets stuck in a retry loop. We have observed it for several different types of requests (e.g. /execute/sync, /frame/, /se/log).

We are running Selenium Grid with Docker/Podman using the following compose file:

hub:
    image: selenium/hub:4.2.0-20220527
    container_name: hub
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4444:4444"
    restart: always
    environment:
      - SE_SESSION_REQUEST_TIMEOUT=2400
      
  node:
    image: selenium/node-chrome:4.2.0-20220527
    shm_size: 2gb
    container_name: node
    depends_on:
      - hub
    restart: always
    environment:
      - SE_EVENT_BUS_HOST=hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_SESSION_TIMEOUT=300
      - SE_NODE_OVERRIDE_MAX_SESSIONS=true
      - SE_NODE_MAX_SESSIONS=4
      - SCREEN_WIDTH=1920
      - SCREEN_HEIGHT=1080
      - SCREEN_DEPTH=24
    volumes:
      - /dev/shm:/dev/shm
    ports:
      - "9002:5900"
    links:
      - hub

I'll try to explain the scenario we have observed in the logs, starting from the request before the one that gets stuck.

1. Hub: Sends previous request.

13:36:23.034 DEBUG [NettyRequestSender.sendRequestWithOpenChannel] - Using open Channel [id: 0x7d412036, L:/10.88.0.31:51890 - R:/10.88.0.31:5555] for GET '/session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled'

2. Node: Previous request gets a channel.

13:36:23.036 DEBUG [NettyRequestSender.pollPooledChannel] - Using pooled Channel '[id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380]' for 'GET' to 'http://localhost:60380/session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled'
13:36:23.036 DEBUG [NettyRequestSender.sendRequestWithOpenChannel] - Using open Channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380] for GET '/session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled'

3. Node: Previous request is sent and receives response.

13:36:23.043 DEBUG [HttpHandler.handleHttpResponse] -
Request DefaultFullHttpRequest(decodeResult: success, version: HTTP/1.1, content: EmptyByteBufBE)
GET /session/20135a43e0062e63b125ccdf0a75ab23/element/5e717040-395e-4053-b7b4-07910d640306/enabled HTTP/1.1
Accept: application/json, image/png
User-Agent: selenium/4.0.0 (.net linux)
Connection: keep-alive
Cache-Control: no-cache
traceparent: 00-f1d0e00c8e477eda9572995e0b7da776-629612b9dedff6a5-01
host: localhost:60380
Content-Type: application/json; charset=utf-8

  Response DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
  HTTP/1.1 200 OK
  Content-Type: application/json; charset=utf-8
  cache-control: no-cache
  content-length: 14
4. Node: Channel for previous request is offered back to pool.

13:36:23.044 DEBUG [ChannelManager.tryToOfferChannelToPool] - Adding key: http://localhost:60380 for channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380]

5. Hub: Sends new request.

13:36:23.055 DEBUG [NettyRequestSender.sendRequestWithOpenChannel] - Using open Channel [id: 0x7d412036, L:/10.88.0.31:51890 - R:/10.88.0.31:5555] for POST '/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync'

6. Node: New request is processed, but it freezes after "injecting" log (where is normally would say "Using __ channel").

13:36:23.059 DEBUG [HttpTracing.inject] - Injecting (POST) /session/20135a43e0062e63b125ccdf0a75ab23/execute/sync into OpenTelemetrySpan{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da,spanId=31fc6d6d7e2fbf62} at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler:84

7. Node: About a minuter later the idleChannelDetector finds that the channel used by the previous request is idle, and closes it.

13:37:23.557 DEBUG [DefaultChannelPool$IdleChannelDetector.expiredChannels] - Adding Candidate expired Channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380] isIdleTimeoutExpired=true isRemotelyClosed=false isTtlExpired=false
13:37:23.557 DEBUG [DefaultChannelPool$IdleChannelDetector.closeChannels] - Closing Idle Channel [id: 0x34b2bf61, L:/127.0.0.1:39380 - R:localhost/127.0.0.1:60380]

8. Hub: About two minuter after that, the hub gets server error and retries (180 s default timeout).

13:39:23.076 DEBUG [RetryRequest.lambda$static$7] - Failure due to server error #1. Retrying.
13:39:23.082 DEBUG [NettyConnectListener.writeRequest] - Using new Channel '[id: 0xc39d1690, L:/10.88.0.31:52094 - R:/10.88.0.31:5555]' for 'POST' to '/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync'

9. Node: the frozen request starts running and gets a channel, then gets stuck again.

13:39:23.087 DEBUG [NettyConnectListener.writeRequest] - Using new Channel '[id: 0x3d99f31f, L:/127.0.0.1:39510 - R:localhost/127.0.0.1:60380]' for 'POST' to '/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync'

10. Node: The new retry request gets frozen after the "injecting" log.

13:39:23.089 DEBUG [HttpTracing.inject] - Injecting (POST) /session/20135a43e0062e63b125ccdf0a75ab23/execute/sync into OpenTelemetrySpan{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da,spanId=656ae3b1f9818e50} at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler:84

11. Five minutes after the call in step (9) the node gets a timeout exception for the first frozen request.
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)
	at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
	at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
	at org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)
	at org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)
	at org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)
	at org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)
	at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
	at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)
	... 32 more
13:44:23.082 DEBUG [SeleniumSpanExporter$1.lambda$export$4] - SpanData{spanContext=ImmutableSpanContext{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da, spanId=31fc6d6d7e2fbf62, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=a780c6e2ef7b2b647a0c64e1a3a0a9da, spanId=6ee9490e9202f7ef, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, resource=Resource{schemaUrl=https://opentelemetry.io/schemas/1.9.0, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.14.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=default, version=null, schemaUrl=null}, name=node.forward_command, kind=INTERNAL, startEpochNanos=1655904983059769000, endEpochNanos=1655905463082764947, attributes=AttributesMap{data={http.method=POST, random.key=e65233dd-8b7d-4030-89f9-c26c7e17877e, span.kind=server, http.target=/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync, error=true}, capacity=128, totalAddedValues=5}, totalAttributeCount=5, events=[ImmutableEventData{name=exception, attributes={exception.message="Unable to execute request: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown", exception.stacktrace="org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)
	at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
	at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
	at org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)
	at org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)
	at org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)
	at org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)
	at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
	at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)
	... 32 more
", exception.type="org.openqa.selenium.TimeoutException", http.flavor=1, http.handler_class="org.openqa.selenium.remote.http.Route$PredicatedRoute", http.host="hub:4444", http.method="POST", http.request_content_length="99", http.scheme="HTTP", http.target="/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync", http.user_agent="selenium/4.0.0 (.net linux)"}, epochNanos=1655905463082460855, totalAttributeCount=11}], totalRecordedEvents=1, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=ERROR, description=Kind: UNKNOWN Description:}, hasEnded=true}
13:44:23.082 WARN [SeleniumSpanExporter$1.lambda$export$1] - Unable to execute request: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
13:44:23.082 WARN [SeleniumSpanExporter$1.lambda$export$1] - org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.2.0', revision: '86eb611648'
System info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'
Driver info: driver.version: unknown
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)
	at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
	at org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)
	at org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)
	at org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)
	at org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)
	at org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)
	at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.grid.node.Node.execute(Node.java:240)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:68)
	at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)
	at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
	at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)
	... 32 more

13:44:23.083 WARN [SeleniumSpanExporter$1.lambda$export$3] - {"traceId": "a780c6e2ef7b2b647a0c64e1a3a0a9da","eventTime": 1655905463082460855,"eventName": "exception","attributes": {"exception.message": "Unable to execute request: java.util.concurrent.TimeoutException\nBuild info: version: '4.2.0', revision: '86eb611648'\nSystem info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'\nDriver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException\nBuild info: version: '4.2.0', revision: '86eb611648'\nSystem info: host: '7cc6f0360350', ip: 'Unknown', os.name: 'Linux', os.arch: 'amd64', os.version: '4.18.0-372.9.1.el8.x86_64', java.version: '11.0.15'\nDriver info: driver.version: unknown\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)\n\tat org.openqa.selenium.grid.node.ProtocolConvertingSession.execute(ProtocolConvertingSession.java:75)\n\tat org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:125)\n\tat org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:447)\n\tat org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.node.Node.execute(Node.java:240)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.util.concurrent.TimeoutException\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)\n\tat org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:59)\n\t... 32 more\n","exception.type": "org.openqa.selenium.TimeoutException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.remote.http.Route$PredicatedRoute","http.host": "hub:4444","http.method": "POST","http.request_content_length": "99","http.scheme": "HTTP","http.target": "\u002fsession\u002f20135a43e0062e63b125ccdf0a75ab23\u002fexecute\u002fsync","http.user_agent": "selenium\u002f4.0.0 (.net linux)"}}

13:44:23.157 DEBUG [TimeoutTimerTask.expire] - Request timeout to localhost/127.0.0.1:60380 after 300000 ms for NettyResponseFuture{currentRetry=0,
	isDone=0,
	isCancelled=0,
	asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@144965c7,
	nettyRequest=org.asynchttpclient.netty.request.NettyRequest@4595be47,
	future=java.util.concurrent.CompletableFuture@43b022a9[Not completed],
	uri=http://localhost:60380/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync,
	keepAlive=true,
	redirectCount=0,
	timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@4f3f3e4a,
	inAuth=0,
	touch=1655905163087} after 300079 ms
13:44:23.157 DEBUG [ChannelManager.closeChannel] - Closing Channel [id: 0x3d99f31f, L:/127.0.0.1:39510 - R:localhost/127.0.0.1:60380] 
13:44:23.158 DEBUG [NettyRequestSender.abort] - Aborting Future NettyResponseFuture{currentRetry=0,
	isDone=0,
	isCancelled=0,
	asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@144965c7,
	nettyRequest=org.asynchttpclient.netty.request.NettyRequest@4595be47,
	future=java.util.concurrent.CompletableFuture@43b022a9[Not completed],
	uri=http://localhost:60380/session/20135a43e0062e63b125ccdf0a75ab23/execute/sync,
	keepAlive=true,
	redirectCount=0,
	timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@4f3f3e4a,
	inAuth=0,
	touch=1655905163087}

13:44:23.158 DEBUG [NettyRequestSender.abort] - Request timeout to localhost/127.0.0.1:60380 after 300000 ms
java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:60380 after 300000 ms
	at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
	at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
	at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
13:44:23.158 DEBUG [AsyncCompletionHandler.onThrowable] - Request timeout to localhost/127.0.0.1:60380 after 300000 ms
java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:60380 after 300000 ms
	at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
	at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
	at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
  1. And it continues like this. The hub retrying every 3 minutes, the request getting stuck in the node after "injecting" log and being released when a new retry request is made. The requests that are started in the node get a timeout exceptions after 5 minutes. Meanwhile other requests for other sessions are running in parallel.

  2. After the hub has retried 5 times, the client starts making other calls to the hub with the same session. After that everything seems to work again.

How can we reproduce the issue?

-

Relevant log output

Logs in description above.

Operating System

Docker

Selenium version

C# 4.2.0

What are the browser(s) and version(s) where you see this issue?

selenium/node-chrome:4.2.0-20220527 but also earlier versions of Selenium Grid 4 image

What are the browser driver(s) and version(s) where you see this issue?

selenium/node-chrome:4.2.0-20220527 but also earlier versions of Selenium Grid 4 image

Are you using Selenium Grid?

4.2.0 (but issue also occurred with earlier versions of Selenium Grid 4)

@github-actions
Copy link

github-actions bot commented Aug 9, 2022

@eete22, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

@diemol
Copy link
Member

diemol commented Aug 16, 2022

This looks similar to #9528, which we are still investigating.

@diemol
Copy link
Member

diemol commented Oct 17, 2022

Can you please try the new Java 11 HTTP client and let us know? https://www.selenium.dev/blog/2022/using-java11-httpclient/

@eete22
Copy link
Author

eete22 commented Oct 27, 2022

We updated to version 4.5.2 of the docker image which looks like it is using the Java 11 HTTP client by default.
From what we can see when running our tests with this image, the issue is resolved.
Great work!

@eete22 eete22 closed this as completed Oct 27, 2022
@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants