Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[🐛 Bug]: browser created but never used (same as #11881) #12832

Closed
bhecquet opened this issue Sep 28, 2023 · 12 comments · Fixed by #12848
Closed

[🐛 Bug]: browser created but never used (same as #11881) #12832

bhecquet opened this issue Sep 28, 2023 · 12 comments · Fixed by #12848

Comments

@bhecquet
Copy link
Contributor

What happened?

Hello,

In april, I raised the issue #11881 about browser being created on selenium grid node but never used when request timeout is set.
This issue was corrected (I thought my correction was enough) but I still see the issue in an heavy loaded grid.
The point is that when requesting a new session, we do :

boolean sessionCreated = data.latch.await(requestTimeout.toMillis(), MILLISECONDS);

which waits for X secs (the session request timeout)

Then (this was the original bug), if the session is being created on node, we wait a bit more because session request is not in the queue anymore, but session may not be created when timeout expires

So, to correct it, I added an other wait of 5 seconds

sessionCreated = data.latch.await(5000, MILLISECONDS);

These 5 seconds were arbitrary choosen but adding some logs today, I've seen that driver creation takes more time

15:16:32.199 INFO [LocalDistributor.newSession] - 530153fd-a618-4418-8198-535a6c54117f: new session => start creating
15:16:38.766 INFO [LocalDistributor.newSession] - 530153fd-a618-4418-8198-535a6c54117f: new session => created

or even more

15:09:42.391 INFO [LocalDistributor.newSession] - 769d66eb-178a-4e45-8a8c-7f06c3f9177f: new session => start creating
15:09:51.892 INFO [LocalDistributor.newSession] - 769d66eb-178a-4e45-8a8c-7f06c3f9177f: new session => created

So 5 seconds is definitely not enough

A quick solution would be to increase the time on line (e.g: 15 seconds)

sessionCreated = data.latch.await(5000, MILLISECONDS);

But do you think there is an other way to correct the issue ? (fixed time waits are never the best solution 😕 )
Looking at the logs (added logs can be seen here: https://github.com/bhecquet/selenium/tree/driver_sessions_staled), I think that wait time should be increased, but also, we need to prevent distributor from taking the session request once first timeout expires

How can we reproduce the issue?

Set a grid with 1 hub, 1 node and try to create 4 browsers at the time
After some retries, a browser will be created without being used

Relevant log output

>>> Client node
  WARN  2023-09-28 15:39:17,389 [TestNG-tests-1] SeleniumGridDriverFactory: Error creating driver on hub

>>> Hub
15:38:42.358 INFO [LocalNewSessionQueue.addToQueue] - bd2be918-88a3-41ef-809a-5e0c1f66ef5f: start waiting session creation
15:38:42.867 INFO [LocalDistributor.newSession] - bd2be918-88a3-41ef-809a-5e0c1f66ef5f: new session
15:38:44.400 INFO [LocalDistributor.reserveSlot] - No slots found for request bd2be918-88a3-41ef-809a-5e0c1f66ef5f and capabilities Capabilities ...
15:38:44.400 INFO [LocalDistributor.newSession] - Unable to find a free slot for request bd2be918-88a3-41ef-809a-5e0c1f66ef5f. 
...
15:39:11.001 INFO [LocalDistributor.reserveSlot] - No slots found for request bd2be918-88a3-41ef-809a-5e0c1f66ef5f and capabilities Capabilities ...
15:39:11.001 INFO [LocalDistributor.newSession] - Unable to find a free slot for request bd2be918-88a3-41ef-809a-5e0c1f66ef5f. 
15:39:11.002 WARN [SeleniumSpanExporter$1.lambda$export$1] - Will retry session bd2be918-88a3-41ef-809a-5e0c1f66ef5f
15:39:11.516 INFO [LocalDistributor.newSession] - bd2be918-88a3-41ef-809a-5e0c1f66ef5f: new session
15:39:12.361 INFO [LocalNewSessionQueue.addToQueue] - bd2be918-88a3-41ef-809a-5e0c1f66ef5f: a bit more time
15:39:12.548 INFO [LocalDistributor.reserveSlot] - No slots found for request bd2be918-88a3-41ef-809a-5e0c1f66ef5f and capabilities Capabilities {a ....
15:39:12.548 INFO [LocalDistributor.newSession] - Unable to find a free slot for request bd2be918-88a3-41ef-809a-5e0c1f66ef5f. 
15:39:12.549 WARN [SeleniumSpanExporter$1.lambda$export$1] - Will retry session bd2be918-88a3-41ef-809a-5e0c1f66ef5f
15:39:12.551 INFO [LocalDistributor.newSession] - bd2be918-88a3-41ef-809a-5e0c1f66ef5f: new session
15:39:13.580 INFO [LocalDistributor.newSession] - bd2be918-88a3-41ef-809a-5e0c1f66ef5f: new session => start creating
15:39:17.362 INFO [LocalNewSessionQueue.addToQueue] - bd2be918-88a3-41ef-809a-5e0c1f66ef5f: session not created
15:39:20.706 INFO [LocalDistributor.newSession] - bd2be918-88a3-41ef-809a-5e0c1f66ef5f: new session => created



>>> Node
15:39:13.585 INFO [LocalNode.newSession] - new session => start creating
15:39:20.695 INFO [LocalNode.newSession] - Session created by the Node. Id: 183f4330244d3d4a34a68313e7434dd9, Caps: Capabilities {acceptInsecureCerts: true, browserName: chrome, browserVersion: 114.0.5735.199, chrome: {chromedriverVersion: 114.0.5735.90 (386bc09e8f4f..., userDataDir: C:\Users\xxx\AppData\Lo...}, goog:chromeOptions: {debuggerAddress: localhost:63241}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: any, proxy: Proxy(autodetect), se:bidiEnabled: false, se:cdp: ws://host.comp.any..., se:cdpVersion: 114.0.5735.199, setWindowRect: true, sr:beta: false, sr:testName: login, sr:try: 0, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}

Operating System

Windows / Linux

Selenium version

4.13.0

What are the browser(s) and version(s) where you see this issue?

not related to browser

What are the browser driver(s) and version(s) where you see this issue?

not related to browser

Are you using Selenium Grid?

4.13.0

@github-actions
Copy link

@bhecquet, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

@joerg1985
Copy link
Member

@bhecquet could you explain why you are setting the session request timeout to 30secs?
These timeouts are more a safety net in my mind and should not be set to low.

In the rare case when the session is exactly created while the request is rejected the hub could tell the node it has already rejected the request, to allow the node to close this session instantly and not lock until the inactivity timeout is hit.

The past PR did just add some extra seconds to the session request timeout. Just like setting the timeout to 35sec, so the node had some seconds more to start the session?

@bhecquet
Copy link
Contributor Author

Hello @joerg1985 , we have 2 hubs, for redundancy, and we do not want our session request to be stuck on one hub which eventually cannot provide session in reasonnable time.

I think the case (with such low timeout) is not rare as a session creation can take 5 to 10 seconds.
I've begun working on a correction but your idea is interesting

@joerg1985
Copy link
Member

@bhecquet
Does your client code connects to hub A and if there is a timeout it fails back to hub B?
If this is the case there might be better options like calling the sessions endpoint for hub A and hub B, to pick the one which responsive & free? Checking the status could use a lower timeout to create the session faster, in cases when one hub is down. This could balance the load between the hubs when both are available.

I think there is currently no high availability support for this in selenium itself, but this might be a cool feature.
There could be also strategies implemented to get the session as fast as possible (e.g. creating a session request in all hubs and take the first one created).

The current issue with the unused browser should be fixed in any case.

@krmahadevan
Copy link
Contributor

@joerg1985

This could balance the load between the hubs when both are available.

I think the hub can be scaled up if the sessions are backed by a persistent data store. I remember even adding documentation around how to use a persistent data store for sessions here

This is essentially what the distributed grid was supposed to be solving. Maybe I am missing something additional here ?

@bhecquet
Copy link
Contributor Author

@joerg1985 , failover is done almost the way you describe it (I added random so that we do not always pick up the same hub)
It's a legacy behavior of my framework, but also, on each hub, we have nodes (one for each) with specific capabilities and tests using these nodes are quite long, so having sessions free is not enough is this case, we need the capabilities matching

@krmahadevan , we need redundancy has our grid is used for 24/24 tests, and our users don't want service to be unavailable if I need to do maintenance

I've corrected the bug today, the way you proposed, and I'll provide a PR soon

@diemol
Copy link
Member

diemol commented Oct 3, 2023

@joerg1985

This could balance the load between the hubs when both are available.

I think the hub can be scaled up if the sessions are backed by a persistent data store. I remember even adding documentation around how to use a persistent data store for sessions here

This is essentially what the distributed grid was supposed to be solving. Maybe I am missing something additional here ?

@krmahadevan the current issue is the Distributor. Needs to be backed by some sort of data storage so multiple Distributors can access the same Grid status.

@krmahadevan
Copy link
Contributor

@diemol - You mean that we would need to come up with some sort of a persistant mechanism for the following in the LocalDistributor ?

@diemol
Copy link
Member

diemol commented Oct 3, 2023

@diemol - You mean that we would need to come up with some sort of a persistant mechanism for the following in the LocalDistributor ?

The Grid Model

@krmahadevan
Copy link
Contributor

Ok. Let me dig in and come back with what I find.

@titusfortner titusfortner added this to the 4.14 milestone Oct 5, 2023
@pujagani
Copy link
Contributor

pujagani commented Oct 9, 2023

I think long ago we did have an issue to create a Redis-backed distributor. The rationale behind it was what Diego mentioned. Then we added the Grid Model layer. So that issue was no longer valid. Maybe how we allow users to have Redis backed SessionMap, similarly, the Distributor can point to Redis-backed Grid Model or JDBC backed Grid Model. We can probably start with that. Just sharing my 2 cents.

@titusfortner titusfortner modified the milestones: 4.14, 4.15 Oct 10, 2023
Copy link

github-actions bot commented Dec 3, 2023

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
6 participants