
[🐛 Bug]: chart - autoscaling too many browser nodes #2160

Closed
ofirdassa9 opened this issue Mar 4, 2024 · 11 comments · Fixed by #2400
Labels
I-autoscaling-k8s Issue relates to autoscaling in Kubernetes, or the scaler in KEDA

Comments

@ofirdassa9

What happened?

I ran a simple Selenium test that gets a remote driver from the hub, navigates to facebook.com, and then to google.com.
As long as the test is alive (whether it is sleeping or actually doing something), more and more chrome-node pods keep being deployed (I tried Firefox and Edge as well, with the same result), until there are 8, which is the default limit.
I use the KEDA that is installed with the chart, not an existing one.
This happens in both my EKS and docker-desktop clusters.
I used port-forward to reach the hub service from my browser.
The Python script of the test:

import time
from selenium import webdriver

# URL for the remote Chrome WebDriver
remote_url = "http://automation:automation@localhost:4444/wd/hub"  # Replace this with the actual URL of your remote WebDriver

# Setting up the Chrome options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")

# Getting the remote WebDriver
driver = webdriver.Remote(remote_url, options=chrome_options)

# Navigating to Facebook
driver.get("https://www.facebook.com")

# Printing the title of the page
print("Title of the page:", driver.title)

driver.get("https://www.google.com")

# Printing the title of the page
print("Title of the page:", driver.title)
# Keeping the session alive for a minute, then closing the WebDriver
time.sleep(60)
driver.quit()
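
For reference, the scale-up can be watched live while the session above is sleeping by polling the Grid's GraphQL endpoint (a rough sketch, assuming the same port-forwarded hub and basic auth as in the script above):

import time
import requests

# Same port-forwarded hub and basic auth as the test script (assumption)
GRAPHQL_URL = "http://localhost:4444/graphql"
AUTH = ("automation", "automation")
QUERY = "{ grid { maxSession, nodeCount }, sessionsInfo { sessionQueueRequests } }"

# Poll every 10 seconds and print how many nodes the Grid currently sees
for _ in range(12):
    data = requests.post(GRAPHQL_URL, json={"query": QUERY}, auth=AUTH, timeout=10).json()["data"]
    print("nodeCount:", data["grid"]["nodeCount"],
          "maxSession:", data["grid"]["maxSession"],
          "queued:", len(data["sessionsInfo"]["sessionQueueRequests"]))
    time.sleep(10)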

my values.yaml:

basicAuth:
  username: "automation"
  password: "automation"

autoscaling:
  enabled: true

Command used to start Selenium Grid with Docker (or Kubernetes)

helm install selenium-grid -n selenium-grid docker-selenium/selenium-grid -f values.yaml --create-namespace

Relevant log output

no relevant output logs

Operating System

EKS, Docker desktop

Docker Selenium version (image tag)

4.18.1-20240224

Selenium Grid chart version (chart version)

0.28.3


github-actions bot commented Mar 4, 2024

@ofirdassa9, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then the I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

@VietND96
Member

VietND96 commented Mar 6, 2024

@ofirdassa9, can you read through #2133?
For a proper fix, we need help investigating and fixing the Scaler in the upstream KEDA project - https://github.com/kedacore/keda/blob/main/pkg/scalers/selenium_grid_scaler.go

@ofirdassa9
Author

@VietND96 It looks like when setting

autoscaling:
  scaledJobOptions:
    scalingStrategy:
      strategy: default

it behaves as expected. Thank you!

Shouldn't this be the default value for the Helm chart?

@andrii-rymar

@ofirdassa9 in my case the default strategy doesn't work well enough. For some reason the scaler doesn't create the expected number of jobs when they are requested. I do sometimes see some overscaling with the accurate strategy, but at least new session requests do not stay in the queue for no reason.

@VietND96
Member

@andrii-rymar, are you describing this kind of issue? Something like: with the default strategy, given that the queue has 6 incoming requests, 6 Node pods come up. All 6 Node pods are up and running, however only 5 sessions can be created, and the remaining request stays in the queue until it fails with selenium.common.exceptions.SessionNotCreatedException: Message: Could not start a new session. Could not start a new session. Unable to create new session
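
A minimal sketch to reproduce that scenario, assuming the port-forwarded hub and basic auth from the original report (six parallel session requests; the stuck one surfaces as SessionNotCreatedException):

from concurrent.futures import ThreadPoolExecutor
from selenium import webdriver

REMOTE_URL = "http://automation:automation@localhost:4444/wd/hub"  # hub from the original report (assumption)
PARALLEL_SESSIONS = 6  # matches the 6-requests-in-queue scenario described above

def run_session(i):
    options = webdriver.ChromeOptions()
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    # Blocks in the session queue until a node slot is free, or raises
    # SessionNotCreatedException if the request times out in the queue
    driver = webdriver.Remote(REMOTE_URL, options=options)
    try:
        driver.get("https://www.google.com")
        return f"session {i}: {driver.title}"
    finally:
        driver.quit()

with ThreadPoolExecutor(max_workers=PARALLEL_SESSIONS) as pool:
    futures = [pool.submit(run_session, i) for i in range(PARALLEL_SESSIONS)]
    for future in futures:
        try:
            print(future.result())
        except Exception as exc:  # e.g. the one request left stuck in the queue
            print("failed:", exc)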

@VietND96 added the I-autoscaling-k8s label (Issue relates to autoscaling in Kubernetes, or the scaler in KEDA) and removed the needs-triaging label Mar 21, 2024
@miguel-cardoso-mindera

I'm having the same issue, regardless of the default or the accurate strategy.

For example, I have autoscaling enabled in the helm chart with min 0 and max 300, and requesting 20 tests in parallel results in an absurd scale-up:
[screenshot of the scale-up]

@VietND96
Member

When I look at the scaler implementation - https://github.com/kedacore/keda/blob/main/pkg/scalers/selenium_grid_scaler.go
The GraphQL query is { grid { maxSession, nodeCount }, sessionsInfo { sessionQueueRequests, sessions { id, capabilities, nodeId } } }.
It relies on the queue size and the current total number of sessions. I suspect this leads to a case like the following: there are 48 active sessions (Chrome & Firefox) and 50 queued requests (Chrome, Firefox, and Edge). Edge nodes cannot scale up until there is at least one Edge session in the list of active sessions. Or, similarly, scaling up stops once the queue size is roughly equal to the number of active sessions.
I will try to prove that suspicion and provide a fix if possible.
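
To check that suspicion against a live Grid, the same query the scaler runs can be executed directly and broken down per browser (a rough sketch, assuming the port-forwarded hub and basic auth from the original report; queue entries and session capabilities come back as JSON strings whose exact shape can vary by client):

import json
from collections import Counter
import requests

GRAPHQL_URL = "http://localhost:4444/graphql"  # port-forwarded hub (assumption)
AUTH = ("automation", "automation")
# Same query the scaler runs
QUERY = ("{ grid { maxSession, nodeCount }, "
         "sessionsInfo { sessionQueueRequests, sessions { id, capabilities, nodeId } } }")

data = requests.post(GRAPHQL_URL, json={"query": QUERY}, auth=AUTH, timeout=10).json()["data"]

def browser_of(raw):
    # Best-effort lookup of browserName from a JSON-encoded capabilities blob
    caps = json.loads(raw)
    return caps.get("browserName", "unknown")

queued = Counter(browser_of(r) for r in data["sessionsInfo"]["sessionQueueRequests"])
active = Counter(browser_of(s["capabilities"]) for s in data["sessionsInfo"]["sessions"])

print("grid:", data["grid"])
print("queued per browser:", dict(queued))
print("active per browser:", dict(active))

If the Edge suspicion above is right, the queued counter would show Edge entries while the active counter shows none.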

@miguel-cardoso-mindera

Appreciate it @VietND96

In my case above, we are ONLY running Chrome, no other browsers.

@VietND96
Member

You can follow https://github.com/SeleniumHQ/docker-selenium/tree/trunk/.keda
Replace the KEDA component image tag and try it out to see how it works.

@miguel-cardoso-mindera

Thanks @VietND96, it looks like the scaling is working correctly; we are not experiencing a disproportionate amount of scale-up.


This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions bot locked and limited conversation to collaborators Oct 21, 2024