Chromedriver frequently hangs when attempting to start a new session. #87

Closed
onlywade opened this issue Aug 13, 2015 · 101 comments · Fixed by #469 or KaTeX/KaTeX#1590

Comments

@onlywade

Using the standalone-chrome or node-chrome images, I often observe a timeout when attempting to launch a new Chrome session.

Steps to reproduce:

  1. launch a standalone chrome instance: docker run -d --name chrome selenium/standalone-chrome:2.47.1
  2. build the test container: docker build -t selenium/test:local ./Test
  3. run the repeat test script: ./test-repeat.sh chrome
  4. If necessary, repeat steps 1 and 3 until the problem is observed

Note:

In order to narrow the focus of the test to launching new sessions, you may want to temporarily modify Test/smoke-test.js to omit the part of the test that tries navigating to github.com after starting the session. After modifying the script, make sure to rebuild the selenium/test:local image to pick up the changes.

Expected results:

All 50 sessions launch and quit successfully - the test passes.

Actual results:

One of the session launch commands will hang indefinitely.

Docker host: boot2docker v1.7.0 (Tiny Core Linux)
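
For anyone who wants the repeat test to report a hang instead of blocking forever, here is a minimal sketch of the idea in Python. It is not the project's `test-repeat.sh`; `launch_and_quit` is a hypothetical callable standing in for "start a session, then quit it", and the 60-second timeout is an arbitrary assumption. Only the count of 50 attempts comes from the description above.

```python
import threading


def call_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread; return True if it finished within timeout_s.

    An exception raised by fn() ends the thread, so it counts as "finished".
    """
    worker = threading.Thread(target=fn, daemon=True)
    worker.start()
    worker.join(timeout_s)
    return not worker.is_alive()


def first_hanging_attempt(launch_and_quit, attempts=50, timeout_s=60.0):
    """Return the 1-based index of the first attempt that timed out, or None."""
    for attempt in range(1, attempts + 1):
        if not call_with_timeout(launch_and_quit, timeout_s):
            return attempt
    return None
```

A hung attempt leaves its daemon thread behind, so this only detects the hang; it does not clean up the stuck session.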

@bmannix

bmannix commented Aug 13, 2015

I can reproduce this issue. My first run was actually clean, but it hung on the 41st attempt during my second run.

@onlywade
Author

Here's a bit more detail about what I see happening on the system when the launch hangs.

The last line from the selenium-server stdout just says that it's launching a new Chrome session:

➜ ~ docker logs 9251a | tail -n 4
12:26:19.769 INFO - Executing: [new session: Capabilities [{platform=ANY, javascriptEnabled=true, browserName=chrome, version=}]])
12:26:19.771 INFO - Creating a new session for Capabilities [{platform=ANY, javascriptEnabled=true, browserName=chrome, version=}]
Starting ChromeDriver 2.16.333243 (0bfa1d3575fc1044244f21ddb82bf870944ef961) on port 17315
Only local connections are allowed. 

Inside of the container, I see that some Chrome processes are in fact running:

root@9251a325813b:/# ps auxww
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
seluser      1  0.1  0.1  17968  2796 ?        Ss   12:25   0:00 /bin/bash /opt/bin/entry_point.sh
seluser      5  0.0  0.0   4448  1640 ?        S    12:25   0:00 /bin/sh /usr/bin/xvfb-run --server-args=:99.0 -screen 0 1360x1020x24 -ac +extension RANDR java -jar /opt/selenium/selenium-server-standalone.jar
seluser     16  0.6  1.2 207472 26296 ?        Sl   12:25   0:00 Xvfb :99 :99.0 -screen 0 1360x1020x24 -ac +extension RANDR -nolisten tcp -auth /tmp/xvfb-run.qPOeDR/Xauthority
seluser     27 17.2  5.6 3042680 116716 ?      Sl   12:25   0:09 java -jar /opt/selenium/selenium-server-standalone.jar
root       241  0.1  0.1  18144  3376 ?        Ss   12:26   0:00 bash
seluser    811  0.3  0.5 381852 11012 ?        Sl   12:26   0:00 /opt/selenium/chromedriver-2.16 --port=17315
seluser    816  0.8  3.8 555444 79020 ?        Sl   12:26   0:00 /opt/google/chrome/chrome --no-sandbox --disable-background-networking --disable-client-side-phishing-detection --disable-component-update --disable-default-apps --disable-hang-monitor --disable-prompt-on-repost --disable-sync --disable-web-resources --enable-logging --ignore-certificate-errors --load-extension=/tmp/.com.google.Chrome.HFdqNc/internal --log-level=0 --metrics-recording-only --no-first-run --password-store=basic --remote-debugging-port=12545 --safebrowsing-disable-auto-update --safebrowsing-disable-download-protection --test-type=webdriver --use-mock-keychain --user-data-dir=/tmp/.com.google.Chrome.XIexPl  data:,       
seluser    824  0.0  0.0   4368   660 ?        S    12:26   0:00 cat
seluser    825  0.0  0.0   4368   656 ?        S    12:26   0:00 cat
seluser    827  0.0  1.9 347672 40388 ?        S    12:26   0:00 /opt/google/chrome/chrome --type=zygote --enable-logging --log-level=0 --no-sandbox --user-data-dir=/tmp/.com.google.Chrome.XIexPl
seluser    828  0.0  0.3  87176  6432 ?        S    12:26   0:00 /opt/google/chrome/nacl_helper --no-sandbox
seluser    845  0.1  2.2 441212 46220 ?        Sl   12:26   0:00 /opt/google/chrome/chrome --type=gpu-process --channel=816.0.1041773465 --enable-logging --log-level=0 --no-sandbox --user-data-dir=/tmp/.com.google.Chrome.XIexPl --v8-natives-passed-by-fd --v8-snapshot-passed-by-fd --supports-dual-gpus=false --gpu-driver-bug-workarounds=2,45,57 --disable-accelerated-video-decode --gpu-vendor-id=0x0000 --gpu-device-id=0x0000 --gpu-driver-vendor --gpu-driver-version --user-data-dir=/tmp/.com.google.Chrome.XIexPl --v8-natives-passed-by-fd --v8-snapshot-passed-by-fd --enable-logging --log-level=0
seluser    874  0.0  0.0      0     0 ?        Z    12:26   0:00 [chrome] <defunct>
seluser    879  3.7  0.7 555444 15400 ?        S    12:26   0:01 /opt/google/chrome/chrome --no-sandbox --disable-background-networking --disable-client-side-phishing-detection --disable-component-update --disable-default-apps --disable-hang-monitor --disable-prompt-on-repost --disable-sync --disable-web-resources --enable-logging --ignore-certificate-errors --load-extension=/tmp/.com.google.Chrome.HFdqNc/internal --log-level=0 --metrics-recording-only --no-first-run --password-store=basic --remote-debugging-port=12545 --safebrowsing-disable-auto-update --safebrowsing-disable-download-protection --test-type=webdriver --use-mock-keychain --user-data-dir=/tmp/.com.google.Chrome.XIexPl  data:, 

I am a bit suspicious of the defunct proc there, but it doesn't seem to consistently appear along with the problem, so I'm not sure that it's related to the hang at all.
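
To check across many runs whether a defunct process shows up together with the hang, the `ps auxww` output can be scanned mechanically. The sketch below assumes the column layout shown in the listing above (STAT is the eighth column); `find_zombies` is a hypothetical helper name.

```python
def find_zombies(ps_aux_output):
    """Return (pid, user, command) for every process whose STAT starts with 'Z'."""
    zombies = []
    for line in ps_aux_output.splitlines():
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        user, pid, stat, command = fields[0], fields[1], fields[7], fields[10]
        if stat.startswith("Z"):
            zombies.append((int(pid), user, command))
    return zombies
```

Feeding it the captured output of `docker exec <container> ps auxww` after each failed and each successful run would show whether the zombie correlates with the hang.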

For what it's worth, I don't see anything out of the ordinary in the Chrome debug log:

root@9251a325813b:/# cat /tmp/.com.google.Chrome.XIexPl/chrome_debug.log 
[816:816:0814/122620:ERROR:browser_main_loop.cc(185)] Running without the SUID sandbox! See https://code.google.com/p/chromium/wiki/LinuxSUIDSandboxDevelopment for more information on developing with the sandbox on.
[816:816:0814/122620:INFO:audio_manager_pulse.cc(258)] Failed to connect to the context.  Error: Connection refused
[845:845:0814/122620:ERROR:sandbox_linux.cc(345)] InitializeSandbox() called with multiple threads in process gpu-process
[816:816:0814/122620:WARNING:password_store_factory.cc(346)] Using basic (unencrypted) store for password storage. See http://code.google.com/p/chromium/wiki/LinuxPasswordStorage for more information about password storage options.
[874:874:0814/122620:ERROR:renderer_main.cc(200)] Running without renderer sandbox
[874:874:0814/122635:INFO:child_thread_impl.cc(666)] ChildThreadImpl::EnsureConnected()

(Those errors appear even when the session launch succeeds.)

As a sort of control I've executed the same test (repeatedly launch/quit Chrome sessions) against an Ubuntu 14.04 VM that is set up just like the Docker images -- the same Chrome and chromedriver versions, using xvfb with the same screen geometry, etc... The session launches are successful there 100% of the time, as far as I can tell, so the problem does appear to be specific to running chromedriver within a container.

@yotamshapira

Happens to me on an Ubuntu 14.04 VM (not in a container):
chrome_debug.log:
[2854:2854:0825/123047:ERROR:nss_util.cc(97)] Failed to create /.pki/nssdb directory.
[2891:2891:0825/123047:ERROR:sandbox_linux.cc(345)] InitializeSandbox() called with multiple threads in process gpu-process
[2854:2854:0825/123047:WARNING:password_store_factory.cc(346)] Using basic (unencrypted) store for password storage. See http://code.google.com/p/chromium/wiki/LinuxPasswordStorage for more information about password storage options.
[1:1:0825/123102:INFO:child_thread_impl.cc(666)] ChildThreadImpl::EnsureConnected()

Chrome process list:
selenium 1347 1 0 12:04 ? 00:00:11 /usr/bin/java -jar /usr/local/selenium/server/selenium-server-standalone.jar -role node -nodeConfig /usr/local/selenium/config/selenium_node.json -Dwebdriver.chrome.driver=/usr/local/selenium/drivers/chromedriver/chromedriver
selenium 2848 1347 0 12:30 ? 00:00:00 /usr/local/selenium/drivers/chromedriver_linux64-2.18/chromedriver --port=22923
selenium 2854 2848 0 12:30 ? 00:00:00 /opt/google/chrome/chrome --disable-background-networking --disable-client-side-phishing-detection --disable-component-update --disable-default-apps --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost --disable-sync --disable-web-resources --enable-logging --ignore-certificate-errors --load-extension=/tmp/.com.google.Chrome.oy5xLY/internal --log-level=0 --metrics-recording-only --no-first-run --password-store=basic --remote-debugging-port=12573 --safebrowsing-disable-auto-update --safebrowsing-disable-download-protection --test-type=webdriver --use-mock-keychain --user-data-dir=/tmp/.com.google.Chrome.nwXnbU data:,
selenium 2869 2854 0 12:30 ? 00:00:00 /opt/google/chrome/chrome --type=zygote --enable-logging --log-level=0 --user-data-dir=/tmp/.com.google.Chrome.nwXnbU
selenium 2870 2869 0 12:30 ? 00:00:00 /opt/google/chrome/nacl_helper
selenium 2873 2869 0 12:30 ? 00:00:00 /opt/google/chrome/chrome --type=zygote --enable-logging --log-level=0 --user-data-dir=/tmp/.com.google.Chrome.nwXnbU
selenium 2891 2854 0 12:30 ? 00:00:00 /opt/google/chrome/chrome --type=gpu-process --channel=2854.0.384655797 --enable-logging --log-level=0 --user-data-dir=/tmp/.com.google.Chrome.nwXnbU --v8-natives-passed-by-fd --v8-snapshot-passed-by-fd --supports-dual-gpus=false --gpu-driver-bug-workarounds=2,45,57 --disable-accelerated-video-decode --gpu-vendor-id=0x15ad --gpu-device-id=0x0405 --gpu-driver-vendor --gpu-driver-version --user-data-dir=/tmp/.com.google.Chrome.nwXnbU --v8-natives-passed-by-fd --v8-snapshot-passed-by-fd --enable-logging --log-level=0
selenium 2916 2873 0 12:30 ? 00:00:00 [chrome] <defunct>
selenium 2923 2854 0 12:30 ? 00:00:05 /opt/google/chrome/chrome --disable-background-networking --disable-client-side-phishing-detection --disable-component-update --disable-default-apps --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost --disable-sync --disable-web-resources --enable-logging --ignore-certificate-errors --load-extension=/tmp/.com.google.Chrome.oy5xLY/internal --log-level=0 --metrics-recording-only --no-first-run --password-store=basic --remote-debugging-port=12573 --safebrowsing-disable-auto-update --safebrowsing-disable-download-protection --test-type=webdriver --use-mock-keychain --user-data-dir=/tmp/.com.google.Chrome.nwXnbU data:,

@ArchmageInc

I have also experienced this, and trying to discover the cause has not been very successful. If I switch our tests to run using Firefox, I do not experience this issue. When using selenium-node-chrome-debug, I can see the browser open and running tests. During a hang, however, the browser appears only on the task bar and cannot be interacted with.

@fbarbat

fbarbat commented Nov 5, 2015

I am also having this issue. It will prevent me from using docker-selenium: I need it to be stable so I can run continuous integration and monitoring on it. I can't switch to Firefox either, since I am using some chromeOptions.

@charford
Contributor

Seeing this issue as well.

@sebs-code

Checking in to report I am also seeing this issue. Google search led me here...

@n8whnp

n8whnp commented Dec 22, 2015

I am having these issues as well. I found that decreasing the memory available to Docker reduced how frequently this happens.

@peterbollen

Seeing this as well

vito added a commit to vmware-archive/atc that referenced this issue Jan 16, 2016
chromedriver has a bug that causes it to hang and then everything fails:

  SeleniumHQ/docker-selenium#87

detect if phantomjs is installed and use it, since that's less painful
for local dev workflows.

also remove the js Dockerfile; just use 'node' since the JS tests are
pretty straightforward
@Compufreak345

Any progress here? I had this issue on rare occasions in previous releases, but after updating it occurs so often that I can't run my tests anymore, no matter how often I try.

@mbrock

mbrock commented Jan 22, 2016

I am having similar issues that I haven't investigated closely yet.

Is there any reason to believe that the container should be running an init process? Lack of proper zombie killing is one reason containers can behave strangely, right?

I think I'll try to build a Chrome WebDriver container that uses Yelp's dumb_init (http://engineeringblog.yelp.com/2016/01/dumb-init-an-init-for-docker.html) and see if that helps. Otherwise I'll have to make a timeout on a higher level to force delete and recreate the containers.
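
The higher-level timeout mentioned above could look roughly like the following sketch. It assumes the test run is a single command and that recovery is a list of commands run in order; `run_with_recreate` and its parameters are hypothetical names, and the commands in the trailing comment reuse the container name and image from the reproduction steps in the first comment.

```python
import subprocess


def run_with_recreate(test_cmd, recreate_cmds, timeout_s, max_attempts=3):
    """Run test_cmd; if it exceeds timeout_s, run recreate_cmds and retry.

    Returns the 1-based attempt number that completed, or None if every
    attempt timed out. A non-zero exit from test_cmd raises CalledProcessError.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child process when the timeout expires.
            subprocess.run(test_cmd, timeout=timeout_s, check=True)
            return attempt
        except subprocess.TimeoutExpired:
            for cmd in recreate_cmds:
                # check=False: a failing cleanup step should not abort recovery.
                subprocess.run(cmd, check=False)
    return None


# Possible call (not executed here; the timeout value is an assumption):
# run_with_recreate(
#     ["./test-repeat.sh", "chrome"],
#     [["docker", "rm", "-f", "chrome"],
#      ["docker", "run", "-d", "--name", "chrome", "selenium/standalone-chrome:2.47.1"]],
#     timeout_s=600,
# )
```

This only works around the hang; it does not explain it, and a freshly created container may need a moment before it accepts sessions.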

@pwaller

pwaller commented Feb 17, 2016

This has definitely gotten worse in the last month or two. I can't run our tests at all now.

@borick

borick commented Feb 17, 2016

Have you guys seen the instructions in https://github.com/SeleniumHQ/docker-selenium#running-the-images?
It says to make sure you mount /dev/shm on the container. That has already improved robustness for me greatly.

@pwaller

pwaller commented Feb 17, 2016

Yes, I have /dev/shm bind mounted.

@sterago

sterago commented Feb 18, 2016

My team has been seeing these issues as well and we're not using docker, just a regular grid setup based on Ubuntu 14.04, so perhaps this is not specifically related to docker?
To reproduce, we wrote a simple selenium script that repeatedly opens Chrome, visits a generic web page and quits the browser. After about 50 iterations the script hangs trying to start a new Chrome session. This only happens using a RemoteWebDriver, though. Using a local Chrome instance we haven't seen it hanging. HTH

@Vanuan

Vanuan commented Feb 22, 2016

@sterago

wrote a simple selenium script that repeatedly opens Chrome

Is it with or without selenium grid? Probably simple node would work?

@sterago

sterago commented Feb 22, 2016

@Vanuan
The script fails when using a browser handle obtained by contacting a selenium hub. Hub and node are running on the same machine.

@Vanuan

Vanuan commented Feb 22, 2016

So it's anywhere in this chain

hub -> node -> chromedriver -> chrome
                                  |
hub <- node <- chromedriver <----- 

According to the log:

04:44:24.140 INFO - Creating a new session for Capabilities [{rotatable=false, nativeEvents=false, browserName=chrome, takesScreenshot=false, javascriptEnabled=false, version=, platform=ANY, cssSelectorsEnabled=false}]
Starting ChromeDriver 2.20.353124 (035391233162d32c80f1dce587c8154a13830c3b) on port 20575
Only local connections are allowed.

the hub reaches the node, but the node is unable to reach chromedriver because this message isn't printed:

04:44:24.450 INFO - Done: [new session: Capabilities [{rotatable=false, nativeEvents=false, browserName=chrome, takesScreenshot=false, javascriptEnabled=false, version=, platform=ANY, cssSelectorsEnabled=false}]]
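
The same check can be run over a whole node log to list every session that started but never reported `Done`. This is a rough sketch that assumes the two message formats quoted above and that sessions on one node are created one at a time; `unfinished_sessions` is a hypothetical helper name.

```python
def unfinished_sessions(log_text):
    """Return timestamps of 'Creating a new session' lines with no matching 'Done'."""
    pending = []
    for line in log_text.splitlines():
        if "INFO - Creating a new session for" in line:
            # The timestamp is the first whitespace-separated token on the line.
            pending.append(line.split(" ", 1)[0])
        elif "INFO - Done: [new session:" in line and pending:
            # Assumption: completions arrive in the order the sessions were started.
            pending.pop(0)
    return pending
```

An empty result means every session creation completed; any remaining timestamp marks a launch that stalled.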

@sterago

sterago commented Feb 22, 2016

@Vanuan Yes, and considering that when using a local ChromeDriver instance on that same machine it never hung during our tests, one could assume that the chromedriver <-> chrome part of the chain can be excluded as, at least in isolation, it seems not to trigger the issue. Perhaps there is some kind of deadlock happening during the communication between node and chromedriver? The result of some quick tests we did strace'ing all the parts involved seemed to suggest something along those lines, but I wouldn't bet on it.

@Vanuan

Vanuan commented Feb 22, 2016

I've just reproduced it by directly connecting to the node (without a hub).

@sterago

sterago commented Feb 22, 2016

@Vanuan Is that using Docker? What's the OS? Our test script is using the Python selenium bindings, yours as well?

@Vanuan

Vanuan commented Feb 22, 2016

Yes, it's a Docker node. No, mine is using the Ruby bindings. Here's the run command:

docker run -d \
  -p 5555:5555 \
  -e HUB_PORT_4444_TCP_ADDR=${HUB_HOST} \
  -e HUB_PORT_4444_TCP_PORT=4444 \
  -e REMOTE_HOST=${CURRENT_HOST}:5555 \
  -v /dev/shm:/dev/shm \
  --name=chrome \
  selenium/node-chrome:2.52.0

And I reproduce the timeout by connecting directly to ${CURRENT_HOST}:5555

It happens less frequently though. Client, hub and node are all separate machines.

@Vanuan

Vanuan commented Feb 22, 2016

This does look strange:

07:44:44.661 INFO - Creating a new session for Capabilities [{rotatable=false, nativeEvents=false, browserName=chrome, takesScreenshot=false, javascriptEnabled=false, version=, platform=ANY, cssSelectorsEnabled=false}]
Starting ChromeDriver 2.20.353124 (035346203162d32c80f1dce587c8154a1efa0c3b) on port 1317
Only local connections are allowed.
...
08:37:00.129 INFO - Command failed to close cleanly. Destroying forcefully (v2). [/opt/selenium/chromedriver-2.20, --port=20521][ {}]
08:37:01.137 ERROR - Unable to kill process with PID 1563
08:37:01.138 WARN - Exception thrown
...
Caused by: org.openqa.selenium.WebDriverException: java.lang.reflect.InvocationTargetException
Driver info: driver.version: unknown
    at org.openqa.selenium.remote.server.DefaultDriverProvider.callConstructor(DefaultDriverProvider.java:113)
...
java.util.concurrent.ExecutionException: org.openqa.selenium.WebDriverException: java.lang.reflect.InvocationTargetException
Build info: version: '2.52.0', ...
...

Driver info: driver.version: ChromeDriver
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:665)
    at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:249)
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131)
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:144)
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:170)
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:138)
    ... 14 more
Caused by: org.openqa.selenium.WebDriverException: java.net.SocketTimeoutException: Read timed out
...
Driver info: driver.version: ChromeDriver
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:91)
...
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
...
    at org.openqa.selenium.remote.internal.ApacheHttpClient.execute(ApacheHttpClient.java:90)
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:142)
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:82)

chromedriver is running there though:

docker exec chrome ps aux|grep chromedriver
seluser   3931  0.0  0.1 391412  9292 ?        Sl   07:44   0:00 /opt/selenium/chromedriver-2.20 --port=1317
seluser   4689  0.0  0.0 391412  7432 ?        Sl   08:23   0:00 /opt/selenium/chromedriver-2.20 --port=30084

And it's listening:

netstat -tl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 localhost:12698         *:*                     LISTEN     
tcp        0      0 localhost:30084         *:*                     LISTEN     
tcp        0      0 localhost:1317          *:*                     LISTEN     
tcp        0      0 localhost:12106         *:*                     LISTEN     
tcp        0      0 localhost:12050         *:*                     LISTEN     
tcp        0      0 *:5555                  *:*                     LISTEN    

@Vanuan

Vanuan commented Feb 23, 2016

Reproduced again. Port used by chromedriver: 1081

Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
# Selenium-standalone.jar talking to chromedriver:
tcp        0      0 localhost:1081          localhost:41303         ESTABLISHED 14332/chromedriver-
tcp        0      0 localhost:41303         localhost:1081          ESTABLISHED 21/java         

# Chrome talking to chromedriver
tcp        0      0 localhost:12544         localhost:38406         ESTABLISHED 14337/internal --lo
tcp        0      0 localhost:38406         localhost:12544         ESTABLISHED 14332/chromedriver-

# Selenium-standalone.jar talking to selenium hub
tcp        0      0 b2962a5d73da:49516      $SELENIUM_HUB:4444 ESTABLISHED 21/java         
tcp        1      0 b2962a5d73da:49506      $SELENIUM_HUB:4444 CLOSE_WAIT  21/java     

# Selenium-standalone.jar talking to itself?
tcp        0      0 b2962a5d73da:5555       $CONTAINER_IP:33263      ESTABLISHED 21/java  
# Client (tests) talking to Selenium-standalone.jar server    
tcp        0      0 b2962a5d73da:5555       $DRIVER_CLIENT:60783      ESTABLISHED 21/java         




Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name

# chromedriver
tcp        0      0 localhost:1081          *:*                     LISTEN      14332/chromedriver-
# chrome extension that communicates with chromedriver
tcp        0      0 localhost:12544         *:*                     LISTEN      14337/internal --lo
# selenium node that communicates with chromedriver and exposes a port
tcp        0      0 *:5555                  *:*                     LISTEN      21/java         

How can I check where the connection times out? Which code is responsible for writing Done: [new session:?
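
One way to narrow down which hop stalls is to probe each listening port directly from inside the container. The sketch below distinguishes "cannot connect" from "connected but got no answer". It assumes the service speaks HTTP and that chromedriver answers the WebDriver `/status` request; `probe_http` and its return strings are hypothetical names.

```python
import socket


def probe_http(host, port, path="/status", timeout_s=2.0):
    """Return 'connect-failed', 'no-response', 'closed' or 'responded'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout_s)
    except OSError:
        return "connect-failed"  # refused, unreachable, or connect timed out
    with sock:
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "Connection: close\r\n\r\n"
        )
        try:
            sock.sendall(request.encode("ascii"))
            data = sock.recv(1024)
        except socket.timeout:
            return "no-response"  # accepted the connection but never answered
        except OSError:
            return "closed"
    return "responded" if data else "closed"
```

Against the listing above, `probe_http("localhost", 1081)` would exercise chromedriver and `probe_http("localhost", 5555)` the node. A `no-response` from chromedriver while the node still answers would point at the chromedriver side of the chain.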

@jjYBdx4IL

For me, this NEVER happens if I start chrome from inside an X11 session, no matter whether it runs against Xvfb or not, or whether Xvfb has been started on the console. The problem appears to be X11 resource usage related to a live/real X11 user desktop session. I traced it down to DBUS. Setting DBUS_SESSION_BUS_ADDRESS=/some/nonsense in Jenkins fixed my testing there, and chrome/chromium are starting up there again without any problems at all.
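
For setups that start the Selenium node from a script rather than from Jenkins, the same workaround can be applied to the child process environment. This is a minimal sketch, not a confirmed fix: `chrome_env` is a hypothetical helper, and the default value `disabled:` is the one used by the Chromium change referenced later in this thread, while the comment above used an arbitrary invalid address.

```python
import os


def chrome_env(base_env=None, bus_address="disabled:", override=False):
    """Return a copy of the environment with DBUS_SESSION_BUS_ADDRESS set.

    By default an existing value is kept; pass override=True to replace it.
    """
    env = dict(os.environ if base_env is None else base_env)
    if override:
        env["DBUS_SESSION_BUS_ADDRESS"] = bus_address
    else:
        env.setdefault("DBUS_SESSION_BUS_ADDRESS", bus_address)
    return env
```

The returned dict can be passed as `env=` to `subprocess.Popen` when launching the Selenium server, so that chromedriver and Chrome inherit it.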

@sterago

sterago commented Feb 23, 2016

@jjYBdx4IL how exactly did you configure the DBUS_SESSION_BUS_ADDRESS environment variable? Is it configured for the environment where the selenium node process runs or somewhere else?

@elgalu
Member

elgalu commented Feb 23, 2016

Could you guys try installing dbus-x11 and see if that helps?

scheib pushed a commit to scheib/chromium that referenced this issue May 8, 2017
…//codereview.chromium.org/2861163002/ )

Reason for revert:
Speculative revert -- the TSan bots have been reporting a data race when setting Envvars (in this case, appending to the python path to start a websocket server). The race appeared immediately after this patch landed, so it may be legitimate. Reverting this to see if it clears the failures up; if so, we'll probably just have to serialize the calls to setenv.

Filed crbug.com/719633 for this as well.

Original issue's description:
> Linux: Disable DBus auto-launch
>
> This is a workaround (ETA ~ 2-3 years) for libdbus not being multi-threading
> friendly and causing random hangs when running chrome outside of Linux
> desktop environments.
>
> Background:
> -----------
> Typically, Linux desktop environments set the DBUS_SESSION_BUS_ADDRESS
> environment variable. This variable allows the dbus client library to
> directly connect to the existing bus, which is started by the desktop
> environment or systemd.
> When this variable is missing, the dbus client library will fallback
> to auto-launch mode [1], which causes 4 nested fork() + exec() calls.
> Doing this has two problems: (i) slows down startup; (ii) can hang
> the browser if the fork() happens while another thread is in a malloc()
> (Chrome's tcmalloc has no at-fork handlers).
> This situation (no env variable) is very common in test scenarios
> (browsertests, chromedriver, etc).
>
> Change introduced by this CL:
> -----------------------------
> This CL sets the bus address env variable to "disabled:" if not set.
> This effectively shuts down the dbus auto-launch. If necessary, this
> behavior can be restored by setting, before launching chrome,
> DBUS_SESSION_BUS_ADDRESS="autolaunch:" .
> This workaround will be necessary until libdbus and gspawn are fixed
> to be multi-threading friendly [2,3] and that fix rolls into the
> various distributions.
> The change is introduced in the main embedder rather than in the
> google-chrome wrapper, as several binaries can be affected by this,
> for instance:
> - browser tests (http://crbug.com/693668)
> - chrome --headless
> - webdriver/selenium which seem to directly invoke "chrome"
>    see SeleniumHQ/docker-selenium#87
>
> [1] https://dbus.freedesktop.org/doc/dbus-launch.1.html
> [2] https://bugs.freedesktop.org/show_bug.cgi?id=100843
> [3] https://bugs.chromium.org/p/chromedriver/issues/detail?id=1699
>
> BUG=715658,695643,713947
> TEST=strace -ff -o trace chrome; grep dbus-launch trace*
>
> Review-Url: https://codereview.chromium.org/2861163002
> Cr-Commit-Position: refs/heads/master@{#469987}
> Committed: https://chromium.googlesource.com/chromium/src/+/8511820ec8280caacbd4f81f3ecd13b6c61681b0

# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=715658,695643,713947

Review-Url: https://codereview.chromium.org/2869843003
Cr-Commit-Position: refs/heads/master@{#470059}
scheib pushed a commit to scheib/chromium that referenced this issue May 10, 2017
…//codereview.chromium.org/2869843003/ )

Reason for reland:
Adding TSan suppression. The race is independent of this CL, see crbug.com/719633

Original issue's description:
> Revert of Linux: Disable DBus auto-launch (patchset #1 id:1 of https://codereview.chromium.org/2861163002/ )
>
> Reason for revert:
> Speculative revert -- the TSan bots have been reporting a data race when setting Envvars (in this case, appending to the python path to start a websocket server). The race appeared immediately after this patch landed, so it may be legitimate. Reverting this to see if it clears the failures up; if so, we'll probably just have to serialize the calls to setenv.
>
> Filed crbug.com/719633 for this as well.
>
> Original issue's description:
> > Linux: Disable DBus auto-launch
> >
> > This is a workaround (ETA ~ 2-3 years) for libdbus not being multi-threading
> > friendly and causing random hangs when running chrome outside of Linux
> > desktop environments.
> >
> > Background:
> > -----------
> > Typically, Linux desktop environments set the DBUS_SESSION_BUS_ADDRESS
> > environment variable. This variable allows the dbus client library to
> > directly connect to the existing bus, which is started by the desktop
> > environment or systemd.
> > When this variable is missing, the dbus client library will fallback
> > to auto-launch mode [1], which causes 4 nested fork() + exec() calls.
> > Doing this has two problems: (i) slows down startup; (ii) can hang
> > the browser if the fork() happens while another thread is in a malloc()
> > (Chrome's tcmalloc has no at-fork handlers).
> > This situation (no env variable) is very common in test scenarios
> > (browsertests, chromedriver, etc).
> >
> > Change introduced by this CL:
> > -----------------------------
> > This CL sets the bus address env variable to "disabled:" if not set.
> > This effectively shuts down the dbus auto-launch. If necessary, this
> > behavior can be restored by setting, before launching chrome,
> > DBUS_SESSION_BUS_ADDRESS="autolaunch:" .
> > This workaround will be necessary until libdbus and gspawn are fixed
> > to be multi-threading friendly [2,3] and that fix rolls into the
> > various distributions.
> > The change is introduced in the main embedder rather than in the
> > google-chrome wrapper, as several binaries can be affected by this,
> > for instance:
> > - browser tests (http://crbug.com/693668)
> > - chrome --headless
> > - webdriver/selenium which seem to directly invoke "chrome"
> >    see SeleniumHQ/docker-selenium#87
> >
> > [1] https://dbus.freedesktop.org/doc/dbus-launch.1.html
> > [2] https://bugs.freedesktop.org/show_bug.cgi?id=100843
> > [3] https://bugs.chromium.org/p/chromedriver/issues/detail?id=1699
> >
> > BUG=715658,695643,713947
> > TEST=strace -ff -o trace chrome; grep dbus-launch trace*
> >
> > Review-Url: https://codereview.chromium.org/2861163002
> > Cr-Commit-Position: refs/heads/master@{#469987}
> > Committed: https://chromium.googlesource.com/chromium/src/+/8511820ec8280caacbd4f81f3ecd13b6c61681b0

> Review-Url: https://codereview.chromium.org/2869843003
> Cr-Commit-Position: refs/heads/master@{#470059}
> Committed: https://chromium.googlesource.com/chromium/src/+/1e78cb7863da28bb3411286cdbcc4fb4510ce173

BUG=715658,695643,713947,719633

Review-Url: https://codereview.chromium.org/2865283002
Cr-Commit-Position: refs/heads/master@{#470301}
@ktkopone

ktkopone commented Jun 16, 2017

Also experiencing this issue in Windows 10 (no container) when using Beta Chrome release 60.0.3112.32, but only when running chrome in the new --headless mode (which chromedriver doesn't ostensibly support yet, admittedly, but for my very simple test case it seems fine when I do get chromedriver launched).

Launching chromedriver from python with --headless and --disable-gpu nets me the intermittent error on around 2/30 tries, with or without --no-sandbox (which I've seen suggested a lot from googling).

As an aside, adding --remote-debugging-port=9222 breaks it since chromedriver generates its own port to try and access devtools on, but that's on me for using chromedriver for headless chrome before it's supported, and not relevant to my use case nor probably this issue.

With --headless and --disable-gpu, it simply silently fails to connect as others have observed:

[2.039][INFO]: Launching chrome: "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --disable-background-networking --disable-client-side-phishing-detection --disable-default-apps --disable-gpu --disable-hang-monitor --disable-infobars --disable-notifications --disable-popup-blocking --disable-prompt-on-repost --disable-setuid-sandbox --disable-sync --disable-web-resources --enable-automation --enable-logging --force-fieldtrials=SiteIsolationExtensions/Control --headless --ignore-certificate-errors --load-component-extension="C:\Users\kkoponen\AppData\Local\Temp\scoped_dir29680_29077\internal" --log-level=0 --metrics-recording-only --no-first-run --no-sandbox --password-store=basic --remote-debugging-port=12215 --safebrowsing-disable-auto-update --test-type=webdriver --use-mock-keychain --user-data-dir="C:\Users\kkoponen\AppData\Local\Temp\scoped_dir29680_13552" data:,
[2.050][DEBUG]: DevTools request: http://localhost:12215/json/version
[4.060][DEBUG]: DevTools request failed
[4.111][DEBUG]: DevTools request: http://localhost:12215/json/version
[4.312][DEBUG]: DevTools request failed
[4.363][DEBUG]: DevTools request: http://localhost:12215/json/version
[6.118][DEBUG]: DevTools request failed
[6.168][DEBUG]: DevTools request: http://localhost:12215/json/version
[6.368][DEBUG]: DevTools request failed
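
The polling visible in this log can be imitated by hand to see whether Chrome's DevTools endpoint ever comes up on the chosen port. The following is only a sketch of such a check, not chromedriver's actual logic: `wait_for_devtools`, the deadline, and the retry interval are assumptions, while the `/json/version` path is taken from the log lines above.

```python
import http.client
import json
import time
import urllib.request


def wait_for_devtools(port, deadline_s=10.0, interval_s=0.05, host="localhost"):
    """Poll http://host:port/json/version; return the parsed JSON or None on timeout."""
    url = f"http://{host}:{port}/json/version"
    # An empty ProxyHandler keeps proxy environment variables out of the picture.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    stop_at = time.monotonic() + deadline_s
    while time.monotonic() < stop_at:
        try:
            with opener.open(url, timeout=2.0) as response:
                return json.load(response)
        except (OSError, ValueError, http.client.HTTPException):
            # Connection refused, timeout, or a non-JSON body: wait and retry.
            time.sleep(interval_s)
    return None
```

Starting Chrome manually with the same flags and calling `wait_for_devtools(12215)` would show whether the endpoint is merely slow to appear or never appears at all.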

As I understand it that linked patch isn't relevant to windows users.

Anyone have any ideas about preventing this in windows?

@tholewebgods

This seems to be a duplicate of #89 which apparently has a fix.

Though this seems to be related to docker I'd like to share my experience with a local Selenium 3.4.0 standalone server without docker.

The symptoms were the same: Selenium was stuck when creating a new session.

It turns out the issue is no longer happening using Chrome 59.

See my MCVE with minimal client code, setup descriptions and observations: https://github.com/tholewebgods/selenium-new-session-freeze-mcve

breilly2 added a commit to broadinstitute/firecloud-ui that referenced this issue Jul 6, 2017
Don't hide the test failure if there is a problem capturing a screenshot.
Avoid stalled headless chrome nodes (SeleniumHQ/docker-selenium#87).
ahaessly added a commit to broadinstitute/firecloud-ui that referenced this issue Jul 7, 2017
* merge in new auth token and some qa users
* update tests to use new users from config
* get current tests to pass with new qa users
* change dev to qa for thurloe url
* Save screenshot on test failure.
* udpate ctmpls
* put / in front of chrome path
* add pem ctmpl and fix template render stuff
* documentation for automation
* fix failure screenshot rendering
* Added clean-up to registration test and migrated it to the new test users.
Added passing of FireCloud-Id to Thurloe calls.
Fixed a bug with xpath of checking for an element with text.
* don't use ivy cache
* add dsde-toolbox pull to runtests script
* Added Thurloe service
* changing local orch api
* use different auth domain
* get default auth domain from vault
* add host name to test runner
* fixed loading of particpants.txt for DataTabSpec
* fail whole script if tests fail
* Improved reliability of test for creating a billing project.
Don't hide the test failure if there is a problem capturing a screenshot.
Avoid stalled headless chrome nodes (SeleniumHQ/docker-selenium#87).
* Re-throw error when logging.
Be a little more quiet about clean-up failures.
* Avoid instability with Google sign-in when the popup window automatically closes.
@m2bright

All,
I may have some further insight into this behavior. While automating Selenium to verify a Google login page, we called driver.get(...) on the page it had just been redirected to. This caused the driver to throw a TimeoutException while waiting for the get to complete, when in reality it never performed the get. So, the moral of the story? Don't do that!

loomchild added a commit to loomchild/vue-es that referenced this issue Sep 10, 2017
abby-sergz pushed a commit to adblockplus/chromium-src-build that referenced this issue Sep 14, 2017
…//codereview.chromium.org/2869843003/ )

Reason for reland:
Adding TSan suppression. The race is independent of this CL, see crbug.com/719633

Original issue's description:
> Revert of Linux: Disable DBus auto-launch (patchset #1 id:1 of https://codereview.chromium.org/2861163002/ )
>
> Reason for revert:
> Speculative revert -- the TSan bots have been reporting a data race when setting Envvars (in this case, appending to the python path to start a websocket server). The race appeared immediately after this patch landed, so it may be legitimate. Reverting this to see if it clears the failures up; if so, we'll probably just have to serialize the calls to setenv.
>
> Filed crbug.com/719633 for this as well.
>
> Original issue's description:
> > Linux: Disable DBus auto-launch
> >
> > This is a workaround (ETA ~ 2-3 years) for libdbus not being multi-threading
> > friendly and causing random hangs when running chrome outside of Linux
> > desktop environments.
> >
> > Background:
> > -----------
> > Typically, Linux desktop environments set the DBUS_SESSION_BUS_ADDRESS
> > environment variable. This variable allows the dbus client library to
> > directly connect to the existing bus, which is started by the desktop
> > environment or systemd.
> > When this variable is missing, the dbus client library will fallback
> > to auto-launch mode [1], which causes 4 nested fork() + exec() calls.
> > Doing this has two problems: (i) slows down startup; (ii) can hang
> > the browser if the fork() happens while another thread is in a malloc()
> > (Chrome's tcmalloc has no at-fork handlers).
> > This situation (no env variable) is very common in test scenarios
> > (browsertests, chromedriver, etc).
> >
> > Change introduced by this CL:
> > -----------------------------
> > This CL sets the bus address env variable to "disabled:" if not set.
> > This effectively shuts down the dbus auto-launch. If necessary, this
> > behavior can be restored by setting, before launching chrome,
> > DBUS_SESSION_BUS_ADDRESS="autolaunch:" .
> > This workaround will be necessary until libdbus and gspawn are fixed
> > to be multi-threading friendly [2,3] and that fix rolls into the
> > various distributions.
> > The change is introduced in the main embedder rather than in the
> > google-chrome wrapper, as several binaries can be affected by this,
> > for instance:
> > - browser tests (http://crbug.com/693668)
> > - chrome --headless
> > - webdriver/selenium which seem to directly invoke "chrome"
> >    see SeleniumHQ/docker-selenium#87
> >
> > [1] https://dbus.freedesktop.org/doc/dbus-launch.1.html
> > [2] https://bugs.freedesktop.org/show_bug.cgi?id=100843
> > [3] https://bugs.chromium.org/p/chromedriver/issues/detail?id=1699
> >
> > BUG=715658,695643,713947
> > TEST=strace -ff -o trace chrome; grep dbus-launch trace*
> >
> > Review-Url: https://codereview.chromium.org/2861163002
> > Cr-Commit-Position: refs/heads/master@{#469987}
> > Committed: https://chromium.googlesource.com/chromium/src/+/8511820ec8280caacbd4f81f3ecd13b6c61681b0

> Review-Url: https://codereview.chromium.org/2869843003
> Cr-Commit-Position: refs/heads/master@{#470059}
> Committed: https://chromium.googlesource.com/chromium/src/+/1e78cb7863da28bb3411286cdbcc4fb4510ce173

BUG=715658,695643,713947,719633
[email protected],[email protected],[email protected]

Review-Url: https://codereview.chromium.org/2865283002
Cr-Original-Commit-Position: refs/heads/master@{#470301}
Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
Cr-Mirrored-Commit: 2fc330d0b93d4bfd7bd04b9fdd3102e529901f91
@blablaBen

Adding DBUS_SESSION_BUS_ADDRESS=/dev/null as an environment variable in docker-compose totally fixed the problem! Now, I feel that it is running faster than it was!
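
For anyone launching the driver from a script rather than through docker-compose, the same idea can be sketched in Python. This is a minimal illustration, not an official fix: the helper names are hypothetical, and it follows the "set only if not already set" behaviour described in the Chromium change quoted above, using the /dev/null value from this comment.

```python
import os
import subprocess


def env_without_dbus_autolaunch(base_env=None, value="/dev/null"):
    """Return a copy of the environment with DBUS_SESSION_BUS_ADDRESS set.

    An existing value is left untouched, so a real session bus (or an
    explicit "autolaunch:") configured by the caller is still respected.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault("DBUS_SESSION_BUS_ADDRESS", value)
    return env


def launch(command, base_env=None):
    """Start `command` (for example a chromedriver invocation) with that environment."""
    return subprocess.Popen(command, env=env_without_dbus_autolaunch(base_env))
```

Child processes inherit the variable, so a Chrome started by the launched driver would see it as well.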

xtremerui pushed a commit to vmware-archive/web that referenced this issue Jan 11, 2018
chromedriver has a bug that causes it to hang and then everything fails:

  SeleniumHQ/docker-selenium#87

detect if phantomjs is installed and use it, since that's less painful
for local dev workflows.

also remove the js Dockerfile; just use 'node' since the JS tests are
pretty straightforward
@hutber

hutber commented Feb 8, 2018

Now my question is, how can I set this variable in circleCI :(

@leviable

leviable commented Feb 8, 2018

@hutber Go to your project settings in Circle CI, click the link on the left for "Environment Variables", then add the key/value pair.

soulgalore added a commit to sitespeedio/browsertime that referenced this issue Feb 11, 2018
soulgalore added a commit to sitespeedio/browsertime that referenced this issue Feb 11, 2018
@ghost

ghost commented Feb 16, 2018

I don't believe this issue is strictly related to Docker, and I am getting different results depending on the browser driver I use. I am invoking the NUnit console runner from AWS Run-Command (send-command) on a remote EC2 instance. Selenium fails to navigate to a URL for the first one or two tests (inconsistently). My workaround: I made a separate TestFixture with an Order attribute of 1. The fixture contains two tests, each of which calls driver.Navigate().GoToUrl(app) and then asserts a pass. After that, all remaining tests run fine. Sometimes geckodriver still fails. I am not using Selenium Grid currently, but it is necessary to implement it.
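
The warm-up workaround described here is written against NUnit, but the idea (absorb the first one or two failed navigations before the real tests start) can be sketched in Python. The function name, the attempt count, and the delay are assumptions for illustration; `navigate` is any callable that loads a URL, such as `driver.get` in the Selenium Python bindings.

```python
import time


def warm_up(navigate, url, attempts=3, delay_s=1.0):
    """Call navigate(url) until it succeeds; return the attempt number that worked.

    Re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            navigate(url)
            return attempt
        except Exception as error:  # broad on purpose: driver errors vary by binding
            last_error = error
            if attempt < attempts:
                time.sleep(delay_s)
    raise last_error
```

Running this once in a session-level setup step plays the same role as the order-1 fixture: later tests only start after one navigation has gone through.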

@PS1Online

DBUS_SESSION_BUS_ADDRESS=/dev/null

How does this work in Excel VBA (Windows)?
Where should I enter this code, and what is the syntax in VBA?
I am using SeleniumBasic WebDriver 2.0.9.0 (for VBA) and the latest ChromeDriver (2.37) as of 16/03/2018.
Windows 7 64-bit.

@m2bright

I have a solution: I stopped using selenium and went a different way.

@PS1Online

I hope the developers have read this and are looking for a solution to the problem, which is present on both Windows and Linux.

@diemol
Member

diemol commented Mar 19, 2018

Hi @PS1Online,

I am not sure how this could apply to your context. This env var was necessary many releases ago for the docker-selenium images. You seem to be running in a completely different environment (Excel VBA).

Perhaps the simplest way is for you to join https://seleniumhq.herokuapp.com; there are lots of people there who can potentially help you.

benthorner pushed a commit to alphagov/govuk-puppet that referenced this issue Jul 8, 2019
Recently we saw the smokey tests were hanging on production (AWS). On
investigating the issue, we found that chrome was failing to respond to
Chromedriver commands. The interaction between chromedriver and Chrome
is done in the context of a session [1]; we were able to get the active
session ID by doing an 'strace' of the chrome/driver processes.

curl -d '{"url":"https://www.google.com"}' http://localhost:9515/session/27f4262ab044392b05138540055a8fd6/url
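
The same request can be sent from Python with an explicit timeout, so that an unresponsive Chrome shows up as an error rather than a hang. This is a minimal sketch using only the standard library; the function names and the timeout value are assumptions, and unlike the curl command above (which sends curl's default form content type) it sets a JSON Content-Type header.

```python
import json
import urllib.request


def build_navigate_request(base_url, session_id, target_url):
    """Build the POST /session/{id}/url request that the curl command sends."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/session/{session_id}/url",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def navigate(base_url, session_id, target_url, timeout_s=10.0):
    """Send the request; a stuck browser surfaces as a timeout error."""
    request = build_navigate_request(base_url, session_id, target_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

For the session in this commit message, the call would be `navigate("http://localhost:9515", "27f4262ab044392b05138540055a8fd6", "https://www.google.com")`.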

This provided some clarity on the reason for the smokey tests hanging,
and led to the following issue, which suggests the issue is related to
'dbus': SeleniumHQ/docker-selenium#87. While
this seems to be part of the main Chromium distribution [3], it's not
clear if this has made it into Chrome itself.

This trials implementing the suggested fix for the smokey process.

References
==========

[1] https://www.pawangaria.com/post/automation/browser-automation-from-command-line/
[2] https://chromium.googlesource.com/chromium/src/+/2fc330d0b93d4bfd7bd04b9fdd3102e529901f91%5E%21/
[3] https://chromium.googlesource.com/chromium/src/+/refs/heads/master/services/service_manager/embedder/main.cc#274
@lock lock bot locked and limited conversation to collaborators Aug 14, 2019