Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Karma tests only run half the time in Docker environment #3359

Open
EricZLou opened this issue Sep 3, 2019 · 15 comments
Open

Karma tests only run half the time in Docker environment #3359

EricZLou opened this issue Sep 3, 2019 · 15 comments

Comments

@EricZLou
Copy link

EricZLou commented Sep 3, 2019

Are there any known race conditions that happen when Karma is used with Docker? The test runs correctly sometimes. Example: I run karma start and the tests don't run correctly (shown below), but when I run it a second time, it might run correctly (also shown below).

Expected behaviour

The tests run consistently every time the karma server is started with ./node_modules/karma/bin/karma start ./karma.conf.ts. I am in a Docker environment.

Actual behaviour

Sometimes, the tests run to completion correctly. Other times, the tests do not run at all.

When the tests run correctly:
image

When the tests don't run correctly:
image

Environment Details

  • Karma version (output of karma --version):
    4.2.0
  • Relevant part of your karma.config.js file
    image
@johnjbarton
Copy link
Contributor

There isn't anything that karma-runner can do here. The browser has been loaded and now it is in control of the action. karma is just waiting to hear back; when it doesn't then it kills the process.

You can try setting the ping timeout to a larger value:
15ae12f
Requires v4.3

@EricZLou
Copy link
Author

EricZLou commented Sep 4, 2019

Got it. Thanks a lot for your help once again @johnjbarton

@EricZLou
Copy link
Author

Hi @johnjbarton
I have upgraded karma to v4.3 and set pingTimeout:10000 in karma.conf.ts:

image

But it still seems to be using 2000ms as the ping timeout instead of 10000ms.

image

Is there anywhere else I would need to set the ping timeout?

@EricZLou
Copy link
Author

Also just made an observation that Karma disconnects in the middle of Fetching/Requesting. Could this be the cause of the problem? Is this also related to setting a higher ping timeout?

@johnjbarton
Copy link
Contributor

Please run with --logLevel=debug and verify that the final config shows the ping timeout value you expect.

Also just made an observation that Karma disconnects in the middle of Fetching/Requesting. Could this be the cause of the problem? Is this also related to setting a higher ping timeout?

If one of the files fetched crashes the browser or spins the CPU without yielding to the event-loop, then the browser won't respond to pings. No setting for timeout will fix this: the race in you code needs to be fixed.

Note that CI machines are often much less powerful than desktop machines so tests may pass locally and fail in CI under this race condition.

You can check to see if it happens in the same file each time. Add logging to top-level of files to see progress.

In general any test that fails with timeout should be suspect first: the failing timeout is a symptom not a cause.

@EricZLou
Copy link
Author

I'm running the command node .\node_modules\karma\bin\karma start .\core\tests\karma.conf.ts --logLevel=debug with v4.3 and the final config still doesn't seem to be printing out. The output starts with a very large json, then "Loading plugin ... ", then "Processing ... ", etc.

When I Ctrl+f for "timeout", I don't get any relevant results. When should the final config be printed?

Also, Karma disconnects 30 seconds after the Karma server starts, not at any specific file.

@johnjbarton
Copy link
Contributor

The output starts with a very large json,

I guess that is the config. Search for the string Final config.

I think the most likely explanation for your issue is a race in your test or app code. karma is doing all it can do: it's putting your app and test code in the browser with the test framework. Then it's waiting for test results. No one ever shows up. I guess that is because the test or app code is burning CPU, not giving a single event loop turn to socket.io or the test framework.

These cases are hard to solve. I suggest you start by commenting out as much of your test as possible, even all of it, just a single console.log() call. If that still fails, cut out your app code 1/2 at a time. And so on to isolate the race.

@EricZLou
Copy link
Author

Ah yes. The output was too large so it was getting cropped. I see the final config now. The pingTimeout is indeed set to what the config has. I notice that the default value for the pingTimeout (when I don't specify it in the config) is 5000. However, the error screenshots above say 2000ms (which is neither 5000 nor the 10000 I specify in the config).

And thanks for your recommendations! I'll try that out.

@semanio
Copy link

semanio commented May 20, 2020

I am posting a comment because I'm having a similar issue with Karma inside Docker where my junit-reporter wasn't creating an XML file. I have adjusted the pingTimeout in Karma v5.0.8 and I'm still getting (with a larger test suite (6700 tests)) into a state of when the browser (socket) is having to reconnect before it begins executing/running the tests. When I debug with Karma, I see the following (key) log messages print out just before my test runner starts to execute:

  • Disconnected during run, waiting 600000ms for reconnecting.
  • Lost socket connection, but browser continued to execute. Reconnected

Why that is key

When looking in browser.js inside the reconnect function; that second message is near this line: this.setState(EXECUTING). The only other place that state is set in the file is in the onStart function (which does not get called in this situation). Therefore, I tested adding this line: this.emitter.emit('browser_start', this) to the reconnect function right below this.setState(EXECUTING); I'm now getting the expected results in my junit-reporter. Would this be an acceptable fix? If so, I make a Pull Request. I have made this commit on my fork:
semanio@f587b69

Extra context

I have timed it between registering the first socket to these messages and there is about a 34/35 seconds delay (within Docker). Per the first message, the reconnect honors the pingTimeout config; however, it seems like the initial connection is not. As I am writing this, I have yet to track down why this is the case.

semanio added a commit to semanio/karma that referenced this issue May 20, 2020
…nection to mimic the emit that occurs within onStart when the same state is set (karma-runner#3359)
@johnjbarton
Copy link
Contributor

Unfortunately the reporters are not designed to be resilient to reconnect. They may assume two browsers exist if the browser_start is sent twice. So we can't send browser_start twice.

I guess we need to introduce another state, CONFIGURING_DISCONNECTED to ensure that EXECUTING_DISCONNECTED always means that we did enter EXECUTING before disconnect. In your scenario, your disconnect should be during CONFIGURING_DISCONNECTED and when you get to reconnect you can verify that state and emit browser_start to enter EXECUTING.

Does that make sense?

@semanio
Copy link

semanio commented May 20, 2020

@johnjbarton - make sense. I am definitely in CONFIGURING_DISCONNECTED state with this situation. I'll see if I can work up that fix.

To be clear; in this case, it's not calling browser_start at all in this situation without my proposed fix. I am actively testing that out though against a small test suite to be sure. Regardless, what you are saying as the proper fix makes sense.

@yogeshkrishnani
Copy link

yogeshkrishnani commented Feb 3, 2021

We're facing the same issue as @semanio mentioned. If a browser gets disconnected during run, and reconnects on the same socket, junit-reporter isn't generating an XML file.

  • Disconnected during run, waiting 60000ms for reconnecting.
  • CONFIGURING -> EXECUTING_DISCONNECTED
  • A browser has connected on socket KsxBiNj3Kdzmv8ExAAAB
  • Lost socket connection, but browser continued to execute. Reconnected on socket KsxBiNj3Kdzmv8ExAAAB.
  • EXECUTING_DISCONNECTED -> EXECUTING

At the end, I see the following error:

ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error?

@semanio : Were you able to resolve the issue through any other ways? Because I see that the changes suggested by you haven't been merged.

@johnjbarton Can we please look into this scenario? I also believe browser_start event is not called once browser disconnects and reconnects, ideally it should.

@semanio
Copy link

semanio commented Feb 3, 2021

@yogeshkrishnani I got sidetracked at work and never worked on the alternative fix that @johnjbarton and I discussed in this reply; I'll see if I can circle back to it in the near future.

@johnjbarton
Copy link
Contributor

I wonder now if this issue is actually one we see elsewhere. Some analysis by a colleague came up with this explanation:

  • A test with a large chunk of JS to parse (or an intense cpu loop in a test perhaps) occupies the JS event loop
  • The socket.io ping fails to get CPU time, tripping the pingTimeout ("Disconnected during run")
  • The pingTimeout causes EXECUTING_DISCONNECTED.
  • The test runs and emits results.
  • The emit causes the client to reconnect "Reconnected on socket"
  • The socket.io upgrade logic fails to pick the results packet.
  • End of story. The client thinks it is done (COMPLETE) but the server thinks it is EXECUTING.
  • No progress until eventually we hit another timeout.

A workaround we have tried is to set the karma pingTimeout irrationally large, 90s.

I think the correct fix is to re-work our socket io code based on the latest implementation.

@johnjbarton
Copy link
Contributor

Note that karma 6x is built on socket.io v3 which has reversed ping/pong code:
socketio/engine.io-client@81d7171

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants