-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test.Saga hangs test runner on Linux under certain conditions #135
Comments
@jenspettersson Have you tried running on Linux, but not inside a container? |
@bording Nope, I don't have access to a Linux instance atm. Maybe I should have been more clear about that in the issue. |
However, it happens on our build server as well and that's a Linux server building docker containers. But still, only built/run within a container. |
@jenspettersson It would be good to know if it was specific to being inside a container. Do you have WSL set up on your Windows 10 machine? That would be an easy way to test it, and would provide another useful data point. |
@bording No, I don't have WSL on my Windows 10 machine, it's my clients machine and I don't have permission to setup everything. I did just test this on my personal MacBook with .NET Core 2.2.104 SDK installed (no docker) and the same thing happened. |
@jenspettersson I cloned your repo and have run the tests using
The tests are running properly and not hanging for all of those. It would seem that there must be something more going on because I am not able to repro the problem you're describing. |
@bording Strange, but thanks for investigating. I'm currently testing this on Azure Dev Ops with a build pipeline that runs on "Hosted Ubuntu 1604" and it's also hangs. I made that Azure DevOps project public and it's running build no. 71 as I'm writing this and it has been running for 4 minutes and is still not done: https://dev.azure.com/jenspettersson/nsb-saga-test/_build/results?buildId=71&_a=summary |
I canceled the build after it's been trying to run the tests for 10 minutes:
|
Tested the same pipeline (build 72) but with build agent "Hosted VS2017" and it also hanged... |
Last test for tonight: Created a Ubuntu 18.04 VM on Azure and cloned the repo and ran Changed the
and ran |
And yet I've tried all the same things, and it works fine. I think we're going to need more to go on to be able to figure it out. |
I can confirm that this hangs on a Ubuntu 18.04 VM that @jenspettersson have created on Azure. @bording would you be able to take a look if I send you the credentials? |
@jenspettersson I can see the tests hanging on the VM you created, but they don't hang on mine. That means the problem has to be more than just "run on Linux". Perhaps it's related to the type/number of CPUs on the machine? Also, I see that you updated your repo to include an NUnit-based test project. However, that isn't yet a fair comparison because you need to add an assembly-level See https://github.com/Particular/NServiceBus/blob/develop/src/NServiceBus.AcceptanceTests/AssemblyInfo.cs for an example of what you need to add. Once NUnit is also running tests in parallel, I'll be interested to see what the behavior is on your VM. |
Yes, perhaps the problem might be more than just "run on Linux", that was just our initial thought after spending hours finally tracking down the specific tests that caused our whole test suite to never complete inside our docker containers. I added the The VM I set up had only 1 vcpu (size: B1s in azure) but even as I changed it to two (B2s) it still behaved the same way (that is: xunit tests hangs, nunit tests works). |
I was able to take a memory dump of the hanging test process, and after playing around with My current theory is that
I think we need to review all the places |
What are our options? Provide a Task returning overload and deprecate the
existing one?
cc @danielmarbach
…On Thu, 21 Feb 2019 at 00:58, Brandon Ording ***@***.***> wrote:
I was able to take a memory dump of the hanging test process, and after
playing around with lldb for a while, it looks like most of the threads
are sitting inside of the Task.Wait code. One thread was clearly doing
that from inside the NServiceBus.Testing.
My current theory is that
https://github.com/Particular/NServiceBus.Testing/blob/df2943a0f048b5796480f974b5f2146472f4ee9d/src/NServiceBus.Testing/Saga.cs#L296
is the culprit.
I think we need to review all the places GetAwaiter().GetResult() is
being called to synchronously wait for a Task to complete.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#135 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAHoZCnv76a1w3bCpvWhS9m6SH4JXNdoks5vPeEbgaJpZM4bDg8z>
.
|
Talking with @danielmarbach we don't see a quick fix for this. But we think that we've found a better workaround, add this before all your tests.
That seemed to work when we changed it on the VM, can you verify? Hopefully XUnit have some way to put that code in a single place. |
If xUnit has a |
It is there primarily for async void. You can restore it after the test in a finally block |
Looks like xUnit has two that it uses: The first one is for |
As long as jens is aware that the tests using it break out from concurrency behavior and he doesn't apply it globally it should be fine fiven that the saga tests are independent |
When we introduce an async fluent style we would have to release either suffixed methods or release a new major. Though already today we mention the AAA style should be preferred over the fluent style. See https://docs.particular.net/nservicebus/testing/fluent And https://docs.particular.net/nservicebus/testing/#testing-a-saga |
Perhaps we should be more aggressive in moving to that being the only style, especially since the fluent style has these deadlock bombs hidden in it? |
Agreed
…On Thu, 21 Feb 2019 at 15:42, Brandon Ording ***@***.***> wrote:
Though already today we mention the AAA style should be preferred over the
fluent style.
Perhaps we should be more aggressive is moving to that being the only
style, especially since the fluent style has these deadlock bombs hidden in
it?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#135 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAHoZCfe4OJMTu0c_k_xE5rv7-JsrAo7ks5vPrBygaJpZM4bDg8z>
.
|
I have absolutely no problems moving away from the fluent style testing. This particular service only have a few sagas with just a couple of tests, so no problems for us to rewrite those tests. We haven't been using the fluent style testing for our regular handlers so it will be more consistent for us to change our saga tests. Thanks for the information! |
We are just in the final stages of an upgrade from NSB5 to NSB7. We have a suite of existing Fluent style tests which are suffering from the deadlock issue. I've read this xunit issue which confirms the problem as well as the position of the project in terms of doing anything to help (which is somewhat understandable but the tone is interesting). I can also confirm the suggested workarounds both work for me - disabling parallelization or removing the thread limit (however I'm not sure of the implications of that?). The new AAA style is great, however converting our existing suites to that would be a massive undertaking and I can put up with slower execution. However I do feel like it would be feasible to convert I guess in summary what I'm saying is
|
@danielmarbach absolutely! Re the discussion on the PR - I would be fine with a major bump and replacement of sync entirely. |
Closing this as the PR has been merged |
@simonfox see https://discuss.particular.net/t/nservicebus-testing-7-2-minor-release-available/1374 Working on to doco as well |
@danielmarbach perfect! Thanks so much. |
Not 100% sure this is caused by
Test.Saga
but we only experience this when running tests (in a Linux container) that contains Saga-tests that does some async operation.Reproduced in https://github.com/jenspettersson/NSBSagaTest
Symptom
When running
dotnet test
inside a Linux container (in this repomicrosoft/dotnet:2.2.104-sdk
on Docker For Windows with Linux containers) the test executor hangs if you test a Saga from at least two different tests in parallel and the saga does some sort of async operation.This is not the case when running tests with
dotnet test
on a Windows 10 computer or directly from Visual Studio.How to reproduce
Clone the repo and and run
docker build -t nsb-lab .
in the root. This will try to run all the tests while running onmicrosoft/dotnet:2.2.104-sdk
. The result should be:and it hangs "forever" on the last line.
I've included a
xunit.runner.json
file with bothparallelizeAssembly
andparallelizeTestCollections
set to true (I think this is the default for the runner on Linux) and if you change it to:and run the docker command again, the tests should pass.
Instead of changing the
xunit.runner.json
file, you can removeAnother_Test_for_SagaOne
andAnother_Test_for_SagaTwo
and the test runner won't hang.The text was updated successfully, but these errors were encountered: