
Massive xunit test performance degradation in .NET Core 2.1.700 and 2.2.300 #2775

Closed
JSkimming opened this issue May 22, 2019 · 29 comments · Fixed by microsoft/vstest#2024

@JSkimming

JSkimming commented May 22, 2019

TL;DR

Update: The issue has been found and is now tracked via PR microsoft/vstest#2024.

Update 2: The issue has now been fixed in the .NET SDK releases 2.1.701 & 2.2.301, as demonstrated by the following builds.

General

Operating System

  • OS: Linux - specifically containers
  • Versions: 2.1.700 and 2.2.300

I've found that the latest releases of .NET Core 2.1 & 2.2 have introduced a performance degradation when executing xunit tests via dotnet test.

I have found this with 3 different OSS projects I maintain, and across two different CI providers (CircleCI & Travis). The specific details are below.

FYI, I haven't raised this against xunit, because the only difference between the fast and slow runs is the .NET Core release.

The following are builds of identical source code run against four different releases of .NET Core in CircleCI, using containers.

Please see the compare-dotnet-releases branch of https://github.com/JSkimming/abioc.

As you can see, the test execution takes an order of magnitude longer.

Here is the same build in Travis. NOTE: the Mac build is fine because it is pinned to an earlier version of .NET Core.

Here are two completely different projects, both exhibiting the same issue in CircleCI and Travis:

https://circleci.com/gh/JSkimming/tesla-net/tree/master
https://travis-ci.org/JSkimming/tesla-net/builds/531160500
https://circleci.com/gh/JSkimming/Castle.Core.AsyncInterceptor/tree/master
https://travis-ci.org/JSkimming/Castle.Core.AsyncInterceptor/builds/530615608

Update

Latest builds with timings between steps

@karelz
Member

karelz commented May 22, 2019

Did you get a chance to collect traces and compare where the difference comes from?
Given that the latest version is fast, is it worth investigating?

cc @adamsitnik @billwert @brianrob

@JSkimming
Author

Given that the latest version is fast, is it worth investigating?

@karelz It's the latest versions that are slow.

Did you get a chance to collect traces and compare where the difference comes from?

Not yet, how should I go about collecting traces?

@karelz
Member

karelz commented May 22, 2019

Oh, I didn't notice it is minutes, not seconds. That is bad.
@adamsitnik can you please take a look? This looks serious.

@livarcocc @nguerrera is it a known issue?

@JSkimming
Author

JSkimming commented May 22, 2019

@karelz I'm unable to replicate it on my machine 😉

[works-on-my-machine meme]

I've tried on Windows, WSL and Docker (there's a helper Dockerfile in the root of the repo), but given it's happening on two different CI systems and three different OSS projects, I figured it was worth reporting before being able to replicate it on my laptop.

The fastest way to execute the tests is to run coverage.cmd on Windows and coverage.sh on Linux, though coverage.cmd only tests .NET Framework.
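
For example, on Linux that looks roughly like this (a sketch; it assumes the scripts sit at the repo root):

# Clone the repo and run the Linux test/coverage script.
git clone https://github.com/JSkimming/abioc.git
cd abioc
./coverage.sh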

@karelz
Member

karelz commented May 22, 2019

OK, that is interesting - I assume we will have similar problems replicating it.
Are you able to reproduce it on a similar machine setup in the cloud (a VM similar to those used by the CI systems)?

@JSkimming
Author

I assume we will have similar problems replicating it.

Quite possibly.

Are you able to reproduce it on a similar machine setup in the cloud (a VM similar to those used by the CI systems)?

Nope, I haven't tried that. I'm not in a position to try that at the moment either.

@brianrob
Member

I'm going to try the Dockerfile out and see if I can reproduce the regression.

@nguerrera
Contributor

@livarcocc @nguerrera is it a known issue?

First I've heard of it.

@nguerrera
Contributor

As you can see, the test execution takes an order of magnitude longer.

Strangely, the dotnet test runs are reporting mere seconds of total time while the overall job takes much longer. Is it possible to get timestamps at the start/end of each command?

@nguerrera
Contributor

CC @singhsarab

JSkimming added a commit to JSkimming/abioc that referenced this issue May 22, 2019
@JSkimming
Author

JSkimming commented May 22, 2019

I'm going to try the Dockerfile out and see if I can reproduce the regression.

@brianrob I've just pushed an updated Dockerfile, and I'm experimenting with tweaking the following:

docker build --cpuset-cpus=0 --cpuset-mems=0 --cpu-period=1000 --build-arg CACHE_BUST=1 .

You need to keep changing CACHE_BUST; Docker then reuses the cached layers up to the test-execution step, so only the tests re-run.

Is it possible to get timestamps at the start/end of each command?

@nguerrera I've just pushed an update to display the dates; check out the builds of this PR: JSkimming/abioc#82
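
For anyone curious, the wrapper amounts to something like this (a sketch; the actual script in that PR may differ, and SECONDS is the bash built-in that counts seconds since the shell started):

# Print a UTC timestamp and the elapsed time, then run the wrapped command.
step() {
  date -u
  echo "$((SECONDS / 60)) minutes and $((SECONDS % 60)) seconds elapsed."
  "$@"
}

step dotnet test --no-restore --no-build -f netcoreapp2.1 -c Release test/Abioc.Tests.Internal/Abioc.Tests.Internal.csproj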

@brianrob
Member

It seems that I'm not able to reproduce this either.

@karelz
Member

karelz commented May 22, 2019

@brianrob is there something we can ask @JSkimming to collect that we can then analyze?

@brianrob
Member

Yes. If you're able to reproduce the issue, please follow the instructions at https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/linux-performance-tracing.md to create a trace for each of the good and bad cases. You don't need to capture the whole thing - let's just start with 20 seconds for each.
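
For reference, the key commands from that doc look roughly like this (a sketch; see the linked doc for the authoritative steps):

# One-time setup: download perfcollect and install its prerequisites (perf, LTTng).
curl -OL https://aka.ms/perfcollect
chmod +x perfcollect
sudo ./perfcollect install

# Shell 1: start collecting (stop with Ctrl+C after ~20 seconds).
sudo ./perfcollect collect sampleTrace

# Shell 2: enable perf maps so managed frames resolve, then run the slow scenario.
export COMPlus_PerfMapEnabled=1
dotnet test ...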

@JSkimming
Author

@brianrob any tips on how to get that running in a hosted CI system?

So far I've only been able to replicate the issue on CircleCI and Travis.

Those instructions involve a lot of sudo and multiple shell windows open at once, neither of which I'm sure I can do easily. I believe I can configure Travis to use sudo, though I think that will change it from container builds to VMs, which may no longer exhibit the issue.

If it's something that can be enabled on Azure DevOps, I'm willing to set up the build there too, though until I try, I don't know if I'll be able to replicate things.

@JSkimming
Author

Strangely, the dotnet test runs are reporting mere seconds of total time while the overall job takes much longer. Is it possible to get timestamps at the start/end of each command?

@nguerrera I've updated the main description with links to the builds with timings between the commands.

@karelz
Member

karelz commented May 22, 2019

@JSkimming sadly we are not familiar with those environments.
If it is related only to a specific environment (a special container in the CI systems), it may very well be a problem in the environment setup ...

@JSkimming
Author

JSkimming commented May 23, 2019

OK, I think I've found the issue.

TL;DR

On a hunch, I changed the script to stop displaying the output of dotnet test, and things are back to normal. See here for CircleCI, and here for Travis.
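
For illustration, the change amounts to something like this in the CI scripts (a sketch; the real scripts and the log file name may differ):

# Send the noisy dotnet test output to a file instead of the CI log,
# and only dump it when the run fails.
dotnet test --no-restore --no-build -f netcoreapp2.1 -c Release \
  test/Abioc.Tests.Internal/Abioc.Tests.Internal.csproj \
  > dotnet-test.log 2>&1 || { cat dotnet-test.log; exit 1; }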

The Hunch

I noticed the test output flickered when displaying a message.

Test run in progress...

Here's an example:

[Animated capture of the flickering "Test run in progress..." output]

I get the impression the CLI is writing that message very frequently, which eventually results in the flickering because a buffer somewhere fills up.

Hosted CI systems capture the stdout of the tools they execute, but in their case all those writes go through code (which ultimately streams to the browser) rather than directly to a terminal, and the flood of stdout writes from dotnet test is overwhelming them.

This also explains the discrepancy between the test execution timings reported by dotnet test and the actual timings I inserted. See below: the CLI states the tests executed in 2.1397 seconds, whereas the real wall-clock time is 10m 49s.

Wed May 22 20:53:05 UTC 2019
0 minutes and 7 seconds elapsed.

$ dotnet test --no-restore --no-build -f netcoreapp2.1 -c Release /root/project/test/Abioc.Tests.Internal/Abioc.Tests.Internal.csproj --results-directory /root/project/test/TestResults/output/ --logger "trx;LogFileName=Abioc.Tests.Internal.trx"
Test run for /root/project/test/Abioc.Tests.Internal/bin/Release/netcoreapp2.1/Abioc.Tests.Internal.dll(.NETCoreApp,Version=v2.1)
Microsoft (R) Test Execution Command Line Tool Version 16.1.0
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
Test run in progress.Test run in progress ... Results File: /root/project/test/TestResults/output/Abioc.Tests.Internal.trx

Test Run Successful.
Total tests: 310
     Passed: 310
 Total time: 2.1397 Seconds

Wed May 22 21:03:54 UTC 2019
10 minutes and 56 seconds elapsed.

I'm not sure where to start looking for what changed in the dotnet CLI that caused all this. All scenarios are using v16.1.0 of vstest.

Actually, I misread the versions; the issue was introduced in v16.1.0. See below.

@karelz
Member

karelz commented May 23, 2019

Awesome to see we are closer to a root cause!
@nguerrera do you have an idea whether the CLI is responsible for the too-frequent text updates in this case?

@nguerrera
Contributor

Looks like microsoft/vstest#1964

@singhsarab

@JSkimming
Author

JSkimming commented May 23, 2019

@nguerrera Ahh, I misread the numbers. The earlier (working) release was 16.0.1, and those changes went into 16.1.0. So it is in vstest.

@JSkimming
Author

From a quick reading of the code I think I may have identified the issue.

The ProgressIndicator clears and redisplays the progress message whenever it receives successive calls to Pause() then Start().

This occurs whenever ConsoleLogger.TestMessageHandler() or ConsoleLogger.TestResultHandler() is called, regardless of the verbosity level.

As a result, the progress message is constantly being cleared and redrawn, even when the verbosity level means no log message would actually be written.

I'm going to have a look at a potential fix, and submit a PR later. I'll keep you updated on progress.

@nguerrera
Contributor

@karelz This should be moved to microsoft/vstest repo.

@nguerrera
Contributor

@JSkimming Awesome work tracking this down!

@JSkimming
Author

@karelz I've raised this PR: microsoft/vstest#2024. All feedback is welcome.

Do you want to raise this as an issue in the microsoft/vstest repo? I'm happy to do it if it helps.

@karelz
Member

karelz commented May 24, 2019

Awesome. Please open an issue there if they need it, but maybe the PR is enough. We can't move issues cross-org; ZenHub would just make a copy.

Awesome work, thanks a ton!!!

@karelz karelz closed this as completed May 24, 2019
@tremblaysimon

I observed a degradation of performance even in a Linux host environment (not only in a CI environment).

One of our test projects (around 2,800 tests) took about 30 seconds on 2.1.604, and now it takes about 1.87 minutes. That is about 4.5 times longer to run the tests than on the previous SDK.

Do you know if the PR involved will fix that too?

@JSkimming
Author

@karelz this issue is still not resolved.

The fix I implemented was merged on 28th May, and the NuGet package was released on 30th May, but until there is a .NET Core SDK release into which the latest version of VSTest is "inserted", the problem persists.

Is there any update on when a new SDK will be released incorporating the latest VSTest?

@JSkimming
Author

The .NET SDK releases 2.1.701 & 2.2.301 are now available and contain the fixes.
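
For anyone verifying locally, a quick way to confirm the fixed SDK is the one in use (assuming dotnet is on the PATH):

# List installed SDKs and show which version the current directory resolves to.
dotnet --list-sdks
dotnet --version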
