Add helix-matrix.yml and windows arm 64 helix queue #22002

Merged
merged 28 commits into from
May 24, 2020

Conversation

HaoK
Member

@HaoK HaoK commented May 19, 2020

Initial clone of the daily helix jobs to a new yaml for a new azdo pipeline

@ghost ghost added the area-infrastructure Includes: MSBuild projects/targets, build scripts, CI, Installers and shared framework label May 19, 2020
@HaoK HaoK requested a review from a team May 20, 2020 01:51
@HaoK
Member Author

HaoK commented May 20, 2020

This shouldn't affect anything for the ci builds, but I need to have a yml in master to play with splitting the additional helix queues into a separate pipeline

@Pilchie
Member

Pilchie commented May 20, 2020

👀

Member

@dougbu dougbu left a comment


Have you added a pipeline using this YAML?

.azure/pipelines/helix-matrix.yml (outdated, resolved)
.azure/pipelines/helix-matrix.yml (outdated, resolved)
@HaoK
Member Author

HaoK commented May 20, 2020

So as far as I can tell, I can only create a pipeline from the UI or specify a yaml file that's already in the repo. I guess I can try doing this directly in the azdo UI first with the contents of this PR.

@HaoK
Member Author

HaoK commented May 20, 2020

Ah okay, never mind, looks like I can add a pipeline against this branch. I'll convert this to draft then and re-mark it as ready for review once it's got everything passing.

@HaoK HaoK marked this pull request as draft May 20, 2020 04:22
@dougbu
Member

dougbu commented May 20, 2020

I'm not sure, but if that's the case why do we run on internal today?

It's the same pipeline as the official (internal) builds today.

@dougbu
Member

dougbu commented May 20, 2020

Forgot to ask: Shouldn't this PR remove PR / rolling build differences in ci.yml?

@HaoK
Member Author

HaoK commented May 20, 2020

Yeah once the new pipeline is working I'll update the ci yml

@HaoK
Member Author

HaoK commented May 20, 2020

@dougbu have you ever seen any issues with the submodules not updating properly? The new pipeline looks like it's failing to build because the submodule sources for MessagePack and googletest are missing. The helix jobs are passing fine in the existing pipelines in this branch too, so I'm puzzled. Is there a hidden pipelines configuration UI/settings page that has extra stuff I need to copy over to the new one?

##[warning].packages/microsoft.build.tasks.git/1.1.0-beta-20206-02/build/Microsoft.Build.Tasks.Git.targets(24,5): warning : (NETCORE_ENGINEERING_TELEMETRY=Build) Could not find file '/home/vsts/work/1/s/src/submodules/googletest/.git'. The source code won't be available via Source Link.

@HaoK
Member Author

HaoK commented May 20, 2020

Alright, for posterity: when setting up a new pipeline, there's additional stuff that needs to be configured in the 'highly discoverable' buried settings page found via Edit -> Triggers -> YAML -> Get Sources (our non-default settings are: clean = true, all build directories, checkout submodules, any nested submodules within).
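
For reference, the same Get Sources settings can usually be expressed in the pipeline YAML instead of the UI. A minimal sketch, assuming the standard Azure Pipelines `checkout` step and job-level `workspace` setting; the job name is hypothetical and none of this is taken from the PR's helix-matrix.yml:

```yaml
jobs:
- job: Helix_matrix          # hypothetical job name, for illustration only
  workspace:
    clean: all               # assumed closest match for "all build directories"
  steps:
  # Assumed YAML equivalent of the UI "Get Sources" settings listed above.
  - checkout: self
    clean: true              # clean the sources directory before fetching
    submodules: recursive    # check out submodules and any nested submodules
```

Keeping these settings in YAML means a newly created pipeline picks them up from the branch rather than depending on per-pipeline UI configuration.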

@HaoK
Member Author

HaoK commented May 21, 2020

Everything fails miserably on the windows 10 arm64 queue, we are building on linux arm64 though, so I'll try switching to a windows agent to see if that helps

@HaoK
Member Author

HaoK commented May 21, 2020

Weird, errors were all in restore, so might be something else:

C:\dotnetbuild\work\A8520933\w\9E7608B5\e\RunTests\RunTests.csproj : error NU1102: Unable to find package Microsoft.WindowsDesktop.App.Ref with version (= 5.0.0-preview.5.20253.1)
C:\dotnetbuild\work\A8520933\w\9E7608B5\e\RunTests\RunTests.csproj : error NU1102:   - Found 17 version(s) in nuget.org [ Nearest version: 5.0.0-preview.4.20251.1 ]
  Failed to restore C:\dotnetbuild\work\A8520933\w\9E7608B5\e\RunTests\RunTests.csproj (in 463 ms).

@HaoK
Member Author

HaoK commented May 21, 2020

Progress, looks like most of the windows arm64 tests are passing now, except for node services and libuv kestrel tests (@Tratcher ) do you have any ideas why libuv would fail to load only on windows arm64 and not the Debian 9 arm64?

System.DllNotFoundException : Unable to load DLL 'libuv' or one of its dependencies: The specified module could not be found. (0x8007007E)

at Microsoft.AspNetCore.Server.Kestrel.Transport.Libuv.Internal.Networking.LibuvFunctions.NativeMethods.uv_loop_size()
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Libuv.Internal.Networking.LibuvFunctions.loop_size() in /_/src/Servers/Kestrel/Transport.Libuv/src/Internal/Networking/LibuvFunctions.cs:line 339
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Libuv.Internal.Networking.UvLoopHandle.Init(LibuvFunctions uv) in /_/src/Servers/Kestrel/Transport.Libuv/src/Internal/Networking/UvLoopHandle.cs:line 17
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Libuv.Internal.LibuvThread.ThreadStart(Object parameter) in /_/src/Servers/Kestrel/Transport.Libuv/src/Internal/LibuvThread.cs:line 288
--- End of stack trace from previous location ---
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Libuv.Internal.LibuvConnectionListener.BindAsync() in /_/src/Servers/Kestrel/Transport.Libuv/src/Internal/LibuvConnectionListener.cs:line 141
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Libuv.Tests.LibuvTransportTests.CallingAcceptAfterDisposeAsyncThrows() in /_/src/Servers/Kestrel/Transport.Libuv/test/LibuvTransportTests.cs:line 152
--- End of stack trace from previous location ---

@Tratcher
Member

Progress, looks like most of the windows arm64 tests are passing now, except for node services and libuv kestrel tests (@Tratcher ) do you have any ideas why libuv would fail to load only on windows arm64 and not the Debian 9 arm64?

No. Looking at the package it has support for both. Any thoughts @halter73?

@HaoK
Member Author

HaoK commented May 21, 2020

Doesn't look like nodejs is supported on win arm64 yet from nodejs/build#1138

It does look like there's an unofficial build support for versions 12+ from https://unofficial-builds.nodejs.org/download/release/v12.15.0/

But I'll just skip these tests on this queue for now and file an issue.

@halter73
Member

No. Looking at the package it has support for both. Any thoughts @halter73?

I'm not sure either. You're right that we include a win-arm version of libuv.dll in the Libuv NuGet package. It's at runtimes/win-arm/native/libuv.dll which seems to be the right location. I have no idea why the runtime can't find it.

@HaoK
Member Author

HaoK commented May 21, 2020

@halter73 are you ok if I just skip these tests on the win-arm64 queue for now and file an issue for these tests?

@HaoK
Member Author

HaoK commented May 21, 2020

Could it be related to how we publish/build the helix jobs? We do a single build for arch arm64 and publish to all the helix queues, does this break for libuv since we are using the same publish directory to run tests on both linux arm64 and win arm64?

@halter73
Member

@halter73 are you ok if I just skip these tests on the win-arm64 queue for now and file an issue for these tests?

I'm OK with that. Are you going to file an issue or should I?

@HaoK
Member Author

HaoK commented May 21, 2020

Thanks, you can file the issue @halter73. I'll add the skips to this PR.

@halter73
Member

halter73 commented May 21, 2020

We do a single build for arch arm64 and publish to all the helix queues, does this break for libuv since we are using the same publish directory to run tests on both linux arm64 and win arm64?

I don't think so. Unless it's a self-contained build, I think the publish output should include native assemblies for all RIDs defined in NuGet packages. @sebastienros would know.

@halter73
Member

I filed the libuv issue.

@HaoK
Member Author

HaoK commented May 21, 2020

Thanks, I'll probably add a new csproj attribute to skip these, since it's a pain to apply skips to tests in bulk; just updating all Facts to conditional facts is painful :/

@HaoK
Member Author

HaoK commented May 22, 2020

Okay, looks like the new pipeline is finally green with the new windows arm64 queue. Marking as ready for review. The open question is how often to run this pipeline (rolling on every checkin, or X times a day?). I also need to remove the PR trigger before merging.

@HaoK HaoK marked this pull request as ready for review May 22, 2020 22:09
@HaoK HaoK changed the title Add helix-matrix.yml Add helix-matrix.yml and windows arm 64 helix queue May 22, 2020
@@ -30,8 +30,8 @@ echo "Installing Runtime"
powershell.exe -NoProfile -ExecutionPolicy unrestricted -Command "[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12; &([scriptblock]::Create((Invoke-WebRequest -useb 'https://dot.net/v1/dotnet-install.ps1'))) -Architecture %$arch% -Runtime dotnet -Version %$runtimeVersion% -InstallDir %DOTNET_ROOT%"

set exit_code=0
echo "Restore: dotnet restore RunTests\RunTests.csproj --source https://api.nuget.org/v3/index.json --ignore-failed-sources..."
dotnet restore RunTests\RunTests.csproj --source https://api.nuget.org/v3/index.json --ignore-failed-sources
echo "Restore: dotnet restore RunTests\RunTests.csproj --source https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet5/nuget/v3/index.json --ignore-failed-sources..."
Member Author

@BrennanConroy I had to switch feeds for some of the packages to restore properly on the windows arm queue, not sure why?

Member

Link to a failure otherwise I have no context to help

Member Author

Member

That is super strange. I'm not against this change, but I would like to know why that package is trying to be resolved :D

@Pilchie
Member

Pilchie commented May 23, 2020

For frequency I suggest checking with Helix folks and see what they recommend to not overload our available arm64 machine pool.

@dougbu
Member

dougbu commented May 23, 2020

@HaoK I moved the pipeline into the \dotnet\aspnetcore folder in AzDO, together w/ our other public pipelines.

dougbu added a commit that referenced this pull request May 23, 2020
- all work now done in "Checkout dotnet/aspnetcore" step
- reconfigured pipeline based on #22002 (comment)
@HaoK HaoK merged commit a806ae6 into master May 24, 2020
@HaoK HaoK deleted the helix/winarm branch May 24, 2020 00:08
dougbu added a commit that referenced this pull request May 24, 2020
- all work now done in "Checkout dotnet/aspnetcore" step
- reconfigured pipeline based on #22002 (comment)
  displayName: Build shared fx
- script: .\restore.cmd -ci /p:BuildInteropProjects=true
  displayName: Restore interop projects
- script: .\build.cmd -ci -nobl -NoRestore -test -noBuildJava -all -projects eng\helix\helix.proj /p:IsHelixDaily=true /p:IsRequiredCheck=true /p:IsHelixJob=true /p:BuildInteropProjects=true /p:RunTemplateTests=true /p:ASPNETCORE_TEST_LOG_DIR=artifacts/log
Member

I may have missed an earlier discussion. Why is -noBuildJava needed here?

Member Author

Good question, looks like we have a difference between the quarantined tests and what was in the normal ci.yml

I copied this from the quarantined test yml which had that...

https://github.com/dotnet/aspnetcore/blob/master/.azure/pipelines/quarantined-tests.yml#L38

@BrennanConroy do we need to build java for the signalr tests?

Member

@BrennanConroy do we need to build java for the signalr tests?

I believe so. Also, is this pipeline running anywhere? I don't see a run of this.

Member Author

Should be here, https://dev.azure.com/dnceng/public/_build?definitionId=837&_a=summary

I'll add back the java flags on those pipelines

Member

Ah new pipeline name, that's why I didn't see it.

@dougbu
Member

dougbu commented May 24, 2020

On another note, taking the Helix builds out of non-PR (rolling and official) builds sped up rolling builds significantly. It didn't affect official builds as much because the Windows x64/x86 build took almost as long as the removed Helix builds. But rolling builds look better than they have in ages.

@dougbu
Member

dougbu commented May 24, 2020

/fyi @mmitche the current longest job in official builds doesn't do any testing.

@Pilchie
Member

Pilchie commented May 25, 2020

Wait, looking at this build, it looks like we aren't running any helix tests on PRs and rolling builds anymore. I think we still want to run the most common platforms (say Win10 and Ubuntu 18.04), just not the full matrix, since the chance of a PR breaking a specific platform is low, and there are constraints on machines.

@HaoK
Member Author

HaoK commented May 25, 2020

Yeah, the helix tests are run only in PR and in the helix-matrix twice a day.

I can easily tweak things to have the PR checks run on all builds including rolling, I'll open a PR for that now
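
For context, a minimal sketch of what a twice-daily trigger with no PR trigger could look like in Azure Pipelines YAML. The cron times, display name, and branch filter are assumptions for illustration, not values taken from the merged helix-matrix.yml:

```yaml
# No CI or PR triggers: the pipeline runs only on the schedule below.
trigger: none
pr: none

schedules:
- cron: "0 9,21 * * *"                    # assumed times (UTC): 09:00 and 21:00 daily
  displayName: Twice-daily Helix matrix   # hypothetical display name
  branches:
    include:
    - master
  always: true                            # run even when there are no new commits
```

A fixed schedule like this keeps load on the limited arm64 machine pool predictable, which is the concern raised earlier about run frequency.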

@Pilchie
Member

Pilchie commented May 25, 2020

Thanks - the idea is for those rolling builds to match the PR builds as closely as possible so that we have an idea of the baseline reliability of PR builds (without having to categorize failures as caused by the PR, or caused by flakiness)

Labels
area-infrastructure Includes: MSBuild projects/targets, build scripts, CI, Installers and shared framework

6 participants