
outerloop tests failing to detect dotnet #11185

Closed

carlossanlop opened this issue Oct 7, 2022 · 5 comments

carlossanlop (Member) commented Oct 7, 2022

Build

https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=43878

Build leg reported

Invariant.Tests.WorkItemExecution

Pull Request

dotnet/runtime#76707

Action required for the engineering services team

To triage this issue (First Responder / @dotnet/dnceng):

  • Open the failing build above and investigate
  • Add a comment explaining your findings

If this issue is causing build breaks across multiple builds and would benefit from being listed on the build analysis check, follow these steps:

  1. Add the label "Known Build Error"
  2. Edit this issue and add an error string in the JSON below that can help us match this issue with future build breaks. Refer to the known issues documentation.
{
   "ErrorMessage": "dotnet: No such file or directory",
   "BuildRetry": false
}
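
(For illustration only, not part of the issue template: the "ErrorMessage" value above is meant to be matched against build/console log text by the build analysis service. A rough local sanity check, using a hypothetical log file name, might look like this sketch.)

# Rough sketch of the kind of matching implied by the "ErrorMessage" field
# above; the real matching is done by the build analysis service.
# "console.log" is a hypothetical local copy of a Helix console log.
error_message = "dotnet: No such file or directory"

with open("console.log", encoding="utf-8", errors="replace") as log:
    matched = any(error_message in line for line in log)

print("known issue would match" if matched else "no match")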

Additional information about the issue reported

./RunTests.sh: line 168: /tmp/helix/working/A6B808E3/p/dotnet: No such file or directory

Report

Summary

24-Hour Hit Count: 0
7-Day Hit Count: 0
1-Month Count: 0
MattGal (Member) commented Oct 7, 2022

@carlossanlop a couple of points here:

  • If you review the job list for this job, you will see no payload with a file named "dotnet"
  • Correlation payloads are determined and constructed by the runtime team's build process, and whatever copy of dotnet you thought you were getting is simply not there (you can download and examine the zip files in the job list above to see this; a sketch of such a check follows this comment).
  • Since the problem is with a correlation payload, it affects every work item in that entire job.

I will spend some time looking around to see whether the root cause here is obvious, but this either stems from a recent regression introduced by a check-in to runtime, or from the PR itself.
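
(For reference, and not part of the original triage: a minimal sketch of the check described in the second bullet above, listing the entries of a downloaded correlation payload zip and looking for a dotnet host binary. The file name is hypothetical; download the zip from the job list first.)

# Minimal sketch: list the entries of a downloaded correlation payload
# and report whether a "dotnet" host binary is present anywhere in it.
# "correlation-payload.zip" is a hypothetical local download from the job list.
import zipfile

with zipfile.ZipFile("correlation-payload.zip") as payload:
    names = payload.namelist()
    has_dotnet = any(name.rstrip("/").split("/")[-1] == "dotnet" for name in names)

print(f"{len(names)} entries; dotnet present: {has_dotnet}")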

mmitche (Member) commented Oct 7, 2022

@MattGal I think it's been failing as far back as I can see. I think the correlation payload didn't zip up dotnet in the expected location.

jozkee (Member) commented Oct 7, 2022

FWIW: the linked PR adds tests that create 8 GB files (not in parallel, to avoid having more than one huge file at a time), so I wonder whether some dependency downloads could fail because the VM is out of disk space. I highly doubt it, though, because I assume that by the time the test runs, all the dependencies are already in place.

Just wanted to point out what's "special" about the PR in relation to outerloop.

mmitche (Member) commented Oct 7, 2022


@jozkee The outerloop testing has failed in the same way against main (and other branches) for as far back as the AzDO history goes, so my guess is that this is just a test configuration error.

MattGal (Member) commented Oct 7, 2022

Digging into this one specific problem, I crafted this query (though of course the same problem is likely broader than this):

Jobs
| where QueueName == "osx.1200.amd64.open"
| extend Propz = parse_json(Properties)
| where Propz["DefinitionName"] == "runtime-libraries-coreclr outerloop"
| where Propz["System.PhaseName"] == "libraries_build_OSX_x64_Debug"
| order by Finished desc 

This shows me that the inflection point between "working" and "not working" was somewhere between these two runs:
pr/public/dotnet/runtime/refs/pull/73300/merge
pr/public/dotnet/runtime/refs/pull/73020/merge

Looking at commits from this time range (8/2-8/3), it's very likely the regression was introduced by https://github.com/dotnet/runtime/pull/73095/files, which conditionally removes the host packages and was merged between the likely merge commits of the two PRs above (just a guess, though).
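
(A hedged aside, not part of the original comment: one way to cross-check the guess above is to list the dotnet/runtime merge commits in that window from a local clone; the branch name and dates below are assumptions based on the comment.)

# Hypothetical cross-check for the 8/2-8/3 window mentioned above: list the
# merge commits on main in that range from a local dotnet/runtime clone.
import subprocess

log = subprocess.run(
    ["git", "log", "main", "--merges", "--oneline",
     "--since=2022-08-02", "--until=2022-08-04"],
    capture_output=True, text=True, check=True,
)
print(log.stdout)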

This is not an infrastructure issue, so I am closing it. I'm also not 100% sure of the benefit of tracking it as a known issue, given that it's been broken for 60+ days, but if you would like to handle it that way you can open a similar tracking issue in the runtime repo.
