Disable baseline manifest generation #43797

Merged

dsplaisted
Member

@dsplaisted dsplaisted commented Sep 30, 2024

Approval template

Customer impact

.NET workload commands such as dotnet workload install or dotnet workload update will report a garbage collection failure as a warning. dotnet workload uninstall will hit the same garbage collection failure, but treat it as an error and the command will fail. The failure will look something like:

Workload uninstallation failed: Workload version 9.0.100-rtm.24467.1 was not found.

Note that this only happens for non-stabilized, rtm-branded builds, and only for file-based (zip) installs, not MSI-based installs.

Regression

Functionally yes, though what triggered this was changing the branding of the SDK from rc.2 to rtm, which is handled differently in the feature band calculation.

Testing

Manual testing

Risk

Low - there's inherent risk in code that deals with versions and special-cases based on things like whether the version is stabilized or labeled as -rtm, as we won't exercise those cases until late in the release. Still, the changes for this PR involve removing something that was causing the issue entirely, and adding a check to ignore it if an invalid version is found, which should be low-risk changes.

PR Description

This PR is a simplified version of #43737, targeted at 9.0.100. We will target the other PR at 9.0.200.

In #43483, we hit a test failure where a workload uninstall command was failing. Investigation showed that the feature band of the baseline workload set was calculated incorrectly when the set was created during the SDK build. As a result, the workload set was placed in a folder such as sdk-manifests\9.0.100-rtm.24476\workloadsets\9.0.100-rtm.24476.1, using 9.0.100-rtm.24476 for the feature band instead of 9.0.100, which would have been correct.

When the SDK scanned for workload sets, it would find this one as workload set 9.0.100-rtm.24476.1. However, when it tried to load that workload set during garbage collection, it would calculate the feature band as 9.0.100, look for the workload set under the sdk-manifests\9.0.100 folder, and fail because nothing was there. This meant garbage collection would always fail. For most workload commands this would be reported as a warning, but for the uninstall command it would cause the whole command to fail.
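The mismatch between scanning and loading can be illustrated with a small Python sketch. This is not the SDK's code: `rtm_feature_band`, `scan_workload_sets`, and `workload_set_dir` are hypothetical helpers that model only the behavior described above (an rtm-branded version maps to the bare feature band), and the folder layout is recreated in a temporary directory.

```python
import re
import tempfile
from pathlib import Path


def rtm_feature_band(version: str) -> str:
    """Hypothetical helper: map a stable or rtm-branded version to major.minor.band."""
    major, minor, patch = re.match(r"(\d+)\.(\d+)\.(\d+)", version).groups()
    return f"{major}.{minor}.{int(patch) // 100 * 100}"


def scan_workload_sets(manifest_root: Path) -> list:
    """Return every workload set version found under any feature band folder."""
    return sorted(p.name for p in manifest_root.glob("*/workloadsets/*") if p.is_dir())


def workload_set_dir(manifest_root: Path, version: str) -> Path:
    """Where a loader that recomputes the band from the version would look."""
    return manifest_root / rtm_feature_band(version) / "workloadsets" / version


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "sdk-manifests"
    # Layout produced by the incorrect baseline generation described above.
    (root / "9.0.100-rtm.24476" / "workloadsets" / "9.0.100-rtm.24476.1").mkdir(parents=True)

    found = scan_workload_sets(root)            # the scan sees the workload set
    expected = workload_set_dir(root, found[0])  # the loader looks under 9.0.100
    lookup_succeeds = expected.is_dir()          # that folder does not exist
    expected_band_folder = expected.parent.parent.name
```

In this model the scan reports the workload set, but the lookup targets a different band folder, which corresponds to the "was not found" failure during garbage collection.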

This PR fixes the issue in two ways:

  • Stop generating a baseline workload set
  • When scanning for workload sets, ignore any where the feature band folder name isn't a valid feature band, or where there is an inconsistency between the feature band in the folder name and the feature band of the workload set

Stop generating a baseline workload set

#43737 fixes the workload set feature band calculation when generating the baseline workload set. However, for 9.0.100, workload sets aren't enabled by default, so the simpler fix is to just stop generating the baseline workload set at all.

Ignore workload sets with invalid or mismatched feature bands when scanning for available workload sets

This is a change to SdkDirectoryWorkloadManifestProvider. When scanning for workload sets, it ignores any that it would not be able to find later based on the workload set version due to a mismatched feature band folder. This will avoid errors if the SDK later tries to load that workload set and can't find it.
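As a rough illustration of this filtering, here is a Python sketch. It does not reproduce SdkDirectoryWorkloadManifestProvider: the `feature_band` function is a hypothetical model, and its treatment of prerelease labels other than rtm (keeping the first two prerelease components) is an assumption made for the example, as are the sample folder names other than the rtm one.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def feature_band(version: str) -> Optional[str]:
    """Hypothetical model of a feature band; returns None if `version` does not parse."""
    m = VERSION_RE.match(version)
    if m is None:
        return None
    major, minor, patch, prerelease = m.groups()
    band = f"{major}.{minor}.{int(patch) // 100 * 100}"
    if prerelease is None or prerelease.startswith("rtm"):
        return band  # stable and rtm-branded versions share the bare band
    # Assumption: other prerelease labels keep their first two components.
    return band + "-" + ".".join(prerelease.split(".")[:2])


def scan_workload_sets(manifest_root: Path) -> list:
    """List workload set versions, skipping ones a later lookup could not find."""
    found = []
    for band_dir in sorted(manifest_root.iterdir()):
        if not band_dir.is_dir() or feature_band(band_dir.name) is None:
            continue  # folder name is not a valid feature band
        sets_dir = band_dir / "workloadsets"
        if not sets_dir.is_dir():
            continue
        for set_dir in sorted(sets_dir.iterdir()):
            if not set_dir.is_dir() or feature_band(set_dir.name) != band_dir.name:
                continue  # folder band and computed band disagree
            found.append(set_dir.name)
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "sdk-manifests"
    for band, version in [
        ("9.0.100", "9.0.101"),                        # consistent: kept
        ("9.0.100-rtm.24476", "9.0.100-rtm.24476.1"),  # mismatched band: ignored
        ("not-a-band", "9.0.101"),                     # invalid folder name: ignored
    ]:
        (root / band / "workloadsets" / version).mkdir(parents=True)
    visible = scan_workload_sets(root)
```

Like the change described here, the sketch drops the inconsistent entries without reporting them.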

However, this does somewhat reduce the diagnosability of issues, as it will silently ignore any workload sets that are in the wrong feature band folder. The SdkDirectoryWorkloadManifestProvider doesn't currently have any way to write warnings or similar messages to a log.

@dsplaisted
Member Author

System.Security.Cryptography.CryptographicException : There are no more endpoints available from the endpoint mapper.

@baronfel This was in Microsoft.NET.Build.Containers.UnitTests.RegistryTests.InsecureRegistry. Is that an error you've seen before?

@baronfel
Member

baronfel commented Oct 1, 2024

It's been happening all over the place - there are a few dnceng threads on it and a Known Build Issue. From what I remember it's a Windows OS crypto resource constraint issue that resolves on a reboot.

@marcpopMSFT marcpopMSFT requested a review from joeloff October 1, 2024 20:53
@joeloff
Member

joeloff commented Oct 2, 2024

We already have support for generating workload set MSIs. We currently only use it in the workload set repo, but it should be possible to generate them for the baseline as well when the property is flipped. Probably something we can look at for servicing.

Co-authored-by: Jacques Eloff <[email protected]>
@lewing
Member

lewing commented Oct 2, 2024

System.Security.Cryptography.CryptographicException : There are no more endpoints available from the endpoint mapper.

@baronfel This was in Microsoft.NET.Build.Containers.UnitTests.RegistryTests.InsecureRegistry. Is that an error you've seen before?

dotnet/dnceng#3844

Labels
Area-Workloads Servicing-approved untriaged Request triage from a team member