Inconsistent NuGet push order in .NET release publishing #23820
Hi @filipnavara, thanks for raising this issue. We don't see this as a workloads-specific issue, instead we see this as an issue with the way dotnet as an organization publishes packages on NuGet. We push a huge number of packages with each servicing release between workloads, various .NET Runtimes, and library releases, and several of these have similar timing issues as you describe here. Even if we had some ordering where we uploaded all workload packs prior to uploading the manifests, there's nothing enforcing on NuGet's end that indexing/CDN distribution/etc. keep that same ordering. The best advice I can give you is to delay a bit of time and try again in the general case. |
The problem for individual NuGets is different since we are in control of the versions (which we update through explicit Dependabot PR, dotnet/arcade or other mechanism). However, with workloads there is no control unless we enforce the use of a full custom manifest. So the moment the new manifest is pushed to NuGet it breaks our CI pipelines. During the publishing of .NET 6.0.2 the breakage lasted approximately 2 hours. I am raising the issue in an attempt to find a way to minimize the disruption in future. While I can easily identify what is happening and what caused the errors, others may not. |
I agree that workloads are a special case here because of the way advertising manifests are automatically updated during restore. Maybe we can skip a manifest during that automatic update until it is at least X hours old? |
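The "skip until X hours old" idea could look roughly like the sketch below. It is only an illustration: `pick_manifest`, the `(version, published)` tuple shape, and the `min_age` parameter are hypothetical names, and the SDK's real advertising-manifest update logic is not shown in this thread.

```python
from datetime import datetime, timedelta, timezone


def pick_manifest(candidates, now, min_age):
    """Return the most recently published manifest version that is at least
    `min_age` old, or None if every candidate is still too fresh.

    `candidates` is a list of (version, published_utc) tuples (hypothetical shape).
    """
    eligible = [
        (published, version)
        for version, published in candidates
        if now - published >= min_age
    ]
    if not eligible:
        return None
    # Tuples sort by publish time first, so max() yields the newest eligible entry.
    return max(eligible)[1]


if __name__ == "__main__":
    now = datetime(2023, 4, 11, 16, 0, tzinfo=timezone.utc)
    candidates = [
        ("6.0.15", datetime(2023, 3, 14, 17, 0, tzinfo=timezone.utc)),
        ("6.0.16", datetime(2023, 4, 11, 14, 0, tzinfo=timezone.utc)),
    ]
    print(pick_manifest(candidates, now, timedelta(hours=4)))
```

The threshold is left as a required argument because no concrete value for X was proposed here.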
FWIW we hit it again when 6.0.200 SDK was being published. We install the SDK as part of the CI pipeline. So in addition to the order of NuGets being pushed the SDK acquisition process also gets the newer SDK first [before the NuGets are uploaded]. That also doesn't seem quite right. |
Hitting this presently with the current rollout for the
On closer inspection, it appears that
"packs": {
    "Microsoft.iOS.Sdk.net7": {
        "kind": "sdk",
        "version": "16.2.1024",
        "alias-to": {
            "any": "Microsoft.iOS.Sdk"
        }
    },
    "Microsoft.iOS.Sdk.net6": {
        "kind": "sdk",
        "version": "16.2.19",
        "alias-to": {
            "any": "Microsoft.iOS.Sdk"
        }
    },
And it appears that
I suggest that there be some validation added such that a manifest can't be published until all of its dependent packages have been published first. |
Also note that this breaks our CI/CD builds, since the current workloads are installed with each build. We can't build until the workload install succeeds. |
FYI, it's working again now that the indexing has completed. Hope the validation can be added so we don't encounter it again next time. |
@baronfel @marcpopMSFT what are our options here? Is this something that can be controlled via nuget publishing today and we need to opt in? Or is this a wider discussion we need to have? |
@rbhanda in case he has any ideas. We could potentially control the order of publish to nuget for the runtime workloads so that the packs go first and then the manifests. We could also coordinate a delayed manifest publish. We'd still have a problem if someone caught one of the manifests but not the others, since there are half a dozen of them, and ended up in a partial state. Does nuget.org have the ability to publish first but not make them visible so we could do that all at once? Note that the above issue was the MAUI workload publishing and I don't know if that's managed by Rahul or by the MAUI team. |
Yeah, the underlying issue here is that workloads conceptually provide a mechanism to tie multiple packages together as a unit and our nuget publishing pipeline has no concept of that dependency at any level. To really handle this we'd need something like a staged transaction to the nuget database. |
NuGet can "unlist" packages which hides them from search results etc (and theoretically also from workloads assuming we correctly implement this). As far as I know nuget.org doesn't support pushing a package in the unlisted state but there's an API to unlist that we could call as soon as we push the manifest package and then relist it later once all the other packages are pushed. Or ask the NuGet team to add a |
CC @JonDouglas @aortiz-msft for the interesting feature request (either batching or publish unlisted). I'll ping @rbhanda offline and see if publishing, unlisting, and then later listing is a potential option here. |
This is definitely something that we can look into though, as noted, the package index status seems to be the crux of the issue. |
This would be a NuGet.org ask so tagging @joelverhagen and @clairernovotny |
As @baronfel mentioned availability order of packages on NuGet.org is not guaranteed, even if packages are pushed in some clever (e.g. reverse dependency) order. This is because our asynchronous validation pipeline (malware scanning, signing, and more) can take a different amount of time per package. Imagine there is a package in the middle of the dependency graph that takes a bit longer to malware scan or is owned by another team that runs their push at a slightly different time (this happens more than you'd think). It's sort of a messy problem. It's not currently feasible to fix the availability order unless the actor pushing the package polls for package availability before continuing with other package pushes. If done naively this would have horrible throughput (avg validation time * total number of packages to push, so many hours for big .NET releases). It would be possible to essentially do a topological sort to identify sets of packages that can be pushed together but this is all a hack/workaround for the feature gap on NuGet.org. This general feature request is already tracked here NuGet/NuGetGallery#3931. Feel free to add additional comments or upvote. I think the described solution ("staging") is the Proper fix for the problem, but it's a lot of work and needs some exploration with stakeholders to make sure the staging works as needed. I can say it's not on our radar for the next 6 months given our other priorities and team capacity.
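To make the "topological sort to identify sets of packages that can be pushed together" idea concrete, here is a minimal sketch. The `push_batches` helper and the dictionary shape of `dependencies` are assumptions for illustration; nothing like this is confirmed to exist in the .NET release tooling.

```python
def push_batches(dependencies):
    """Group packages into ordered batches so that every package's
    dependencies sit in an earlier batch (layered topological sort).

    `dependencies` maps a package id to the ids it depends on
    (hypothetical input shape).
    """
    # Collect every package mentioned, either as a key or as a dependency.
    packages = set(dependencies)
    for deps in dependencies.values():
        packages.update(deps)

    remaining = {p: set(dependencies.get(p, ())) for p in packages}
    batches = []
    while remaining:
        # Packages whose dependencies have all been placed in earlier batches.
        ready = sorted(p for p, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle detected")
        batches.append(ready)
        for p in ready:
            del remaining[p]
        for deps in remaining.values():
            deps.difference_update(ready)
    return batches


if __name__ == "__main__":
    graph = {"manifest": ["sdk", "runtime"], "sdk": ["runtime"], "runtime": []}
    for batch in push_batches(graph):
        print(batch)
```

Each batch could be pushed in parallel and polled for availability before the next batch starts, which bounds the wait by the depth of the graph rather than the total package count.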
Responding to @marcpopMSFT's #23820 (comment), it is possible to publish packages as unlisted. This can be done by using the "unlist" gesture (nuget.exe, .NET CLI, API endpoint) immediately after pushing, while the package is still in the validating state. This will always be the case since validation takes 2+ minutes and you can unlist immediately after the push request completes. Then, after you have detected that packages should be listed (i.e. the full dependency graph is available), you can use the "relist" gesture (no CLI support, but has API support).
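As a rough sketch of the unlist and relist gestures over HTTP: the code below assumes the endpoint shape from the public NuGet "push and delete" API documentation, where nuget.org treats `DELETE` on a package version as unlist and `POST` on the same URL as relist. The base URL, helper names, and package id are assumptions; check the `PackagePublish` resource in the feed's service index before relying on them.

```python
import urllib.request

# Assumption: nuget.org's publish resource; other feeds may use a different base.
PUBLISH_BASE = "https://www.nuget.org/api/v2/package"


def _request(method, package_id, version, api_key, base=PUBLISH_BASE):
    """Build (but do not send) a request against the publish resource."""
    url = f"{base}/{package_id}/{version}"
    return urllib.request.Request(
        url, method=method, headers={"X-NuGet-ApiKey": api_key}
    )


def unlist_request(package_id, version, api_key):
    # On nuget.org a DELETE is interpreted as "unlist", not a hard delete.
    return _request("DELETE", package_id, version, api_key)


def relist_request(package_id, version, api_key):
    # A POST to the same URL relists the package version.
    return _request("POST", package_id, version, api_key)


if __name__ == "__main__":
    req = unlist_request("Contoso.Example.Manifest", "1.0.0", "<api-key>")
    print(req.get_method(), req.full_url)
```

The requests are only constructed here; passing one to `urllib.request.urlopen` would actually send it.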
Responding to @akoeplinger's #23820 (comment), yes that's right. As mentioned to Marc above, you can unlist immediately after push so it can work today. Feel free to open an issue about enhancing the push protocol to include a "listed = false" parameter. This would remove the additional round trip, eliminate any crazy race condition caused by a slow/errored unlist or hyper fast validation, and provide parity with the UI upload flow which currently allows uploading a package as unlisted. FYI, the best way to detect if a version is available is to use this API endpoint (a |
Thanks for the detailed response @joelverhagen. Glad to know there's a 2 minute validation window during which we can unlist. I met with our release team and we have a low cost proposal for the next release. The plan is to publish all .NET packages excluding the manifest packages, poll for availability (which I understand they already do), and then publish the manifest packages as a separate step afterwards. There is potential delay from the polling and there is still the potential for customer impact since the dozen or so manifests would still potentially light up at different times for different customers but it would reduce that window from 30+ minutes down to a much smaller window. If that goes well, we'll continue with that option. If there are still issues, we can explore the option of pushing unlisted and then listing all of the manifests together. I'll be meeting with some Maui folks later today and suggesting they follow the same pattern as their publish is a separate process from the core .NET package publish today. |
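The proposal above (push packs, poll for availability, then push manifests) can be sketched as follows. This is not the release team's actual pipeline: `publish_in_stages`, the injected `push` and `is_available` callables, and the default intervals are all hypothetical.

```python
import time


def publish_in_stages(packs, manifests, push, is_available,
                      poll_seconds=30.0, timeout_seconds=3600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Push all packs, wait until every pack is available, then push manifests.

    `push(package)` and `is_available(package)` are supplied by the caller;
    the intervals are placeholder values, not recommendations.
    """
    for package in packs:
        push(package)

    deadline = clock() + timeout_seconds
    pending = list(packs)
    while pending:
        # Keep only the packs that are still not visible on the feed.
        pending = [p for p in pending if not is_available(p)]
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(f"still unavailable: {pending}")
        sleep(poll_seconds)

    # Manifests go last so they never reference a pack that is not yet visible.
    for manifest in manifests:
        push(manifest)


if __name__ == "__main__":
    pushed = []
    publish_in_stages(["pack.a", "pack.b"], ["manifest"], pushed.append,
                      is_available=lambda p: True)
    print(pushed)
```

As noted in the comment, this narrows the failure window but does not close it, since the manifests themselves can still become visible at different times.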
Hitting this again presently.
Related to rollout of https://www.nuget.org/packages/Microsoft.NET.Sdk.iOS.Manifest-7.0.100/16.2.1040 |
Working again now. |
@marcpopMSFT this seems to be happening again with today's release, is the proposed mitigation you outlined supposed to happen this release already or in some future release? |
Working again now. |
@akoeplinger @liejuntao001 can you confirm which component was out of order? We're working with release teams but it's mostly a manual process to ensure that the manifests get published last and that may have been missed this release. |
Some reports I saw:
|
@marcpopMSFT can you share some details on the "poll for availability" step? Is there an example of how the .NET release team did it? This is not implemented in any of the MAUI release pipelines yet. |
I saw these from our Azure pipeline builds:
2023-04-11T14:22:03.8990170Z Workload installation failed: microsoft.netcore.app.runtime.mono.android-x64::6.0.16 is not found in NuGet feeds https://api.nuget.org/v3/index.json".
2023-04-11T15:10:42.2530970Z Workload installation failed: microsoft.ios.sdk::16.2.46 is not found in NuGet feeds https://api.nuget.org/v3/index.json".
2023-04-11T15:43:07.5734510Z Workload installation failed: microsoft.ios.sdk::16.2.46 is not found in NuGet feeds https://api.nuget.org/v3/index.json".
later worked at |
The fact we have some reports of:
Seems like maybe the .NET release side wasn't working either? |
I just checked with our release team. It looks like they had not recorded the outcome of our planned release ordering in their release steps and so had missed delaying the manifest publish until after all of the other packages are published. They plan on updating that process and hopefully this will be improved in May. For summary, the plan is to take the manifests out of the normal release process, publish all other packages, and then publish the manifests. This doesn't guarantee that everything will work as it depends on when the packages become available on nuget.org but will greatly reduce the window where things might be broken (current publish takes >30 minutes and the manifests were publishing near the beginning of that process creating a large window where all workload commands would fail). We're still asking for a feature from nuget to link packages together and make them all available at the same time. |
@marcpopMSFT this plan didn't quite work for us in practice -- we published the Android manifest last, and yet it was the first available on NuGet.org. It is a much smaller package, so I think it made it through the queue quickly. So, I'm interested in how we "poll for availability" on NuGet. We could also put a 30 min. delay in the release pipeline, but I'm not sure that is a 100% solution. I believe NuGet was quite busy this last release day, and it took around an hour for things to resolve. |
Is there any way an internal validation check could be added to |
We found an https://github.com/xamarin/AndroidX/blob/main/build/scripts/update-config.csx#L216-L224 It could also run in a script via curl, etc. |
I think that HttpClient example will download the whole package if it is found. I tried sending a HEAD request to that nuget v2 api url but it didn't work (returned 404 for an existing package)... We'll probably need to use the v3 api instead which is a bit more involved since you need to request the index.json and then follow the RegistrationBaseURL: https://learn.microsoft.com/en-us/nuget/api/registration-base-url-resource |
It might be worth raising this with the nuget.org team and seeing what they recommend. http polling could potentially work but feels a bit clunky. For workload install, we install directly over the existing manifest and only then know what packs to look for. We'd have to do something complicated like install the manifests to a staging location, check if the packs are available, then try a different manifest, and repeat until we found one that had packs. That's complexity I'd like to avoid. We could potentially generate some sort of meta-manifest that includes the manifest versions and the pack versions in it and check for all of them being available first. That's only marginal complexity over some other plans we've been discussing but it does make that file more complicated (ie it'll have dozens of versions instead of just the ~8 manifest versions). |
If a specific publishing order is required for your scenario, then polling is the best option today. As mentioned above #23820 (comment), this maybe is not needed if publishing as unlisted is good enough, and then relisting later (some explicit, user defined "release time").
Using V3 is recommended, yes. I recommend doing a HEAD on the .nupkg URL: If you want to avoid writing the protocol client yourself, you can use the client SDK, which does the service index caching and URL building for you: https://learn.microsoft.com/en-us/nuget/reference/nuget-client-sdk#list-package-versions |
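A minimal sketch of a HEAD-based availability check against the V3 package content ("flat container") resource. The URL template and the nuget.org base address are taken from the public NuGet API documentation rather than from this thread, so treat them as assumptions; a robust client would read the `PackageBaseAddress` from the service index instead of hard-coding it. The helper names are hypothetical.

```python
import urllib.error
import urllib.request

# Assumption: nuget.org's PackageBaseAddress; normally discovered via index.json.
FLAT_CONTAINER = "https://api.nuget.org/v3-flatcontainer/"


def nupkg_url(package_id, version, base=FLAT_CONTAINER):
    """Build the .nupkg URL; id and version must be lowercased.

    Assumes `version` is already a normalized NuGet version string.
    """
    lower_id = package_id.lower()
    lower_version = version.lower()
    return f"{base}{lower_id}/{lower_version}/{lower_id}.{lower_version}.nupkg"


def is_available(package_id, version, timeout=10.0):
    """Return True if a HEAD on the .nupkg URL succeeds, False on 404."""
    request = urllib.request.Request(nupkg_url(package_id, version), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise  # Other errors (throttling, outages) are surfaced to the caller.


if __name__ == "__main__":
    print(nupkg_url("Microsoft.iOS.Sdk", "16.2.46"))
```

Because the request is a HEAD, the package body is never downloaded, which addresses the concern raised earlier about the HttpClient example fetching the whole package.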
Is today Jun 13, 2023 another release day? Error:
|
@liejuntao001 yes, today is another monthly servicing release |
This problem also shows up with version wildcards and transitive dependencies in our regular dev pipelines, nuget really needs a transactional publish dotnet/performance#3164 |
Any updates on this issue? This is still causing lost work hours for me and my team every time a workload update is released. Given how NuGet works, I understand that packages may become available in a different order from the one in which they were pushed. @marcpopMSFT I understand resisting complexity, but surely the need for a solution that always works outweighs any issues caused by the added complexity. This issue has become increasingly frustrating. @joelverhagen Perhaps NuGet might add a flag for packages to make them unavailable unless their complete dependency tree is resolvable. |
@rbhanda who manages the release. We should be pushing all of the packs before pushing any of the manifests to avoid this but I believe there can still be CDN availability issues. I'm not sure how to solve for sure without a bunch of nuget.org side features. @JonDouglas to comment on the possibility of either pushing unlisted packages and then listing or pushing multiple related packages as a unit. |
today's another release day?
can push manifest package in completely separate "next" step in ci, contingent on "push everything else in random order" step? |
No. It's always second Tuesday of the month. |
must be some other explanation for this error
|
it was updated two weeks ago xamarin/xamarin-macios#20348. our workaround was to download https://github.com/xamarin/xamarin-macios/blob/main/NuGet.config in current directory, then run |
Describe the bug
When a new .NET servicing release is published, a lot of packages are pushed to the NuGet feed. Among those packages are the workload manifests and the workload packages referenced from those manifests. There is currently no enforced order in which these assets are uploaded, which can cause the manifest feed to refer to non-existent packages. This causes temporary failures of the dotnet workload install command.
Example: