Errors installing the SDK during builds #34015
Comments
I couldn't figure out the best area label to add to this issue. Please help me learn by adding exactly one area label. |
Data from the CI pipelines
Evaluated 100 builds |
@wli3, I think
|
Data from today
Evaluated 100 builds
|
cc @janvorli, @safern, FWIW, I did some digging in my PRs and thought I should share my findings. In some cases, this error is due to the wrong version being referenced, so it occurs on Linux, macOS and Windows builds. In other cases, from the top comment:
it occurs only on Linux machines, where GnuTLS (which has a bug, or in other words an incompatible TLS handshake with some protocols/servers) is used as the cURL backend, as opposed to OpenSSL-based cURL. Therefore, I think increasing the retries even to 100 in the dotnet-install script in the dotnet/sdk repo would not fix it. We can update the docker containers to use the desired cURL. |
New repro (Linux musl arm64 release library build): |
Can you elaborate on what the wrong version is here? Not quite sure which part of the stack you're referring to.
Do you know which images these are that we need to look into updating? |
#34015 (comment): in this run, it failed on Linux and Windows because, that day, the perf legs required an SDK version that did not actually exist. I think it was fixed later.
I think it is a bit more complicated: it does not always fail with GnuTLS, but whenever it fails on Linux, as far as I can see, it is always with GnuTLS. At that time, you could quickly repro it locally with GnuTLS-backed curl, while wget and OpenSSL-backed curl would happily download the file. Disclaimer: this information is empirical at best, so please treat it as such. However, it may be something to keep an eye on. :) |
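For anyone trying the local repro described above, it helps to first confirm which TLS library the local curl is built against. A minimal Python sketch follows, assuming the usual `curl --version` layout in which the first line lists linked libraries as `Name/version` tokens. The helper names and the (non-exhaustive) backend list are illustrative, not part of any dotnet tooling.

```python
import subprocess

# Non-exhaustive list of TLS library names that can show up in `curl --version`.
KNOWN_TLS_BACKENDS = ("OpenSSL", "GnuTLS", "LibreSSL", "NSS", "BoringSSL")


def tls_backend(version_output):
    """Return the TLS library named on the first line of `curl --version`, or None."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    for token in first_line.split():
        name = token.split("/", 1)[0]  # "GnuTLS/3.5.18" -> "GnuTLS"
        for backend in KNOWN_TLS_BACKENDS:
            if name.lower() == backend.lower():
                return backend
    return None


def local_curl_tls_backend():
    """Run the curl found on PATH and report its TLS backend (hypothetical helper)."""
    result = subprocess.run(
        ["curl", "--version"], capture_output=True, text=True, check=True
    )
    return tls_backend(result.stdout)
```

If `local_curl_tls_backend()` returns `"GnuTLS"` on an agent that fails and `"OpenSSL"` on one that succeeds, that is consistent with the observation above, though still only empirical evidence.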
@donJoseLuis, whose team is driving the fixes here, as there is some more context.
Understood. Appreciate the insight. We've also frankly been scratching our heads over this one too. It actually being two issues (general flakiness that can be improved with retry and a fundamental issue with some images) would explain the data I've been seeing. Specifically, the persistent failures are concentrated in dotnet/runtime, where we have a higher investment in docker and lots of Linux images, but the failure also shows up in other repos in batches. |
To confirm or rule out GnuTLS as a root cause:
|
In this issue, dev test runs of their official build were updating the latest-SDK information for the specific channel, but because the build came from a dev branch, it didn't publish any assets. I believe that is getting fixed already. |
I believe I encountered this (or at least a similar issue) on my PR (#33307):
|
I have also seen the SDK download fail with HTTP error 404. When that happened, I tried opening the URL in a browser on my local Windows machine and also got HTTP error 404. So this clearly had no relation to GnuTLS. |
Hello @jaredpar , let us know if the situation is ameliorated, after dotnet/sdk#11001. |
@donJoseLuis will do. The other TLS issues we had yesterday are muddying the data a bit. Have to probably wait another day to see if it's resolved. I will definitely be looking at the data though. |
Regarding the previous conversation about the curl issue: if we switch the order of the if/elif conditions here: https://github.com/dotnet/install-scripts/blob/ad55554a0b84244ff8c68579df8b84af30b9abfc/src/dotnet-install.sh#L711-L714 (wget-first) and these errors disappear, then it would mean that all unix-y agents just need curl v7.47.0 or above. Note that the fixes that went into curl v7.47.0 applied to all TLS libraries (including libressl used by curl on macOS). PS: the dotnet-install script could in any case provide an option to select the tool of choice (curl vs. wget) to avoid such code changes. |
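The tool-selection idea above can be sketched roughly as follows. This is an illustrative Python sketch, not code from dotnet-install.sh: `pick_downloader`, `parse_curl_version` and `MIN_CURL` are hypothetical names, and the only fact taken from the thread is the 7.47.0 threshold mentioned in the comment.

```python
import re
import shutil

# Minimum curl version named in the comment above.
MIN_CURL = (7, 47, 0)


def parse_curl_version(version_output):
    """Extract (major, minor, patch) from `curl --version` output, or None."""
    match = re.match(r"curl (\d+)\.(\d+)\.(\d+)", version_output)
    return tuple(int(part) for part in match.groups()) if match else None


def pick_downloader(preferred=("curl", "wget"), which=shutil.which):
    """Return the first tool in `preferred` that is found on PATH, or None.

    `which` is injectable so the ordering logic can be checked without
    depending on what is installed on the machine.
    """
    for tool in preferred:
        if which(tool):
            return tool
    return None
```

Passing `preferred=("wget", "curl")` gives the wget-first behaviour proposed above without editing the script, and `parse_curl_version(...) >= MIN_CURL` is one way an agent could be checked against the suggested minimum.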
@eerhardt I have a few things to chat about w/ the Mac MMS folks, but 3 is really just not a lot of data, if we can get more that'd be great. |
Converted this issue to a live tracking one (see top post). This still happens, but less often.
Happened again in this official build: https://dev.azure.com/dnceng/internal/_build/results?buildId=895670&view=results. Unsure what's the current state of this issue. Is someone actively working on a mitigation? cc @marcpopMSFT |
@bozturkMSFT |
@donJoseLuis @bozturkMSFT are driving improvements into the install script at the moment. We should probably retry the initial link rather than falling back to the legacy link as that never seems to work. |
It would probably make sense to retry initial URL for network failures but not 404 and perhaps others. I'm not sure if we can get that granularity. |
It only works for earlier versions of .NET. It is the “legacy” URL format. I think the switch happened sometime around 2.0 |
so for 3.0+ it would make sense to only try the initial url, right? e.g. never fall-back for master and 5.0 |
Yes, I think that would be fine. The URL naming change came in the summer of 2017 - dotnet/cli#6764. For versions above that, it doesn't make sense to try the legacy URL format. |
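The policy discussed in the last few comments (retry the initial URL on network failures, do not retry a 404, and only try the legacy URL for old versions) could look roughly like the Python sketch below. Everything here is hypothetical: `HttpError`, `download`, `should_try_legacy` and the `fetch` callable are illustrative names, and the 2.0 cutoff is only the "sometime around 2.0" estimate from the thread, not a confirmed boundary.

```python
import time


class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


# Assumption: the legacy URL format only matters below this version.
LEGACY_CUTOFF = (2, 0)


def should_try_legacy(version):
    """True if a numeric version string such as '1.1.14' predates the cutoff."""
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    return major_minor < LEGACY_CUTOFF


def download(primary_url, legacy_url, version, fetch,
             attempts=3, delay=1.0, sleep=time.sleep):
    """Fetch primary_url with retries; fall back to legacy_url only for old versions."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(primary_url)
        except HttpError as err:
            last_error = err
            if err.status == 404:
                break  # the file is not published; retrying will not help
        except OSError as err:
            # Connection resets, timeouts and TLS handshake errors land here.
            last_error = err
        if attempt + 1 < attempts:
            sleep(delay)
    if should_try_legacy(version):
        return fetch(legacy_url)
    raise last_error
```

With this shape, a 3.0+ or 5.0 download never touches the legacy link, and a 404 surfaces immediately instead of being masked by retries. Channel names such as `master` are not handled by `should_try_legacy` in this sketch, which assumes a numeric version string.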
Greetings @ViktorHofer.
|
Thanks @donJoseLuis.
Should we move this issue over then?
The impact is failing public and internal builds which at the end of the day results in loss of productivity of the team and community. Combined with bad timing, in the worst case this could cause some of our releases to be delayed because of build failures. |
Thanks @ViktorHofer |
Initial cost estimate: 1 week
Initial contacts: @trylek, @ViktorHofer
Getting an error from curl during download.
Runfo Tracking Issue: Errors installing the SDK during builds
Build Result Summary