SDL timeouts on build #5219

Closed
2 tasks done
JunTaoLuo opened this issue Apr 7, 2020 · 32 comments

@JunTaoLuo

JunTaoLuo commented Apr 7, 2020

  • This issue is blocking
  • This issue is causing unreasonable pain

It seems like the timeout limit for the SDL step during build is set to 1 hour. We have been hitting this timeout limit: https://dev.azure.com/dnceng/internal/_build/results?buildId=591024&view=logs&j=7d9eef18-6720-5c1f-4d30-89d7b76728e9&t=f511b583-5060-5810-7549-865816347c8e. Is there an issue with the SDL tool or should we increase the limit?

@JohnTortugo
Contributor

/cc @sunandabalu

@hoyosjs
Member

hoyosjs commented Apr 9, 2020

Right now ASP is not using any binary artifact checks, but the SDL checks still download all artifacts produced. This took half an hour. Probably the easiest fix would be to disable asset download using the changes that got merged in #5158, which would save a ton of time (download and extraction of all nuget/zips).

@Pilchie
Member

Pilchie commented Apr 9, 2020

We're also seeing OutOfMemory exceptions in later runs.

I'm escalating this to blocking, since 3 of our last 4 internal builds have failed due to SDL validation issues.

@markwilkie
Member

To unblock, it's probably best to turn off SDL until the root cause can be determined.

@pranavkm
Contributor

pranavkm commented Apr 9, 2020

@markwilkie would setting https://github.com/dotnet/aspnetcore/blob/master/.azure/pipelines/ci.yml#L819 to false suffice? (/cc @dotnet/aspnet-build)

@JohnTortugo
Contributor

JohnTortugo commented Apr 9, 2020

> @markwilkie would setting https://github.com/dotnet/aspnetcore/blob/master/.azure/pipelines/ci.yml#L819 to false suffice? (/cc @dotnet/aspnet-build)

Yes, it should.

@sunandabalu
Member

> @markwilkie would setting https://github.com/dotnet/aspnetcore/blob/master/.azure/pipelines/ci.yml#L819 to false suffice? (/cc @dotnet/aspnet-build)

Yes, that should turn off SDL runs.

@jaredpar
Member

jaredpar commented Apr 9, 2020

Do we have an expectation of throughput from the SDL team for these phases? Even stronger, do we have a commitment from them that these phases can execute in a specific period of time?

At this point SDL is tied into our official builds, which means these phases factor into our two hour build time commitment. That means they need to be extremely fast in order for us to hit our goals, minutes at most. Anything approaching an hour will cause us to miss our build times.

One item I think we should consider is moving this out to a separate build definition. It can run in parallel with official builds.

pranavkm added a commit to dotnet/aspnetcore that referenced this issue Apr 9, 2020
Based on the discussion here: dotnet/arcade#5219
Re-enabling tracked by #20690
@markwilkie
Member

I agree that we're going to have to address this - one way or another. We're already pulling all non-source SDL work out to post-build, but the rest remains.

The decision we'll need to make is between forced build break so we don't build debt vs. build time and reliability.

cc/ @mmitche and @jcagme for visibility

@jaredpar
Member

jaredpar commented Apr 9, 2020

@markwilkie

> The decision we'll need to make is between forced build break so we don't build debt vs. build time and reliability.

Not sure what you're saying here. Can you elaborate?

@markwilkie
Member

It could be that SDL is not as fast as we'd like, and if we move it out of the build, we'll accumulate (some) debt simply because we're human.

@sunandabalu
Member

sunandabalu commented Apr 9, 2020

Looking at the last few failures in SDL, it seems to be consistently failing with #5220. While we fix this, you can turn off downloading and extracting of artifacts as @hoyosjs mentioned above.

The timeout mentioned in this issue seems transient and odd. Usually the execute SDL step takes ~19 minutes (to run both of the configured tools), but in this run it was stuck in Policheck for 15 minutes.

@alexperovich
Member

This build: https://dev.azure.com/dnceng/internal/_build/results?buildId=591024&view=logs&j=7d9eef18-6720-5c1f-4d30-89d7b76728e9&t=34312c87-f0f7-51d1-6eae-738ee1c68839 took 32 minutes to download 8.2 GB.
This build: https://dev.azure.com/dnceng/internal/_build/results?buildId=591166&view=logs&j=7d9eef18-6720-5c1f-4d30-89d7b76728e9&t=34312c87-f0f7-51d1-6eae-738ee1c68839 took 6.5 minutes to download 8.3 GB.

This smells like a throttling issue.

That being said, 8 GB is a massive amount of artifacts to download. There is no way the SDL step needs binlogs or symbol packages.
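
As a rough check on the two download figures above, here is a minimal sketch of the arithmetic. It assumes 1 GB = 1000 MB and takes the sizes and durations exactly as quoted; the function name is only illustrative.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average download rate in MB/s, treating 1 GB as 1000 MB (assumption)."""
    return size_gb * 1000 / (minutes * 60)


slow = throughput_mb_per_s(8.2, 32)    # build 591024
fast = throughput_mb_per_s(8.3, 6.5)   # build 591166

print(f"slow: {slow:.1f} MB/s, fast: {fast:.1f} MB/s, ratio: {fast / slow:.1f}x")
# prints "slow: 4.3 MB/s, fast: 21.3 MB/s, ratio: 5.0x"
```

A roughly five-fold difference for nearly identical payloads is consistent with the throttling suspicion, though it does not rule out other causes such as agent or network variance.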

@sunandabalu
Member

> This build: https://dev.azure.com/dnceng/internal/_build/results?buildId=591024&view=logs&j=7d9eef18-6720-5c1f-4d30-89d7b76728e9&t=34312c87-f0f7-51d1-6eae-738ee1c68839 took 32 minutes to download 8.2 GB.
> This build: https://dev.azure.com/dnceng/internal/_build/results?buildId=591166&view=logs&j=7d9eef18-6720-5c1f-4d30-89d7b76728e9&t=34312c87-f0f7-51d1-6eae-738ee1c68839 took 6.5 minutes to download 8.3 GB.
>
> This smells like a throttling issue.
>
> That being said, 8 GB is a massive amount of artifacts to download. There is no way the SDL step needs binlogs or symbol packages.

Agreed, hence the feature to turn off artifact download if needed was added.

@alexperovich
Member

So, the fix is to merge this PR? dotnet/aspnetcore#20691

@Pilchie
Member

Pilchie commented Apr 9, 2020

No, that PR stops aspnetcore from running SDL at all until this is understood.

@alexperovich
Member

Ahh, okay. Then we just need to update to latest arcade and enable this feature.

@sunandabalu
Member

> Ahh, okay. Then we just need to update to latest arcade and enable this feature.

That will turn off downloads, yes, and will remove this pain, but we do need to address #5220 for those who want to keep downloading and extracting.

@alexperovich
Member

#5220 won't fix this problem for repos that need to keep downloading the artifacts. It will still take a large amount of time to download if azure devops is throttling us. If we can't stop the throttling then the sdl step needs a higher timeout.
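
To illustrate why the current limit is tight when downloads are throttled, here is a hedged sketch of a timeout estimate. The 25% safety margin and the function name are assumptions for illustration; the 8.2 GB, 32 minute, and ~19 minute figures are the ones quoted earlier in this thread.

```python
import math


def required_timeout_minutes(size_gb: float, worst_mb_per_s: float,
                             tool_minutes: float, margin: float = 1.25) -> int:
    """Estimate a job timeout from a worst-case download rate.

    `margin` is a hypothetical safety factor, not a value from the SDL tooling.
    """
    download_minutes = size_gb * 1000 / worst_mb_per_s / 60
    return math.ceil((download_minutes + tool_minutes) * margin)


# Slow rate observed in this thread: 8.2 GB in 32 minutes.
observed_slow_rate = 8.2 * 1000 / (32 * 60)

print(required_timeout_minutes(8.2, observed_slow_rate, tool_minutes=19))
# prints 64
```

Even without the margin, a 32 minute download plus ~19 minutes of tool time is about 51 minutes, which leaves little room under a 60 minute limit for extraction or a stalled tool.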

@hoyosjs
Member

hoyosjs commented Apr 9, 2020

There's a parameter to filter what gets downloaded. Most of the artifacts are unnecessary (think packages and blobs), and test assets and the like seem unnecessary too.
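
The idea of filtering can be sketched as follows. This is not the Arcade parameter itself; the patterns and the function name are hypothetical, chosen to exclude the binlogs and symbol packages mentioned above, and which artifacts SDL actually needs is repo-specific.

```python
from fnmatch import fnmatch

# Hypothetical exclusion patterns (assumption): binlogs and symbol packages.
EXCLUDE_PATTERNS = ["*.binlog", "*.symbols.nupkg", "*.snupkg"]


def artifacts_to_download(names, exclude=EXCLUDE_PATTERNS):
    """Return the artifact names that match none of the exclusion patterns."""
    return [
        name for name in names
        if not any(fnmatch(name.lower(), pattern) for pattern in exclude)
    ]


sample = ["build.binlog", "Foo.1.0.0.nupkg", "Foo.1.0.0.symbols.nupkg", "src.zip"]
print(artifacts_to_download(sample))
# prints ['Foo.1.0.0.nupkg', 'src.zip']
```

Names are lowercased before matching so the result does not depend on the platform's case sensitivity.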

@sunandabalu
Member

> #5220 won't fix this problem for repos that need to keep downloading the artifacts. It will still take a large amount of time to download if azure devops is throttling us. If we can't stop the throttling then the sdl step needs a higher timeout.

Yes, we do need to increase the timeout too.

@markwilkie
Member

Remember - to unblock, please turn off SDL.

@Pilchie
Member

Pilchie commented Apr 9, 2020

We're trying, but running out of disk space on our ubuntu builds 😢

@jaredpar
Member

jaredpar commented Apr 9, 2020

@Pilchie

Can you point me to one of the builds? I want to see if it's related to other out of space issues we're seeing.

@MattGal

I'm wondering if this out of disk is related to our two other out of disk space issues:

@Pilchie
Member

Pilchie commented Apr 9, 2020

See also dotnet/aspnetcore#20704

@alexperovich
Member

If there is a chance your tests go over 10 GB, then that could very easily cause issues. The Azure DevOps hosted pools are only guaranteed 10 GB. If you need more, this job should be switched to one of our managed pools.
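
A job could guard against this with a small pre-flight check. The sketch below is an assumption-laden illustration: the 10 GB constant is the figure quoted above (not verified here), and the function names are hypothetical.

```python
import shutil

HOSTED_POOL_GUARANTEE_GB = 10.0  # figure quoted in this thread, not verified here


def free_gb(path: str = ".") -> float:
    """Free space on the volume containing `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9


def check_space(required_gb: float, available_gb: float) -> str:
    """Classify a disk requirement against free space and the quoted guarantee."""
    if required_gb > available_gb:
        return f"FAIL: need {required_gb:.1f} GB, only {available_gb:.1f} GB free"
    if required_gb > HOSTED_POOL_GUARANTEE_GB:
        return (f"WARN: {required_gb:.1f} GB exceeds the "
                f"{HOSTED_POOL_GUARANTEE_GB:.0f} GB guarantee")
    return "OK"


print(check_space(required_gb=8.3, available_gb=free_gb()))
```

The WARN case covers an agent that happens to have enough room today but is asking for more than the pool promises, which is the situation that would call for a managed pool.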

@Pilchie
Member

Pilchie commented Apr 9, 2020

We're trying to determine that in dotnet/aspnetcore#20704.

wtgodbe pushed a commit to dotnet/aspnetcore that referenced this issue Apr 10, 2020
Based on the discussion here: dotnet/arcade#5219
Re-enabling tracked by #20690
@alexperovich
Member

@Pilchie is this still critical? From what I can tell the builds are unblocked, and the fix for this is just upgrading the sdk.

@Pilchie
Member

Pilchie commented Apr 13, 2020

We are unblocked because we disabled the job in our builds.

@markwilkie
Member

BTW - we're planning on moving SDL out of the build entirely and to a promotion ring. cc/ @jcagme

@alexperovich
Member

Build is unblocked. Work is in progress to improve the SDL steps.


9 participants