-
Notifications
You must be signed in to change notification settings - Fork 653
Kudu suddenly taking a long time to deploy to temp #1693
Comments
Update: please read if you are running into this issueI created a new entry in the Known Issues page to capture everything that someone needs to know about this issue: https://github.com/projectkudu/kudu/wiki/Known-issues Previous content: We deployed Kudu 47.40908.1797.0 last Tuesday (you can see version by going to root of Kudu). Try setting This way we can isolate whether the issue is related to the new Kudu, or possibly some unrelated environmental thing. |
I've just done a deployment and it's back down to 5 minutes using Kudu 47.40826.1772 so looks like it is most likely related to that. Are there release notes so I can see what I am missing out on by sticking back a version? |
Hmmm, that's concerning, as nothing fundamental has changed. I guess the one thing is the switch to the new msbuild, which also implies the new Roslyn-based compiler, etc... Can you try going back to the newest Kudu, but set |
Also, you can try manually building clean from Kudu Console, comparing |
New Kudu with old msbuild is 5 minutes also. |
It's concerning that the new msbuild/csc could be that much slower for your solution. Is there anything unusual about it? Can you describe what it's using? e.g. is it MVC, Web API, etc... and what version? Also, can you share your site name, either directly or indirectly? This may help us investigate. |
It's just a bog-standard MVC4 app targeting .Net 4.5. Site is cms-staging-weu-site. |
Is the repo public? If not, any chance you can give me access to the repo such that I could try deploying it on my own test site? We have test repos with all kind of variations of MVC and framework, but we have not seen this kind of extreme slow perf. |
Forget the above, the problem is back again. Will try a few things and repost. The repo isn't public as it is a commercial application and I'm not sure our account levels at our source hosts would allow me to grant access to an external user. However, if you've got a private repo you could grant me temporary access to, I would be happy to push to that. |
Some further testing: This is occurring across 3 different site instances, located in West Europe, East US, and East Asia, so it isn't localised. Two of the deployments that have taken a long time are:
Example of a good deployment:
Comparing deployment logs between deployments 230b and 53a6, when overriding msbuild with msbuild 14, it outputs warnings, which didn't occur in the good deployment. There doesn't seem to be any single item that takes longer, every item just seems to take longer. |
It is worth noting that my repository also contains another MVC4 site and an asmx web service. These are deployed in to their own Web Apps on the same Shared server. When the main site is misbehaving, they take about 20 minutes to build. When the main site is using the framework msbuild, these are fine too without having their msbuilds overridden, so I suspect their speed problems are just the result of resource starvation due to whatever problem the main site is having. |
I see. So the theory here is that the new Roslyn compiler (VBCSCompiler.exe) is not slower, but it eats up way more memory, causing the machine to thrash and be super slow if it doesn't have enough. Note that you can monitor the CPU and memory usage of your overall VM over the last 24 hours using https://portal.azure.com/:
This could confirm the theory if the slow builds correlate with high usage. |
The CPU usage on the app service plan increased from 50% at 9am, peaking at 98% at 10am, and dropping down slowly after that. This approximately correlates with the big 50 minute build at 10 and a number of less slow, but still slower than they should be, builds either side of this. On an additional likely related note, I've just had one deployment fail with the following error: D:\Windows\Microsoft.NET\Framework\v4.0.30319\Microsoft.CSharp.targets(316,9): error MSB6006: "Csc.exe" exited with code -1. [D:\home\site\repository\Product\Production\Model\Model.csproj] |
This happened because it was going so slow that Kudu failed to detect activity for 60 seconds and aborted the build. Well, I think we understand the issue now, and your best bet is to use the pre-Roslyn compiler. We'll try to bring up the memory usage to the Roslyn team, but it's unlikely to be something they can just fix. They have a completely different architecture from before, and presumably it needs more memory to do its thing :( |
I've just compared the csproj for the main site where I am having troubles and the MVC Rest site which is fine, below are the main differences that I can see.
<Import Project="$(MSBuildExtensionsPath32)\Microsoft\VisualStudio\v$(VisualStudioVersion)\TypeScript\Microsoft.TypeScript.targets" Condition="Exists('$(MSBuildExtensionsPath32)\Microsoft\VisualStudio\v$(VisualStudioVersion)\TypeScript\Microsoft.TypeScript.targets')" /> I am assuming that Visual Studio is not installed on the VM, so this could be our culprit? Any idea what a suitable alternative might be? |
So you mean 14.0 for the Rest site and not 4.0? 12.0 is VS 2013, and 14.0 is VS 2015. VS is not installed, but the various SDKs are. e.g. using Kudu console you can see that the file But maybe I'm not following your reasoning here. If something needed was missing, things would fail to build. Here, everything builds, but the new toolset eats up more memory. So I don't think the issue relates to missing files. |
No, rest site is definitely 4.0, I was confused by it too! Admittedly, I was grabbing at straws, the presence of VisualStudio in the path just threw me off. If you or the Roslyn guys want the code base to test with, you can contact me at alan dot parr at rexson dot co dot uk and we can find a way to get it to you. You've just got to promise not to laugh at my poor coding practices though :) For extra info, I am using the Roslyn compiler locally in VS2015 without issue on x64 Windows 8.1. |
Right, 4.0 is VS 2012, 12.0 is VS 2013, and 14.0 is VS 2015. Because we're really good at numbering things ;) Even though you're using it without issues locally. I would guess that it is also takes much more memory than with the old toolset. But when you have the memory, it's fast and unnoticeable. |
I checked to confirm, while building the main site, msbuild is only going up to 20mb. |
If you want to more closely replicate what happens in Azure, try to do a Publish to File System (some details here). Also, it might be hard to catch the peak memory usage from Task Manager, because if it's not out of memory, it may be very brief. |
I'll give it a try. Just gave it a try, worked fine. As you mentioned it might not be visible, but I had my eye on task manager the whole time and it didn't go over 32MB from what I could see. |
I don't think we've quite got to the bottom of this one. I am now experiencing the long builds and 100% CPU using the old msbuild. This is affecting all projects, not just the main site. Will do some further tests, including reverting Kudu build again, on Monday. |
That's quite unexpected. Where are you seeing the CPU being at 100%? Is that from the Preview Portal? |
Right now, CPU usage is very low on your Medium worker, and memory usage is only 69%. Would be interesting to check at the very time you deploy. |
Hello David, unfortunately we experiencing the same difficulties as AlanParr. We have several resource groups in Southeast Asia and each one contains 9 webapps, they are connected to different branches of our private github repository. During last 1.5 weeks it's not possible to deploy anything there. We faced the same behaviour with slow performance on our environments, we tried to change our app service plans to more powerful and there is no difference, even on P2 plan. Also we opened ticket with SR number We tried to make the same environments in neighbor geo-locations, created new app service plans and no one didn't help. We created custom deployment script with next steps:
Each build attempt has the different result (failed or success), building time is always different it could take 30 minutes, 1 hour, and 5-7 hours per one website when normal time is 5-10 minutes. Last build was successful for two resources after 7 hours of building, others were failed due to time out. {
"log_time": "2015-09-20T09:25:22.8826931Z",
"id": "",
"message": "Command 'starter.cmd deploy.cmd' was aborted due to no output nor CPU activity for 60 seconds. You can increase the SCM_COMMAND_IDLE_TIMEOUT app setting (or WEBJOBS_IDLE_TIMEOUT if this is a WebJob) if needed.\\r\\nstarter.cmd deploy.cmd",
"type": 1,
"details_url": null
},
{
"log_time": "2015-09-20T09:25:22.9139291Z",
"id": "",
"message": "One or more errors occurred.",
"type": 2,
"details_url": null
},
{
"log_time": "2015-09-20T09:25:22.9295511Z",
"id": "",
"message": "Command 'starter.cmd deploy.cmd' was aborted due to no output nor CPU activity for 60 seconds. You can increase the SCM_COMMAND_IDLE_TIMEOUT app setting (or WEBJOBS_IDLE_TIMEOUT if this is a WebJob) if needed.\\r\\nstarter.cmd deploy.cmd",
"type": 2,
"details_url": null
} I've created dummy webapp illegal.azurewebsite.net. and another resource group: |
Hi Maksim, I actually investigated your case yesterday based on your ticket, and had some findings which I sent to support team. TL;DR: the slowness is caused by out-of-memory conditions on the Medium VM. You need to reduce memory usage (suggestions below), or move to a Large. I’ve identified a few reasons for the high memory usage. b**-develop-no***** site using super high memoryI don’t know what this site is doing, but it has a strange memory usage pattern: it starts low, then quickly climbs to over 2GB (within a minute or two). And then it suddenly drops back to almost nothing (like 10MB). And it instantly starts climbing again. This is happening continually. 2GB usage for one w3wp process is very significant on a VM with a total of 3.5GB. Higher usage during builds due to new compilerThis is what I mentioned above, and it is a recent regression, in the sense that we recently switched to use the new toolset by default. Please have them try the suggestion below: set MSBUILD_PATH=D:\Windows\Microsoft.NET\Framework\v4.0.30319\MSBuild.exe in the Azure App Settings. Too many sites on the same App Service PlanThey have 36 sites running in that App hosting plan (counting staging sites, which are separate sites). The strange thing is that you have an App Hosting Plan named b*****-dev and another named b**_-develop. Yet it seems all the sites are in the first one, even if the site names start with b**-develop-_. I wonder if this is unintentional, and you meant to put them in the other Plan, which is currently unused? |
If needed, you can contact me directly at david.ebbo AT microsoft |
Hi David I just tried to chart the period of time when I was doing the build on Friday but the portal won't let me change the date range. It was between 16:30 and 17:15 GMT on Friday on the Medium worker. |
I've reverted back to old Kudu on the Medium worker and the build is back down to 5 minutes. On Monday, I will try continuously deploying one after the other for as much of the day as I can to see how things stack up. I will leave my 2 Small workers running the current Kudu build for comparison. |
Scale out probably won't help, as each VM will be running the same code, with the same memory constraints. But scaling up will, as it gives the VM(s) more resources. The purpose of App Service plans is to run difference sites in different pools of VMs, which you can scale up/out independently of each other. The higher memory issue with the new compiler doesn't seem to affect all apps. Once we get a good repro that has the issue and that we can run ourselves, we can hopefully get to the next level on investigation. |
I've been running some additional tests today, which unfortunately have not cleared things up in the slightest! I've set up the following configuration:
As you can see, the first couple of builds are what we have been seeing, running the new MSBuild takes significantly longer than old MSBuild on current Kudu, which in turns takes longer than old MSBuild on old Kudu. Then from Build 3 onwards, everything is fine on all environments. It is worth noting that before these troubles began, all instances were small. I've just upped them to Medium to try and reduce the impact of the MSBuild problems on the Production sites due to resource contention. The builds are faster than the usual 5 minutes, most likely due to upping the instance sizes.
My current thinking is that reverting to Kudu is a red herring as it doesn't consistently improve things. It may be that changing the Kudu version is resetting something on the instance that is the real root of the problem, perhaps coupled with MSBuild itself, although I am starting to wonder if MSBuild isn't also a red herring. The only truly common factor seems to be the Azure App Service itself, although if that were the case, I would've thought a great deal more customers would be affected by this. If I can run any further tests, please let me know. I'm really eager to get to the bottom of this, as I'm sure you are. |
@AlanParr thanks for doing all this testing! The key thing to remember is that this issue is not about the new compiler being much slower, but primarily about it taking more memory. What causes it to be slower is if the higher memory usage causes the VM to be low on memory and start thrashing. At that point, everything running on the VM will become much slower. That's likely why the test results are inconsistent. We you'll want to do is correlate them to the VM memory usage, which you can get from the new Portal (under App Service Plan). Any change you can share the test repo to a private bitbucket repo and give me access (davidebbo)? Hopefully, I'll then be able to repro the behavior. |
I'll get that sorted and email you the details. |
@davidebbo We're experiencing the exact same things as @AlanParr for the past week as well as getting random Kudo errors such as:
After resolving the 1st and 2nd issues which have never happened before deployments take at least 30 minutes, the same as @AlanParr. The 3rd issue we haven't corrected yet as it sometimes works and sometimes doesn't, but when it happens EVERY website on the standard websites goes down. Also we are experiencing the following issues:
@davidebbo Should we also try going back a version for Kudo and see if that works? |
In case someone missed the fix that was indicated above, like I did. In Azure Web Apps portal, go to the Azure App Settings and set in the configuration the following key value setting. This will use the older non Roslyn based compiler. Reason as indicated above:
|
@cthames When all this began I was on a Small instance. As well as the MSBUILD_PATH fix, I've also found moving up to a Medium does sometimes reduce the impact of the issue when it occurs. |
@AlanParr Yeah, I just confirmed we're on Standard Small for all 8. We thought we had 4 on Small and 4 on Medium but that doesn't seem to be the case anymore. I'll need to look into but maybe we can't have different instance sizes within the same App Service Plan anymore. Still two things are a little disconcerting.
|
@cthames A few things:
|
Ok, I have some potentially good news. I talked to some compiler people and spent a ton of time trying things. Here are the key finding:
|
Have performed a few deployments with this change and so far so good. Builds on all instances took between 4 and 6 minutes for the first build, 2-4 for subsequent builds. CPU usage is quite variable, ranging from 14% to a maximum of 24% but it's not spiking as before and doesn't seem to be causing any issues. Note: Rather than change SCM_BUILD_ARGS on all sites, I added the setting in to my deployment script at the end of the msbuild argument string just before SCM_BUILD_ARGS. Same effect, less work :) |
@davidebbo I've confirmed the |
Great, thanks for confirming. I have updated the known issues page with this new guidance. We are going to update Kudu to pass this flag automatically, hopefully in the next few days. However, if you have a custom deployment scripts, you'll need to make that change in there (or stay with the App Setting). |
To be more specific, the fix we will make is to modify kuduscript to always pass |
how exactly do I apply the fix for this? I tried adding /p:UseSharedCompilation=false as one of the paramters for msbuild in the deploy.cmd file but the deploy still takes about 20 minutes. |
I've also added it as an app setting through the azure web app as well as setting the msbuild version back.. still having issues with the length of the deployment.. is there anything else I should be looking for? |
Just look at your deployment output to see if the switch is getting passed to msbuild. If it is, then you are likely dealing with an unrelated issue. |
is there anything else that I should be looking for? I'm not really sure where to start.. it is an Orchard project which is a pretty big solution.. do you think that could be why it is taking this long? |
Did you verify that |
I did, okay I will open a new issue now. |
Is this issue resolved can we get rid of the |
Yes, you should not have to pass this flag now. |
Hi @davidebbo, A similar behaviour is happening on our customer environment. We've been advised by Azure Support that Site Precompilation should fix the problem. Interestingly they provided instructions for .NET 3.0?. We are deploying ASP.NET Web Apis (.NET Framework 4.5.2) We are going to try pre-compilation, however, the customer would like to get a definitive answer from MS regarding this. Based on what you've seen, is pre-compilation the best approach? Is Thanks, |
@pacodelacruz that is an old issue that is no longer relevant, so my guess is that you are running into an unrelated condition. |
Thanks @davidebbo. According to MS support, the problem is that vbcscompiler.exe is taking more than 60 mins to complete, and it is also occurring randomly, it does not happen on every instance. We'll update the thread if we find out the root cause. |
Normally, from pushing to build complete takes about 5 minutes. In the last week or so, it is taking between 15 and 30 minutes.
Initially, I thought it was because I had overridden MSBUILD_PATH to use the C#6 msbuild, however I have removed this from one site and the build still takes the same amount of time. Reviewing the Kudu trace log shows it is taking nearly 20 minutes to copy the build output to temp, the solution isn't that big so it really shouldn't take very long.
Relevant snippet of Kudu Trace:
2015-09-14T11:40:23 Copying all files to temporary location below for package/publish:
2015-09-14T11:40:23 D:\local\Temp\8d2bcf7a334374d.
2015-09-14T11:57:47 KuduSync.NET from: 'D:\local\Temp\8d2bcf7a334374d' to: 'D:\home\site\wwwroot'
The text was updated successfully, but these errors were encountered: