-
Notifications
You must be signed in to change notification settings - Fork 257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A better cache clean-up and expiration policy #4980
Comments
Related: #2308 |
Yeah, I'm not a fan of an explicit clean as the sole mechanism. We have other discussions on how to hide package restore even more by making I think having a |
@terrajobst You still would need some process to clean the cache up? I think and explicit command to purge is the best way. Some of the requirements:
Today we do not keep any metadata around source, signature or |
The way I'd design the cache is like this:
Not sure (1) is viable for perf reasons. If it's not, only bump the dates every so often. The meta point is that a cache without expiration control isn't a cache -- it's a leak. |
Perf is one consideration. |
It has to be automatic which means it has to be part of restore. |
Ideally yes but without impacting restore perf :) |
Agreed, but it seems there must be a compromise between spamming my drive and impacting my package restore times. Maybe we can address 90% of this if .NET Core wouldn’t be modeled as a large number of packages though. |
@terrajobst - what's the current thinking here? Is there anything beyond early discussion in this direction? All ears. |
It’s all early discussions for .NET Core 3.0 |
FWIW I put this little console app in a scheduled task. It deletes any package not accessed the past 30 days: using System;
using System.IO;
using System.Linq;
namespace NugetCacheCleaner
{
class Program
{
static void Main(string[] args)
{
int minDays = 30;
if (args.Length > 0 && int.TryParse(args[0], out minDays)) { }
string nugetcache = $"{Environment.GetFolderPath(Environment.SpecialFolder.UserProfile)}\\.nuget\\packages\\";
DirectoryInfo info = new DirectoryInfo(nugetcache);
long totalDeleted = 0;
foreach (var folder in info.GetDirectories())
{
foreach (var versionFolder in folder.GetDirectories())
{
var folderInfo = versionFolder.GetFiles("*.*", SearchOption.AllDirectories).GroupBy(i => 1).Select(g => new { Size = g.Sum(f => f.Length), LastAccessTime = g.Max(f => f.LastAccessTime) }).First();
var lastAccessed = DateTime.Now - folderInfo.LastAccessTime;
if (lastAccessed > TimeSpan.FromDays(minDays))
{
Console.WriteLine($"{versionFolder.FullName} last accessed {Math.Floor(lastAccessed.TotalDays)} days ago");
try
{
versionFolder.MoveTo(Path.Combine(versionFolder.Parent.FullName, "_" + versionFolder.Name)); //Attempt to rename before deleting
versionFolder.Delete(true);
totalDeleted += folderInfo.Size;
}
catch { }
}
}
if (folder.GetDirectories().Length == 0)
folder.Delete(true);
}
var mbDeleted = (totalDeleted / 1024d / 1024d).ToString("0");
Console.WriteLine($"Done! Deleted {mbDeleted} Mb");
}
}
} |
If you're lazy like me, you might want to use @dotMorten's code through this |
A bit tangential to this issue but I would like to make a comment on the placement of the NuGet cache on the file system. The cache is inside the On Windows the tilde While Microsoft is providing less and less specific guidelines on how to use these known folders I would be so bold as to say that applications should not create their own folder in the user profile folder. This is the user's space and dumping application data here is not a good user experience. Also, adding a dot in front of a folder does not make any sense on Windows. These folders are by default hidden on Unix-like operating systems but not so on Windows. In fact, you are unable to use Windows Explorer to create or rename a folder when the name of the folder starts with a dot. The correct placement of a cache on Windows is in the subfolder More generally and on Windows, please move all dot-prefixed application data folders and files away from the user profile folder and into the local application data folder where they belong. 🙂 |
Totally agree.
Agree as well. |
@Liversage |
Insight: We only need to clear the cache when a new nuget is added to itIf we are not adding to the cache, it's certainly not getting bigger, so what's the point in checking it? Also, when we are adding to the cache, the cost of deleting old nugets may be eclipsed by downloading new nugets (unpredictable over the Internet), unzipping, rewriting This strikes the balance @terrajobst mentioned:
Option: Run cache cleanup when threshold is reachedWhen unpacking a new nuget:
Where the size threshold Yes, this means the cache can grow over past the threshold, but we also don't want to delete nugets that might reasonably still be in use. This is a heuristic meant to help the common case, after all. Users can still clear the cache with:
Several options on when to add sizes and delete:
Option: Delete in background threadIf performance is still a factor, we can additionally delete in a background thread:
Some care will need to be done to not kill an in-progress delete. Here are two options:
|
From an email discussion regarding this topic: "...Everyone’s nuget cache and .nuget\packages folder is filling up as we upgrade packages. The only solution we have currently is to delete the whole packages folder and re-download everything, which can be a huge hit on DevOps especially while working remotely. It would great if we had a way to say “delete all the nuget packages except for these packages @ the specified version”. |
Interesting, the restore time when it did not need to modify last modified times are about the same as when it does update the last write time. This doesn't line up with my microbenchmarks where checking the last write time, without updating it, was only very slightly slower than All your tests with the modification are consistently slower than the test without your changes, which gives strong evidence that this code change does impact performance, but the results don't have the same trend as the microbenchmarks, so I don't feel like I understand what's really going on here. I think we need more information. Maybe microbenchmarks of |
Running these benchmarks on both
Yes, I'll try to put that together and let you know. |
Perhaps running the script more times, or changing the script to execute more runs, will increase our confidence. This is exactly what BenchmarkDotNet does when it detects what it's running has high standard deviation. Low standard deviation benchmarks run 15 times, high standard deviation benchmarks get up to 100 runs. The randomness/variability of starting an exe is the same between the dev and feature branch builds of nuget.exe, so larger number of runs will smooth it out. I didn't do a statistical analysis to check if the mean of the before-vs-after runs are within the "margin of error", but it certainly looks like your feature branch is systematically slower than the dev branch. Unless someone can point out a reason why this method of testing is systematically unfair to one nuget.exe and not the other, then I believe that a larger number of tests is sufficient. Next time I can also calculate standard deviation to check if the mean values of the two nuget.exes are within "margin of error". That's the limit of my statistical skills, however. In another issue you mentioned that your feature branch is changing the timestamp of files not only in the global packages folder, but also read-only fallback folders. This is not only a bug, but also means that your feature branch is doing more work than is necessary, and therefore we could improve performance by fixing the bug. An alternative is to ensure that benchmarks are using a nuget.config that clears fallback folders (the example doesn't show how to clear, but it's the same as package sources, use
For what it's worth, I believe that in December I'll have capacity to work on this for a few days. I don't want to "steal credit" for your work, so I can find other work to do if you'd like to drive this to completion yourself and have a PR/commit with only your name. Otherwise I can copy your branch, so your name is in the commits and try to complete it myself. Having said that, December is a month away, so if we don't encounter too many more problems you might be able to complete it before then anyway. |
I have ran the script dozen of times on either branches and get wildly different results. If my CPU is already at 20% usage from other apps just before running the script, perf numbers go x2 easily.
Right, I've made a bad hack to fix this temporarily mfkl/NuGet.Client@5d2c27a. This change won't fix the result deviations though, only using benchmarkdotnet would I'm afraid.
Thanks, I appreciate your concern, but I don't really care about credits for this. As long as Software gets better, I'm happy :-) |
I stumbled over this issue in relation to another topic and I would like to give just a small input for consideration on clean up policies. I wasn't able to quickly grasp all details of the already long ongoing discussion so I dare to just drop my input in the hope it fits into the current plans. NuGet supports floating versions which can be quite useful if you have continuous deployment of your packages (in our case we have still old fashioned nightlies for some package deploys). If you go for a semver like It would be great if the cleanup policy could be shorter by default for such pre-release packages. I would expect in most of the cases you would always only need the maybe one or two latest pre-release version of one tag for a particular library version in the local cache. |
Needed to manually delete some large older packages from my cache today to make room on my disk, and found this issue. As Immo said, this is a leak, and as such I am disappointed that it hasn't been prioritized more highly. There are two potential problems I can see with the spec as proposed:
/cc @zivkan |
Thanks to @mfkl for implementing the "core" part that will update the last access time of one file in each package on restore: NuGet/NuGet.Client#4222 I'm guessing it's going to ship in .NET SDK 6.0.300 and VS 17.2. Once it's shipped, and we start using it for all our solutions, then we'll be able to use tools like https://github.com/chgill-MSFT/NuGetCleaner |
Personally I'd prefer to have an option that kept the last n number of versions downloaded of each package instead of simple date-based expiry. |
|
edit by @zivkan: a first step has been implemented, allowing package "last used in restore" date to be updated: NuGet/NuGet.Client#4222
edit by NuGet Client Triage:
original post below:
The user local NuGet cache has a tendency to grow unconditionally. Which results in funny tweets like this from @davidfowl:
Can we track when an item was last hit by package restore and apply some expiration policy? I don't have good data, but I bet a good chunk of the bloat aren't individual packages with a single version but a small number of packages with many versions. If were to just delete older versions that weren't needed for restore for a while, we're probably getting to something saner.
Thoughts?
The text was updated successfully, but these errors were encountered: