Retain URL hash cache, cache file hashes #2940
However setting |
That would create a race condition: other processes could sneak changes into the cache in that interval, as small as it would be.
This is also likely the cause of a huge number of IOPS, which on magnetic storage are charged per million (pretty cheap though), whereas on SSD they're included in the hourly rate. I was considering rebuilding with SSD, but I'll hold off to see what the impact of this change is. Likely we're going to be able to increase our indexing rate as well! It's likely the FS updating the access time that is firing the notify. We could mount our cache using |
Okay then, let's keep it as is. This is a big improvement, the few times a new download is actually saved in the cache are negligible.
Really curious how much this affects performance in the end.
Motivation
We'd like to reduce the CPU usage of the Inflator component of the infrastructure. Toward that end, we added some more debug statements and observed netkan.exe's output carefully, which turned up some surprising issues...
Background
NetFileCache uses a Dictionary to map from the URL hash (first 8 characters of the download URL's SHA1 interpreted in hexadecimal) to the file path in cache. This is intended to find cached files quickly without checking every single file. To try to protect this mapping from unexpected changes, we listen to events from FileSystemWatcher and clear this dictionary whenever they fire, which results in it being regenerated the next time it's needed.
The DownloadAttributeTransformer is in charge of populating file hashes and other properties based on the contents of each ZIP, some of which are pretty large.
The Inflator's download cache persists across passes, and in a typical pass every download is already cached. If modders have been very active, the number of new or changed downloads may approach the high single digits.
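As a rough illustration of the URL-hash lookup described above, here is a minimal, language-neutral sketch in Python. It is not CKAN's C# code: the function names `url_hash` and `build_index`, the lower-case hex formatting, and the `<hash>-<name>` file naming scheme are all assumptions made for the example.

```python
import hashlib
import os
from typing import Dict


def url_hash(url: str) -> str:
    """Return the first 8 hex characters of the SHA-1 of the URL."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]


def build_index(cache_dir: str) -> Dict[str, str]:
    """Scan the cache directory once and map each URL hash to a file path.

    Assumes (hypothetically) that cached files are named "<hash>-<name>".
    """
    index: Dict[str, str] = {}
    for name in os.listdir(cache_dir):
        prefix, sep, _ = name.partition("-")
        if sep and len(prefix) == 8:
            index[prefix] = os.path.join(cache_dir, name)
    return index
```

Building the index costs one pass over the whole directory, so the mapping only pays off if it survives across many lookups.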
Problems
FileSystemWatcher.Changed seems to fire even if you just read info about a file in the cache. In one test using my 50 GB download cache to inflate a cached module, it triggered 186499 times!! This means that the Dictionary mapping was almost never used more than once; it would be null at the start of virtually any cache access, which would force it to be regenerated, then it would be used once, and then it would be almost immediately wiped out again. This amounts to searching every file for every access, a waste of CPU and disk.
Changes
log.Debug statements are added to netkan.exe to make it easier to see what it's doing.
NetFileCache doesn't listen to FileSystemWatcher.Changed anymore. This will avoid repeated unnecessary purges of the file cache data, hopefully amounting to a significant savings for a long-running Inflator process. Similarly, other operations that previously cleared the cache object are now changed to merely make an appropriate small update to it, such as Store and Remove.
File hashes are now cached in a Dictionary. If we need the same hash for the same file again, we check the Dictionary first and use it if found, skipping the computationally expensive recalculation of the hash from the file on disk. The Dictionary objects are also updated or cleared anytime we perform a similar operation for the main cache data.
Overall this should result in a significant savings in the Inflator's CPU consumption.
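The two ideas in the list above (small in-place updates to the index instead of discarding it, and memoized file hashes) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the actual NetFileCache implementation: the class `FileCacheSketch`, its method names, the `<hash>-<name>` file naming, the `scans` counter, and the choice of SHA-256 for the file hash are all hypothetical.

```python
import hashlib
import os
import shutil
from typing import Dict, Optional


class FileCacheSketch:
    """Hypothetical stand-in for a download cache; not CKAN's NetFileCache."""

    def __init__(self, cache_dir: str) -> None:
        self.cache_dir = cache_dir
        self._index: Optional[Dict[str, str]] = None  # URL hash -> file path
        self._sha256: Dict[str, str] = {}             # file path -> file hash
        self.scans = 0                                # number of full directory scans

    @staticmethod
    def url_hash(url: str) -> str:
        return hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]

    def _ensure_index(self) -> Dict[str, str]:
        # Built lazily on first use, then kept; nothing resets it to None.
        if self._index is None:
            self.scans += 1
            self._index = {}
            for name in os.listdir(self.cache_dir):
                prefix, sep, _ = name.partition("-")
                if sep and len(prefix) == 8:
                    self._index[prefix] = os.path.join(self.cache_dir, name)
        return self._index

    def lookup(self, url: str) -> Optional[str]:
        return self._ensure_index().get(self.url_hash(url))

    def store(self, url: str, source_path: str) -> str:
        index = self._ensure_index()
        key = self.url_hash(url)
        target = os.path.join(
            self.cache_dir, key + "-" + os.path.basename(source_path))
        old = index.get(key)
        if old is not None and old != target:
            os.remove(old)
            self._sha256.pop(old, None)
        shutil.copyfile(source_path, target)
        index[key] = target             # small update instead of a full purge
        self._sha256.pop(target, None)  # contents may differ, forget stale hash
        return target

    def remove(self, url: str) -> bool:
        path = self._ensure_index().pop(self.url_hash(url), None)
        if path is None:
            return False
        self._sha256.pop(path, None)    # keep the hash memo in step with the index
        os.remove(path)
        return True

    def file_sha256(self, path: str) -> str:
        cached = self._sha256.get(path)
        if cached is None:
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            cached = self._sha256[path] = digest.hexdigest()
        return cached
```

The trade-off in this sketch is that changes made to the directory by another process are not noticed until the cache object is recreated, which matches the race-condition concern raised in the review comments.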