configuration option for maximum dune cache size #8274
Rather than adding slow size checks to `dune build` itself, this might be better served by a dedicated shared cache server. You can imagine that once we have a server handling a distributed cache, we could allow it to manage the size better than we currently do. cc @rgrinberg
I was curious and took a look at what ccache does. From what I read (https://ccache.dev/manual/4.3.html#_cache_size_management), it maintains counters for the size and number of cached files in each of the 16 subdirectories of the cache. The stated reason for using multiple counter files is performance and concurrency.
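For reference, a minimal OCaml sketch of that ccache-style bookkeeping (the file layout and names here are illustrative assumptions, not ccache's or dune's actual formats; requires the `unix` library for the later sketches in this thread):

```ocaml
(* Sketch: each of the 16 top-level subdirectories keeps a tiny stats
   file with its total size and file count, so computing the overall
   cache size means reading 16 small files instead of walking every
   entry. Real implementations also need file locking, since several
   processes may update a counter concurrently. *)

type counters = { bytes : int64; files : int }

let stats_file subdir = Filename.concat subdir "stats"

let read_counters subdir =
  try
    let ic = open_in (stats_file subdir) in
    let line = input_line ic in
    close_in ic;
    Scanf.sscanf line "%Ld %d" (fun bytes files -> { bytes; files })
  with _ -> { bytes = 0L; files = 0 }

let write_counters subdir { bytes; files } =
  let oc = open_out (stats_file subdir) in
  Printf.fprintf oc "%Ld %d\n" bytes files;
  close_out oc

(* Call after storing a new entry of [size] bytes under [subdir]. *)
let record_store subdir ~size =
  let c = read_counters subdir in
  write_counters subdir { bytes = Int64.add c.bytes size; files = c.files + 1 }

(* The total is a sum over 16 tiny files, not a full directory walk. *)
let total_size cache_root =
  List.fold_left
    (fun acc i ->
      let subdir = Filename.concat cache_root (Printf.sprintf "%x" i) in
      Int64.add acc (read_counters subdir).bytes)
    0L
    (List.init 16 Fun.id)
```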
cc @snowleopard, any opinion on this?
We've recently implemented some eager cache trimming in Jenga, which is not exactly what is being asked for here, but is along the lines of making trimming happen during builds, as opposed to being scheduled. Personally, I'd welcome some work in this direction, though ideally it would happen after our internal migration from Jenga to Dune is complete (maybe in a few months), as I expect any changes around caching to be pretty disruptive right now.
I started having a look at this.
@emillon On Unix we have `du`. If the cache limit is set, we could try running these commands when dune exits (not sure about watch mode). It would then do a rough comparison and tell the user that they should run `dune cache trim`.
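As a rough sketch of what such an exit-time check could look like (the traversal below is a stand-in for shelling out to `du`; the function names and the idea of hooking this into dune's exit path are assumptions, not dune's actual code):

```ocaml
(* Sketch: recursively sum file sizes under the cache root, then warn
   when over the configured limit. *)
let rec dir_size path =
  Array.fold_left
    (fun acc name ->
      let p = Filename.concat path name in
      let st = Unix.lstat p in
      match st.Unix.st_kind with
      | Unix.S_DIR -> Int64.add acc (dir_size p)
      | Unix.S_REG -> Int64.add acc (Int64.of_int st.Unix.st_size)
      | _ -> acc)
    0L
    (try Sys.readdir path with Sys_error _ -> [||])

let warn_if_over ~cache_root ~limit_bytes =
  let size = dir_size cache_root in
  if size > limit_bytes then
    Printf.eprintf
      "dune cache is %Ld bytes (limit %Ld); consider running `dune cache trim`.\n"
      size limit_bytes
```

The cost of this walk grows with the number of cache entries, which motivates the cheaper estimates discussed next.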
Sorry, for that part I meant that there might be FS-specific operations to give estimates without calling `du`.
We could cache the stat calls that `dune cache trim` makes.
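A minimal sketch of that memoization idea (the names are invented for illustration; dune's actual trimming code is organised differently):

```ocaml
(* Sketch: memoize Unix.stat results so that repeated size computations
   over an unchanged cache avoid redundant system calls. A persistent
   variant would serialize this table between runs and invalidate
   entries whose mtime has changed. *)
let stat_cache : (string, Unix.stats) Hashtbl.t = Hashtbl.create 4096

let stat_memo path =
  match Hashtbl.find_opt stat_cache path with
  | Some st -> st
  | None ->
      let st = Unix.stat path in
      Hashtbl.add stat_cache path st;
      st
```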
The right fix is probably to keep track of the size of the cache as we're writing to it. That's an invasive change that we shouldn't undertake at the moment. In the meantime, I would suggest that we implement eager cache trimming and see how far that gets us. @pmwhite do you think you could import eager cache trimming?
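For concreteness, the "track the size as we write" approach might look roughly like this (the names, the eviction callback, and the trimming policy are all assumptions for illustration):

```ocaml
(* Sketch: a running total updated on every cache write, with trimming
   triggered once the configured limit is exceeded. The invasive part is
   not this loop itself, but making the counter survive restarts and
   stay correct across concurrent dune processes sharing one cache. *)
let current_size = ref 0L
let limit_bytes = ref Int64.max_int

(* [evict] removes some least-recently-used entries and returns the
   number of bytes freed; it is a stub here. *)
let rec trim_if_needed ~evict =
  if !current_size > !limit_bytes then begin
    let freed = evict () in
    current_size := Int64.sub !current_size freed;
    if freed > 0L then trim_if_needed ~evict
  end

let store_entry ~evict ~write ~size =
  write ();
  current_size := Int64.add !current_size size;
  trim_if_needed ~evict
```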
Yeah, once it is implemented, I can try importing it.
Unless I'm misremembering, I think it's already implemented.
Yes, I was about to ask for clarification about what is meant by "eager cache trimming".
We implemented the "eager cache trimming" feature in Jenga internally. When Jenga runs an action, it deletes the previous versions of the action's targets from the cache, if they are unused in other workspaces. It works pretty well in practice, especially if you keep tweaking a test over and over (in which case you often end up with dozens of old versions of the test-runner binary in the cache). We plan to implement "eager cache trimming" in Dune too and upstream it in the next month or so.
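In outline, the behaviour described above amounts to something like the sketch below (the names are assumptions, and it presumes the hard-link cache storage mode, where an entry with a link count of 1 is held only by the cache and by no workspace build tree):

```ocaml
(* Sketch of eager trimming: before recording a fresh result for an
   action, evict the previously cached versions of its targets, but only
   if no workspace still hard-links them. *)

let unused_in_workspaces path = (Unix.stat path).Unix.st_nlink = 1

let evict_stale_versions previous_paths =
  List.iter
    (fun path ->
      if Sys.file_exists path && unused_in_workspaces path then
        Sys.remove path)
    previous_paths

(* [previous_paths]: cache paths of the old versions of the action's
   targets; [run]: the action itself. *)
let run_with_eager_trim ~previous_paths ~run =
  evict_stale_versions previous_paths;
  run ()
```

This matches the test-runner example above: each rebuild of the test evicts the previous binary instead of letting dozens of stale versions accumulate.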
Any news on this? Or on the more general question of bounding cache disk usage?
We've implemented eager cache trimming in our internal version of Dune, but have been struggling with upstreaming any of our changes (we planned to organise an upstreaming workshop with Tarides but it had to be postponed until 2025). Regarding "bounding cache disk usage": our approach is to run the cache trimming job regularly (once per hour). I'm not sure this approach is going to work externally, since there is no universal way to set up regular jobs. We might want a more portable and bespoke solution externally.
Dune could try to treat the symptoms, with all of the relevant thresholds configurable. This would avoid wasting too much time when `dune build` is called repeatedly during development, and still avoid the most common problems (running out of disk space, or running low on it). Other options are OS- or FS-specific. The most portable OS-level solution would be to have a separate userid or groupid for dune, so that FS-level quotas could be enforced for it (although that would still likely require root privileges to set up, which means it won't be possible in shared environments).
Desired Behavior
We would like dune to have a config option for a maximum cache size. This limit would be maintained as builds proceed, so that the user need not worry about how quickly the cache grows, nor maintain out-of-band processes to periodically trim the cache.
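For illustration, such an option might live next to the existing cache settings in the dune config file (~/.config/dune/config). The stanza below is purely hypothetical, not an existing dune option; today a size bound can only be applied by running `dune cache trim` manually:

```
(lang dune 3.0)
(cache enabled)
;; hypothetical option: trim automatically once the cache exceeds this size
(cache-max-size 50GB)
```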
Motivating use case
At Ahrefs, we have a pool of persistent build hosts to handle many build jobs in parallel, taking advantage of a persistent git monorepo clone and of cached build files re-used across subsequent build-job invocations. We run a Buildkite agent on every build host. As well as re-using the state of the environment after some setup steps, we benefit from the dune cache. The problem is that the cache grows without bound unless it is trimmed periodically. While we could introduce a build step before or after each build job to run the trim operation, that would eliminate much of the speed benefit of using the cache: when trimming to 50GB, the process can take a couple of minutes, which just adds to the total time to complete the build job.
To deal with the issue of an ever-expanding cache without introducing extra time-consuming steps in our pipeline, we currently schedule a dedicated trim pipeline to run on all known agents. The issue with this is that there's no way to know the appropriate schedule for these out-of-band processes, and they end up periodically blocking availability of the agents, particularly when load is high and they are most needed. Furthermore, we have found the step of querying and/or maintaining the list of agents to be brittle. Buildkite, like similar build-scheduling tools, is designed to hand out work to any one available agent, not to script something to run on all of them. Agents may be added, disconnected, disabled, or re-enabled at any time, so it doesn't make sense to run any given process across the whole pool.
This brings us to this request. The very-nice-to-have (and what is in the spirit of the original design of the dune cache) would be for trimming to happen as needed, in real time, so that the user is guaranteed an upper limit on the cache size without having to run the trim process continually and incur its cost each time.