-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mrs_realloc: avoid copies when space is available #2133
Conversation
This should improve the situation with running sort(1) with very large files (#2131) |
Unfortunately, this doesn't appear to help much for sort. I haven't yet tried using utrace() probes to see how often we're allocating a new buffer now vs. before, though. |
ee18917
to
f0ab240
Compare
After switching to an offset check this might actually help a little. |
It helps, but only a little. :) Sorting the original cscope identifier list takes about 45s with a modified sort, and OOMs when I use unmodified sort, with or without this libc patch. If I enable utrace MRS probes and sort only the first million lines of the list, I see 353695 UTRACE_MRS_REALLOC records without this patch, versus 280436 with the patch, so a ~20% drop. With async revocation disabled (to making timing less noisy), I don't see any significant difference in runtime with or without the patch. Not enough to fix the pathological behaviour, but a worthy change anyway IMO. |
I wonder if there are a class of reallocs that stock jemalloc, etc. are able to handle in-place that CHERI cannot because we are only able to realloc if the bounds were overly-broad due to precision. That is, I wonder if it might make sense to do some bounds inflation for PAGE_SIZE or greater allocations? Perhaps dtrace on stock aarch64 would give a quick way to determine how many reallocs() have to copy, and what sizes those copy vs non-copy map to? (Maybe you could figure this out with utrace() as well?) |
Some experiments and code reading suggest that jemalloc always returns tightly bounded pointers, whereas snmalloc only ensures that pointers are bounded to the true allocation size (i.e., whatever malloc_usable_size() would presumably return). In that case, I would expect Brooks' optimization to be much more effective, since for large enough sizes the allocation granules are going to be powers of 2. I repeated this test with snmalloc as the malloc implementation, and my test finishes much more quickly. However, snmalloc appears to be triggering a bug in the async revoker such that revocation never finishes. I note that snmalloc allocates large amounts of VA space, but the revoker should be able to skip over such ranges very quickly, so something else is going wrong. I will debug further this week, but this result suggests that we should try to make jemalloc behave more like snmalloc in this respect. |
f0ab240
to
b2b15d5
Compare
I think there are two things we could look at here:
|
We've talked about adding mremap() to FreeBSD before; I don't think there's any opposition to that. I don't really understand the last parenthetical - could you please elaborate? |
I'd bet that most users of mremap with |
Assuming we're talking about the Linux interface, Then, I think you're proposing that mremap() with flags == 0 should fail unless the new size is smaller than the old one. I guess that's fine, it would be straightforward to require a flag as you suggest. |
Indeed, I had misremembered that the default was not moving and should have re-read the flag descriptions. I do think if we implement it we should implement the linux interface more or less not the never-implemented version documented in the 4.3(?) system manual. |
I'll take a stab at changing jemalloc as described, let's see how that goes first. |
b2b15d5
to
f0f1bd4
Compare
This is #2147. |
When the passed pointer is valid and already has enough space, return it unless it's at least twice the requested size. Some software uses realloc to extend allocations by small, fixed amounts. In a naive implementation of realloc that blindly performs a new allocation, copies the data, and then frees the old allocation this is O(n^2). For small values of `n` this isn't too noticable, but asymptotically it's quite bad and revocation makes it much worse. With current mrs tuning, we eventually reach a point where approximately every 4th increment triggers a revocation pass. We can make this somewhat less visible (or fix it entirely for snmalloc with it's power-of-two sized buckets) by omitting unnecessary allocations and copies.
f0f1bd4
to
815b236
Compare
When the passed pointer is valid and already has enough space, return it unless it's at least twice the requested size.
Some software uses realloc to extend allocations by small, fixed amounts. In a naive implementation of realloc that blindly performes a new allocation, copies the data, and the frees the old allocation, this is O(n^2). For small values of
n
this isn't too noticable, but for asymptotically it's quite bad and revocation makes it much worse. With current tuning, we eventually reach a point where approximately every 4th allocating triggers a revocation pass.We can make this somewhat less visible (or fix it entierly for snmalloc with it's power-of-two sized buckets) by omitting unnecessicary allocations and copies.
Bias towards improveing linear increase performance over space saving and only reallocate to shrink the allocation if we'd save at least 50% space. If we chose a smaller value then we end up doing full reallocs constantly with an allocator like snmalloc.