
Tiny Vecs are dumb. #72227

Merged
merged 1 commit into from
May 19, 2020

Conversation

nnethercote
Contributor

Currently, if you repeatedly push to an empty vector, the capacity
growth sequence is 0, 1, 2, 4, 8, 16, etc. This commit changes the
relevant code (the "amortized" growth strategy) to skip 1 and 2, instead
using 0, 4, 8, 16, etc. (You can still get a capacity of 1 or 2 using
the "exact" growth strategy, e.g. via reserve_exact().)
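
The two strategies described above can be observed directly on a toolchain containing this change; this is a sketch (the helper name `capacity_sequence` is invented here, and the exact capacities depend on the Rust version and element size):

```rust
// Record each distinct capacity a Vec passes through while pushing.
fn capacity_sequence(pushes: u32) -> Vec<usize> {
    let mut v: Vec<u32> = Vec::new();
    let mut caps = vec![v.capacity()];
    for i in 0..pushes {
        v.push(i);
        if v.capacity() != *caps.last().unwrap() {
            caps.push(v.capacity());
        }
    }
    caps
}

fn main() {
    // Amortized growth skips 1 and 2: the sequence is 0, 4, 8, 16, ...
    let caps = capacity_sequence(9);
    assert_eq!(caps.first(), Some(&0));
    assert!(!caps.contains(&1) && !caps.contains(&2));
    println!("{:?}", caps);

    // The "exact" strategy still allows tiny capacities.
    let mut w: Vec<u32> = Vec::new();
    w.reserve_exact(1);
    assert!(w.capacity() >= 1);
}
```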

This idea (along with the phrase "tiny Vecs are dumb") comes from the
"doubling" growth strategy that was removed from RawVec in #72013.
That strategy was barely ever used -- only when a VecDeque was grown,
oddly enough -- which is why it was removed in #72013.

(Fun fact: until just a few days ago, I thought the "doubling" strategy
was used for the repeated-push case. In other words, this commit makes
Vecs behave the way I always thought they behaved.)

This change reduces the number of allocations done by rustc itself by
10% or more. It speeds up rustc, and will also speed up any other Rust
program that uses Vecs a lot.

In theory, the change could increase memory usage, but in practice it
doesn't. It would be an unusual program where very small Vecs having a
capacity of 4 rather than 1 or 2 would make a difference. You'd need a
lot of very small Vecs, and/or some very small Vecs with very
large elements.

r? @Amanieu

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 15, 2020
@nnethercote
Contributor Author

Some local instruction count results, for check-full and debug-full runs:

many-assoc-items-check
        avg: -3.8%      min: -3.8%      max: -3.8%
many-assoc-items-debug
        avg: -3.7%      min: -3.7%      max: -3.7%
issue-46449-debug
        avg: -3.4%      min: -3.4%      max: -3.4%
unused-warnings-check
        avg: -3.1%      min: -3.1%      max: -3.1%
deeply-nested-debug
        avg: -2.8%      min: -2.8%      max: -2.8%
unused-warnings-debug
        avg: -2.8%      min: -2.8%      max: -2.8%
regression-31157-debug
        avg: -2.7%      min: -2.7%      max: -2.7%
cargo-debug
        avg: -2.7%      min: -2.7%      max: -2.7%
webrender-debug
        avg: -2.6%      min: -2.6%      max: -2.6%
hyper-2-debug
        avg: -2.6%      min: -2.6%      max: -2.6%
syn-debug
        avg: -2.5%      min: -2.5%      max: -2.5%
encoding-check
        avg: -2.5%      min: -2.5%      max: -2.5%
regex-check
        avg: -2.4%      min: -2.4%      max: -2.4%
piston-image-debug
        avg: -2.4%      min: -2.4%      max: -2.4%
clap-rs-debug
        avg: -2.3%      min: -2.3%      max: -2.3%
syn-check
        avg: -2.2%      min: -2.2%      max: -2.2%
webrender-check
        avg: -2.1%      min: -2.1%      max: -2.1%
cranelift-codegen-debug
        avg: -2.1%      min: -2.1%      max: -2.1%
ripgrep-check
        avg: -2.1%      min: -2.1%      max: -2.1%
futures-debug
        avg: -2.1%      min: -2.1%      max: -2.1%
encoding-debug
        avg: -2.1%      min: -2.1%      max: -2.1%
packed-simd-debug
        avg: -2.0%      min: -2.0%      max: -2.0%
hyper-2-check
        avg: -2.0%      min: -2.0%      max: -2.0%
packed-simd-check
        avg: -2.0%      min: -2.0%      max: -2.0%
piston-image-check
        avg: -1.9%      min: -1.9%      max: -1.9%
regex-debug
        avg: -1.9%      min: -1.9%      max: -1.9%
cranelift-codegen-check
        avg: -1.8%      min: -1.8%      max: -1.8%
cargo-check
        avg: -1.8%      min: -1.8%      max: -1.8%
futures-check
        avg: -1.8%      min: -1.8%      max: -1.8%
tokio-webpush-simple-debug
        avg: -1.8%      min: -1.8%      max: -1.8%
unicode_normalization-debug
        avg: -1.7%      min: -1.7%      max: -1.7%
clap-rs-check
        avg: -1.7%      min: -1.7%      max: -1.7%
serde-debug
        avg: -1.6%      min: -1.6%      max: -1.6%
webrender-wrench-check
        avg: -1.6%      min: -1.6%      max: -1.6%
ripgrep-debug
        avg: -1.5%      min: -1.5%      max: -1.5%
serde-check
        avg: -1.5%      min: -1.5%      max: -1.5%
webrender-wrench-debug
        avg: -1.5%      min: -1.5%      max: -1.5%
regression-31157-check
        avg: -1.5%      min: -1.5%      max: -1.5%
tokio-webpush-simple-check
        avg: -1.4%      min: -1.4%      max: -1.4%
html5ever-debug
        avg: -1.4%      min: -1.4%      max: -1.4%
unicode_normalization-check
        avg: -1.3%      min: -1.3%      max: -1.3%
await-call-tree-debug
        avg: -1.3%      min: -1.3%      max: -1.3%
html5ever-check
        avg: -1.1%      min: -1.1%      max: -1.1%
ucd-debug
        avg: -1.1%      min: -1.1%      max: -1.1%
await-call-tree-check
        avg: -1.0%      min: -1.0%      max: -1.0%

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion

@bors
Contributor

bors commented May 15, 2020

⌛ Trying commit dac8c474a4a0cc81ba4936bf0d55841c79e1fb0d with merge 291f8c65f12a6ed4401e0c6cb477f3429c32b9ac...

@Amanieu
Member

Amanieu commented May 15, 2020

I wonder if we can do the same thing for HashMap: rust-lang/hashbrown#47. I left that issue open because I wasn't sure how often small (1-3 elements) hashmaps are used.

@leonardo-m

I left that issue open because I wasn't sure how often small (1-3 elements) hashmaps are used.

In Python even small associative arrays are used often, in Rust I think they are quite less common (for various reasons, one reason is the lack of handy hashmap literal syntax).

@bors
Contributor

bors commented May 15, 2020

☀️ Try build successful - checks-azure
Build commit: 291f8c65f12a6ed4401e0c6cb477f3429c32b9ac (291f8c65f12a6ed4401e0c6cb477f3429c32b9ac)

@rust-timer
Collaborator

Queued 291f8c65f12a6ed4401e0c6cb477f3429c32b9ac with parent 0271499, future comparison URL.

@ollie27
Member

ollie27 commented May 15, 2020

Fun fact: until just a few days ago, I thought the "doubling" strategy
was used for the repeated-push case.

I thought the same. It turns out that used to be the case until #50739. We will definitely need a regression test for this.

@rust-timer
Collaborator

Finished benchmarking try commit 291f8c65f12a6ed4401e0c6cb477f3429c32b9ac, comparison URL.

@Amanieu
Member

Amanieu commented May 15, 2020

max-rss shows a regression in the results, but I'm not sure how reliable that indicator is or how much we care about it.

@Amanieu
Member

Amanieu commented May 15, 2020

I wonder how much more speedup we can get by skipping straight to 8 elements, but I am hesitant since we don't seem to have a good way of tracking memory usage.

@hanna-kruppe
Contributor

I'm not sure how representative the rustc results really are, since SmallVec is used heavily wherever profiling revealed that many vectors with few elements were allocated. In other words, rustc went through a lot of effort to not have tons of small (say, 1-3 elements) vectors in the first place, while many other programs with similar workloads probably haven't.

@nnethercote
Contributor Author

@Amanieu: The bad news is that max-rss is highly unreliable, alas. E.g. look at this noise run, which compares two revisions with no significant differences.

The good news is that I can get accurate peak heap memory measurements with DHAT. I can do some measurements with skip-to-4 and skip-to-8 on Monday.

@nnethercote
Contributor Author

rustc went through a lot of effort to not have tons of small (say, 1-3 elements) vectors in the first place, while many other programs with similar workloads probably haven't.

It's true that rustc uses SmallVec a lot. But this means that the rustc measurements might underestimate the speed improvements that other programs might get. Think about vecs with two elements:

  • If you use SmallVec<[T; 2]>, you've already eliminated the allocations and this PR has no effect.
  • If you use Vec<T>, this PR changes the number of allocations from 2 to 1, saving time.

For vecs with four or more elements, the number of allocations in the Vec<T> case is reduced by 2, even better.
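
The allocation counts above can be made concrete with a small sketch (assuming, as holds for plain pushes, that a capacity change is a reliable proxy for a call into the allocator):

```rust
// Count (re)allocations while pushing n elements, using capacity
// changes as a proxy for calls into the allocator.
fn alloc_count(n: u64) -> usize {
    let mut v: Vec<u64> = Vec::new();
    let mut cap = v.capacity();
    let mut count = 0;
    for i in 0..n {
        v.push(i);
        if v.capacity() != cap {
            count += 1;
            cap = v.capacity();
        }
    }
    count
}

fn main() {
    // With this change, a two-element Vec does one allocation instead
    // of two, and a four-element Vec does one instead of three.
    assert_eq!(alloc_count(2), 1);
    assert_eq!(alloc_count(4), 1);
}
```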

@nnethercote
Contributor Author

Having said that, I can see that rustc's heavy use of SmallVec might underplay the memory usage impact as well.

But in general I'm not worried about memory usage. Vec makes no promises about capacities. If you have a program where the capacity of 1 and 2 length Vec's has a critical impact on memory usage (e.g. due to having many short vectors and/or short vectors with very large elements) then you should use a collection type that provides clear guarantees about capacity, such as SmallVec. And I think such cases will be rare, anyway.

@hanna-kruppe
Contributor

hanna-kruppe commented May 16, 2020

rustc went through a lot of effort to not have tons of small (say, 1-3 elements) vectors in the first place, while many other programs with similar workloads probably haven't.

It's true that rustc uses SmallVec a lot. But this means that the rustc measurements might underestimate the speed improvements that other programs might get. Think about vecs with two elements:

* If you use `SmallVec<[T; 2]>`, you've already eliminated the allocations and this PR has no effect.

* If you use `Vec<T>`, this PR changes the number of allocations from 2 to 1, saving time.

For vecs with four or more elements, the number of allocations in the Vec<T> case is reduced by 2, even better.

Fair point, but there are also factors pulling in the opposite direction. For Vecs with 1 element (also a common case in some parts of rustc), you don't save any reallocations. Even with 2 elements, you "only" save one reallocation instead of two, so depending on the distribution of Vec sizes in a program you'll see a proportionally smaller speed-up (imagine an application spending about as much time on reallocating Vecs as rustc does today, but more of it on tiny Vecs). And of course, the impact on memory usage is not entirely separate from performance (e.g. in a typical size-class-based allocator, Vecs with 1-2 small elements may waste half of each cache line that could hold other useful data).

None of this is to say I think this is a bad change overall, it's just not quite so obvious to me how the effects on rustc will translate to other code bases and I really wish we had a good benchmark suite for runtime (not just compile time) of Rust applications.
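
The trade-off discussed in the last few comments can be captured with a toy model (a sketch: exact doubling from a minimum non-zero capacity, allocator rounding ignored):

```rust
// Allocations needed to push `n` elements when capacity doubles from a
// minimum non-zero capacity `min_cap` (cap 0 means nothing allocated yet).
fn allocs(n: usize, min_cap: usize) -> usize {
    let (mut cap, mut count) = (0, 0);
    while cap < n {
        cap = if cap == 0 { min_cap } else { cap * 2 };
        count += 1;
    }
    count
}

fn main() {
    assert_eq!(allocs(1, 1), 1); // old strategy: a 1-element Vec...
    assert_eq!(allocs(1, 4), 1); // ...saves nothing under the new one
    assert_eq!(allocs(2, 1), 2); // old: caps 1, 2
    assert_eq!(allocs(2, 4), 1); // new: one allocation saved
    assert_eq!(allocs(4, 1), 3); // old: caps 1, 2, 4
    assert_eq!(allocs(4, 4), 1); // new: two allocations saved
}
```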

@ChrisJefferson
Contributor

Would it be worth making the size increase dependent on the size of the underlying object? In particular, if it is very small (1 or 2 bytes), you probably lose nothing at all by starting with an 8- or 16-byte allocation (depending on the memory manager).

@nnethercote
Contributor Author

@ChrisJefferson: that's a good idea. And if the elements are really big (256B? 1024B?) we could ratchet down the minimum capacity. I will do some measurements.

@nnethercote nnethercote force-pushed the tiny-vecs-are-dumb branch from dac8c47 to 37dd3b6 Compare May 18, 2020 04:34
@nnethercote
Contributor Author

Here are the results for four different minimum non-zero sizes: 1 (original), 2, 4, and 8.

CACHEGRIND: INSTRUCTIONS EXECUTED (M)
            Tiny1     Tiny2          Tiny4          Tiny8
cargo       27,675M   27,420(-255)   27,179(-241)   27,131(-48)
cranelift   21,654M   21,441(-213)   21,264(-177)   21,246(-18)
futures      2,799M    2,774(-25)     2,748(-26)     2,737(-11)
many-assoc   6,433M    6,392(-41)     6,189(-203)    6,275(+86)
regex        3,005M    2,969(-36)     2,933(-36)     2,922(-11)
webrender   11,130M   11,007(-123)   10,895(-112)   10,871(-24)

Tiny2 is a clear improvement. Tiny4 is roughly 2x better than Tiny2 on all benchmarks except many-assoc-items, where it is almost 6x better. Tiny8 is a little better than Tiny4 on all except many-assoc-items, where it is notably worse.

DHAT: PEAK HEAP SIZE (MB)
            Tiny1  Tiny2     Tiny4     Tiny8     
cargo       248    250(+2)   258(+8)   276(+18)  
cranelift   314    315(+1)   321(+6)   336(+15)  
futures      30     30(+0)    30(+0)    32(+ 2)  
many-assoc  101    102(+1)   104(+2)   109(+ 5)  
regex        53     53(+0)    54(+1)    57(+ 3)  
webrender   151    152(+1)   155(+3)   164(+ 9)

Tiny2 gives a tiny increase, Tiny4 a little bigger, and Tiny8 much more.

Things to note:

  • DHAT measures the requested size, not the actual size, which is often larger than the requested size due to rounding up. This means that the numbers above represent an upper limit on the peak increase.
  • The heap is only part of total memory usage; there are also stacks, static data, and code. max-rss figures for the above benchmarks are roughly double the peak heap, which means the max-rss increases will be roughly half the proportions in this table.

Tiny4 looks like the sweet spot, giving near-maximal speed improvements for a modest peak memory cost. I have updated the code to use Tiny4, with slight modifications:

  • It skips to 8 if the element size is 1.
  • It skips to 1 if the element size is > 1024.

I measured that variant too and it gave results almost identical to Tiny4's, but it could help in some cases that might show up occasionally.
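
A sketch of the minimum-capacity rule just described (the function name and structure are illustrative, not the merged `RawVec` code):

```rust
use std::mem;

// Minimum non-zero capacity for the amortized growth path, as described
// above: 8 for 1-byte elements, 4 for elements up to 1024 bytes, 1 for
// anything larger.
fn min_non_zero_cap(elem_size: usize) -> usize {
    if elem_size == 1 {
        8 // allocators round tiny requests up to ~8 bytes anyway
    } else if elem_size <= 1024 {
        4 // skip the dumb capacities 1 and 2
    } else {
        1 // huge elements: don't pay for unused slots
    }
}

fn main() {
    assert_eq!(min_non_zero_cap(mem::size_of::<u8>()), 8);
    assert_eq!(min_non_zero_cap(mem::size_of::<u32>()), 4);
    assert_eq!(min_non_zero_cap(2048), 1);
}
```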

@nnethercote
Contributor Author

r? @Amanieu

@nnethercote nnethercote force-pushed the tiny-vecs-are-dumb branch from 37dd3b6 to 3cbc23e Compare May 18, 2020 04:42
Currently, if you repeatedly push to an empty vector, the capacity
growth sequence is 0, 1, 2, 4, 8, 16, etc. This commit changes the
relevant code (the "amortized" growth strategy) to skip 1 and 2 in most
cases, instead using 0, 4, 8, 16, etc. (You can still get a capacity of
1 or 2 using the "exact" growth strategy, e.g. via `reserve_exact()`.)

This idea (along with the phrase "tiny Vecs are dumb") comes from the
"doubling" growth strategy that was removed from `RawVec` in rust-lang#72013.
That strategy was barely ever used -- only when a `VecDeque` was grown,
oddly enough -- which is why it was removed in rust-lang#72013.

(Fun fact: until just a few days ago, I thought the "doubling" strategy
was used for the repeated-push case. In other words, this commit makes
`Vec`s behave the way I always thought they behaved.)

This change reduces the number of allocations done by rustc itself by
10% or more. It speeds up rustc, and will also speed up any other Rust
program that uses `Vec`s a lot.
@nnethercote nnethercote force-pushed the tiny-vecs-are-dumb branch from 3cbc23e to f4b9dc3 Compare May 18, 2020 05:27
@ChrisJefferson
Contributor

With regard to the explicit check for size 1: it looks to me like when you allocate memory in Rust you get back the "actually allocated" amount of space -- is that right? In that case, assuming that the allocator never returns less than 8 bytes of space, there is no need for an explicit check. However, there seem to be various different memory interfaces with different properties in this area.

(I was trying to do this myself, but got stuck trying to add debugging output inside liballoc, sorry)

@nnethercote
Contributor Author

With regard to the explicit check for size 1: it looks to me like when you allocate memory in Rust you get back the "actually allocated" amount of space -- is that right?

That's a good question!

The short answer is "no", and for the purposes of this PR, it means that choosing a capacity of 8 for single-byte elements is reasonable, and so I don't need to make any changes to this PR's code.

The long answer is murkier.

RawVec uses AllocRef::alloc to allocate new storage, and AllocRef::grow to reallocate existing storage. In the normal case, the AllocRef trait is implemented by Global. First, the docs.

  • The current docs for AllocRef::alloc are here. They say "On success, returns a MemoryBlock meeting the size and alignment guarantees of layout. The returned block may have a larger size than specified by layout.size()". Which suggests that the returned size need not be the actual size.
  • The current docs for AllocRef::grow are here. They say something stronger: "Returns a new MemoryBlock containing a pointer and the actual size of the allocated memory." The "actual" suggests it should return the actual size, not the requested size.

Next, the implementation.

  • The current implementation of Global::alloc is here. It returns the requested size.
  • The current implementation of Global::grow is here. It returns the requested size.

In conclusion: Global currently always returns the requested size, and the docs are a bit contradictory and unclear about whether that's the right thing to do. And this example shows that if you push two single-byte elements onto an empty Vec you get a capacity of 1 and then 2, which fits with that.

There is also GlobalAlloc, which is separate and doesn't implement the AllocRef trait. I'm not sure of the relationship between Global and GlobalAlloc.

It's a shame that there doesn't seem to be a way to accurately get the actual size of an allocation with Global. (One could write an implementation of AllocRef that always accurately returns the actual size, but that wouldn't help in the general case.) As you suggest, we could do better with the capacity sizing that way.

@Amanieu: have I got all that right? Any additional thoughts?

@RalfJung
Member

RalfJung commented May 20, 2020

AFAIK this is mostly historic -- __rust_{alloc,realloc} is entirely internal. But when the GlobalAlloc trait was devised, it was more important to enable switching the global allocator on stable than it was to be able to expose all features of all allocators.

After all, it took a while to devise AllocRef.

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request May 20, 2020
…owup, r=Amanieu

Adjust the zero check in `RawVec::grow`.

This was supposed to land as part of rust-lang#72227. (I wish `git push` would
abort when you have uncommitted changes.)

r? @Amanieu
@upsuper
Contributor

upsuper commented May 28, 2020

In theory, the change could increase memory usage, but in practice it
doesn't. It would be an unusual program where very small Vecs having a
capacity of 4 rather than 1 or 2 would make a difference. You'd need a
lot of very small Vecs, and/or some very small Vecs with very
large elements.

I'd like to chime in and mention that this does indeed increase memory usage in practice, not just in theory.

In the past, Rust had exactly this behavior, and when we worked on integrating Servo's CSS engine into Gecko (the Stylo project), we found it to be one important reason (among several others) why Stylo's memory consumption on Gmail was much larger -- actually double -- that of Gecko. See Gecko bug 1392314.

The reason is that Gmail used lots of background-image declarations for emojis; background-image supports multiple values, and an image item is a relatively large type in the style system, so the overallocation of the Vec hurt quite a bit. We ended up explicitly disabling overallocation in the code.

In my experience, at least in browser development, it is very common to have a Vec that holds zero or one item the vast majority of the time, goes to two only rarely, and goes further only in some extreme cases. In that kind of scenario, a large item size is probably tolerable, since you are unlikely to reallocate a lot.

If Vec is going to overallocate from the very start, I think the documentation should at least mention that, if a Vec is unlikely to go beyond one or two items, it is advisable to use with_capacity to avoid possible overallocation.

WDYT?
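
That advice can be sketched as follows (the `Big` type is hypothetical, standing in for a large style-system item; exact capacities depend on the std implementation):

```rust
// Hypothetical large element type (not from any real codebase).
struct Big([u8; 192]);

fn main() {
    // with_capacity (like reserve_exact) requests an exact capacity,
    // so the amortized minimum of 4 never applies; in practice the
    // capacity stays at exactly 1.
    let mut v: Vec<Big> = Vec::with_capacity(1);
    v.push(Big([0; 192]));
    assert_eq!(v.len(), 1);
    assert!(v.capacity() >= 1);
}
```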

@ChrisJefferson
Contributor

This patch should help with that problem, as it only over-allocates if the objects allocated are not too big.

I do wonder if the point where objects are considered "large" (1024 bytes) could be too large? Something much smaller (32 or even 16 bytes) might reduce memory wastage?

@mati865
Contributor

mati865 commented May 28, 2020

That sounds like the perfect scenario for SmallVec/TinyVec TBH. Although https://doc.rust-lang.org/std/vec/struct.Vec.html#capacity-and-reallocation could be slightly expanded for really small vectors.

@upsuper
Contributor

upsuper commented May 28, 2020

This patch should help with that problem, as it only over-allocates if the objects allocated are not too big.

Good point. I didn't notice that.

I do wonder if the point where objects are considered "large" (1024 bytes) could be too large? Something much smaller (32 or even 16 bytes) might reduce memory wastage?

Based on the analysis in the related bug, each item was about 192 bytes, so having to allocate 4 makes it 768 bytes, which the allocator rounds up to 1 KiB; with ~12k such Vecs, that costs ~12 MB of memory.

With no overallocation, 192 bytes would take exactly 192 bytes (because of the allocator's strategy), costing only ~2 MB.

That sounds like the perfect scenario for SmallVec/TinyVec TBH.

Not necessarily. SmallVec/TinyVec in that case would need extra space on the stack or in another struct, which can lead to extra memory copies and further bloat the structs holding it. This is especially a problem when the most common case is zero items.

@nnethercote
Contributor Author

I will repeat my comment from above: "Vec makes no promises about capacities. If you have a program where the capacity of 1 and 2 length Vec's has a critical impact on memory usage (e.g. due to having many short vectors and/or short vectors with very large elements) then you should use a collection type that provides clear guarantees about capacity."

@the8472
Member

the8472 commented May 29, 2020

Note that this change doesn't just affect naive collect() implementations that push repeatedly without a good capacity estimate. It also changes the behavior of small extend() calls with an ExactSizeIterator or TrustedLen iterator. That's probably not a significant edge case, but it is a situation where we have better information that could be acted on.

E.g. this new behavior could be limited to cases where needed_extra_capacity == 1.
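
A quick check of the behavior described (assuming, as on current std, that extend with an exact size hint still goes through the amortized reserve path):

```rust
fn main() {
    let mut v: Vec<u32> = Vec::new();
    // The slice iterator reports an exact length of 2, but extend still
    // reserves via the amortized path, so the capacity can jump past 2.
    v.extend([1u32, 2].iter().copied());
    assert_eq!(v.len(), 2);
    assert!(v.capacity() >= 2); // observed: 4 under the new minimum
}
```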

@redbaron

Facebook did some research for their FBVector and picked a 1.5x growth factor. They explicitly moved away from 2x growth because of cache unfriendliness.

I know next to nothing about memory allocation in rust, but thought it worth mentioning in case it is applicable here.

@nnethercote
Contributor Author

Facebook did some research for their FBVector and picked a 1.5x growth factor. They explicitly moved away from 2x growth because of cache unfriendliness.

The argument for 1.5x is about memory reuse, rather than cache friendliness. It's a silly argument, IMO. That document says this:

When there's a request to grow the vector, the vector (assuming no in-place resizing, see the appropriate section in this document) will allocate a chunk of memory next to its current chunk, copy its existing data to the new chunk, and then deallocate the old chunk.

The text I have emphasised -- "next to its current chunk" -- is false. Modern allocators typically have size classes, which means that allocations of different sizes (e.g. 128 bytes vs. 192 bytes) have no chance of being put next to each other. jemalloc, which Facebook uses, is such an allocator.

Indeed, the next section of that document then goes on to talk about jemalloc's size classes, without realizing that they invalidate the reasoning in the previous section.

@RalfJung
Member

RalfJung commented May 31, 2020

But it seems like with a growth factor of 1.5, we get a guarantee that at least 66% of the allocated memory is actually used (assuming only pushes), whereas a growth factor of 2 makes that 50%? That could waste a lot of memory.
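
The 66% vs. 50% figures can be checked with a toy model (a sketch: exact growth factors, allocator size classes ignored):

```rust
// Worst-case len/capacity ratio while pushing `pushes` elements under a
// given growth factor, starting from capacity 1.
fn min_utilization(factor: f64, pushes: usize) -> f64 {
    let (mut len, mut cap) = (0usize, 1usize);
    let mut worst = 1.0_f64;
    for _ in 0..pushes {
        if len == cap {
            cap = ((cap as f64) * factor).ceil() as usize;
        }
        len += 1;
        worst = worst.min(len as f64 / cap as f64);
    }
    worst
}

fn main() {
    // Factor 1.5 keeps utilization near 2/3; factor 2 dips toward 1/2.
    assert!(min_utilization(1.5, 100_000) > 0.6);
    assert!(min_utilization(2.0, 100_000) >= 0.5);
    assert!(min_utilization(2.0, 100_000) < 0.52);
}
```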

@nnethercote
Contributor Author

Sure, there's a trade-off between frequency of reallocation and potential for unused memory no matter what growth factor you use.

It's the "2x is the theoretical worst possible number" argument that I object to. It's false, and it also overlooks the benefits of 2x: 2x is simple, it is the cheapest possible multiplication, and powers of two are most likely to result in allocation sizes that match allocator size classes.

@redbaron

redbaron commented Jun 1, 2020

The argument for 1.5x is about memory reuse, rather than cache friendliness

Isn't it cache friendly to reuse memory?

It's the "2x is the theoretical worst possible number" argument that I object to.

I guess it is easy to measure if it has any visible effect on performance and/or memory usage.

@ChrisJefferson
Contributor

ChrisJefferson commented Jun 1, 2020

While we can tune this, until the memory allocator can accurately tell us the "true" amount of allocated memory, factors like 1.5x are also much more likely to lead to entirely wasted memory. On most systems, if you allocate (say) 600 bytes you are probably getting 1024 anyway, and once you get up to page size you get whole pages. (Note: that is my belief from reading one memory manager long ago. I haven't carefully read the different mallocs Rust uses -- which I suppose proves my point: we don't know what we "actually" get.)

@nnethercote
Contributor Author

Yes. Here are the size classes from an older version of jemalloc.

  • Small: [8], [16, 32, 48, ..., 128], [192, 256, 320, ..., 512], [768, 1024, 1280, ..., 3840]
  • Large: [4 KiB, 8 KiB, 12 KiB, ..., 4072 KiB]
  • Huge: [4 MiB, 8 MiB, 12 MiB, ...]

The spacings increase more smoothly in recent versions (I can't find an exact list right now) but the general idea still holds.

@upsuper
Contributor

upsuper commented Jun 1, 2020

You can find the latest size classes from jemalloc's man page.

@RalfJung
Member

RalfJung commented Jun 1, 2020

Jemalloc is not the default allocator for Rust though. Do you know how the platform allocators that we use by default behave? Seems to make most sense to measure with the default setup.

@mati865
Contributor

mati865 commented Jun 1, 2020

@RalfJung it is. At least when building Rust for Linux on its CI:


Manual builds use the system allocator.

@RalfJung
Member

RalfJung commented Jun 1, 2020

@mati865 rustc uses jemalloc. But the default allocator for Rust projects in general is the system allocator, not jemalloc -- rustc (a specific program written in Rust) just happens to override that default.

Most users of Vec are going to be outside rustc, so rustc's choice of allocator is insignificant here IMO.

@mati865
Contributor

mati865 commented Jun 1, 2020

But perf.rlo can only measure rustc's performance. So all "official" measurements are done with jemalloc.

@RalfJung
Member

RalfJung commented Jun 1, 2020

That is correct. So what?

The question was if 2x or 1.5x (or something else) is the better growth factor for Vec in general Rust programs (not for rustc specifically). The fact you stated does nothing to help resolve that question, as far as I can see.

@ollie27
Member

ollie27 commented Jun 1, 2020

There is a dedicated issue for the growth strategy of Vec: #29931. This discussion should probably be happening there.

udoprog added a commit to udoprog/uniset that referenced this pull request Jun 4, 2020
Layer::grow relies on reserving exactly as many bytes as specified in
the argument. And it apparently has worked as long as the argument was a
power of 2, which it was.

This has changed for small vectors since:
rust-lang/rust#72227

The fix is to use `reserve_exact` instead.
@rw

rw commented Aug 5, 2020

This micro-optimization makes Vecs harder to reason about, in my opinion.

IIUC, given elements of size 1024 bytes, these Vecs will start off life in the memory allocator pools meant for 4*1024 = 4096 bytes. That is a big change from starting off in the pools for 1024-byte objects.

Instead, what about calculating the cutoff such that the preallocation fits into a cache line? Or just recommending that users call Vec::with_capacity?

cc @nnethercote
edit: cc #29931

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 15, 2020
Do not inline finish_grow

Fixes rust-lang#78471.

Looking at libgkrust.a in Firefox, the sizes for the `gkrust.*.o` file are:
- 18584816 (text) 582418 (data) with unmodified master
- 17937659 (text) 582554 (data) with rust-lang#72227 reverted
- 17968228 (text) 582858 (data) with `#[inline(never)]` on `grow_amortized` and `grow_exact`, but that has some performance consequences
- 17927760 (text) 582322 (data) with this change

So in terms of size, at least in the case of Firefox, this patch more than undoes the regression. I don't think it should affect performance, but we'll see.
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.