
Make Rc and Arc mem::forget safe #27174

Merged
merged 2 commits into from
Jul 30, 2015
Conversation

Gankra
Contributor

@Gankra Gankra commented Jul 21, 2015

@rust-highfive
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Member

Is it worth evaluating the performance impact and only doing these checks on 32-bit?

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jul 21, 2015
@brson
Contributor

brson commented Jul 21, 2015

I'm also not clear on the impact on 64-bit platforms; the original report says it's a problem on both.

@brson
Contributor

brson commented Jul 21, 2015

Can you add test cases to prove that these do what they say?

@brson brson added the relnotes Marks issues that should be documented in the release notes of the next release. label Jul 21, 2015
@Gankra
Contributor Author

Gankra commented Jul 21, 2015

You can't do this only on 32-bit because LLVM will optimize mem::forget loops into a single add.
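The attack under discussion is just `mem::forget` on clones in a loop; a minimal sketch (using the stable `Rc::strong_count` accessor) of how each forgotten clone permanently leaks a strong reference:

```rust
use std::mem;
use std::rc::Rc;

fn main() {
    let rc = Rc::new(0u8);
    // Each forgotten clone bumps the strong count and never decrements it.
    // With an unchecked `+= 1`, LLVM can collapse this whole loop into a
    // single `count += 1000`, so the same pattern can reach usize::MAX
    // almost instantly at any iteration count.
    for _ in 0..1000 {
        mem::forget(rc.clone());
    }
    assert_eq!(Rc::strong_count(&rc), 1001); // 1 original + 1000 leaked
}
```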

@Gankra
Contributor Author

Gankra commented Jul 21, 2015

I was going to say that it seems weird to have a test that counts to usize::MAX (so long), but maybe LLVM can take us straight to the hoop.

@alexcrichton
Member

You can't do this only on 32-bit because LLVM will optimize mem::forget loops into a single add.

I can see how the previous "bump refcount" code might optimize to this, but with the saturating_add instead of just += 1 this may be enough to throw off LLVM's optimizations (especially in the Arc case).
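The behavioral difference is easy to see in isolation; a small sketch of why `saturating_add` is harder to collapse than `+= 1`:

```rust
fn main() {
    // `+= 1` repeated n times is just `+ n`, which LLVM happily
    // strength-reduces. `saturating_add` pins at the maximum, so the
    // result of n increments is no longer a linear function of the
    // starting count.
    assert_eq!((usize::MAX - 2).saturating_add(1), usize::MAX - 1);
    assert_eq!(usize::MAX.saturating_add(1), usize::MAX);
    assert_eq!(usize::MAX.saturating_add(1).saturating_add(1), usize::MAX);
}
```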

@Gankra
Contributor Author

Gankra commented Jul 21, 2015

I don't understand what you're proposing. That this change prevents llvm
from fast-forwarding the exploit to a practical level, and therefore it's
fine to not make this change on some platforms...?

@alexcrichton
Member

Oh, right! Duh.

It may not optimize to one addition in the Arc case, though?

@Gankra
Contributor Author

Gankra commented Jul 21, 2015

Yes I'll do some tests.

That said, I'm not comfortable with leaving memory safety holes in Rust on the premise that they're impractical (edit: particularly if they rely on a "sufficiently stupid compiler" argument) and that preventing them has overhead.

I would be interested in including a cfg(allow_preposterous_unsafety) for those who Just Don't Care.

@Gankra
Contributor Author

Gankra commented Jul 21, 2015

Arc does indeed fail to be optimized to a single add today. With some back-of-the-envelope math, that means that on 64-bit, with a single thread just incrementing (honestly probably the best case -- the contention from multiple threads incrementing is probably worse) on a 4 GHz processor, it would take 150 years to overflow.

Assuming a sufficiently stupid compiler.
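For reference, the 150-year figure roughly checks out under the stated assumptions (a 64-bit count, one increment per cycle at 4 GHz):

```rust
fn main() {
    let increments = u64::MAX as f64;        // ~1.8e19 increments to overflow
    let per_second = 4.0e9;                  // 4 GHz, one increment per cycle
    let seconds_per_year = 60.0 * 60.0 * 24.0 * 365.0;
    let years = increments / per_second / seconds_per_year;
    assert!(years > 140.0 && years < 150.0); // ~146 years
}
```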

@Gankra
Contributor Author

Gankra commented Jul 21, 2015

That said, do we want to include a test for this if shifting compiler optimizations and platform-specific details mean it could become a 150-year test?

@nikomatsakis
Contributor

I'd like to see the performance measured, but I'm not sure what's a reasonable benchmark. Perhaps something in servo uses Arc? Microbenchmarks certainly showed small (but not negligible) impact. I think all things considered I'm in favor of landing this, I feel like I can imagine this being a problem in some code somewhere, particularly on 32 bit. But it always seems better to go in with data.

(That said, I'm not sure I find the arguments about "sufficiently smart compilers" all that persuasive: it assumes a mem-forget in a loop, right? That doesn't seem like a very realistic scenario, as compared to say some kind of bug that causes a leak for every web connection. But I don't know, it could happen.)

@alexcrichton alexcrichton added the I-needs-decision Issue: In need of a decision. label Jul 21, 2015
@Gankra
Contributor Author

Gankra commented Jul 21, 2015

@pcwalton has suggested we test the perf impact by pointing Servo at Wikipedia and profiling with -p 1.

@Gankra
Contributor Author

Gankra commented Jul 23, 2015

The results on Servo:

Wikipedia home: https://gist.github.com/Gankro/bed5fee96667c25a4f1c
Wikipedia Guardians of the Galaxy: https://gist.github.com/Gankro/61b2141c205b3ceb4d9d

Talking on #servo this seemed to be "in the noise".

Micro-benchmarks from the original thread by niko (checking if x < 0 instead, which presumably shouldn't make a difference?) showed noticeable but fairly low impact (10-20% in the Arc case):

https://gist.github.com/nikomatsakis/8ad11f16fd2d24b51b25

I can run any benchmarks people want to see but I must admit I am not the best at creating "good" performance evaluations.

@Gankra
Contributor Author

Gankra commented Jul 23, 2015

CC @larsberg

@alexcrichton
Member

Thanks for doing the analysis @gankro! I think this is fine to merge, but I'd like to briefly discuss it in triage next week.

@larsberg

I think you've got the wrong lars berg, maybe you want @larsbergstrom?

@Gankra
Contributor Author

Gankra commented Jul 24, 2015

Oops, sorry @larsberg!

@dgrunwald
Contributor

As discussed in the internals thread, the saturating solution means that wider-than-address-space stores accepting arbitrary types cannot offer a safe API. Is this really a good trade-off?

For example, I could imagine a CompressingVec<T> that offers a public API similar to Vec<T> but uses gzip compression on the bytes stored in it. Do we want to forbid people from writing types like CompressingVec<T> and offering a safe API?

If we don't create and document a rule forbidding such wider-than-address-space stores in safe rust, the assumption that a saturated refcount can never again reach zero is invalid, and Rc stays unsafe.

@Gankra
Contributor Author

Gankra commented Jul 26, 2015

@dgrunwald I don't really see why anyone would assume they could do such a thing.

@dgrunwald
Contributor

Why not? If I have ownership of a value and can move it, why shouldn't I be able to move its bits to disk, and back later?
An out-of-memory (on-disk) store seems like a reasonable thing to write in a systems programming language; I don't think it's obvious that we want to forbid those just so that Rc can make an assumption about the number of live values.

@Gankra
Contributor Author

Gankra commented Jul 27, 2015

Is this something many (any) languages support at all? Is this something anyone actually does in practice? It sounds like a big headache for not much gain.

Note that I don't believe this prevents you from mem-mapping files and doing magic compression with a custom FS or whatever. You just have to have everything uniquely addressable by a pointer, which gives you 2^64 bytes (18 exabytes) on modern platforms!

@pcwalton
Contributor

Do you have a citation on this? Has anybody ever written a std::vector that transparently compresses its contents? A quick Google search brings up no such example of C++ code in existence.

I'm really wary of introducing an extra test-and-branch for every single Rc manipulation for a theoretical use case. My inclination is to tell anybody who wants to write such a container to just use trait bounds to whitelist the kinds of values that can be stored within it.

@dgrunwald
Contributor

I don't think the exact example of a zipping vector exists anywhere, but that's just one example from a large number of possible larger-than-memory stores (a more likely use case would be a large memory-mapped file where only sections of it are mapped into memory at a time).

I have actually written a generic vector with run-length compression in C#. That one used object.Equals() and cloned C# references, and would be doable in safe Rust using Eq + Clone -- but in cases without Clone, an equivalent type that works by comparing the underlying bits (and thus doesn't need any bounds) might be nice to have.

Edit: Also, I don't think test-and-predictable-branch is going to be any more expensive than test-and-conditional-move (which is what a saturating add compiles to on x64).

@Gankra
Contributor Author

Gankra commented Jul 29, 2015

Discussing it today, we concluded that no one cares about the external-memory thing, but we do think it makes sense to be able to identify two values as identical and compress them by storing them once and maintaining a count of how many times each is stored. As such, I will update this to abort instead -- if you get to this point your code is tragically broken anyway. We can later decide to permanently saturate instead (adding branches to Drop) if this proves onerous. In practice this is all irrelevant, since triggering this code will basically never happen: if you have a legit leak you'll OOM first. You need to be calling mem::forget.
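A minimal sketch of the abort strategy (not the actual std implementation; `inc_strong` is a hypothetical stand-in for the refcount bump):

```rust
use std::process;

// Bump the strong count, aborting if it would overflow. A live Rc
// guarantees the count is at least 1, so a wrapped result of 0 can
// only mean the count overflowed.
fn inc_strong(count: &mut usize) {
    *count = count.wrapping_add(1);
    if *count == 0 {
        // Aborting is safe-by-construction: a use-after-free from a
        // wrapped count would be far worse than killing the process.
        process::abort();
    }
}

fn main() {
    let mut count = 1usize;
    inc_strong(&mut count);
    assert_eq!(count, 2);
}
```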

@alexcrichton
Member

We talked about this at the libs triage meeting today and the consensus was that we should merge with the abort strategy (as explained by @gankro).

@alexcrichton alexcrichton removed the I-needs-decision Issue: In need of a decision. label Jul 29, 2015
@bluss
Member

bluss commented Jul 29, 2015

Sounds good

@Gankra
Contributor Author

Gankra commented Jul 29, 2015

@bors r=alexcrichton

@bors
Contributor

bors commented Jul 29, 2015

📌 Commit 8fda0b0 has been approved by alexcrichton

@Gankra Gankra force-pushed the rc-sat branch 2 times, most recently from 1d7c787 to c9effc8 Compare July 30, 2015 00:01
@Gankra
Contributor Author

Gankra commented Jul 30, 2015

@bors r=alexcrichton

@bors
Contributor

bors commented Jul 30, 2015

📌 Commit 22e2100 has been approved by alexcrichton

@bors
Contributor

bors commented Jul 30, 2015

⌛ Testing commit 22e2100 with merge db2af71...

@bors bors merged commit 22e2100 into rust-lang:master Jul 30, 2015
@comex
Contributor

comex commented Aug 4, 2015

@gankro Any plans to refine this to (safely) avoid the check on 64-bit ABIs?

@Gankra
Contributor Author

Gankra commented Aug 4, 2015

@comex There is no safe way to avoid the check on Rc; LLVM is too smart. It's possible to ditch it on Arc if anyone can provide me concrete docs on the C11 atomics showing that the compiler isn't allowed to combine several fetch_adds into one.

@comex
Contributor

comex commented Aug 4, 2015

I believe it's possible to do it with the same trick as test::black_box (or an inlined version of that function): an empty asm! block. For instance, if you passed the pointer through there, LLVM would have no way of knowing if multiple increases went to the same Rc object, so it couldn't combine them.
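A sketch of the opacity idea using `std::hint::black_box` (the since-stabilized descendant of `test::black_box`; that it lowers to the same empty-asm trick is an assumption here). Passing the reference through an opaque function means the optimizer can no longer prove consecutive increments hit the same counter, so it cannot fuse them:

```rust
use std::hint::black_box;

fn main() {
    let mut count = 0usize;
    for _ in 0..4 {
        // The optimizer must treat the returned reference as unknown,
        // so each `+= 1` has to be kept as a separate increment.
        let c = black_box(&mut count);
        *c += 1;
    }
    assert_eq!(count, 4);
}
```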

@Gankra
Contributor Author

Gankra commented Aug 4, 2015

@comex I might be happy with this as long as we had a test verifying that LLVM never starts optimizing away empty asm... the black-box thing is a tenuous hack, to my knowledge.

@bluss
Member

bluss commented Aug 4, 2015

@comex it sounds like that might cause more damage than it does good (preventing optimizations).

@comex
Contributor

comex commented Aug 4, 2015

@bluss It shouldn't; as long as the function containing asm is inlined, the only constraint given to LLVM is that the pointer should be in a register, which it's almost certain to be anyway. Without any other extended effects (e.g. "memory"), clobber registers, or outputs, it should be able to safely assume that nothing other than that register is changed.

@glaebhoerl
Contributor

Incidentally, I recall a discussion way back when about the possibility of optimizing space usage by using (u32, u32) instead of (usize, usize) for the reference counts. Back then, this idea was rejected on the grounds that it would be memory-unsafe if the counts were to overflow. Now that it turns out we need to explicitly check for overflow anyway, though, this reasoning no longer applies. The question is now only whether we consider it important to allow a reference-counted box to have over 4 billion references. (It's quite possible that we do; I just wanted to point out the possibility.)

This is tangentially related to rust-lang/rfcs#1232 in that both changes have the effect of halving the space overhead on 64-bit platforms. It also occurs to me that if we were to consider optimizing this overhead important enough to make Rc generic over whether it supports weak references, then perhaps we should also consider making it generic over the type used for the reference counts - so you could, for example, use Rc<SomeTypeWithAlignof1, NoWeak, u8> with an overhead of only a single byte per box if you didn't need weak references and were completely sure that there would never be more than 255 references to a given box at runtime.

(My personal sense, as I also commented on the other thread, is that these would all be premature optimizations until and unless there were empirical evidence showing concrete gains to be had. But the possibility is there.)
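The space point is concrete: two `usize` counts cost 16 bytes of header per box on a 64-bit target, while two `u32` counts would cost 8. A quick check (the tuple layouts here have no padding):

```rust
use std::mem::size_of;

fn main() {
    // Header cost of the (strong, weak) counts under each scheme.
    assert_eq!(size_of::<(usize, usize)>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<(u32, u32)>(), 8);
}
```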

@Gankra
Contributor Author

Gankra commented Aug 8, 2015

@glaebhoerl I agree with your analysis -- YAGNI
