Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace LockCell with atomic types #56614

Merged
merged 2 commits into from
Jan 9, 2019
Merged

Replace LockCell with atomic types #56614

merged 2 commits into from
Jan 9, 2019

Conversation

Zoxc
Copy link
Contributor

@Zoxc Zoxc commented Dec 7, 2018

Split from #56509

r? @michaelwoerister

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 7, 2018
pub fn consider_optimizing<T: Fn() -> String>(&self, crate_name: &str, msg: T) -> bool {
#[inline(never)]
#[cold]
pub fn consider_optimizing_cold<T: Fn() -> String>(&self, crate_name: &str, msg: T) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does splitting this into two functions really translate to faster compile-times?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used in layout computation. Not sure how hot it is, but it probably won't hurt. Especially if we want to add more MIR optimizations (and debug them)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway this PR is mostly about getting rid of LockCell, I just found this on the way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. It does hurt code quality though as it makes things more complicated and thus harder to read and maintain. I personally prefer not to do things like this without evidence that it carries it's weight.

@michaelwoerister
Copy link
Member

@bors try

@bors
Copy link
Contributor

bors commented Dec 10, 2018

⌛ Trying commit ed78d93 with merge bcc6e63...

bors added a commit that referenced this pull request Dec 10, 2018
Replace LockCell with atomic types

Split from #56509

r? @michaelwoerister
@bors
Copy link
Contributor

bors commented Dec 10, 2018

☀️ Test successful - status-travis
State: approved= try=True

@ghost
Copy link

ghost commented Dec 10, 2018

Just a suggestion: Instead of LockCell<T> and AtomicT, you could use Crossbeam's AtomicCell<T>:
https://docs.rs/crossbeam/0.5.0/crossbeam/atomic/struct.AtomicCell.html

AtomicCell<T> works exactly like Cell<T>, except it's also thread-safe:

  • If T fits into an AtomicUsize or AtomicBool, atomic operations are used behind the scenes.

  • If T doesn't fit into any primitive atomic type, one of the hidden global spinlocks is used. Those spinlocks support optimistic reads, which means uncontended reads are much faster than they would be with a normal spinlock, Mutex, or RwLock (code, behchmarks).

let mut ret = true;
if let Some(ref c) = self.optimization_fuel_crate {
if c == crate_name {
assert_eq!(self.query_threads(), 1);
let fuel = self.optimization_fuel_limit.get();
let fuel = self.optimization_fuel_limit.load(SeqCst);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is pretty racy I think.

This should all be in a cas loop and the other section (for print_fuel) should be a fetch add, right?

Granted it was racy before even with locks, but still.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It asserts that the compiler is running in single-threaded mode, so it should be fine. It would be nicer though if the two fields in question would be behind a single lock.

@michaelwoerister
Copy link
Member

@stjepang This looks great. Yes, that sounds like the way to go. We already have crossbeam-utils in our crate-whitelist.

@michaelwoerister
Copy link
Member

@rust-timer build bcc6e63

@rust-timer
Copy link
Collaborator

Success: Queued bcc6e63 with parent 3a75e80, comparison URL.

bors bot added a commit to crossbeam-rs/crossbeam that referenced this pull request Dec 11, 2018
262: Move AtomicCell into crossbeam-utils r=jeehoonkang a=stjepang

At this point, I feel pretty confident about the current state of `AtomicCell` in the sense that we probably won't make any breaking changes to its public API in the future.

It's also a reasonably small utility without dependencies that deserves living in `crossbeam-utils`. The Rust compiler would like to use it in place of its `LockCell`, and already has `crossbeam-utils` whitelisted. See [this PR](rust-lang/rust#56614 (comment)).

Let's move `AtomicCell` into `crossbeam-utils` so that people can use it without depending on whole `crossbeam`.

Co-authored-by: Stjepan Glavina <[email protected]>
@rust-timer
Copy link
Collaborator

Finished benchmarking try commit bcc6e63

@michaelwoerister
Copy link
Member

The changes here don't seem to make much of a difference performance-wise. I'd say, as a consequence, we should optimize for code quality:

  • replace LockCell with crossbeam's AtomicCell,
  • put optimization_fuel_limit and out_of_fuel into one regular Lock together,
  • don't split consider_optimizing into hot and cold paths.

bors bot added a commit to crossbeam-rs/crossbeam that referenced this pull request Dec 11, 2018
262: Move AtomicCell into crossbeam-utils r=stjepang a=stjepang

At this point, I feel pretty confident about the current state of `AtomicCell` in the sense that we probably won't make any breaking changes to its public API in the future.

It's also a reasonably small utility without dependencies that deserves living in `crossbeam-utils`. The Rust compiler would like to use it in place of its `LockCell`, and already has `crossbeam-utils` whitelisted. See [this PR](rust-lang/rust#56614 (comment)).

Let's move `AtomicCell` into `crossbeam-utils` so that people can use it without depending on whole `crossbeam`.

Co-authored-by: Stjepan Glavina <[email protected]>
@ghost
Copy link

ghost commented Dec 11, 2018

I've just put AtomicCell into crossbeam-utils v0.6.3 so that you can use it:
https://docs.rs/crossbeam-utils/0.6.3/crossbeam_utils/atomic/struct.AtomicCell.html

@Zoxc
Copy link
Contributor Author

Zoxc commented Dec 11, 2018

@stjepang The use of volatile reads doesn't inspire confidence. Some #[inline] attributes are also lacking.
The only benefits from using crossbeam-utils would be guaranteed u64 support, which we do want.

@ghost
Copy link

ghost commented Dec 11, 2018

The use of volatile reads doesn't inspire confidence.

FWIW, @RalfJung has reviewed those volatile reads and written comments above them.

Some #[inline] attributes are also lacking.

The code is parametrized over <T> so #[inline] attributes didn't help in benchmarks. But we can always add them if needed.

The only benefits from using crossbeam-utils would be guaranteed u64 support, which we do want.

Another benefit is that reads are very fast (faster than Mutex and naive spinlock).

Additionally, if you don't need .get(), AtomicCell<T> doesn't require T: Copy (just like Cell<T>).

@RalfJung
Copy link
Member

@stjepang Do you mean the comments in https://github.com/crossbeam-rs/crossbeam/pull/234/files? Those are different accesses though, are they not?

The volatile accesses I looked at all have a comment saying "this is UB but we do it anyway because performance".^^

@ghost
Copy link

ghost commented Dec 11, 2018

@RalfJung Sorry, my bad! I confused this volatile read with the CAS that issues a SeqCst fence. If you'd like to review this situation, though, that'd be great! :)

We're using a volatile read to read a value that might be concurrently written to. However, we can detect when a data race happens by reading the sequence number. If the value was overwritten while we were reading it, then we try reading again in a loop. The Chase-Lev deque uses a very similar technique.

If we wanted to be pedantic, we'd use multiple AtomicU8 reads to read the value, but that'd be slow (because multiple atomic operations), and we have no AtomicU8 on stable Rust. What are you thoughts on this? (I have deja vu that we already had this discussion somewhere, but can't find it)

@michaelwoerister
Copy link
Member

The only benefits from using crossbeam-utils would be guaranteed u64 support

The main benefit I see is being able to remove LockCell in favor of something that is maintained outside the compiler.

@Zoxc
Copy link
Contributor Author

Zoxc commented Dec 12, 2018

The main benefit I see is being able to remove LockCell in favor of something that is maintained outside the compiler.

This PR already does that and uses libstd's atomic types. Using crossbeam-utils wouldn't be an improvement in that regard.

@michaelwoerister
Copy link
Member

michaelwoerister commented Dec 12, 2018

AtomicCell has a slightly nicer interface because it doesn't require giving Ordering arguments. However, if it is still kind of unstable, libstd's atomic types are probably the better choice. @stjepang and @RalfJung, would you say it is production-ready?

@Zoxc
Copy link
Contributor Author

Zoxc commented Dec 12, 2018

I didn't notice the lack of Ordering arguments. I want those.

@michaelwoerister
Copy link
Member

I depends. If I don't really need them (like in the cases modified in this PR), I prefer them not to be exposed in the interface.

@ghost
Copy link

ghost commented Dec 12, 2018

@stjepang and @RalfJung, would you say it is production-ready?

I would say yes.

AtomicCell has a slightly nicer interface because it doesn't require giving Ordering arguments.

Not only that, AtomicCell<T> can't accept Ordering arguments because using Relaxed orderings might cause data races when T: !Copy.

let a = AtomicCell::new(vec![]);

// Thread #1
a.store(vec![1, 2, 3], Relaxed);

// Thread #2
println!("{:?}", a.take(Relaxed)); // unsynchronized read!

@RalfJung
Copy link
Member

RalfJung commented Dec 12, 2018

We're using a volatile read to read a value that might be concurrently written to. However, we can detect when a data race happens by reading the sequence number. If the value was overwritten while we were reading it, then we try reading again in a loop. The Chase-Lev deque uses a very similar technique.

This sounds a lot like sequence locks to me, which are a notorious problem for language-level memory models. AFAIK, they cannot be implemented in C and hence not in Rust, either.

So the reason you don't do a relaxed load is because this is arbitrarily-sized data?

Following our own definitions, this is UB because of a read-write race. Following LLVM's definitions, the read is fine but returns undef, and it is impossible to check whether data is undef, so the code is most likely still UB (I haven't had the chance to look at it though).

However, given taht volatile loads cannot be duplicated, the undef probably cannot be materialized as such and hence this is likely to work in practice. But we are bending the rules of both Rust and LLVM here.

If we keep wanting the LLVM memory model, maybe we should just change the documentation and say we are using the LLVM model? The relevant difference is that read-write races are not UB, they just return indeterminate data (which is the same as uninitialized data and padding bits). You are not allowed to do anything with that data, and you should probably wrap it in a MaybeUninitialized, but the read itself is not immediate UB.

For the LLVM side, can we at least get an LLVM developer on the record saying that volatile accesses never return undef, that they act like a load followed by a freeze? That seems like a reasonable assumption (and, curiously, because it would be the only way to actually write out a freeze in LLVM, it would suddenly make volatile actually meaningful in the context of concurrency), but I know LLVM far too little to estimate whether it holds true. Ideally we'd get them to add that to the LangRef...

Cc @rkruppe

@RalfJung
Copy link
Member

Not only that, AtomicCell can't accept Ordering arguments because using Relaxed orderings might cause data races when T: !Copy.

So which ordering does it use, always release/acquire?

@ghost
Copy link

ghost commented Dec 12, 2018

This sounds a lot like sequence locks to me, which are a notorious problem for language-level memory models. AFAIK, they cannot be implemented in C and hence not in Rust, either.

Yes, this is a sequence lock. We based our implementation on the paper you've linked.

So the reason you don't do a relaxed load is because this is arbitrarily-sized data?

Exactly.

Following our own definitions, this is UB because of a read-write race. Following LLVM's definitions, the read is fine but returns undef, and it is impossible to check whether data is undef, so the code is most likely still UB (I haven't had the chance to look at it though).

Yeah, we don't use the data if a race happens (it's mem::forgotten). The Chase-Lev deque (crossbeam-deque, which is used by rayon-core and tokio-threadpool) uses a similar trick and forgets data when a data race happens. Goroutine queues in Go's scheduler do the same thing, too.

You are not allowed to do anything with that data, and you should probably wrap it in a MaybeUninitialized, but the read itself is not immediate UB.

MaybeUninitialized is not available on stable yet. Can I use ManuallyDrop instead?

So which ordering does it use, always release/acquire?

I was a big proponent of guaranteeing SeqCst semantics, but after a lengthy dicussion we decided to guarantee that AtomicCell uses at least acquire/release. The reason is, @jeehoonkang says we cannot really guarantee SeqCst because SeqCst semantics are broken in the C/C++11 memory model (paper, talk) in the sense that there's no valid compilation scheme for some architectures (including some ARMs).

@RalfJung
Copy link
Member

Yeah, we don't use the data if a race happens (it's mem::forgotten).

Ah I see, the data is literally not looked at at all if a race happened, and the detection happens through another channel. The undef/poison is fine then and you can ignore anything I said above about freeze.

The good news is that I think you don't even need a volatile read! You don't care if racy reads yield undef. But you do need the property that read-writes read undef instead of being UB, which LLVM guarantees but Rust does not. (And you do need MaybeUninitialized.) We should better clarify this before other backends are written...

I suspect Rust picked the C11 model because it is much better studied, and that is still true. I know of hardly any academic work on memory models that are as permissive with read-write races as LLVM is. However, there is some very recent work that I should read that takes this into account.

Can I use ManuallyDrop instead?

No. ManuallyDrop doesn't help a bit.

after a lengthy dicussion we decided to guarantee that AtomicCell uses at least acquire/release

I think SeqCst is overrated anyway and release-acquire is what you want almost all the time, so fine for me. ;)


I took a look at the AtomicCell code. If I understood it correctly, a write proceeds as follows:

  • Acquire "lock" by swapping stamp with 1 (doing an acquire-load but a relaxed store, for whatever reason)
  • Release fence
  • Non-atomic store for the actual data
  • Store old_timestamp+2 in stamp (release write)

A read in the fast case proceeds as

  • Acquire-load the stamp
  • Non-atomic load for actual data
  • Acquire fence
  • Relaxed-load of stamp, checking it didn't change

I just spent quite a while wondering about what the correctness argument here is, and why the fence is needed, and why the initial swap is Acquire as opposed to Relaxed. I think I got it now, but I also think the library could need some comments so that the reader doesn't have to reverse-engineer the correctness argument by themselves.

Generally, the library has way fewer comments than is adequate for such subtle concurrent data structures. In Arc, every single atomic access/fence and every Ordering choice is carefully justified. I suggest doing the same here (and for any concurrent data structure, really).

@michaelwoerister
Copy link
Member

OK, let's merge the current version of the PR. The single-threaded implementation of the Atomic types seems harmless enough and I can see that having memory ordering available might be beneficial in some situations. Thanks for updating the code around optimization fuel, @Zoxc!

@bors r+

@bors
Copy link
Contributor

bors commented Jan 9, 2019

📌 Commit 9b47acf has been approved by michaelwoerister

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 9, 2019
@bors
Copy link
Contributor

bors commented Jan 9, 2019

⌛ Testing commit 9b47acf with merge 664c779...

bors added a commit that referenced this pull request Jan 9, 2019
Replace LockCell with atomic types

Split from #56509

r? @michaelwoerister
@bors
Copy link
Contributor

bors commented Jan 9, 2019

☀️ Test successful - status-appveyor, status-travis
Approved by: michaelwoerister
Pushing 664c779 to master...

@bors bors merged commit 9b47acf into rust-lang:master Jan 9, 2019
@Zoxc Zoxc deleted the query-perf2 branch January 9, 2019 16:57
Centril added a commit to Centril/rust that referenced this pull request Jan 15, 2019
Optimize try_mark_green and eliminate the lock on dep node colors

Blocked on rust-lang#56614

r? @michaelwoerister
Centril added a commit to Centril/rust that referenced this pull request Jan 17, 2019
Optimize try_mark_green and eliminate the lock on dep node colors

Blocked on rust-lang#56614

r? @michaelwoerister
Centril added a commit to Centril/rust that referenced this pull request Jan 17, 2019
Optimize try_mark_green and eliminate the lock on dep node colors

Blocked on rust-lang#56614

r? @michaelwoerister
Centril added a commit to Centril/rust that referenced this pull request Jan 17, 2019
Optimize try_mark_green and eliminate the lock on dep node colors

Blocked on rust-lang#56614

r? @michaelwoerister
bors added a commit that referenced this pull request Jan 18, 2019
Optimize try_mark_green and eliminate the lock on dep node colors

Blocked on #56614

r? @michaelwoerister
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants