RFC: Add a thread local storage module, std::tls #461
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,331 @@ | ||
- Start Date: (fill me in with today's date, YYYY-MM-DD) | ||
- RFC PR: (leave this empty) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
|
||
Introduce a new thread local storage module to the standard library, `std::tls`, | ||
providing: | ||
|
||
* Scoped TLS, a non-owning variant of TLS for any value. | ||
* Owning TLS, an owning, dynamically initialized, dynamically destructed | ||
variant, similar to `std::local_data` today. | ||
|
||
# Motivation | ||
|
||
In the past, the standard library's answer to thread local storage was the | ||
`std::local_data` module. This module was designed based on the Rust task model | ||
where a task could be either a 1:1 or M:N task. This design constraint has | ||
[since been lifted][runtime-rfc], allowing for easier solutions to some of the | ||
current drawbacks of the module. While redesigning `std::local_data`, it can | ||
also be scrutinized to see how it holds up to modern-day Rust style, guidelines, | ||
and conventions. | ||
|
||
[runtime-rfc]: https://github.com/rust-lang/rfcs/blob/master/text/0230-remove-runtime.md | ||
|
||
In general the amount of work being scheduled for 1.0 is being trimmed down as | ||
much as possible, especially new work in the standard library that isn't focused | ||
on cutting back what we're shipping. Thread local storage, however, is such a | ||
critical part of many applications and opens many doors to interesting sets of | ||
functionality that this RFC sees fit to try and wedge it into the schedule. The | ||
current `std::local_data` module simply doesn't meet the requirements of what | ||
one may expect out of a TLS implementation for a language like Rust. | ||
|
||
## Current Drawbacks | ||
|
||
Today's implementation of thread local storage, `std::local_data`, suffers from | ||
a few drawbacks: | ||
|
||
* The implementation is not super speedy, and it is unclear how to enhance the | ||
existing implementation to be on par with OS-based TLS or `#[thread_local]` | ||
support. As an example, today a lookup takes `O(log N)` time where N is the | ||
number of set TLS keys for a task. | ||
|
||
This drawback is also not to be taken lightly. TLS is a fundamental building | ||
block for rich applications and libraries, and an inefficient implementation | ||
will only deter usage of an otherwise quite useful construct. | ||
|
||
* The types which can be stored into TLS are not maximally flexible. Currently | ||
only types which ascribe to `'static` can be stored into TLS. It's often the | ||
case that a type with references needs to be placed into TLS for a short | ||
period of time, however. | ||
|
||
* The interactions between TLS destructors and TLS itself are not currently very | ||
well specified, and it can easily lead to difficult-to-debug runtime panics or | ||
undocumented leaks. | ||
|
||
* The implementation currently assumes a local `Task` is available. Once the | ||
runtime removal is complete, this will no longer be a valid assumption. | ||
|
||
## Current Strengths | ||
|
||
There are, however, a few pros to the usage of the module today which should be | ||
required for any replacement: | ||
|
||
* All platforms are supported. | ||
* `std::local_data` allows consuming ownership of data, allowing it to live past | ||
the current stack frame. | ||
|
||
## Building blocks available | ||
|
||
There are currently two primary building blocks available to Rust when building | ||
a thread local storage abstraction, `#[thread_local]` and OS-based TLS. Neither | ||
is currently used for `std::local_data`, but both are generally seen as | ||
"adequately efficient" implementations of TLS. For example, a TLS access of a | ||
`#[thread_local]` global is simply a pointer offset, which when compared to a | ||
`O(log N)` lookup is quite speedy! | ||
|
||
With these available, this RFC proposes redesigning TLS to make use of | ||
these primitives. | ||
|
||
# Detailed design | ||
|
||
Three new modules will be added to the standard library: | ||
|
||
* The `std::sys::tls` module provides platform-agnostic bindings to the OS-based | ||
TLS support. This support is intended to only be used in otherwise unsafe code | ||
as it supports getting and setting a `*mut u8` parameter only. | ||
|
||
* The `std::tls` module provides a dynamically initialized and dynamically | ||
destructed variant of TLS. This is very similar to the current | ||
`std::local_data` module, except that the implicit `Option<T>` is not | ||
mandated, as an initialization expression is required. | ||
|
||
* The `std::tls::scoped` module provides a flavor of TLS which can store a | ||
reference to any type `T` for a scoped period of time. This is a variant of TLS | ||
not provided today. The backing idea is that if a reference only lives in TLS | ||
for a fixed period of time then there's no need for TLS to consume ownership of | ||
the value itself. | ||
|
||
This pattern of TLS is quite common throughout the compiler's own usage of | ||
`std::local_data` and often more expressive as no dances are required to move | ||
a value into and out of TLS. | ||
|
||
The design described below can be found as an existing cargo package: | ||
https://github.com/alexcrichton/tls-rs. | ||
|
||
## The OS layer | ||
|
||
While LLVM has support for `#[thread_local]` statics, this feature is not | ||
supported on all platforms that LLVM can target. Almost all platforms, however, | ||
provide some form of OS-based TLS. For example Unix normally comes with | ||
`pthread_key_create` while Windows comes with `TlsAlloc`. | ||
|
||
This RFC proposes introducing a `std::sys::tls` module which contains bindings | ||
to the OS-based TLS mechanism. This corresponds to the `os` module in the | ||
example implementation. While not currently public, the contents of `sys` are | ||
slated to become public over time, and the API of the `std::sys::tls` module | ||
will go under API stabilization at that time. | ||
|
||
This module will support "statically allocated" keys as well as dynamically | ||
allocated keys. A statically allocated key will actually allocate a key on | ||
first use. | ||
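
The lazy allocation of a statically declared key could be sketched roughly as follows. This is only an illustration in present-day Rust, not the sample implementation: `StaticKey`, `os_create_key`, and `os_destroy_key` are hypothetical names, and the two `os_*` functions merely stand in for calls such as `pthread_key_create` or `TlsAlloc` and their release counterparts.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the OS call that allocates a TLS key.
/// It hands out fresh, non-zero key numbers.
fn os_create_key() -> usize {
    static NEXT: AtomicUsize = AtomicUsize::new(1);
    NEXT.fetch_add(1, Ordering::Relaxed)
}

/// Hypothetical stand-in for the OS call that releases a TLS key.
fn os_destroy_key(_key: usize) {}

/// A key that can live in a `static`; 0 means "not allocated yet".
pub struct StaticKey {
    key: AtomicUsize,
}

impl StaticKey {
    pub const fn new() -> StaticKey {
        StaticKey { key: AtomicUsize::new(0) }
    }

    /// Returns the OS key, allocating it on first use.
    pub fn key(&self) -> usize {
        match self.key.load(Ordering::Acquire) {
            0 => self.lazy_init(),
            k => k,
        }
    }

    fn lazy_init(&self) -> usize {
        let fresh = os_create_key();
        // Several threads may race here; exactly one wins the exchange,
        // and every loser releases its spare key and adopts the winner's.
        match self.key.compare_exchange(0, fresh, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => fresh,
            Err(existing) => {
                os_destroy_key(fresh);
                existing
            }
        }
    }
}
```

With this shape, a `static K: StaticKey = StaticKey::new();` costs nothing until some thread first calls `K.key()`, and every thread observes the same key afterwards.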
|
||
### Destructor support | ||
|
||
The major difference between Unix and Windows TLS support is that Unix supports | ||
a destructor function for each TLS slot while Windows does not. When each Unix | ||
TLS key is created, an optional destructor is specified. If any key has a | ||
non-NULL value when a thread exits, the destructor is then run on that value. | ||
|
||
One possibility for this `std::sys::tls` module would be to not provide | ||
destructor support at all (least common denominator), but this RFC proposes | ||
implementing destructor support for Windows to ensure that functionality is not | ||
lost when writing Unix-only code. | ||
|
||
Destructor support for Windows will be provided through a custom implementation | ||
of tracking known destructors for TLS keys. | ||
Comment: On Windows, you can set a callback for when a thread exits, and you can then iterate over all TLS keys and destroy them if they are set. See https://github.com/ChromiumWebApps/chromium/blob/master/base/threading/thread_local_storage_win.cc#L42 for how to do it without using DllMain. Iterating over all possible TLS keys is quadratic instead of linear if they are sparsely used, but it should be possible to write code that directly accesses the Windows TEB to avoid this if desired.
Comment: Indeed! I ran across that same trick when whipping up the sample implementation. I definitely agree that the implementation can be improved as well! |
||
|
||
## Scoped TLS | ||
|
||
As discussed before, one of the motivations for this RFC is to provide a method | ||
of inserting any value into TLS, not just those that ascribe to `'static`. This | ||
provides maximal flexibility in storing values into TLS to ensure any "thread | ||
local" pattern can be encompassed. | ||
|
||
Values which do not adhere to `'static` contain references with a constrained | ||
lifetime, and can therefore not be moved into TLS. They can, however, be | ||
*borrowed* by TLS. This scoped TLS API provides the ability to insert a | ||
reference for a particular period of time, and then a non-escaping reference can | ||
be extracted at any time later on. | ||
|
||
In order to implement this form of TLS, a new module, `std::tls::scoped`, will | ||
be added. It will be coupled with a `scoped_tls!` macro in the prelude. The API | ||
looks like: | ||
|
||
```rust | ||
/// Declares a new scoped TLS key. The keyword `static` is required in front to | ||
/// emphasize that a `static` item is being created. There is no initializer | ||
/// expression because this key initially contains no value. | ||
/// | ||
/// A `pub` variant is also provided to generate a public `static` item. | ||
macro_rules! scoped_tls( | ||
(static $name:ident: $t:ty) => (/* ... */); | ||
(pub static $name:ident: $t:ty) => (/* ... */); | ||
) | ||
|
||
/// A structure representing a scoped TLS key. | ||
/// | ||
/// This structure cannot be created dynamically, and it is accessed via its | ||
/// methods. | ||
pub struct Key<T> { /* ... */ } | ||
|
||
impl<T> Key<T> { | ||
/// Insert a value into this scoped TLS slot for a duration of a closure. | ||
/// | ||
/// While `cb` is running, the value `t` will be returned by `get` unless | ||
/// this function is called recursively inside of cb. | ||
/// | ||
/// Upon return, this function will restore the previous TLS value, if any | ||
/// was available. | ||
pub fn set<R>(&'static self, t: &T, cb: || -> R) -> R { /* ... */ } | ||
|
||
/// Get a value out of this scoped TLS variable. | ||
/// | ||
/// This function takes a closure which receives the value of this TLS | ||
/// variable, if any is available. If this variable has not yet been set, | ||
/// then None is yielded. | ||
pub fn with<R>(&'static self, cb: |Option<&T>| -> R) -> R { /* ... */ } | ||
} | ||
``` | ||
|
||
The purpose of this module is to enable the ability to insert a value into TLS | ||
for a scoped period of time. While able to cover many TLS patterns, this flavor | ||
of TLS is not comprehensive, motivating the owning variant of TLS. | ||
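
To make the borrowing idea concrete, here is a minimal sketch in present-day Rust of how a scoped slot might behave. It is not the RFC's implementation: it is built on today's `std::thread_local!` with a raw pointer in a `Cell`, it is specialised to one hypothetical type `Context` rather than generic over `T`, and the free functions `set` and `with` only mirror the shape of the proposed `Key::set` and `Key::with`.

```rust
use std::cell::Cell;
use std::ptr;

/// Hypothetical value that is only borrowed by TLS, never owned by it.
pub struct Context {
    pub name: String,
}

thread_local! {
    // Raw pointer to the currently installed value; null means "not set".
    static CURRENT: Cell<*const Context> = Cell::new(ptr::null());
}

/// Installs `value` for the duration of `cb`, then restores whatever was
/// installed before (also when `cb` panics, because the guard is dropped).
pub fn set<R>(value: &Context, cb: impl FnOnce() -> R) -> R {
    struct Restore(*const Context);
    impl Drop for Restore {
        fn drop(&mut self) {
            CURRENT.with(|slot| slot.set(self.0));
        }
    }
    let previous = CURRENT.with(|slot| slot.replace(value as *const Context));
    let _restore = Restore(previous);
    cb()
}

/// Passes the currently installed value, if any, to `cb`.
pub fn with<R>(cb: impl FnOnce(Option<&Context>) -> R) -> R {
    let current = CURRENT.with(|slot| slot.get());
    // SAFETY: a non-null pointer was installed by `set`, whose caller keeps
    // the value borrowed until the slot is restored. The reference handed to
    // `cb` cannot escape it because its lifetime is chosen by `with`.
    let borrowed = unsafe { current.as_ref() };
    cb(borrowed)
}
```

Nested calls to `set` shadow the outer value and restore it on return, which matches the documented behaviour of restoring "the previous TLS value, if any was available".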
|
||
### Variations | ||
|
||
Specifically the `with` API can be somewhat unwieldy to use. The `with` function | ||
takes a closure to run, yielding a value to the closure. It is believed that | ||
this is required for the implementation to be sound, but it also goes against | ||
the "use RAII everywhere" principle found elsewhere in the stdlib. | ||
|
||
Additionally, the `with` function is more commonly called `get` for accessing a | ||
contained value in the stdlib. The name `with` is recommended because it may be | ||
possible in the future to express a `get` function returning a reference with a | ||
lifetime bound to the stack frame of the caller, but it is not currently | ||
possible to do so. | ||
|
||
The `with` function yields an `Option<&T>` instead of `&T`. This is to cover | ||
the use case where the key has not been `set` before it is used via `with`. This is | ||
somewhat unergonomic, however, as it will almost always be followed by | ||
`unwrap()`. An alternative design would be to provide an `is_set` function and | ||
have `with` `panic!` instead. | ||
|
||
## Owning TLS | ||
|
||
Although scoped TLS can store any value, it is also limited in the fact that it | ||
cannot own a value. This means that TLS values cannot escape the stack frame from | ||
which they originated. This is itself another common usage pattern of TLS, | ||
and to solve this problem the `std::tls` module will provide support for | ||
placing owned values into TLS. | ||
|
||
These values must not contain references as that could trigger a use-after-free, | ||
but otherwise there are no restrictions on placing statics into owned TLS. The | ||
module will support dynamic initialization (run on first use of the variable) as | ||
well as dynamic destruction (implementors of `Drop`). | ||
|
||
The interface provided will be similar to what `std::local_data` provides today, | ||
except that the `replace` function has no analog (it would be written with a | ||
`RefCell<Option<T>>`). | ||
|
||
```rust | ||
/// Similar to the `scoped_tls!` macro, except allows for an initializer | ||
/// expression as well. | ||
macro_rules! tls( | ||
(static $name:ident: $t:ty = $init:expr) => (/* ... */); | ||
(pub static $name:ident: $t:ty = $init:expr) => (/* ... */) | ||
) | ||
|
||
pub struct Key<T: 'static> { /* ... */ } | ||
pub struct Ref<T: 'static> { /* ... */ } | ||
|
||
impl<T: 'static> Key<T> { | ||
/// Access this TLS variable, lazily initializing it if necessary. | ||
/// | ||
/// The first time this function is called on each thread the TLS key will | ||
/// be initialized by having the specified init expression evaluated on the | ||
/// current thread. | ||
/// | ||
/// This function can return `None` for the same reasons of static TLS | ||
/// returning `None` (destructors are running or may have run). | ||
pub fn get(&'static self) -> Option<Ref<T>> { /* ... */ } | ||
} | ||
|
||
impl<T: 'static> Deref<T> for Ref<T> { /* ... */ } | ||
``` | ||
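
As a rough illustration of lazy per-thread initialization and of writing `replace` on top of a `RefCell<Option<T>>`, the sketch below uses today's `std::thread_local!` as a stand-in for the proposed `tls!` macro. The names `SLOT`, `INIT_RUNS`, and `replace` are hypothetical and exist only for this example.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times the initializer expression has been evaluated,
/// summed over all threads.
pub static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // The initializer runs once per thread, on that thread's first access.
    static SLOT: RefCell<Option<String>> = {
        INIT_RUNS.fetch_add(1, Ordering::SeqCst);
        RefCell::new(None)
    };
}

/// Stores `new` in the current thread's slot and returns the old contents,
/// playing the role that `std::local_data`'s `replace` used to play.
pub fn replace(new: Option<String>) -> Option<String> {
    SLOT.with(|slot| slot.replace(new))
}
```

Each thread sees its own slot: the first `replace` on a fresh thread returns `None`, and the initializer is evaluated once for every thread that touches the key.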
|
||
### Destructors | ||
|
||
One of the major points about this implementation is that it allows for values | ||
with destructors, meaning that destructors must be run when a thread exits. This | ||
is similar to placing a value with a destructor into `std::local_data`. This RFC | ||
attempts to refine the story around destructors: | ||
|
||
* A TLS key cannot be accessed while its destructor is running. This is | ||
currently manifested with the `Option` return value. | ||
* A TLS key *may* not be accessible after its destructor has run. | ||
* Re-initializing TLS keys during destruction may cause memory leaks (e.g. | ||
setting the key FOO during the destructor of BAR, and initializing BAR in the | ||
destructor of FOO). An implementation will strive to destruct initialized | ||
keys whenever possible, but it may also result in a memory leak. | ||
* A `panic!` in a TLS destructor will result in a process abort. This is similar | ||
to a double-failure. | ||
|
||
These semantics are still a little unclear, and the final behavior may still | ||
need some more hammering out. The sample implementation suffers from a few extra | ||
drawbacks, but it is believed that some more implementation work can overcome | ||
some of the minor downsides. | ||
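
The first rule above can be observed with the thread-local facility in today's standard library, used here purely as a stand-in for the proposed API: `LocalKey::try_with` plays the role of a `get` that returns `Option`. The names `Guard`, `GUARD`, `SEEN_UNAVAILABLE`, and `key_unavailable_in_own_destructor` are hypothetical, and whether and when TLS destructors run can differ between platforms, so treat this as a sketch rather than a guarantee.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Records what the destructor observed, so it can be read after the
/// worker thread has exited.
static SEEN_UNAVAILABLE: AtomicBool = AtomicBool::new(false);

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        // While this destructor runs, the key that owns `self` is being
        // destroyed, so accessing it is expected to fail rather than
        // re-initialize the value.
        let unavailable = GUARD.try_with(|_| ()).is_err();
        SEEN_UNAVAILABLE.store(unavailable, Ordering::SeqCst);
    }
}

thread_local! {
    static GUARD: Guard = Guard;
}

/// Spawns a thread that initializes `GUARD`, waits for it to exit, and
/// reports whether the destructor found its own key inaccessible.
pub fn key_unavailable_in_own_destructor() -> bool {
    thread::spawn(|| {
        // First access initializes the value on the worker thread.
        GUARD.with(|_| ());
    })
    .join()
    .unwrap();
    SEEN_UNAVAILABLE.load(Ordering::SeqCst)
}
```

Returning an error (or `None`) in this situation is what lets a destructor detect the condition instead of panicking or silently leaking a freshly initialized value.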
|
||
### Variations | ||
|
||
Like the scoped TLS variations, the primary way this API could be altered would | ||
be to return `Ref<T>` instead of an `Option` from `get`, while then providing a | ||
function to test whether a value is being destroyed. | ||
|
||
# Drawbacks | ||
|
||
* Leaking TLS keys on Windows is certainly not ideal (see the description | ||
above). | ||
* There is no variant of TLS for statically initialized data. Currently the | ||
Comment: A branch + pointer offset is significantly worse than just a pointer offset, even if it's marked as likely to succeed for LLVM.
Comment: I definitely agree that there's a performance loss, but after some benchmarking, "significantly worse" may be overstating it a bit, I found the hit to be ~10%:
Comment: It's far more than 10%. You're not really measuring anything by doing naive micro-benchmarks of branches.
Comment: Do you have some representative examples I could measure? I'd love to get a handle on what sort of impact this has.
Comment: The call to
Comment: Static TLS access is just a single offset instruction with a constant offset so IMO the only way you're going to get a sane benchmark is generating the assembly for a fetch, increment and set of an integer in TLS (inside a library
Comment: Ah I'm sorry, I should have clarified. The benchmarks were run with
I didn't find this very useful, so I added
Comment: You could measure the time taken by a no-op loop calling
Comment: I've added a few more benchmarks, and these are the results:
While I don't dispute that a dynamically initialized variable has more instructions on the fast path than a statically initialized one, it seems that the impact is quite minor. These numbers make it look like a global variable is a tad bit slower! If the cost of measuring the benchmarking loop is significant in terms of measurements, then I would expect the conclusion to be that the unit being benchmarked is quite fast.
I'd also like to reiterate that I would like to support statically initialized TLS in terms of an API, but the ergonomics of doing so make it infeasible today in my personal opinion. Do note that it is entirely implemented in the sample implementation. API-wise, however, providing two variants (dynamic/static) also seems somewhat overkill versus providing only one to worry about. I suspect with an extension to the macro syntax in the future (and ergonomic static initialization), we could tweak the macro to something like:
I would also expect the number of candidates for a statically initialized TLS variable to be fairly small today. It's pretty rare to work with a data structure that can be statically initialized, so in practice if we provided 2 possibilities I would expect the dynamic variant's usage to far outweigh the static variant's usage. If, however, we see usage going in the other direction, we could certainly tweak the semantics! |
||
`std::tls` module requires dynamic initialization, which means a slight | ||
penalty is paid on each access (a check to see if it's already initialized). | ||
* The specification of destructors on owned TLS values is still somewhat shaky | ||
Comment: Using the C++11 destructor support would be much more robust. It doesn't have weird limitations like
Comment: I'm actually not super familiar with the semantics of C++11 destructors with respect to
Comment: If there are 100 TLS variables and each one has a destructor accessing the next for the first time, I think it will lead to leaks with the old POSIX TLS because it only cycles N times (4 on Linux IIRC). AFAIK, that problem was solved for C++11 TLS by just giving it guaranteed sensible semantics (run until completion).
Comment: Do you know of documentation for the C++11 destructor semantics in thread_local? I'll reiterate that we do use the destructor registration functions that it uses when available, I'd like to copy the semantics to the fallback implementation (OS TLS) as much as possible, however. |
||
at best. It's possible to leak resources in unsafe code, and it's also | ||
possible to have different behavior across platforms. | ||
* Due to the usage of macros for initialization, all fields of `Key` in all | ||
scenarios must be public. Note that `os` is excepted because its initializers | ||
are a `const`. | ||
* This implementation, while declared safe, is not safe for systems that do any | ||
form of multiplexing of many threads onto one thread (aka green tasks or | ||
greenlets). This RFC considers it the multiplexing systems' responsibility to | ||
maintain native TLS if necessary, or otherwise strongly recommend not using | ||
native TLS. | ||
|
||
# Alternatives | ||
|
||
Alternatives on the API can be found in the "Variations" sections above. | ||
|
||
Some other alternatives might include: | ||
|
||
* A 0-cost abstraction over `#[thread_local]` and OS-based TLS which does not | ||
Comment: See the point about
Comment: Note that the sample implementation does indeed use the destructor registration functions, and it does actually have a statically initialized variant, I just felt that it was confusing to expose so many variants of TLS. I'll benchmark this though so we can get a good handle on how big of a hit this is.
Comment: Ah one other thing which led me to start out with dynamic-only is the general lack of statically initialized values in the standard library. As you've found out, you can't statically initialize a
I'm not a super fan of either of these options, and would prefer to hold out for something like
I do think we may be able to add a static variant later without much pain, but I'd just prefer to see some more well-supported statically initialized values before that time. |
||
have support for destructors but requires static initialization. Note that | ||
this variant still needs destructor support *somehow* because OS-based TLS | ||
values must be pointer-sized, implying that the rust value must itself be | ||
boxed (whereas `#[thread_local]` can support any type of any size). | ||
|
||
* A variant of the `tls!` macro could be used where dynamic initialization is | ||
Comment: Ah, that's right here. |
||
opted out of because it is not necessary for a particular use case. | ||
|
||
* A [previous PR][prev-pr] from @thestinger leveraged macros more heavily than | ||
this RFC and provided statically constructible Cell and RefCell equivalents | ||
via the usage of `transmute`. The implementation provided did not, however, | ||
include the scoped form of this RFC. | ||
|
||
[prev-pr]: https://github.com/rust-lang/rust/pull/17583 | ||
|
||
# Unresolved questions | ||
|
||
* Are the questions around destructors vague enough to warrant the `get` method | ||
Comment: For the pure
On platforms still lacking this support, it's still possible to make it memory safe but it does mean that accessing uninitialized TLS in destructors should be considered as a bug even if it's intended / would be sane with C++11 semantics because it may trigger memory leaks (
Comment: For now the sample implementation does have a boolean for this and it doesn't ever flip it back to
Comment: I think it's wrong for the
Comment: The number of branches in the fast path isn't unimportant.
Comment: Would you be in favor of
Comment: Yes, and with a single integer / enum tracking the state so there can be one branch for the fast path. Special handling of destruction would essentially be free in that case because it would only be an extra cost for initialization (which by definition only happens once). |
||
being `unsafe` on owning TLS? | ||
* Should the APIs favor `panic!`-ing internally, or exposing an `Option`? |
Comment: It's also possible to use only `#[thread_local]` with no secondary `pthread_key_create` / synchronization by using C++11 destructor support. On Linux, glibc defines a `fn __cxa_thread_atexit_impl(dtor: unsafe extern "C" fn(ptr: *mut c_void), ptr: *mut c_void, dso_symbol: *mut i8)` where `dso_symbol` can just be retrieved by defining `static mut __dso_handle: i8`. On OS X, there's a `fn _tlv_atexit(dtor: unsafe extern "C" fn(ptr: *mut c_void), ptr: *mut c_void);` function. Rust could use the weak symbol trick to call these when available and fall back to a crappier implementation on top of dynamic TLS.
Comment: Indeed! If you take a look at the sample implementation you'll see that it does precisely that!